  1. Last 7 days
    1. eLife Assessment

      This study presents a valuable contribution by introducing a model-based, Bayesian method for inferring action potentials from calcium imaging data that directly quantifies uncertainty in spike timing through posterior distributions. Using a Monte Carlo particle Gibbs sampling approach, the method achieves temporal resolution and accuracy comparable to existing techniques while offering the key added benefit of principled uncertainty estimates. The underlying methodology and characterization are convincing, and the work will be of particular interest to theoretically oriented neuroscientists seeking rigorous new tools for data-driven parameter inference.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modeled process. The authors focus on the quantification of spike time uncertainties in simulated data and in data recorded at a high sampling rate in cerebellar slices with GCaMP8f, and they demonstrate the high temporal precision that can be achieved with their method to estimate spike timing.

      Strengths:

      - The authors provide solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al. and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - Although the algorithm is compared (in the revised manuscript) to other models to infer individual spikes (e.g., MLSpike), these comparisons could be more comprehensive. Future work that benchmarks this and other algorithms under varying conditions (e.g., noise levels, temporal resolution, calcium indicators) would help assess and confirm robustness and usability of this algorithm.

      - The mathematical complexity underlying the method may pose challenges for experimentalists who may want to use the method for their analyses. While this is not a weakness of the approach itself, this highlights the need for further validation and benchmarking in future work, to build user confidence.

      Comments on revisions:

      Thank you for addressing the final comments, and congrats on this study!

    3. Reviewer #2 (Public review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.
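
      To make the model-based class concrete, the following is a minimal sketch of a generic generative model, assuming a first-order autoregressive calcium process with Poisson spiking and Gaussian measurement noise. It is not PGBAR's actual model, which according to this review also estimates baseline fluctuations and other variables; the function name `simulate_ar1_fluorescence` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_ar1_fluorescence(n_steps, dt=0.001, rate_hz=5.0, tau_s=0.05,
                              amplitude=1.0, baseline=0.0, noise_sd=0.2, seed=0):
    """Simulate spikes, calcium, and noisy fluorescence from an AR(1) model.

    All default parameter values are illustrative assumptions, not fitted values.
    """
    rng = np.random.default_rng(seed)
    gamma = np.exp(-dt / tau_s)  # per-step decay factor of the calcium trace
    spikes = rng.poisson(rate_hz * dt, size=n_steps)  # spike count per time bin
    calcium = np.zeros(n_steps)
    previous = 0.0
    for t in range(n_steps):
        # AR(1) update: c_t = gamma * c_{t-1} + amplitude * s_t
        previous = gamma * previous + amplitude * spikes[t]
        calcium[t] = previous
    # Observed fluorescence: baseline plus calcium plus Gaussian noise
    fluorescence = baseline + calcium + rng.normal(0.0, noise_sd, size=n_steps)
    return spikes, calcium, fluorescence

spikes, calcium, fluorescence = simulate_ar1_fluorescence(2000)
```

      A model-based inference method runs this kind of process in reverse: given only `fluorescence`, it estimates `spikes` together with static parameters such as the decay constant and spike amplitude.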

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity, but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the GitHub repository is well-organized. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz).

      Weaknesses:

      The accuracy of spike train reconstructions is not higher than that of other model-based approaches, and lower than the accuracy of a model-independent approach based on a deep network in a regime of commonly used acquisition rates.

      Comments on revisions:

      I have no further comments on the manuscript.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modeled process. The authors focus on the quantification of spike time uncertainties in simulated data and in data recorded at a high sampling rate in cerebellar slices with GCaMP8f, and they demonstrate the high temporal precision that can be achieved with their method to estimate spike timing.

      Strengths:

      - The authors provide solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al. and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - Although the algorithm is compared (in the revised manuscript) to other models to infer individual spikes (e.g., MLSpike), these comparisons could be more comprehensive. Future work that benchmarks this and other algorithms under varying conditions (e.g., noise levels, temporal resolution, calcium indicators) would help assess and confirm robustness and usability of this algorithm.

      The metrics used for comparison follow the field's benchmarking conventions (see the CASCADE paper, Rupprecht et al. 2021). Indeed, improved standardized methods would be ideal to develop, which is beyond the scope of this manuscript.

      - The mathematical complexity underlying the method may pose challenges for experimentalists who may want to use the method for their analyses. While this is not a weakness of the approach itself, this highlights the need for further validation and benchmarking in future work, to build user confidence.

      We acknowledge the challenges of understanding the mathematics underlying our method, but such a study is necessary to ensure its accuracy and reliability. Indeed, we will strive to improve the technique's user-friendliness in future instantiations.

      Reviewer #2 (Public review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity, but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the GitHub repository is well-organized.

      Weaknesses:

      On the other hand, the accuracy of spike train reconstructions is not higher than that of other model-based approaches, and clearly lower than the accuracy of a model-independent approach based on a deep network. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz).

      In the revision, Figure 9 shows that temporal accuracy is very similar between PGBAR and the supervised method, CASCADE, and that PGBAR has a lower false positive rate. These results support the effectiveness of unsupervised Monte Carlo sampling, even with a simple autoregressive model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I'd like to thank the authors for their revisions. Their comments have addressed all my concerns, and I thank them for the clarifications. I have no further comments, except a few minor notes that the authors may consider or not:

      - The paragraph starting in line 367 is newly written and not yet as clear and mature as other parts of the manuscript. In several sentences it is roughly clear what is meant, but the wording lacks precision. For example, "distributions of the average time from ground-truth" seems a bit unclear; maybe "distributions of the average time of estimated spikes from ground-truth spikes" instead. Similarly, "the false detection rate, defined as the difference between detected and ground-truth spikes ..." could be rephrased using the difference between "numbers of spikes" instead of the difference between "spikes". But all of this is minor.

      - In the new Figure 9A, the error bars for the MLSpike method seem to be absent. In the same figure legend, it should be "excess" instead of "excess".

      We thank the reviewer for the feedback. We revised the wording of the new paragraph in response to the reviewer’s suggestions, restored the missing error bar in Figure 9, and corrected the figure legend.
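
      The two quantities discussed in the reviewer's wording suggestion can be sketched as follows, assuming each estimated spike is matched to its nearest ground-truth spike and that excess spikes are counted as the difference between the numbers of detected and ground-truth spikes. The manuscript's exact definitions (for example, any normalization or matching window) are not stated in this exchange, so the function names and details below are hypothetical.

```python
import numpy as np

def mean_time_error(estimated, ground_truth):
    """Mean absolute distance from each estimated spike to its nearest true spike."""
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    if estimated.size == 0 or ground_truth.size == 0:
        return float("nan")  # undefined when either spike train is empty
    # Pairwise |t_est - t_true|, then the minimum over true spikes per estimate
    distances = np.abs(estimated[:, None] - ground_truth[None, :])
    return float(distances.min(axis=1).mean())

def excess_spikes(estimated, ground_truth):
    """Number of detected spikes minus number of ground-truth spikes."""
    return len(estimated) - len(ground_truth)

# Hypothetical spike times in seconds
estimated_times = [0.010, 0.052, 0.120]
true_times = [0.012, 0.050]
timing_error = mean_time_error(estimated_times, true_times)
n_excess = excess_spikes(estimated_times, true_times)
```

      Phrasing the second metric in terms of spike counts, as the reviewer suggests, makes clear that it is a signed integer difference and not a set difference between individual spikes.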

      Reviewer #2 (Recommendations for the authors):

      Comparison to CASCADE: as far as I know there are no CASCADE models that have been trained on ground truth data in the regime of very fast (line scan) sampling, which is rarely used. A fair comparison of spike time estimates between PGBAR and CASCADE should take this into account. This can be done by training a new CASCADE model using the dataset of this paper. Given that performance of PGBAR and CASCADE is very similar already now (except for the false positive rate), a CASCADE model optimized for high sampling rate may be expected to catch up with (or even exceed) the performance of PGBAR. At a minimum, this possibility should be discussed.

      While this may be true, retraining a CASCADE model on high-frequency ground-truth data is beyond the scope of this manuscript. Indeed, a retrained CASCADE model optimized for line-scan or GCaMP8f data could improve performance and potentially match or exceed PGBAR, particularly in reducing false positives.

      Our aim, however, is not to benchmark supervised methods under their optimal retraining conditions, but to provide an unsupervised alternative that does not rely on labeled training data. In practice, retraining supervised models is constrained by the availability of suitable ground-truth datasets and by the uncertainty in how the method generalizes to acquisition regimes that differ substantially from the training set.

      We have therefore added a sentence in the Discussion (at the end of the subsection Comparison with benchmark datasets):

      [...] “While retraining supervised methods such as CASCADE on high-frequency or GCaMP8f ground-truth datasets could further improve its performance, limitations in dataset availability and generalization across acquisition regimes motivate complementary, training-free approaches such as PGBAR.”

      As stated in the manuscript, future extensions, such as using nonlinear biophysical models as the generative model for Monte Carlo–based inference, may further improve spike estimation accuracy.

    1. eLife Assessment

      This study presents a well-executed investigation into how the olfactory system disconnects from the environment during sleep and anesthesia, identifying a potential gating mechanism at the earliest synaptic stages of the olfactory bulb. The findings are important, as they challenge current theories by demonstrating that sensory gating occurs in non-thalamic pathways even under controlled airflow conditions. The strength of evidence is solid, supported by rigorous multimodal recordings, although the reliance on anesthetic models to draw conclusions about natural sleep is a limitation that requires further contextualization.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of Serantes et al. produced a well-designed set of experiments to address the mechanisms of olfactory disconnection during sleep. In contrast to other sensory modalities, olfaction is not filtered or potentially gated by the thalamus, potentially opening the door to unimodal sensory stimulation during sleep. Recent work (Schreck, 2022) used optogenetically activated Olfactory Sensory Neurons to show that local field potentials and activity across the olfactory pathway not only remained open during sleep but were potentially even accentuated under these brain states. However, their optogenetic manipulation is an artificial perturbation to the system that could override naturalistic early-gating mechanisms. In a set of careful experiments, Serantes et al. show that coupling between airflow and brain activity at the Olfactory Bulb is diminished under sleep and anesthetic brain states. In contrast to a peripheral gating mechanism proposed by Schreck, this lack of respiration-locked activity, measured with EEG and LFP, persists even in the presence of intense respiration and even when nasal airflow is artificially induced and controlled. Their results point to nonthalamic early sensory gating of olfactory information during sleep, which is independent of nasal airflow but dependent on internal brain states. Their work elicits questions about potentially undiscovered mechanisms at the level of the early sensory pathway.

      Strengths:

      The strengths of this paper lie in the level of control afforded by the multiple preps and the wide array of physiological recordings. Specifically, both their control of airflow with a dual tracheotomy and their control of internal states using both sleep and urethane anaesthesia have a cumulative impact on the results.

      The paper is simple, well-written, well executed, has clear questions, describes the literature comprehensively, and points out conflicting results with precision and transparency. The same transparency and judgment should be used on their own results.

      Another strength of the paper is the clear, unambiguous results. The effect sizes presented in the paper are sizable and convincing.

      Weaknesses:

      The paper's shortcomings include open questions and a lack of a full mechanistic understanding of the suggested internal gating process. There are some open questions about the relative importance of airflow sensing vs. odorant sensing. Recent work by Mahajan et al., Sci. Adv. 2025 points to OSNs as sensing both odorants and airflow to produce anemotaxis. Potentially, other cells could contribute to anemosensation as well, so that gated or non-gated information might depend on the ratio of airflow to odorant information. Perhaps optogenetic stimulation of OSNs acts as an unnatural sensory stimulation that can alter both olfaction and anemosensation.

      Detailed ablation, pharmacological, and optogenetic experiments may be needed to elucidate the suggested mechanisms and determine the correct answer to the question posed by the authors.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Serantes and colleagues analysed how sleep and anesthesia impact the processing of olfactory inputs, focusing on early sensory processing (occurring at the first or second synaptic contacts). First, they show that the transition to sleep has a major impact on breathing-dependent gamma activity. Second, they show that this decrease originates at the first synaptic contact and is independent of respiration itself. Third, they show a decrease in connectivity associated with neocortical slow waves. These results are very interesting and supported by a robust methodology. However, I have two major concerns regarding this work.

      First, the authors fail to adequately contextualize their work. For example, the impact of sleep on respiration-locked gamma activity was reported several years ago and is, in fact, used in some laboratories to score sleep using data from the olfactory bulb.

      Second, the authors should exercise much more caution when comparing the urethane anesthesia model with NREM/REM sleep cycles. There are very significant differences between the two. Yet, the title and abstract of the article mention only sleep and anesthesia. More concerningly, the results obtained under urethane anesthesia are uncritically generalized to sleep.

      In conclusion, the first finding was already shown in previous studies, and the second and third results were obtained not during sleep but during an anesthetic state that only resembles certain aspects of sleep.

      Strengths:

      The authors deploy an interventional approach that allows them to determine with compelling evidence the relationship of the gamma activity time-locked to breathing and different aspects of breathing, proving in particular that the disconnection is independent of respiratory dynamics. They leveraged invasive recordings that allow them to pinpoint at which level the disconnection occurs.

      Weaknesses:

      (1) My first comment concerns how this work fits within the state of the art. The introduction of the article leaves out very important and highly relevant work.

      (1a) First, "disconnection" is not a defining feature of sleep; "unresponsiveness" is. It is often assumed that this unresponsiveness (which can be directly measured, contrary to disconnection) is due to a form of disconnection, but there has been substantial work over the past decade showing that disconnection is not as extensive as initially expected. It is therefore incorrect, in my view, to state that "most models attribute sensory gating to thalamocortical mechanisms". Most models attribute sensory gating to a combination of thalamocortical and cortical mechanisms.

      (1b) The rationale of the article appears unclear ("the olfactory system-bypassing the thalamus-offers a unique window into earlier stages of sensory disconnection"). If the idea is to investigate gating mechanisms before the thalamus, then any sensory modality would suffice, since even modalities that later relay through the thalamus involve pre-thalamic processing stages. I assume that the authors instead mean that, because olfactory information does not relay through the thalamus, gating mechanisms in the olfactory stream could occur very early. However, this also implies that focusing on olfactory processing would say little about other sensory modalities.

      (1c) Key previous results have been completely overlooked. First, the impact of sleep on respiration-locked gamma activity was reported several years ago (Bagur et al., Plos Biology 2018). Second, important articles investigating olfactory processing during sleep have been overlooked (e.g., Arzi et al., Nature Neuroscience 2012; Arzi et al., Journal of Neuroscience 2014). I am not providing an exhaustive list here, but these articles are not only extremely relevant to the present study; they have also become classics in the sleep literature.

      (2) For most of their findings (Figures 2 to 5), the authors used urethane anesthesia. They show that this pharmacological manipulation results in alternation between periods of high-amplitude delta waves (SWSt) and a desynchronized state (ASt). However, the parallel with NREM and REM sleep, respectively, is rough and insufficiently justified. Differences can already be noted by contrasting the short examples provided in the figures. While NREM and REM sleep differ in terms of muscle tone (EMG), no such difference is discernible between SWSt and ASt. In SWSt, the slow waves appear to overlap with fast activity at the cortical level (M1, S1), which is not typically the case during NREM sleep. In addition, because the time scale is not the same in Figures 1 and 2 (1 s vs 2 s), yet the slow waves appear to have similar durations, it is also possible that the slow waves generated during SWSt and NREM differ. To better support the proposed parallel between NREM and SWSt on the one hand, and ASt and REM on the other, the authors should provide a thorough comparison of these states (spectral features, properties of the slow waves, duration and frequency of each state, etc.). Without this, inferences from results obtained under urethane anesthesia to sleep are not warranted.

      The authors acknowledge this issue in the Discussion ("These findings suggest that there is no functional equivalence between urethane-activated states and REM sleep"), but this caveat should be integrated from the very beginning (title, abstract, and introduction).

      (3) In some graphs, the power spectrum is normalized. Under anesthesia, this normalization was performed "within each animal to the SWSt maximum for that signal". However, I could not find equivalent information for sleep. This is key information needed to correctly interpret the results shown in Figure 1.

      (4) The authors should also clarify their criteria for concluding on the absence or presence of a given effect. For example, in the legend of Figure 1c, they write: "Note the presence of coherence during wakefulness, demonstrating the internalization of the respiratory signal, and its drop during sleep". Unless coherence is exactly zero, some degree of coherence is always "present". Figure 1 instead shows that coherence is modulated across frequencies during wakefulness, with peaks in the delta and theta ranges.

      In Figure 2, they write: "PAC between respiration and OB gamma amplitude was present during ASt but disappeared during SWSt". Again, the authors should clarify what is meant by "disappeared", as they only tested for differences between ASt and SWSt.

      Given that the authors implemented a strategy to test for above-chance coherence using surrogate datasets, they should consistently provide statistical tests showing which conditions or frequency bands exhibit coherence above chance in order to justify claims about the presence or absence of an effect.
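
      One common way to test for above-chance coherence is sketched below, assuming circular time-shift surrogates and a per-frequency percentile threshold. The authors' actual surrogate procedure is not described in this review, so this illustrates the general idea only; the function name `coherence_above_chance` and its defaults are hypothetical.

```python
import numpy as np
from scipy.signal import coherence

def coherence_above_chance(x, y, fs, nperseg=1024, n_surrogates=200,
                           percentile=95.0, seed=0):
    """Compare observed coherence with a null built from time-shifted surrogates."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if y.size <= 2 * nperseg:
        raise ValueError("signal too short for the requested segment length")
    freqs, observed = coherence(x, y, fs=fs, nperseg=nperseg)
    surrogate = np.empty((n_surrogates, freqs.size))
    for i in range(n_surrogates):
        # Circularly shift y by at least one segment to break its alignment with x
        shift = rng.integers(nperseg, y.size - nperseg)
        _, surrogate[i] = coherence(x, np.roll(y, shift), fs=fs, nperseg=nperseg)
    # Per-frequency chance level from the surrogate distribution
    threshold = np.percentile(surrogate, percentile, axis=0)
    return freqs, observed, threshold, observed > threshold
```

      Reporting the boolean mask (or the threshold curve) for each state would support statements such as "present" or "disappeared" with an explicit criterion. Note that time shifting does not remove coherence between strictly periodic signals, because coherence is insensitive to a constant phase lag, so a different surrogate scheme may be needed for very regular rhythms.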

      (5) Likewise, comparisons across states should always be supported by statistical tests, for example, in Figure 4. In addition, despite the apparent absence of coherence during SWSt in Figures 4f and 4g (which again should be formally tested), Figure 4h shows an increase in coherence around 2 Hz, which suggests some degree of coherence between nasal airflow and the olfactory bulb.

      (6) Figures should more clearly distinguish results based on a single "representative" animal from population averages. For example, were Figures 4g and 2h computed at the population level?

    4. Reviewer #3 (Public review):

      Summary:

      Sleep is typified by a behavioural attenuation of responsiveness to external stimuli (higher arousal thresholds). There are various mechanisms through which sensory perception could be dampened, and while thalamic and cortical gate points have been well studied, the focus here is on peripheral ones - at the level of the olfactory bulb (OB). While something conceptually similar has been shown in insects, this paper represents an important contribution to understanding attenuation of sensory perception during rodent sleep and anaesthesia.

      This paper shows that respiration-locked potentials and gamma activity in the olfactory bulb, which are important for olfactory coding, are diminished during sleep and when under anaesthesia compared to wake. Further, this state-dependent activity in OB is likely to be locally generated. Using a tracheotomy procedure aimed to dissociate nasal airflow from natural inhalations, the authors demonstrate that local field potentials (LFPs) in the OB phase lock with artificially generated air pulses (delivered into the nasal cavity) during the active phase of anaesthesia but not during a more passive state. LFPs did not synchronise with respiratory signals during either anaesthesia state. Lastly, the authors showed that as delta power increased (typical of slow-wave-sleep), the coherence between nasal inhalation rhythms and OB LFPs decreased, indicating that as rats experienced something akin to slow-wave-sleep (during anaesthesia), disconnection from the external environment could be augmented. Taken together, the authors argue that the change in activity observed in the olfactory bulb during sleep and anaesthesia provides a non-permissive state for sensory processing and manifests as sensory dissociation.

      Strengths:

      The manuscript is well-written, and the experiments are thorough. Experiments examining coupling of nasal respiration with OB potentials and delta activity are particularly interesting as they point to augmented sensory disconnection during a sleep phase typically associated with higher arousal thresholds.

      Weaknesses:

      (1) An experiment addressing the following points is missing:

      Does odour stimulation that wakes up a subject restore gamma activity and respiration-locked potentials?

      Is OB/respiration desynchrony maintained when presented with a non-rousing stimulus?

      Is waking upon stimulus delivery less likely as delta activity increases and coherence between OB/respiratory rhythms weakens?

      (2) Many of the experiments are performed under anaesthesia, which I understand is for practical reasons. While the authors are forthcoming about the limitations of using anaesthesia in lieu of natural sleep states, I would have preferred to see more experiments performed on sleeping animals.

    5. Author response:

      We thank the reviewing editor and the reviewers for their careful evaluation of our manuscript “Early sleep dependent sensory gating in the olfactory system”, and for their constructive feedback. We are encouraged by the overall positive assessment of the work.

      In the revised version, we will address all the points raised by the reviewers. Below, we outline the main aspects of the revision.

      (1) Contextualization within prior literature.

      We will expand the text to better situate our findings within the existing literature and clarify the specific contribution of our work, particularly with respect to state dependent changes in olfactory bulb activity.

      (2) Distinction between sleep and urethane anaesthesia.

      We will revise the text to more clearly distinguish findings obtained during natural sleep from those obtained under urethane anaesthesia. While avoiding direct equivalence between states, we will clarify that the comparison is intended to highlight shared features of slow wave brain dynamics associated with sensory gating.

      (3) Clarification of analytical methods and statistical criteria.

      We will provide additional details regarding normalisation procedures, surrogate based analysis, and statistical criteria used to assess the presence or absence of coherence and phase amplitude coupling, ensuring consistency across figures.

      (4) Improvements in figures and terminology.

      We will revise figure annotations to improve clarity (axis, colour scales, units and labelling) and ensure consistent terminology throughout the manuscript.

      We believe these revisions will further strengthen the manuscript while preserving its central conclusions.

    1. eLife Assessment

      The present work provides new insights into detailed brain morphology. Using state-of-the-art methods, it provides compelling evidence for the relevance of sulcal morphology for the precise localization of brain function. The fundamental findings have great relevance for the fields of imaging neuroscience and individualized medicine as ever-improving techniques improve precision to the point where individual brain anatomy is taking centre stage.

    2. Reviewer #1 (Public Review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      Ever-improving techniques allow the detailed capture of brain morphology and function to the point where individual brain anatomy becomes an important factor. This study investigated detailed sulcal morphology in the parieto-occipital junction. Using cutting-edge methods, it provides important insights into local anatomy, individual variability, and local brain function. The presented work advances the field and will stimulate future research into this important area.

      Strengths:

      Detailed, very thorough methodology. Multiple raters mapped detailed sulci in a large cohort. The identified sulcal features and their functional and behavioural relevance are then studied using various complementary methods. The results provide compelling evidence for the importance of the described sulcal features and their proposed relationship to cortical brain function.

    3. Reviewer #2 (Public Review):

      Summary:

      After manually labelling 144 human adult hemispheres in the lateral parieto-occipital junction (LPOJ), the authors 1) propose a nomenclature for 4 previously unnamed highly variable sulci located between the temporal and parietal or occipital lobes, 2) focus on one of these newly named sulci, namely the ventral supralateral occipital sulcus (slocs-v) and compare it to neighbouring sulci to demonstrate its specificity (in terms of depth, surface area, gray matter thickness, myelination, and connectivity), 3) relate the morphology of a subgroup of sulci from the region including the slocs-v to the performance in a spatial orientation task, demonstrating behavioural and morphological specificity. In addition to these results, the authors propose an extended reflection on the relationship between these newly named landmarks and previous anatomical studies, a reflection about the slocs-v related to functional and cytoarchitectonic parcellations as well as anatomic connectivity and an insight about potential anatomical mechanisms relating sulcation and behaviour.

      Strengths:

      - To my knowledge, this is the first study addressing the variable tertiary sulci located between the superior temporal sulcus (STS) and intra-parietal sulcus (IPS).

      - This is a very comprehensive study addressing altogether anatomical, architectural, functional and cognitive aspects.

      - The definition of highly variable yet highly reproducible sulci such as the slocs-v feeds the community with new anatomo-functional landmarks (which is emphasized by the provision of a probability map in supp. mat., which in my opinion should be proposed in the main body).

      - The comparison of different features between the slocs-v and similar sulci is useful to demonstrate their difference.

      - The detailed comparison of the present study with state of the art contextualises and strengthens the novel findings.

      - The functional study complements the anatomical description and points towards cognitive specificity related to a subset of sulci from the LPOJ.

      - The discussion offers a proposition of theoretical interpretation of the findings.

      - The data and code are mostly available online (raw data made available upon request).

    4. Reviewer #3 (Public Review):

      Summary:

      72 subjects, and 144 hemispheres, from the Human Connectome Project had their parietal sulci manually traced. This identified the presence of previously undescribed shallow sulci. One of these sulci, the ventral supralateral occipital sulcus (slocs-v), was then demonstrated to have functional specificity in spatial orientation. The discussion furthermore provides an eloquent overview of our understanding of the anatomy of the parietal cortex, situating their new work into the broader field. Finally, this paper stimulates further debate about the relative value of detailed manual anatomy, inherently limited in participant numbers and areas of the brain covered, against fully automated processing that can cover thousands of participants but easily misses the kinds of anatomical details described here.

      Strengths:

      - This is the first paper describing the tertiary sulci of the parietal cortex with this level of detail, identifying novel shallow sulci and mapping them to behaviour and function.

      - It is a very elegantly written paper, situating the current work into the broader field.

      - The combination of detailed anatomy and function and behaviour is superb.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public Review):

      Strengths

      (1) The definition of highly variable yet highly reproducible sulci such as the slocs-v feeds the community with new anatomo-functional landmarks (which is emphasized by the provision of a probability map in supp. mat., which in my opinion should be proposed in the main body).

      We agree with Reviewer 2 that there is merit to including the probability maps as a main text Figure rather than Supplementary Figure. We have now added it to the main text.

      Weaknesses

      (1) While the identification of the sulci has been done thoroughly with expert validation, the sulci have not been labeled in a way that enables the demonstration of the reproducibility of the labeling.

      Our group was unable to use an approach amenable to calculating inter-rater agreements to expedite the process of defining thousands of sulci at the individual level in multiple regions as this was our first study comprehensively documenting the sulcal organization of this region. Nevertheless, our method followed a rigorous, three-tiered procedure to ensure accurate sulcal definitions were identified in all participants. In the case of this study, authors YT and TG first defined sulci. These sulci were then checked by a trained expert (EHW). Finally, sulcal definitions were finalized by the senior author, an expert neuroanatomist (KSW). We emphasize that this process has produced reproducible anatomical results when charting other regions such as posteromedial cortex (Willbrand et al., 2023 Science Advances; Willbrand et al., 2023 Communications Biology; Maboudian et al., 2024 The Journal of Neuroscience; Ramos Benitez et al., 2024 Neuropsychologia), ventral temporal cortex (Miller et al., 2020 Scientific Reports; Parker et al., 2023 Brain Structure and Function), and lateral prefrontal cortex (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Brain Structure and Function; Willbrand et al., 2023 The Journal of Neuroscience; Willbrand et al., 2024 Brain Structure and Function) across age groups, species, and clinical populations. For the present study, by the time the final tier of our method was reached, we emphasize that a very small percentage (~2%) of sulcal definitions were actually modified. We will include an exact percentage in future publications in LPC/LPOJ.

      Our Methods have been edited to describe these features (Pages 21-22):

      “As this is the first time the sulcal expanse of LPC/LPOJ was comprehensively charted with a focus on pTS, the location of each sulcus was confirmed through a three-tiered procedure for each participant in each hemisphere. First, trained independent raters (Y.T. and T.G.) identified sulci. Second, these definitions were checked by a trained expert (E.H.W.). Third, these labels were finalized by a neuroanatomist (K.S.W.). We emphasize that this procedure has produced reproducible results in our prior work across the cortex (Miller et al. 2021; Voorhies et al. 2021; Yao et al. 2022; Willbrand et al. 2023; Willbrand et al. 2022; Willbrand et al. 2024; Parker et al. 2023; Miller et al. 2020; Willbrand et al. 2022; Willbrand et al. 2023; Maboudian et al. 2024; Ramos Benitez et al. 2024). All LPC sulci were then manually defined and saved as .label files in FreeSurfer using tksurfer tools, from which morphological and anatomical features were extracted. We defined LPC/LPOJ sulci for each participant based on the most recent schematics of sulcal patterning by Petrides (2019) as well as pial, inflated, and smoothed white matter (smoothwm) FreeSurfer cortical surface reconstructions of each individual. In some cases, the precise start or end point of a sulcus can be difficult to determine on a surface (Borne et al., 2020); however, examining consensus across multiple surfaces allowed us to clearly determine each sulcal boundary in each individual. For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres). The specific criteria to identify the slocs and pAngs are outlined in Fig. 1b.”

      Reviewer #3 (Public Review):

      Weaknesses

      (1) The numbers of subjects are inherently limited both in number as well as in being typically developing young adults.

      First, although the sample size of the present study is small in number in comparison to large N, group-level neuroimaging analyses, it is comparable to precision neuroimaging studies examining sulcal features in individual participants (for example, Cachia et al., 2021 Frontiers in Neuroanatomy; Garrison et al., 2015 Nature Communications; Lopez-Persem et al., 2019 The Journal of Neuroscience; Miller et al., 2021 The Journal of Neuroscience; Roell et al., 2021 Developmental Cognitive Neuroscience; Voorhies et al., 2021 Nature Communications; Weiner, 2019 The Anatomical Record; Willbrand, et al., 2022 Science Advances; Willbrand, et al., 2022 Brain Structure & Function; Yao et al., 2022 Cerebral Cortex). We discuss this point in detail in the Limitations subsection of the Discussion (Page 17):

      “This manual method is also arduous and time-consuming, which, on the one hand, limits the sample size in terms of number of participants, while on the other, results in thousands of precisely defined sulci. This push-pull relationship reflects a broader conversation in the human brain mapping and cognitive neuroscience fields between a balance of large N studies and “precision imaging” studies in individual participants (Gratton et al., 2022; Naselaris et al., 2021; Rosenberg and Finn, 2022). Though our sample size is comparable to other studies that produced reliable results relating sulcal morphology to brain function and cognition (for example, Cachia et al., 2021; Garrison et al., 2015; Lopez-Persem et al., 2019; Miller et al., 2021; Roell et al., 2021; Voorhies et al., 2021; Weiner, 2019; Willbrand et al., 2022a, 2022b; Yao et al., 2022), ongoing work that uses deep learning algorithms to automatically define sulci should result in much larger sample sizes in future studies (Borne et al., 2020; Lee et al., 2024, 2025; Lyu et al., 2021). The time-consuming manual definitions of primary, secondary, and PTS also limit the cortical expanse explored in each study, thus restricting the present study to LPC/LPOJ.”

      Second, we utilized a young adult sample as this is the standard in the field when charting features of sulci for the first time (for example, Paus et al., 1996 Cerebral Cortex; Chiavaras & Petrides, 2000 Journal of Comparative Neurology; Segal & Petrides, 2012 European Journal of Neuroscience; Zlatkina & Petrides, 2014 Proceedings of the Royal Society B Biological Science; Sprung-Much & Petrides, 2018 Brain Structure & Function; Miller et al., 2021 The Journal of Neuroscience; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 Communications Biology; Drudik et al., 2023 Cerebral Cortex). Nevertheless, it is indeed crucial to confirm that this schematic is translatable to other age groups; however, this exploration is beyond the scope of the present project and is left for future investigation. We have added text to the Limitations subsection of the Discussion to emphasize these points (Pages 17-18):

      “Additionally, the scope of the present study is limited in that the sample was only in young adults. This sample was selected as it is the standard of the field when charting features of sulci for the first time (for example, Paus et al. 1996; Chiavaras and Petrides 2000; Segal and Petrides 2012; Zlatkina and Petrides 2014; Sprung-Much and Petrides 2018; Miller et al. 2021; Willbrand et al. 2022; Willbrand et al. 2023; Drudik et al. 2023). Nevertheless, it is necessary to explore how well this updated schematic translates to different age groups, species, and clinical populations.”

      Finally, it is worth mentioning that we have begun preliminary analyses on the translatability of this schematic, and have shown that it does hold in a pediatric sample (ages 6-18 years old; Author response image 1).

      Author response image 1.

      Example pediatric participant with all LPC/LPOJ sulci identified in both hemispheres. Incidence rates for the variable pTS identified in the present work in a pediatric sample are included below (N = 79 participants).

      (2) While the paper begins by describing four new sulci, only one is explored further in greater detail.

      We focused on the slocs-v as it has a high incidence rate, making it amenable to our analytic pipelines relating sulci to cortical morphology, architecture, and function, as well as cognition (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 The Journal of Neuroscience; Maboudian et al., 2024 The Journal of Neuroscience). However, we want to emphasize that throughout the paper there are multiple analyses that further describe the three more variable sulci: 1) detailing their sulcal patterning (Supplementary Tables 1-4) and 2) detailing their morphology and architecture (Supplementary Fig. 6). We do agree though that it is a worthwhile endeavor to further describe these sulci—especially if the data is readily available. As such, to complement our behavioral analysis identifying a relationship between the morphology of the consistent sulci and spatial orientation and considering the well-documented relationship between sulcal incidence and cognition (for review see Cachia et al., 2021 Frontiers in Neuroanatomy), we tested whether the number of variable sulci and the incidence of each variable sulcus specifically were related to spatial orientation. This procedure produced null results on all neuroanatomical variables, which we now mention in the Results (Page 11):

      “Finally, as in prior work examining variably-present PTS in other cortical expanses (for example, Amiez et al., 2018; Cachia et al., 2014; Fornito et al., 2004; Willbrand et al., 2024b), we assessed whether the presence/absence of the more variable PTS identified in the present work (slocs-d, pAngs-v, and pAngs-d) was related to spatial orientation, reasoning, and processing speed task performance. We identified no significant associations between the presence/absence of these sulci in either hemisphere with performance on these tests (ps > .05).”

      (3) There is some tension between calling the discovered sulci new vs acknowledging they have already been reported, but not named.

      To resolve this tension, we have revised the text to 1) ensure proper acknowledgment that sulci have been noticed in this region, 2) point out that these sulci were left unnamed and undescribed, and 3) emphasize that one of the primary goals of this project was to comprehensively detail the sulcal organization of this region at a precise, individual-level considering these often-overlooked sulci.

      This is primarily done at the beginning of the Results (Pages 4-5), where we now write:

      “Four previously undescribed small and shallow sulci in the lateral parieto-occipital junction (LPOJ)

      In previous research in small sample sizes, neuroanatomists noticed shallow sulci in this cortical expanse, but did not describe them beyond including an unlabeled sulcus in their schematic at best (Supplementary Methods and Supplementary Figs. 1-4 for historical details). In the present study, we fully update this sulcal landscape considering these overlooked indentations. In addition to defining the 13 sulci previously described within the LPC/LPOJ, as well as the posterior superior temporal cortex in individual participants (Methods) (Petrides, 2019), we could also identify as many as four small and shallow PTS situated within the LPC/LPOJ that were highly variable across individuals and left undescribed until now (Supplementary Methods and Supplementary Figs. 1-4). Though we officially name and characterize features of these sulci in this paper for the first time, it is necessary to note that the location of these four sulci is consistent with the presence of variable “accessory sulci” in this cortical expanse mentioned in prior modern and classic studies (Supplementary Methods). For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres).”

      (4) The anatomy of the sulci, as opposed to their relation to other sulci, could be described in greater detail.

      To detail these sulci above and beyond their relation to other sulci, we document the anatomical metrics of all sulci in Supplemental Figure 6:

      Results (Page 8):

      The morphological and architectural features of all LPC/LPOJ sulci are described in Supplementary Fig. 6.

    1. eLife Assessment

      The study investigates an emerging research field: the interaction between sleep and development. The authors used Drosophila larval sleep as a study model and provide insight into how neuropeptide circuitry controls sleep differentially between larval and adult Drosophila. By using a broad range of behavioural and imaging methods and analyses, the authors provide a valuable investigation that demonstrates a larvae-specific sleep regulatory neural pathway of Hugin-PK2-Dilps in the Drosophila neurosecretory centre IPC. While some further text clarifications are still required, the revision presents convincing evidence supporting the claims with the new imaging data, sleep parametric analysis, and further clarification addressing the reviewers' comments.

    2. Reviewer #1 (Public review):

      The manuscript investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs) to increase total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons caused reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash on of hugin peptides.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new understanding into how Drosophila sleep-wake circuitry is regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      (1) There is limited discussion of why statistically significant differences are observed in some genetic and temperature controls. This discussion would better support the authors' conclusions.

      (2) The functional connectivity of the huginPC-IPC circuit in larvae could be better supported by chemogenetics using real-time calcium imaging (GCaMP).

      Comments on revisions:

      I would like to thank the authors for the revisions. The inclusion of all sleep metrics, more detailed descriptions in the methods, & a more thorough comparison to other published articles has addressed most of my concerns.

    3. Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the authors performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express PK2-R1 and silencing IPCs resulted in a significant increase in the sleep amount, which was consistent with the effect they observed in PK2-R1 knock-out mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1 dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, showing a connection between Hugin signaling to IPCs, Dilp3 release and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed unbiased genetics screening and successfully identified novel regulators for larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin expressing neurons influence sleep through the activation of IPCs via PK2-R1 with Ca2+ responses and can modulate sleep.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep regulating mechanisms are conserved across species.

      Weaknesses:

      Previously identified weaknesses have been largely addressed by the authors.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches, including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs) to increase total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons caused reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash-on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash-on of hugin peptides. The conclusions of this paper are somewhat well supported by data, but some aspects of the experimental approach and sleep analysis need to be clarified and extended.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in the regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new understanding into how Drosophila sleep-wake circuitry is regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash-on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      Although the paper does have some strengths in principle, these strengths are not fully supported by the experimental approaches used by the authors. In particular:

      (1) The authors show total sleep amount over an 18-hour period for all the measures of 2nd instar larval sleep throughout the paper. However, published studies have shown that sleep changes over the course of 2nd instar development, so more precise time windows are necessary for the analyses in this study.

      (2) Previously published reports of sleep metrics in both Drosophila larvae and adults include the average number of sleep episodes (bout number) and the average length of sleep episodes (bout length). Neither of these metrics is included in the paper for either the larval sleep or adult sleep data. Not including these metrics makes it difficult for readers to compare the findings in this study to previously published papers in the established Drosophila sleep literature.

      (3) Because Drosophila adult & larval sleep is based on locomotion, the authors need to show the activity values for the experiments supporting their key conclusions. They do show travel distances in Figure 2 - Figure Supplement 1, however, it is not clear how these distances were calculated or how the distances relate to the overall activity of individual larvae during sleep experiments. It is also concerning that inactivation of the PK2-R1-expressing neurons causes a reduction in locomotion speed. This could partially explain the increase in sleep that they observe.

      (4) The authors rely on homozygous mutant larvae and adult flies to support many of their conclusions. They also rely on Gal4 lines with fairly broad expression in the Drosophila brain to support their conclusions. Adding more precise tissue-specific manipulations, including thermogenetic activation and inhibition of smaller populations of neurons in the study would be needed to increase confidence in the presented results. Similarly, demonstrating that larval development and feeding are not affected by the broad manipulations would strengthen the conclusions.

      (5) Many of the experiments presented in this study would benefit from genetic and temperature controls. These controls would increase confidence in the presented results.

      (6) The authors claim that their findings in larvae uncover the circuit basis for larval sleep regulation. However, there is very little comparison to published studies demonstrating that neuropeptides like Dh44 regulate larval sleep. Because hugin-expressing neurons have been shown to be downstream of Dh44 neurons, the authors need to include this as part of their discussion. The authors also do not explain why other neuropeptides in the initial screen are not pursued in the study. Given the effect that these manipulations have on larval sleep in their initial screen, it seems likely that other neuropeptidergic circuits regulate larval sleep.

      We thank Reviewer #1 for the constructive comments. Following the suggestions, we have compared the relative sleep amounts of wild-type controls and Hugin/PK2-R1/IPCs mutants/manipulations between 6-hour and 18-hour periods in the 2nd instar larval stage and found consistent sleep phenotypes. We have also shown the sleep metrics data for larvae and adults. We have included additional locomotion and feeding behavior data for wild-type controls and Hugin/PK2-R1/IPCs mutants/manipulations, which suggest that the sleep phenotypes of Hugin/PK2-R1/IPCs mutants/manipulations are less affected by changes in locomotion and feeding behavior. As pointed out, our study could not exclude the possibility that, in addition to the Hugin/PK2-R1/IPCs axis, other pathways including DH44 could act in larval sleep control. We have included these points in the Discussion. Please see the point-by-point responses for details.

      Reviewer #2 (Public review):

      Summary:

      This study examines larval sleep patterns and compares them to sleep regulation in adult flies. The authors demonstrate hallmark sleep characteristics in larvae, including sleep rebound and increased arousal thresholds. Through genetic and behavioral analyses, they identify PK2-R1 as a key receptor involved in sleep modulation, likely via the HuginPC-IPC signaling pathway. Loss of PK2-R1 results in increased sleep, which aligns with previous findings in hugin knockout mutants. While the study presents significant contributions to the field, further investigation is needed to address discrepancies with earlier research and strengthen mechanistic claims.

      Strengths:

      (1) The study explores a relatively understudied aspect of sleep regulation, focusing on larval development.

      (2) The use of an automated behavioral measurement system ensures precise quantification of sleep patterns.

      (3) The findings provide strong genetic and behavioral evidence supporting the role of the HuginPC-IPC pathway in sleep regulation.

      (4) The study has broader implications for understanding the evolution and functional divergence of sleep circuits.

      Weaknesses:

      (1) The manuscript does not sufficiently discuss previous studies, particularly concerning hugin mutants and their metabolic effects.

      (2) The specificity of IPC secretion mechanisms is unclear, particularly regarding potential indirect effects on Dilp2.

      (3) Alternative circuits, such as the HuginPC-DH44 pathway, require further consideration.

      (4) Functional connectivity between HuginPC neurons and IPCs is not directly validated.

      (5) Developmental differences in sleep regulatory mechanisms are not thoroughly examined.

      We thank Reviewer #2 for the positive comments. As suggested, our study could not exclude the possibility that, in addition to the Hugin/PK2-R1/IPCs axis, alternative pathways including the Hugin/DH44 axis could contribute to sleep control in larvae. We have added these points to the Discussion. We have also added data showing mechanistic differences between larval and adult sleep control. Please see the point-by-point responses for details.

      Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the authors performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express PK2-R1 and silencing IPCs resulted in a significant increase in the sleep amount, which was consistent with the effect they observed in PK2-R1 knock-out mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1 dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, showing a connection between Hugin signaling to IPCs, Dilp3 release, and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae, and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed unbiased genetics screening and successfully identified novel regulators for larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin-expressing neurons modulate sleep: activation of IPCs via PK2-R1, accompanied by Ca²⁺ responses.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep-regulating mechanisms are conserved across species.

      Weaknesses:

      The study primarily focused on sleep regulation in Drosophila larvae, showing that the Hugin/PK2-R1 axis is critical for larval sleep but not necessary for adult sleep. The effects of the Hugin axis in the adult are, however, incompletely explained and somewhat inconsistent. PK2-R1 knockout adults also display increased sleep, as does HugPC silencing, at least for daytime sleep. The difference lies in Dilp3/5 mutant animals showing decreased sleep and IPCs seemingly responding with reduced Dilp3 release to PK-2 treatment (Figure 6). It seems difficult to reconcile the authors' conclusions regarding this point without additional data. It could be argued that PK2-R1 still regulates adult sleep, but not via Hugin and IPCs/Dilps.

      Another issue might be that the authors show relative sleep levels for adults using Trikinetics monitoring. From the methods, it is not clear if the authors backcrossed their line to an isogenic wild-type background to normalize for line-specific effects on sleep. Thus, it is likely that each line has differences in total sleep time due to background effects, e.g., their Kir2.1 control line showed reduced sleep relative to the compared genotypes. This might limit the conclusions on the role of Hugin/PK2-R1 on adult sleep.

      We thank Reviewer #3 for the valuable comments. According to the suggestions, we have included additional data on adult sleep phenotypes with IPCs/Dilps and HugPC/PK-2 manipulations. We believe that these additional data further support the idea that the Hugin/PK2-R1/IPCs axis acts differently in larval and adult sleep control.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Show all data as individual data points in the graphs. The use of box-and-whisker plots makes it difficult to determine how much variation there is in each experiment.

      According to the comments, we have changed all graphs to dot-and-whisker plots (Figures 1–6; Figure 1—figure supplements 2–4; Figure 2—figure supplement 1; Figure 3—figure supplements 1 and 3; Figure 5—figure supplement 1; and Figure 6—figure supplements 1 and 3).

      (2) Show all larval sleep metrics (total sleep duration, bout #, bout length, & activity) over the first 6-hour period of 2nd instar development. Larval sleep changes over the course of 2nd instar development so showing an 18-hour period is not as informative for the different manipulations in the study. This also allows for a more thorough comparison to Szuperak et al 2018.

      According to the comments, we have shown all larval sleep metrics (total sleep duration, bout #, bout length, & activity) over the first 6 hours for PK2-R1 KO mutants (Figure 1-figure supplement 5). These PK2-R1 mutant phenotypes are consistent with those described by our sleep amount data over an 18 hr period (Figure 1-figure supplement 5). We thus consistently show all the sleep phenotype data in the 18 hr window in 2nd instar larvae in this paper.

      (3) Show activity values for every experiment. Behavior is based on locomotion, so there is a need to show that larvae in each manipulation do not have locomotive defects.

      According to the reviewer’s comments, we have shown the activity values for each experiment (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). These data clearly indicate that changes in sleep amounts in each manipulation are not only due to locomotion alterations. We have thus added the sentence below at line 151–156 in the manuscript.

      Locomotion changes were not consistently observed upon either activation or suppression of Hug neurons (Figure 3—figure supplement 1), suggesting that changes in sleep amounts is unrelated to locomotor alterations.

      (4) Provide additional explanation as to why PK2-R1 was pursued in the study. There are several candidates in Figure 1 - Figure Supplement 4 (like sNPF-Gal4, Dh31-Gal4, and Dsk-Gal4) that have effects on sleep. These have also not been studied in the context of larval sleep regulation.

      According to the reviewer’s comments, we have added the following sentences at line 108-114 in the manuscript.

      The role of PK2-R1 in larval sleep, on the other hand, has been unknown to date. Given its strong expression in insulin-producing cells (Schlegel et al., 2016) and its function as a receptor for the neuropeptide Hugin, which modulates feeding (Schoofs et al., 2014), we hypothesized that PK2-R1 might mediate neuropeptidergic signaling that links metabolic and sleep regulation during development. We thus focused on this gene as a candidate connecting behavioral and endocrine sleep control.

      (5) Insulin manipulations are known to disrupt Drosophila development (Rulifson et al, 2002). Therefore, it would be beneficial to show that larvae develop normally in dilp3 and dilp5 mutants by examining the time to pupal formation in these mutants compared to controls. If the mutant larvae take longer to reach the pupal stage, how do the authors know that the 2nd instar control and mutant larvae are the same developmental age? As indicated above, the developmental age of larvae does affect the total amount of sleep, so this could affect the authors' conclusions.

      We agree that this is an important point in this study. In each experiment, we carefully checked the developmental stage of larval progeny by mouth hook analysis and by measuring larval size, and used only larvae with characteristics comparable to wild-type 2nd instar larvae. We have added these descriptions in Methods (line 411–416).

      (6) Figure 1 data is only supported by homozygous mutants & 1 fairly-broadly expressed Gal4 driver. The authors need to show that inactivation of PK2-R1 neurons with more tissue-restrictive Gal4 driver lines has the same effect as the other manipulations to further support the conclusions. Examining sleep in activation of PK2-R1 neurons with the broadly expressed Gal4 driver & UAS-TrpA1 would also provide better support for the conclusions.

      We agree. Indeed, we tried to narrow down to small subsets of neurons using multiple different Gal4 drivers, but unfortunately, we did not obtain potential candidates.

      Therefore, although our data show that the Hugin/PK2-R1 axis contributes to sleep control in larvae, we cannot rule out the possibility that other axes could also function in larval sleep control. We mentioned this point in the original version of the submitted manuscript (line 134-137).

      (7) Provide more explanation as to how your methods of defining sleep compare/contrast to published papers. It is not clear how many frames = 1 sec in your recordings. The definition of sleep as 12 frames needs to include a time component as well. This allows for easier comparison to other published papers examining Drosophila larval sleep (Szuperak et al 2018; Churgin et al 2019; Poe et al 2023; Poe et al 2024).

      Our recordings were acquired at 0.87 frames per second. We have added this information in Methods (line 431).
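
      For comparison with other published definitions of larval sleep, a frame-based threshold can be converted into seconds. The following is a minimal sketch, assuming the 12-frame criterion quoted by the reviewer and the 0.87 frames-per-second acquisition rate stated above; the function name is hypothetical and is not taken from the manuscript.

```python
def frames_to_seconds(n_frames: int, frames_per_second: float) -> float:
    """Convert a frame count into elapsed seconds at a given acquisition rate."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return n_frames / frames_per_second


# Assumed inputs: 12 frames (reviewer's wording) at 0.87 frames per second.
sleep_threshold_s = frames_to_seconds(12, 0.87)
print(f"12 frames at 0.87 fps correspond to about {sleep_threshold_s:.1f} s")
```

      Under these assumptions the threshold works out to roughly 14 seconds; whether this matches the criterion actually applied in the study depends on the Methods text, which is not reproduced here.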

      (8) Figure 2 data is only supported by mutants & inactivation with 1 Gal4 driver per cell population. Showing activation of Gal4-expressing cells with UAS-TrpA1 would add more support to the conclusions.

      We already showed reduced sleep amounts in both HuginGAL4>ReaChR and HuginGAL4>TrpA larvae (Figure 3 C & D) in the original version.

      (9) Need to clarify in the methods how the authors calculated travel distances as a measure of locomotive activity. It's not clear if this is done during larval sleep experiments or in independent experiments. It is also not clear why the y-axes of Figure 2-Figure Supplement 1 are not consistent across the panels. Finally, the authors do see decreases in locomotive activity in PK2-R1>Kir2.1 and in dilp3 mutants, so the conclusions presented in the results section of the paper need to be modified to reflect those results.

      We calculated travel distances from the same video recording datasets used for sleep quantification. We have added this information in Methods (line 431-435). As the reviewer indicated, locomotor activity was reduced in some conditions/mutants, including PK2-R1 > Kir2.1 and dilp3 mutants, and therefore we cannot exclude the possibility that locomotion changes might contribute to sleep phenotypes. On the other hand, most manipulations of Hugin neurons and IPCs caused a sleep increase without significant changes in locomotor activity (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). It is thus likely that Hugin and IPCs contribute to sleep control independently of locomotion, whereas other neurons labeled by PK2-R1 GAL4 might contribute to locomotion control.

      (10) Given the role that hugin neurons play in Drosophila feeding (Schlegel et al, 2016), the authors should include feeding data for the hugin/PK2-R1 manipulations. It is also unclear from the methods if their thresholding for defining sleep can detect feeding behaviors. Changes in feeding behavior could explain some of the reported increases in sleep if feeding is not classified as a waking but is instead picked up as inactivity.

      We agree that this is an important point. According to the reviewer’s points, we have added feeding amounts of the wild-type control and the HuginPC>Kir2.1 larvae (Figure 3-figure supplement 3). These data suggest that feeding amounts of the HuginPC>Kir2.1 larvae are significantly reduced compared to those of the control. Given that our data analysis typically categorized feeding behavior as “moving (not sleep)” (see Materials and Methods) and that HuginPC>Kir2.1 larvae showed increased sleep amounts compared to the wild-type control, it is likely that the increased sleep amounts in HuginPC>Kir2.1 larvae are unrelated to changes in feeding behavior.

      (11) The Hugin-IPC localization data (Figure 3E) would be better supported by the use of more specific synaptic and dendritic markers. Specifically, expressing Syt-eGFP (axon marker) in hugin neurons & DenMark (dendritic marker) in IPCs. Using GRASP or P2X2 to demonstrate the anatomical/functional connections between hugin & IPC neurons would also provide better support for this conclusion.

      According to the reviewer’s suggestion, we have added Syt-eGFP signals in HuginPC neurons (Figure 4—figure supplement 1). We also tried DenMark expression in IPCs, but we could not obtain dilp3>DenMark F1 progeny for unknown reasons. We also applied GRASP to the HuginPC-IPCs interaction, but we could not detect obvious GRASP signals. It is well known that peptidergic transmission is often independent of conventional synaptic structures and instead occurs through volume transmission, in which peptidergic signals can travel over long distances to target neurons. It is thus possible that IPCs might receive Hugin signals from HuginPC neurons through volume transmission.

      (12) Figure 4 is missing temperature controls for thermal activation experiments. Also missing genetic control for UAS/+. It would be more convincing to see experiments in Figure 4 with the more specific hug-PC-Gal4 line instead of the broadly expressed hugin-Gal4 line.

      According to the reviewer’s comments, we have added the control data in Figure 4.

      (13) Representative images for Figure 4B & 4C would provide better support for the quantifications & conclusions presented.

      According to the reviewer’s suggestions, we show representative images for Figure 4B and 4C (please see Author response image 1). We are, however, concerned that these images would not add to readers’ understanding beyond the quantitative data, so we have decided not to add them to the manuscript.

      Author response image 1.

      mCD8::mCherry (top) and CRTC::GFP (bottom) are shown under high-temperature conditions without ("−") or with ("+") hugin neuron activation. "−" denotes a high-temperature genetic control lacking LexAop-TrpA1, in which no thermogenetic activation occurs. CRTC::GFP is shown in pseudocolor.

      (14) A more zoomed-out image of all the IPC neurons in the bath application of hugin peptides (Figure 5D) would help with the interpretation of the results. It's not clear if the authors only measured the same exact neuron in each IPC cluster or if they examined all of the IPC neurons. If they measured all of the IPC neurons, did they observe similar results across the different neurons? How much variability is there in the response of IPC neurons to hugin peptide application?

      For Figure 5, we obtained images of multiple brains from each genotype and quantified the NLI values from all IPC neurons. For reference, we show plots of the CRTC signals of Figure 5C brain by brain (Author response image 2). We have added detailed information on the CRTC analysis in Methods (lines 552-554).

      Author response image 2.

      Distribution of CRTC signals across individual brains. Plots of nuclear localization index (NLI) for individual brains, corresponding to the conditions shown in Figure 5C. The x-axis represents each larval brain preparation, and each dot indicates the NLI value of a single IPC neuron. Horizontal bars represent the median within each brain. These plots illustrate variability both within and across individual brains.

      (15) The conclusion that application of Hug peptides results in dilp3 release is not well supported (Figure 5E). There is a large amount of variation in anti-dilp3 signal. Representative images for these quantifications would be beneficial. The authors also don't directly show that dilp3 vesicles are released. They only see a reduction in antibody accumulation in IPCs. Could there be other reasons for the reduction in accumulation in the IPCs? Would changes in dilp3 gene expression or membrane localization cause a reduction in signal? Showing that actual release of dilp3 is affected by Hug peptides using a reporter like ANF-GFP would be more convincing.

      According to the reviewer’s comments, we have added representative images (Figure 5—figure supplement 2). As for the ex vivo experiments in Figure 5, we treated the extracted brain tissues with Hugin/NMU peptides for only 5 minutes. It is thus most likely that the reduction of Dilps in IPCs is mediated by Hugin/PK2-R1 signal-dependent secretion, rather than by transcriptional control and/or degradation of Dilps.

      (16) Show all sleep metrics (total sleep duration, bout #, bout length, and activity) for adult sleep experiments. Showing relative total sleep for the adult experiments is confusing & would benefit from plots of total average sleep in minutes for each genotype.

      According to the reviewer’s comments, we have added the sleep metrics in adults (Figure 6; Figure 6-figure supplement 3).

      (17) The authors can't conclude that expression patterns of PK2-R1 & hug between larvae & adults are "almost comparable." They don't track neurons over development or immortalize neurons in larvae & check expression patterns in adults. They need to show some type of quantification to support these claims. Or revise the text to remove this conclusion.

      We agree. We have changed our arguments as follows (line 211-214).

      Interestingly, the expression patterns of PK2-R1 and Hug as well as the morphology of HugPC neurons in adults appeared to be similar to those in larvae (Figure 6—figure supplement 2), implying that the differential roles of Hug in larvae vs adults are likely due to physiological differences in HugPC neurons and/or IPCs.

      (18) For Figure 6, what effect does genetic inactivation of IPCs have on adult sleep? A more specific manipulation of these cells would provide better support for the conclusion that IPC manipulations have distinct effects on larval & adult sleep. The sleep traces for the hugin manipulation & dilp mutants (Figure 6-Figure Supplement 1) also look inconsistent when comparing genetic controls in (Figure 6-Figure Supplement 1D) or when comparing the dilp mutants. Plotting this data as total sleep amount in the day & night (2 separate graphs) would be beneficial. It would also be helpful to see additional sleep traces for these experiments.

      According to the reviewer’s comments, we have added the sleep amounts of dilp3 and dilp5 mutant adults (Figure 6-figure supplement 1C-D) as well as of IPC-silenced adults (Figure 6-figure supplement 3D), with daytime and nighttime sleep shown separately.

      (19) For Figure 6, what effect does thermogenetic activation of hugin neurons have on IPC activity? The authors demonstrate in Figure 5 that thermal activation results in an increase in larval IPC activity, but they do not show these experiments in the adult brain. These experiments would provide more support for their conclusion that hugin has differential effects on IPC activity depending on the developmental age (larvae vs adults).

      According to the reviewer’s comments, we performed thermo-activation of hugin neurons and found no significant effects on adult IPCs (see Author response image 3), consistent with the ex vivo data in Figure 6.

      Author response image 3.

      (20) A figure legend is needed for Figure 7. The model is not self-explanatory, nor is there an adequate explanation in the discussion section.

      We have added legends (line 781-785).

      (21) Since hugin is known to be downstream of Dh44 in larvae, the discussion needs to include comparison to published work on Dh44 in larvae (Poe et al, 2023). The hugin receptor, PK2-R1, is also expressed in Dh44 & DMS neurons (Schlegel et al, 2016), so a discussion of what role Dh44/DMS neurons may play in their model is necessary.

      We agree. We have added the following text to the Discussion (line 313-320).

      We cannot rule out the possibility that other neurons could function downstream of HuginPC neurons in sleep regulation. For instance, given that Dh44 neurons in the brain promote arousal (Poe et al. 2023) and are PK2-R1-positive (Schlegel et al. 2016), Hugin might control sleep in part through Dh44 neurons.

      (22) Minor point: Line 97 should say "resulted in a significant sleep increase." Currently, it says "decrease" which is not what is depicted in the figure.

      We appreciate the reviewer’s point. We have corrected this.

      (23) Minor point: Figure 5 should be renamed as Figure 4 since the text describing the results in Figure 5A & 5B occurs before the text describing the results in Figure 4.

      We understand the point the reviewer raised. However, since Figure 5A explains the experimental setup for all of Figure 5, we would like to keep Figure 5A at its original position.

      Reviewer #2 (Recommendations for the authors):

      First, the study would benefit from a more comprehensive discussion of previous research, particularly the work by Schlegel et al. (2016) and Melcher and Pankratz (2006). A key inconsistency that should be addressed is the observation that hugin mutant larvae exhibit reduced body size and feeding behavior, which may influence Dilp2 secretion. The selective effect on Dilp3 and Dilp5 without affecting Dilp2 warrants further clarification. Conducting conditional gene expression experiments to control hugin, dilp3, and dilp5 expression, along with neuronal activity modulation, would help determine whether the observed effects are direct or secondary consequences.

      According to the reviewer’s comments, we tried to manipulate neuronal activity in IPCs, but unfortunately, expression of Kir2.1 in IPCs caused lethality or very weak animals. Instead, we cited a recent paper that shows differential secretion of Dilp2 and Dilp6 in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added more discussion about selective Dilp3/5 secretion by Hugin-PK2-R1 signals (line 275-297).

      Second, the specificity of IPC secretion mechanisms should be clarified. Given that IPCs co-express Dilp2, Dilp3, and Dilp5, it remains unclear how the pathway selectively modulates Dilp3 and Dilp5 while leaving Dilp2 unaffected. Additional experiments, such as electron microscopy, could provide insights into whether anatomical differences in vesicular pools influence peptide secretion. Since hugin mutants are reported to have reduced body size, confirming that Dilp2 secretion remains truly unchanged is crucial for eliminating potential indirect effects.

      We thank this reviewer for the valuable suggestions. Since the selective Dilp secretion mechanisms in IPCs are beyond the main scope of this paper, we would like to attempt detailed EM analysis in future studies. We cited a recent paper that shows differential secretion of Dilp2 and Dilp6 from IPCs in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added more discussion about selective Dilp3/5 secretion by Hugin-PK2-R1 signals (line 275-297).

      Third, the study should explore the potential role of alternative circuits, such as the HuginPC-DH44 pathway, in sleep regulation. The observation that DH44 mutants exhibit even greater sleep amounts than PK2-R1 mutants suggests the involvement of additional regulatory mechanisms. Prior studies indicate that HuginPC neurons may influence DH44 neuron activity, which could impact sleep. Furthermore, recent findings link DH44 with starvation-induced sleep loss in adult flies. Discussing and experimentally investigating the HuginPC-DH44 axis in larval sleep regulation would provide additional depth to the study.

      As far as we understand, no direct evidence for a HuginPC→DH44 pathway has been reported in either larvae or adults. Instead, DH44 influences Hugin neuron activity in adults (King et al. 2017). We thus examined whether optogenetic DH44 activation could influence HuginPC activity using CRTC analysis, but unfortunately, we could not detect significant changes in HuginPC activity.

      Given that PK2-R1 is expressed in DH44-positive neurons (Schlegel et al. 2016) and that DH44-positive neurons are localized in the regions that HuginPC neurons innervate, it is still possible that the HuginPC→DH44 pathway might function in parallel to the HuginPC→IPCs pathway. We feel that this is quite an interesting possibility and a suitable subject for a future paper.

      Fourth, validating the functional connectivity between HuginPC neurons and IPCs using calcium imaging would significantly enhance the study. Employing real-time calcium imaging with GCaMPs would provide direct evidence of synaptic activity between these neuronal populations. Such data would strengthen the claim that the observed sleep regulatory effects result from direct neural communication rather than secondary systemic influences.

      We agree. Indeed, we tried Ca²⁺ imaging of HuginPC neurons and IPCs in living larvae as well as in ex vivo preparations, and found that it was technically quite difficult to obtain reliable Ca²⁺ dynamics data in the brain of living larvae or in ex vivo brain tissue. Therefore, instead of live Ca²⁺ imaging, we performed the CRTC analysis using fixed brain preparations. We have added a note that we tried Ca²⁺ imaging in the larval brain, but it did not work well (line 555-558).

      Finally, a more detailed discussion of developmental differences in sleep regulatory mechanisms would be beneficial. The manuscript should address why genes involved in sleep modulation during development may function differently from their roles in adult sleep regulation. Providing a conceptual framework or experimental evidence to explain these developmental differences would enhance the study's contribution to understanding the evolution of sleep circuits. Clarifying how these findings fit into broader sleep regulation models would increase the impact of the research.

      We agree. We would like to add discussions about how factors/circuits involved in sleep modulation during development may function differently from their roles in adult sleep regulation as follows (line 349-371), as it is rather difficult to discuss why.

      It is thus possible that Hugin/PK2-R1 signaling along the HugPC-IPCs circuitry is suppressed in adults. IPCs in adults receive multiple positive and negative modulatory inputs through GPCRs including the metabotropic GABA<sub>B</sub> receptors (Enell et al., 2010), which suppresses IPC activity and Dilp release in adult IPCs (Enell et al., 2010). It is thus plausible that such negative modulatory inputs to IPCs in adults might counteract with the Hugin/PK2-R1 axis to suppress Dilp release. In addition, our data suggest that Dilps modulate sleep amount in the opposite directions in larvae and adults (Figure 7). Comparing the expression levels and activities of GPCRs in larval and adult IPCs would be essential to better understand how the same modulatory signals over the course of development come to exert differential impacts on sleep. Interestingly, Hugin in adults appears irrelevant for the baseline sleep amount but is required for homeostatic regulation of sleep (Schwarz et al., 2021). Thus, testing if Hugin/PK2-R1 axis is involved in the homeostatic regulation of larval sleep, and how such a system compares to its adult counterpart, may further provide mechanistic insights into how homeostatic sleep regulation matures over development.

      By addressing these aspects, the manuscript will provide a clearer, more robust, and well-supported analysis of larval sleep regulation. These refinements will help improve the study's clarity and impact, ensuring that its findings are effectively communicated to the research community.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 97: "Silencing neurons expressing Oamb and PK2-R1 resulted in a significant sleep decrease?" But there is an increase in sleep amounts from Figure 1A. (Typo error).

      We thank the reviewer for pointing out this typo. We have corrected this typo in the revised version.

      (2) Line 139: "HugPC and IPCs labeled by Dilp3-GAL4 are located in close proximity to each other." While proximity does not equal synaptic connections, direct connectivity of HugPC and IPCs was already shown in larval connectome analyses with HugPC providing the strongest input of larval IPCs (Hückesfeld et al. eLife 2021). This could be cited in this context instead.

      We agree. We have cited this paper in References (line 163).

      (3) Figure 2 Supplement 1: Locomotion speed is affected in PK2-R1 knockouts; what is the significance regarding the observed sleep increase?

      We agree that this is a very important point. As the reviewer pointed out, since locomotion speed was reduced in PK2-R1 KO larvae, the sleep increase phenotype in PK2-R1 KO larvae might be in part due to reduced locomotion. On the other hand, IPC silencing by Kir2.1 caused a sleep increase phenotype without significant changes in locomotion (Figure 2; Figure 2-figure supplement 1). Since PK2-R1 is broadly expressed in the nervous system in addition to IPCs (Figure 2), it is thus possible that PK2-R1 neurons other than IPCs contribute to locomotion control.

      (4) Why are Dilp3 levels changing (increasing) in adult IPCs after PK-2 treatment? This is not mentioned in the text and is not discussed at all.

      As the reviewer indicated, this data is unexpected to us. At this moment, we could only assume that PK-2 could act in larval and adult IPCs in a different manner. We have added this sentence in Results (line 211-214).

      (5) It has been shown in other publications that Dilps play a role in sleep regulation (Cong et al., Sleep 2015), this study should be cited.

      We have cited this paper in References (line 224).

      (6) The order of discussing figure panels is sometimes confusing, e.g. Figure 6C is discussed at the very end after 6D-F.

      We agree. Indeed, we discussed this order at length while preparing the first draft. However, we finally decided on the current order, as grouping “sleep phenotype data” and “ex vivo data” should be easier for readers to understand. We thus keep the current order in the revised submission.

    1. eLife Assessment

      This manuscript addresses an important and conceptually ambitious question by using a synthetic biology strategy to perturb ATP homeostasis in yeast and examine its causal relationship with lifespan. While the experimental approach and lifespan data are intriguing, the current evidence is incomplete and internally inconsistent, particularly regarding intracellular ATP measurements, transporter directionality, mitochondrial dependence, and the proposed mechanistic model. Substantial clarification, additional controls, and further experimentation will be necessary before the main conclusions can be considered robust and the biological significance of the findings can be fully assessed.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to engineer a synthetic system for manipulating ATP homeostasis in budding yeast by expressing the microsporidian nucleotide transporter NTT1, thereby enabling ATP import from the extracellular environment. Using this system, they attempt to test whether intracellular ATP abundance causally regulates replicative lifespan and whether extracellular ATP sensing contributes independently to longevity pathways. The manuscript presents data from ATP biosensing, transcriptomics, mitochondrial perturbations, and microfluidic aging assays to build a dual-mechanism model linking ATP availability, MAPK signaling, mitochondrial function, and aging trajectories.

      Strengths:

      A major strength of the study is its creative application of xenotopic synthetic biology to directly manipulate ATP homeostasis, an ambitious approach that addresses an important and difficult question in aging biology. The use of complementary methods, including single-cell ATP reporters, microfluidic lifespan measurements, and RNA-seq, generates a rich experimental dataset with the potential to reveal multiple layers of ATP-dependent physiological regulation. The manuscript also raises interesting hypotheses regarding extracellular nucleotide sensing and HOG/MAPK pathway involvement, opening conceptual space for future exploration of ATP-based signaling in yeast.

      Weaknesses:

      Despite these strengths, the manuscript suffers from several critical weaknesses that undermine the central conclusions. Foremost, the intracellular ATP measurements contradict key interpretations: NTT1 expression lowers ATP levels, yet multiple sections assert or assume that NTT1 increases intracellular ATP via import. This unresolved contradiction propagates throughout the mechanistic model. The authors do not consider or experimentally address the more parsimonious explanation that NTT1 may be a bidirectional ATP transporter, which would unify many perplexing results. Several important analyses are missing (e.g., transcriptomic comparison of NTT1 cells with vs. without ATP), and key signaling claims lack proper validation (e.g., Hog1 quantification, AMPK controls). Additionally, inconsistencies in figures (such as incorrect scale bars, mismatched ATP measurements, and a conceptual model contradicted by the data) further detract from clarity. As a result, the manuscript does not yet convincingly achieve its stated aims, and the current evidence does not adequately support the proposed causal relationships between ATP homeostasis and lifespan.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents interesting findings where the addition of exogenous ATP extends the replicative lifespan of yeast cells in a way that seems uncorrelated with actual increased intracellular ATP levels or mitochondria. To be clear, the addition of ATP to yeast growth media increases the number of cell divisions per cell in yeast. Expression of the NTT1 ATP transporter gene increases intracellular ATP levels according to LC-MS analysis, but the effect on replicative lifespan works without the NTT1 gene and without an intracellular increase in ATP (possibly with a decrease in intracellular ATP), so the effect appears to be independent of the effect on intracellular ATP levels or mitochondria, as mitochondria-less R0 yeast cells also have increased numbers of cell divisions when grown with extracellular ATP. The plots in Figure 5 make it seem like exogenous ATP addition lowers intracellular ATP for both the NTT1 cells and the wild-type cells, and that is not what the LC-MS data in Figure 2d show.

      As an aside, this seems like a better model for increased tumor cell growth in the presence of increased extracellular ATP, which happens in some cancers.

      Restated, the data suggest they were successful in increasing intracellular ATP by LC-MS, but not by the QUEEN reporter, and that the seemingly likely increased intracellular ATP was not causative, as cells that did not have an increase in intracellular ATP, but had the same exogenous ATP addition, also gained an increase in replicative lifespan. There could also be two distinct mechanisms extending replicative lifespan to the same degree in these two different strains. More measurements, controls, and analyses are needed to accurately determine what is happening with intracellular ATP levels with age. It is currently unknown if there is any correlation between ATP levels and replicative aging (with properly controlled longitudinal measurements).

      Strengths:

      Longitudinal imaging of single cells. Analyzed ATP levels with two approaches. Creative approach to use NTT1 transporter to increase intracellular ATP levels. Solid replicative lifespan data.

      Weaknesses:

      Mostly unclear about ATP levels with age and the relationship, or lack thereof, between intracellular ATP levels and replicative lifespan. No idea what this effect depends on, but some ideas what it does not depend on (mitochondria or increased intracellular ATP). Experiments seem to lack biological controls (cells without GFP) for age-related changes in autofluorescence (and pH that can affect GFP signal) for the fluorescence microscopy quantifying ATP with age using the QUEEN reporter (seems that way as written); conflicting evidence on ATP levels; lack of LC-MS measurements in old cells; no apparent correlation between ATP levels and replicative lifespan, but that could be wrong - just not apparent from the longitudinal data plots. The LC-MS data seem better than the microscopy data on ATP because the microscopy approach seems to lack proper biological controls, and the selection of only the top 40% of pixels to quantify signal seems unjustified as written, and possibly prone to technical artifacts. Figure 2 B&C plots of ATP levels should show what the cells were normalized to. The figures also seem too diluted and should probably be combined or put in the supplements (Hog1 western) if they do not relate to the lifespan effect. There seem to be some technical scientific editorial errors, like in Figure 7.

    4. Author response:

      Thank you for considering our manuscript, “Engineering ATP Import in Yeast Uncovers a Synthetic Route to Extend Cellular Lifespan” (eLife-RP-RA-2025-109761) for publication in eLife. We appreciate the time and effort invested by the reviewers and editors.

      We have carefully read the eLife assessment and both public reviews. After thorough evaluation, we believe there is a significant factual misunderstanding that has propagated through both reviews and fundamentally affected the interpretation of our central findings and the overall evaluation.

      We must also express concern regarding the duration of the review process. We were informed that the manuscript experienced an extended review period (107 days) due to a delay from a third reviewer. Ultimately, we received only two reviews.

      The apparent internal contradictions and technical inconsistencies raised in the reviews are not due to flawed data but to a misinterpretation of measurement directionality.

      We also acknowledge that the legend of Figure 5 should be more explicit, and that the Methods section should describe the experimental design that leads to the inverse correlation between arbitrary units (AU) and ATP abundance.

      Together, these facts led both reviewers to misinterpret the ATP measurements presented in Figure 5, specifically the directionality of the fluorescence-based ATP readout. In this assay, arbitrary units (AU) are inversely correlated with intracellular ATP abundance: higher AU values correspond to lower ATP levels. This inverse relationship was clearly described in the Results section and in the figures marked with “Low versus High” of the manuscript, but it appears to have been overlooked. As a result, reviewers interpreted Figure 5 as contradicting Figure 2, when in fact the two datasets are fully consistent.

      Because this misunderstanding affected interpretation of the foundational ATP data, it appears to have influenced evaluation of all downstream conclusions. For example, neither reviewer meaningfully engaged with:

      - The identification of distinct cell death trajectories.

      - The mitochondrial dependency of NTT1-associated toxicity.

      - The integration of ATP depletion with mitochondrial function.

      - The distinction between intracellular ATP manipulation and extracellular ATP sensing mechanisms.

      We fully understand that when foundational data appear contradictory, reviewers naturally deprioritize downstream conclusions. However, in this case, the foundational contradiction does not exist; it arises from a misreading of the reporter’s scale.

      From the Results section of the manuscript:

      “Our analysis of ATP abundance throughout the yeast lifespan showed that yeast cells are born with low ATP levels, which gradually increase during their lifespan. Some cells completed their lifespan without any observable reduction in ATP abundance, while others showed a drastic decrease in ATP levels during late life (Fig. 5A–D, Supplementary File S3), consistent with previous observations supporting two modes of yeast lifespan, mediated by mitochondrial and/or SIR2 function (42,46–49). Consistent with our data presented in Figure 2, we also observed significantly lower ATP abundance in NTT1-expressing cells throughout their entire lifespan compared to Wt control cells (Fig. 5A–C). Furthermore, these cells displayed significantly reduced mean and maximum replicative lifespan (RLS), directly indicating that intracellular ATP depletion shortens lifespan (Fig. 5D). Next, we assessed RLS and age-associated ATP changes under ATP supplementation. We found that exposing NTT1 cells to medium supplemented with 10 µM ATP restored intracellular ATP levels (Fig. 5A–C) and significantly (p = 4.03E-18) increased both mean and maximum RLS to levels comparable to WT cells (Fig. 5D).”

      This section explicitly explains that Figure 5 is consistent with Figure 2. LC-MS data (Figure 2) show intracellular ATP depletion in NTT1 cells under baseline conditions and restoration upon extracellular ATP supplementation. Figure 5 shows the same pattern longitudinally. The apparent contradiction raised by both reviewers stems entirely from misreading the directionality of the AU scale.

      In the public assessment, concerns are raised about:

      - “Internally inconsistent, particularly regarding intracellular ATP measurements”

      - “Mismatched ATP measurements”

      - “Conceptual model contradicted by the data”

      - “The plots in Figure 5 make it seem like exogenous ATP addition lowers intracellular ATP…”

      These statements arise directly from the reversed interpretation of the AU scale. If the inverse relationship had been recognized, these perceived inconsistencies would not exist. Unfortunately, this misunderstanding then influenced broader interpretations, including the conclusion that the fundamental NTT1 model is internally contradictory.

      Similarly, Reviewer #2 states that LC-MS and QUEEN reporter data conflict and that ATP supplementation appears to lower intracellular ATP. This again reflects the same directional misunderstanding. There is no conflict between Figure 2 and Figure 5. Both show reduced ATP in NTT1 cells and restoration upon ATP supplementation.

      A second major point concerns the bidirectional transporter hypothesis. Reviewer #1 suggests that NTT1 may be bidirectional. However, NTT1 is well-characterized in the literature as a nucleotide transporter that exchanges extracellular ATP for intracellular ADP. We clearly described this in Figure 1C and cited the appropriate primary literature. The suggestion that we failed to consider directionality appears to stem from the same misinterpretation of intracellular ATP levels. We agree that clarifying the role of ADP/AMP depletion in NTT1-expressing cells would strengthen the manuscript, and we are prepared to revise the text to more explicitly describe how intracellular nucleotide exchange dynamics contribute to ATP depletion under baseline conditions.

      We also note that several criticisms, such as:

      - “Incorrect scale bars”

      - “Figure 5C does not match 5AB”

      - “Conceptual model contradicted by the data”

      - “No apparent correlation between ATP levels and lifespan”

      are all rooted in this central misunderstanding of how ATP abundance is represented in the fluorescence measurements.

      To address this constructively during the next revision, we are willing to:

      (1) Revise all relevant figure legends to explicitly state that AU values are inversely correlated with ATP abundance. We will also expand the Materials and Methods section to clarify the inverse correlation and/or generate new figures to minimize confusion.

      (2) Add clarifying annotations directly onto the figures.

      (3) Include new figures for further validation of observed nucleotide changes.

      (4) Expand our RNA-seq data analyses.

      (5) Expand discussion of nucleotide exchange dynamics and transporter directionality.

      (6) Address remaining concerns with additional analyses, experiments, and clarifications throughout the manuscript.

    1. eLife Assessment

      This important article reports on the role of specific interneurons in the motion processing circuitry of the fruit fly, and marshals convincing evidence from neural recording, genetic manipulation, and behavioral analysis. A significant result ties the activity of C2/C3 neurons to the temporal resolution of the motion vision system. It remains unclear whether disrupting this pathway affects the dynamics of vision more generally.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Figure 6c. More analysis is required here, since the authors claim to have found a loss in inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), for PD direction for T4 C2 and C3 block. Also I predict that C2&C3 block statistically different from C3 only, why? In any case, it would be good to discuss the clear trend in the PD direction by showing the distribution of responses as violin plots to better understand the data. It would be also good to have some raw traces to be able to see the differences more clearly, not only polar plots and averages.

      The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could test the behavioral experiments with Kir2.1, too.

      Comments on revisions:

      I have no further comments.

    3. Reviewer #2 (Public review):

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions and the behavioral output.

      A limitation of the study is that the mediating neural correlates from C2&C3 to T4&T5 are not clarified; rather, Mi1 is found to be one of them. In the future, the same set of silencing experiments performed for C2-Mi1 could be extended to C2 & C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Future experiments might also disentangle the parallel or separate function of C2 and C3 neurons.

      In summary, this work advances our current knowledge in Drosophila motion vision and sets the way for further exploring the intricate details of direction selective computations.

      Comments on revisions:

      A label for T5 is missing from Figure 5b. Thank you for addressing our concerns and considering each of our suggestions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      (1) The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Given the prominent connectivity of C2 and C3 to lamina neurons, we actually expected that lamina processing is also affected. We did the experiment of silencing C2 and recording in the lamina neuron L2 and found no significant difference in their response profile (Author response image 1).

      Author response image 1.

      Calcium responses of L2 axon terminals to full-field ON and OFF flashes for controls (grey, N=8 flies, 59 cells) or while genetically silencing C2 using shibire<sup>ts</sup> (magenta, N=4 flies, 26 cells). Traces show mean ± SEM.

      We could include these data in the main manuscript, but we do not really feel comfortable in claiming that C2 and C3 have a specific role in motion processing only, even if it was predominantly affecting medulla neurons. To our knowledge, how peripheral visual circuitry contributes to any other visual behaviors, such as object detection, including the pursuit of mating partners, or escape behaviors, is not well understood. Instead, we added a sentence to the discussion stating that our work does not exclude that, given their wide connectivity, C2 and C3 are also involved in other visual computations.

      (2) Figure 6c. More analysis is required here, since the authors claim to have found a loss in inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), for PD direction for the T4 C2 and C3 blocks. Also, I predict that C2 & C3 block statistically different from C3 only, why? In any case, it would be good to discuss the clear trend in the PD direction by showing the distribution of responses as violin plots to better understand the data. It would also be good to have some raw traces to be able to see the differences more clearly, not only polar plots and averages.

      We apologize: The plots in the manuscript show the mean across all cells, but the statistics were done more conservatively, across flies. We corrected this mismatch, and the figure now shows the mean ± SEM across flies after first averaging across cells within each fly. Thank you for pointing this out. Since we recorded n=6-8 flies per genotype, we did not include violin plots, which would indeed make sense if we showed data for each cell.
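
      The two-step averaging described in this response (cells averaged within each fly, then mean and standard error computed across flies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `mean_sem_across_flies` and the example values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def mean_sem_across_flies(responses, fly_ids):
    """Average cells within each fly, then return (mean, SEM, n) across flies."""
    # Group the per-cell responses by the fly they were recorded from.
    per_fly = defaultdict(list)
    for value, fly in zip(responses, fly_ids):
        per_fly[fly].append(value)

    # One value per fly: the mean over that fly's cells.
    fly_means = [statistics.mean(cells) for cells in per_fly.values()]
    n_flies = len(fly_means)

    # SEM uses the sample standard deviation, with n = number of flies
    # (not number of cells), which is the more conservative choice.
    sem = statistics.stdev(fly_means) / math.sqrt(n_flies)
    return statistics.mean(fly_means), sem, n_flies


# Hypothetical per-cell values, for illustration only (not data from the study).
responses = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]
fly_ids = ["a", "a", "b", "b", "b", "c"]
mean, sem, n = mean_sem_across_flies(responses, fly_ids)
```

      With this scheme, a fly that contributes many cells carries the same weight as a fly that contributes few, so the sample size entering the statistics is the number of flies.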

      (3) The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could test the behavioral experiments with Kir2.1, too.

      We have tried this experiment, but unfortunately, flies were not walking well on the ball, and we were not able to obtain data of sufficient quality.

      Reviewer #2 (Public review):

      Summary:

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. This work advances our current knowledge in Drosophila motion vision and sets the way for further exploring the intricate details of direction-selective computations.

      Strengths:

      Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions, and the behavioral output.

      Weaknesses:

      The authors claim that C2 and C3 neurons are required for direction selectivity, as per the publication's title; however, even with their double silencing, the directional T4 & T5 responses are not completely abolished. Therefore, the contribution of this inherited feedback in direction-selective computations is not a prerequisite for its emergence, and the title could be re-adjusted.

      We adjusted the title to “are involved in motion detection.”

      Connectivity is assessed in one out of the two available connectome datasets; therefore, it would make the study stronger if the same connectivity patterns were identified in both datasets.

      We did not assume large differences between the datasets because Nern et al. 2025 described no major sexual dimorphism. To verify this, we have now plotted C2 and C3 connectivity from the three major EM datasets that include C2/C3 connectivity: the female FAFB dataset (Zheng et al. 2018, Dorkenwald et al. 2024, Schlegel et al. 2024), the male visual system dataset (Nern et al. 2025), and the 7-column dataset (Takemura et al. 2015). We see no major differences (Author response image 2 and Author response image 3).

      Author response image 2.

      Relative pre- and post-synaptic counts for C3 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      Author response image 3.

      Relative pre- and post-synaptic counts for C2 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      The mediating neural correlates from C2 & C3 to T4 & T5 are not clarified; rather, Mi1 is found to be one of them. The study could be improved if the same set of silencing experiments performed for C2-Mi1 were extended to C2 & C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Stating the potential T5 mediators more clearly from the connectomic analysis would be equally beneficial. Future experiments might also disentangle the parallel or separate functions of C2 and C3 neurons.

      We fully agree that one could go down this route. Given the widespread connectivity of C2 and C3, and the fact that these are time-consuming experiments with often complex genetics, we had decided to instead study the “compound effect” of C2 and C3 silencing by analyzing T4/T5 physiological properties and motion-guided behavior. We now explicitly explain this logic by saying, “To understand the compound effect of C2 and C3 on motion processing, we focused on the direction-selective T4/T5 neurons, which are downstream of many of the neurons that C2 and C3 directly connect to.”

      Finally, the authors' conclusions derive from the set of experiments they performed in a logical manner. Nonetheless, the Discussion could benefit from a more extensive explanation of the following matters: why the ON-selective C2 and C3 neurons control OFF-generated behaviors, why the T4&T5 responses after C2&C3 silencing differ between stationary and moving stimuli, and finally why C2 and not C3 had an effect on T5 DS responses, as the connectivity suggests C3 outputs to two out of the four major T5 cholinergic inputs.

      Apart from the behavioral screen results, we only tested ON edges in our more detailed behavioral characterizations. And while we show phenotypes for the OFF-DS cell T5, it is well established that inhibitory cells that respond to one contrast polarity can function in the pathway with the opposite contrast polarity (e.g., the OFF-selective Mi9 in the ON pathway). We realized that our narrative in the results section was misleading in this regard (we had given the ON selectivity of C2/C3 as one argument why we first focused on the ON pathway) and eliminated this argument.

      For the differential involvement of C2/C3 for T4/T5 responses to stationary and moving stimuli (C2 and C3 silencing affects both T4 and T5 DS responses, but mostly T4 flash responses): We mostly took the disinhibition of flash responses in T4 as a motivation to look more specifically at a potential role in motion-computation. We now added a sentence about the potential emergence of these flash responses to the already extensive discussion paragraph “How could inhibitory feedback neurons affect motion detection in the ON pathway?”

      Last, we added a discussion point about the relationship between C2 and C3 connectivity and the functional consequences, and discussed the fact that C3 connectivity alone does not correlate with a functional role of C3 (alone) in DS computation.

      Reviewer #3 (Public review):

      Summary:

      This article is about the neural circuitry underlying motion vision in the fruit fly. Specifically, it regards the roles of two identified neurons, called C2 and C3, that form columnar connections between neurons in the lamina and medulla, including neurons that are presynaptic to the elementary motion detectors T4 and T5. The approach takes advantage of specific fly lines in which one can disable the synaptic outputs of either or both of the C2/3 cell types. This is combined with optical recording from various neurons in the circuit, and with behavioral measurements of the turning reaction to moving stimuli.

      The experiments are planned logically. The effects of silencing the C2/C3 neurons are substantial in size. The dominant effect is to make the responses of downstream neurons more sustained, consistent with a circuit role in feedback or feedforward inhibition. Silencing C2/C3 also makes the motion-sensitive neurons T4/T5 less direction-selective. However, the turning response of the fly is affected only in subtle ways. Detection of motion appears unaffected. But the response fails to discriminate between two motion pulses that happen in close succession. One can conclude that C2/C3 are involved in the motion vision circuit, by sharpening responses in time, though they are not essential for its basic function of motion detection.

      Strengths:

      The combination of cutting-edge methods available in fruit fly neuroscience. Well-planned experiments carried out to a high standard. Convincing effects documenting the role of these neurons in neural processing and behavior.

      Weaknesses:

      The report could benefit from a mechanistic argument linking the effects at the level of single neurons, the resulting neural computations in elementary motion detectors, and the altered behavioral response to visual motion.

      We agree that we cannot fully draw this mechanistic argument, but we also do not think that this is a realistic goal of this study. Even in a scenario where one would measure the temporal and spatial properties of “all” neurons that are connected to C2 and C3, this would likely not reveal the full mechanisms linking the single neurons to DS computation, but would require silencing specific connections, or specific molecular components of the connection, or could be complemented by models. A beautiful example where such a mechanistic understanding was achieved, recently published in Nature, essentially focused on a single synaptic connection (between Mi9 and T4) (Groschner et al. 2024), and built on extensive work that had already highlighted the importance of these neurons. We would further argue that the field does not have a good understanding of how T4/T5 responses are translated into behavior. Although possible pathways emerge from connectomes, it is for example not understood why the temporal frequency tuning of T4/T5 substantially differs from the temporal frequency tuning of the optomotor response.

      We therefore would like to highlight that the focus of our study was not to connect all those pieces, but rather to highlight the hitherto unknown overall importance of inhibitory feedback neurons for visual computations along the visual hierarchy, from individual neuron properties, via DS computation, to the temporal precision of the optomotor response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood."

      This is incorrect not only because it is referred to as a general statement, but also because many studies have examined inhibition in flies. It may not be solely GABAergic inhibition, but that is just one type. While some discussions later address feedback from horizontal cells in the retina, etc., there is no mention of work on color vision, which requires feedback. Please rephrase.

      We now say “visual motion processing” in this sentence, and added a sentence on color vision: “... color-opponent signalling requires reciprocal inhibition between photoreceptors as well as feedback inhibition from distal medulla (Dm) neurons (Schnaitmann et al., 2018, Heath et al., 2020, Schnaitmann et al., 2024).”

      (2) Line 197: "Because a previous studies" One or many?, but more important, please cite them.

      We corrected this to “a previous study” and now cite Tuthill et al. 2013.

      (3) Line 172: I noticed a few minor grammatical errors and wording issues, such as the use of "we next" twice in one sentence. "To next identify potential GABAergic neurons that are important for motion computation in the ON pathway, we next intersected 12 InSITE-Gal4." I am bad at picking them out, but since I noticed them, I would strongly suggest looking at the text carefully again.

      We deleted one occurrence of ‘next’, thank you for catching that.

      (4) Question to the authors. Why did you use twice independent lines and not checkers for the white noise analysis in Figure 3e?

      We used flickering bars because many visual system neurons tested in our lab respond with a better signal-to-noise ratio as compared to checkerboards. Flickering bars also appear to be more suited to isolate the spatial surround of neurons. This type of stimulus has been successfully used in previous studies to extract receptive fields of neurons in the fly visual system (Arenz et al. 2017; Leong et al., 2016, Salazar-Gatzimas et al. 2016; Fisher et al. 2015, …).

      (5) Line 248: "Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects..." Please state how here. I would need to go to the methods.

      We added a sentence “C2 was silenced by expression of UAS-shibire<sup>ts</sup> (UAS-shi<sup>ts</sup>) for temporal control of the inhibition of synaptic activity.”

      (6) Much of the work in the blowfly uses picrotoxinin to block GABAergic inhibition in the visual motion pathway. It would be useful to mention some of this early work and its results, particularly that of Single et al. (1997). It might be interesting to reinterpret their results.

      Thank you for pointing this out. We added this paragraph to the discussion: ‘Work in blowflies has found a severe impact of GABAergic signaling for DS in LPTCs downstream of T4 and T5 cells, using application of picrotoxin to the whole brain (Single et al. 1997; Schmid and Bülthoff 1988). Although the loss of DS in LPTCs could originate from direct inhibitory synapses onto LPTCs (Mauss et al. 2015; Ammer et al. 2023), the disruption of GABAergic signaling in upstream circuitry, which reduces DS in T4 and T5, may also contribute to the phenotype seen in LPTCs.’

      Reviewer #2 (Recommendations for the authors):

      The following set of corrections aims to better the scientific and presentation aspects of this work.

      (1) The title of the work implies that C2 and C3 neurons are required for motion processing, whereas the study shows their participation in motion computations, which persists post their silencing. Therefore, "Inhibitory columnar feedback neurons contribute to Drosophila motion processing" would be a more appropriate title.

      We rephrased the title to say that inhibitory feedback neurons “are involved in” motion processing.

      (2) The morphology of C2 and C3 neurons, i.e., ramifications in medulla & cell body in medulla and axonal targeting to lamina, implies their feedback role. It would be important to mention the specific feedback loop they participate in and the role of Mi1 more extensively in lines 36, 120.

      We find it hard to speculate on the specific feedback loops that C2 and C3 are involved in from their widespread input and output connectivity. If we had, we would have wanted to support this by functional measurements of this specific loop, which was not the goal of this study.

      (3) In lines 55-89, the authors explore the instances of feedback inhibition within and across species and modalities. For the Drosophila visual example (lines 76-89), given that it also addresses motion circuits, the following studies should be included:

      Ammer, G., Serbe-Kamp, E., Mauss, A.S., et al. Multilevel visual motion opponency in Drosophila. Nat Neurosci 26, 1894-1905 (2023). https://doi.org/10.1038/s41593-023-01443-z. Mabuchi Y, Cui X, Xie L, Kim H, Jiang T, Yapici N. Visual feedback neurons fine-tune Drosophila male courtship via GABA-mediated inhibition. Curr Biol. 2023 Sep 25;33(18):3896-3910.e7. doi: 10.1016/j.cub.2023.08.034.

      We added a sentence on the Ammer et al. finding to the introduction. Since the introduction paragraph focuses on known physiological effects within the visual system, we did not find a good fit for the Mabuchi et al. study, which focuses on serotonergic feedback neurons with a role far downstream in courtship behavior.

      (4) In lines 102-103, the following work should be referenced: Groschner LN, Malis JG, Zuidinga B, Borst A. A biophysical account of multiplication by a single neuron. Nature. 2022 Mar;603(7899):119-123. doi: 10.1038/s41586-022-04428-3.

      We cited a few of the many papers that used “modeling frameworks” and selected the ones focusing on the entire feedforward circuitry. To also give credit to the Borst lab, we instead added Serbe et al. 2016 here.

      (5) In lines 107-108, the Braun et al. (2023) study has not performed Rdl knockdown experiments in T4 cells; hence, it needs to be better clarified in the text.

      We corrected this in the text.

      (6) Even though the dataset was previously published, a summary plot of the different phenotypes would be very helpful to the reader. Moreover, in line 131, as the study focuses on motion vision, it would be better to use "early motion visual processing" rather than "early visual processing.”

      We added a summary plot of the behavioral screen data to Supplementary figure 1, and rephrased previous line 131.

      (7) The first result section title excludes C3 neurons, even though in lines 172-179 they are addressed; therefore, the C3 inclusion is suggested as in "GABAergic C2 and C3 neurons control behavioral responses to motion cues". The term "required" should be excluded from the title as the other neuronal types encountered in the InSITE drivers were never quantified; thus, the "behavioral requirement" might come from these other neurons as well.

      From the experiments shown in this paragraph alone, we cannot make conclusive claims about C3, as it was also weakly visible in one of our genetic controls in the intersectional strategy that we took (we had written: “This strategy also revealed other GABAergic cell types, including the columnar neuron C3 and the large amacrine cell CT1 which were however also weakly present in the gad1-p65AD control”).

      We changed the title of this paragraph to: A forward genetic behavioral screen identifies GABAergic C2 neurons to be involved in motion detection.

      (8) In line 142, it should be clearly stated that the MultiColor FlpOut technique was used and should also be cited: Nern A, Pfeiffer BD, Rubin GM. Optimized tools for multicolor stochastic labeling reveal diverse stereotyped cell arrangements in the fly visual system. Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):E2967-76. doi: 10.1073/pnas.1506763112.

      We did not use MCFO clones, but simple Flp-out clones, and the genotype and reference for this were given in the methods: UAS-FRT-CD2y+-FRT-mCD8::GFP; UAS-Flp (Wong et al. 2002). To make this clearer, we now also cite (Wong et al. 2002) in the results section.

      (9) In Figure 1c, a description of RFP should be written as it is already in Supplementary Figure 1c.

      We added this to the Figure caption.

      (10) In line 172, "next" is redundant as it was previously used at the beginning of the sentence.

      Removed

      (11) In line 175, based on both figures that the authors refer to, instead of C2, C3 should be written.

      We do indeed see C3 labeled in the images, but also in a gad1-p65AD control. We thus cannot be sure if C3 indeed reflects the intersection pattern. However, the three lines shown in Figure 1d clearly also label C2, which is not seen in the control condition.

      (12) In line 184, a split-C2 line is used (and a split C3 as in Supplementary Figure 2). It would enhance the credibility of the work and even be appropriate afterwards to use the word "requirement" if this split-C2 line was used for behavioral experiments, as in the Gohl et al., 2011, and Silies et al., 2013 studies.

      We are indeed using the same split-C2 line for imaging and for behavioral experiments in Figure 7. We see Figure 1 (and with that, Silies et al. 2013) as a first-pass screen, from which we obtained candidates, which we then tested more thoroughly throughout the remaining manuscript, with more specific lines. We are no longer using the word “requirement”.

      (13) In lines 186-188, is DenMark used as a postsynaptic marker? If yes, an additional control would be the use of Discs-large (DLG) as a postsynaptic marker, as DenMark would not be restricted to postsynaptic densities.

      Yes, we used DenMark as written in the sentence “we expressed GFP-tagged Synaptotagmin (Syt::GFP) to label pre-synapses together with the dendritic marker DenMark (Nicolai et al., 2010)”. Since our claims about widespread C2 and C3 connectivity are further supported by connectomics, we did not use another postsynaptic marker.

      (14) In line 191, L2 is mentioned as presynaptic, whereas in Figure 2b is clearly postsynaptic.

      We write “This revealed that C2 forms several presynaptic contacts with the lamina neurons L5, L1, and L2”. L5, L1, and L2 are hence postsynaptic to C2, which is what is plotted in Figure 2b.

      (15) In line 197, the "a" in "because a previous studies" should be removed, and these studies should be cited as the authors do in line 514.

      Done as suggested.

      (16) In line 1191, the figure title uses the term "required", whereas the plotted data suggest that T4 and T5 responses remain DS after C2&C3 silencing. Rephrasing to "C2 and C3 affect direction-selective.." would be better suited.

      We replaced “required” with “contribute to”

      (17) In the legend of Figure 2b, the "Counts of synapses" is misleading. The number plotted refers to the percentage of synapse counts from the target neuron.

      Corrected.

      (18) A general question about the C2 and C3 ON selectivity: How would the authors explain the OFF deficits from the published behavioral screening in Supplementary Figure 1a? Do the other InSITE neurons contribute to it? This needs to be further elaborated in the discussion.

      A neuron being ON selective does not imply that it is functionally required in the ON pathway only. In fact, Mi9, a major component of the ON pathway (even if not “required” under many stimulus conditions), is OFF selective.

      Furthermore, both we (Ramos-Traslosheros and Silies, 2021) and others (Salazar-Gatzimas et al. 2019) have shown that both ON and OFF signals are combined in ON and OFF pathways, which is further supported by connectomics data. We clarified the transition from physiology to function in the results section, as already explained above.

      (19) In line 216, the authors image from layer M1, but the reasoning behind this choice is missing. The explanation gap widens as the authors proceed to examine the layer-specific responses in Supplementary Figure 2. Is this because C2 and C3 receive their inputs in M1, as is insinuated in line 219?

      As Supplementary Figure 2 shows, we initially imaged from all layers of the medulla, where C2 arborizes. Because the response properties, including kinetics, weren’t different, we had no reason to believe that C2 is highly compartmentalized. We thus subsequently focused on layer M1, where amplitudes were highest. We clarified this in the text.

      (20) In line 229, it should be clear whether the STRFs come from M1 measurements. STRF analysis in M5, M8, and M9/10 that also verifies the multicolumnar span of C2 and C3 would further strengthen the results. Given the focus of the work on Mi1 and T4/T5, it should be clarified in which medulla layer the Mi1-C2 connections form. Additionally, the reasoning behind showing STRFs from M1 measurements in Figure 3, even though Supplementary Figure 2b implies equal responses in M9/10, where C3 also outputs to Tm1 and Tm4, should be explained.

      We never recorded STRFs in the silenced condition and make no claims about C2 changing spatial properties of Mi1. We added the information that STRFs were recorded in layer M1 to the figure caption. We checked the specific connectivity of C2 and Mi1 and they indeed connect in M1 (Author response image 4), but regardless of this result, there is no evidence for compartmentalization in these columnar neurons.

      Author response image 4.

      Image of a C2 (blue) and Mi1 (yellow) neuron from EM Data (FAFB). Circles depict synapses from C2 to Mi1 in layer M1 of the medulla.

      (21) In Figure 3e, the statistical significance or lack thereof is not visible at the bar plot.

      Consistently throughout the manuscript, we now just indicate if a comparison is significant. If nothing is shown, it means that it is not.

      To clarify this, we added a sentence to the statistics section in the methods now saying: We show significant differences in figures using asterisks (p<0.05 *,p<0.01 **, p<0.001***). Non-significant differences are not further indicated.

      Please note that based on another reviewer comment, we also adapted the analysis of the kernels. This changed the statistics to be significant for the timing of the ON peak response (Figure 3e).
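
      The asterisk convention stated in the response to point (21) can be sketched as a small lookup. This is only an illustration of the thresholds quoted above (p<0.05 *, p<0.01 **, p<0.001 ***); the function name `significance_stars` is hypothetical and is not taken from the authors' analysis code.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk label used in the figures.

    Thresholds follow the convention quoted in the response:
    p < 0.05 -> "*", p < 0.01 -> "**", p < 0.001 -> "***".
    Non-significant comparisons return an empty string, i.e. nothing is drawn.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant: no marker is shown


if __name__ == "__main__":
    for p in (0.2, 0.03, 0.004, 0.0002):
        print(p, repr(significance_stars(p)))
```

      Returning an empty string for non-significant comparisons mirrors the stated choice of leaving such comparisons unmarked rather than labelling them "n.s.".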

      (22) In line 249, it is mentioned that the strongest C2 connection is Mi1; this does not derive from the data shown in Figure 2b.

      We intended to look at medulla neurons, and Mi1 is the most connected medulla neuron to C2. We clarified that in the text, which now reads: “Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects temporal and spatial filter properties of the medulla neurons that provide direct input to T4 neurons. We chose to test Mi1 as it is the medulla neuron most strongly connected to C2.”

      (23) The result section title "C2 & C3 neurons shape response properties of the ON pathway medulla neuron Mi1" does not include C3 results. This would be fundamental to have. As previously mentioned, the neural correlates of this inhibitory feedback loop should be clearly defined, and the current version of this work evades doing so.

      We corrected the title. As discussed elsewhere, it was not the goal of this study to work out the specific contributions of C2 (and C3) to all neurons they connect to, but rather to focus on the compound effect for motion detection.

      (24) In line 276, the following work should be cited: Maisak MS, Haag J, Ammer G, Serbe E, Meier M, Leonhardt A, Schilling T, Bahl A, Rubin GM, Nern A, Dickson BJ, Reiff DF, Hopp E, Borst A. A directional tuning map of Drosophila elementary motion detectors. Nature. 2013 Aug 8;500(7461):212-6. doi: 10.1038/nature12320.

      We added the citation.

      (25) In line 273, the title implies the investigation of the spatial filtering of T4 and T5 cells. This does not take place in the respective result section.

      We changed the title to: “C2 and C3 shape temporal and spatial response properties of T4 and T5 neurons.”

      (26) In line 280, Kir2.1 is used, whereas previously thermogenetic silencing with Shibirets was preferred; could the authors elaborate on this choice in the text, for example, genetic reasons?

      We generally prefer shibire[ts] because of its inducible nature. However, our T4/T5 recordings included more stimuli (motion stimuli) than the Mi1 recordings, and the effect of shi[ts]-mediated silencing by pre-heating the flies (as established by Joesch et al. 2010) was not long-lasting enough for these experiments, which is why we used Kir2.1. In a previous set of experiments, we had tried incubating flies while imaging, but this induced movements of the brain that were too large, and T4/T5 recordings were not stable enough.

      (27) In lines 290-291, T5 ON suppression is found to be affected by C2 silencing, but the bar plot in Figure 5b uses the OFF-step data. It would be best if the ON-step data for T5 cells were also plotted.

      ON-step data for T5 are plotted in Supplementary Fig. 3e

      (28) In line 288, "when C2 was also blocked", "also" should be included, as you are referring to double silencing.

      Sorry for the confusion, we called the wrong figure in that sentence. Here, we wanted to point at the increased response of T4 to the ON-step upon C2 silencing, which was quantified in Supplementary Fig. 3e.

      (29) In line 312, it is important to mention in the discussion why C2 and not C3 had an effect on T5 DS responses. C2 outputs to Tm1, whereas C3 outputs to Tm1 and Tm4, based on Figure 2b, with Tm1 and Tm4 being two of the four major cholinergic T5 inputs. Hence, it would be natural to think that C3 and not C2 would affect T5 responses.

      We addressed this in the discussion.

      (30) In lines 326-328, it is crucial to mention the neural correlates that connect C2 and C3 to T4 and T5. Additionally, the Shinomiya et al. (2019) study shows C3 to T4 connections, which are mentioned in the discussion and should be cited in line 429.

      We do not think that mentioning neural correlates at this point is crucial, as these sentences were concluding a paragraph in which we link C2/C3 silencing to T4/T5 responses. We also do not know the neural correlates (but for Mi1) so this would not be accurate.

      We have been mentioning C3 to T4 connection in both the results and discussion, and our analysis (Figure 2) stems from the FAFB dataset. We added citations to both results and discussion.

      (31) In Figure 6a, compared to Figure 3b, the term compass plots is used instead of polar plots. It would be best to use one consistent term. Additionally, in Figure 6c, it is not mentioned if the responses across genotypes are the outcome of averaging across subtype responses.

      These two plots are not the same; a compass plot is a sub-category of polar plots. Polar plots, as in Figure 3, show the response amplitude of the neurons to the different directions of motion. Instead, compass plots, as in Figure 6, show vectors that depict the tuning direction and the strength of tuning of individual neurons.

      We added the following sentence to clarify the calculation in Figure 6c: ‘To average responses of all neurons, the PD of each neuron was determined by its maximal response to one of 8 directions shown.'
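
      The distinction drawn in this response, together with the PD-based averaging, can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 8 directions are assumed to be evenly spaced at 45°, responses are assumed non-negative, and the tuning vector is computed as a normalized vector sum, which is one common definition that the source does not confirm. All function names are hypothetical.

```python
import numpy as np

# Assumption: the 8 stimulus directions are evenly spaced (0, 45, ..., 315 degrees).
DIRECTIONS_DEG = np.arange(0, 360, 45)


def preferred_direction_index(responses):
    """Index of the direction with the maximal response (the neuron's PD)."""
    return int(np.argmax(np.asarray(responses, dtype=float)))


def align_to_pd(responses):
    """Rotate a tuning curve so that the PD response sits at index 0."""
    responses = np.asarray(responses, dtype=float)
    return np.roll(responses, -preferred_direction_index(responses))


def average_aligned(population):
    """Average tuning curves across neurons after aligning each to its own PD."""
    return np.mean([align_to_pd(r) for r in population], axis=0)


def tuning_vector(responses):
    """Return (angle in degrees, length) of the normalized vector sum.

    This is the kind of quantity a compass plot displays: the angle is the
    tuning direction and the length (0 to 1) is the tuning strength.
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum()
    if total == 0:
        return 0.0, 0.0  # no response: direction undefined, strength zero
    theta = np.deg2rad(DIRECTIONS_DEG)
    vec = np.sum(responses * np.exp(1j * theta)) / total
    return float(np.rad2deg(np.angle(vec)) % 360), float(np.abs(vec))


if __name__ == "__main__":
    neuron = [0.1, 0.2, 1.0, 0.3, 0.1, 0.0, 0.0, 0.1]
    print("polar plot data:", dict(zip(DIRECTIONS_DEG.tolist(), neuron)))
    print("compass vector (deg, length):", tuning_vector(neuron))
```

      In this sketch a polar plot would draw the eight response amplitudes directly, whereas a compass plot would draw one arrow per neuron from `tuning_vector`.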

      (32) In line 344, the title could be adjusted to "C2 is controlling the temporal dynamics of ON behavior", under the same reasoning of 'requirements' explained before.

      We think that “is controlling” is a stronger claim than “being required”. For a geneticist, the word “required” simply means that there is a(ny) loss of function phenotype, i.e., a reduction in DS when C2 and C3 are silenced/blocked. Many neurons are sufficient but not required to induce a certain behavior (i.e., they can induce a behavior when ectopically activated, but show no significant loss of function phenotype). We therefore consider it remarkable that C2 and C3 silencing indeed shows a significant reduction in DS.

      However, we do not want to overclaim anything, and the title now reads: “T4 tunes the temporal dynamics of ON behavior”

      (33) In Figure 7c, the plot legend should be "deceleration".

      Corrected

      (34) In line 424, the Braun et al. (2023) experiments were performed in T5 cells as previously mentioned.

      Corrected

      (35) In line 435, the authors mention that both ON-selective C2 and C3 neurons act partially in parallel pathways. In Figure 2b, the upstream circuitry between C2 and C3 is identical. How would they explain the functional-connectivity contradiction?

      In terms of acting in parallel pathways, downstream, not upstream, connectivity of C2 and C3 will matter, which is not identical. C2 for example connects to Mi1, L1, and L4, whereas C3 does not. On the other hand, C3 connects to Mi9 and Tm4, which C2 does not.

      (36) In lines 445-447, the authors address C2 and C3 neurons as columnar, whereas they previously showed in Figure 3 that they are multicolumnar.

      Here, we refer to the nomenclature of Nern et al., who use the term “columnar” whenever something is present in each column. We specifically define this by saying “only 15 cells are truly columnar in the sense that they are present once per column and present in each column”. In the results section, we instead talk about “functionally multicolumnar” and changed a sentence in the discussion to say “The spatial receptive fields of C2 and C3 are consistent with the multicolumnar branching of their projections in the medulla” to avoid any such confusion.

      (37) In line 448, "thus" is repetitive, and the extracted view in line 449 does not contribute to the essence of the study.

      Fixed.

      (38) In line 459, the authors refer to inhibition inheritance; this term should be used frequently in the text in case the neural correlates between C2 & C3 and T4 & T5 are not deciphered.

      We think this point is very clear throughout the manuscript now. As one prominent example, we added a sentence to the first paragraph of the discussion saying “Given the widespread connectivity of C2 and C3 to neurons upstream of T4/T5, this effect [on DS tuning] is likely inherited from upstream neurons of T4/T5.”

      (39) In line 521, the transition between sentences is problematic.

      Corrected

      (40) For Supplementary Figure 1, why were the ON-motion deficits not addressed with the antibody approach used for Supplementary Figure 1a?

      The approach using anti-GABA stainings turned out to be largely redundant with the intersectional strategy. Furthermore, the intersectional strategy provided the full morphology of the cell and, hence, led to easier identification of the cell types involved.

      (41) In line 1169, C2 is mentioned, whereas C3 is annotated in the figure.

      Corrected

      (42) A general comment is that Tm1 inputs could be a good candidate for assessing T5 inputs, as performed for Mi1-T4 in Fig.4. Such experiments would enhance the understanding of inhibitory inheritance to T5 responses.

      We fully agree.

      (42) Do the authors have any indication or experiments done regarding the C2&C3 role in T4&T5 velocity tuning? This would be complementary to the direction of this study.

      This is a good idea that we had tried. However, we did not see a difference between control and C2 silencing for the temporal frequency tuning of T4/T5. As velocity is closely related to temporal frequency tuning, we would not expect to see a difference there either.

      While it would have been nice to be able to draw such a link, we would also state that our behavioral data are a bit different: We did not look at temporal frequency tuning per se, and overall, it is not well understood how responses in T4/T5 relate to behavior, as they for example have different frequency tunings (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      (43) As a suggestion, Figure 7 would be better positioned as Figure 4, right after the ON-selectivity finding of C2 neurons.

      We preferred to keep the current order.

      Reviewer #3 (Recommendations for the authors):

      Main recommendation:

      It would be useful to propose a neural circuit model that connects the various observations. One can draw here on the many circuit models for motion vision in the prior literature.

      (1) How might the extended response in upstream neurons Mi1 lead to the inappropriate null-direction responses in T4/T5?

      This is a good question and we can only speculate. Mi1 responses are enhanced upon C2 silencing and T4 responses to full field flash responses are also enhanced. Likely, these motion-independent responses are also seen when the edge travels into the non-preferred direction, whereas this non-motion response would likely be masked by the motion response to the preferred direction. The phenotype seen in T5 is likely inherited from medulla neurons, e.g. Tm1, to which C2 connects. How the delay of the Mi1 response upon C2 silencing may specifically affect ND responses, we don’t know.

      (2) How is the loss of DS in T4/T5 compatible with the continued sensitivity to motion in the turning response? Perhaps the signal from 180-degree oppositely tuned T-cells gets subtracted, so as to remove the baseline activity?

      This is a great question that we cannot answer. Overall, perturbations that affect T4/T5 physiology do not necessarily manifest in equivalent phenotypes when looking at behavioral turning responses. Prominent examples come from silencing core neurons of motion-detection circuits, such as Mi1 and Tm3 (see Figure 4, Strother et al. 2017).

      (3) How do the altered dynamics in upstream neurons relate to the loss of high-frequency discrimination in the behavior? One would want to explain why the normal fly has a pronounced decay in the response even though the motion is still ongoing (Figure 7b left, starting at 0.4 s). That decay is missing in the mutant response.

      That is an excellent question that we unfortunately do not have an answer for. Please note that our visual stimulus is a single edge that sweeps across the eye, and which might not elicit equally strong responses at each position of the eye, or at each time during the stimulus presentation.

      In terms of linking the dynamics of upstream neurons to behavior, we already pointed out above that it is not well understood how responses in T4/T5 relate to behavior, as they for example have different frequency tuning, with T4/T5 neurons being tuned to lower temporal frequencies than the turning behavior of a fly walking on a ball (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      Other recommendations:

      (1) Abstract line 37 "At the behavioral level, feedback inhibition temporally sharpens responses to ON stimuli, enhancing the fly's ability to discriminate visual stimuli that occur in quick succession." It may be worth specifying *moving* stimuli.

      Done as suggested

      (2) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood." This seems overly negative. Subsequent text mentions a number of such instances that are understood, and one could add more from the retina.

      We agree. We rephrased to say ‘motion vision’ and added more examples of known roles of feedback inhibition

      (3) Line 69: "inhibitory feedback signals from horizontal cells and amacrine cells to photoreceptors and bipolar cells, respectively, are involved in multiple mechanisms of retinal processing, including global light adaptation, spatial frequency tuning, or the center-surround organization (Diamond 2017)." Maybe add the proven role in temporal sharpening of responses, which is of relevance to the present report.

      We added temporal sharpening to that introduction point.

      (4) Figure 1: The text for this figure talks about behavioral motion detection deficits in various lines. Maybe add an example of the behavioral effects to this figure.

      We added a summary plot of the behavioral screen data to Supplementary figure 1.

      (5) Line 325: "the timing of the ON peak tended to be slower for C3 compared to C2 for both the vertical and the horizontal STRF": It's hard to see evidence for that in the data.

      Based on your next comment we reanalysed the kernels of C2 and C3. This resulted in a significant difference in peak timing between C2 and C3. 

      (6) When presenting kernels as in Figure 3d and Figure 4b, extend the time axis to positive times until the kernel goes to zero. This "prediction of future stimuli" allows the reader to see the degree of correlation within the stimulus, which affects how one interprets the shape of the kernel. Also, plotting the entire peak gives a better assessment of whether there are any shape differences between conditions. An alternative is to compute the kernel via deconvolution, which gets closer to the actual causal kernel, but that procedure tends to highlight high-frequency noise in the measurement.

      We replotted the kernels in Figure 3d and 4b to show positive times. The kernels of C2 and C3 stayed at a positive level. Going back through the data we found a severe decrease in GCaMP signal in the first 2 seconds of the recording. We reanalyzed the kernels by ignoring the first seconds. All kernels now go back to zero. The shape of the kernels did not change but we now find a significant difference in peak timing between C2 and C3. Thank you for pointing this out.
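
      The reanalysis described here (kernels shown for positive as well as negative times, with the initial part of the recording ignored) can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes a one-dimensional stimulus trace sampled at the same rate as the response, estimates the kernel by plain cross-correlation, and uses hypothetical names and defaults (`reverse_correlation_kernel`, `skip_s`, `max_lag_s`). The 2-second skip mirrors the period of signal decrease mentioned in the response.

```python
import numpy as np


def reverse_correlation_kernel(stimulus, response, rate_hz, skip_s=2.0, max_lag_s=1.0):
    """Cross-correlate response with stimulus at negative and positive lags.

    Negative lags: stimulus preceding the response (the causal part).
    Positive lags: stimulus following the response ("prediction of future
    stimuli"); for an uncorrelated stimulus this part should be near zero.
    Response samples within the first `skip_s` seconds are ignored.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    response = np.asarray(response, dtype=float)
    if stimulus.shape != response.shape or stimulus.ndim != 1:
        raise ValueError("stimulus and response must be 1-D arrays of equal length")

    skip = int(round(skip_s * rate_hz))
    max_lag = int(round(max_lag_s * rate_hz))
    start = max(skip, max_lag)          # first response sample that is used
    stop = len(response) - max_lag      # one past the last response sample used
    if stop <= start:
        raise ValueError("recording too short for the requested skip and lag")

    # Mean-subtract so that slow offsets do not leave the kernel at a nonzero level.
    resp = response[start:stop] - response[start:stop].mean()
    stim = stimulus - stimulus.mean()

    lags = np.arange(-max_lag, max_lag + 1)
    kernel = np.array([np.mean(resp * stim[start + k:stop + k]) for k in lags])
    return lags / rate_hz, kernel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stim = rng.standard_normal(2000)
    resp = np.roll(stim, 3)             # toy response: stimulus delayed by 3 samples
    resp[:20] += 100.0                  # toy artifact in the first 2 s at 10 Hz
    t, k = reverse_correlation_kernel(stim, resp, rate_hz=10.0)
    print("kernel peaks at lag (s):", t[np.argmax(k)])
```

      Plotting `kernel` against the returned lag axis over both signs of the lag is what lets a reader judge stimulus autocorrelation and whether the kernel returns to zero.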

      (7) Line 280 "simultaneously blocked C2 and C3 using Kir2.1": First use of that acronym. Please explain what the method is.

      We now explain “we simultaneously blocked C2 and C3 by overexpression of the inward-rectifying potassium channel Kir2.1”

      (8) Line 350 "temporal dynamics for C2 silencing": suggests "dynamics of silencing"; maybe better "response dynamics during C2 silencing".

      Edited as suggested

      (9) Figure 7: Explain the details of the stimulus containing two subsequent on edges. What happens between one edge and the next? Does the screen switch back to black? Or does the second edge ride on top of the final level of the first edge? This matters for interpreting the response.

      Yes, the screen turns dark between subsequent edge presentations. We added a sentence to the methods to clarify that. 

      (10) Line 402 "novel, critical components of motion computation.": This seems exaggerated. At the behavioral level, motion computation is mostly unaffected, except for some details of time resolution. Whether those matter for the fly's life is unclear.

      We deleted the word ‘critical.’

      (11) Line 413 "GABAergic inhibition required for motion detection is mediated by C2 and C3": Again, this seems exaggerated. Motion *detection* appears to work fine, but the *discrimination* of two closely successive motion stimuli is affected. The rest of the text does properly distinguish "discrimination" from "detection".

      We changed the title to say: ‘GABAergic inhibition in motion detection is mediated by C2 and C3.’

      (12) Line 489 "Whereas the role of C2 and C3 for the OFF pathway may be more generally to suppress neuronal activity,": Unclear to what this refers. The present report emphasizes that there is no effect on OFF activity (Figure 5).

      We did not see an effect on T5 responses to OFF flashes, as shown in Figure 5, but we found a significant reduction of DS when silencing C2, as well as slightly increased overall responses to all directions for C2 and C3 silencing, which was significant for null directions when silencing C2. This is shown in Figure 6.

      Typos:

      (1) Line 521.

      Fixed

      (2) Line 1170: context of the citation unclear.

      Fixed

    1. eLife Assessment

      This is a solid paper on intermittent fasting that will be of interest to readers. The data presented are certainly valuable as a resource. The findings of both shared and tissue-specific signatures, both at the proteomic and transcriptomic levels, align well with what has been established and bring new insight into metabolic adaptation and its consequences in muscle, cortex, and liver. The organ-specific changes unveiled by proteomics in response to IF reveal unique rewiring of metabolic, signaling and physiological function.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomic and transcriptomic analyses to investigate the systemic and organ-specific adaptations to IF in male mice. They identified shared biological signaling processes across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication. The results reveal both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular.

      Strengths:

      This study examined multiple organs, including liver, brain and muscle, and revealed both conserved and tissue-specific responses to IF.

      Weaknesses:

      (1) Why did the authors choose liver, brain and muscle but not other organs such as heart and kidney? The latter are known to be major consumers of ketones, which are also changed by the IF treatment in this study.

      (2) The proteomic and transcriptomic analyses were only performed at 4 months. However, the correlation between IF and the molecular adaptations is likely to depend on the time point.

      (3) The manuscript lacks a "Discussion" section, which should address the significance and weaknesses of the study.

      (4) There is no independent confirmation of the proteomic and transcriptomic profiling. For example, the important changes in the proteomics could be further confirmed by Western blot.

    3. Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations.

      They find shared signaling pathways, certain metabolic changes and organ-specific responses that suggest IF might affect energy utilization, metabolic flexibility while promoting resilience at the cellular level.

      Strengths:

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study.

      Weaknesses:

      Poor figure presentation and knowledge of the literature. One sex (male).

      On resubmission, the authors' decision to discriminate the organ-specific from the organ-shared effects of intermittent fasting (IF) also enabled them to more precisely determine the lack of correspondence between transcriptomics and proteomics, i.e., not all transcripts lead to protein translation.

    4. Reviewer #3 (Public review):

      Summary:

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran out to 4 months which was a nice long-term design.

      (2) The multi-omics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues.

      (3) The authors did not over-step their conclusions and imply an overreached mechanism.

      Weaknesses:

      The weaknesses, which are minor, include use of only male mice and the early start (6 weeks) of the IF treatment. However, the authors have provided justification on why they chose male mice and the time points used in the study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomics and transcriptomics analysis to investigate the systemic and organ-specific adaptations to IF in males. They found that shared biological signaling processes were identified across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, which revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths:

      This study detected multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      We appreciate the recognition of the study’s strengths and the opportunity to clarify the points raised.

      Weaknesses:

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are proven to be the largest consumers of ketones, which is also changed in the IF treatment of this study.

      We agree that the heart and kidney are critical organs in ketone metabolism. Our selection of the liver, brain, and muscle was guided by their distinct metabolic functions and relevance to systemic energy balance, neuroplasticity, and locomotor activity, key domains influenced by intermittent fasting (IF). These tissues also offer complementary perspectives on central and peripheral adaptations to IF. Notably, we have previously examined the effects of IF on the heart (eLife 12:RP89214), and we fully acknowledge the importance of the kidney. We intend to include it in future studies to broaden the scope and deepen our understanding of IF-induced systemic responses.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, a strong correlation between IF and the molecular adaptations should be time point-dependent.

      We appreciate this insightful comment. The 4-month time point was selected to capture long-term adaptations to IF, beyond acute or transitional effects. While we acknowledge that molecular responses to IF are time-dependent, our goal in this study was to establish a foundational understanding of sustained systemic and tissue-specific changes. We fully agree that a longitudinal approach would provide deeper insights into the temporal dynamics of IF-induced adaptations. To address this, we are currently undertaking a comprehensive 2-year study that is specifically designed to explore these time-dependent effects in greater detail.

      (3) The context lacks a "discussion" section, which would detail the significance and weaknesses of the study.

      We appreciate this observation. The manuscript was originally structured to emphasize results and interpretation within each section, but we recognize that a dedicated discussion section would enhance clarity and contextual depth. In the revised version, we will add a comprehensive discussion section addressing broader implications, limitations, and future directions of the study.

      (4) There is no confirmation for the proteomic and transcriptomic profiling. For example, the important changes in proteomics could be further identified by a Western blot. 

      We acknowledge the importance of orthogonal validation to support high-throughput findings. While our study primarily focused on uncovering systemic patterns through proteomic and transcriptomic profiling, we agree that targeted confirmation would strengthen the conclusions. To this end, we have included immunohistochemical validation of a key protein common to all three organs: Serpin A1C. Additionally, we are planning a dedicated follow-up study to expand functional validation of several key proteins identified in this manuscript, which will be pursued as a separate project.

      Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations. 

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses that suggest IF might affect energy utilization, metabolic flexibility, while promoting resilience at the cellular level.

      Strengths:

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study. 

      We appreciate the reviewer’s recognition of the breadth of our study design. By integrating proteomics and transcriptomics across three metabolically distinct organs, we aimed to provide a comprehensive view of systemic and tissue-specific adaptations to IF. This multi-organ, multi-omics approach was central to uncovering both conserved and divergent biological responses.

      Weaknesses:

      (1) The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      We thank the reviewer for this important observation. Our initial aim was to establish a foundational atlas of molecular changes induced by IF across key organs. However, we recognize that clearer framing of the biological questions would enhance interpretability. In the revised manuscript, we have restructured the introduction, results, and discussion to align more explicitly with specific hypotheses, particularly those related to energy metabolism, cellular resilience, and inter-organ signaling. We have also added targeted analyses and clarified how each dataset contributes to answering these questions.

      (2) The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

      We appreciate this feedback and agree that these are important considerations. Regarding figure presentation, we will revise several figures for improved clarity, add more descriptive legends, and reorganize supplemental materials to better support the main findings. On the literature front, we will expand the discussion to include recent and relevant studies on IF, metabolic adaptation, and sex-specific responses. As for the use of only male mice, this was a deliberate choice to reduce hormonal variability and focus on establishing baseline molecular responses. We fully acknowledge the importance of sex as a biological variable and will soon be conducting studies in female mice to address this gap.

      Reviewer #3 (Public review):

      Summary:

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran out to 4 months, which was a nice long-term design.

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues. 

      (3) The authors did not overstep their conclusions and imply an overreached mechanism.

      We sincerely thank the reviewer for acknowledging the strengths of our study design and analytical approach. We aimed to strike a careful balance between comprehensive data generation and cautious interpretation, and we appreciate the recognition that our conclusions were appropriately framed within the scope of the data.

      Weaknesses:

      The weaknesses, which are minor, include the use of only male mice and the early start (6 weeks) of the IF treatment. See specifics in the recommendations section.

      We appreciate the reviewer’s thoughtful comments. The decision to use male mice and initiate IF at 6 weeks was based on minimizing hormonal variability and capturing early adult metabolic programming. We acknowledge that sex and developmental timing are important biological variables. To address this, we are conducting parallel studies in female mice and evaluating IF initiated at later life stages. These follow-up investigations will help determine the extent to which sex and timing influence the molecular and physiological outcomes of IF.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The editor suggested addressing points regarding the young age at diet onset, use of males only, and justification for the choice of tissues analyzed without requiring new data generation.

      We agree that these are important points for context. We have now added a dedicated paragraph to the Discussion section (page 22) to explicitly acknowledge and discuss these as limitations of our study. We justify our initial experimental design choices in the context of the existing literature while acknowledging the valuable insights that studies in females and with different diet onset timings would provide.

      The editor and reviewers recommended a more integrative analysis, suggesting the use of freely available tools, and a deeper discussion to frame the work against the existing literature.

      We thank the editor for this excellent suggestion. In response to this and the detailed points from Reviewer #2, we have performed a new, integrated multi-omics analysis using a latent variable approach (DIABLO), implemented in the mixOmics R package (version 6.28.0), a state-of-the-art, freely available tool for integrative multi-omics analysis. This new analysis, presented in a new Figure 4 and described in the Results section (pages 20-23), identifies the key sources of variation across tissues and omics layers, directly addressing the request for a true integrative approach. Furthermore, we have thoroughly revised the Results and Discussion to more sharply frame our findings and highlight the new insights gleaned from our study.

      The editor requested clarification on whether mice were fasted at euthanasia and to rephrase the statement on page 12 regarding mitochondrial pathways.

      - We have clarified in the Methods section (page 4) that mice were euthanized at the end of their fasting period, precisely detailing the stage of the IF cycle.

      - We thank the editor for this critical correction. We have rephrased the statement on page 12 to more accurately reflect that we observed a lower abundance of proteins involved in mitochondrial oxidative pathways, and we now carefully discuss the important distinction between protein abundance and functional activity in this context.

      The editor noted that the introduction is missing key citations and should acknowledge foundational work.

      We apologize for this oversight. We have now revised the Introduction to include several key foundational citations that were previously missing, ensuring proper credit to the important work of our colleagues.

      Reviewer #2 (Recommendations for the authors):

      We thank the reviewer for their exceptionally detailed and helpful technical suggestions, which have greatly improved the analytical rigor of our manuscript.

      (1) & (4) 3D PCA and Integrated Multi-Omics Analysis:

      We agree with the reviewer that a more sophisticated integrative analysis was needed. As detailed in our response to the editor, we have replaced the original side-by-side analysis with a proper integrated multi-omics analysis using a latent variable approach (DIABLO), implemented in the mixOmics R package (version 6.28.0). This new analysis simultaneously models the proteomic and transcriptomic data from all three organs, identifying shared and tissue-specific sources of variation. This directly and more powerfully validates our claim of "conserved and tissue-specific responses." The results of this analysis are now central to our revised Results section and are shown in Figure 4 and the supplementary figures (PCA analysis).

      (2) Concordance/Discordance Analysis:

      This is an excellent point. We have now performed a comprehensive analysis of transcript-protein concordance for the differentially expressed molecules in each tissue. A new Figure 4 summarizes these findings, and we discuss the biological implications of both concordant and discordant pairs in the Results section.

      (3) Organ-Specific Functional Remodeling:

      We have taken this advice to heart. The new analysis inherently addresses whether the functional remodeling is shared or tissue-specific. 

      (5) Missing Citations:

      We have thoroughly reviewed the literature and added key citations throughout the manuscript, particularly in the Introduction and Discussion, to properly situate our work within the field.

      (6) Starting Results with Supplementary Data:

      As the study design, including the timing of experimental interventions and blood and tissue collections, is summarized in the supplementary figures, the Results and Discussion section begins with those figures. However, we have now renamed the figures according to the eLife style, in which supplementary figures are linked to the main figures. This ensures a more logical and coherent flow.

      (7) Figure Presentation and Explanation:

      We have completely revised all figures to improve their clarity, consistency, and professional appearance. We have also carefully gone through the manuscript to ensure that every panel in every figure is explicitly mentioned and explained in the main text.

      Reviewer #3 (Recommendations for the authors):

      We thank the reviewer for their important comments regarding the model system.

      (1) Sex Differences and Limitations:

      We fully agree that studying sex differences is a critical and profound aspect of dietary interventions. As noted in our response to the editor, we have added a paragraph to the Discussion to explicitly acknowledge this as a key limitation of our current study. We discuss the existing evidence for sex-specific responses to IF and state that this is an essential direction for future research.

      (2) Early Diet Onset and Developmental Programs:

      This is a valuable point. We have added text to the Discussion acknowledging that starting IF at 6 weeks of age could potentially interact with developmental programs. We discuss this as a consideration for interpreting our data and for the design of future studies.

      We believe that our revised manuscript is substantially stronger as a result of addressing these comments. We are grateful for the opportunity to improve our work and hope that you and the reviewers find these responses and revisions satisfactory.

    1. eLife Assessment

      This useful and interesting study provides evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for SARS-CoV-2. The authors provide convincing justification for the conclusion that the inconsistent statistical significance for Omicron is likely due to immune imprinting or original antigenic sin. In this regard, the significance of the findings is stronger as it points to possible challenges for updated vaccine strategies in overcoming immune imprinting.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but for Omicron, the effects were modest and often not statistically significant. The authors provide compelling evidence to support this may be due to immune imprinting.

      This study also builds on prior work with additional experiments to elucidate the mechanisms that contributed to the EABR increased immunogenicity in naive mice including evidence that the vaccine is inducing responses to more RBD epitopes and a potential role for heterodimer formation as a mechanism whereby bivalent vaccines induce cross-reactive B cell responses.

      Strengths:

      Evaluating a novel SARS-CoV-2 vaccine, which was substantially superior in naive mice, in pre-immune mice as a model for its potential in the pre-immune population.

      Providing insight into a possible role of immune imprinting in shaping immune responses to updated booster immunizations.

      Minor weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting. While parallel controls (mice immunized 3 times with just the bivalent EABR vaccine) were not tested, the authors point to prior published work showing Omicron S antigen is a strong immunogen. This indicates the lower immune responses to Omicron are likely due to immune imprinting (or original antigenic sin) and not due to S immunogen being inherently less immunogenic than the S protein from the ancestral Wu-1 strain.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine, which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. The discussion acknowledges these limitations of their studies and the potentially limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      (3) The EABR S mRNA vaccine was superior to the conventional mRNA S vaccine in naïve mice but not in pre-immune mice. The authors expanded the discussion to propose a possible role for immune imprinting in this result which is supported by the data.

    3. Reviewer #3 (Public review):

      Summary:

      The authors evaluated a novel bivalent (Wu1/BA.5 based) mRNA platform that uses the EABR strategy to produce enveloped virus-like particles for vaccination. These were tested as boosters in the context of pre-existing immunity in mice that received two prior immunizations with conventional Wu1 mRNA vaccines. The animal experimental timeline aimed at mimicking the vaccination/booster schedule implemented during the COVID-19 pandemic. The authors tested and compared different booster strategies: (1) conventional Wu1 S protein encoding mRNA vaccine, (2) EABR Wu1 S protein encoding mRNA vaccine that produces enveloped virus-like particles, (3) conventional Wu1/BA.5 S protein encoding mRNA vaccine, and (4) EABR Wu1/BA.5 S protein encoding mRNA vaccine that produces enveloped virus-like particles. The EABR approach (monovalent or bivalent) enhanced the antibody response against Wu1 and Omicron subvariants. Interestingly, the bivalent EABR Wu1/BA.5 mRNA (strategy 4) generated polyclonal sera targeting multiple receptor-binding domain epitopes: these sera were more diverse than those generated with the other tested booster strategies (1 to 3).

      Strengths:

      The monovalent Wu1 S-EABR mRNA booster led to an increase in antibody binding to tested Omicron variants (BA.5, BQ.1.1, XBB.1), while the bivalent Wu1/BA.5 S-EABR mRNA booster led to the highest Ab response against Omicron variants (BA.5, BQ.1.1, XBB.1) in pre-vaccinated mice.

      Neutralization assays showed that the monovalent Wu1 S-EABR mRNA booster had the highest neutralization activity against Wu1 and, to a lesser extent, against the early Omicron variant BA.1. The monovalent Wu1 S-EABR mRNA booster and bivalent Wu1/BA.5 S-EABR mRNA booster had similar BA.5 neutralizing activity. Neutralizing activity of the different boosters was less pronounced with later Omicron variants BQ.1.1 and XBB.1. However, of the different boosters tested, the bivalent Wu1/BA.5 S-EABR mRNA booster induced the highest neutralizing titers. These results support that the EABR mRNA vaccine strategy helps improve neutralizing activity against different tested Omicron subvariants: a few (1 or 2) mRNA constructs expressing major antigens in enveloped virus-like particles likely provide a novel strategy to elicit an immune response that has the potential to neutralize subsequent variants.

      The EABR enveloped virus-like particle strategy induces a more diverse antibody response, including epitopes not recognized by the other booster strategies: these new epitopes could play a role in neutralizing activity against new future variants.

      Moreover, the bivalent Wu1/BA.5 S-EABR mRNA booster could potentially produce heterotrimeric S proteins to help activation of cross-reactive B cells and increase polyclass antibody responses.

      Weaknesses:

      When it comes to later Omicron variants (BQ.1.1 and XBB.1), there is a discrepancy between epitope binding response and neutralization titers: only a few binding antibodies have neutralizing activity with these later variants, showing a limitation of the EABR strategy.

      The authors showed that the EABR mRNA strategy represents a novel antigen-exposing strategy where antigens are produced at the cell surface and also at the surface of enveloped virus-like particles. This allows the production of novel antibodies in addition to those that would typically be generated against cell surface-exposed antigens. These novel antibodies targeting new epitopes could potentially have neutralizing activity.

      Using a bivalent EABR mRNA booster led to higher antibody titers and higher neutralizing activity. The challenge is to select the best antigen target/variant to support neutralizing activity against later virus variants.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for the SARS-CoV-2 booster vaccine. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study presents limitations in interpreting the results. Also, the absence of results showing possible mechanisms underlying the lack of benefit with EABR in the pre-immune makes the findings mostly observational.

      Thank you for your assessment of our study. Respectfully, we do not agree that our study shows a lack of benefit of using the EABR approach. For the monovalent boosters, the S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the regular S mRNA booster, which is consistent with the findings from our prior study in naïve mice. In addition, the bivalent S-EABR booster consistently elicited the highest neutralizing titers against all tested variants, including significantly higher titers against BA.5 and BQ.1.1 than the monovalent S booster. The bivalent S-EABR booster also induced detectable neutralization activity in a larger number of mice than all other boosters.

      Consistent with this analysis, please note that reviewers 1 and 2 commented that “the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant” (reviewer 1) and “the authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting” (reviewer 2).

      We agree with the reviewers’ assessment that the EABR booster-mediated improvements were mostly modest, in particular against the BQ.1.1 and XBB.1 strains. We also acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field.

      Finally, we also wish to point out that we did include experiments that addressed potential mechanistic differences between booster groups. For example, we conducted deep mutational scanning studies to determine polyclonal antibody epitope mapping profiles, showing that bivalent S-EABR boosters induced more balanced targeting of multiple RBD epitopes, which likely contributed to the observed improvements in neutralization. Our work also included cryo-EM studies demonstrating that bivalent S mRNA boosters promote heterotrimer formation, which could potentially drive preferential stimulation of cross-reactive B cells via intra-spike crosslinking. This represents a potential mechanism explaining how bivalent boosters outperformed monovalent boosters in our and many prior studies, which warrants further investigation. Finally, we also performed serum depletion assays, showing that the BA.5 neutralizing activity elicited by the bivalent Wu1/BA.5 S and S-EABR mRNA boosters was primarily driven by cross-neutralizing Abs induced by the primary vaccination series.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      We thank the reviewer for their accurate summary of our study. Please see our comments to the reviewer’s individual points below, as well as our responses to the editor’s assessment above.

      Strengths:

      Evaluating a novel SARS-CoV-2 vaccine, which was substantially superior in naive mice, in pre-immune mice as a model for its potential in the pre-immune population.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      The reviewer raises an important point, and we agree that including additional groups receiving three immunizations with the bivalent spike and/or spike-EABR mRNA vaccines would have improved the experimental design. However, we believe that several prior studies have already demonstrated that Omicron S immunogens are not inherently poorly immunogenic compared to the ancestral S; e.g., Scheaffer et al., Nat Med (2022); Ying et al., Cell (2022); Muik et al., Sci Immunol (2022). Based on these prior reports, we conclude that the lower neutralizing titers against Omicron variants in our study are most likely driven by immune imprinting as a result of the initial vaccination series with the ancestral S immunogen.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. This is mostly glossed over throughout the manuscript. The discussion section needs to better acknowledge these limitations of their studies and the limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      We acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field. We added a “Limitations of the study” section at the end of the discussion to address all of these points in detail (lines 570-598 in the revised version).

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

      As we pointed out in our response to the editor’s assessment above, the monovalent S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the conventional monovalent S mRNA booster, which is largely consistent with the findings from our prior study in naïve mice. Although the bivalent S-EABR mRNA booster consistently elicited higher neutralizing titers than the conventional bivalent S mRNA booster, we agree with the reviewer that these improvements were modest and not statistically significant. Overall, neutralizing activity against later Omicron variants, such as BQ.1.1 and XBB.1, was low. We attributed this finding to immune imprinting (see response to point (1) above) and acknowledged that the EABR approach was not able to effectively overcome this effect (see discussion section of the paper, lines 537-558; and “Limitations of the study” section, lines 570-598 in the revised version).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format will have improved binding/neutralization antibody capacity over conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) and bivalent (Wu1 + BA.5) designs. The authors found that across both monovalent and bivalent designs, the EABR antigens elicited higher antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have potential further usage as an antigen.

      We thank the reviewer for their support and for the accurate summary and evaluation of our study.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

      We thank the reviewer for bringing up this important point. We agree that the variants used for this study are now outdated, and it would have been informative to evaluate conventional and EABR boosters against contemporaneous strains. However, as the reviewer correctly pointed out, this type of study requires a substantial amount of time to conduct and will therefore likely always be outdated by the time the data are analyzed and prepared for publication. To accurately assess immune responses against recent or current strains in mice, multiple boosters would have been needed to mimic the pre-existing immune context in the human population in 2025. Assuming intervals of 6-7 months between boosters (as used in this study to mimic booster intervals in the human population as closely as possible), this type of study would have been challenging to conduct, especially given the limited lifespan of mice. Thus, we performed this proof-of-concept study using outdated variants to assess the potential of EABR-modified boosters. We greatly appreciate the reviewer’s understanding and acknowledge this limitation of our study, which is highlighted in the added “Limitations of the study” section in the revised version of the manuscript (lines 570-598).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym RBD in the title should be spelled out.

      We thank the reviewer for raising this point. We made this change in the revised version of the paper.

      (2) Lines 167-168 describe no differences between the cohorts at day 244. It should also be stated that for all timepoints, there are no significant differences.

      We modified the revised manuscript according to the reviewer’s suggestion (line 170).

      Reviewer #2 (Recommendations for the authors):

      (1) Given the focus on developing broad vaccines for future coronavirus outbreaks, it would be particularly informative to test whether the EABR antigens elicit broadened/heightened responses against other (beta)coronaviruses. If enough serum is left, it would seem straightforward to conduct neutralization assays against non-SARS-CoV-2 coronaviruses.

      We thank the reviewer for this valid suggestion. Unfortunately, the extensive analysis of the serum samples, including spike and RBD ELISAs and neutralization assays against multiple variants, deep mutational scanning, and depletion assays, used up the serum samples for most mice. We agree that it would be interesting to investigate whether bivalent EABR boosters elicit pan-sarbecovirus responses in future studies.

      (2) In the bar plots for antibody titer changes, shown as log10 fold change, it is quite hard to interpret the difference between bars (e.g., what is the fold change difference between each bar in the same time point?). A table of mean ± SD values would be helpful.

      That’s a great suggestion. We added a table (Table S1) presenting all the geometric mean neutralization titers for all timepoints and variants in the revised version of the manuscript.

      (3) The development of heterotrimers as potential antigens is very interesting, but it seems out of place in the current manuscript. This should likely be in a separate, standalone manuscript.

      We thank the reviewer for commenting on the heterotrimer part of our manuscript. The presented work was not intended to advance the development of heterotrimers as potential antigens. Instead, our findings demonstrate that bivalent spike mRNA vaccines readily generate heterotrimers, which could promote intra-spike crosslinking and potentially impact antibody epitope targeting profiles as suggested by the deep mutational scanning data for the bivalent S-EABR mRNA booster (Fig. 4; Fig. S7-8). We think this is an important consideration that warrants further investigation with regards to the development of future bivalent or multivalent vaccines.

      (4) As a minor note, the sequences of the variants used or accession numbers should be provided in the Methods, since different groups have used different mutations for variants.

      We added the accession numbers for the vaccine strains used in this study (lines 604-605).

    1. eLife Assessment

      These findings are among some of the first to identify a behavioral and neurobiological substrate that disentangles nonassociative from associative fear responses following stress, providing a fundamental push forward in the field. The evidence supporting this is compelling and uses a variety of conceptual and technological approaches. This investigation will be of interest to neuroscientists and behaviourists broadly, as well as clinicians for its relevance to post-traumatic stress disorder.

    2. Reviewer #1 (Public review):

      Summary:

      This study delineates a highly specific role for the pPVT in unconditioned defensive responses. The authors use a novel, combined SEFL and SEFR paradigm to test both conditioned and unconditioned responses in the same animal. Next, a c-fos mapping experiment showed enhanced PVT activity in the stress group when exposed to the novel tone. No other regions showed differences. Fiber photometry measurements in pPVT showed enhancement in response to the novel tone in the stressed but not non-stressed groups. Importantly, there were also no effects when calcium measurements were taken during conditioning. Using DREADDs to bidirectionally manipulate global pPVT activity, inhibition of the PVT reduced tone freezing in stressed mice while stimulation increased tone freezing in non-stressed mice.

      Strengths:

      A major strength of this research is the use of a multi-dimensional behavioral assay that delineates behavior related to both learned and non-learned defensive responses. The research also incorporates high-resolution approaches to measure neuronal activity and provide causal evidence for a role for PVT in a very narrow band of defensive behavior. The data are compelling, and the manuscript is well-written overall.

      Weaknesses:

      Figure 1 shows a small, but looks to be, statistically significant, increase in freezing in response to the novel tone in the no-stress group relative to baseline freezing. This observation was also noticed in Figures 2 and 7. The tone presented is relatively high frequency (9 kHz) and high dB (90), making it a high-intensity stimulus. Is it possible that this stimulus is acting as an unconditioned stimulus? In addition, in the final experiment, the tone intensity was increased to 115 dB, and the freezing % in the non-stressed group was nearly identical (~20%) to the non-stressed groups in Figures 1-2 and Figure 7. It seems this manipulation was meant as a startle assay (Pantoni et al., 2020). Because the auditory perception of mice is better at high frequencies (best at ~16 kHz), would the effect seen be evident at a lower dB (50-55) at 9 kHz? If the tone was indeed perceived as "neutral," there should be no freezing in response to the tone. This complicates the interpretation of the results somewhat because while the authors do admit the stimulus is loud, would a less loud stimulus result in the same effect? Could the interaction observed in this set of studies require not a novel tone, but rather a high-intensity tone that elicits an unconditioned response? Along these same lines, it appears there may be an elevation in c-fos in the PVT in the non-stress tone test group versus the no-stress home cage control, and overall it appears that tone increases c-fos relative to homecage. Could PVT be sensitive to the tone outside of stress? Would there be the same results with a less intense stimulus? I would also be curious to know what mice in the non-stressed group were doing upon presentation of the tone besides freezing. Were any startle or orienting responses noticed?

      Comments on revisions:

      Following revision, this reviewer felt all of the above concerns were addressed.

    3. Reviewer #2 (Public review):

      Summary:

      Nishimura and colleagues present findings of a behavioral and neurobiological dissociation of associative and nonassociative components of Stress Enhanced Fear Responding (SEFR).

      Strengths:

      This is a strong paper that identifies the PVT as a critical brain region for SEFR responses using a variety of approaches, including immunohistochemistry, fiber photometry, and bidirectional chemogenetics. In addition, there is a great deal of conceptual innovation. The authors identify a dissociable behavior to distinguish the effects of PVT function (among other brain regions).

      Weaknesses:

      (1) The authors find a lack of difference between the Stress and No Stress groups in pPVT activity during SEFL conditioning with fiber photometry but an increase in freezing with Gq DREADD stimulation. How do authors reconcile this difference in activity vs function?

      (2) Because the PVT plays a role in defensive behaviors, it would be beneficial to show fiber photometry data during freezing bouts vs exclusively presented during tone and shock cue presentations.

      (3) Similar to the above point, were other defensive behaviors expressed as a result of footshock stress or PVT manipulations?

      (4) Tone attenuation in Figure 8 seems to be largely a result of minimal freezing to a 115-dB tone. While not a major point of the paper, a more robust fear response would be convincing.

      (5) In the open field test, the authors measure total distance. It would be beneficial to also show defensive behavioral (escape, freezing, etc) bouts expressed.

      (6) The authors, along with others, show a behavioral and neural dissociation of footshock stress on nonassociative vs associative components of stress; however, the nonassociative components as a direct consequence of the stress seem to be necessary for enhancement of associative aspects of fear. Can authors elaborate on how these systems converge to enhance or potentiate fear?

      (7) In the discussion, authors should elaborate on/clarify the cell population heterogeneity of the PVT since authors later describe PVT neurons as exclusively glutamatergic.

      Comments on revisions:

      Following revision, this reviewer felt all of the above concerns were addressed.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Nishimura et al. examines the behavioural and neural mechanisms of stress-enhanced fear responding (SEFR) and stress-enhanced fear learning (SEFL). Groups of stressed (4 x shock exposure in a context) vs non-stressed (context exposure only) animals are compared for their fear of an unconditioned tone, and context, as well as their learning of new context fear associations. Shock of higher intensity led to higher levels of unlearned stress-enhanced fear expression. Immediate early gene analysis uncovered the PVT as a critical neural locus, and this was confirmed using fiber photometry, with stressed animals showing an elevated neural signal to an unconditioned tone. Using a gain and loss of function DREADDs methodology, the authors provide convincing evidence for a causal role of the PVT in SEFR.

      Strengths:

      (1) The manuscript uses critical behavioural controls (no stress vs stress) and behavioural parameters (0.25mA, 0.5mA, 1mA shock). Findings are replicated across experiments.

      (2) Dissociating the SEFR and SEFL is a critical distinction that has not been made previously. Moreover, this dissociation is essential in understanding the behavioural (and neural) processes that can go awry in fear.

      (3) Neural methods use a multifaceted approach to convincingly link the PVT to SEFR: from Fos, fiber photometry, gain and loss of function using DREADDs.

      Weaknesses:

      No weaknesses were identified by this reviewer; however, I have the following comments:

      A closer examination of the Test data across time would help determine if differences may be present early or later in the session that could otherwise be washed out when the data are averaged across time. If none are seen, then it may be worth noting this in the manuscript.

      Given the sex/gender differences in PTSD in the human population, having the male and female data points distinguished in the figures would be helpful. I assume sex was run as a variable in the statistics, and nothing came as significant. Noting this would also be of value to other readers who may wonder about the presence of sex differences in the data.

      Comments on revisions:

      Following revision, this reviewer felt all of the above comments were addressed.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study delineates a highly specific role for the pPVT in unconditioned defensive responses. The authors use a novel, combined SEFL and SEFR paradigm to test both conditioned and unconditioned responses in the same animal. Next, a c-fos mapping experiment showed enhanced PVT activity in the stress group when exposed to the novel tone. No other regions showed differences. Fiber photometry measurements in pPVT showed enhancement in response to the novel tone in the stressed but not non-stressed groups. Importantly, there were also no effects when calcium measurements were taken during conditioning. Using DREADDs to bidirectionally manipulate global pPVT activity, inhibition of the PVT reduced tone freezing in stressed mice while stimulation increased tone freezing in non-stressed mice.

      Strengths:

      A major strength of this research is the use of a multi-dimensional behavioral assay that delineates behavior related to both learned and non-learned defensive responses. The research also incorporates high-resolution approaches to measure neuronal activity and provide causal evidence for a role for PVT in a very narrow band of defensive behavior. The data are compelling, and the manuscript is well-written overall.

      Weaknesses:

      Figure 1 shows a small, but looks to be, statistically significant, increase in freezing in response to the novel tone in the no-stress group relative to baseline freezing. This observation was also noticed in Figures 2 and 7. The tone presented is relatively high frequency (9 kHz) and high dB (90), making it a high-intensity stimulus. Is it possible that this stimulus is acting as an unconditioned stimulus?

      We thank the reviewer for this insightful comment. In our view, the freezing behavior elicited by the tone reflects an unconditioned response; accordingly, the tone functions as an unconditioned stimulus. Indeed, in our data we found a modest increase in freezing in the no-stress group during the tone presentation relative to baseline (Figures 1, 2, and 7). This effect, however, was considerably smaller in magnitude than the robust freezing observed in stressed mice. We conclude that prior footshock stress enhances the unconditioned tone response.

      In addition, in the final experiment, the tone intensity was increased to 115 dB, and the freezing % in the non-stressed group was nearly identical (~20%) to the non-stressed groups in Figures 1-2 and Figure 7. It seems this manipulation was meant as a startle assay (Pantoni et al., 2020).

      We appreciate the opportunity to clarify this aspect of the model. In Figure 7, the rationale for selecting a tone amplitude of 115 dB was not to conduct a startle assay. Instead, we sought to determine whether chemogenetic inhibition of the pPVT influenced tone-elicited unconditioned fear in stress-naïve mice. Given our prior experiments demonstrating that a 90 dB tone elicits relatively low levels of freezing in non-stressed groups, we increased the tone amplitude to 115 dB in an attempt to elicit a more robust freezing response that would be sufficient to detect meaningful group differences (i.e., prevent a floor effect). As noted by the reviewer, the 115 dB tone yielded moderate levels of freezing behavior. Although freezing levels were not very high, we believe they were sufficient to avoid a floor effect. There was no effect of pPVT inhibition in this version of the task, which suggests that pPVT is preferentially engaged after stress. Future studies that identify tone parameters capable of eliciting high levels of freezing will be necessary to further strengthen this finding.

      Because the auditory perception of mice is better at high frequencies (best at ~16 kHz), would the effect seen be evident at a lower dB (50-55) at 9 kHz? If the tone was indeed perceived as “neutral,” there should be no freezing in response to the tone. This complicates the interpretation of the results somewhat because while the authors do admit the stimulus is loud, would a less loud stimulus result in the same effect? Could the interaction observed in this set of studies require not a novel tone, but rather a high-intensity tone that elicits an unconditioned response?

      Within our framework, it is important to emphasize that tone intensity (amplitude and frequency), rather than the perceived novelty of the stimulus, is the primary determinant of unconditioned freezing behavior. Moreover, numerous studies have demonstrated that auditory stimuli have the capacity to elicit unconditioned fear responses, as in the case of pseudoconditioning. Accordingly, we agree with the reviewer that decreasing the tone amplitude from 90 dB to 50 dB would diminish the unconditioned freezing response. For example, Kamprath and Wotjak (2004) demonstrated that stress-naïve mice exposed to a 95 dB tone exhibited significantly greater levels of freezing compared to those exposed to an 80 dB tone. This graded effect of tone amplitude on unconditioned freezing was also observed in mice previously exposed to footshock stress. Notably, the authors also reported a plateau effect, such that increases in tone amplitude beyond 95 dB did not further elevate freezing levels. As it relates to our findings, this plateau effect may explain the rather modest changes in freezing behavior that we observed between the 90 dB and 115 dB tone.

      Along these same lines, it appears there may be an elevation in c-fos in the PVT in the non-stress tone test group versus the no-stress home cage control, and overall it appears that tone increases c-fos relative to homecage. Could PVT be sensitive to the tone outside of stress? Would there be the same results with a less intense stimulus?

      Indeed, as the reviewer noted, we observed an increase in PVT c-Fos expression in non-stressed animals exposed to the SEFR tone test relative to homecage controls. The finding is consistent with previous reports demonstrating that PVT neurons are robustly activated by salient stimuli and regulate properties of arousal (Penzo and Gau, 2022). Moreover, the PVT has been shown to exhibit neuronal activity responses that are scaled to stimulus intensity. For example, PVT neurons display increased firing rates in response to a tail shock compared to an air puff (Zhu, 2018). Thus, it is conceivable that a less intense stimulus would evoke a diminished level of c-Fos expression.

      I would also be curious to know what mice in the non-stressed group were doing upon presentation of the tone besides freezing. Were any startle or orienting responses noticed?

      We thank the reviewer for raising this important question. Regarding startle responses, we have found that our standard 90 dB, 9 kHz tone parameter elicits similar degrees of startle between stressed and non-stressed mice (data unpublished). However, Golub et al. (2009) observed effects of prior footshock stress on acoustic startle. Further investigation of behavioral responses expressed during the tone is certainly warranted.

      Reviewer #2 (Public review):

      Summary:

      Nishimura and colleagues present findings of a behavioral and neurobiological dissociation of associative and nonassociative components of Stress Enhanced Fear Responding (SEFR).

      Strengths:

      This is a strong paper that identifies the PVT as a critical brain region for SEFR responses using a variety of approaches, including immunohistochemistry, fiber photometry, and bidirectional chemogenetics. In addition, there is a great deal of conceptual innovation. The authors identify a dissociable behavior to distinguish the effects of PVT function (among other brain regions).

      Weaknesses:

      (1) The authors find a lack of difference between the Stress and No Stress groups in pPVT activity during SEFL conditioning with fiber photometry but an increase in freezing with Gq DREADD stimulation. How do authors reconcile this difference in activity vs function?

      The reviewer points out a curious dissociation. Fiber photometry showed no effect of prior stress on the PVT response during single-shock contextual fear conditioning; however, Gq DREADD stimulation of PVT led to increased post-shock freezing during this session. We don’t have a definitive explanation for this dissociation, but we wish to emphasize two relevant points. The first is that in our experience, post-shock freezing during the one-shock contextual fear conditioning session is modest, variable, and an unreliable predictor of long-term contextual fear. Thus, we are hesitant to draw firm conclusions from these data. Second, we did not observe differences in freezing during the SEFL context test, indicating that stimulation of pPVT during conditioning is not sufficient to elicit long-term enhancement of conditioned fear (i.e., SEFL). This suggests that the acute freezing response following shock exposure is mechanistically distinct from expression of conditioned contextual fear. Clearly, further research will be needed to clarify the conditions under which PVT activity regulates / does not regulate freezing.

      (2) Because the PVT plays a role in defensive behaviors, it would be beneficial to show fiber photometry data during freezing bouts vs exclusively presented during tone and shock cue presentations.

      We appreciate the reviewer's suggestion. Unfortunately, freezing data are not available for the fiber photometry experiment because the fiber optic patch cable interfered with mouse activity. We now acknowledge this as a limitation in the paper (line #202).

      (3) Similar to the above point, were other defensive behaviors expressed as a result of footshock stress or PVT manipulations?

      In addition to freezing behavior and locomotor activity in the open field, we examined the time and distance spent in the center of the open field arena. Consistent with our previous report (Hassien, 2020), we did not observe significant group differences between stress conditions, nor did we detect differences across the various experiential manipulations. We did not examine other defensive behaviors in this study. Ongoing research in the lab is examining a broader range of defensive behaviors in this paradigm.

      (4) Tone attenuation in Figure 8 seems to be largely a result of minimal freezing to a 115-dB tone. While not a major point of the paper, a more robust fear response would be convincing.

      Although our data indicate that DREADD-mediated inhibition of the pPVT did not attenuate freezing in non-stressed mice, we agree with the reviewer’s assessment that the 115 dB tone elicited only minimal freezing. Therefore, we remain open to the possibility that higher baseline levels of freezing might reveal a significant behavioral effect. We found it challenging to identify a decibel range that reliably evokes robust freezing in non-stressed mice. Future studies could explore varying tone frequencies to achieve a stronger freezing response.

      (5) In the open field test, the authors measure total distance. It would be beneficial to also show defensive behavioral (escape, freezing, etc) bouts expressed.

      We agree this would be valuable information, and we have noted it as a future direction in the discussion.

      (6) The authors, along with others, show a behavioral and neural dissociation of footshock stress on nonassociative vs associative components of stress; however, the nonassociative components as a direct consequence of the stress seem to be necessary for enhancement of associative aspects of fear. Can authors elaborate on how these systems converge to enhance or potentiate fear?

      We thank the reviewer for recognizing this important point regarding the mechanistic relationship between nonassociative fear sensitization and associative fear learning that occurs following footshock stress. At present, the majority of research on this topic has been conducted using the SEFL paradigm.

      At the behavioral level, previous studies indicate that manipulations that interfere with or attenuate associative fear memory of the footshock stress event fail to block nonassociative fear sensitization. For example, both SEFL and SEFR persist in animals that have successfully undergone fear extinction training in the footshock stress context (Rau et al., 2005; Hassien et al., 2020). Furthermore, reports also find that infantile or pharmacological amnesia of the footshock stress memory does not occlude the emergence of SEFL (Rau et al., 2005; Poulos et al., 2014). Taken together, associative fear memory of the footshock stress event does not appear to be necessary for fear sensitization.

      If and how the associative and nonassociative mechanisms interact is an interesting question that we are currently investigating. PVT has direct projections to the central and basolateral amygdala, regions well known to mediate conditioned fear acquisition and expression (Penzo et al., 2015). Why PVT activity does not modulate conditioned fear in our hands is intriguing. PVT is a heterogeneous structure with a variety of projections (e.g., Shima et al., 2023), and it is possible that the PVT-Amygdala projections are not hyperactive in our paradigm. As we alluded to above, further research will be needed to understand why stress-induced PVT hyperactivity affects some forms of fear and not others.

      (7) In the discussion, authors should elaborate on/clarify the cell population heterogeneity of the PVT since authors later describe PVT neurons as exclusively glutamatergic.

      The reviewer is correct that additional explanation of PVT cellular heterogeneity is warranted. We now provide clarity on this point in the discussion.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Nishimura et al. examines the behavioural and neural mechanisms of stress-enhanced fear responding (SEFR) and stress-enhanced fear learning (SEFL). Groups of stressed (4 x shock exposure in a context) vs non-stressed (context exposure only) animals are compared for their fear of an unconditioned tone, and context, as well as their learning of new context fear associations. Shock of higher intensity led to higher levels of unlearned stress-enhanced fear expression. Immediate early gene analysis uncovered the PVT as a critical neural locus, and this was confirmed using fiber photometry, with stressed animals showing an elevated neural signal to an unconditioned tone. Using a gain and loss of function DREADDs methodology, the authors provide convincing evidence for a causal role of the PVT in SEFR.

      Strengths:

      (1) The manuscript uses critical behavioural controls (no stress vs stress) and behavioural parameters (0.25mA, 0.5mA, 1mA shock). Findings are replicated across experiments.

      (2) Dissociating the SEFR and SEFL is a critical distinction that has not been made previously. Moreover, this dissociation is essential in understanding the behavioural (and neural) processes that can go awry in fear.

      (3) Neural methods use a multifaceted approach to convincingly link the PVT to SEFR: from Fos, fiber photometry, gain and loss of function using DREADDs.

      Weaknesses:

      No weaknesses were identified by this reviewer; however, I have the following comments:

      A closer examination of the Test data across time would help determine if differences may be present early or later in the session that could otherwise be washed out when the data are averaged across time. If none are seen, then it may be worth noting this in the manuscript.

      Given the sex/gender differences in PTSD in the human population, having the male and female data points distinguished in the figures would be helpful. I assume sex was run as a variable in the statistics, and nothing came as significant. Noting this would also be of value to other readers who may wonder about the presence of sex differences in the data.

      We appreciate the reviewer’s thoughtful feedback and have addressed these points as follows: In the methods section, we clarify that pre-tone and post-tone freezing behavior was averaged because we did not detect a significant effect of time across all experiments (line #474). With regards to sex differences, we clarify in the methods section that we did not detect sex as a statistically significant variable across tests (line #443). In addition, we have revised the figures to denote male and female subjects separately.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Following discussion, the reviewers and editors agreed that the strength of the evidence could be updated to compelling, provided the comments were adequately addressed.

      Reviewer #1 (Recommendations for the authors):

      (1) In the discussion around line 333, there is also data indicating a time-dependent role for PVT in conditioned fear (Quinones-Laracuente 2021; Do-Monte 2015).

      We agree with the reviewer’s assessment and have revised the discussion accordingly (line #364).

      (2) The 129S6/SvEvTac mouse exhibits impaired fear extinction but intact discrimination (Temme, 2014). Was there any rationale for using this line of mice?

      The reviewer is correct that additional explanation is warranted. We have amended the manuscript to include additional rationale for using the 129S6/SvEvTac mouse strain as well as address the findings of Temme, 2014 as they relate to our study (line #94).

      (3) Was there any reason why there were no c-fos results in the PAG and IPBM? You discuss those brain regions and their importance in the circuit in the discussion.

      In the current manuscript, we do show c-fos results for the lPAG, dlPAG, and lPBN (Figure 3). We highlight in the discussion the relevance of these regions in the fear circuit.

      (4) Take a look at Sillivan et al., 2018 for an additional reference in the introduction (around line 61).

      We thank the reviewer for their suggestion and have included the reference in the introduction (line #63).

      (5) Can the authors show the c-fos data for aPVT and pPVT separately? The authors focus on pPVT for later manipulations, but the c-fos data is collapsed. Along these same lines, were there any corrections for multiple comparisons across the brain regions? While the subsequent experiments firmly support a role for pPVT in unlearned stress-induced fear response, a proper correction for multiple comparisons is warranted.

      We have revised Figure 3 to include c-fos expression for both the anterior and posterior PVT separately. To correct for multiple comparisons, we conducted a two-way ANOVA (Brain Region X Group) with Tukey's-corrected post hoc tests, as detailed in the methods section (line #577).

      (6) Do the authors provide rationale for why they began to focus specifically on pPVT versus aPVT?

      We agree that additional clarity is warranted. We have provided additional rationale for selecting pPVT as our primary focus in the results section (line #197).

      (7) Lines 298-337 of the discussion could be shortened. This long preamble is a summary of the results.

      We agree with the reviewer’s assessment and have revised the manuscript accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional analyses for fiber photometry and open field data to probe for PVT-related changes in defensive behaviors beyond freezing.

      As stated above, we agree with the reviewer that additional behavioral analyses would be valuable. Unfortunately, such measures are not available for the current experiment.

      Reviewer #3 (Recommendations for the authors):

      As mentioned in the weaknesses, just checking for differences across time on the Tests, highlighting the M vs. F datapoints in the figures, and reporting if there are sex differences in any of the analyses.

      In the revised manuscript, we have included separate male and female data points for each figure. In addition, we provided clarity in the methods section reporting a lack of statistically significant sex differences across each experiment (line #443).

    1. eLife Assessment

      This valuable study shows that targeted mutations in specific cassava eIF4E-family genes can reduce infection and disease symptoms caused by cassava brown streak viruses. Through systematic knockouts across the eIF4E gene family, the authors provide convincing evidence that certain double mutants show resistance-associated outcomes. Overall, the work supports practical routes to engineer cassava with improved resistance and clarifies which host factors are relevant for this disease.

    2. Reviewer #1 (Public review):

      It is well established that many potyvirids (viruses in the Potyviridae family), particularly potyviruses (viruses in the Potyvirus genus), selectively recruit either eIF4E or eIF(iso)4E, while some others can use both of them to ensure a successful infection. CBSD, caused by two potyvirids, i.e., the ipomoviruses CBSV and UCBSV, severely impedes cassava production in West Africa. In a previous study (PBI, 2019), Gomez and Lin (co-first authors), et al. reported that cassava encodes five eIF4E proteins, including eIF4E, eIF(iso)4E-1, eIF(iso)4E-2, nCBP-1 and nCBP-2, and that CBSV VPg interacts with all of them (Co-IP data). Simultaneous CRISPR/Cas9-mediated editing of nCBP-1 and -2 in cassava significantly mitigates CBSD symptoms and incidence. In this study, Lin et al. further generated all five eIF4E family single mutants as well as both eIF(iso)4E-1/-2 and nCBP-1/-2 double mutants in a farmer-preferred cassava cultivar. They found that both eIF(iso)4E and nCBP double mutants show reduced symptom severity, and that the latter performs better. Analysis of mutant sequences revealed one important point mutation, L51F of nCBP-2, that may be essential for the interaction with VPg. The authors suggest that introduction of the L51F mutation into all five eIF4E family proteins may lead to strong resistance. Overall, I believe this is an important study enriching knowledge about eIF4E as a host factor/susceptibility factor of potyvirids and proposing new information for the development of high CBSD resistance in cassava. I suggest the following two major comments for authors to consider for improvement:

      (1) As eIF(iso)4e-1/-2 or nCBP-1/-2 double mutants show resistance, why not try to generate a quadruple mutant? I believe it is technically possible through conventional breeding.

      (2) I agree that the L51F mutation may be important. But more evidence is needed to support this idea. For example, the authors may conduct a quantitative Y2H assay on the binding of VPg to each of the eIF4E (L51F) mutants. Such data may add as additional evidence to support your claim.

      Comments on revisions:

      (1) The authors explained that it is technically challenging to generate a quadruple mutant.
      (2) The authors have properly addressed my comment 2.
      I do not have more concerns.

    3. Reviewer #2 (Public review):

      Eukaryotic translation initiation factor 4E (eIF4E) acts as a key susceptibility factor for members of the Potyviridae family, and knockout of eIF4E family members enables the generation of corresponding virus-resistant germplasm. In this study, the authors performed systematic knockout experiments on the members of eIF(iso)4E and nCBP clades in cassava, which demonstrated that simultaneous knockout of the eIF4E-family genes nCBP-1 and nCBP-2 in the cultivar 60444 significantly attenuates Cassava Brown Streak Disease (CBSD) root symptoms and reduces viral titer. The authors further screened for CBP mutants without VPg-binding activity and identified the nCBP-2 L51F mutant, which loses the ability to interact with VPg. In the revised manuscript, the authors have addressed most of my previous questions and revised the relevant content accordingly. Overall, this study is well performed, with extensive exploration of gene knockouts of eIF(iso)4E and nCBP members in particular. It provides important value for investigating the functions of eIF(iso)4E and nCBP clade members in the development of disease-resistant germplasm, and the identified nCBP-2 L51F mutant also offers a crucial gene-editing target site for the generation of virus-resistant cassava germplasm in the future.

    4. Reviewer #3 (Public review):

      In the manuscript, the authors generated several mutant plants defective in the eIF4E family proteins and detected infection by cassava brown streak viruses (CBSVs) in these mutant plants. They found that CBSVs induced significantly lower disease scores and virus accumulation in the double mutant plants. Furthermore, they identified an important conserved amino acid for the interaction between eIF4E protein and the VPg of CBSVs by yeast two-hybrid screening. The experiments are well designed; however, some points need to be clarified:

      (1) The authors reported in their previous study that the ncbp1 ncbp2 double mutant plants were less sensitive to CBSV infection and that all the eIF4E family proteins interact with VPg. To identify redundant functions of eIF4E family proteins, they generated mutants for all eIF4E family genes. However, these mutants are defective in different eIF4E genes, and apart from several double mutants, no multiple mutants (such as triple or quadruple mutants) were generated, so it is hard to identify the redundant function of eIF4E family genes.

      (2) The authors identified some key amino acids for the interaction between eIF4E and VPg, such as L51. It would be interesting to complement ncbp1 ncbp2 double mutant plants with the L51F form of eIF4E and re-examine infection by CBSVs.

      Comments on revisions:

      The reviewer understands that cassava is not a model plant and that it is hard for the authors to generate multiple genetic mutant plants for experiments, so nothing was done to respond to the comments raised by the reviewer.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It is well established that many potyvirids (viruses in the Potyviridae family), particularly potyviruses (viruses in the Potyvirus genus), selectively recruit either eIF4E or eIF(iso)4E, while some others can use both of them to ensure a successful infection. CBSD, caused by two potyvirids, i.e., the ipomoviruses CBSV and UCBSV, severely impedes cassava production in West Africa. In a previous study (PBI, 2019), Gomez and Lin (co-first authors), et al. reported that cassava encodes five eIF4E proteins, including eIF4E, eIF(iso)4E-1, eIF(iso)4E-2, nCBP-1 and nCBP-2, and that CBSV VPg interacts with all of them (Co-IP data). Simultaneous CRISPR/Cas9-mediated editing of nCBP-1 and -2 in cassava significantly mitigates CBSD symptoms and incidence. In this study, Lin et al. further generated all five eIF4E family single mutants as well as both eIF(iso)4E-1/-2 and nCBP-1/-2 double mutants in a farmer-preferred cassava cultivar. They found that both eIF(iso)4E and nCBP double mutants show reduced symptom severity, and that the latter performs better. Analysis of mutant sequences revealed one important point mutation, L51F of nCBP-2, that may be essential for the interaction with VPg. The authors suggest that the introduction of the L51F mutation into all five eIF4E family proteins may lead to strong resistance. Overall, I believe this is an important study enriching knowledge about eIF4E as a host factor/susceptibility factor of potyvirids and proposing new information for the development of high CBSD resistance in cassava. I suggest the following two major comments for authors to consider for improvement:

      (1) As eIF(iso)4e-1/-2 or nCBP-1/-2 double mutants show resistance, why not try to generate a quadruple mutant? I believe it is technically possible through conventional breeding.

      (2) I agree that L51F mutation may be important. But more evidence is needed to support this idea. For example, the authors may conduct a quantitative Y2H assay on the binding of VPg to each of the eIF4E (L51F) mutants. Such data may add as additional evidence to support your claim.

      We thank the reviewer for their overall assessment. Regarding a quadruple mutant, we agree that this is a logical next step to investigate. A conventional breeding approach with existing mutant lines, however, is problematic for two reasons: 1) cassava does not flower where this work was conducted, and 2) cassava is subject to inbreeding depression, resulting in both low seed set and considerable heterogeneity among progeny that do arise. Editing existing double mutants is possible, but would require a significant, multi-year investment to produce embryogenic tissue from existing lines and generate the new lines. Cassava has practical limits as a non-model plant. Given these constraints, we conclude that investigating a quadruple mutant is beyond the scope of the current work.

      For investigating the HPL to HPF mutation in other cassava eIF4E-family proteins and their interaction with VPg in yeast, we have now completed this experiment and included the data in the paper. Notably we find that generating this mutant for eIF(iso)4E-2 attenuates VPg interaction without impairing eIF(iso)4E-2 accumulation, while similarly mutating nCBP-1 and eIF(iso)4E-1 results in total and reduced protein accumulation, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors generated single and double knockout mutants for the eIF4E family members eIF4E, iso4E1, iso4E2, nCBP1, and nCBP2 in cassava. While a single knockout of these eIF4E genes did not abolish viral infection, the nCBP1/nCBP2 double knockout mutant displayed the weakest symptoms and viral infection. Through yeast two-hybrid screening, the nCBP-2 L51F mutant was identified, and the mutant was unable to interact with VPg, yet the nCBP-2 L51F mutant could complement the eIF4E yeast mutant. This L51F is a potentially important editing site for eIF4E.

      Strengths:

      This study systematically generated single and double knockout mutants for the eIF4E family members and investigated their antiviral activity. It also identified an L51F site as a potentially important antiviral editing site in eIF4E; however, genetic evidence for its antiviral activity remains to be validated.

      Weaknesses:

      (1) The symptoms of the iso4E1 & iso4E2 double-knockout mutant are slightly alleviated, and those of the nCBP1 & nCBP2 double-knockout mutant are alleviated the most. If the iso4E1 & iso4E2 and nCBP1 & nCBP2 mutants are crossed to obtain quadruple-knockout mutant plants, whether the quadruple mutant shows stronger resistance should be further investigated.

      (2) Although the yeast two-hybrid identified the nCBP-2 L51F mutant, there is no direct biological evidence demonstrating its antiviral function. While the 6-amino acid deletion mutant (including L51F) showed attenuated symptoms, this deletion might be sufficient to cause loss-of-function of nCBP-2. These indirect observations cannot definitively establish that the L51F mutation specifically confers antiviral activity.

      (3) Given that nCBP-2 can rescue yeast eIF4E mutants, introducing wild-type and L51F nCBP2 into the Arabidopsis iso4e mutant, or introducing viral infectious clones into yeast systems, could clarify whether the L51F mutation (and the same mutations in eIF4E, iso4E1, iso4E2) abrogates their roles as viral susceptibility factors. This critical genetic evidence is currently missing.

      We sincerely thank the reviewer for their constructive feedback.

      With regards to investigating a quadruple eIF4E mutant, please see our response to reviewer 1.

      The reviewer makes a salient point regarding the nCBP-2 L51F and K45_L51del mutations. Ideally, complementation of the ncbp double mutant with nCBP-2 L51F, followed by viral challenge, would address this question. However, the practical limitations, as noted in our response to reviewer 1, make this difficult within the context of this manuscript. We acknowledge that this is a limitation of our study and have been cautious in not overstating our conclusions.

      Reviewer #3 (Public review):

      In the manuscript, the authors generated several mutant plants defective in the eIF4E family proteins and detected infection by cassava brown streak viruses (CBSVs) in these mutant plants. They found that CBSVs induced significantly lower disease scores and virus accumulation in the double mutant plants. Furthermore, they identified an important conserved amino acid for the interaction between eIF4E protein and the VPg of CBSVs by yeast two-hybrid screening. The experiments are well designed; however, some points need to be clarified:

      (1) The authors reported in their previous study that the ncbp1 ncbp2 double mutant plants were less sensitive to CBSV infection and that all the eIF4E family proteins interact with VPg. To identify redundant functions of eIF4E family proteins, they generated mutants for all eIF4E family genes. However, these mutants are defective in different eIF4E genes, and apart from several double mutants, no multiple mutants (such as triple or quadruple mutants) were generated, so it is hard to identify the redundant function of eIF4E family genes.

      (2) The authors identified some key amino acids for the interaction between eIF4E and VPg, such as L51. It would be interesting to complement ncbp1 ncbp2 double mutant plants with the L51F form of eIF4E and re-examine infection by CBSVs.

      We thank the reviewer for their assessment and feedback.

      Regarding analysis of higher-order mutants, please see our response to Reviewer #1’s public review.

      For investigation of nCBP-2 L51F in planta, please see our response to Reviewer #2’s public review.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Since nCBP2 can complement a yeast mutant, it indicates that nCBP2 can also complement Arabidopsis. Wild-type nCBP2 should be introduced into the Arabidopsis iso4e mutant to determine whether it can complement Arabidopsis iso4e and whether the virus can re-establish the infection. The nCBP2 L51F mutant should also be introduced into the Arabidopsis iso4e mutant to see if this mutant fails to re-establish the virus infection. Similarly, eIF4E, iso4E1, iso4E2, nCBP1, etc., should be introduced into the Arabidopsis iso4e mutant to determine whether they can truly complement the virus-infected mutant Arabidopsis, while the L51F mutants cannot.

      Arabidopsis encodes multiple eIF4E proteins, an nCBP protein, and an eIF(iso)4E protein, and knocking out the eIF(iso)4e gene specifically confers resistance to TuMV. Introducing cassava nCBP-2 into Arabidopsis eif(iso)4e mutants is unlikely to restore TuMV susceptibility. Because TuMV belongs to a different genus than CBSV, we used the TuMV VPg interaction with Arabidopsis eIF(iso)4E to test the generality of the effect of mutating the eIF4E HPL motif to HPF on potyvirid VPg-eIF4E interaction. However, since this mutation disrupts Arabidopsis eIF(iso)4E’s endogenous translation initiation activity in yeast, this mutant protein is not worth pursuing further. In contrast, cassava eIF(iso)4E-2 L27F retains translation initiation activity and has reduced interaction with CBSV VPg by quantitative yeast two-hybrid. It would be interesting to see if this particular mutant protein could interact with TuMV VPg; if not, it would then be worth testing for the ability to restore TuMV susceptibility in Arabidopsis eif(iso)4e. Unfortunately, we are unable to pursue these experiments at this time.

      (2) Given that nCBP-2 can complement yeast eIF4E mutants, the authors may introduce viral infectious clones into yeast systems expressing nCBP-2 variants to determine whether nCBP-2 supports viral translation. This approach could further clarify whether the L51F mutation (and mutations in eIF4E, iso4E1, iso4E2) abolishes their roles as viral susceptibility factors.

      This is an intriguing suggestion, but challenging for a few reasons. First, an infectious clone of the CBSV Naliendele isolate does not exist; we have tried to construct one, without success. There is also no guarantee that such a clone could infect yeast. We are aware of yeast being used as a surrogate host for a few plant viruses, such as Tomato bushy stunt virus and Brome mosaic virus, but are unaware of a similar system for any potyvirid. Developing such a system would undoubtedly require a significant investment beyond the scope of this manuscript.

      (3) Phenotypes of all mutant lines with and without virus inoculation in Table 1 should be presented.

      Photos of un-challenged mutants are included in supplemental figures. Representative storage root symptoms for all lines have now been included in the supplemental figures as well.

      (4) In Figure 1c, the results of viral accumulation assays should be presented for additional mutant lines beyond ncbp-1, ncbp-2, ncbp-1 nCBP-2 K45_L51del, and ncbp-1 ncbp-2, particularly eif(iso)4e-1 & eif(iso)4e-2#172 and eif(iso)4e-1 & eif(iso)4e-2#92.

      We have previously found that subtle reductions in visible disease do not always translate to clear differences in viral titer when analyzed by qPCR (Gomez et al., 2018). As such, we focused on lines with the strongest phenotypes in viral titer experiments.

      (5) Inconsistently, the ncbp-1 nCBP-2 K45_L51del line showed reduced symptoms compared to wild-type in Figures 1a and 1b, yet viral accumulation levels were comparable to wild-type in Figure 1c. The explanations for this discrepancy are required.

      Please see our response to (4).

      (6) Root phenotypic data for all mutant lines shown in Figure 1d should be presented.

      Please see our response to (3).

      (7) In Figure 2b, GST control pulldowns showed detectable proteins. This background signal requires explanation.

      It is not uncommon to see weak signal in bead or tag-only negative control pulldown and IP reactions. Importantly, we see strong enrichment of VPg relative to these controls in our experimental samples.

      (8) Contrary to the abstract's implication, Figure 5c indicates that the L51F mutation impacts yeast growth, suggesting potential pleiotropic effects of this mutant.

      We interpret the results to mean that nCBP2 L51F does not fully complement the yeast eif4e mutation, rather than that nCBP2 L51F impacts yeast growth.

      (9) In vivo protein-protein interaction assays (e.g., co-immunoprecipitation) should be performed to complement the in vitro GST pull-down data in Figure 6.

      We appreciate the desire for these experiments and agree that they would bolster our Y2H and pulldown data. Unfortunately, we are not able to complete these experiments at this time, so we have been careful not to overinterpret the data.

      (10) Since the AteIF(iso)4E L28F mutant fails to complement yeast, the authors should test whether introducing the L51F mutation into other family members (eIF4E, iso4E1, iso4E2, nCBP1) preserves their yeast complementation capacity.

      This has now been done for additional cassava eIF4E-family proteins.

      (11) Indicate molecular weight sizes in all Western blots.

      This was done. As differences in buffer formulations between gel types can affect the mobility and thus apparent molecular weight of markers, we have provided in the methods section the SDS-PAGE gel chemistries and specific protein ladders used in this study. Importantly, we note that, in our experience, the apparent position of certain markers relative to proteins of interest can vary by up to 15 kDa between gel chemistries.

      (12) Figures 4d,e are not provided in the paper. Based on the content of the paper, the description in the paper likely corresponds to Figures 5c, d.

      Thank you for catching this error; this has now been corrected.

    1. eLife Assessment

      This useful study uses in vitro electrophysiology, projection-specific chemogenetics, and different behavioural tasks to investigate the role of Vglut1-expression in basolateral amygdala neurons projecting to the nucleus accumbens in aspects of motivated behaviour. Although the manuscript is clearly written, the strength of the evidence supporting claims about the role of this pathway is incomplete. Currently, the work may be of interest to some behavioural neuroscientists, but additional controls and further clarification of specific analyses would strengthen their broader significance.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to determine whether reward conditioning increases inhibitory regulation of Vglut1-expressing BLA→NAc neurons and whether this inhibition shapes motivated behaviors. They used whole-cell electrophysiology to measure conditioning-induced changes in synaptic inhibition and intrinsic excitability. Subsequently, they employed dual-recombinase chemogenetics to selectively inhibit this projection during behavioral tasks. The goal was to test whether suppressing the activity of Vglut1-expressing neurons would alter reward learning, valuation, and fear discrimination.

      Strengths:

      (1) The combination of electrophysiological and behavioral assessments to dissect the function of Vglut1-expressing BLA→NAc neurons.

      (2) The various behavioral assessments employed to determine the effect of silencing Vglut1-expressing BLA→NAc neurons.

      Weaknesses:

      (1) The introduction underscores the importance of molecular identity and population dynamics when studying the function of BLA→NAc neurons. Yet, the experiments and manuscript provide little to no information about the Slc17a7-expressing population under study. In fact, there is no evidence that the viral manipulations targeted this neuronal population (e.g., extent and specificity of viral transduction). Regarding population dynamics, evidence is meant to be provided by Experiment 1, but the results are difficult to interpret. The control mice were not exposed to the conditioning chambers, stimuli, or food rewards. These exposures may have been sufficient to produce the changes observed in the experimental mice (i.e., they may have had nothing to do with cue-reward learning). Further, the experiments provide no evidence that the observed effects result from prolonged conditioning, since there is no group receiving a single conditioning session.

      (2) The dual-recombinase approach employed does not permit conclusions about the BLA→NAc pathway specifically, because the effects of silencing NAc-projecting BLA neurons could be driven by modulation of activity in other brain regions innervated by these same neurons through collateral projections. This limitation must be clearly acknowledged by the authors, and the manuscript should refrain from making definitive claims about the BLA→NAc pathway per se.

      (3) The experimental parameters and measures used for cued-reward conditioning complicate any firm conclusions about the observed effects. The use of a 2-second cue provides a minimal temporal window to monitor cue-related behavior. This issue is masked in the data presented because what is labeled as "cued responses" includes responses that occur after the cue has terminated and overlap with those triggered by sucrose delivery itself. These post-cue responses cannot be classified as cue-reward responses since the cue is no longer present; they are reward-related responses. Perhaps the z-score calculation addresses this issue, but this is difficult to assess since the authors do not explain how this calculation was performed or what baseline period was used.

      (4) Throughout the manuscript, there is conceptual confusion regarding the fundamental distinction between Pavlovian (cue-outcome) and instrumental (action-outcome) responses. It is unclear why the authors aimed to study both types of conditioning, but greater caution is necessary when interpreting the findings labeled as "instrumental conditioning." First, no evidence is provided that initiation port entries constitute an instrumental or goal-directed response rather than a Pavlovian approach behavior. Second, many of the conclusions are based on analyzing reward port entries-a Pavlovian conditioned response identical to that measured in the cued-reward conditioning task. This conflation undermines claims about instrumental learning.

      (5) The data from the reward valuation and reversal learning experiments are difficult to interpret. The animals are not tested under extinction conditions (with the flavors present but without reward delivery), making it impossible to establish whether their behavior relies on learned associations or ongoing reinforcement. Further, the behavior generated by these procedures appears unreliable, with substantial inconsistencies across figures (compare Figure 4A with Figures 5B, C, G, H).

      (6) The results from the auditory fear discrimination procedure are also difficult to interpret. No conditioning data are presented, and the "enhanced discrimination" could simply reflect reduced overall responding to the CS-. It is not clear how this selective impact on the CS- fits with the authors' conclusions about enhanced associative salience (noting that the meaning of the latter remains obscure).

      (7) The manuscript contains several statements about behavioral outcomes that are not supported by statistical evidence. The list provided here is non-exhaustive, and the authors should carefully correct any conclusions that lack statistical support.
      a) Line 294 (Figure 2F): the control mice gradually reached a similar performance to the experimental mice.
      b) Lines 301-303 (Figures 3D-F): inhibition strengthened the temporal association between initiation and reward consumption.
      c) Lines 337-339 (Figure 4A): both groups increased their preference for 10% sucrose.

      (8) The manuscript suffers from a lack of clarity and/or transparency about experimental parameters and data. Clarifications about the following would be necessary for the reader to confidently interpret the findings.
      a) Number of animals of each sex in each group.
      b) Number of animals excluded and justification.
      c) Analysis of sex differences.
      d) A clarification on the control group used in the electrophysiological experiment.
      e) Whether the same animals progress through multiple behavioral paradigms or if separate cohorts are used.
      f) All protocols should be described in the methods section.

      Without clarifying the points made above, a reliable and fair assessment of the discussion is impossible.
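
      To make the concern in point (3) concrete, the following is a minimal sketch of one common way to z-score port-entry rates against a pre-cue baseline and to report the cue window separately from the post-cue window. It is not the authors' method, which the manuscript does not describe. The function name, the 1-s bin width, the four-bin baseline, and the rate values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def baseline_zscore(rates, baseline_bins):
    """Z-score every bin against the mean and SD of the pre-cue baseline bins."""
    baseline = rates[:baseline_bins]
    mu = mean(baseline)
    sd = stdev(baseline)  # sample SD; needs at least 2 baseline bins
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return [(r - mu) / sd for r in rates]


# Hypothetical port-entry rates (entries/s) in 1-s bins:
# 4 baseline bins, 2 cue bins (a 2-s cue), then 3 post-cue bins
# that overlap with reward delivery.
rates = [1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 6.0, 5.0, 2.0]

z = baseline_zscore(rates, baseline_bins=4)
cue_z = mean(z[4:6])       # responding while the cue is on
post_cue_z = mean(z[6:9])  # responding after cue offset (reward-related)
```

      Reporting the two windows separately, together with the baseline definition, would let a reader judge how much of the "cued response" actually occurs while the cue is present.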

    3. Reviewer #2 (Public review):

      Summary:

      This study by Mercer et al. focused on Vglut1 neurons in the BLA that project to the NAc. They characterized reward conditioning-induced electrophysiological changes in these neurons, including a decrease in membrane excitability and an increase in inhibitory synaptic inputs onto them, and showed the consequences of reducing their activity in enhancing reward-seeking behaviors. Considering that Vglut1 neurons represent the majority of the BLA→NAc projecting neurons, the findings are important for potentially correcting some of the previous biases in understanding the role of BLA-to-NAc projection in reward processing, for example, the notion that this projection generally promotes reward seeking by conveying reward-associated cue information.

      Strengths:

      The paper is clearly written, with results strongly supporting the main conclusions for the most part.

      There are a few weaknesses noted. For example:

      (1) They used a retrograde recombinase strategy to drive DREADD expression in these cells; however, it is not known if they project exclusively to NAc or to other brain regions as well, and whether those other potential regions may mediate the DREADDs (Gi) effects on reward seeking. They also did not show which subregions of the NAc were innervated by these neurons.

      (2) They did not assess potential changes in excitatory synaptic transmission onto these cells after reward conditioning, which leaves a gap in concluding a shift toward inhibition.

      (3) They also did not report on whether the inhibition was specific to Vglut1 neurons.

      (4) Some statistics appear to be missing (Figure 3D-F), suboptimal (Figure 5CEF and HJK, which use separate t-tests rather than repeated-measures ANOVA), unclear (Figure 2I, on peak timing or port entry), or based on a low n (Figure 1 Ephys, animal-based manipulations).

      (5) They did not clarify why they used two different doses of the DREADDs ligand Compound 21 at 0.1 or 0.3 mg/kg for different experiments.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Mercer et al. investigates how inhibitory modulation of basolateral amygdala neurons expressing Vglut1 and projecting to the nucleus accumbens (Vglut1BLA→NAc) influences motivated behavior in both appetitive and aversive tasks. Using a combination of whole-cell electrophysiology, chemogenetic inhibition and behavioral tests, the authors demonstrate that (1) reward conditioning increases inhibitory synaptic input and reduces intrinsic excitability of Vglut1BLA→NAc neurons, (2) chemogenetic inhibition of these neurons enhances the number of conditioned approaches in a Pavlovian task and the number of nosepoke responses in an instrumental task, elevates reward valuation, and increases fear discrimination, and (3) these effects are linked to salience assignment and associative strength, rather than altered learning or reversal flexibility. The work challenges the classical excitatory function usually reported for the BLA projection to the NAc and highlights an interesting and thought-provoking result. Nevertheless, the study does not address the potential effect of the manipulation on motoric impulsivity, nor does it provide a theoretical framework explaining this unorthodox yet interesting effect.

      Strengths:

      The study establishes the initial finding with a correlational approach that informs a causal study. They find convincingly that Pavlovian conditioning induces an increase in inhibitory inputs onto Vglut1BLA→NAc neurons that leads to reduced excitability. Causality is studied using a powerful dual recombinase chemogenetic strategy to selectively inhibit this population of Vglut1BLA→NAc neurons and determine the effect on different behavioral tasks. The use of different tasks provides convergence on their effect. This surprising finding provokes interest and will stimulate further investigation into the mechanisms underlying these effects.

      Weaknesses:

      Several important aspects of the evidence remain incomplete.

      (1) First, an important aspect of the underlying processes at play remains to be investigated. In all behavioral tasks, the authors find that their manipulation increases responding that they interpret as a facilitation of learning. However, none of the appetitive tasks include a control stimulus that could address the specificity of their effect. Given that on the Pavlovian task, responding to the CS is almost 100%, I suspect that their manipulation may induce motoric impulsivity. This aspect would clearly benefit from additional controls.

      (2) Second, I have several questions about the time-resolved probability of port entries (PSTHs).

      a) There is a mismatch between the results presented in Figure 1. Panel D shows a peak of responses on the PSTH at ~2 s on day 5 (my remark applies to all days), suggesting that the average latency should lie around this value. However, panel C reports a latency to respond of ~4 s. Could the authors double-check their PSTHs?

      b) More generally, the fact that in the Pavlovian task all PSTHs show a peak at almost exactly 2 s is quite surprising and raises questions about how they are constructed. Admittedly, the most salient event is the water drop occurring 2 s after cue onset. Yet, if mice responded only to these drops, the peak response should occur at 2 s plus the reaction time, which is not the case. Figure 2 shows that on the first acquisition day, responding is already centered around 2 s and does not decrease with learning, except for treated animals.

      (3) Several methodological flaws are present.

      a) The authors need to report the statistics clearly. In most cases, the statistical test used is mentioned in the figure caption with a single P-value. Thus, for two-way ANOVAs, it is unclear whether the P-value relates to the interaction, the main effects, or the post-hoc tests.

      b) Another important issue is related to the average time-resolved z-score probability of port entries. The bin size used, the smoothing (which is much too strong), and the baseline period used to calculate the z-score are absent from the methods.

      (4) This study reports that manipulating 70% of the glutamatergic projection to the NAc induces an effect opposite to what has been previously reported in many different studies. Such a surprising finding deserves a more elaborate discussion of the mechanism that could be at play.
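
      To make the request in point (3b) concrete, the following is a minimal sketch of how a z-scored, time-resolved port-entry measure could be computed with every parameter stated explicitly. It is not the authors' pipeline: the function name `zscored_psth`, the 100 ms bin, the 2 s pre-cue baseline, and the 1-bin Gaussian kernel are all assumptions chosen only for illustration.

```python
import numpy as np

def zscored_psth(entry_times, n_trials, window=(-2.0, 6.0), bin_s=0.1,
                 baseline=(-2.0, 0.0), smooth_sd_bins=1.0):
    """Z-scored port-entry rate per bin, aligned to cue onset (t = 0 s).

    entry_times: cue-aligned port-entry times (s), pooled over trials.
    All default values are hypothetical, not those of the manuscript.
    """
    n_bins = int(round((window[1] - window[0]) / bin_s))
    edges = window[0] + bin_s * np.arange(n_bins + 1)
    centers = edges[:-1] + bin_s / 2

    counts, _ = np.histogram(entry_times, bins=edges)
    rate = counts / n_trials  # mean number of entries per trial in each bin

    if smooth_sd_bins > 0:
        # Gaussian kernel truncated at 3 SD and normalised to sum to 1.
        half = int(np.ceil(3 * smooth_sd_bins))
        x = np.arange(-half, half + 1)
        kernel = np.exp(-0.5 * (x / smooth_sd_bins) ** 2)
        kernel /= kernel.sum()
        rate = np.convolve(rate, kernel, mode="same")

    # Baseline statistics are taken from the (smoothed) pre-cue bins.
    in_base = (centers >= baseline[0]) & (centers < baseline[1])
    mu, sd = rate[in_base].mean(), rate[in_base].std()
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return centers, (rate - mu) / sd
```

      Reporting the equivalents of `bin_s`, `smooth_sd_bins`, and `baseline` in the methods would let readers judge whether a peak at 2 s reflects behavior or the smoothing kernel.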

    1. eLife Assessment

      In this study, the authors investigated how inference about the current task context, by weighting evidence based on surprise and uncertainty in the environment, is encoded in the cortex. Using MEG imaging and an impressive amount of analytic work based on normative decision modeling, they provided solid evidence for the involvement of the visual and parietal cortex. These results are a valuable complement to and extension of a previous study using fMRI measurements, by identifying the candidate regions that are of importance for the inference process, not just for encoding the end product.

    2. Reviewer #1 (Public review):

      This paper presents another excellent, sophisticated analysis from this group of brain-wide neural activity correlated with the tracking of belief about the generative state of a stochastic visual environment under volatile conditions. Whereas previous work focussed on the normative belief-updating dynamics mainly in brain areas related to motor planning, under conditions where the environmental state translates directly to a correct action, here, they abstract the belief-updating DV from a specific action by instead associating the environmental state to a stimulus-response mapping rule, to be used in a simple perceptual decision coming up after the environmental state cues. A decoding analysis shows that a remarkably large portion of the brain has activity correlated with the normatively evolving belief about environmental state and the evidence samples feeding into that belief. What the authors were trying to achieve, however, seems far more general than the above, namely, to study "the algorithmic and neural basis of higher-order internal decisions about behavioural context, formed under multiple sources of uncertainty", and I think that the loose implication of such grand notions (such phrasing brings to mind someone's choice to believe in God, to regulate their behaviour depending on whether they are on a rugby pitch or at church, etc, not how grating orientations link to left/right hand movements) muddies the value of the study. The authors thus may have overestimated the generality of the findings. I hope my impressions are a useful guide to focus the interpretations more.

      Strengths:

      One of the main strengths of the study is that it is a technical tour de force. As reflected in an unusually extensive methods section, the authors put an extraordinary amount of work into rigorous data collection and analysis, and all of it is described in excellent detail. The study also builds in a very valuable way on previous landmark studies on tracking of volatile environmental state linked to correct actions using MEG (Murphy et al 2021) and tracking of volatile stimulus-response mappings using fMRI (van den Brink et al 2023). Here, the environmental state is not directly linked to actions during the cues informing about the state, but instead linked to a stimulus-response mapping rule.

      Weaknesses:

      It is surprising, given this main innovation of abstracting the decision about visual position-distribution from particular actions, that the authors do not engage with the literature using EEG and fMRI to study such 'abstract,' 'motor-independent' or 'domain-general' (synonymous terms) decisions. The discussion, for example, mentions the curious lack of involvement of the frontal cortex, and the possibility of intermingled opposites being represented there; motor-independent EEG decision signals have been characterised by regressing against the absolute value of the differential belief-updating process for this very reason (e.g., see Pares-Pujolras et al 2025). Single-unit studies like Bennur & Gold (2011) have also found activity related to a decision about environmental state (non-volatile motion) even when that state does not yet translate directly to an action, and, like the current study, is instead specified in a later frame of the trial.

      Another weakness, as mentioned above, is that of overgeneralisation. It is not clear how "higher-order, internal decisions" are generally defined, and terms more concretely grounded in the paradigm at hand (as in van den Brink et al (2023)), e.g., 'tracking of environmental state dictating a sensory-motor mapping rule,' would seem more useful. Since this task tracks a belief about a sensory feature and how it maps to motor actions, it may not be as surprising a revelation that a range of sensorimotor areas correlate with it, as compared to more general, truly internal decisions about behavioural context involving no sensory input (e.g., deciding one has become hungry). Similarly, the authors paint the belief-tracking process of Murphy et al (2021) as "lower-order" and the current one as higher-order, but both cases are the same in that a hidden binary generative state needs to be inferred on a continual basis from a series of discrete spatial positions presented visually. The only difference is that in the current case, the belief about the current binary state is not transformed directly into an immediate action choice but rather utilised to map a follow-up stimulus to its appropriate action. These decisions then happen one after the other in sequence, with a contingency, but I'm not sure this constitutes a 'high-level' and 'low-level' in the way implied by the authors.

      The paper left me confused on the question of what these widespread decoding effects reflect - whether all areas directly compute and represent the normative DV in concert, or whether at least some areas reflect other processes that may correlate with the DV. Although the discussion mentions things like feedback modulation in V1, which seems to allow for the possibility that it is not directly involved in DV computation, the phrasing used ('encoding' and 'representation' and never 'secondary modulation') from Abstract to Results tends to imply direct involvement.

      Related to this, it seems that the extensive model comparison was done for behaviour, but not for the activation in each area, which may have suggested some dissociations in role - for example, for areas that showed decoding of the evidence (LLR), at least some of them may more closely correspond to the related lower-level quantity of simply spatial position itself, or the higher-level quantity of the transformed belief update (the change in prior from before to after the current cue). There is a map of areas that correlate with the difference of new vs old prior (if I understand correctly - Figure 4D), but not of areas for which activity conforms better to this belief update than to the objective LLR or location. Aside from such model-defined quantities, a critical factor is spatial attention. The authors highlight that the correlated activation of visual regions may reflect feedback modulations akin to attention in nature, but it might actually reflect attention itself, since it is plausible that subjects would pay more attention to the upper field when it is more likely that the centre of the generative distribution is up there (i.e., belief leans upwards). It seems the data could provide insight into this: If the visual cortical effects reflect a spatial attention modulation towards the likely generative source (upper/lower), then the relationship with prior, coded so that upper and lower have opposite sign, should flip in ventral versus dorsal visual cortex. Figure 4A seems like it could be positioned to answer this, but I can't fully interpret it because the prior coding is not explicit in the methods - the relevant section (lines 989-1001) refers back to the normative model description (without pointing to specific equations), which does not say what states S1 and S2 mean (upper and lower? Correct and incorrect? The former is needed to test for this spatial-specificity expected of attention). 
      Even if there are reasons not to perform extra analyses related to the above, the impressions could guide edits to clarify what the data can and cannot say about what these DV-decoding effects reflect. Finally, it could be acknowledged that because the environmental state (upper or lower field generative source) is directly linked to stimulus-response mapping, even decoding effects that are not spatially-specific could equally reflect a representation of either one of these.

      The motivation for the decoding analysis running up to the response is not clear - what are the hypotheses here? Is the idea that if these areas truly represented the belief about the currently active context, then they should continue to do so during the response and beyond, since the next trial will begin in the same context as the previous ended? Or is this section tackling a different question? Is it that there is a potential confound in finding the significant decoding during the cue tokens, because it could be driven by the visual responses to the different spatial positions, and there are no such visual responses later at the response?

    3. Reviewer #2 (Public review):

      Summary:

      Calder-Travis et al. investigate how people form decisions about abstract rules in environments that may change over time. They show that individuals adaptively accumulate information, adjusting how much weight they give new evidence depending on how surprising or uncertain the environment is. Using whole-brain recordings (MEG), they further report that signals reflecting beliefs about the current rule are broadly distributed, particularly in visual and parietal regions. They further argue that these belief-related signals cannot be reduced to representations of momentary sensory evidence alone.

      Overall, the behavioral results convincingly demonstrate adaptive evidence accumulation consistent with the normative model. The neural data provide solid evidence for temporally structured belief-related signals that are broadly distributed across cortical regions. However, the evidence for sustained belief maintenance "across" cues and for full dissociation from gaze-related influences in visual cortex is less definitive. These issues temper, but do not undermine, the central conclusions.

      Strengths:

      A major strength of the study is the integration of normative modeling with temporally resolved neural data. The authors exploit the fine temporal scale of the recordings to examine belief updating across distinct task epochs, and they show that neural signals evolve in a manner consistent with the normative model that best captures behavior. This alignment between behavioral modeling and neural dynamics is carefully executed and conceptually coherent.

      Another strength is the authors' cautious interpretation of their findings. They explicitly acknowledge limitations in distinguishing between direct representation of a latent variable and neural modulation driven by that variable. This restraint strengthens the credibility of the conclusions and avoids overstatement.

      Weaknesses:

      (1) Evidence for sustained belief representation across cues

      Behaviorally, the data clearly demonstrate accumulation across sequential cues. However, the neural analyses primarily focus on responses around individual samples (from pre-cue to late post-cue windows). While these analyses demonstrate belief updating following each sample, they do not fully establish whether belief representations are maintained continuously across cues.

      Specifically, it remains unclear whether the neural representation of the prior belief is sustained from the late post-cue period of cue t-1 into the pre-cue period of cue t. Without explicit evidence of such continuity, it is difficult to conclude that the neural signals reflect a maintained belief state rather than repeated sample-locked updating processes. This distinction is important for interpreting the neural mechanism of accumulation.

      (2) Interpretation of belief signals in the visual cortex

      The claim that belief-related signals in the visual cortex cannot be explained by gaze position requires stronger support. The distribution of gaze positions across contexts appears largely non-overlapping, raising the possibility that context-related gaze biases could contribute to the observed neural effects.

      In particular, the "gaze-inconsistent" analysis based on a median split may not fully dissociate belief from gaze if the absolute gaze positions remain systematically different between contexts. As currently presented, the evidence does not fully rule out the possibility that gaze-related modulation contributes to the belief-related signal in visual areas. This affects the strength of the interpretation regarding abstract belief representation in early sensory cortex.

      (3) Clarity and transparency of task and model description

      Several aspects of the task and modeling framework would benefit from clearer exposition. The description of the noise distribution in the context cue would be easier to interpret if the overlapping distributions were visualized explicitly, allowing readers to assess how much accumulation is required versus reliance on strong individual cues. Similarly, the main text would benefit from a clearer explanation of how change point probability and uncertainty are computed (not just in Methods), as these quantities are central to the analyses and interpretation.

      In addition, temporal epochs (e.g., pre-cue, early post-cue, late post-cue) are not clearly defined with specific time ranges in the main text, making it difficult to compare across figures.

      (4) Interpretation of neural dynamics

      Several neural findings are intriguing but underinterpreted. For example, the absence of clear sensory evidence representation in early post-cue epochs in any region (Figure 4B) is surprising and not discussed. The relative stability of belief-related signals in visual cortex compared to parietal regions (Figure 4E) is also unexpected and warrants interpretation. Additionally, the temporal dynamics of the change point probability and uncertainty representations appear to differ from each other, but this pattern is not described in detail.

      Clarifying these points would strengthen the interpretability of the results and help readers understand the mechanistic implications.

    4. Reviewer #3 (Public review):

      Summary

      In this study, the authors investigated how inference about the current task context is encoded in the cortex, using MEG measurements. Using the same behavioral task that was initially developed for an fMRI study to identify the loci of task context representation, the current results complement and extend the previous study by identifying the candidate regions that are important for the inference process, not just for encoding the end product. They reported widespread modulation of cortical activity by uncertainty in evidence and volatility of task context changes. In comparison, modulation correlated with the decision variable underlying the task context inference process was more restricted to the parietal and visual cortices, particularly in alpha-band activity.

      Strengths:

      (1) The normative model provides a solid computational foundation for disambiguating quantities related to decision variables from those related to task factors (e.g., uncertainty and volatility).

      (2) The MEG technique allows examination of cortical activity that is modulated by the temporally evolving decision variable.

      (3) Rigorous modeling efforts, including comparisons of well-reasoned alternative/reduced models and examinations of diagnostic features using participant-matched simulations.

      Weaknesses:

      (1) There are two major surprises in the results that raise concerns about how to interpret these data. The first is the absence of modulation of prefrontal cortical activity by prior or posterior. As the authors acknowledged, there are extensive single-neuron recording data (e.g., from the Miller group) demonstrating the presence of task rule modulation in the monkey PFC and prior representation in the PFC in the mouse study that they cited. The second surprise is that the strongest modulation of prior/posterior/evidence was almost always observed in the visual cortex, in contrast to the common embodied cognition assumption. A more elaborate discussion of these discrepancies would help contextualize the current results.

      (2) It is not clear why the effects in Figures 2D and E dipped before responses, which is not expected from any of the models. This could potentially affect the interpretation of the MEG signals in late-post-cue or pre-response periods.

      (3) The definitions of the different periods (e.g., early/late post-cue) are vague, making it hard to assess the functional relevance of the signals. For example, is the difference between the early pre-response map in Figure 5B and the late evidence map in Figure 4B due to completely non-overlapping time periods? A diagram of the timing definitions for different task periods would be helpful.

      (4) Perhaps related to #2, it is puzzling that evidence encoding is absent in the visual cortex during the early post-cue period.

      (5) The presentation and discussion of results related to correlated variability assume that the readers have already read their previous paper. A little more elaboration of the significance of this measurement would be helpful.

    1. eLife Assessment

      This important study links blood-derived dietary content to sustained increases in sleep in the mosquito Aedes aegypti. Using multiple independent approaches, the authors provide convincing evidence for blood-induced changes in sleep. These findings have broad implications for understanding how specialized diets regulate sleep across species and for mosquito vector biology.

    2. Reviewer #1 (Public review):

      Summary:

      The presented investigation aims to expand the definition of sleep and its relationship with the blood meal and/or circadian clock in the mosquito, Aedes aegypti. The authors made thorough use of the established sleep analytical paradigm and three behaviour toolkits: LAM10, EthoVision, and DART. They also investigated the potential underlying molecular mechanism by using dsRNA injection (LkR) and a KO mosquito (Cyc-/-).

      Strengths:

      The authors presented a very solid dataset showing posture changes and an increase in the arousal threshold of the mosquito after 10 minutes of immobility. This is a major clarification and extension of our understanding of insect sleep beyond Drosophila. The inclusion of analytical parameters such as bout length, waking activity and pDoze/Wake provides a critical reminder for other investigators of the steps needed for defining sleep in a new species. The investigation, with its technical span in behaviour assays, therefore establishes a good standard for mosquito sleep analysis, of the same quality seen in the landmark studies (Shaw et al 2000 and Hendricks et al 2000) for Drosophila sleep. The pioneering data showing a clear effect of blood meal and LkR reduction on locomotion and sleep provide an entry point for further investigations.

      Weaknesses:

      Despite the versatility of the behaviour and transgenic methods in this manuscript, there are two logical gaps in the conclusion, which are related to the effect of blood meal/BSA/LkR KD on A. aegypti sleep:

      (1) Conventionally, a coincidence of sleep increase and locomotion reduction would weaken the certainty of a sleep increase assessment. The authors imply that the concurrence observed after a blood meal derives from an internal "drowsy" neural state rather than from the animal being physically "crippled", but they did not use the velocity data from their two high-resolution video-tracking methods or pDoze/Wake to clarify this.

      (2) The major molecular component underlying the blood meal effect on sleep/locomotion is less certain, because the BSA solution used for feeding contains ATP, which is itself able to enter the haemolymph and could potentially affect sleep/locomotion. Additionally, the basal or control sleep recording is done after sucrose feeding. It is, however, unclear from the methods whether this is also 10% sucrose, and whether the observed increase in sleep after a blood meal results from the lower sugar level in blood (~0.1%).

    3. Reviewer #2 (Public review):

      Zhang et al. investigate how blood feeding and dietary protein influence sleep in the mosquito Aedes aegypti. The authors first establish a behavioural definition of sleep using postural analysis and arousal threshold measurements, then demonstrate that both blood meals and a bovine serum albumin (BSA)-based protein diet increase sleep for several days. They further show that RNAi-mediated knockdown of the leucokinin receptor (Lkr) enhances sleep, implicating neuropeptide signalling in the regulation of postprandial sleep. The authors propose that elevated sleep persists well beyond the restoration of host-seeking behaviour, suggesting the existence of distinct "opportunistic" versus "determined" host-seeking phases.

      Strengths

      The central question is well-motivated, and the experimental approach is systematic. The use of multiple independent methods to characterise sleep - postural analysis, infrared activity monitoring, videography, and arousal threshold - provides converging evidence. The BSA feeding experiment is a particularly effective demonstration that dietary protein, rather than other blood components, is the key regulator of the sleep increase. The conservation of leucokinin signalling in sleep regulation between Drosophila and Ae. aegypti is a noteworthy finding that adds comparative depth.

      Weaknesses

      (1) Sleep definition.

      The authors settle on a 10-minute immobility threshold, but their own data do not convincingly support this choice. The arousal threshold data (Figure 1G) show no significant difference between the 1-5 min and 6-10 min bins (P=0.246), with significance emerging only at the 11-15 min bin. The postural analysis likewise indicates that sleep-associated postures appear at ~20 min during the day and ~11 min at night. A 15-minute threshold would be better supported by the data as presented. The previous literature used 120 minutes for this species (Ajayi et al. 2022), making this a dramatic shift.

      (2) Confound of reproduction and sleep.

      The primary experimental paradigm measures sleep beginning at Day 4 post-blood feeding, immediately after oviposition. Animals have undergone gut distension, vitellogenesis, and oviposition, and what is being measured as "sleep" could reflect post-reproductive quiescence or recovery rather than diet-induced sleep per se. The BSA experiment partially addresses this, but since BSA also triggers vitellogenesis and egg production (as the authors note), the confound persists.

      (3) Opportunistic vs. determined host-seeking hypothesis.

      This framework is presented as a key conceptual contribution, but the paper contains no data on host-seeking behaviour. The authors infer two phases from the temporal mismatch between a 72-hour host-seeking suppression window (from prior studies) and elevated sleep through Day 5 (~120 hours). While this is an interesting hypothesis, it requires actual measurement of host-seeking alongside sleep to be substantiated, or at least the caveats need to be discussed more explicitly.

      (4) Statistical approach.

      The methods describe "one-way ANOVA, followed by Mann-Whitney tests with Welch's correction," which is an internally inconsistent combination: Mann-Whitney is non-parametric and does not use Welch's correction (which applies to t-tests). Throughout the figures, F-statistics (parametric) are reported alongside what appear to be non-parametric tests. The statistical framework needs to be clarified and made consistent. Exact sample sizes per group should also be stated explicitly in the methods for all experiments.
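
      The inconsistency raised in point (4) can be illustrated with a short sketch that keeps the parametric and non-parametric routes separate. The data are simulated, the group names and sample sizes are hypothetical, and SciPy is used only as an example library; this is not the authors' analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated daily sleep (minutes) for three hypothetical diet groups.
sucrose = rng.normal(600, 60, 24)
blood = rng.normal(700, 90, 24)
bsa = rng.normal(690, 80, 24)

# Parametric route: one-way ANOVA (reports an F statistic), followed by
# Welch's t-test, i.e. a t-test that does not assume equal variances.
f_stat, p_anova = stats.f_oneway(sucrose, blood, bsa)
t_stat, p_welch = stats.ttest_ind(sucrose, blood, equal_var=False)

# Non-parametric route: Kruskal-Wallis (reports an H statistic), followed by
# a Mann-Whitney U test. Welch's correction does not apply to a rank test.
h_stat, p_kw = stats.kruskal(sucrose, blood, bsa)
u_stat, p_mwu = stats.mannwhitneyu(sucrose, blood, alternative="two-sided")

print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.3g}; Welch: t={t_stat:.2f}, p={p_welch:.3g}")
print(f"Kruskal-Wallis: H={h_stat:.2f}, p={p_kw:.3g}; Mann-Whitney: U={u_stat:.0f}, p={p_mwu:.3g}")
```

      Whichever route is chosen, the omnibus statistic and the pairwise tests should come from the same family, and pairwise P-values would additionally need a multiple-comparison correction when more than one pair is tested.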

    1. eLife Assessment

      This manuscript reports a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete, and there remains some lack of clarity on the methodology and interpretation. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

    2. Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments from both the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over/under representation of context information.

      My general assessment of the work is unchanged, and I still have some questions requesting methodological clarification.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g. learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be heavily improved. Judgment of generality and plausibility of the results is severely hampered but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is impossible to judge whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work on the field.

      Comments:

      The authors have made strong efforts to improve their description of the methods; however, it is still very hard to understand. As a result of some of their clarifications, new issues appeared that I was not able to extract from the previous version.

      (1) In particular, I had problems figuring out how the individual dynamical systems are interrelated (sequences, attractor, action, learning). As I understand it now (and I still might be wrong), there is a single discrete-time dynamics in which, at each time step, one action takes place and the attractor and sequence dynamics are each moved one step forward. Also, synaptic updates happen in every one of those time steps. The authors may verify or correct my interpretation and further improve their description in the manuscript. It is also confusing that time in the figure panels is given in units of trials, where each trial may consist of multiple (and possibly different numbers of) time steps. Are the thin horizontal red and blue lines time steps?

      (2) As a consequence of my new understanding of the model dynamics, I have come to doubt the interpretation of the attractor network as context encoding. Since the X population mainly serves to disambiguate sequence continuation, right before the action has to be taken (active for only two time steps in Figure 1C?), they could also be considered to encode task space (El-Gaby et al. 2024; doi: 10.1038/s41586-024-08145-x).

      (3) Also technically, I wonder why the authors introduce the criterion of 50(!) time steps to allow the attractor to converge, if the state of the attractor network is only relevant in one time step to choose the appropriate continuation of the sequence of actions. Are the attractor dynamics important at all? What would happen if just the input and output weights to the X population are kept and the recurrent weights are set to 0?

      (4) Figure 3E: For how many time steps are the H cells active (red bars)? Figure 4J: What are the units of the time axis?

    3. Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Comments on revisions:

      The authors have adequately addressed my concerns. Most importantly, the details of the implementation of the different components of the model are much more clearly described.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

      Thank you very much for your comments. We are very encouraged by your positive feedback. We have revised our manuscript to clarify our model, strengthen its biological justification, and make it more accessible to a broader audience.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting modules: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions for expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.

      We appreciate the reviewer's careful reading of the manuscript and recognition of the novelty and significance of our work.
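
      For readers unfamiliar with the rule named in the summary, a generic Rescorla-Wagner update can be sketched as follows. This is the textbook form of the rule, not the authors' implementation; the function name, the learning rate of 0.5, and the reward sequence are illustrative assumptions.

```python
def rescorla_wagner_update(value, reward, learning_rate=0.5):
    """Return the updated reward prediction after observing one outcome.

    Textbook Rescorla-Wagner form; the learning rate is a hypothetical value.
    """
    prediction_error = reward - value  # reward prediction error (RPE)
    return value + learning_rate * prediction_error


# Repeatedly rewarded outcome: the prediction moves toward the reward.
value = 0.0
history = []
for reward in [1.0, 1.0, 1.0]:
    value = rescorla_wagner_update(value, reward)
    history.append(value)
```

      With these assumed settings, each update halves the remaining gap between the prediction and the delivered reward.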

      Strengths:

      The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.

      Weaknesses:

      The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.

      We thank the reviewer for the insightful comments.

      To better characterize our model, we added formal descriptions of each task setting and explicitly specified the sources of uncertainty. We revised the schematic figures in Figure 1 to more clearly illustrate our model. An important revision is that we now distinguish between stimulus prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping. SPE-driven remapping is triggered by mismatches between actual sensory stimuli and those predicted from past history and serves to update the current contextual state or to create a new one. In contrast, RPE-facilitated remapping is more likely to occur when executing an action planning sequence associated with recent negative reward prediction errors, possibly due to environmental changes, and promotes exploration of alternative planning sequences.

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X to either transition to a different contextual state or form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator indicates that the action plan, i.e. the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E).”

      In addition, we added Figure 2C-E to clarify the neural representations of external stimuli and contextual states in the X module, as well as the neural representations within the H module. We also clarified the purpose of each model component and discussed plausible biological implementations to justify our modeling choices. Furthermore, we added a schematic illustration of our results related to psychiatric disorders in Figure 5B and revised the corresponding section of the manuscript to explicitly frame these results as a computational hypothesis. We also expanded the discussion to relate our findings to existing computational psychiatry models (see point-by-point responses below).

      We believe that these revisions have improved the clarity of our model and broadened its accessibility to a wider audience.

      Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.

      We appreciate the reviewer's careful reading of the manuscript and recognition of the novelty and significance of our work.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.

      We appreciate the reviewer’s valuable feedback. In the revised manuscript, we have improved the presentation of the methodological aspects by providing a more intuitive and general explanation of the model framework and training procedure. We also rewrote the section on psychiatric implications to more clearly explain how dysfunction in contextual inference occurs in our model. These revisions enhance both the clarity and plausibility of our conclusions.

      More specifically:

      (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the posthoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.

      Thank you for raising the important point.

      To improve readability, we have updated Figure 1 to more clearly illustrate the main model structure and its adaptation to individual use cases. Additionally, we have moved the previous Figure 6 (now Figure S1) to an earlier point in the Results to facilitate understanding of the methodological flow. The Methods section has also been revised to explain the algorithmic structure indicated in Figure S1. These revisions make the methods more self-contained and easier to follow.

      In the revised manuscript, we have clarified that our model is qualitatively related to the Bayes-adaptive reinforcement learning framework (Guez et al., 2013) as follows.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or due to abrupt environmental changes. Once a context selector X infers the hidden state, the sequence composer H generates episodic sequences that correspond to trajectories in a search tree, each branch representing possible action–outcome sequences. Just as Monte Carlo tree search explores potential future paths to evaluate expected rewards, H produces hippocampal sequences that simulate future states and rewards based on its learned connectivity. In this way, X defines the context that anchors the root of the tree, while H expands the tree through replay or planning; thereby our model provides a simplified algorithmic implementation of model-based reinforcement learning via tree search planning.”

      (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.

      Thank you for pointing this out.

      In the revised manuscript, we have added explicit examples of simulated neural activity. Specifically, we added new panels in Figure 2C–E and showed representative activity patterns from both Context selector (X) and Sequence composer (H). We also clarified the distinction between activity in the stimulus domain (externally driven) and the context domain (internally inferred states).

      “Figure 2C illustrates an example of both the environmental state transition and the corresponding contextual state transition of an agent. The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states … are represented in the stimulus domain and the contextual states … are represented in the context domain. … In the example transition shown in Figure 2C, the agent selected an environmental state transition from S2 to S4 in the 2nd, 5th, and 8th trials, which corresponds to a contextual state transition from X2β to X4β in the X module. However, because this transition was not rewarded, no synaptic potentiation occurred among hippocampal neurons. Subsequently, in the 11th trial, the agent attempted an environmental state transition from S2 to S5, corresponding to the transition from X2β to X5β in the contextual states.

      The agent received a reward at S5, and the corresponding hippocampal sequence was strengthened, enabling the agent to acquire the alternation task in the following trials (Figure 2E).”

      (see point-by-point responses below).

      We also added a detailed explanation of our results in Figure 4 as follows.

      “We consider a simplified environment of a probabilistic cueing paradigm (Ekman et al., 2022). In this study, two auditory contextual cues probabilistically predicted distinct visual motion sequences, and fMRI decoding was used to examine the frequency of hippocampal replay. We simplified this task as shown in Figure 4A. ”

      “... This result replicates Ekman et al. (2022), who showed that the probability of the contextual cues is reflected in the statistically significant differences in hippocampal replay probability in humans (Figure 4F).”

      “F, Our model behavior is similar to the human fMRI result of the cue-probability-dependent hippocampal replay (Ekman et al., 2022). Paired sample t-test. **P<0.01.”

      We believe that these revisions make the model description and simulation results more concrete and easier to interpret.

      (3) The literature review can be improved (laid out in the specific recommendations).

      Thank you for pointing this out. We revised the literature review to the best of our ability.

      (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.

      Thank you for your suggestion.

      In the revised manuscript, we added a new paragraph in the Discussion explicitly addressing how results from mice and humans can be integrated.

      “Our model is a functionally modular account of the cortical regions and hippocampus, enabling it to capture experimental findings across species. While hippocampal activity in rodents has been extensively characterized in terms of spatial coding, human hippocampal representations are more often non-spatial and episodic-like (Bellmund et al., 2018; Eichenbaum, 2017). For episodic memory to support flexible behavior, it would be beneficial to retrieve each episode in a context-dependent manner. The episodic contents may vary across species and individuals, yet the fundamental computations—estimating the current context from external stimuli and their history, and flexibly updating this estimate via prediction errors—are likely conserved. Holding context information until the contextual prediction error is detected is analogous to the belief state in model-based reinforcement learning, which is known to improve performance under partially observable conditions (POMDPs) (Kaelbling et al., 1998). Our model provides a simple algorithmic implementation of this principle.”

      (5) As a minor point, the hippocampus is pretty much treated as a premotor network. Also, a Discussion paragraph would be helpful.

      Thank you for pointing this out.

      We define action as a transition from one environmental state to another, and transition-coding hippocampal neurons are used for action-planning. Because our model does not incorporate errors in transitions (actions), the generated hippocampal sequences are perfectly correlated with the executed transitions (actions). However, we acknowledge that computations in the brain are more complex, with contributions from other regions such as the premotor network and the basal ganglia. To clarify this, we added formal representations of state transitions (action) in each task and the following sentences to the manuscript.

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons (Materials and Methods). Note that in the real brain, not only hippocampus but also the premotor cortex and the basal ganglia contribute to action planning and execution (Hikosaka et al., 2002). Here, however, we focus on how simplified planning sequences are learned and composed in a context-dependent manner.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state without errors in action.”

      Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      We thank the reviewer for this summary of our model.

      We would like to clarify that the hippocampal Sequence composer (H) is a recurrent network that iteratively composes the next state and the associated sensory stimuli in the sequence based on the current contextual state.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Weaknesses:

      The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.

      We thank the reviewer for suggesting an important direction for future work. The goal of this research is to develop a minimal, functionally modular neural circuit model that provides general insights into how context-dependent behavior can be realized across species, including humans. To simplify our model, we only considered discrete-time environmental states, where the exact length of the time step depends on each environment. Extending the model to a more biologically plausible, continuous-time framework is a promising direction for future work, such as using continuous-time modern Hopfield networks and synfire chains. We modified the Discussion section to clearly point out this direction.

      “... the resolution at which our model should distinguish different contextual states, including the stimulus resolution and time resolution, is hand-tuned in this work. While we used an abstract, grid-like state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, … In realistic, continuously changing environments, such resolutions should be adjusted autonomously. Introducing continuous and hierarchical representations with multiple levels of spatial and temporal resolution would facilitate such adjustments, potentially through mechanisms such as modern Hopfield networks (Krotov and Hopfield, 2020) or synfire-chain–based hippocampal sequence generation (Abeles, 1982; Diesmann et al., 1999; Shimizu and Toyoizumi, 2025; Toyoizumi, 2012), but this is beyond the focus of the current study”

      Also, we would like to emphasize that our model is not treated as a black box. To make the model easier to understand, we have substantially revised Figures 1 and 2 to include additional details illustrating the neural activity and the internal computational mechanisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments and suggestions for improvement:

      (1) Formal link to model based RL is unclear: a core feature of inference is the role of uncertainty in modulating computation and corresponding circuit dynamics, in particular defining expected and unexpected degree of errors; as far as I understand the degree of tolerable errors within a context is defined by the size of the basin of attraction of the context module (which is dependent on number of items and the structure of correlations across patterns) and in no obvious way affected by sensory uncertainty (unless the inputs from H serve that purpose in a more indirect way). Similarly, most experiments are deemed to have deterministic (unambiguous) maps between sensory inputs and world state (although how the agent's state relates to environmental state is more complex and not completely clear based on the existing text).

      Thank you for raising this important point. Our model bears conceptual similarities to model-based RL frameworks, for example, the optimal-inference formulation that underlies Monte Carlo Tree Search (Guez et al., 2013), as we now clarify in the revised manuscript. These similarities, however, are qualitative rather than quantitative. In particular, the error thresholds that separate expected from unexpected outcomes are manually specified in our model, but their exact values do not appreciably influence the simulation results.

      Concretely, the heuristic threshold for SPE-driven remapping (𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub>) is set to 5 bits, allowing for small miss-convergence during recall in the Amari–Hopfield model. For RPE-facilitated remapping, the threshold is set to 𝜃<sub>𝑁𝐺</sub> = 0.7, making the agent sufficiently sensitive to abrupt environmental changes and enabling it to explore some candidate contexts after RPE-facilitated remapping. This simple thresholding scheme is adequate for our largely deterministic simulation setting, where contextual switches are rare and occur abruptly in an otherwise stable and unambiguous environment.

      Importantly, our goal in this work was not to achieve Bayesian optimality. Mice and likely humans in certain settings often deviate from optimal inference. Instead, we focus on the qualitative remapping-related processes that support goal-directed planning following epistemic errors. We have clarified this scope in the revised manuscript.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or due to abrupt environmental changes. … However, these conceptual similarities are qualitative rather than quantitative. The goal of this work is not to achieve Bayesian optimality, but rather to show qualitative remapping-related processes that support goal-directed planning following epistemic errors.”

      “Note that we set the remapping threshold 𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub> = 5 bits to allow for small miss-convergence during recall in the Amari–Hopfield model.”

      “Note that we set 𝜃<sub>𝑁𝐺</sub> as 0.7 to make the agents sufficiently sensitive to abrupt environmental changes and enable exploring some candidate contexts after RPE-facilitated remapping.”
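
      The SPE-driven threshold test described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the mismatch is counted as the number of differing bits (Hamming distance) between the binary pattern predicted by H and the pattern driven by sensory input, and the function names and example patterns are hypothetical. Only the value of 5 bits and the "exceeds a threshold" criterion come from the response.

```python
THETA_REMAP = 5  # bits; threshold value quoted in the response


def hamming_distance(predicted, observed):
    """Count the positions at which two equal-length binary patterns differ."""
    if len(predicted) != len(observed):
        raise ValueError("patterns must have the same length")
    return sum(p != o for p, o in zip(predicted, observed))


def spe_triggers_remapping(predicted, observed, theta=THETA_REMAP):
    """Return True when the mismatch exceeds the threshold (assumed strict)."""
    return hamming_distance(predicted, observed) > theta


predicted = [1] * 20
small_mismatch = [0] * 3 + [1] * 17  # 3 differing bits: tolerated
large_mismatch = [0] * 6 + [1] * 14  # 6 differing bits: remapping
```

      Under these assumptions, a few misconverged bits during recall leave the contextual state unchanged, whereas a larger mismatch is treated as a prediction error.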

      (2) Improvement: start describing each task specification in explicit model-based RL terms, then explain how the environmental specification translates into agent operations. Be explicit about what about the process is inferential, in particular, sources of uncertainty.

      Thank you for this important suggestion. Following your recommendation, we revised the manuscript to describe each task explicitly in model-based RL terms. For each task, we now identify the relevant sources of uncertainty, which arise either from imperfections in the agent’s internal model of the environment or from occasional abrupt switches in task rules. We also explain how the agent infers the hidden state from experience to construct an appropriate context representation, enabling the model to perform the task successfully.

      (3) A lot of seemingly arbitrary model choices need additional computational and biological justification; the description of the process is fundamentally an algorithmic one, which includes a lot of if-then type of operations: the dynamics of different elements of the circuit switch between "initialization to landmark/other", "error detected/not", different forms of plasticity on/off etc and it is not discussed in any way how this kind of global coordination of different processes is supposed to be orchestrated biologically; e.g. as far as I understand the sequential structure in H activity is largely hardcoded rather than an emergent property of the learning+neural dynamics.

      Thank you for this important suggestion. We have made a concerted effort to clearly describe the biological context and the relevant literature motivating each of our algorithmic assumptions. Notably, as highlighted in Fig. 1F, we emphasize that the sequential structure in H activity emerges as a consequence of the agent’s exploration and learning. We also explain how the two remapping mechanisms concatenate sequence segments to support long-term planning and to predict both stimuli and rewards.

      About Fig. 1F

      “At the beginning of learning, hippocampal segments are not connected, and H yields only short sequences that generate immediate actions and short-term predictions. As learning continues, the three-factor Hebbian plasticity rule concatenates these segments, thereby creating longer sequences that reflect the task structure (Figure 1F).”
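
      The concatenation of segments described in this passage can be illustrated with a toy sketch. It is not the authors' model: the dictionaries standing in for synaptic weights, the state and transition labels, and the rule of following the strongest positive weight are all hypothetical simplifications.

```python
def strongest_target(weights):
    """Return the target with the largest positive weight, or None."""
    best, best_weight = None, 0.0
    for target, weight in weights.items():
        if weight > best_weight:
            best, best_weight = target, weight
    return best


def compose_sequence(start, within, between, max_segments=10):
    """Chain state -> transition -> next state until a link is missing.

    within:  state-coding -> transition-coding weights (within a segment)
    between: transition-coding -> state-coding weights (between segments)
    """
    sequence = [start]
    state = start
    for _ in range(max_segments):
        transition = strongest_target(within.get(state, {}))
        if transition is None:
            break
        sequence.append(transition)
        state = strongest_target(between.get(transition, {}))
        if state is None:
            break
        sequence.append(state)
    return sequence


within = {"A": {"A->B": 1.0}, "B": {"B->C": 1.0}}
# Early in learning: no between-segment weights, so only a short sequence.
early = compose_sequence("A", within, between={})
# After a between-segment weight has been potentiated, segments chain.
late = compose_sequence("A", within, between={"A->B": {"B": 1.0}})
```

      In this toy setting, the sequence lengthens only once a between-segment connection exists, which mirrors the qualitative point of the quoted passage.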

      About “initialization to landmark/other,”

      “While the history-based initialization was introduced to select contextual state based on the history input from H (episodic), the landmark-based initialization was introduced to terminate the episodes that would otherwise continue indefinitely. Biologically, the landmark-based initialization corresponds to the operation of anchoring a contextual state to salient environmental landmarks - such as an animal’s nest - that serve as clear reference points.”

      About “error detected/not,”

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)-driven remapping and reward prediction error (RPE)-facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X to either transition to a different contextual state or form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator indicates that the action plan, i.e. the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E). ”

      About “different forms of plasticity on/off”

      “We used different learning rules for the intra-hippocampal synaptic weights depending on within-episodic and between-episodic segments.”

      “Within-episodic connections, i.e., state-coding to transition-coding synapses, are constantly updated in a reward-independent manner … This modeling is inspired by behavioral time scale plasticity in the hippocampus (Bittner et al., 2017), in which synaptic potentiation occurs for events that are close in time regardless of reward, and such plasticity is believed to support the formation of place cells, etc.”

      “Between-episodic connections, i.e., transition-coding to state-coding synapses, are constantly updated in a reward-dependent manner … This is supported by the finding that dopaminergic neuromodulation gates LTP, enabling preferential consolidation of reward-associated experiences (Lisman and Grace, 2005; Takeuchi et al., 2016).”
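
      The contrast between the two rule types can be sketched generically. The exact update equations are elided in the quotations above and are not reproduced here; the two functions below are standard textbook forms (a two-factor Hebbian update and a reward-gated three-factor update), and the learning rate and variable names are hypothetical.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Reward-independent update: depends only on pre/post co-activity."""
    return weight + learning_rate * pre * post


def three_factor_update(weight, pre, post, reward_signal, learning_rate=0.1):
    """Reward-gated update: co-activity only changes the weight when a
    third factor (here a scalar reward signal) is non-zero."""
    return weight + learning_rate * reward_signal * pre * post


# Co-active neurons, no reward: only the reward-independent rule potentiates.
w_within = hebbian_update(0.0, pre=1, post=1)
w_between_unrewarded = three_factor_update(0.0, pre=1, post=1, reward_signal=0.0)
w_between_rewarded = three_factor_update(0.0, pre=1, post=1, reward_signal=1.0)
```

      The design point is only that the same pre/post coincidence has different consequences depending on whether the synapse class is gated by reward.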

      (4) Improvement: Justify individual design choices by biology whenever possible; in the absence of such justification, provide at least a computational rationale for each such model choice. Additional justification for the neural substrate of different prediction errors.

      Thank you for pointing this out. Following the advice, we have added the computational objectives behind each algorithmic component in addition to the biological motivations described above. In particular, we have completely updated Fig. 1 to help readers better understand the key remapping mechanisms in our algorithm: SPE-driven and RPE-facilitated remapping.

      About the Amari-Hopfield model

      “We employ the Amari–Hopfield model because it allows multiple contexts to be stably maintained and selected in response to stimuli and can be trained via Hebbian plasticity. We assume that similar computations are carried out in prefrontal and entorhinal cortical circuits in the brain.” “As one possible biological implementation, we consider Context selection in X as the brain-wide evoked potential during which bottom-up information may be integrated with top-down signals to select the current context (Mohanty et al., 2025). In this case, it takes several hundred milliseconds for the contextual states in X to settle (Massimini et al., 2005).”
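
      The storage-and-selection property described in this passage can be illustrated with a minimal sketch of a Hopfield-type network. This is not the manuscript's implementation: it assumes bipolar (+1/-1) patterns, a plain Hebbian outer-product rule, and synchronous updates, and the function names `train_hopfield` and `recall` are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian outer-product storage: W[i][j] = (1/N) * sum over patterns of x_i * x_j.

    Assumes every pattern is a list of +1/-1 values of equal length N.
    Self-connections (the diagonal) are left at zero.
    """
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, cue, max_steps=20):
    """Iterate synchronous sign updates until the state stops changing."""
    state = list(cue)
    for _ in range(max_steps):
        new_state = []
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            new_state.append(1 if field >= 0 else -1)
        if new_state == state:
            break  # reached a fixed point (a stored attractor, ideally)
        state = new_state
    return state
```

      With two orthogonal stored patterns, a cue that differs from one of them in a single element settles back onto that pattern, which is the sense in which a noisy input "selects" a stored context. Synchronous updating is a simplification chosen for brevity.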

      About the default matrix

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      About state-coding neurons and transition-coding neurons

      “The state-coding neurons receive input from X and represent the current contextual state, while the transition-coding neurons send output to X and predict the next contextual state after an action ... One possible biological grounding for this functional separation is that the entorhinal cortex provides contextual inputs to CA3, and CA3 and CA1 generate predictions of the next state through their recurrent architecture (Chen et al., 2024).”

      About the no-good indicator

      “No-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This no-good indicator facilitates RPE-facilitated remapping (see RPE-facilitated remapping section) that leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”

      (5) In particular, the temporal scale at which processes unfold with reference to behavioral time scale actions is fundamentally unclear: what determines the time scale of a sequential element? What stitches them together? What is the temporal relationship between H and X operations? At what time scale do actions happen in terms of those operating scales? How does this align with what is known about hippocampal dynamics during behavior?

      (6) Improvement: make the time scales of different aspects of the process explicit in the text, potentially with additional graphic support.

      Thank you for the questions and suggestions. In this work, we model the agent’s behavior in an abstract grid-world environment with discrete time steps, as is common in classical RL. At each time step, the agent observes a sensory stimulus, makes a plan, and executes an action based on it. The action induces a state transition in the environment. Accordingly, the model includes a single fundamental timescale: the environmental (behavioral) time step.

      The modeled brain dynamics in both X and H are similarly locked to this environmental clock. As clarified in Fig. 1F, each sequence segment corresponds to one behavioral time step. These segments are then chunked based on reward events, enabling longer-horizon planning and prediction.

      The agent’s cognitive operations at each behavioral time step are summarized in Fig. S1. Briefly, the agent infers the contextual state X from the current stimulus and its stimulus history, generates a sequential action plan H with predictions using chunked sequence segments, and then follows the plan when it is sufficiently promising. In addition, when sensory or reward prediction errors occur, the agent reorganizes the synaptic-weight parameters of the context selector and sequence composer. Once the agent becomes familiar with the environment, H typically generates an extended action sequence along with predictions of future stimuli and the resulting reward. The agent then executes this sequential plan, bypassing step-by-step context estimation by X, until a prediction error triggers remapping.

      The revised manuscript includes the following additions.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli. The model operation relies on the environmental (behavioral) time step. At each time step, the agents perform contextual state estimation by Context selector and activate a corresponding hippocampal neuron. Then, this hippocampal neuron initiates sequential activity based on hippocampal synaptic connectivity. Each hippocampal sequence represents a planned course of action and is used to predict a series of external stimuli. … The hippocampal sequence from which actions are generated is updated upon a reward. After the action execution, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent. The algorithmic flow chart of our model is described in Figure S1.”

      (7) As far as I understand it, the existence of splitter cells is directly inherited from the task specification, and to some extent the same can be said about the lap cells; please explain what can be understood from the model simulations that goes beyond what was put into the inputs/reward function for each experiment. Emphasize numerical results that are counterintuitive or where additional predictions about the dynamics come directly from simulating the model but would have been less obvious beforehand.

      The existence of splitter cells in our model is not inherited from the task specification. Instead, it emerges directly from the hippocampal module retaining sensory history (namely, whether the agent approached from the left or right arm), independent of reward structure or other task details. When sensory history is removed from the sequence composer (and, consequently, from the context selector), splitter-cell representations disappear.

      To develop lap-cell representations, immediate sensory history alone is not sufficient. The sequence composer must chunk episodic segments based on rewards to support sufficiently long action plans (i.e., history dependence) that span the multiple laps required by the task. The planning horizon - the length of action sequences - typically increases as animals learn a task. This progressive development of hippocampal sequences and their dependence on reward yields experimentally testable predictions. Notably, as we clarified in Fig. S2, the required sensory history length must also be learned adaptively: if it is too short, the agent cannot solve the task, whereas if it is too long, learning becomes unnecessarily slow.

      In the revised manuscript, we explicitly described the emergent process of splitter cells and lap cells as follows.

      About splitter cells

      “A second contextual state at S2, X2β, was generated through SPE-driven remapping at the second visit of S2 (second trial) due to history mismatch… In our model, the transition-coding neurons exhibit right/left turn-specific firing at S2 after learning is complete (Figure 2E, I), replicating the emergence of splitter cells.”

      About lap cells

      “the task environment changes again and the agents are rewarded for two laps, …. Either the shortest transition, ..., or the one-lap transition, …, is no longer rewarded, which triggers another RPE-facilitated remapping and exploration. During exploration, a history mismatch occurs …, and the contextual states for the second lap … are generated. Finally, the rewarded transition of contextual states and corresponding sequence… is reinforced (Figure 3B).”

      “This task can also be solved by simply preparing temporal contexts with three steps of sensory history (n=3), which is the minimal number needed to solve this task (see Materials and Methods for Model-free learning). However, it takes much longer to find the correct transition for solving the 1-lap task than with our model because it involves an excessive number of states (Figure S2).”

      “As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent.”

      (8) The partitioning of H subpopulation into current input vs predictive subpopulations seems to fundamentally deviate from known CA1 properties like theta phase processing, where the same neurons encode information about recent past, present, and future at different moments in time within a theta cycle. The existence of such populations (especially since they come with distinct plasticity mechanisms and projection patterns) seems like a strong avenue for validating the model experimentally.

      (9) Improvement: biologically justify the two subpopulations, discuss neural signatures of this distinction that could be used to identify such neurons in experiments

      We thank the reviewer for bridging our model with biological circuits.

      First, we would like to clarify that we do not claim that our H module corresponds to CA1 specifically.

      Rather, we assume that within the broader hippocampal loop (EC–DG–CA3–CA1–EC), subpopulations emerge that preferentially encode the current contextual states and the transitions to the next contextual states. This assumption reflects our hypothesis that the hippocampus implements a mechanism for predicting the next context given the current one. Importantly, this functional separation does not contradict known theta-phase coding in which the same neurons can represent past, present, and future information at different phases of the theta cycle.

      As a possible biological grounding, we particularly emphasize the CA3–CA1 projection. Recent studies have shown that CA1 representations exhibit a temporal delay relative to CA3 activity (Chen et al., 2024), suggesting a circuit-level mechanism by which predictions of upcoming contextual states may be computed based on the current context. In this framework, state-coding and transition-coding functions could be assigned to CA3 and CA1, or dynamically expressed through their interactions. Based on our model, we make testable experimental predictions. Specifically, we predict that neural representations in CA3 and CA1 should precede contextual switching in tasks such as alternation or multiple-lap tasks, and that perturbing CA3–CA1 computations would impair task performance.

      Please note, however, that our model does not characterize the sequence composer’s activity at such fine-grained neuronal timescales. Instead, we model the computation it performs in abstract time steps corresponding to the grid states (e.g., while the animal is at a corner of the maze).

      We have added these points to the Discussion to clarify the biological interpretation and to suggest potential experimental validations of the proposed subpopulation distinction as follows.

      “Our model posits that the Sequence composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state. Consistent with this idea, the temporal lag in CA3→CA1 transmission suggests a functional gradient in which CA3 represents present-oriented information while CA1 carries more future-oriented predictions (Chen et al., 2024), and neurons in both CA3 and CA1 exhibit action-driven remapping and encode action-planning signals (Green et al., 2022). Our framework, therefore, predicts that changes in CA3→CA1 population activity precede behavioral switching in context-dependent alternation in Figure 2 or multi-lap tasks in Figure 3, and perturbation of this input will degrade the behavioral performance.”

      “While we used an abstract, grid-like state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, such as theta cycles (Foster and Wilson, 2007; Wikenheiser and Redish, 2015).”

      (10) The flexibility of the new solution in terms of learning contexts with variable temporal horizons seems an important feature of the model, but one poorly demonstrated in the existing numerical experiments. Could more concrete model predictions be generated by designing an experiment targeted specifically for such scenarios?

      Thank you for raising this point.

      As we showed in Figure S2, in environments with variable temporal horizons, our model performs better than model-free learning (Q-learning) that incorporates temporal context.

      To further demonstrate this point, we added a new task in Figures 3G and H, in which the 1-lap task and the 2+ lap task are alternated. Our model exhibits rapid switching between these tasks, regardless of differences in sequence length or temporal horizon. We added the following text.

      “To demonstrate the advantage of our model in a rapidly switching task that requires different history lengths, we show that an agent trained on both the 1-lap and 2-lap tasks can flexibly alternate between them in a reward-dependent manner (Figure 3G), selectively engaging hippocampal sequences of different lengths according to the current task context (Figure 3H). Together, these results illustrate how hippocampal lap-like representations emerge through learning and enable flexible context switching across tasks with distinct temporal demands.”

      In such a scenario, a subjective representation of laps in the hippocampus is the key to solving the task. As noted in our responses to points (8) and (9), neural representations, especially in CA1, are expected to bifurcate between the 1-lap and 2-lap conditions, and this bifurcation would precede and critically govern the animal’s behavior.

      (11) I found figures confusing/uninformative, specifically in making it explicit what is external task structure and what is the agent's internal representation of it; as a result it is not clear what of the results is trivially inherited from the task specification and what is an emergent property of the model; e.g. Figure 2A described external transition specification according to world model but it is unclear to me if Figure 2B shows the ideal agent state representation across context or a graphical summary of what the agent actually learned from the sensory experience described in A; from the text. Figure 2F is supposed to describe a property of the emergent representation, but what is shown is another cartoon... etc.

      We appreciate the reviewer’s insightful comments regarding the clarity of our figures.

      To clarify the neural representation of the agent and how it links to the action, we have revised Figure 2 and the descriptions in the main text.

      First, Figure 2A schematically depicts the external stimulus as being determined solely by the task. In this task, animals must keep track of the immediately preceding state (S1 or S3) to correctly choose between S4 and S5 upon reaching S2. Without such a memory of prior states, an agent would have no basis for distinguishing which action is appropriate, and therefore cannot selectively move to S4 or S5. Consequently, any reinforcement learning model that does not incorporate at least a one-step state history cannot solve the task.

      To solve the task, S2 must be represented as two distinct contextual states depending on the previous state. Figure 2B therefore illustrates an example of internal representation that separates S2 into X2α and X2β: transitions from S1 to S2 are internally represented as X1 → X2α, whereas transitions from S3 to S2 are represented as X3 → X2β. Although the sensory inputs provided to the model correspond only to the task-defined states in Figure 2A, the combination of the sensory input with contextual states in Context selector successfully achieves this contextual representation of X2α and X2β (see Figure 2C, D). Also, the hippocampal neurons in Sequence composer indicate the next contextual states given the current contextual states, i.e., X2α→X4 and X2β→X5 (see Figure 2E). Thus, combining Context selector and Sequence composer successfully achieves the task requirement indicated in Figure 2B.
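
      The argument that S2 must be split into two contextual states can be checked with a small sketch. It only encodes the task requirement stated above (S1 → S2 → S4 and S3 → S2 → S5) and asks whether a deterministic policy over a given state representation can satisfy it. The names `REQUIRED`, `is_solvable`, `memoryless`, and `one_step` are hypothetical and do not come from the manuscript.

```python
# Required next state at the choice point S2, keyed by (previous state, current state).
# Taken from the task description: S1 -> S2 -> S4 and S3 -> S2 -> S5.
REQUIRED = {("S1", "S2"): "S4", ("S3", "S2"): "S5"}


def is_solvable(key_fn):
    """Return True if one deterministic action per key can meet every requirement.

    key_fn(prev, cur) maps a visit to the representation the policy conditions on.
    """
    policy = {}
    for (prev, cur), target in REQUIRED.items():
        key = key_fn(prev, cur)
        # setdefault stores the first target seen for this key; a second,
        # different target for the same key means the representation is ambiguous.
        if policy.setdefault(key, target) != target:
            return False
    return True


def memoryless(prev, cur):
    """Condition only on the current stimulus (no history)."""
    return cur


def one_step(prev, cur):
    """Condition on the current stimulus plus one step of history."""
    return (prev, cur)
```

      Here `is_solvable(memoryless)` is false because both visits to S2 collapse onto one key, whereas `is_solvable(one_step)` is true; the two keys `("S1", "S2")` and `("S3", "S2")` play the role of X2α and X2β.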

      Regarding the reviewer’s concern that Figure 2F (now Figure 2I) appeared to be another cartoon, we have revised the panel to clearly display our result. These results demonstrate that some hippocampal neurons in our model encode the transition from X2α→X4 and X2β→X5. The updated figure clarifies that our hippocampal neurons functionally work similarly to the splitter cells in Wood et al., 2000.

      (12) Improvement: use visuals and captions. Make it clear what is a cartoon, what is a model specification, and what is an actual result. Replace/complement algorithmic cartoons in Figure 1 with a description of the actual result.

      Thank you for raising this point.

      As we explained in the previous point (11), we added Figure 2D and Figure 2E to display the actual neural activity, together with the corresponding annotations in the manuscript, e.g., X2α. We also revised the cartoons in Figure 1 to better describe our model structure.

      (13) Map between model and experimental results is poorly justified: in particular the nature of sensory inputs is not clearly specified, and how the experimental manipulations (e.g. MEC input disruption) map into model manipulations is not intuitive and no justification is provided for the choices beyond that the model ends up matching the experiment by some metric. Also, not clear why a tradeoff of neural resources as implemented in the model makes sense for the clinical case and how this hypothesis deviates from alternative Bayesian accounts invoking imperfections in inference (e.g. relative strength of priors vs likelihood as reported by e.g. P.Series's group, or issues with hierarchical inference more generally along R.Jardri's work).

      Thank you for raising this important point. We have revised the manuscript to clarify the mapping between model components, sensory inputs, and the experimental manipulations, and to further justify the clinical interpretation.

      About sensory inputs

      First, each environmental state in our model is represented as a binary (0/1) pattern. We have added Figure 2D to explicitly illustrate these sensory stimuli and how they are provided to the context-selection module.

      About mapping between model components and brain circuits

      Functionally, we speculate that Context selector (X) corresponds to computations carried out in the prefrontal cortex (PFC) and entorhinal cortex (EC), and Sequence composer (H) corresponds to the hippocampus. Inputs from the PFC are thought to reach the hippocampus via the EC. Therefore, suppression of MEC→hippocampus inputs in Sun et al. (2020) naturally maps onto blocking a subset of the inputs from X to H in our model.

      We clarified this correspondence in the revised manuscript and now explicitly justify why this manipulation matches the biological experiment.

      Relation to Bayesian theories

      We agree that Bayesian accounts have provided influential explanations of psychiatric symptoms by invoking imperfections in inference, such as imbalances between priors and likelihoods (e.g., work by P. Series and colleagues) or disruptions in hierarchical inference (e.g., work by Jardri and others). Our model complements these frameworks by explicitly incorporating sequential structure and context remapping. Rather than treating priors as static or fixed-weight quantities, our model allows contextual representations to be dynamically reorganized based on prediction errors over time. In the SZ-like condition, we assume that an excessively expanded context domain increases the influence of internally generated contextual predictions, causing them to override sensory inputs and resulting in maladaptive behavior with hallucination-like percepts. Importantly, this effect reflects not only stronger priors but also excessive generation and competition of contextual states, leading to unstable and non-reproducible remapping. In contrast, in the ASD-like condition, sensory-weighted context representations limit the ability to flexibly incorporate newly introduced contexts, causing the model to perseverate on an initially learned context and thereby reproduce inflexible behavior. We added a schematic illustration in Figure 5B and expanded the Discussion to clarify this point.

      “When the stimulus domain is relatively underrepresented, the reconstruction of contextual state in the Amari-Hopfield network tends to infer contextual states based on the context domain rather than the stimulus domain. Consequently, it converges to an incorrect attractor that is not assigned to the current environmental state, thereby increasing perceptual error for external stimuli (hallucination-like effects). Moreover, SPE-driven remapping and the corresponding synaptic plasticity occur more frequently. In contrast, when the stimulus domain is overrepresented, the Amari-Hopfield network rarely assigns multiple contextual states to a given environmental state, leading to an overuse of default contextual states (see Figure 5B and Materials and Methods).”

      “Our model also provides an algorithmic-level account of psychiatric symptoms by changing the relative weighting of sensory-encoding versus context-coding neurons. This implementation is analogous to Bayesian theories linking priors to psychiatric symptoms. In SZ, hallucinations and delusions have been modeled as arising from overly strong top-down priors (Powers et al., 2016) or circular inference, which leads to erroneous belief formation (Jardri et al., 2017; Jardri and Denève, 2013). In our model, we used an underrepresented stimulus domain to increase the relative influence of internally generated context representation in context selection. Crucially, this implementation does not simply strengthen priors but induces excessive generation and competition of contextual states, leading to frequent yet non-reproducible remapping of hippocampal contextual activity and a failure of learning to converge despite repeated experience. In ASD, it has been argued that abnormally high sensory precision reduces the updating of expectations (Karvelis et al., 2018) or leads to sensory-dominant perception, which has been interpreted as weak priors (Angeletos Chrysaitis and Seriès, 2023; Lawson et al., 2014; Pellicano and Burr, 2012). In our framework, we used an overrepresented stimulus domain to increase the relative influence of external stimulus representations in context selection. Importantly, our model captures not only sensory-dominant processing emphasized in previous studies, but also a distinctive impairment in flexibly utilizing newly introduced contexts, reflecting a failure of context reconstruction and resulting in persistent inflexible behavior. Thus, our conjunctive modeling of sensory and context processing complements Bayesian accounts of psychiatric symptoms and provides a mechanistic explanation for the role of sensory processing in maladaptive, inflexible behavior.”

      (14) Improvement: justify choices, explain in more detail relationships with computational psychiatry literature.

      Thank you for pointing it out. As we explained in the previous point (13), we justified our model choice in the revised version.

      Minor comments:

      (1) Typos: "algorism" (pg2), duplicate Sun reference.

      Thank you for finding the typo and the missing reference. We revised accordingly.

      (2) Unclear statements from Methods:

      • "preparing temporal context with three histories" not sure what is meant by this.

      • "... state estimation by the context-selection module becomes less frequent." (Methods/Overview): what is the mechanism?

      • "default pattern" and failure to converge: What is the biological basis for them?

      • Why is the converter function used on some occasions but not others?

      • "new contextual state is prepared": What does that mean?

      We thank the reviewer for pointing out several unclear statements in the Methods section.

      • “preparing temporal context with three histories”

      We now explicitly state the formal description of three histories in the Methods as follows.

      “the state is defined by the recent n-step transition history of task state (i.e. 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> =(𝑆<sub>𝑘</sub>,𝑆<sub>𝑘−1</sub>, ⋯,𝑆<sub>𝑘−𝑛</sub>)<sup>𝑇</sup> , where 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> is the temporal context state, and 𝑆<sub>𝑘</sub> is the environmental state at time 𝑘). We changed n from 0 to 3.”

      • “state estimation by the context-selection module becomes less frequent”

      In our model, context selection is performed every time the agents execute an action sequence generated by Sequence composer. As learning progresses, the Sequence composer comes to predict distant future states and executes coherent action sequences based on these predictions. When no unexpected errors are encountered during execution, context estimation is suppressed, resulting in less frequent context selection. We modified the manuscript as follows.

      “After the action execution, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent. The algorithmic flow chart of our model is described in Figure S1.”

      • “default pattern”

      In biological systems, it is reported that the frontal cortex shows sensory modality-specific representation without prior learning (Manita et al., 2015). We refer to these innate modality-specific sensory representations as the default pattern. In the early stages of learning, we assume that no stable contextual representations have yet been formed in the brain, and therefore, a default pattern uniquely driven by external stimuli is used as the context representation. Even during intermediate stages of learning, the context selector may fail to converge to a specific state. In such context-uncertain environments, it has been reported that agents often rely on previously learned or habitual action choices (psychological inertia), which is evident in ASD patients.

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      “This default implementation is analogous to psychological inertia, particularly under uncertainty (Ip and Nei, 2025; Sautua, 2017), which has been reported to be more pronounced in ASD patients (Joyce et al., 2017).”

      • Why is the converter function used only in some cases?

      The converter function A(stim → context) was introduced to compose the default pattern (one-to-one mappings between stimuli and contexts) as we described above. In other cases, the Hopfield dynamics were used to select contextual states; therefore, we did not use the converter function.

      • “new contextual state is prepared”

      Thank you for pointing this out.

      The term “prepared” was inaccurate. We revised it to “generated”.

      In the case of remapping, we assumed that X generates a new random neural activity pattern in its contextual domain and stores it as a new contextual state. We described this process as “a new contextual state is generated”.
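
      As a rough illustration of the "new contextual state is generated" step, the sketch below covers only the branch of SPE-driven remapping in which a new state is formed. It assumes binary (0/1) patterns, measures mismatch as the fraction of differing elements, and draws the new pattern uniformly at random; the threshold value, the `context_size` parameter, and the function names are hypothetical, since the manuscript's exact criterion is given in its Materials and Methods rather than here.

```python
import random


def mismatch(predicted, observed):
    """Fraction of elements at which two equal-length binary patterns differ."""
    return sum(p != o for p, o in zip(predicted, observed)) / len(observed)


def maybe_remap(predicted, observed, contexts, context_size, threshold=0.25, rng=None):
    """Store and return a new random context pattern if the mismatch exceeds threshold.

    Returns None (and leaves `contexts` unchanged) when the prediction is
    close enough to the observed stimulus.
    """
    rng = rng or random.Random()
    if mismatch(predicted, observed) <= threshold:
        return None
    new_context = [rng.randint(0, 1) for _ in range(context_size)]
    contexts.append(new_context)  # the stored pattern becomes a new contextual state
    return new_context
```

      The alternative outcome described in the manuscript, in which X transitions to an already stored contextual state instead of forming a new one, is deliberately left out of this sketch.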

      (3) Please explain the mapping between hippocampal sequences to actions in more detail for each task.

      • Why 9 attempts before rejection?

      • Why all the variations on Hebb?

      We appreciate the reviewer’s request for clarification. Below, we provide additional explanations point by point.

      Mapping between hippocampal sequences and actions

      In this research, we defined action as the transition from one environmental state to another environmental state. The hippocampal sequences predict the transition of environmental states; therefore, they correspond to a set of action plans from the current environmental state. In the revised manuscript, we added the formal definition of environmental states and actions in each task.

      • Why 9 attempts before rejection?

      These repetitions ensure adequate exploration of the contextual states in X and the episodic sequence in H before committing to an action. Increasing the number of attempts excessively causes the reward value function to be dominated by a single highest-scoring sequence, thereby causing excessive exploitation and narrowing behavioral variability. While the exact number 9 is not critical (the qualitative results are robust to moderate changes), we selected this value because it provides a good balance between exploration and exploitation and produces the clearest visualizations in our figures. We have clarified this in the Methods as follows.

      “We set the number of attempts before rejection to nine, providing a balance between exploration and exploitation and serving as a good compromise for visualization.”

      • Why all the variations on Hebbian learning?

      We consider three loci of plasticity in our model: the X module, the H module, and their reciprocal connections. Within the H module, synaptic connections that link episodic segments—specifically from transition-coding neurons to state-coding neurons—are assumed to follow a reward prediction error–dependent, supervised form of Hebbian learning. This choice reflects the need to selectively reinforce transitions that lead to successful outcomes. In contrast, all other synaptic updates in the model are assumed to follow reward-independent, activity-based Hebbian learning. These learning rules support the unsupervised formation and stabilization of contextual representations and action execution.

      In addition to the basic Hebbian rule, we introduced biologically motivated constraints, such as upper and lower bounds on synaptic weights and heterosynaptic depression, which weakens non-potentiated synapses. Importantly, these mechanisms do not alter the fundamental nature of Hebbian learning but increase the stability of our model.
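
      One way these ingredients could fit together is sketched below. It is an assumption-laden illustration, not the manuscript's rule: activities are binary, heterosynaptic depression is read as weakening the other inputs onto an active postsynaptic neuron, and the optional `gain` argument stands in for a reward-prediction-error gate on the potentiation term. All parameter names and default values are hypothetical.

```python
def hebbian_update(w, pre, post, lr=0.1, depression=0.05,
                   w_min=0.0, w_max=1.0, gain=1.0):
    """Return updated weights, where w[i][j] connects presynaptic j to postsynaptic i.

    - Co-active pairs (pre[j] and post[i] both 1) are potentiated by lr * gain.
    - Other synapses onto an active postsynaptic neuron are depressed
      (one reading of heterosynaptic depression; an assumption).
    - Every weight is clipped to [w_min, w_max].
    gain=1.0 gives a reward-independent rule; passing a reward prediction
    error as gain is a hypothetical stand-in for the reward-dependent variant.
    """
    new_w = []
    for i, row in enumerate(w):
        new_row = []
        for j, w_ij in enumerate(row):
            if post[i] and pre[j]:
                w_ij += lr * gain
            elif post[i]:
                w_ij -= depression
            new_row.append(min(w_max, max(w_min, w_ij)))
        new_w.append(new_row)
    return new_w
```

      The clipping and depression terms leave the Hebbian core (potentiation of co-active pairs) unchanged while preventing weights from growing without bound, which is the stability argument made above.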

      (4) For Q learning: please clarify "the state is defined by the recent transition history of task state."

      As you suggested, we clarified the statement by adding the following sentences in the Methods. “To highlight the advantage of our model, we compared it to Q-learning with temporal contexts, namely, the state is defined by the recent n-step transition history of task states (i.e. 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> =(𝑆<sub>𝑘</sub>,𝑆<sub>𝑘−1</sub>, ⋯,𝑆<sub>𝑘−𝑛</sub>)<sup>𝑇</sup> , where 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> is the temporal context state, and 𝑆<sub>𝑘</sub> is the environmental state at time 𝑘).”
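
      The temporal-context state in this definition can be sketched as follows, together with a standard tabular Q-learning update. This is a generic illustration and not the code used for Figure S2: padding with `None` before enough history exists, the learning rate, and the discount factor are all assumptions, and the function names are hypothetical.

```python
from collections import defaultdict


def history_state(history, n):
    """Build (S_k, S_{k-1}, ..., S_{k-n}) from a list of visited states.

    `history` is ordered oldest to newest. Missing entries early in an
    episode are padded with None (an assumption for this sketch).
    """
    recent = list(history)[-(n + 1):][::-1]  # newest first
    return tuple(recent + [None] * (n + 1 - len(recent)))


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a table keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Example table: unseen (state, action) pairs default to a value of 0.0.
q_table = defaultdict(float)
```

      Because each temporal-context state is a tuple of n+1 environmental states, the table can hold up to |S|^(n+1) distinct states, which is consistent with the slower learning reported for larger n.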

      (5) What is the purpose and biological justification for the NG addition to RW?

      Thank you for raising this point. The prediction-error–based update of each sequence’s value function 𝑅 alone cannot distinguish between two fundamentally different cases:

      (a) the value of a sequence has genuinely decreased, or

      (b) the sequence remains useful, but it is just not appropriate in the current context. This distinction is essential for modeling context-dependent switching of behavioral strategies. To address this, we introduced the No-good (NG) indicator. NG allows the agent to temporarily mark certain sequences as unsuitable without altering their long-term value, thereby facilitating short-term exploration of alternative sequences. In other words, NG provides a mechanism for transiently suppressing a previously valid sequence in case of contextual changes, while preserving the underlying value learned in past experiences.
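The distinction between cases (a) and (b) can be illustrated with a small sketch: a prediction-error update changes a sequence's stored value, whereas a no-good flag only removes the sequence from the current choice set. The names `rw_update` and `select_sequence`, the learning rate, and the rule for clearing flags once every sequence is flagged are assumptions made here, not details taken from the manuscript.

```python
def rw_update(values, seq, reward, alpha=0.15):
    """Prediction-error update: moves the stored value of `seq` toward the reward."""
    values[seq] += alpha * (reward - values[seq])


def select_sequence(values, no_good):
    """Pick the highest-valued sequence that is not currently flagged no-good."""
    candidates = [s for s in values if s not in no_good]
    if not candidates:
        # Assumption: if every sequence is flagged, clear the flags and start over.
        no_good.clear()
        candidates = list(values)
    return max(candidates, key=values.get)


values = {"seqA": 0.9, "seqB": 0.6}
no_good = set()

first_choice = select_sequence(values, no_good)   # highest-valued sequence
no_good.add("seqA")                               # seqA failed in the current context
second_choice = select_sequence(values, no_good)  # falls back to the other sequence
no_good.discard("seqA")                           # context changes back
third_choice = select_sequence(values, no_good)   # seqA is available again, value intact
```

Because the flag lives outside `values`, removing it restores the original preference immediately, which is the behavior the NG indicator is meant to provide.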

      This mechanism is consistent with several lines of biological evidence. First, extinction learning after fear conditioning does not erase the original fear memory but instead forms a new memory trace, known to be stored in the medial PFC (Milad & Quirk, 2002). This suggests that animals may switch to a different contextual representation rather than simply downgrading the value of the conditioned stimulus, supporting the idea of temporarily suppressing a sequence without modifying its intrinsic value.

      Second, recent studies in the ventral hippocampus show that dopamine D2–expressing neurons in the ventral subiculum promote exploration specifically under anxiogenic contexts (Godino et al., 2025). This finding is consistent with the short-term exploratory behavior enabled by our NG mechanism. Thus, we added the following statement to the manuscript:

      “No-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This no-good indicator facilitates RPE-facilitated remapping … that leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”

      Together, these biological findings provide a conceptual basis for modeling NG as a context-sensitive, transient modulation that encourages exploration without overwriting previously learned sequence values.

      (6) Missing details about H network size

      Thank you for pointing it out.

      We used 300 neurons for H and now state this in the manuscript as follows.

      “We model the hippocampus with an N = 300 binary recurrent neural network.”

      (7) S1 figure: learning is slower even in the early, easy phases of learning when the temporal dependence should not matter; how are learning rates calibrated across models?

      Thank you for raising this point. In our model, the learning rate was fixed at 0.15, whereas the control model (now shown in Figure S2) uses a higher learning rate of 0.4, independent of temporal context.

      Regarding why learning appears slower even in the early, easy phases: as the number of temporal contexts increases, the state space expands. This larger state space makes it more time-consuming to identify and reinforce the appropriate state transitions. The effect is especially evident in the easy phases because the model is provisioned with more temporal contexts than the task requires.

      Importantly, unlike the control model, which postulated a fixed number of temporal contexts, our model gradually increases the number of temporal contexts depending on prediction error. This adaptive mechanism allows the model to achieve fast learning during early, easy phases while still enabling more complex learning in later phases.

      Reviewer #2 (Recommendations for the authors):

      (1) "Hippocampal neurons show sequential activity...." The authors should include more classical references for hippocampal sequential activity at this point, too.

      Thank you for your suggestion. We added the citations below

      Skaggs and McNaughton, 1996; Wilson and McNaughton, 1993

      (2) "...called remapping" also here, please reference classic work (Bostock, Muller, ...)

      As suggested, we added the citations below

      Bostock et al., 1991; Muller and Kubie, 1987

      (3) "Several theoretical models..." What I miss here are models that explain remapping by inputs from the grid cell population, and/or the LEC (see Latuske 2017 for review), still widely considered the standard mechanism. Also, the models by Stachenfeld et al. 2017, Mattar and Daw 2019, and Leibold 2020 specifically address context dependence. Accordingly, "A comprehensive model that can explain the formation of context-dependent hippocampal sequences of various lengths through remapping, while relying on a biologically plausible learning process,..." somewhat overstates the novelty of the current paper.

      Thank you for pointing this out and for suggesting relevant citations. We agree with the reviewer that inputs from MEC and LEC to the hippocampus constitute a fundamental mechanism underlying remapping. However, in our view, a key open question in the remapping field is how MEC and LEC estimate the current context and convey this information to the hippocampus in a manner that supports goal-directed behavior. While previous studies have addressed remapping at the representational level and the hippocampal sequence at planning, the overall relationship between remapping, reinforcement learning, and planning has not yet been explained within a single unified model. In this work, we propose a simple and biologically plausible model that integrates an Amari–Hopfield network for context selection with hippocampal sequences, providing an account of coordination under goal-directed behavior. To more accurately position the novelty of our contribution, we have revised the manuscript as follows.

      “While previous works have explored hippocampal sequential activity for planning (Jensen et al., 2024; Mattar and Daw, 2018; Pettersen et al., 2024; Stachenfeld et al., 2017) and hippocampal remapping for contextual inference (Low et al., 2023) separately, they have yet to elucidate how these two aspects jointly enable flexible behavior. A simple biologically plausible model-based reinforcement learning model that uses the Amari-Hopfield model for context selection and hippocampal sequences of various lengths as a state-transition model for long-horizon planning, relying on remapping driven by prediction errors to form state representation, would thus provide valuable insights into the neural mechanisms underpinning context-dependent flexible behavior.”

      (4) Please properly introduce nomenclature "C2α, C2β, S2,...." S is sometimes used for stimulus, sometimes for location (state?), or even action?

      Thank you for pointing it out. We acknowledge that the annotation of Cn (e.g., C1, C2…) was not straightforward. Therefore, we changed the annotation to Xn (e.g., X1, X2, …) in order to indicate the contextual state of X.

      We define Sn (e.g., S1, S2…) as the external input given by the environment and represented in the stimulus domain of X, while Xn (e.g., X1, X2…) is the subjective contextual state generated by the agent and represented in the context domain of X. As a reference, we added the neural representation of X in Figure 2D and added the following text.

      “The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states (e.g., S1, S2…) are represented in the stimulus domain, and the contextual states (e.g., X1, X2α…) are represented in the context domain.”

      (5) "Our model replicates this result by blocking the synaptic transmission from most of the neurons in the context domain of X to H (Figure 3F).". Does this mean the X module is hypothesized to be in the EC?

      Thank you for the thoughtful question. In our model, the X module is intended as a functional abstraction that combines the roles of several brain regions known to contribute to contextual representation, including the prefrontal cortex (PFC) and the entorhinal cortex (EC). Although X is not necessarily meant to correspond to a single anatomical region, we consider it likely that the contextual information represented in X would reach the hippocampus (H) (CA3 and CA1) primarily through the EC. Thus, the experimental manipulation shown in Figure 3F (suppression of medial EC axons in the hippocampus) is interpreted in our framework as weakening the input from X to H.

      We added the following texts in the Discussion section.

      “We speculate that Context selector is implemented across multiple brain regions with varying degrees of resolution, including a part of the entorhinal cortex and prefrontal cortex.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state.”

      (6) Discussion "model-based reinforcement learning": Please detail where the model is here. In my understanding, the naive agent does not have a model (this would be model-free then?).

      Thank you for asking.

      Unlike model-free reinforcement learning, where each action is evaluated step by step, we use hippocampal sequences for multiple-step prediction and action planning. This is the “model” in our research. As you mentioned, initially, animals do not have a “model”, but Sequence composer gradually chunks the episodic segments to compose a longer sequence.

      (7) "...can change the attractor dynamics in the hippocampus (34)": What is (34)? I also would doubt that one can make such absolute statements about the human hippocampus.

      Thank you for pointing out the missing citation. We corrected it accordingly.

      Rolls E. 2021. Attractor cortical neurodynamics, schizophrenia, and depression. Transl Psychiatry 11. doi:10.1038/s41398-021-01333-7

      (8) "To the best of our knowledge, this is the first model that describes the formation of context-dependent hippocampal activity through remapping and its contribution to flexible behavior." See "Several theoretical models...".

      Thank you for pointing this out. We admit that it was an overstatement. We corrected it accordingly.

      “To the best of our knowledge, this is the first model that uses associative memory for describing the formation and switching of context-dependent hippocampal activity through remapping and its contribution to flexible behavior.”

      (9) "We speculate that the context-selection module is implemented across multiple brain regions..." How would an attractor network be implemented over "multiple brain regions"?

      We thank the reviewer for raising this important conceptual question. Context information in realistic environments is likely to have a hierarchical structure. We therefore speculate that multiple brain regions may jointly support context selection by maintaining different levels or components of this hierarchy. In particular, the prefrontal cortex (PFC), medial entorhinal cortex (MEC), and lateral entorhinal cortex (LEC) have all been implicated in representing contextual or task-state information at different levels of abstraction. These regions are known to exhibit attractor-like dynamics and to provide inputs to the hippocampus. Thus, an attractor network spanning multiple regions could arise, with different areas stabilizing distinct components of the contextual representation, depending on the timescale of memory, task demands, or sensory features.

      We used the Amari–Hopfield network as a functional abstraction to explain such multi-regional interactions underlying context representation, rather than to provide a one-to-one mapping onto a specific brain region. How region-specific attractor dynamics jointly contribute to maintaining global contextual information and enabling context switches in response to prediction errors remains an important direction for future research.

      Methods:

      (10) "... agents move through discrete environmental states characterized by distinct external stimuli.": How is this exactly implemented? What is the neural representation of these states, xi? What is the difference to a "landmark"?

      We appreciate the reviewer’s thoughtful question regarding the implementation and neural representation of environmental states. In our model, each environmental state is represented as a binary stimulus pattern provided to the stimulus-domain neurons in Context Selector. Specifically, for each state, we constructed a pattern in which half of the neurons are set to 1 and the other half to 0. We chose this design because, in the Amari–Hopfield model, memory performance is maximized when stored patterns contain approximately equal proportions of 0 and 1. For clarity, we have added an illustration of these stimulus patterns in the revised Figure 2D.
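As a small illustration of the stimulus patterns described above, the following sketch draws binary patterns in which exactly half of the units are 1 and half are 0. The function name `balanced_pattern`, the pattern length of 20, and the fixed seed are assumptions for the example; the response does not give the size of the stimulus domain.

```python
import random


def balanced_pattern(n_units, rng):
    """Binary pattern with exactly n_units // 2 ones at randomly chosen positions."""
    ones = set(rng.sample(range(n_units), n_units // 2))
    return [1 if i in ones else 0 for i in range(n_units)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# One hypothetical pattern per environmental state.
stimulus_patterns = {name: balanced_pattern(20, rng) for name in ("S1", "S2", "S3")}
```

Sampling positions without replacement fixes the proportion of ones exactly, rather than only on average as independent coin flips would.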

      Regarding the reviewer’s question about landmarks: in our framework, a landmark denotes an environmental state for which the contextual state is uniquely determined, regardless of the preceding transition history. For simplicity in this study, we designated the initial environmental state in each task (S0 or S1) as the landmark. Importantly, in our implementation, landmarks do not differ from other states in terms of their stimulus pattern; their special role arises solely from the task structure, not from additional sensory properties.

      In real environments, what constitutes a landmark likely varies depending on stimulus saliency and the agent’s prior experience. Determining how landmarks should be optimally defined or learned is an interesting direction for future work.

      (11) How are different contexts represented for the same stimulus xi^stim?

      We added an example of neural activity in X in Figure 2D, illustrating the distinction between the stimulus domain and the context domain. While the activity in the stimulus domain depends on the external stimulus, the contextual domain consists of uncorrelated random neural states. We exploit a key property of the Amari–Hopfield network to associate each contextual state with a given external stimulus.
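The association between a stimulus and its contextual state can be sketched as pattern completion in a tiny Hopfield-type network: the stimulus domain is clamped, and the context domain is updated until it stops changing. This is a toy sketch under stated assumptions. It uses bipolar (+1/-1) units, which are equivalent to 0/1 units via (s + 1) / 2, and three hand-picked orthogonal patterns of 8 units instead of the uncorrelated random context states used in the model; `train_hopfield` and `complete_context` are hypothetical names.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights with zero self-connections (bipolar units)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def complete_context(w, stimulus, n_context, max_sweeps=10):
    """Clamp the stimulus domain and update context units until nothing changes."""
    state = list(stimulus) + [0] * n_context  # 0 marks a context unit not yet set
    n_stim = len(stimulus)
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for i in range(n_stim, len(state)):  # asynchronous update, fixed order
            h = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if h > 0 else -1 if h < 0 else state[i]
            if new != state[i]:
                state[i], changed = new, True
        if not changed:
            return state[n_stim:], sweep
    return state[n_stim:], max_sweeps


# First 4 units: stimulus domain. Last 4 units: context domain.
PATTERNS = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
W = train_hopfield(PATTERNS)
context, sweeps = complete_context(W, PATTERNS[0][:4], n_context=4)
```

Here the network is cued with the stimulus half of the first stored pattern and fills in the context half that was stored with it. The second sweep only confirms that the state no longer changes.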

      (12) "...and its stimulus domain ??stim becomes identical to ??xistim ." Does that mean every stimulus is an attractor in the context net? How can that work with only 1200 neurons? Is that realistic for real-life environments? Neuron numbers would need to increase dramatically.

      As you mentioned, we assigned each stimulus to a corresponding attractor in the Context selector (X). An Amari–Hopfield network with 1,200 neurons can store approximately 10–20 attractors, which is sufficient to solve the tasks considered in this study. We adopted the Amari–Hopfield network for its simplicity and conceptual clarity; however, in biological neural systems, it is not necessary to construct such rigid attractors for every stimulus. For example, modality-specific neural projections exist in the brain and are sometimes sufficient to form loose attractor states across different stimuli. In addition, the prefrontal cortex is known to support working memory, which may also serve as a form of contextual representation incorporating recent history. Thus, we propose that multiple brain regions cooperate to implement the Context selector.

      (13) How are WHX and WHH initialized?

      Thank you for pointing this out.

      We set the initial condition of all W to 0. We added the following text in the Method section.

      “Note that the initial synaptic weights of 𝑊<sup>𝐻𝑋</sup> and 𝑊<sup>𝑋𝐻</sup> are all 0.”

      (14) It is unclear why the hippocampus separates into state and transition neurons. Why cannot one pattern serve both purposes?

      Thank you for asking about this important point.

      We use two kinds of hippocampal neurons because state-coding neurons represent the current contextual state, whereas transition-coding neurons predict the contextual state that follows it. This separation enables the model to predict multiple scenarios from the current contextual state and to choose the sequence most suitable for the environment.

      We rewrote the following sentences in the manuscript.

      In the Results section,

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons”

      In the Methods section,

      “The state-coding neurons receive input from 𝑋 and represent the current contextual state, while the transition-coding neurons send output to 𝑋 and predict the next contextual state after an action i.e., T(𝑋<sub>𝑘+1</sub>|𝑋<sub>𝑘</sub>,𝑎<sub>𝑘,𝑘+1</sub>).”

      (15) "the agents execute actions according to this sequence." How are the actions defined? Are they part of the state?

      We thank the reviewer for raising this important point. In our model, an action is defined as the transition from a given environmental state to the next environmental state. To avoid ambiguity, we have added a formal mathematical definition of actions for each task in the revised manuscript. In our framework, the transition-coding neurons in Sequence Composer (H) predict the upcoming environmental state, and thus the hippocampal sequence intrinsically contains the representation of an action. Consequently, the sequence generated before actions functions as the agent’s internal action planning process.
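The idea that a stored sequence already contains the action plan can be sketched as follows: chaining predicted next states gives a multi-step sequence, and each consecutive pair of states in that sequence is one action. This is a deliberately simplified sketch. It keeps a single predicted successor per state, whereas the model allows several scenarios per contextual state, and the names `rollout` and `planned_actions` are assumptions introduced here.

```python
def rollout(transitions, start, max_steps=9):
    """Follow stored (state -> predicted next state) links to build a sequence."""
    seq = [start]
    while len(seq) <= max_steps and seq[-1] in transitions:
        nxt = transitions[seq[-1]]
        if nxt in seq:  # stop instead of looping forever
            break
        seq.append(nxt)
    return seq


def planned_actions(seq):
    """Each action is the transition from one state to the next in the sequence."""
    return list(zip(seq, seq[1:]))


# Hypothetical learned links between contextual states.
transitions = {"X1": "X2", "X2": "X3"}
plan = rollout(transitions, "X1")
actions = planned_actions(plan)
```

Under this reading, generating the sequence before acting and planning the actions are the same computation.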

      (16) "Because the input source for the state-coding neuron and the transition coding neuron differ (the former is selected from ??, while the latter is selected from ??), the same hippocampal neuron could occasionally be used for both state-coding and transition-coding across different contextual states. This is evident when an excessive number of contextual states are prepared, especially in the SZ condition. This phenomenon degrades state estimation at X (eq.3)." I have no idea what you want to convey here, .... and how is state estimation related to Equation 3?

      We appreciate the reviewer’s feedback and agree that our original explanation was unclear. Our intention was to clarify why context estimation deteriorates specifically in the SZ condition.

      In our model, state-coding neurons in the hippocampus represent the current contextual state, and transition-coding neurons predict the next contextual state given the current contextual state. Under normal conditions, these two sets of neurons remain sufficiently distinct, allowing accurate prediction of the upcoming contextual state, which is conveyed to X. However, when an excessively large number of contextual states are stored in the SZ condition, representations in the hippocampus begin to overlap. As a result, some hippocampal neurons are inadvertently recruited for both state-coding and transition-coding across different contextual states. This overlap disrupts the H’s ability to accurately predict the next contextual state.

      This degraded prediction directly affects the state-estimation process in X (Eq.3), because Eq.3 relies on receiving an accurate predicted next state from H. When this signal becomes ambiguous, X may converge to an incorrect contextual state, potentially mimicking hallucination-like inference errors.

      We have rewritten the relevant passage in the manuscript to clarify this mechanism as follows.

      “When the number of contextual states increases - particularly in the SZ condition - representational overlap arises between hippocampal state-coding and transition-coding neurons.

      This overlap makes the prediction of the next contextual state by the transition-coding neurons unreliable. The degraded prediction from H, in turn, corrupts the initial condition for context selection in X (Eq. 3), leading to hallucination-like behavior.”
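A rough feel for why overlap becomes more likely as more contextual states are stored can be had from a birthday-problem calculation. The sketch assumes, purely for illustration, that each stored state or transition recruits one neuron uniformly at random from the H population; the response does not state how neurons are recruited, so this is not the model's actual mechanism. The function name `overlap_probability` is hypothetical.

```python
def overlap_probability(n_neurons, n_recruited):
    """Probability that at least two of n_recruited random draws pick the same neuron."""
    p_all_distinct = 1.0
    for i in range(n_recruited):
        p_all_distinct *= (n_neurons - i) / n_neurons
    return 1.0 - p_all_distinct


# With a fixed pool of 300 neurons, the chance of at least one shared neuron
# rises quickly as more states and transitions are recruited.
few = overlap_probability(300, 10)
many = overlap_probability(300, 40)
```

Under this assumption, a fixed pool of 300 neurons leaves little room once the number of recruited units grows, which is consistent with overlap appearing mainly when an excessive number of contextual states is stored.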

      (17) The figures hardly show simulated activity. Consider displaying more neuronal simulations to help the reader grasp the workings of the model.

      Thank you for your suggestion. We indicated the neural activity of X and H in Figures 2D and 2E, respectively, to show the overview of our model.

      (18) Figure 5: What is the "Hopfield count"?

      Thank you for pointing this out. The definition of the Hopfield count was ambiguous. We added an explicit explanation of “context selection” and its possible outcomes (correct association, hallucination-like, and default contexts) in Fig. S1. To clarify our claim, we replaced the count-based measure with the probability of selecting hallucination-like and default contexts during context selection. Accordingly, we removed the term “Hopfield count” and revised the caption of Figure 5 as follows.

      “The result of context selection (see Figure S1). The probability of wrong stimulus reconstruction (hallucination-like effects) is plotted in red, and the probability of default context usage due to failures in context reconstruction (see Materials and Methods) is plotted in blue.”

      (19) Figure 6: Consider moving this upfront.

      Thank you for the suggestion. We moved Fig.6 to Fig.S1 and introduced it earlier in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      I was a bit confused about the implementation, which may not be autonomous, meaning there are numerous stages that require intervention from outside the X-H network (see Figure 6). It seems that the X network might wait to converge before providing input to H, rather than having the entire network evolve in parallel. There are also aspects to the implementation that seem rather ad hoc, such as the "no-good indicator".

      Thank you for the thoughtful comments. We would like to clarify several points regarding the implementation and its biological motivation.

      First, regarding the concern that the X–H interaction may not be fully autonomous:

      In our framework, the convergence time of the X module under external sensory input is assumed to be on the order of several hundred milliseconds, consistent with the timescale of stimulus-evoked cortical population dynamics observed in biological systems. Especially when hippocampal input is present, X does not need to explore the full attractor landscape. Instead, it quickly settles into an attractor located near the hippocampal cue, which substantially shortens the convergence time.

      Second, although our current implementation proceeds in an algorithmically sequential manner for clarity, we do not intend to imply that the brain performs these steps sequentially. Biologically, the states of X and H are expected to co-evolve and mutually constrain each other through recurrent interactions. The sequential algorithm in the model is therefore a practical choice for implementation, not a theoretical claim about strict temporal ordering in the neural system.

      Finally, the “no-good indicator” is introduced to suppress hippocampal sequences transiently and thereby accelerate switching behavior. Our no-good indicator is most consistent with the biological findings on D2-expressing neurons in the hippocampus. We added the following text below.

      About the no-good indicator

      “The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025)”

      Besides the hippocampus, similar mechanisms (temporary suppression of recently visited or low-value attractor states) have been proposed in biological decision-making and working-memory literature, providing conceptual support for the no-good indicator in our model.

      After exposure to a new context, a new memory/context is stored in the X network. As the storage of a new memory requires synaptic plasticity, this step would presumably take a significant amount of time in an animal.

      Thank you for raising this important point. We agree that the formation of a new memory or context requires synaptic changes, and it is well established that processes such as tagging during wakefulness and consolidation during sleep take considerable time. However, once a context has been learned, switching between contexts can be achieved just by moving between attractors in the X network. This mechanism allows for rapid, context-dependent behavior without requiring new synaptic modifications each time. Our study focuses on this aspect of fast context-dependent switching rather than the initial memory formation.

      My understanding is that the Amari-Hopfield network should be evolving in continuous time and not be binary. But there were no time constants mentioned, and the equations were not provided, and it seems that the elements of X were binary units, rather than analog. This should be clarified.

      Thank you for the comment.

      Although there are models with continuous firing rates and continuous time (Ramsauer et al., 2021), the original Amari-Hopfield model uses binary neurons operating in discrete time steps. As noted in our responses to comments (5) and (6) from Reviewer 1, we considered only a discrete-time environment whose timescale is arbitrary. At each environmental state where the current contextual state is selected, the Amari-Hopfield network typically takes about ten iterations to converge.

      In the text, we added the following text.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli.”

      Figure 3 is aimed at replicating the lap cell finding of Sun et al, 2020. In panel E, a comparison is made between the data and the model. Are the cells in the model the entire population of H neurons (state and transition), or just a subset? Does the absence of the "ghosts" (the weaker off diagonal responses seen in the experimental data) imply that the network is not encoding that it is in the same location, but a different lap? Why is there not any true sequentiality (i.e., why do all H units go on at once)?

      Thank you for your insightful comments. Throughout this study, we used 300 neurons for the Sequence composer (H); however, for simplicity, we constrained the model such that only a single H neuron was active at each time point. As a result, most other neurons remained silent. Accordingly, in Fig. 3E, we display only neurons with firing activity, and silent neurons are not shown.

      As you correctly inferred, hippocampal neurons in our model encode lap identity rather than the same physical location across laps. This design choice reflects our focus on hippocampal neurons representing contextual states, rather than place-coding neurons, as only the former contributes directly to contextual behavior in our framework. As shown in Fig. 3E, hippocampal neurons exhibit clear sequential activity with “episode-like” representations corresponding to individual laps. Nevertheless, we believe that incorporating a mixture of context-coding neurons and place-coding neurons is an important direction for future work, as illustrated in Fig. S3.

      We revised the caption of Fig. 3E as follows.

      “E, The comparison of (Left) lap cells in the hippocampus in the 4-lap task (Sun et al., 2020) and (Right) our results of active neurons in the H module.”

      Typo "but also makeS predictions".

      Thank you for pointing this out. We revised it correctly.

    1. eLife Assessment

      This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting modules: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions for expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.

      Strengths:

      The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.

      Weaknesses:

      The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.

    3. Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.

      More specifically:

      (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the posthoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.

      (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.

      (3) The literature review can be improved (laid out in the specific recommendations).

      (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.

      (5) As a minor point, the hippocampus is treated essentially as a premotor network. A Discussion paragraph on this would also be helpful.

    4. Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Weaknesses:

      The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.

    5. Author Response:

      We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches that require processing at each state transition, our model uses sequential activity (= transition model) to predict environmental changes in the long term by leveraging episode-like representations.

      While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior—reminiscent of that observed in biological systems—can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing dynamic separation of prediction errors due to sensory noise versus those due to contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms in the context of psychiatric disorder-like behaviors, as illustrated in Figure 5. Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model—namely, the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H), where X sends the information on estimated current context to H, and H returns a future prediction based on the episode to X. This interaction forms a loop enabling the current context estimation and its reselection.
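
      The noise-correcting role of the attractor network described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a textbook Hopfield network with +1/-1 units, a Hebbian outer-product rule, and asynchronous sign updates, and the two "context" patterns and all function names are hypothetical.

```python
# Minimal Hopfield-style attractor sketch (illustrative only; not the authors' code).
# Assumptions: units take values +1/-1 and weights follow the standard
# Hebbian outer-product rule with no self-connections.

def train_hebbian(patterns):
    """Return W with W[i][j] = (1/N) * sum over patterns of x_i * x_j, zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two hypothetical, mutually orthogonal context patterns (8 units each).
context_a = [1, 1, 1, 1, -1, -1, -1, -1]
context_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([context_a, context_b])

# Simulate sensory noise by flipping one unit, then let the network settle.
noisy_a = list(context_a)
noisy_a[0] = -noisy_a[0]
recovered = recall(weights, noisy_a)
```

      In this toy setting the corrupted input settles back onto the stored pattern, which is the sense in which an attractor can absorb small sensory errors while leaving larger mismatches to be treated as a possible change of context.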

      The key advantage of this architecture is its ability to flexibly adjust the temporal span of episodes used for inference and control, providing a potential solution to the challenge of credit assignment over variable time scales in MBRL. Because our model forms and stores the variable length of episodes depending on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting contexts enables rapid switching between these variable timescales. This flexibility addresses a challenge in MBRL—the assignment of credit across variable time scales—without requiring explicit optimization. To better illustrate this important feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.

      Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.

      We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.

    1. eLife Assessment

      This important paper presents a rigorous and comprehensive deep mutational analysis of the kinase TYK2, revealing how single amino acid substitutions influence protein abundance, signaling activity, and responses to pharmacological inhibitors. By combining high‑quality experimental design with dose‑response signaling assays and multiple inhibitor conditions, the authors generate a robust dataset that identifies variants across all domains of TYK2, including clusters at functionally critical sites and protein-protein interfaces. The study highlights mutations that drive drug resistance or potentiation and shows that reduced TYK2 abundance aligns with protective autoimmune‑associated variants, underscoring the therapeutic relevance of modulating TYK2 stability. Overall, the work provides compelling insights with clear implications for biochemistry, immunology, clinical genetics, and drug discovery.

    2. Reviewer #1 (Public review):

      Summary:

      In this compelling study, Howard et al. use deep mutational scanning to probe essentially all possible single amino acid substitutions in the TYK2 tyrosine kinase, and identify those that modulate signaling function and protein abundance. The methodological approach is elegant and thorough, and the results identify numerous examples of amino acid substitutions that have been previously reported to modulate TYK2 function, validating the approach.

      Substitutions that are LOF with respect to IFN-α signaling but not protein abundance are particularly interesting and are widely dispersed across the protein. They include known functionally critical sites such as the active site and activation loop of the kinase domain, as well as the allosteric site within the regulatory pseudokinase domain, but also hundreds of other additional sites. The approach is then used to study the effects of substitutions on kinase inhibition using several JAK family inhibitors that target the pseudokinase domain. By assessing variant effects at both high and low drug concentrations, they are able to identify variants that mediate resistance or conversely potentiate inhibition, respectively. These map to distinct sites on the pseudokinase domain. Finally, the authors show that several TYK2 variants, most notably the P1104A substitution, previously shown to protect against autoimmune disease, correspond to substitutions that reduce protein abundance in their screen. Combining their DMS data with autoimmune phenotype and TYK2 genotype data uncovered a general dose relationship between autoimmunity and TYK2 abundance, and the authors propose that this might justify targeting TYK2 protein levels with degraders.

      Strengths:

      This is a nicely executed, well-written study with good figures and a clear presentation.

      Weaknesses:

      The only substantial critique I have is that while the paper makes a compelling case for the validity and power of the approach, the authors could perhaps go further in their interpretation of their data, particularly with regards to identifying functionally important sites and connecting them to putative allosteric sites and functionally relevant protein-protein interfaces in the context of what is known about JAK family kinase structure and function. An attempt is made to interpret the data in light of a composite structural model of full-length TYK2 engaged with the IFNAR1 receptor (Figure 2C), but much more could be said about this. Below, I list several examples where additional insight might be gleaned.

      (1) The discussion of gain-of-function variants is limited. Given that tight regulation is a general theme of kinase signaling and gain-of-function mutations are a common disease mechanism, these mutations could be particularly interesting. Could the authors comment on patterns of gain versus loss? Are there gain-of-function signaling variants that work in an IFN-α dose-dependent versus dose-independent manner?

      (2) The discussion of the signaling-specific variants (LOF in signaling but not abundance) is interesting but could be expanded. Can the authors comment on which regions of the pseudokinase/kinase interface, for instance, are affected, since this allosteric communication is a critical and unique aspect of JAK family protein function? Can something be said about what the 6 activation loop substitutions are doing?

      (3) The cytokine signaling screen was performed at several different levels of IFN-α cytokine stimulation. The authors state that these data were used to identify quantitative variant effects (p7), but the cytokine dose response data are not widely discussed in the manuscript. Is it not possible that valuable information about the strength of substitution effects could be gleaned from this? One might expect that simple loss of function mutants that, e.g. completely destroy catalytic activity, will be LOF at all levels of stimulation, whereas mutations that have more nuanced "tuning" or allosteric effects on signaling might display LOF at low cytokine stimulation levels but be restored at high stimulation levels. Such information could be of potential functional importance and interest. Could the authors comment on this?

      (4) In general, the variant data could be interpreted more specifically in light of the available detailed structural information about TYK2 and JAK kinases generally. For instance, could the resistance versus potentiation variants be interpreted in this context to hypothesize what they might be doing?

    3. Reviewer #2 (Public review):

      Howard et al. describe a set of deep mutational scanning (DMS) experiments applied to TYK2, which is a drug target implicated in autoimmune disease. By assaying protein abundance (stability) effects as well as immune signaling, the authors are able to disentangle variant effects that may be directly involved in protein activity (and therefore potentially druggable) from variant effects that are due to loss of protein or general structural instability. By performing these assays under multiple conditions, including the presence of various concentrations of small molecules, they develop a clear picture of which sites in TYK2 may be most relevant for intervention or targeting. Overall, the work represents a very compelling example of DMS for understanding protein biology and candidate drug mechanisms.

      The work is very thorough, with multiple DMS assays described and compared/contrasted. This greatly enhances the impact and interpretability of any individual assay performed.

      The authors have made improvements to the state of the art in terms of wet-lab assay design as well as the analysis of FACS-based deep mutational scans.

      The potential mechanism of loss of protein abundance in TYK2 being protective for autoimmune disease is clear, but the estimates of the effect size in more physiologically relevant settings vary quite a bit and might be quite small. Are there examples that could be cited of other similar disease mechanisms where a 10% loss in abundance is associated with a clinical phenotype?

    4. Reviewer #3 (Public review):

      Summary:

      In the paper "Deep mutational scanning reveals pharmacologically relevant insights into TYK2 signaling and disease", the authors perform a comprehensive deep mutational scan of the kinase TYK2, a protein of pharmacological interest due to its central role in multiple immune-related phenotypes. The study assesses two key functional phenotypes: protein abundance and IFN-α-dependent signaling. The signaling assays were conducted across a dose-response range under various inhibitor conditions, allowing for an in-depth characterization of TYK2 activity and regulation. Both the experimental design and data analysis were executed with rigor and transparency, yielding a dataset that appears highly reliable. The authors provide strong evidence and a scientifically grounded interpretation of their results.

      The paper presents the results of a deep mutational scan based on two assays: an IFN-α-stimulated signaling assay and a protein abundance assay. These measurements are further supported by variant classifications from AlphaMissense and ClinVar, providing a framework for functional interpretation. Building on these data, the authors propose four potential pharmacological applications of their screening system at the end of the first results section.

      First, they demonstrate that the combined analysis of abundance and IFN-α signaling identifies potential allosteric sites, focusing on variants with normal protein stability but reduced signaling activity. Through this approach, they detect two previously uncharacterized allosteric regions (Results Section 2).

      Second, they explore how the screen can be used to predict variant-specific drug responses or resistance mechanisms (Results Section 3). This is achieved through assays involving two different inhibitors, which reveal both resistance- and potentiation-associated variants.

      Third, they assess the relative functional consequences of ligand and inhibitor dosing by performing IFN-α and inhibitor dose-response experiments (1, 10, and 100 U/mL IFN-α; IC99 and IC75 inhibitor concentrations; Results Section 3).

      Finally, the authors investigate how specific human variants, such as P1104A and I684S, may inform therapeutic modality selection (Results Section 4). Although these variants exhibit no detectable effect on IFN-α signaling within this experimental system, they substantially impact protein abundance. By integrating data from the UK Biobank, the authors further demonstrate that protective effects against autoimmune disease are associated with altered protein abundance rather than differences in IFN-α signaling, highlighting the distinct mechanistic basis of TYK2's clinical relevance.

      Strengths:

      Overall, we found this paper rigorous, well-written, and easy to follow. As such, we think this is an exceptional example of a deep mutational scanning manuscript, and this dataset will be invaluable to the field. We particularly appreciate that the authors could explore sensitivity to inhibitor concentration across multiple doses of the inhibitor.

      Weaknesses:

      Despite the authors' rigorous experimentation and thoughtful interpretation, the study leaves several important mechanistic questions unresolved, as is common in any study. While the data provide clear functional patterns, the underlying biophysical and biochemical explanations remain insufficiently explored. For instance, in point 1, the identification of two novel allosteric sites is intriguing, yet the paper does not elaborate on the structural basis or mechanistic rationale for their regulatory effects. In point 2, resistance and potentiation variants are described for two distinct inhibitors, but it remains unclear why certain variants respond specifically to one compound and not the other. In point 3, higher inhibitor concentrations appear to diminish allosteric interactions, though the reasons why some sites are affected while others are not are left unexplained. Finally, in point 4, the observation that protein abundance, but not IFN-α signaling, correlates with autoimmune protection is compelling but mechanistically ambiguous. These gaps do not detract from the technical excellence of the work; rather, they highlight opportunities for future studies to clarify the molecular and pharmacological mechanisms underlying TYK2 regulation and to deepen the translational insights drawn from this comprehensive mutational scan. We hope that the authors could provide more direction and mechanistic context in the discussion section to guide readers toward these next steps.

    5. Author response:

      We thank the reviewers for their excellent and thoughtful comments and suggestions, along with their strong support of the work. We agree with the general feedback that there is opportunity for further mechanistic dissection of the data from a variety of interesting angles. This was a fascinating project to work on because of all of the possible directions, and we attempted to highlight a diversity of compelling findings. We wish we had time to devote to answering more of the open mechanistic questions, but, given competing priorities, we are unfortunately unable to do them justice at this time. At the suggestion of a reviewer, we have made results available through MaveDB (accession numbers urn:mavedb:00001270-a and urn:mavedb:00001271-a) as a way to empower others to explore more.

    1. eLife Assessment

      The authors establish solid theoretical principles for designing brain perturbations under the assumption that brain activity evolves under a linear model. By prioritizing low-variance components, resonant frequencies, and hub nodes, this framework provides an important foundation for optimizing information gain, neural state classification, and the control of neural dynamics. However, the lack of investigation of model mismatch makes the study incomplete.

    2. Joint Public Review:

      Summary:

      Inferring so-called "functional connectivity" between neurons or groups of neurons is important both for validating models and for inferring brain state. Under the assumption that brain dynamics is linear, the authors show that the error in estimating functional connectivity depends only on the eigenvalues of the covariance matrix of the observed data, and it is the small eigenvalues, corresponding to directions in which the variance of the brain activity is low, that lead to large estimation errors. Based on this, the authors show that to achieve low estimation error, it's important to excite the resonant frequencies and perturb well-connected hubs. The authors propose a practical iterative approach to estimate the functional connectivity and demonstrate faster convergence to the optimal estimate compared to passive observation.
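
      The dependence on small eigenvalues can be made concrete with a short sketch. It does not reproduce the paper's own error expression: it assumes the textbook least-squares approximation in which the expected squared error scales as (sigma^2 / T) times the sum of inverse eigenvalues of the data covariance, and the eigenvalue spectra, noise variance, and function name are hypothetical.

```python
# Illustrative only: a textbook least-squares approximation, assumed here
# rather than taken from the paper, for how estimation error depends on the
# eigenvalues of the data covariance.

def expected_estimation_error(eigenvalues, noise_variance, num_samples):
    """Approximate squared error: (sigma^2 / T) * sum_i (1 / lambda_i)."""
    return noise_variance / num_samples * sum(1.0 / lam for lam in eigenvalues)


# Hypothetical covariance spectra.
passive = [1.0, 0.5, 0.01]    # resting state: one direction has very low variance
perturbed = [1.0, 0.5, 0.5]   # stimulation raises the variance along that direction

error_passive = expected_estimation_error(passive, noise_variance=1.0, num_samples=1000)
error_perturbed = expected_estimation_error(perturbed, noise_variance=1.0, num_samples=1000)

# Under the same approximation, doubling the recording length halves the error.
error_longer = expected_estimation_error(passive, noise_variance=1.0, num_samples=2000)
```

      Under these assumptions a single low-variance direction dominates the total error, so a perturbation that injects variance along that direction reduces the error far more than simply recording for longer.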

      Strengths:

      The main contribution of the study is the derivation of an explicit expression for the error in functional connectivity that depends only on the covariance matrix of the observed data. If valid, this result can have a profound impact on the field. The study also motivates the current shift to closed-loop experiments by demonstrating the effectiveness of active learning in the system using perturbation, in comparison to passive estimation from resting-state activity. Finally, the relative simplicity of the model makes its practical applications straightforward, as the authors illustrate in the context of brain state classification and neural control.

      Weaknesses:

      The derivation of the main error term misses some important steps, which complicates peer review at this stage. In particular, factorisation of the covariance into noise and the inverse of the observation covariance matrix needs a more thorough justification. The cited sources do not contain the derivation for a noise term with full covariance, which is essential for deriving this error term.

      The practical recommendation at the end of the paper also requires clearer guidance on how the design perturbations are constructed, and how many times and for how long the system is stimulated in each iteration of the experiment.

      Finally, there is no analysis of model mis-specification. In particular, the true dynamics are unlikely to be linear; the noise is unlikely to be either Gaussian or uncorrelated across time; and the B matrix is unlikely to be known perfectly. We're not suggesting that the authors consider a more complex model, but it's important to know how sensitive their method is to model mismatch. If nothing can be done analytically, then simulations would at least provide some kind of guide.

    3. Author response:

      We thank the editors and reviewers for their careful reading of our manuscript and for their insightful comments. We appreciate the opportunity to clarify several aspects of the derivations and experimental design, and we will revise the manuscript accordingly. Below we provide responses to the major weaknesses raised by the reviewers.

      The derivation of the main error term misses some important steps, which complicates peer review at this stage. In particular, factorisation of the covariance into noise and the inverse of the observation covariance matrix needs a more thorough justification. The cited sources do not contain the derivation for a noise term with full covariance, which is essential for deriving this error term.

      Thank you for pointing this out. We agree that the derivation of the main error term should be presented more explicitly to facilitate peer review. In the revised manuscript, we will explicitly cite the relevant equation numbers from the references to make each step of the argument easier to follow. We will also revise the text to more clearly discuss the assumption on the noise covariance matrix.

      The practical recommendation at the end of the paper also requires clearer guidance on how the design perturbations are constructed, and how many times and for how long the system is stimulated in each iteration of the experiment.

      Thank you for this helpful suggestion. We agree that the practical implementation of the experimental design should be explained more clearly. In the revised manuscript, we will provide a more explicit description of how the input perturbations are constructed in each iteration. To more clearly explain how many times and for how long the system is stimulated, we will clarify the stopping criterion used in the iterative procedure and the time length of the external inputs. As shown in Eq. (8), the estimation error scales approximately as 1/T, so longer measurements improve accuracy. For clearer guidance, we will add additional explanations on the relation between the stimulation time and estimation accuracy, as well as on the role of iterative input design.

      Finally, there is no analysis of model mis-specification. In particular, the true dynamics are unlikely to be linear; the noise is unlikely to be either Gaussian or uncorrelated across time; and the B matrix is unlikely to be known perfectly. We're not suggesting that the authors consider a more complex model, but it's important to know how sensitive their method is to model mismatch. If nothing can be done analytically, then simulations would at least provide some kind of guide.

      We thank the reviewer for raising this important point. We agree that it is important to understand how sensitive the proposed method is to model mismatch. While our current theoretical analysis assumes linear dynamics with Gaussian noise for analytical tractability, real systems may deviate from these assumptions in several ways, including nonlinear dynamics, temporally correlated noise, or imperfect knowledge of the input matrix B. To address this concern, we will add simulation experiments to examine the robustness of our method under several types of model misspecification. These simulations will provide practical guidance on how deviations from the assumed model affect estimation performance. We will include these results and discuss their implications in the revised manuscript.

    1. eLife assessment

      This important study uses state-of-the-art, multi-region two-photon calcium imaging to characterize the statistics of functional connectivity between visual cortical neurons. Although alternative interpretations may partially account for the data, the study provides solid evidence that functionally distinct classes of neurons convey visual information via parallel channels within and across both primary and higher-order cortical areas.

    2. Reviewer #1 (Public review):

      Summary:

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons, and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Strengths:

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. To control for potential influences of behaviour-related top-down modulation of noise correlations, the manuscript uses measurements of pupil dynamics as a proxy for behavioural state and shows that this top-down modulation cannot explain the stability of noise correlations across stimuli.

      Weaknesses:

      The interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicate the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.
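
      The point that shared state modulation alone can produce noise correlations can be illustrated with a toy calculation. The sketch assumes, purely for illustration, that behavioral state acts as a shared multiplicative gain on stimulus-driven responses; all numbers are made up, no connectivity between the neurons is modeled, and the noise correlation is taken to be the Pearson correlation of responses across repeated trials of one stimulus.

```python
import math

# Toy illustration (hypothetical numbers): a shared multiplicative gain yields
# high noise correlations between neurons that both respond to the stimulus,
# without any connection between them.

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm if norm > 0 else 0.0


# Repeated trials of ONE stimulus; the gain stands in for behavioral state.
gain = [0.8, 1.2, 0.9, 1.1, 1.0, 1.3, 0.7, 1.0]
private_a = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0, 0.1, -0.1]   # independent noise, neuron A
private_b = [-0.1, 0.0, 0.1, 0.0, -0.1, 0.1, -0.1, 0.1]   # independent noise, neuron B
private_c = [0.0, 0.1, -0.1, -0.1, 0.1, -0.1, 0.0, 0.1]   # independent noise, neuron C

# Neurons A and B are driven by the stimulus; neuron C does not respond to it.
drive_a, drive_b, drive_c = 10.0, 8.0, 0.0

resp_a = [g * drive_a + e for g, e in zip(gain, private_a)]
resp_b = [g * drive_b + e for g, e in zip(gain, private_b)]
resp_c = [g * drive_c + e for g, e in zip(gain, private_c)]

# With a single stimulus, the noise correlation is the trial-to-trial correlation.
nc_ab = pearson(resp_a, resp_b)   # co-tuned pair
nc_ac = pearson(resp_a, resp_c)   # responsive vs. non-responsive neuron
```

      In this toy case the co-tuned pair shows a noise correlation close to 1, while the pair that includes the non-responsive neuron stays near zero, which is why tuning-dependent noise correlations do not by themselves identify connectivity.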

    3. Reviewer #2 (Public review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimuli. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap, behavioral state). The paper demonstrates the robustness of the activity clustering analysis and of the activity correlation measurements. The paper shows convincingly that the correlation structure observed with grating stimuli is present in the responses to naturalistic stimuli. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. A methodological issue that does not seem completely addressed is whether the calcium imaging measurements with their limited sensitivity amplify the apparent dependence of noise correlations on the similarity of tuning. Although the paper shows that noise correlation measurements are robust to changes in firing rates / missing spikes, the effects of receptive field tuning dissimilarity are not addressed directly. The calcium responses of mouse visual cortical neurons are sharply tuned. Neurons with dissimilar receptive fields may show too little overlap in their estimated firing rates to infer noise correlations, which could lead to underestimation of correlations across groups of dissimilar neurons.

    4. Reviewer #3 (Public review):

      Summary:

      Yu et al. harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights on the correlation structure of visual responses across multiple areas.

      Strengths:

      The measurements of shared variability across multiple areas are novel. The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NC as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are among the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al., Stringer et al., among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus over those that are not responding (see Lin et al., Neuron 2015 for a similar point).
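
      The argument in (1a) can be illustrated with a toy simulation. The sketch below is not the authors' or the reviewer's model: the tuning curves, the Gaussian shared gain, and the additive private noise are all hypothetical choices made only to show that a shared multiplicative gain, with no connectivity between neurons, yields larger NCs for a cotuned pair than for a pair preferring different stimuli.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals_by_stimulus(responses, stim_ids):
    """Subtract each stimulus's mean response, leaving trial-to-trial 'noise'."""
    sums, counts = {}, {}
    for r, s in zip(responses, stim_ids):
        sums[s] = sums.get(s, 0.0) + r
        counts[s] = counts.get(s, 0) + 1
    return [r - sums[s] / counts[s] for r, s in zip(responses, stim_ids)]


def noise_correlation(resp_a, resp_b, stim_ids):
    """Correlation of the stimulus-mean-subtracted responses of two neurons."""
    return pearson(residuals_by_stimulus(resp_a, stim_ids),
                   residuals_by_stimulus(resp_b, stim_ids))


def simulate_shared_gain(seed=0, n_trials=400, gain_sd=0.3, noise_sd=1.0):
    """Return (NC of a cotuned pair, NC of a differently tuned pair)."""
    rng = random.Random(seed)
    # Hypothetical mean responses to six stimuli (arbitrary units).
    tuning_a = [10.0, 10.0, 0.5, 0.5, 0.5, 0.5]
    tuning_b = list(tuning_a)                    # cotuned with neuron A
    tuning_c = [0.5, 0.5, 10.0, 10.0, 0.5, 0.5]  # prefers other stimuli
    stim_ids, a, b, c = [], [], [], []
    for stim in range(len(tuning_a)):
        for _ in range(n_trials):
            # One gain value per trial, shared by all neurons (assumption).
            gain = max(0.0, rng.gauss(1.0, gain_sd))
            stim_ids.append(stim)
            # Each neuron also gets independent additive noise.
            a.append(gain * tuning_a[stim] + rng.gauss(0.0, noise_sd))
            b.append(gain * tuning_b[stim] + rng.gauss(0.0, noise_sd))
            c.append(gain * tuning_c[stim] + rng.gauss(0.0, noise_sd))
    return noise_correlation(a, b, stim_ids), noise_correlation(a, c, stim_ids)


if __name__ == "__main__":
    nc_cotuned, nc_different = simulate_shared_gain()
    print(f"cotuned pair NC: {nc_cotuned:.2f}, differently tuned pair NC: {nc_different:.2f}")
```

      In this toy setting the shared covariance on a given stimulus scales with the product of the two neurons' mean responses, which is why the cotuned pair dominates. Whether real gain modulation is this simple is exactly what is under discussion.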

      In the new version of the manuscript, behavioral modulations are explicitly considered in Figure S8. New analyses show that most of the variance of the neuronal responses is driven by the stimulus, rather than by behavioural variables. However, the new analyses still do not address whether the shared noise correlation in cotuned neurons is also independent of behavioral modulations.

      As behavioral modulations are not considered, this confound affects the conclusions, including the conclusion that activity is communicated unmixed across areas (results in Figure 4), as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support the main claims.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels, as stated in the title of the paper. This discreteness is based on an unbiased clustering approach on the tuning of neurons, followed by a manual grouping into six categories with relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

    5. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The main point converged on a potential alternative interpretation that top-down modulation to the visual cortex may be contributing to the NC connectivity we observed. For this revision, we address that point with new analysis in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the pupil-modulated portion from the neural activity, the cross-stimulus stability of the NC was preserved (Fig. S8D).
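
      For readers who want a concrete picture of "variance explained by pupil dynamics" and of removing the pupil-modulated portion, here is a minimal sketch. It assumes a single-neuron ordinary least-squares fit of activity on pupil size; the response does not specify the model actually used, so the function name `regress_out_pupil` and the linear form are illustrative assumptions only.

```python
from statistics import fmean


def regress_out_pupil(trace, pupil):
    """Fit trace ~ intercept + slope * pupil by ordinary least squares.

    Returns (r_squared, residuals): the fraction of the trace's variance
    explained by pupil size, and the trace with the fitted pupil-related
    component removed. Both inputs are equal-length sequences of floats
    and must not be constant.
    """
    mean_p, mean_t = fmean(pupil), fmean(trace)
    s_pp = sum((p - mean_p) ** 2 for p in pupil)
    s_tt = sum((t - mean_t) ** 2 for t in trace)
    s_pt = sum((p - mean_p) * (t - mean_t) for p, t in zip(pupil, trace))
    slope = s_pt / s_pp
    intercept = mean_t - slope * mean_p
    residuals = [t - (intercept + slope * p) for p, t in zip(pupil, trace)]
    # For a one-predictor linear fit, R^2 equals the squared Pearson correlation.
    r_squared = s_pt ** 2 / (s_pp * s_tt)
    return r_squared, residuals


if __name__ == "__main__":
    pupil = [0.0, 1.0, 2.0, 3.0, 4.0]
    trace = [1.0, -1.0, 0.0, 1.0, -1.0]
    r2, cleaned = regress_out_pupil(trace, pupil)
    print(f"variance explained by pupil: {r2:.2f}")
```

      Noise correlations could then be recomputed on the residuals of each neuron to check whether their structure survives the removal, which is the logic described for Fig. S8D.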

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, whereas mice were in spontaneous locomotion in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network are necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when we increased the contribution from the top-down input to more than 1/3 of the contribution from the stimulus could the cross-stimulus NC connectivity be generated by the top-down modulation (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation was not playing a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicates the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and stated a concern that the noise correlation can be explained by top-down modulation. We have addressed this concern carefully in the revision, please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem addressed (but it is addressed indirectly, I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if at any time at all. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual stimuli explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavior studies are beyond the scope of the current manuscript, we have included additional analysis and discussion on whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found the pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the-art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NC as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al., Stringer et al., among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus over those that are not responding (see Lin et al., Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. Following the authors' claims, it would be important to establish whether belonging to the same group, rather than tuning similarity, is the defining feature for showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints to network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion. We now clearly state in the Abstract and Introduction that the study was carried out in mice.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion, we have clarified this in the Fig. 4 legend.

      Each HVA has a SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs for that V1-HVA pair (summed over all SFTF groups). The trend is that for HVAs with a bias towards a particular SFTF group, there are also more high NC pairs in that SFTF group, and thus it is consistent with the model on the right side. This is not circular because it is possible to have a SFTF bias in an HVA and have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: that the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.
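
      The x-axis quantity described above can be written down in a few lines. This is only a sketch of the stated ratio: the input layout (a list of `(group, nc)` tuples for one V1-HVA pair) and the `threshold` defining a "high NC" pair are hypothetical, since the response does not give the criterion used.

```python
from collections import Counter


def high_nc_fraction_by_group(pairs, threshold):
    """Fraction of high-NC neuron pairs that fall in each SFTF group.

    pairs: iterable of (group_label, nc_value) for the in-group neuron
           pairs recorded from one V1-HVA pair (hypothetical layout).
    threshold: NC value above which a pair counts as "high NC"
               (placeholder; not taken from the manuscript).
    """
    counts = Counter(group for group, nc in pairs if nc > threshold)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Numerator: high-NC pairs in this group; denominator: all high-NC pairs.
    return {group: n / total for group, n in counts.items()}


if __name__ == "__main__":
    example = [("g1", 0.5), ("g1", 0.4), ("g2", 0.6), ("g2", 0.0), ("g3", 0.1)]
    print(high_nc_fraction_by_group(example, threshold=0.3))
```

      Because the ratio counts pairs rather than averaging NC magnitudes, it is a different quantity from the per-group mean NC, which is the contrast the response draws between Fig. 4C and Fig. 4D.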

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message clearer.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility:

      "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependence of correlations on tuning similarity is not an artifact of how the data were acquired are, in my mind, missing; if that is the case, it is crucial that this be addressed.

      At each step of data analysis, we performed control analyses to assess the fidelity of the conclusions, for example on spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experimental units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We do take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.
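
      To make the reviewer's suggestion concrete, the following is a minimal sketch of a two-level (hierarchical) bootstrap, in which animals are resampled first and neurons are then resampled within each chosen animal. The function names, the animal IDs, and the toy values are all hypothetical; this is not the analysis used in the manuscript.

```python
import random
from statistics import mean


def hierarchical_bootstrap_means(data_by_animal, n_boot=2000, seed=0):
    """Bootstrap the grand mean by resampling animals, then neurons within animals.

    data_by_animal maps a (hypothetical) animal ID to a list of per-neuron
    values, for example noise correlations.
    """
    rng = random.Random(seed)
    animals = list(data_by_animal)
    boot_means = []
    for _ in range(n_boot):
        # Level 1: resample animals with replacement.
        sampled_animals = rng.choices(animals, k=len(animals))
        values = []
        for animal in sampled_animals:
            neurons = data_by_animal[animal]
            # Level 2: resample neurons within the chosen animal.
            values.extend(rng.choices(neurons, k=len(neurons)))
        boot_means.append(mean(values))
    return boot_means


def percentile_interval(samples, lower=2.5, upper=97.5):
    """Percentile confidence interval from a list of bootstrap samples."""
    ordered = sorted(samples)
    lo_idx = int(len(ordered) * lower / 100)
    hi_idx = min(len(ordered) - 1, int(len(ordered) * upper / 100))
    return ordered[lo_idx], ordered[hi_idx]


# Toy data: three animals with different numbers of neurons (made-up values).
toy = {
    "mouse_a": [0.10, 0.12, 0.08, 0.11],
    "mouse_b": [0.02, 0.03, 0.01],
    "mouse_c": [0.06, 0.07, 0.05, 0.06, 0.08],
}
boot = hierarchical_bootstrap_means(toy, n_boot=1000, seed=1)
ci_low, ci_high = percentile_interval(boot)
```

      Because whole animals are resampled at the first level, between-animal variability widens the interval, which is the reason the reviewer prefers this scheme to pooling neurons.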

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand how correlation is the best way to quantify the consistency of class center with a subset of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class across different GMMs was identified by matching the class centers.
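
      The matching step described above can be sketched as follows. This is an illustration under stated assumptions: the centers are short made-up vectors rather than 45-dimensional ones, `match_centers` is a hypothetical helper, and the greedy best-correlation rule is an assumption, since the response does not specify how centers were matched.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def match_centers(centers_a, centers_b):
    """For each center in A, return (index of best-correlated center in B, correlation)."""
    matches = []
    for center_a in centers_a:
        corrs = [pearson(center_a, center_b) for center_b in centers_b]
        best = max(range(len(corrs)), key=corrs.__getitem__)
        matches.append((best, corrs[best]))
    return matches


# Made-up centers from two clustering runs on different random subsets.
# Run 2 contains the same classes in a different order, slightly perturbed.
centers_run1 = [[1.0, 0.0, -1.0, 0.5], [0.0, 2.0, 0.0, -1.0], [-1.0, -1.0, 1.0, 1.0]]
centers_run2 = [[0.1, 1.9, 0.0, -1.1], [-0.9, -1.1, 1.0, 0.9], [1.1, 0.0, -0.9, 0.4]]
matches = match_centers(centers_run1, centers_run2)
```

      A greedy rule can assign two classes to the same partner when centers are similar; a one-to-one assignment would avoid that, at the cost of a slightly longer implementation.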

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections', which I presume means correlations, is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion, we have clarified this part in the figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7, which situate the study results in a broader context, are welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion, we have improved our Methods section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation are not formally defined.

      We have included the definition of signal correlation in the Methods.
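
      For readers unfamiliar with the two quantities, the sketch below uses one common convention: signal correlation is the Pearson correlation of trial-averaged responses across stimuli, and noise correlation is the Pearson correlation of trial-to-trial residuals around those averages. This convention is an assumption and may differ in detail from the definition given in the Methods; all values are made up.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def signal_and_noise_correlation(resp_a, resp_b):
    """resp_x[s][t] is the response of neuron x to stimulus s on trial t."""
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    # Signal correlation: similarity of the trial-averaged tuning.
    signal_corr = pearson(mean_a, mean_b)
    # Noise correlation: similarity of trial-by-trial deviations from the mean.
    resid_a = [r - m for trials, m in zip(resp_a, mean_a) for r in trials]
    resid_b = [r - m for trials, m in zip(resp_b, mean_b) for r in trials]
    noise_corr = pearson(resid_a, resid_b)
    return signal_corr, noise_corr


# Toy pair: opposite tuning across three stimuli, but shared trial-to-trial noise.
shared_noise = [0.2, -0.1, 0.3, -0.4]
resp_a = [[base + e for e in shared_noise] for base in (1.0, 3.0, 5.0)]
resp_b = [[base + 0.5 * e for e in shared_noise] for base in (5.0, 3.0, 1.0)]
signal_corr, noise_corr = signal_and_noise_correlation(resp_a, resp_b)
```

      The toy pair is built so that the two measures dissociate: tuning is anti-correlated while the shared fluctuations are positively correlated.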

      The number of visual stimulation trials is not stated in Methods. It is only stated in the figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of visual cortex can be used to identify visual areas.

      It is true that areas can be mapped using the GCaMP signals. That is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well-refined routine in our lab for over a decade. It is part of our standard protocol. One advantage is that the data (from intrinsic signals) are of the same nature every time. This enables us to use the same mapping procedure no matter what reporters mice might be expressing (and the pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that the larger intra-group NCs observed simply reflect a multiplicative gain on cotuned neurons could be addressed using pupil and/or face recordings: Does pupil size or facial motion predict NCs and, if factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in the general response.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly, further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      I also found many places where the manuscript needs clarification and/or more methodological details:

      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised the Figure 1E (current Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. eLife Assessment

      This useful study presents a simple homeostatic-plasticity model in spiking E-I networks to link spontaneous critical dynamics with representational drift and relatively stable stimulus-response geometry in mouse visual cortex. However, the evidence is incomplete because key concepts and analysis details are not well defined, controls are limited, and several results might be the result of specific methodological choices (e.g., dimensionality reduction, aggregation, or tuned parameters) rather than a robust mechanism. As a result, the work currently supports an interesting correlation between these phenomena, but not a clear causal account.

    2. Reviewer #1 (Public review):

      Summary:

      The authors study criticality and drift in spontaneous activity observed in visual cortex of mice from existing data, and relate it to a model based on homeostatic plasticity. The main phenomena are power laws and an alignment across different neural representations that is maintained through drift.

      Strengths:

      The authors should be commended for making the effort of relating their model to experimental data. The mechanism that they propose has the advantage of being simple, and could unify various phenomena.

      Weaknesses:

      Introduction/abstract: General wording: the notion of reliability, which is key to the paper, is not explicitly defined anywhere. The authors refer to some notion of information being preserved, but again, this is not clearly explained. A good example is the sentence "identical input signals exhibit significant variability but also share certain reliability across sessions". Depending on the definition of reliability, the sentence could be a contradiction. A similar issue appears when the authors talk about "restricted" representation. I get what they want to say, but it's not properly defined. "One example is the recent studies about stimulus-evoked..." The sentence explains that there are examples, but provides no citations! Also: "One" and "exampleS".

      Fig. 1:

      - The method used to fit the power law is not detailed in the Methods (just a vague reference to a package). This is a problem because some methods, such as least squares, do not do well on power laws, particularly in neuroscience, where sampling is low (Wilting & Priesemann, Nat com.).

      - The "olive" curve is not "olive". Olive is dark green, and the color is purple. The problem appears in the subsequent figure.
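
      As an illustration of the alternative to least-squares fitting, here is a minimal sketch of the standard maximum-likelihood estimator for the exponent of a continuous power law, alpha = 1 + n / sum(ln(x / x_min)). It assumes a known lower cutoff `x_min` and continuous data; avalanche sizes are discrete, so this is only an approximation and not the procedure used in the manuscript. The function names and parameter values are hypothetical.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling from p(x) ~ x^(-alpha) for x >= x_min."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha_mle(samples, x_min):
    """Maximum-likelihood exponent and its approximate standard error."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + len(tail) / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(len(tail))
    return alpha_hat, std_err


rng = random.Random(0)
true_alpha = 1.5  # illustrative value only
samples = sample_power_law(20000, true_alpha, x_min=1.0, rng=rng)
alpha_hat, std_err = fit_alpha_mle(samples, x_min=1.0)
```

      With few samples the standard error grows as 1 / sqrt(n), which is one way to express the reviewer's concern about low sampling.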

      Fig. 2:

      - The number of neurons is very small (19). This is very odd, since the original dataset has a lot of neurons. Also, the authors seem to pick ages 97 and 102, but do not explain why those two points have any relevance.

      - If you run a correlation, you need to explain which correlation it is (Pearson, Spearman?). It also matters whether the variables are normalized or not, and there is no control for shuffling.

      - The authors mention "low dimensional", but don't explain what method they use (it looks like t-SNE to me).

      - The authors use the word "signal", while in the text they refer to the "mean activity". Are those the same?

      - "We reproduced previous results showing that low-dimensional embeddings of mean population response vectors for different signals remain similar across sessions": The blue and green clusters that the authors report as being close across sessions are not close. Red-green-grey seem to remain closer, but even that is quite a stretch.

      - Correlation across matrices is strange. Since the authors did not clarify the actual formula or method, the correlation of 0.5 in Fig. 2E could be simply due to the fact that all the variables are pre-selected to be positive (or above threshold). This would also have an important effect on the angle (Fig. G). In fact, it would explain why the correlation does not decrease with Delta T (which is what would be expected from drift).

      - Whenever the authors run a statistical analysis, it would help to run a shuffled control.
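
      The concern about positive-only variables and the value of a shuffled control can be illustrated with a small sketch. It assumes, purely for illustration, that the similarity measure is an uncentered one (cosine similarity); the manuscript's actual formula is not known, and the data here are independent random numbers.

```python
import math
import random


def pearson(u, v):
    """Mean-centered (Pearson) correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def cosine(u, v):
    """Uncentered similarity; also determines the angle between the vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def shuffled_null(u, v, stat, n_shuffles=200, seed=0):
    """Null distribution of stat(u, v) obtained by permuting the entries of v."""
    rng = random.Random(seed)
    shuffled = list(v)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(stat(u, shuffled))
    return null


# Two independent, strictly non-negative vectors (no true relationship).
rng = random.Random(42)
x = [rng.random() for _ in range(2000)]
y = [rng.random() for _ in range(2000)]

obs_cos = cosine(x, y)
obs_r = pearson(x, y)
null_cos = shuffled_null(x, y, cosine)
null_mean = sum(null_cos) / len(null_cos)
```

      Here the uncentered similarity is high even though the vectors are unrelated, and the shuffled baseline is just as high, so only the difference from the shuffled null is informative.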

      Self-organised criticality emerges through homeostatic plasticity:

      - The authors refer a lot to reference 35, but it's not clear what the difference is between their work and that one.

      - The text provides a general overview and refers to the Methods for details. Since most of the results are based on that model, I suggest putting it in the main text (although this is an opinion, not a dealbreaker).

      - In particular, mention which populations we are talking about, how many neurons are in each, and how they are connected.

      • Fig. 4 has a lot of the same weaknesses as Fig. 2. In fact, the results on E are very similar, despite the fact that the matrices in D are clearly not the same.

      Enhanced neural representation through self-organised criticality: The phase transition seems to be an observation over a computational model, but I don't see much analysis. It would be nice to have some order parameter, although the plots are convincing without it. The authors do spend time talking about co-spiking and silent periods, though, but don't actually plot this. The only reference is to S4, which actually only seems to cover the super-critical state.

      Fig 6:

      - It might be true that the accuracy peaks at the critical point, but it's really hard to call it significant. The authors should run multiple models and assess significance.

      - I don't entirely see the point of C. What does it mean for the model? And although I assume it is on the same experimental data, the authors do not mention it.

      Fig. 7:

      - The plot is squeezed and has low resolution.

      - Since the authors didn't clarify whether they have II connections or not (some models use them, some don't), or whether their plasticity applies to inhibitory neurons, it is very hard to assess what the differences between A and B are.

      References: There is a fair number of works that have studied computational models for criticality. I am particularly thinking of the works of Bruno del Papa, "Criticality meets learning: Criticality signatures in a self-organizing recurrent neural network". Experimentally, there are works showing that the so-called spontaneous activity is actually very reliable (if you record enough neurons): Nguyen, Nghia D., et al. "Cortical reactivations predict future sensory responses." Nature 625.7993 (2024): 110-118.

      An important point missing in this work is that it assumes that spontaneous activity is somehow intrinsically generated. This is not necessarily true of cortical areas (where it could easily come from hippocampus).

    3. Reviewer #2 (Public review):

      This work attempts to reconcile the concepts of critical neural dynamics with short-term reliable responses and long-term drifting responses. This is an important question, because critical dynamics are typically associated with unpredictable population responses to perturbations. Instead, this paper demonstrates that recordings from the mouse visual cortex include typical avalanche statistics in their spontaneous state as well as clustered within-session responses to natural movies. The authors find that a spiking neural network with homeostatic plasticity on inhibitory coupling captures the correlation-based metrics observed in experiments and that this network self-organizes into a critical state.

      Strengths:

      The structure of the manuscript is clear, and the line of argumentation is easy to follow. The question raised is valid, and the model employed to answer it is adequate. While I am unsure if representation should be equated with reliable responses, I find the framework of reliable responses well-suited to compare experimental and numerical data.

      Weaknesses:

      • The claim that the presented model "self-organizes to the critical spontaneous state" is incompatible with Fig. 6 showing that the inhibitory timescale is a control parameter of the transition from subcritical to supercritical avalanche statistics.

      • The notion of "drift" implies to me a gradual change on long timescales. This is demonstrated in Ref. [47] for a model including two different types of plasticity. Also, such a drift over time was observed in Ref. [11] Fig.3C. In the present work, we can see from Fig. 2E that the correlation drops immediately to a plateau. Instead, the model actually shows some decay of correlations, expected from the ongoing plasticity. This challenges the claim that the "model successfully reproduce[s] both representational drift and [...]". Instead, the model of [47] does reproduce representational drift.

      • The claim that "spontaneous self-organized criticality serves as [...] functional mechanism for maintaining reliable information representation under continuously changing networks" is not justified by the above-raised points.

      • From the methods, I understand that the dimensionality reduction in Fig.2C and Fig.4C is a result of independent t-SNE. Since t-SNE to my knowledge starts with a random projection of data to then optimize the embedding, the resulting orientation of independent runs cannot be compared such that statements like "rotation of low-dimensional representations as in Fig. 2C, where nodes (centers of the same-color clusters) change their positions across sessions (top panel and bottom panel), but their relative positions remain stable" are not possible.
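
      The point about independent embeddings can be sketched with a toy example: if two 2D embeddings differ by an arbitrary rotation, raw coordinates are not comparable, whereas pairwise distances between cluster centers are unchanged. The centers and the rotation angle are made up, and a pure rotation is an idealization; real t-SNE runs can also distort distances, which this sketch does not capture.

```python
import math


def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise_distances(points):
    """Distances between all unordered pairs of points, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Hypothetical cluster centers in one embedding, and the same geometry
# as it might appear in an independently initialized embedding.
centers_session1 = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
centers_session2 = rotate(centers_session1, math.radians(70))

# Raw coordinates move a lot, although nothing about the geometry changed.
coord_shift = max(math.dist(p, q) for p, q in zip(centers_session1, centers_session2))

# Pairwise distances are invariant to the arbitrary orientation.
dist_change = max(
    abs(a - b)
    for a, b in zip(pairwise_distances(centers_session1), pairwise_distances(centers_session2))
)
```

      An apparent "rotation" between two independent runs is therefore uninformative on its own; any comparison would need an orientation-invariant measure or a shared embedding.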

    4. Reviewer #3 (Public review):

      Summary:

      This study uses computational modeling of a spiking network of E-I with homeostatic inhibitory plasticity and aims to show that self-organized criticality that arises from the homeostatic mechanism can result in representational drift as well as reliable stimulus representation, because the geometric representation of stimuli remains restricted.

      Strengths:

      This paper provides a framework to link critical spontaneous state, homeostatic inhibitory plasticity, representational drift, and stimulus population response reliability.

      Weaknesses:

      The study does not show a causal (or necessary/ sufficient) relationship between criticality at the spontaneous state, representational drift, and reliable stimulus presentation. The study only reports an observation that these features could co-exist. However, it does not show how the criticality of the spontaneous state could restrict the manifold for stimulus response.

    1. eLife Assessment

      Using a combination of innovative and robust techniques, this study outlines cell-type-specific translational landscape changes that occur in the spinal cord neurons in the early and late phases of nerve injury. The authors provided compelling evidence suggesting an essential role of protein synthesis regulation in the chronic phase of neuropathic pain. Although additional mechanisms contributing to late-phase neuropathic pain beyond altered PV+ neuron excitability remain to be elucidated, this is a fundamental and significant study toward a comprehensive understanding of the molecular pathways involved in neuropathic pain.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript compares transcription and translation in the spinal cord during the acute and chronic phases of neuropathic pain induced by surgical nerve injury. The authors chose to focus their investigation on translation in the chronic phase due to its greater impact on gene expression in the spinal cord compared to transcription.

      (1) The study is significant because the molecular mechanisms underlying chronic pain remain elusive. The role of translational regulation in the spinal cord has not been investigated in neuroplasticity and chronic pain mouse models. The manuscript is innovative and technically robust. The authors employed several cutting-edge techniques such as Ribo-seq, TRAP-seq, slice electrophysiology, and viral approaches. Despite the technical complexity, the manuscript is well-written. The authors demonstrated that inhibition of eIF4E alleviates pain hypersensitivity, that de novo protein synthesis is more pronounced in inhibitory interneurons, and that manipulating mTOR-eIF4E pathways alters mechanical sensitivity and neuroplasticity.

      (2) Strengths: innovation (conceptual and technical levels), data support the conclusions.

      Comments on revisions:

      The authors did a great job addressing my comments.

    3. Reviewer #4 (Public review):

      Summary:

      The significance of this study lies in its focus on translational regulation in the late phase of neuropathic pain, using both genetic and pharmacological approaches, with specific emphasis on parvalbumin-positive (PV⁺) inhibitory interneurons in the spinal cord. The authors are very responsive to all the reviewers' comments.

      Strengths:

      I did not review this manuscript in the first round. However, the authors have been highly responsive to the reviewers' comments and have substantially strengthened the study. They conducted new behavioral experiments that yielded informative negative results (Fig. 6A and 6B). These findings demonstrate that targeting translational control in PV neurons is sufficient to reverse SNI-induced reductions in PV neuron excitability, but insufficient to ameliorate behavioral phenotypes. This suggests that additional cell types and pathways contribute to late-phase neuropathic pain.

      Weaknesses:

      Only the withdrawal threshold was measured to assess neuropathic pain. Some studies only used female mice. However, the authors appropriately discuss the study's limitations in the final two paragraphs and have added experimental details to improve clarity. Overall, the manuscript has been significantly improved.

    4. Reviewer #5 (Public review):

      Summary:

      This study investigates the molecular mechanisms underlying the maintenance of neuropathic pain, specifically focusing on the role of mRNA translation in the spinal cord. Using the Spared Nerve Injury (SNI) model, the authors demonstrate that while both transcription and translation are active in the early phase, the chronic phase (day 63) is uniquely characterized by a shift toward translational control. They identify spinal inhibitory neurons, particularly parvalbumin-positive interneurons, as key sites of this translational regulation.

      Strengths:

      Technical Rigor: The use of Ribo-seq and TRAP-seq allows for a high-resolution view of the "translatome," which more accurately reflects the functional protein output than standard mRNA-seq.

      Novelty: The study uncovers that reducing a single translation initiation factor (eIF4E) specifically in the CNS is sufficient to provide long-lasting relief from established chronic pain.

      Addressing Disinhibition: The electrophysiological evidence showing that increased translation in PV+ neurons reduces their excitability provides a clear mechanism for the "spinal disinhibition" typically seen in chronic pain.

      Weaknesses:

      Cell-Type Sufficiency: New experiments in the revision show that while inhibiting translation in PV+ neurons restores their individual excitability, it is not sufficient on its own to reverse behavioral pain hypersensitivity. This suggests that the maintenance of chronic pain likely involves translational changes across a broader network of cell types, including other inhibitory neurons or non-neuronal cells like microglia. This does not have to be resolved in the current study, but providing some framework to account for potential mechanisms might help the audience.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the role of transcriptional and translational controls of gene expression in dorsal root ganglia and lumbar spinal cord in neuropathic pain in mice. Using ribosome profiling (Ribo-seq) and translating ribosome affinity purification (TRAP), they show changes in transcriptomic and translational gene expression at the peripheral and central levels rapidly after nerve injury. While translational changes in gene expression remained elevated for more than two months in both DRGs and the spinal cord, transcriptomic regulation was absent in the spinal cord long after the onset of neuropathy. Disrupting mRNA translation in dorsal horn neurons using antisense oligonucleotides reduced mechanical withdrawal threshold and facial expression of pain. Using fluorescent noncanonical amino acid tagging (FUNCAT), the authors further show that de novo protein expression primarily occurs in inhibitory neurons in the superficial dorsal horn after nerve injury. Accordingly, a selective increase in translational control of gene expression in spinal inhibitory neurons, or a subset of mainly inhibitory neurons expressing parvalbumin (PV), using transgenic mice, led to a decrease in the excitability of PV neurons and mechanical allodynia. In contrast, decreasing the translational control of spinal PV neurons prevented the alteration of the electrophysiological properties of the PV cells induced by nerve injury.

      Strengths:

      This is a well-written article that uncovers a previously unappreciated role of gene expression control in PV neurons, which seems to play an important part in the loss of inhibitory control of spinal circuits typically seen after peripheral nerve injury. The conclusions are generally well supported by the data.

      Weaknesses:

      The study would benefit from further clarifications in the methods section and a deeper analysis of gene expression changes in mRNA expression and ribosomal footprint observed after nerve injury.

      We have improved the description of the methods and clarified the rationale underlying the presentation of gene expression changes. We have also added lists of the top differentially expressed genes at both the translational and transcriptional levels to Figure 1, and improved the description of the datasets in the Supplementary Materials.

      Antisense oligonucleotides used to reduce translation by disrupting eIF4E expression were administered i.c.v. It is unknown if the authors controlled for locomotor deficits, which might add confounds in the interpretation of behavioral results. A more local route should have been preferable to avoid targeting brain regions, which could potentially affect behavior.

      Thank you for raising this important point. We used i.c.v. administration to specifically target the central nervous system (CNS) without affecting the peripheral nervous system, as this is the recommended approach for selectively targeting the CNS using ASOs. Intraspinal administration of ASOs (into the spinal cord parenchyma) at an effective dose for long-term effects is not feasible. Intrathecal administration is possible but would result in exposure of the DRGs to the injected ASO and therefore would not be specific to the CNS.

      To rule out potential locomotor deficits, we have now subjected mice to the rotarod and open field tests to assess motor function. We found no differences between eIF4E-ASO– and control-ASO– injected mice (Fig. 2J, K).

      In the revised version of the manuscript, we now better explain the rationale for i.c.v. injection. Moreover, we discuss the potential supraspinal effects of eIF4E-ASO in the Limitations section, while also describing the lack of motor phenotypes in the rotarod/open field tests.

      Only female mice were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology, but both sexes were used for behavior experiments.

      Our manuscript involves various complicated techniques and analyses. Due to limited resources, we therefore opted to use only females for expensive and labor-intensive experiments, such as Ribo-Seq, TRAP, FUNCAT, and electrophysiology, while using both sexes for behavioral studies.

      We now clearly acknowledge this limitation in the revised manuscript.

      The conditional KO of 4E-BP1 using transgenic animals should be total in the targeted cells. However, only a partial reduction is reported in Figure S2 in GAD2, PV, Vglut2, or Tac1 cells. Again, proper methods for quantification of fluorescence in these experiments are lacking.

      We apologize for the oversight; we have now updated the description of the methods for IHC signal quantification. Although genetic ablation is indeed expected to result in a complete loss of signal, in practice, previous studies employing IHC, but not Western blotting, for 4E-BP1 have also shown only a partial reduction in signal. This is likely because the 4E-BP1 antibody partially detects other epitopes. Using the same antibody, we and others have shown complete elimination of the band corresponding to 4E-BP1 in spinal cord and DRG tissue (e.g., PMID: 26678009).

      The elegant knockdown of eIF4E using AAV-mediated shRNAmir shows a recovery of the electrophysiological intrinsic properties of PV neurons after injury. It is unclear if such manipulation would be sufficient to reverse mechanical allodynia in vivo.

      Thank you for this concern, which was also raised by other reviewers. We have now performed two additional experiments, which revealed that suppressing the mTORC1–eIF4E axis in spinal PV neurons (using AAVs expressing eIF4E-shRNA in spinal PV neurons [Fig. 6A] and transgenic mice expressing non-phosphorylatable 4E-BP1 in PV neurons [Fig. 6B]) is not sufficient to alleviate neuropathic pain. These new findings need to be reconciled with our other results showing that eIF4E downregulation in PV neurons prevents the SNI-induced reduction in their excitability, and that ASO-mediated suppression of eIF4E, which affects all cell types, alleviates neuropathic pain.

      Together, these results suggest that targeting translational control in PV neurons is sufficient to reverse SNI-induced reduction in PV neuron excitability, but is not sufficient to prevent behavioral phenotypes, which likely require changes in other cell types and/or additional pathways, as well as other alterations within PV neurons. We have now included these new results in the revised manuscript (Fig. 6A and Fig. 6B) and revised the text accordingly. These changes include toning down the role of translational control in PV neurons after SNI in driving behavioral hypersensitivity.

      Reviewer #2 (Public review):

      Summary:

      I reviewed the manuscript titled "Translational Control in the Spinal Cord Regulates Gene Expression and Pain Hypersensitivity in the Chronic Phase of Neuropathic Pain." This manuscript compares transcription and translation in the spinal cord during the acute and chronic phases of neuropathic pain induced by surgical nerve injury. The authors chose to focus their investigation on translation in the chronic phase due to its greater impact on gene expression in the spinal cord compared to transcription.

      (1) The study is significant because the molecular mechanisms underlying chronic pain remain elusive. The role of translational regulation in the spinal cord has not been investigated in neuroplasticity and chronic pain mouse models. The manuscript is innovative and technically robust. The authors employed several cutting-edge techniques such as Ribo-seq, TRAP-seq, slice electrophysiology, and viral approaches. Despite the technical complexity, the manuscript is well-written. The authors demonstrated that inhibition of eIF4E alleviates pain hypersensitivity, that de novo protein synthesis is more pronounced in inhibitory interneurons, and that manipulating mTOR-eIF4E pathways alters mechanical sensitivity and neuroplasticity.

      Strengths:

      Innovation (conceptual and technical levels), data support the conclusions.

      Weakness:

      Confusion about the sex of the animals. It is unclear whether eIF4E ASO affects translation and which cells. It is not determined that modulating translation in PV<sup>+</sup> neurons impacts neuropathic pain behaviors.

      We thank the reviewer for their thoughtful comments. In the revised version of the manuscript, we better explain that both sexes were used for behavioral experiments, whereas only females were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology experiments.

      ASOs are not known to be intrinsically cell-type-specific; therefore, we do not expect differential effects on excitatory versus inhibitory neurons. We demonstrated that eIF4E-ASO reduces the levels of eIF4E, a key translation initiation factor that is rate-limiting for cap-dependent translation.

      Moreover, in the revised manuscript we included two additional experiments (Fig. 6A and Fig. 6B) showing that decreased eIF4E-dependent translation in PV neurons is not sufficient to alleviate neuropathic pain, despite its effects on excitability measures. We have updated the manuscript to reflect these important new findings.

      Reviewer #3 (Public review):

      Summary:

      This study provides evidence for translational changes in inhibitory spinal dorsal horn neurons following chronic nerve injury. Gene expression changes have been widely studied in the context of pain induction and provided key insights into the adaptation of the nervous system in the early phases of chronic pain. Whereas this is interesting biologically, most patients will arrive in the clinic beyond the acute phase of their injury, thus limiting the translational relevance of these studies. Recent studies have extended this work to highlight the difference between acute and chronic pain states, potentially explaining the cascading factors leading to chronic pain, and hopefully how to prevent this in vulnerable populations. The present study suggests that translational changes within spinal inhibitory populations could underlie long-term chronic pain, leading to decreased inhibition and heightened pain thresholds.

      Strengths:

      The approaches used and the broad outcomes of the manuscript are interesting and could be an exciting development in the field. The authors are using approaches more common in molecular biology and extending these into neuroscientific research, getting into the detail of how pathology could impact gene expression differentially across the course of an injury. This could open up new areas of research to selectively target not only defined populations but additionally help alleviate pain symptoms once an injury has already reached the maintenance phase. There is an opportunity to delve into what must be a very large data set and learn more about what genes are differentially translated and how this could affect circuit function.

      Weaknesses:

      Whereas the authors approach a key question in pain chronicity, the manuscript falls a little short of providing any conclusive data. The manuscript was in some areas very difficult to follow. Terminology was not always consistent or clear, and the flow of the manuscript could use some attention to highlight key areas. Whereas the overall message is clear in the summary, this would not necessarily be the case when reading the manuscript alone.

      To improve the clarity and flow of the manuscript, we made changes to the text, including the addition of intermediate summaries and further explanations of terms and experiments.

      The study claims to show that translational control mechanisms in the spinal cord play a role in mediating neuropathic pain hypersensitivity, but the studies presented do not fully support this statement. The authors instead provide some correlation between translation and behavioural reflex excitability (namely vfh and Hargreaves).

      It is difficult to fully interpret the work, as there are a number of inconsistencies, namely the range of timings pre- and post-injury, lack of controls for manipulations, the use of shmiRNA versus lineage deletions, and lack of detailed somatosensory testing. It is not completely clear how this work could be translatable as is, without a deeper understanding of how translational control affects circuit function and whether all of this is necessarily bad for the system, or whether this is a positive homeostatic adaptation to the hyperexcitability of the circuit following injury.

      A large portion of the work is focussed on showing an inhibitory-selective change in translation following chronic nerve injury. The evidence for this is however lacking. Statistics to show that translational effects are restricted to inhibitory subpopulations are inadequate. The authors' choice of transgenic lines is not clear and seems to rely on availability rather than hypothesis.

      Although we agree with some of the criticism, we have reservations regarding other points raised by the reviewer. To address several of the concerns, we added new experiments (Fig. 2J, 2K, 6A, and 6B). We also made changes to the text to improve readability and to better explain the rationale for the study and our focus on inhibitory neurons.

      For example, we clarify that we do not state that changes in mRNA translation in the spinal cord during the chronic phase of neuropathic pain occur exclusively in inhibitory neurons. Although we observe changes in general protein synthesis, assessed using FUNCAT, in inhibitory but not excitatory neurons after SNI, alterations in the translation of specific transcripts, assessed using the TRAP approach, are observed in both excitatory and inhibitory neurons.

      The second part of the paper focuses on inhibitory neurons because these neurons demonstrate larger translational changes. We now clearly indicate that alterations in excitatory neurons are also likely important during the chronic phase of SNI. This conclusion is further supported by newly added results (Fig. 6A and Fig. 6B), showing that targeting eIF4E-dependent translation in spinal PV neurons using two different approaches is not sufficient to reverse pain hypersensitivity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Analysis of gene expression in Figure 1 lacks clarity, and the data do not effectively guide the reader toward their intended purpose. A list of the most dysregulated genes at the transcriptional level, the translational level, or both, would help the reader fully appreciate the outcome of this analysis. Similarly, what is the message conveyed by Figures 4 D-G?

      As requested, we have now included the top 10 upregulated and top 10 downregulated genes at both the translational and transcriptional levels in Figure 1. We also expanded the main text and figure legends to clarify that Supplementary Figure 1 includes volcano plots for all conditions, and that Supplementary Table 1 contains the complete datasets. In addition, we expanded the figure legends to explain the organization of the data in Supplementary Table 1. Finally, we provide pathway analyses of translationally regulated genes in the spinal cord, as this condition is the primary focus of the study.

      Figure 4D–G shows the top 15 translationally upregulated and downregulated genes in inhibitory neurons at days 4 (D) and 60 (E), and in Tac1<sup>+</sup> excitatory neurons at days 4 (F) and 60 (G) (four conditions in total) after SNI. These panels convey that translational regulation of specific transcripts occurs in both inhibitory and excitatory neurons. Panel 4H further demonstrates that, although translational changes are observed in both neuronal populations, a greater number of genes are altered in inhibitory neurons. We have improved the readability and flow of this section to better convey this message.

      Details about how AHA was quantified in Figure 3 are missing. It is unclear how and where the cells were selected for quantification. Objective criteria for expression/no expression of AHA in the cells are not indicated. Additionally, the signal seems to have somehow been normalized over images from the contralateral side. It is difficult to understand what the bar graphs actually represent in panel C. One would interpret them as percentages of excitatory/inhibitory cells expressing AHA.

      We apologize for the lack of clarity. We have now expanded the description of the analyses in the figure legend and in the Methods to better explain the results shown in Fig. 3. The imaged cells were selected based on specific criteria, such as lamina location and cell type. In panel C (the anisomycin experiment), values were normalized to the control group. In all other panels, no normalization was applied, and the values represent the AHA integrated density on maximum-intensity projection images (averaged per mouse). We also describe the number of sections and cells per mouse, as well as other technical details, as requested.

      In addition, a few minor changes should be made:

      (1) Rephrase Introduction: "Peripheral nerve injury can cause neuropathic pain, a chronic pain condition [...]." Neuropathic pain is not necessarily chronic.

      This sentence was reworded to read “Peripheral nerve injury may result in neuropathic pain, a debilitating condition with limited effective treatment options”.

      (2) Host species for secondary anti-mouse antibodies are provided but not for the anti-rabbit (donkey?). Also, check for consistency in the methods section. The methods (page 21) mention two secondary antibodies and an apparent third antibody named "anti-HRP-conjugated antibody." Please provide information about this antibody, or remove it.

      Thank you for flagging this; the inadvertent repetition of “anti-HRP-conjugated antibody” has been removed.

      (3) Provide primary antibody hosts on page 22.

      The hosts of all primary and secondary antibodies are now provided.

      (4) Define PBST on page 21 and PBS-T on page 22.

      We defined PBST in the revised manuscript (0.2% Triton-X100 in PBS).

      (5) Specify the filter sets used for fluorescent microscopy.

      We specified the filter sets used for fluorescent microscopy.

      (6) Change the legend to 50% withdrawal threshold for vF behavior tests.

      We addressed this by making the requested change in all relevant legends.

      Reviewer #2 (Recommendations for the authors):

      Major:

      (1) The authors need to show that eIF4E ASO (Figure 2) reduces translation in both inhibitory and excitatory neurons.

      ASOs are not intrinsically cell-type specific, as they do not contain promoters or regulatory elements and act wherever they enter cells and engage RNase H1. However, differences in ASO effects across cell types can arise from variability in uptake, intracellular trafficking, RNase H activity, or target mRNA expression levels.

      In our study, we used eIF4E-ASO as a general approach to demonstrate that eIF4E-dependent translation contributes to SNI-induced hypersensitivity, particularly at the chronic phase. We show a marked reduction in eIF4E levels in the spinal cord of eIF4E-ASO–injected mice compared with controls. We do not claim that the effects of eIF4E-ASO are mediated by a specific cell type; rather, they may involve excitatory neurons, inhibitory neurons, and non-neuronal cells, such as microglia and astrocytes, among others.

      Notably, while eIF4E can promote general translation during development, in adult mice it predominantly regulates cap-dependent translation of specific mRNAs without having a major effect on overall protein synthesis. In our case, the partial reduction in eIF4E is unlikely to substantially affect general translation, as assessed by AHA incorporation, and would instead require TRAP or Ribo-Seq to detect transcript-specific translational changes. We now better explain the rationale for the eIF4E-ASO experiment and clearly state that the effects observed cannot be attributed to a specific cell type.

      In addition, our new results showing that inhibition of eIF4E-dependent translation in PV neurons is not sufficient to alleviate SNI-induced mechanical hypersensitivity suggest that translational changes in other neuronal and/or non-neuronal cell types contribute to hypersensitivity. This important point is now more clearly explained in the revised manuscript, and the role of PV neurons is toned down throughout the paper.

      (2) In Figure 5, it is necessary to show the effect of eIF4E-shRNA in PV+ neurons on neuropathic behaviors (von Frey and MGS).

      To address this important concern, we performed two new experiments, both of which showed that inhibiting the mTORC1–eIF4E axis in parvalbumin neurons is not sufficient to alleviate neuropathic pain. First, we injected PV-Cre mice with AAV-eIF4E-shRNAmir and a scrambled control. We found that downregulating eIF4E in spinal PV neurons has no effect on SNI-induced mechanical hypersensitivity. We used a second, complementary approach to validate this finding. Specifically, we generated transgenic mice in which a non-phosphorylatable form of 4E-BP1 is expressed in PV neurons. Because non-phosphorylatable 4E-BP1 acts as a translational suppressor of eIF4E, this approach is functionally similar to eIF4E deletion.

      Altogether, our findings indicate that cell-type–non-specific suppression of eIF4E using ASOs is sufficient to alleviate neuropathic pain, particularly at the chronic phase. In contrast, while activation of eIF4E-dependent translation in PV neurons (via 4E-BP1 deletion) induces pain hypersensitivity, suppression of eIF4E-dependent translation in PV neurons prevents the SNI-induced decrease in PV neuron excitability but does not alleviate pain hypersensitivity. Thus, increased eIF4E-dependent translation in PV neurons is sufficient to induce pain hypersensitivity, but targeting this pathway in PV neurons alone is not sufficient to reverse neuropathic pain.

      Potential explanations for these findings include: (1) the presence of other important mechanisms in PV neurons (e.g., changes in synaptic transmission) that are translation independent; (2) the insufficiency of correcting reduced PV neuron excitability to alleviate hypersensitivity; and (3) an essential role for mRNA translation in other neuronal and/or non-neuronal cell types in neuropathic pain. We have updated the manuscript to include these potential explanations in the Discussion section.

      Moderate:

      (1) In Figure 2, MGS should be performed at earlier time points as well.

      We performed MGS when von Frey testing, which is less noisy and less labor intensive in our hands, suggested altered phenotypes.

      (2) In Figure 4B, the gene markers are different in Gad2+ and Tac1+ cells. Please show the 12 markers for both cell types.

      We now better explain the selection of the markers.

      (3) In Figure 5, MGS should be performed to test if the effect is limited to mechanical sensation/reactivity or extends to nociception. Additionally, do these mice exhibit altered locomotion and grip strength?

      As described above, we added experiments involving downregulation of eIF4E and expression of a mutant non-phosphorylatable 4E-BP1 in PV neurons. We performed von Frey testing, which showed no effect of suppressing the mTORC1–eIF4E axis on mechanical hypersensitivity under these conditions. Given these negative results, we did not proceed with mouse grimace scale (MGS) analysis.

      (4) In Figure S2E, the reduction of eIF4E does not appear to be specific to GFP+ cells.

      We have now replaced the representative images in this Figure.

      (5) Can chronic neuropathic pain be reduced by enhancing 4E-BP1 specifically in PV+ neurons?

      We added the experiment proposed by the reviewer in Fig. 6B. We found that enhancing 4E-BP1 activity, by expressing a non-phosphorylatable form of 4E-BP1 in PV neurons, is not sufficient to alleviate neuropathic pain hypersensitivity.

      (6) Why did the authors not use PainFace for the MGS?

      We began using manual, blinded MGS scoring, as originally described by Mogil and colleagues in 2010 (PMID: 20453868), for this project before PainFace became available around 2019 (e.g., Tuttle and Zylka) and in later versions (e.g., PMID: 39024163). For consistency, we therefore continued using the same approach throughout the experiments.

      (7) In Figures 2A-C, the labeling of the bar graphs seems incorrect: is it 4E-BP1 or eIF4E immunoreactivity?

      Thank you very much for noticing this; we have corrected the mistake.

      (8) In Figure 1, present the data by sex.

      We performed sequencing analyses only in females. This decision was based on the large number of mice and experimental conditions required for both Ribo-Seq (n = 15 mice per replicate, 3 replicates per condition, and 2 time points for SNI/Sham, ~180 mice total) and TRAP (n = 3 mice per replicate, 3 replicates per condition, 2 time points, and 2 genotypes [Tac1 and GAD2] for SNI/Sham), as well as the high cost of sequencing. Behavioral experiments were performed in both sexes. This information is clearly indicated in the Methods section, and we have now also included it in the Limitations section of the paper.

      (9) While the methods state that all behavioral testing was done with equal numbers of male and female mice, it seems that several experiments were done only in females. In the absence of a strong justification, all experiments should be conducted in both sexes.

      As explained above, due to the very large number of mice required for some experiments and the high cost of sample processing and sequencing, only behavioral experiments were performed in both sexes. We now clearly describe the sex of the animals used in each experiment in the figure legends.

      Minor:

      (1) In Figure 3, the legend is confusing and lacks labels.

      We expanded the Fig. 3 legends and added labels, as requested.

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript needs to be made clearer and more specific. As it stands, the logic and flow are difficult to follow. Figure legends are not always indicative of the figure and are inconsistent.

      Regarding timelines:

      The logic of the different timelines is not clear. Either explain why different times post-injury were chosen between experiments or keep them consistent. It seems a key message here is that the timing is important. It therefore follows that the authors should be strict about this in their own experiments. Figure 1: 4 and 63 days. Figure 2: Day 3 and weeks 8 and 12. Figure 3: Days 4 and 60. Figure 4: Days 4 and 60. Figure 5: 6 weeks. Figure S1: 4 and 60. Clarifying why these timings were used in each case and showing at the transcript level that these are most appropriate would be needed.

      We thank the reviewer for carefully reviewing our manuscript. We focused on early versus late time points. For the sequencing experiments, we performed Ribo-seq at day 4 for the early time point and day 63 for the late time point, whereas TRAP analyses (and FUNCAT) were performed at day 4 for the early time point and day 60 for the late time point. These differences (day 60 versus day 63) were due to logistical issues related to sample collection. In our view, there are no major biological differences between day 60 and day 63 for the late time points, particularly because we do not perform direct comparisons across different experiments.

      In other experiments, we used several time points (e.g., day 3, as well as 6, 8, and 12 weeks) either to follow the development of phenotypes or based on previous publications regarding the timing of specific effects. We now acknowledge the potential limitation of using slightly different time points in the Limitations section of the paper.

      Regarding the use of inhibitory and excitatory markers: The comparisons they made between subpopulations seem a little random. For one, the number of Tac1 positive cells in the dorsal horn is not equal to that of PV, and so the comparison seems inappropriate.

      The number of cells from each subpopulation should not affect the number of DEGs. Because these analyses were performed on bulk mRNA rather than at the single-cell level, the comparisons are made between SNI and control groups within each subpopulation. Thus, the number of differentially translated genes is determined per cell type, not per individual cell.

      The lack of any semblance of variability or statistics with regard to gene changes makes it difficult to assess whether these comparisons were justified experimentally. Pax2 is a developmentally regulated transcription factor, with reduced levels in the adult. Using Pax2- NeuN+ to label excitatory interneurons is therefore not appropriate for comparison. A more appropriate comparison would be to use vGluT2 and GAD67. Similarly, the use of the GAD2Cre seems a poor choice. This is a restricted population of interneurons that have been suggested to have specific roles in presynaptic inhibition. If the authors were interested in this subpopulation for that reason, then they should state so.

      Pax2 is commonly used as a marker of inhibitory neurons in the spinal cord (e.g., PMID: 36323322), as in the adult dorsal horn, Pax2 protein remains expressed in nearly all inhibitory neurons, including both GABAergic (GAD65/67<sup>+</sup>) and glycinergic (GlyT2<sup>+</sup>) neurons. VGluT2 marks terminals of IB4-binding peripheral sensory neurons as well as those of spinal cord excitatory interneurons in lamina II of the dorsal horn, complicating the analyses. We attempted using Lmx1b for excitatory neurons (Pax2 for inhibitory and Lmx1b for excitatory) but could not obtain a specific and robust signal using different commercial antibodies (we have no access to non-commercial Pax2 antibody).

      Regarding Cre lines, Gad2-Cre has been extensively used to target GABAergic neurons in the spinal cord. Although it is not expressed in purely glycinergic neurons, it is expressed in GABAergic and mixed GABA/glycine interneurons. Gad2-Cre is more restricted to superficial dorsal laminae I–III, which are relevant to pain processing, versus Gad1-Cre, which may also capture low-level GABAergic neurons in deep laminae and ventral horn inhibitory neurons. Moreover, there are also differences in the developmental profile: whereas Gad1-Cre is expressed earlier, at embryonic stages during inhibitory neuron development, GAD2 is expressed later, in post-mitotic and mature inhibitory neurons. Because of these considerations (higher specificity to dorsal horn and later developmental expression), we used the Gad2-Cre mouse line in our experiments.

      Regarding cKO experiments:

      It is unclear whether the deletion of Eif4ebp (which is not "ablation" as stated in the manuscript) has had any effect on the PV/GAD2 cells themselves seeing as this deletion would be a lineage deletion. One would imagine that altering transcription in such a population from early development would affect a host of neuronal and circuit properties, such as connectivity, dendritic branching, etc. The authors should show that the circuit properties were not broadly changed, not least as PV is expressed throughout the nervous system and in muscles. This could in itself explain the hypersensitivity described in their results. Experimenters should repeat the AAV shRNAmir experiments in non-injured animals, and not just control animals with the scrambled sh.

      We agree with the concerns related to potential developmental effects. Although it is nearly impossible to reliably and comprehensively demonstrate that circuit properties were not altered in our cKO mice, our manuscript presents several lines of evidence supporting a role for translational control in specific cell types in the regulation of gene expression and nociception independent of developmental effects. First, our translational gene expression analyses were performed in adult WT mice and reflect SNI-induced changes in gene expression at the translational level, assessed using complementary approaches. In addition, the effects of eIF4E ASO delivered to adult animals support a role for translational control in the regulation of SNI-induced pain hypersensitivity at later stages.

      Moreover, downregulation of eIF4E in PV neurons using an AAV-based approach in adult mice affects their SNI-induced excitability, further supporting a role for translational mechanisms in regulating PV neuron plasticity after peripheral nerve injury in adulthood. To acknowledge the potential developmental effects associated with 4E-BP1 deletion using Tac1-Cre, Gad2-Cre, and PV-Cre mouse lines (with PV-Cre beginning expression postnatally), we have included an explicit limitation statement in the Discussion of the revised manuscript.

      We also thank the reviewer for highlighting the distinction between deletion and ablation, and we have corrected this terminology in the revised manuscript.

      Regarding pain:

      A large sticking point within the study is the lack of clarity of the populations they are targeting. Many of the populations mentioned are not expressed solely in the dorsal somatosensory horn and instead are also expressed in the ventral motor horn. This is particularly important with regard to the sensory tests they are performing, which rely on reflex responses. It seems these results, although interesting, are not proof of a pain effect, but rather showing changes in vfh-behaviour. To show this is a pain-specific event, and not just correlative or reflexive, the authors should perform further behavioural tests beyond vfh, Hargreaves, and the grimace scale, such as low threshold touch, rotarod, etc. How much of this effect is due to changes in reflex excitability? Would the authors expect similar results for all neuropathic models but not for chronic inflammatory states for example? Western Blot analysis at the moment is for the whole cord, which could imply changes in the ventral or intermediate horn, it could help strengthen the study to show that these changes are selective to the dorsal cord.

      We have now added a new experiment showing that eIF4E-ASO has no effect on motor function in the rotarod and open field tests (Fig. 2J, K). In addition, the eIF4E-ASO experiment included in the original submission reflects supraspinal behavior, as assessed by MGS. Overall, our study includes numerous experiments and datasets. While we agree with some of the reviewer’s concerns, the extensive additional work requested, including additional neuropathic and inflammatory pain models, further assays of supraspinal behavior, Western blot analyses restricted to the dorsal horn, additional Cre lines and markers, and other analyses, is not feasible within the scope of the current manuscript.

      Notably, in the revised manuscript, we have added new experiments (Fig. 2J, 2K, 6A, 6B) that we believe address the most critical concerns raised by the reviewers, and we have revised the text to more clearly acknowledge the limitations of the study.

      Regarding patch clamp studies:

      An increase in rheobase alone in the PV cells would not in itself account for the changes seen in behaviour, seeing as the authors are suggesting this is a selective effect for von Frey and not radiant heat, for example. The authors should therefore show a change in mechanically-evoked firing of PV/GAD2 cells either by dorsal root stimulation in slice, or by cfos or equivalent marker of activation following sensory stimulation. The title of this figure is also misleading- it is not clear how there is any proof of promotion of plasticity in the experiments shown.

      In the original submission, in addition to an increase in rheobase, we also demonstrated decreased spiking activity in response to a range of stimulating currents (Fig. 4). We agree that assessing mechanically evoked responses of PV neurons would be informative; however, such studies are beyond the scope of the current manuscript.

      To address the final concern, we modified the title of Fig. 5 and the related text. Moreover, the newly added data showing that inhibition of translation in PV neurons does not alleviate SNI-induced hypersensitivity prompted us to tone down, throughout the manuscript, the link between translational changes in PV neurons and pain hypersensitivity.

    1. eLife Assessment

      In this manuscript, Wafer and Tandon et al. present a thoughtful and well-designed genetic screen for regulators of adipose remodeling using zebrafish as a model system. This work is valuable because it uncovers several genes associated with adipose tissue hyperplastic and hypertrophic morphology and diet-induced remodeling that have considerable potential health impact. The rigorous phenotypic analyses and compelling evidence make this work a key resource for the field.

    2. Reviewer #1 (Public review):

      In this manuscript, Wafer and Tandon et al. present a thoughtful and well-designed genetic screen for regulators of adipose remodeling using zebrafish as a model system. The authors cross-referenced several human adipocyte-related transcriptomic and genetic association datasets to identify candidate genes, which they then functionally tested in zebrafish. Importantly, the authors devised an unbiased microscopy-based screening platform to document quantitative adipose phenotypes with whole animal imaging, while also employing rigorous statistical methods. From their screen, the authors identified 3 genes that resulted in robust adipose phenotypes out of a total of 25 that were tested. Overall, this work will be an important resource for the field because of the genes identified from the screen, the quantitative screening pipeline, and the rigorous phenotypic analysis.

      Comments on revisions:

      The authors have far exceeded my expectations with their revised manuscript. All my questions and concerns from the original manuscript have been addressed by the authors. The additional data and analysis in Figure 6 and Supplementary Figure 8 are compelling and have greatly improved the manuscript.

    3. Reviewer #2 (Public review):

      This manuscript by Wafer, Tandon et al. presents exciting new approaches for using the zebrafish CRISPR screening and imaging system to identify genes that are associated with hyperplastic and hypertrophic adipose morphology. This paper established valuable screening pipelines in zebrafish to identify genetic regulators that affect adipose tissue morphology by combining CRISPR with an imaging-based, comprehensive adipose spatial analysis platform. Starting from a human transcriptomic dataset with differentially expressed genes that separate small and large adipocytes, they eventually identified 3 genes that induce hyperplastic or hypertrophic phenotypes in zebrafish. Of these, they focused on foxp1, a transcription factor known to regulate tissue development. They discovered that the foxp1 mutant displays basal hypertrophic morphology and fails to undergo hypertrophic remodeling in response to a high-fat diet, suggesting a link between adipose tissue development and diet-induced remodeling response. Overall, this manuscript is extremely well-written, the data presented are quite compelling, and the identified novel genes that are associated with adipose tissue hyperplastic and hypertrophic morphology and diet-induced remodeling are very exciting.

      Strength:

      (1) Obesity remains a worldwide public health concern. The mechanisms underlying adipose tissue hypertrophic and hyperplastic adaptation remain unclear.

      (2) This manuscript combined multiple omic datasets to identify candidate genes and performed a CRISPR-based screening to identify genes underlying adipose tissue development and adaptation. This new method will open opportunities that will facilitate our understanding and testing of new genetic mechanisms underlying the development of obesity.

      (3) Using the screening approach, this paper successfully identified new genes that are associated with adipose tissue LD size change. More importantly, the paper provided further validation using a stable CRISPR line to show the phenotype in basal and HFD conditions.

      (4) The experiments are extremely well-designed. Sample sizes are large. Statistical analysis is rigorous. Overall, this is a very high-quality study.

      Author's response to the previous comments/weakness:

      (1) In this revised manuscript, the authors provided new comprehensive spatial analyses of foxp1a and foxp1b mutants in basal conditions as well as in response to high-fat feeding. The new data confirmed their initial findings and beautifully illustrated the spatiotemporal dynamics of the adipocytes in response to high-fat diet feeding.

      (2) The authors have addressed all my comments, and I do not have further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for the thoughtful and constructive comments on our manuscript. We have carefully addressed all points raised, and believe the manuscript is substantially improved as a result. In particular, we have performed:

      - Comprehensive spatial analysis of stable mutants. Following Recommendations for the authors comment #1, we performed spatial analysis by binning the anterior-posterior axis into 200 µm strata. This analysis validates our initial conclusions and reveals striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants.

      - Substantially enhanced the statistical rigour of the screen analysis. We have implemented stratified Kolmogorov-Smirnov tests (within-experiment testing, then combined via Fisher's method) alongside linear mixed models to control for batch effects. In the revised manuscript, we now focus on three hypertrophy genes – foxp1b, txnipa and mmp14b – which are robustly validated by both methods.

      - Normalisation of adipose area to body size. To address concerns about developmental delay (Recommendations for the authors #2), we now normalise adipose area to standard length. With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity (updated from our original analysis), while the hypertrophic LD morphology remains highly significant - demonstrating the phenotype is independent of body size and not a developmental delay.

      - Revised title. As suggested by Recommendations for the authors comment #6, we have changed the title to: "A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish"

      - Extensive code and analysis availability. We now provide all code and extensive analysis pipelines in interactive HTML documents at https://github.com/jeminchin/zebrafish_adipose_morphology_screen
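
      The statistical bullet above describes testing within each experiment and then combining the per-experiment p-values with Fisher's method. The following is a minimal sketch of only that combination step, not the authors' pipeline (which is in the linked repository). It assumes the within-experiment Kolmogorov-Smirnov p-values have already been computed and are independent; the function name `fisher_combine` and the example p-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null (k = number of p-values).
    For an even number of degrees of freedom the survival function has
    the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # Accumulate sum_{i=0}^{k-1} (X/2)^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term

    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values from three separate experiments (batches).
combined = fisher_combine([0.04, 0.20, 0.01])
print(f"combined p = {combined:.4g}")
```

      Testing within each experiment before combining keeps batch-to-batch shifts from being mistaken for a genotype effect, which is the motivation the bullet gives for stratifying.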

      Joint Public Review:

      We thank the reviewers for their thoughtful assessment of our work and their recognition of the rigorous experimental design, statistical approaches, and the utility of both the identified genes and screening pipeline for the field. We address their concerns below.

      Weakness:

      Distinguishing developmental patterning from adipose tissue plasticity

      We appreciate this important distinction and agree that separating developmental from adaptive effects is a key challenge in the field. We would like to make several points in response:

      First, we acknowledge this limitation in our discussion and have now expanded this section to more explicitly address the interpretive boundaries of our approach. Our screening platform was intentionally designed to capture the outcome of genetic perturbation across development and early adaptation, as these processes are inherently intertwined during the establishment of adipose tissue.

      Second, regarding the suggested analysis of lipid droplet size along the AP axis in response to HFD: we have now performed this analysis and include it as new Fig. 6 and new Supplemental Fig. 8 & 9. These data validate our initial conclusions and reveal striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants. Further, these data provide additional resolution on regional responses to dietary challenge.

      Third, we note that our stable mutant validation experiments (Figure 6) do begin to disentangle these effects by examining both baseline and HFD-challenged conditions in animals with constitutive genetic loss. However, we agree that definitive separation would require temporally controlled genetic manipulation, which we now acknowledge as an important future direction.

      Lack of tissue-specific manipulations

      We agree that tissue-specific approaches would strengthen mechanistic conclusions and have acknowledged this limitation in our revised discussion. The current study was designed as a discovery-focused screen to identify candidate regulators, with the understanding that mechanistic dissection would require follow-up studies employing tissue-specific tools.

      We note that adipocyte-specific Cre/lox or Gal4-UAS approaches in zebrafish are feasible and represent an important next phase of investigation for the most promising candidates identified here, rather than a requirement for the current screening study. We have added text explicitly framing our findings as establishing genetic associations that warrant future tissue-autonomous investigation.

      Recommendations for the authors: 

      (1) Analysis: In Figure 6, the authors state that foxp1b mutants "fail to undergo further hypertrophic remodeling in response to a high-fat diet (HFD)." Foxp1b mutant juveniles are already hypertrophic before the high-fat diet. After a high-fat diet, these mutants reach mean lipid droplet diameters similar to WT, approximately 65 µm, which the authors state earlier in the manuscript are "a potential upper limit of LD growth at this developmental stage." The authors should perform additional analysis of their existing data. Specifically, determine lipid droplet size by binning the AP axis as shown in Figure 3. The rationale is that lipid droplet size differences in response to HFD may be more evident when not considering the anterior populations of lipid droplets that have already reached maximum steady state size for this juvenile stage. This would not require any new experiments, just reanalyzing data similar to how they did in Figure 3.

      We thank the reviewer for this excellent suggestion. We have performed the requested spatial analysis by binning the AP axis into 200 µm strata (Figure 3 approach). These data can be found in new Fig. 6H-M, and new Supplemental Figs 8 & 9. This new analysis verifies our initial conclusions, and also reveals several very interesting spatiotemporal dynamics:

      (i) Baseline hypertrophy in foxp1b mutants across AP strata

      In support of our initial conclusion that foxp1b mutants have larger LDs at baseline, the spatial analysis confirms that on a control diet (baseline), foxp1b mutants have significantly larger LDs than WT across strata 1-5 (new Fig. 6I), ranging from +22.2 µm larger in stratum 1 to +17.8 µm larger in stratum 5 (all FDR-adjusted p < 0.05, linear mixed effects model). Extended analysis across all 15 strata is shown in Supplemental Figs. 8 & 9. By contrast, and also in support of our initial conclusion, foxp1a mutants showed no baseline hypertrophy on control diet (all strata p > 0.10, Supplemental Fig. 8).

      (ii) foxp1b mutants show a profoundly blunted hypertrophic response to HFD

      Using paired analysis (same fish on both control diet and after 14 days of high-fat diet) with a linear mixed effects model, we quantified the effect of HFD across all strata:

      (A) Anterior/oldest strata (1-6): in WT, HFD increases LD diameter by +25.1-28.1 µm (+52-58%, p < 0.0001), whereas in foxp1b mutants HFD increases LD diameter by only +7.5-11.7 µm (+12-19%, p < 0.003). Therefore, in the oldest/most anterior regions, which contain the largest LDs, the hypertrophic response of foxp1b mutants to HFD is ~57% weaker than that of WT.

      (B) Posterior/newer strata (7-15): WT + HFD undergo significant increases in LD diameter of +17.7-23.7 µm (p < 0.024). However, in foxp1b mutants there is no significant hypertrophic response at all (p > 0.068), and hypertrophic effect sizes decline from +6.8 µm (stratum 7) to +0.4 µm (stratum 15).

      (C) Overall effect: Averaged across all strata, WT + HFD LDs show +24.4 µm increase (p < 0.0001), whereas foxp1b mutant LDs only show a +7.7 µm increase with HFD (p = 0.020). Therefore, foxp1b mutants show a 68% reduction in hypertrophic growth in response to HFD compared to WT (Fig. 6K).
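
      The percentage reductions quoted in this response can be reproduced from the reported mean increases. The sketch below assumes "reduction" means one minus the ratio of the mutant increase to the WT increase; that definition and the function name `relative_reduction` are assumptions, and the input values are the effect sizes stated in the response.

```python
def relative_reduction(wt_increase_um, mutant_increase_um):
    """Fractional reduction of the mutant HFD response relative to WT."""
    if wt_increase_um <= 0:
        raise ValueError("WT increase must be positive")
    return 1.0 - mutant_increase_um / wt_increase_um


# Overall effect averaged across strata: WT +24.4 um, foxp1b mutants +7.7 um.
foxp1b = relative_reduction(24.4, 7.7)
print(f"foxp1b: {foxp1b:.1%} reduction")

# foxp1a comparison reported elsewhere in the response: WT +22.2 um, mutants +14.4 um.
foxp1a = relative_reduction(22.2, 14.4)
print(f"foxp1a: {foxp1a:.1%} reduction")
```

      Under this definition the two comparisons round to 68% and 35%, matching the figures given for foxp1b and foxp1a mutants.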

      The consequence of these spatial dynamics is that WT SAT LDs - which start 22 µm smaller than foxp1b mutant LDs on a control diet - undergo massive hypertrophy across all regions/strata in response to HFD. Meanwhile, foxp1b mutant LDs - starting larger than those of WT - show only a modest, spatially restricted response. This results in a convergence in LD size in early/anterior strata, but WT LDs actually surpass foxp1b mutant sizes in late/posterior strata (strata 14-15: WT +14.7 µm larger on HFD, p = 0.028; Supplemental Figs. 8 & 9).

      By contrast, foxp1a mutants retain the capacity for HFD-induced hypertrophy but show a ~35% weaker response than WT (p = 0.023) – significantly less severe than the 68% reduction in foxp1b mutants. Interestingly, foxp1a mutants after HFD show a reduction in the AP gradation of LD size observed in WT and foxp1b mutants (uniform +14.4 µm across all strata versus WT range of +26.4 µm anteriorly to +16.6 µm posteriorly), suggesting that foxp1a may regulate spatial heterogeneity in adaptive responses to HFD (Fig. 6L-M).

      (iii) Developmental ceiling or impaired adaptive capacity?

      The reviewer raises an important question about whether anterior adipose LDs have reached a "developmental ceiling." After conducting the spatial analysis suggested by the Reviewer, we now believe several lines of evidence support an intrinsic defect in HFD-induced hypertrophy in foxp1b mutants, rather than reaching a developmentally determined limit:

      First, foxp1b mutants show reduced responses across ALL strata, not just anterior regions. The attenuation extends throughout the entire AP axis (57% reduction in strata 1-6, complete loss of response in strata 7-15). If anterior adipocytes had simply reached a size ceiling, we would expect normal responses in posterior regions where cells are smaller - but we don't observe this.

      Second, in posterior/newer regions of SAT (strata 14-15) the hypertrophic response to HFD in foxp1b mutants is so limited that WT LDs actually become larger than foxp1b mutant LDs (+14.7 µm larger, p = 0.028; Supplemental Fig. 9). This demonstrates that these LD sizes are not developmentally limiting and argues for intrinsic hypertrophic defects in response to HFD.

      Third, foxp1a mutants provide an important control. These mutants show no baseline hypertrophy (all strata p > 0.10) yet still exhibit blunted hypertrophic responses to HFD (~35% reduction, p = 0.023), proving that reduced HFD responses can occur independently of baseline hypertrophy.

      We have updated the Results and Discussion to reflect these new conclusions. Methods have been updated to include the spatial analysis approach.
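
      The binning step used throughout this spatial analysis (200 µm strata along the AP axis) can be sketched as follows. This is an illustration only: it assumes each lipid droplet has an AP position measured in µm from the anterior edge of the tissue, that strata are numbered from 1, and that droplets are summarised by mean diameter per stratum. The function names and the example measurements are hypothetical.

```python
from collections import defaultdict
from statistics import mean

STRATUM_WIDTH_UM = 200.0  # stratum width stated in the response


def stratum_index(ap_position_um, width_um=STRATUM_WIDTH_UM):
    """Return the 1-based stratum for a position along the AP axis."""
    if ap_position_um < 0:
        raise ValueError("AP position must be non-negative")
    return int(ap_position_um // width_um) + 1


def mean_diameter_by_stratum(droplets):
    """Group (ap_position_um, diameter_um) pairs and average per stratum."""
    grouped = defaultdict(list)
    for ap_position_um, diameter_um in droplets:
        grouped[stratum_index(ap_position_um)].append(diameter_um)
    return {stratum: mean(values) for stratum, values in sorted(grouped.items())}


# Hypothetical droplets: (AP position in um, LD diameter in um).
example = [(50.0, 62.0), (150.0, 58.0), (250.0, 47.0), (2950.0, 21.0)]
print(mean_diameter_by_stratum(example))
```

      With a 200 µm width, 15 strata cover the first 3000 µm of the axis, so a droplet at 2950 µm falls in stratum 15.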

      (2) Adipose morphogenesis in WT is a function of standard length, as shown by the authors. At juvenile stages, foxp1 mutants are both smaller and have reduced adipocyte coverage, while adults show normal body length and very subtle adipose phenotypes. Can the authors demonstrate that the observed defects in foxp1 mutant juveniles are bona fide phenotypes rather than a developmental delay?

      We thank the reviewer for this key point. We agree it is critical to distinguish true foxp1b-dependent phenotypes from potential developmental delay. Importantly, our data strongly argue against a simple developmental delay. We show that LD size scales with body size in Fig. 3G, with smaller zebrafish having smaller LDs and larger zebrafish having larger LDs. In contrast to a developmental delay, our data show that foxp1b single and foxp1a;foxp1b double mutants are smaller (reduced standard length) but have larger LDs (Fig. 6E,G). This dissociation between body size and LD size is the opposite of what would be expected from developmental delay.

      To account for the body size difference, we have now normalised adipose area to standard length (Fig. 6F). With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity, whereas foxp1a;foxp1b double mutants remain significantly reduced. This represents a change from our original analysis and we have updated the text accordingly. Critically, despite normalised adipose area showing only a trend in foxp1b singles, the hypertrophic LD morphology remains highly significant (Fig. 6G), demonstrating that the morphological phenotype is robust and independent of overall body size.

      We have clarified this interpretation in the Results and Discussion.

      (3) What was the rationale for selecting one amongst paralogous genes for the screen? For example, why did the authors choose ptenb rather than ptena?

      (4) Point 3 is particularly relevant for the final six genes that resulted in adipose phenotypes. Why did the authors choose not to target both paralogs, given that multiplexed F0 CRISPR targeting is feasible in zebrafish (PMID: 29974860)?

      We answer Points 3 & 4 together here.

      We used the DIOPT (DRSC Integrative Ortholog Prediction Tool) orthology tool to identify the zebrafish paralogue with the highest orthology score to each human gene. This tool integrates predictions from 20 orthology databases to generate a composite score. We selected the paralogue with the highest DIOPT score for each gene. For example, we selected ptenb over ptena because it showed a higher predicted orthology to human PTEN.

      We acknowledge this approach has important limitations, including orthology scores not necessarily predicting functional equivalence (ie, the "most orthologous" paralogue may not be the one with the most relevant adipose tissue function in zebrafish). We acknowledge that this may mean we have missed genuine hits - testing only one paralogue means we could fail to identify genes where the "less orthologous" paralogue has the relevant adipose function.

      Our findings with Foxp1 paralogues both validate this approach and reveal its limitations. The higher-scoring paralogue foxp1b (DIOPT score = 13/19) showed the more severe phenotype, validating our prioritisation. However, the lower-scoring paralogue foxp1a (DIOPT score = 5/19), which we tested subsequently, showed a distinct but significant phenotype (altered spatial patterning) – a finding that would have been missed had we not pursued secondary validation.
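
      The prioritisation rule described above (keep the paralogue with the highest DIOPT score) amounts to a maximum over per-paralogue scores. The sketch below assumes the scores have already been retrieved; it does not query DIOPT, and the function name `pick_paralogue` is hypothetical. The foxp1 scores are the ones quoted in this response.

```python
def pick_paralogue(diopt_scores):
    """Return the paralogue with the highest DIOPT score.

    `diopt_scores` maps paralogue name -> score. Ties are resolved in
    favour of the first entry, which is an arbitrary choice made for
    this sketch rather than a rule stated by the authors.
    """
    if not diopt_scores:
        raise ValueError("no paralogues supplied")
    return max(diopt_scores, key=diopt_scores.get)


# Scores quoted in the response (out of 19).
foxp1_scores = {"foxp1a": 5, "foxp1b": 13}
print(pick_paralogue(foxp1_scores))
```

      As the response notes, a rule like this discards the lower-scoring paralogue entirely, which is why foxp1a was only examined during secondary validation.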

      For future screens where comprehensive hit identification is the goal, multiplexed targeting of all paralogues would be valuable, though this may complicate interpretation of paralogue-specific phenotypes. We have discussed this in the Discussion.

      (5) General framework and limitations: The analysis platform presented in the manuscript cannot separate the developmental effects from adipose tissue plasticity/remodeling. Potential approaches that may help address this concern include: (a) establishing a baseline model to illustrate how WT fish respond to high-fat diet (HFD); (b) showing how mutants with hyperplasticity (opposite effects of foxp1 mutants) respond to HFD; (c) examining whether foxp1 gene expression level changes in response to HFD. However, these approaches (especially a and b) would require extensive experimental work and may be beyond the scope of this study. Without further evidence or data support of adipose tissue plasticity and remodeling, the author may want to emphasize in the background and discussion sections how adipose tissue development may affect plasticity and adaptation, and soften the tone of how genes may directly regulate adipose tissue plasticity and adaptation.

      We thank the reviewer for this comment about the relationship between adipose development and plasticity/remodelling. We agree this is an important issue as we are looking in juvenile fish that are still growing. Therefore, when we feed them HFD and see LDs get bigger – is this diet-induced remodelling or just accelerated normal development (ie, growth that would happen anyway, but occurring faster due to more nutrients)?

      To address the reviewer's specific suggestions:

      (A) Baseline model of WT HFD response: We have now performed detailed spatial analysis of WT responses to HFD (new Fig 6H-M, Supplemental Figs. 8 & 9). This analysis establishes a comprehensive baseline for hypertrophic responses to HFD in developing adipose tissue. In summary, WT fish show robust, statistically significant and spatially-graded hypertrophic responses to HFD across the entire AP axis, with responses ranging from +28.1 µm anteriorly to +17.7 µm posteriorly.

      We agree with the Reviewer that separating developmental from adaptive processes in growing juvenile fish is challenging. Importantly, we believe foxp1a mutants provide compelling genetic evidence that we are studying adaptive responses rather than purely developmental processes. foxp1a mutants have normal baseline LD sizes on control diet (demonstrating foxp1a is not required for developmental adipose expansion), yet when challenged with HFD show significantly reduced hypertrophic expansion and reduction of spatial gradient. This genetic dissociation strongly argues we are observing adaptive capacity rather than developmental growth rate.

      (B) Hyperplastic mutants:

      We agree that analysis of hyperplastic mutants would provide valuable complementary information about tissue remodelling capacity. However, as the reviewer anticipated, this would require: (1) generating stable lines of the appropriate hyperplastic mutants, (2) conducting paired HFD feeding studies, (3) performing spatial morphometric analysis comparable to our foxp1 studies, and (4) potentially distinguishing hyperplastic vs hypertrophic contributions to expansion. We agree this constitutes substantial additional experimental work beyond the scope of the current manuscript, though it represents an important direction for future studies.

      (C) foxp1 expression changes in HFD:

      Unfortunately, we do not have SAT samples from HFD-treated fish preserved for RNA analysis, and therefore cannot assess whether foxp1 expression levels change in response to dietary challenge. This would be valuable for future studies to determine whether foxp1 genes are dynamically regulated during metabolic adaptation or function as constitutive regulators of adaptive capacity.

      Following the Reviewer's guidance, we have revised throughout the manuscript to more carefully distinguish developmental patterning from metabolic adaptation.

      (6) Title: In the absence of experimental results that can distinguish between developmental effects from adipose tissue plasticity/remodeling, such as those mentioned above, the manuscript title is not accurate and should therefore be revised to be something like "hyperplastic and hypertrophic adipose morphology."

      We have now altered the title as the Reviewer suggested to “A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish”

      Minor:

      (7) In mouse studies, deleting foxp1b in adipose tissue protects mice from diet-induced obesity, while overexpressing foxp1b in adipose tissue promotes diet-induced obesity (Liu et al., Nature Communications, 2019). These overall phenotypes and foxp1b-mediated effects appear to be contradictory to what is observed in the zebrafish model. Can the authors also provide more evidence/discussion on why such a difference occurs between the zebrafish and mouse models?

      We thank the reviewer for this important comparison. We believe the apparent contradictions reflect (1) differences in adipose tissue thermogenic capacity - between species possibly, but also between functionally distinct depots and (2) whole-organism versus tissue-specific experimental approaches.

      (1) Different adipose tissue biology: browning-prone vs browning-resistant adipose

      Liu et al. (2019, PMID: 31699980) demonstrated that adipose-specific deletion of Foxp1 in mice increases thermogenesis and browning of SAT, with protection from diet-induced obesity (DIO) and improved insulin sensitivity. Conversely, Foxp1 overexpression impaired adaptive thermogenesis and promoted DIO. Mechanistically, Foxp1 directly represses β3-adrenergic receptor transcription, thereby inhibiting the thermogenic program. Strikingly, mouse Foxp1-deleted adipocytes displayed smaller, multilocular lipid droplets characteristic of brown/beige adipocytes.

      These morphological outcomes initially appear opposite to our zebrafish findings: mouse Foxp1 mutants have smaller adipocytes (due to browning), while zebrafish foxp1b mutants have larger lipid droplets (hypertrophy). We believe this fundamental difference may reflect the propensity of adipose tissue to undergo adaptive thermogenesis.

      While it was recently discovered that zebrafish possess thermogenic epicardial adipose tissue (PMID: 38507414), in general zebrafish adipose is not considered thermogenic, and zebrafish as ectotherms are thought to lack adaptive thermogenesis for thermoregulation. The exact thermogenic potential of zebrafish adipose remains to be fully characterised, but potential differences in thermogenic capacity between mouse and zebrafish adipose may help explain the distinct phenotypic outcomes.

      Importantly, Liu et al. studied mouse inguinal subcutaneous WAT - the depot most prone to browning in rodents. It remains unclear what role Foxp1 plays in browning-resistant mammalian WAT depots, where thermogenic conversion does not readily occur. In such depots, Foxp1 loss might produce phenotypes more similar to our zebrafish findings - dysregulated white adipose function without browning.

      The above hypothesis suggests that browning responses may mask other roles for Foxp1 in WAT. Interestingly, although not quantified in the paper, Liu et al.’s Foxp1 overexpression model (Ap2-Foxp1) appeared to reduce adipocyte size despite suppressing Ucp1 expression and reducing lipolysis. These data suggest more complex roles and indicate that Foxp1’s control of adipocyte size might extend beyond simply regulating thermogenesis and may involve coordinating the balance between hyperplastic versus hypertrophic expansion.

      Furthermore, human subcutaneous WAT is not as prone to browning as mouse inguinal WAT. Human browning occurs primarily in specialised depots (e.g. supraclavicular, deep neck), while the majority of human adipose tissue represents constitutive white adipose with limited thermogenic capacity. Therefore, it remains an open question whether FOXP1's primary physiological role in humans relates to thermogenesis regulation (in specialised depots) or white adipose metabolic control (in the majority of adipose tissue). Zebrafish findings examining constitutive WAT function (admittedly the lack of adaptive thermogenesis in zebrafish is presumed at this stage) may be more relevant to human adipose than they initially appear.

      (2) Whole-organism vs tissue-specific effects on metabolic health

      A second apparent contradiction concerns metabolic outcomes: mouse adipose-specific Foxp1 deletion improves metabolic health (Liu et al.), whereas our zebrafish whole-organism foxp1b mutants display metabolic dysfunction (baseline hypertrophy, impaired HFD response, hyperglycaemia and fatty liver). We believe this discrepancy reflects comparison of whole-animal mutants (zebrafish) to tissue-specific deletions (mouse), rather than opposite adipose tissue functions.

      Critically, Foxp1 has established roles in hepatic glucose metabolism. Zou et al. (PMID: 26504089) demonstrated that hepatic Foxp1 inhibits expression of gluconeogenesis genes and decreases hepatic glucose production and fasting blood glucose by competing with Foxo1 for binding of insulin responsive gluconeogenic genes. In line with these observations, we observe fatty liver and hyperglycaemia in foxp1a;foxp1b double mutant zebrafish (data not shown), suggesting that the metabolic dysfunction in our whole-animal mutants may be driven primarily by hepatic Foxp1 loss rather than adipose-specific effects.

      We have expanded on the points raised here in the Discussion.

      (8) Line 522-524: "The major phenotype in foxp1a mutants was impaired adipose expansion following HFD, suggesting failure to respond to diet-induced stress signals". In the presented Figure 6j, foxp1a mutant expands adipose LD size following HFD, similar to the control, which is contradictory to the statement above. Please clarify.

      We thank the reviewer for highlighting this apparent inconsistency and apologise for imprecise wording. These measurements are actually consistent but refer to different scales of analysis.

      Tissue level (Supplementary Fig. 7): foxp1a mutants show significantly reduced total adipose expansion (based on whole-animal Nile Red images) compared to wild-type fish on HFD—this is what we refer to as "impaired adipose expansion."

      Cellular level (Fig. 6L-M): At the individual adipocyte level, foxp1a mutants show statistically significant increases in LD diameter following HFD. However, the magnitude is reduced by ~35% compared to wild-type (mutants: +14.4 µm; WT: +22.2 µm; p = 0.023).

      We have revised the text to more precisely state "reduced adipose expansion" rather than "impaired expansion" to avoid implying complete failure to respond.

    1. eLife Assessment

      This potentially valuable study investigates the interaction of two integral membrane proteins (Cdhr1a and Pcdh15b) and their roles in cone-rod dystrophy. Convincing evidence using loss-of-function mutants demonstrates that both proteins are required for cone maintenance and survival. There is insufficient evidence to support the subcellular localization and the proposed heterodimeric interaction of the two proteins from distinct subcellular compartments. The methodologies are unclear, and the statistical methods and analysis are improperly applied.

    2. Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss-of-function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss-of-function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Comments on the revised version of the manuscript:

      The authors adequately addressed previous comments related to lack of details on quantitative and statistical analyses and methods. In this regard, I believe the revised manuscript presents a stronger analysis of the data. I also appreciated the revised discussion section, which better contextualizes their new data with previous observations in different animal models.

      The authors provided additional evidence in Fig 1C-H for the co-localization of pcdh15b and actin within CPs using immunolabeling with super-resolution imaging. This data firmly supports their other observations. A similar approach tends to also show co-localization of actin and cdhr1a, although the authors suggest that the pattern of expression is less overlapping, which would be expected if cdhr1a is predominantly expressed in the OS membranes whereas pcdh15b is predominantly expressed in the CP membranes. In my opinion, the data presented to support this separation are still not that convincing. Moreover, the authors show that both cdhr1a and pcdh15b are expressed in CPs using immuno-TEM (Fig 1I). This is a difficult question to address experimentally, and it is, of course, still plausible that pcdh15b within the CP membrane and cdhr1a within the OS membrane are interacting in trans. However, I just don't think that the data unequivocally support mutually exclusive localization of these proteins as suggested by the authors and depicted in the model in Fig 1J.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that, like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs, where it closely opposes PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Fig 4F, 6E) as well as other morphometric data (Fig 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether analysis was done in an automated and/or masked manner.

      Comments on revisions:

      Most of my concerns were addressed in this revised version.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty for this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data as presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      This is a large body of data.

      Weaknesses:

      (1) I have serious concerns about the quality of the imaging here. The premise that cdhr1a/pcdh15 juxtaposition is evidence for the two proteins mediating the connection between outer segments and calyceal processes requires very careful microscopy. The SIM images have two major issues - one being that the red and green channels are misaligned and the other being evidence of bleed through between the channels. This is obvious in Fig 2A but likely true across all the panels in Fig 2, and possibly applies to confocal images in Fig 1 as well. The co-labelling with actin shows very uneven, punctate staining for actin bundles.

      (2) The newly added TEM and transverse sections include colored regions that obscure the imaging.

      (3) The quantification should be done with averages from individual fish. Counting individual measurements as single data points artificially inflates the significance. Also, the cone subtypes are still lumped together for analysis despite their variable sizes.

      (4) I highlighted previously that the measurement of calyceal processes was incorrect. The redrawn labels in Fig 7 are now more accurate, although still difficult to interpret. However, the quantification in Fig 7O is exactly the same. How can that be if the measurement region is now different?

      (5) Lower magnification views would provide context for the TEM data.

      (6) The statement describing the separation between calyceal processes and the outer segment in the mutants is still not backed up by the data.

      (7) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs". This is now referenced, but incorrectly. Also, the issue of pigment interference was not addressed.

      (8) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immuno-gold-TEM protocol and showcase co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in Figure 1. All of the images showcasing CDHR1 localization across species concentrate on the PNA-positive R/G cones. Larger fields of view were not collected, as we prioritized the highest resolution possible and therefore collected small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the WB resemble our previous results (Piedade 2020), and we believe they arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures but a monolayer culture. We did not observe any differences in total cell number between the different transfections. As such, we do not feel that proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs per cell; we simply included all CP measurements in our analysis. All our CP measurements (actin, cdhr1a, or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1, or PNA) to provide a landmark for proper measurement and to ensure association with the proper cell type. Our new Figure 7 now includes cone OS counterstaining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin, cdhr1a, or pcdh15) were done in the presence of a counterstain (WGA, prph2, gnb1, or PNA) to ensure proper measurements and association with the proper cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we analyzed n=5 individual fish for each genotype/time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using multiple pairwise comparison ANOVA with post hoc tests (Tukey test). This new analysis did not significantly alter the statistical significance outcome of the study.
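      The independence concern raised in the preceding comment (many OS or CP measurements from the same fish are not independent samples) can be illustrated with a minimal sketch. It assumes measurements are stored as (group, fish ID, length) rows, collapses them to one mean per fish, and then computes a one-way ANOVA F statistic on those per-fish means. The function names and data layout are hypothetical and are not taken from the manuscript; this is not the authors' analysis, and it omits the post hoc test.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(measurements):
    """Collapse (group, fish_id, length) rows to one mean length per fish.

    Returns {group: [mean for each fish in that group]} so that the
    fish, not the individual OS/CP measurement, is the unit of analysis.
    """
    by_fish = defaultdict(list)
    for group, fish_id, length in measurements:
        by_fish[(group, fish_id)].append(length)

    by_group = defaultdict(list)
    for (group, _fish_id), lengths in sorted(by_fish.items()):
        by_group[group].append(mean(lengths))
    return dict(by_group)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for {group: [values]}.

    Assumes at least two groups and non-zero within-group variance.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    rows = [
        ("WT", "fish1", 10.0), ("WT", "fish1", 12.0), ("WT", "fish2", 14.0),
        ("mut", "fish3", 6.0), ("mut", "fish3", 8.0), ("mut", "fish4", 9.0),
    ]
    print(one_way_anova_f(per_fish_means(rows)))
```

      With this layout, the within-group degrees of freedom equal the number of fish minus the number of groups, rather than the number of individual measurements minus the number of groups, which is the point of the reviewer's objection.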

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in Figure 3E was captured using a previous protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree; this point has been addressed in our revised manuscript. Additionally, we have included data from 1- and 2-year-old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer and have revised the discussion accordingly in our revised manuscript.
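      The reviewer's criterion for synergy in comment (5) (each single mutant changes the measurement in the same direction, and the double mutant changes it in that direction by more than the sum of the single effects) can be written as a short sketch. The function name and the example values are hypothetical, the inputs are plain group means, and no statistical uncertainty is considered.

```python
def classify_interaction(wt, single_a, single_b, double):
    """Classify a double-mutant effect against the two single-mutant effects.

    All arguments are group means of the same measurement (for example,
    OS length). Effects are expressed as differences from wild type.
    """
    effect_a = single_a - wt
    effect_b = single_b - wt
    effect_double = double - wt

    # Synergy requires both single effects and the double effect to share a sign.
    same_direction = effect_a * effect_b > 0 and effect_a * effect_double > 0
    if not same_direction:
        return "not synergistic (effects differ in direction or are absent)"

    # Same direction: compare the double-mutant effect with the additive expectation.
    additive = effect_a + effect_b
    if abs(effect_double) > abs(additive):
        return "synergistic (beyond additive)"
    return "additive or less"


if __name__ == "__main__":
    # Illustrative numbers only: mutant A increased, mutant B unchanged,
    # double mutant reduced by about half relative to wild type.
    print(classify_interaction(wt=10.0, single_a=12.0, single_b=10.0, double=5.0))
```

      Under this definition, the pattern the reviewer describes at 10dpf (an increase in one single mutant, no change in the other, and a large decrease in the double mutant) does not meet the criterion for synergy.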

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy the authors find that like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs and closely oppose PCDH15b which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used for quantitation of our data (please see our responses to Reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in the cones, while in rods it may serve to regulate the release of newly formed disks (as previously proposed in mice). We do plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immuno-gold-TEM protocol and provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. And whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in Figure 3E was captured using a previous protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants so unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol which differed from the Miles et al paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern which might be lost/hampered in the procedure used in Miles et al 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an overlooked error in figure preparation; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for Reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewer 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript. We have also clarified that CP measurements were made based on a counterstain for the cone/rod OS so that the actin signal was only CP associated. We have included the counter stain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have completed both more SIM as well as immuno-gold TEM to support our conclusions, see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the small length of rod CPs is most likely to represent their interaction with newly forming discs rather than connect with mature discs which are enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references where rod CPs have been found to be shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counter stain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript has included these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in supplemental figure 1 are distinct and represent different in situ experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added to our discussion, highlighting that it is not possible to assess rod-cone dystrophy in the pcdh15b model because the mutation is lethal by 15dpf, which is still before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin) and myo7aa all have either no expression in photoreceptors or very low levels, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      Changed pcdh15b color to blue

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdh1b in cdh1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showcasing this data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bonafide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and extends along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on co-staining for either rod or cone OSs.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      This data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) has been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies.

      (3) Missing references.

      We have added the missing references.

      (4) Indicate the mean or median for bar graphs.

      The materials and methods section now specifies that all of our graphs depict a mean value

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separate from cones in the zebrafish retina and are therefore easily identified by location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. eLife Assessment

      In this important study, the authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance in ovarian cancer using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors convincingly identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need. This work will be of interest to researchers focused on ovarian cancer.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need.

      Strengths:

      The manuscript utilizes state-of-the-art proteomic analysis, entailing data-independent acquisition methods, an approach that maximizes the robustness of identified proteins across cell lines. The authors focus their analysis on several drugs under development for the treatment of ovarian cancer and utilize straightforward thresholds for identifying proteomic adaptations across several drugs on the OVSAHO cell line. The authors utilized three independent and complementary approaches to predicting drug synergy (NetBox, GSEA, and Manual Curation). The drug combination with the most robust synergy across multiple cell lines was the inhibition of MEK and CDK4/6 using PD-0325901+Palbociclib, respectively. Additional combinations included PARPi (rucaparib) and the fatty acid synthase inhibitor (TVB-2640). Collectively, this study provides important insight and exemplifies a solid approach to identifying drug synergy without large drug library screens.

      Weaknesses:

      The manuscript supports their findings by describing the biological function(s) of targets using referenced literature. While this is valuable, the number of downstream targets for each initial target is extensive, thus, the current work does not attempt to elucidate the mechanism of their drug synergy. Responses to drugs are quantified 72 hours after treatment and exclusively focused on cell viability and protein expression levels. The discovery phase of experimentation was solely performed on OVSAHO cell line. An additional cell line(s) would increase the impact of how the authors went about identifying synergistic targets using bioinformatics. Ovarian cancer is elusive to treatment as primary cancer will form spheroids within ascites/peritoneal fluids in a state of pseudo-senescence to overcome environmental stress. The current manuscript is executed in 2D culture, which has been demonstrated to deviate from 3D, PDX, and primary tumours in terms of therapeutic resistance (DOI: 10.3390/cancers13164208). Collectively, the manuscript is insufficient in providing additional mechanistic insight beyond the literature, and its interpretation of data is limited to 2D culture until further validated.

      Comments on revisions:

      The reviewer has no further recommendations for the authors.

    3. Reviewer #2 (Public review):

      Summary:

      Franz and colleagues combined proteomics analysis of OVSAHO cell lines treated with 6 individual drugs. The quantitative proteomics data was then used for computational analysis to identify candidates/modules that could be used to predict combination treatments for specific drugs.

      Strengths:

      The authors present solid proteomics data and computational analysis to effectively repeat at the proteomics level analyses that have previously been done predominantly with transcriptional profiling. Since most drugs either target proteins and/or proteins are the functional units of cells, this makes intuitive sense.

      Weaknesses:

      Considering the available resources of the involved teams, performing the initial analysis in a single HGSC cell line is certainly a weakness/limitation. During the revision additional cell lines were used for verification.

      The data also shows how challenging it is to correctly predict drug combinations. In Table 2 (if I read it correctly) the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect. It also shows how variable the response was in the different HGSC cell lines used for combination treatment. The success rate will most likely continue to drop as more sophisticated models are being used (i.e., PDX). Human patients are even more challenging.

      It would most likely be useful to more directly mention/discuss these caveats in the manuscript. This was added to the discussion during the revision. Overall the authors have responded to previous suggestions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need.

      Strengths:

      The manuscript utilizes state-of-the-art proteomic analysis, entailing data-independent acquisition methods, an approach that maximizes the robustness of identified proteins across cell lines. The authors focus their analysis on several drugs under development for the treatment of ovarian cancer and utilize straightforward thresholds for identifying proteomic adaptations across several drugs on the OVSAHO cell line. The authors utilized three independent and complementary approaches to predicting drug synergy (NetBox, GSEA, and Manual Curation). The drug combination with the most robust synergy across multiple cell lines was the inhibition of MEK and CDK4/6 using PD-0325901+Palbociclib, respectively. Additional combinations included PARPi (rucaparib) and the fatty acid synthase inhibitor (TVB-2640). Collectively, this study provides important insight and exemplifies a solid approach to identifying drug synergy without large drug library screens.

      Weaknesses:

      The manuscript supports their findings by describing the biological function(s) of targets using referenced literature. While this is valuable, the number of downstream targets for each initial target is extensive, thus, the current work does not attempt to elucidate the mechanism of their drug synergy. Responses to drugs are quantified 72 hours after treatment and exclusively focused on cell viability and protein expression levels. The discovery phase of experimentation was solely performed on the OVSAHO cell line. An additional cell line(s) would increase the impact of how the authors went about identifying synergistic targets using bioinformatics. Ovarian cancer is elusive to treatment as primary cancer will form spheroids within ascites/peritoneal fluids in a state of pseudo-senescence to overcome environmental stress. The current manuscript is executed in 2D culture, which has been demonstrated to deviate from 3D, PDX, and primary tumours in terms of therapeutic resistance (DOI: 10.3390/cancers13164208). Collectively, the manuscript is insufficient in providing additional mechanistic insight beyond the literature, and its interpretation of data is limited to 2D culture until further validated.

      We appreciate your positive remarks on the use of NetBox, GSEA, and human curation for predicting anti-resistance effects of second drugs. Regarding the weaknesses you identified:

      Mechanistic Insight: We agree that our current work interprets findings using prior published knowledge and does not attempt to infer detailed mechanisms of drug resistance of the nominated drug combinations. Our primary goal with this study was to establish a robust, unbiased proteomic and computational pipeline for proposing anti-resistance drug combinations, rather than to fully characterize the downstream molecular effects for each combination or to prove causation. To get closer to mechanistic insight, meaning detailed hypotheses of causative interactions, one would need to investigate anti-resistance effects in other pre-clinical materials as a crucial next step for the most promising combinations identified. This was out of scope for us. We assume the proposed combinations are useful for focussed follow-up in the community.

      Discovery Phase on a Single Cell Line: Our discovery phase was focused solely on the OVSAHO cell line due to its resemblance to surgical ovarian cancer samples. Including additional cell lines in the initial proteomic-response discovery phase plausibly would have enhanced the generalizability. But this was not done due to resource constraints. However, we did perform more extensive validation of the effect of drug combinations on proliferation in several cell lines to explore broader applicability.

      2D Culture Limitations: We are fully aware of the limitations of 2D cell culture models, especially in the context of ovarian cancer, where in clinical reality interactions with the microenvironment and other effects can have significant roles in therapeutic resistance. We also recognize that in lab experiments 2D culture does not fully recapitulate the complexities of 3D tumors, PDX models, or primary patient tumors. We have added citations to the relevant literature (including the reference you provided), and have emphasized in the Discussion that our findings serve as a strong foundation for future experimental tests (validation) in more physiologically relevant experimental model systems.

      Reviewer #2 (Public review):

      Summary:

      Franz and colleagues combined proteomics analysis of OVSAHO cell lines treated with 6 individual drugs. The quantitative proteomics data were then used for computational analysis to identify candidates/modules that could be used to predict combination treatments for specific drugs.

      Strengths:

      The authors present solid proteomics data and computational analysis to effectively repeat at the proteomics level analyses that have previously been done predominantly with transcriptional profiling. Since most drugs either target proteins and/or proteins are the functional units of cells, this makes intuitive sense.

      Weaknesses:

      Considering the available resources of the involved teams, performing the initial analysis in a single HGSC cell line is certainly a weakness/limitation.

      The data also shows how challenging it is to correctly predict drug combinations. In Table 2 (if I read it correctly), the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect. It also shows how variable the response was in the different HGSC cell lines used for the combination treatment. The success rate will most likely continue to drop as more sophisticated models are being used (i.e., PDX). Human patients are even more challenging.

      It would most likely be useful to more directly mention/discuss these caveats in the manuscript.

      Thank you for your summary and positive comments. Regarding the weaknesses you identified:

      Initial Analysis in a Single Cell Line: We concur with your assessment that performing the initial analysis in a single HGSC cell line (OVSAHO) is a limitation. As mentioned in our response to Reviewer #1, resource limitations caused this decision, and we acknowledge that a broader initial screen would have strengthened generalizability. We added this limitation in the discussion section, emphasizing use of diverse cell lines in the initial protein response profiling as an area for future work.

      Challenges in Predicting Drug Combinations and Variability: We thank the reviewer for the observation regarding the challenges in predicting the effect of drug combinations and the variability of antiproliferative effects observed in different HGSC cell lines (Table 2). As with any predictive method, our computational-experimental pipeline is not guaranteed to identify with absolute certainty additive or synergistic interactions, but generates data-informed hypotheses to be considered in the presence of other available observations. We now emphasize in the Discussion that while our computational pipeline provides plausible anti-resistance candidates, the precise results (extent of additivity or synergy) differ in different cell lines. This underscores that experimental validation across diverse physiological models, such as PDXs or organoids (not just additional cell lines), is an essential criterion of validity of the generated hypotheses. And we underscore the (obvious) challenge of the ultimate translation of pre-clinical experiments to therapeutic effects in humans.

      In revision, we have clarified in detail the expectation of predicted synergy implied by the reviewer’s comment, “the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect”. This reflects a misunderstanding of our goals. The predictions are for drug effects that are anti-resistant, such that the proteomic response to one drug is counteracted by the second drug. The predicted effect is not synergy. Indeed, a useful anti-resistance effect does not require synergy; additivity is sufficient: if cells are resistant to the original drug, the second drug plausibly still has an antiproliferative effect, as it targets the cellular processes that are increased in activity (upregulated) in response to the first drug. So we deleted the red synergy color in Table 2 to avoid the potential conclusion from our results that without synergy, there is no benefit to a drug combination. In fact, additive drug combination effects are in themselves beneficial. For clarity on this point, we added coloring in Table 2 to highlight the small number of combinations that did not work well in that the combination was clearly antagonistic, using a combination index CI >= 2.0 cutoff; we clarify this point in the Discussion.
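
      The distinction drawn above between synergy, additivity, and clear antagonism can be sketched in a few lines. This is only an illustration, assuming a Loewe-additivity style combination index, CI = d1/D1 + d2/D2, where d1 and d2 are the doses used in combination and D1 and D2 are the single-agent doses that give the same effect. Only the CI >= 2.0 antagonism cutoff comes from the response; the function names, the CI < 1 synergy boundary, and the example doses are hypothetical and are not taken from the manuscript.

```python
def combination_index(d1, d2, D1, D2):
    """Loewe-style combination index (assumed definition, see lead-in).

    d1, d2: doses of drug 1 and drug 2 used together.
    D1, D2: single-agent doses producing the same effect level.
    """
    if D1 <= 0 or D2 <= 0:
        raise ValueError("single-agent doses must be positive")
    return d1 / D1 + d2 / D2


def classify(ci, antagonism_cutoff=2.0, synergy_cutoff=1.0):
    """Label a CI value.

    antagonism_cutoff=2.0 is the cutoff quoted in the response;
    synergy_cutoff=1.0 is a conventional, assumed boundary.
    """
    if ci >= antagonism_cutoff:
        return "clearly antagonistic"
    if ci < synergy_cutoff:
        return "synergistic"
    return "additive to mildly antagonistic"


# Hypothetical doses, for illustration only.
ci_low = combination_index(1.0, 2.0, 4.0, 8.0)   # 0.25 + 0.25
ci_high = combination_index(5.0, 5.0, 4.0, 5.0)  # 1.25 + 1.00
print(ci_low, classify(ci_low))
print(ci_high, classify(ci_high))
```

      Under this reading, any combination that falls below the antagonism cutoff still counts as a potentially useful anti-resistance pair, which is the point the response makes about additivity.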

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2b. This figure would be more impactful if presented as an upset plot with the same Venn diagram embedded. I am not sure Figure 2c accurately supports the statement: "Frequently affected proteins generally had expression level changes in the same direction across all drug perturbations (Figure 2c), indicating a potential general stress response." It would be beneficial if the authors could present the data in a way that shows the number of genes with similar directional groupings. Likewise, the color scheme for this figure is hard to interpret as grey is the most negative value and values are preselected for absolute fold-change. Please consider colors with a stronger contrast.

      Authors should consider uploading MS files to the PRIDE or MASSIVE repository.

      We have addressed these very useful suggestions. We have edited Figure 2b to include the requested upset plot. It serves to illustrate the intersection of proteins responding to different perturbation conditions; due to figure space constraints, we limit the figure to entries with counts of at least 15. We have added the number of proteins with consistent directional changes in the figure 2c caption and the text.
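
      For readers unfamiliar with upset plots, the counts they display can be sketched as follows. This is a minimal illustration, assuming each bar counts the proteins that respond in exactly one particular combination of conditions, with a minimum-count filter like the one described above. The function name, condition labels, and protein identifiers are hypothetical and do not come from the dataset.

```python
from collections import Counter


def exclusive_intersections(responders, min_count=1):
    """Count proteins by the exact set of conditions in which they respond.

    responders: dict mapping condition name -> set of responding proteins.
    Returns a dict mapping frozenset(conditions) -> number of proteins,
    keeping only entries with at least min_count proteins.
    """
    membership = {}
    for condition, proteins in responders.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(condition)
    counts = Counter(frozenset(conds) for conds in membership.values())
    return {combo: n for combo, n in counts.items() if n >= min_count}


# Hypothetical toy input, for illustration only.
toy = {
    "drug_A": {"P1", "P2", "P3", "P5"},
    "drug_B": {"P2", "P3", "P4", "P5"},
    "drug_C": {"P3"},
}
for combo, n in exclusive_intersections(toy, min_count=2).items():
    print(sorted(combo), n)
```

      With real data the filter would be set to 15 to match the figure; the toy input uses a smaller threshold only so that something survives the filter.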

      For Figure 2c, we have edited the color bar legend to better reflect the colors that appear in the heatmap.

      We have added our mass-spectrometry drug-response dataset to the ProteomeXchange Consortium via PRIDE with accession number PXD066316.

    1. eLife Assessment

      This valuable computational study presents a conceptually simple and biologically plausible reinforcement-learning framework for motor learning based on policy-gradient methods. The evidence supporting the conclusions is convincing, including rigorous mathematical derivations of learning rules for the mean and variance of motor commands and simulation results for three sets of experimental data, based on three different motor learning tasks from the literature. However, there is a lack of a clear description of the specific conditions under which this framework yields unique mechanistic insights or predictive values, hence falling short of qualifying as a "general theory of motor learning". The work will be of interest to researchers in computational motor learning and motor neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      This study proposes a simple and universal reinforcement-learning framework for understanding learning in complex motor tasks. Central to the framework is a policy-gradient algorithm, in which motor commands are updated not via the gradient of the reward with respect to policy parameters, but via the gradient of the policy itself, scaled by reward information. The authors demonstrate that this scheme can reproduce learning dynamics that have been reported in previous empirical studies.

      Strengths:

      The key contribution of this study lies in its application of a policy-gradient algorithm to describe motor learning processes. This idea is biologically plausible, as computing the gradient of the policy with respect to its parameters is likely to be substantially easier for the nervous system than computing the gradient of the reward with respect to policy parameters. The authors present three representative examples showing that this scheme can capture several aspects of motor learning dynamics. Notably, providing such a unified description across different tasks has been difficult for conventionally proposed learning frameworks, such as supervised learning.

      Weaknesses:

      While this scheme is valuable in that it captures certain aspects of learning dynamics, I find that its overall significance is limited for the following reasons.

      (1) The empirical results examined in this study primarily demonstrate that motor learning drives performance toward the spatial task goal while reducing variability. Given that the policies are expressed using Gaussian distributions and that their parameters (i.e., the mean and covariance matrix) are updated during learning, it is not surprising that the proposed scheme can reproduce these results by fitting the parameters to the data.

      (2) The proposed framework assumes that the motor learning system relies on the gradient of the policy with respect to its parameters. However, I am not convinced that this assumption is always appropriate, because in all three empirical studies examined here, explicit spatial error information is available. In such cases, the motor learning system could, in principle, compute the gradient of the error with respect to the policy parameters directly, without relying on a policy-gradient mechanism.

      (3) Most importantly, it remains unclear how the proposed scheme advances our understanding of the underlying learning mechanisms beyond providing a descriptive account of the learning process. While the framework offers a compact mathematical description of learning dynamics, it is uncertain how it can yield novel mechanistic insights or testable predictions that distinguish it from existing learning models.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Haith applies, and to some extent extends, the theoretical framework of policy gradient (PG) and the derived REINFORCE learning rules to human motor learning. This approach is coherent because human motor skill learning is characterized by improvements in both accuracy and precision (the inverse of variance), and REINFORCE provides update rules for both the mean and the variance of the motor commands.

      Weaknesses:

      The mean update (equation 4) is given in task space (i.e., angle and velocity for the skittle task), but the covariance update (equation 5) is given in eigenvector space. This formulation appears to have been provided for computational convenience, as it ensures that the variances are always positive by exponentiating the eigenvalues. However, this eigenspace formulation is somewhat artificial and complex (notably the update rule for the orientation of the covariance matrix) and seems far from biological reality. A simpler alternative, suggested by the author, is to provide the full covariance matrix, including crossed terms, and derive equations to update the diagonal variance terms and the cross-terms (perhaps after a transformation to keep all elements positive if needed). This would provide a simpler and more biologically plausible update to the covariance matrix terms, in the spirit of the original REINFORCE algorithm. The author suggests that he has derived the update rule for the cross terms, so this should be relatively easy to write and update, especially for the skittle learning rules. If the author wishes to keep their rules in simulations, then the two mathematical rules could be presented in the methods or a supplementary material section.

      The discussion about binary rewards and the increase in variance in previous experiments is potentially interesting. However, I do not understand why variance cannot increase with the policy-gradient RL update? Surely, equation 5 can lead to both an increase and a decrease in variance depending on the reward prediction error and the noise (for example, suppose the noise at trial i is small and leads to a smaller reward than the baseline; variance would increase). It would be interesting to see detailed simulation results for the skittle task showing changes in both mean and variance across a few consecutive trials, with both increases and decreases in reward prediction errors. These results could then be compared in simulations with those of a task with discrete binary rewards.
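
      The sign argument in the preceding paragraph can be checked with a one-dimensional sketch. This is not the manuscript's equation 5, which is formulated in eigenvector space; it assumes a scalar Gaussian policy u ~ N(mu, sigma^2) with sigma = exp(log_sigma) and the standard REINFORCE score-function updates. The function name, learning rate, and numerical values are hypothetical.

```python
import math


def reinforce_step(mu, log_sigma, noise, reward, baseline, lr):
    """One REINFORCE update for a scalar Gaussian policy.

    noise is the exploration deviation (u - mu) on this trial.
    Score functions of log N(u; mu, sigma^2):
      d/d mu        = noise / sigma^2
      d/d log_sigma = noise^2 / sigma^2 - 1
    Both are scaled by the reward prediction error (reward - baseline).
    """
    sigma_sq = math.exp(2.0 * log_sigma)
    rpe = reward - baseline
    new_mu = mu + lr * rpe * noise / sigma_sq
    new_log_sigma = log_sigma + lr * rpe * (noise ** 2 / sigma_sq - 1.0)
    return new_mu, new_log_sigma


# Small noise with a worse-than-baseline reward: log_sigma goes up.
_, s_up = reinforce_step(0.0, 0.0, noise=0.1, reward=0.0, baseline=1.0, lr=0.1)
# Small noise with a better-than-baseline reward: log_sigma goes down.
_, s_down = reinforce_step(0.0, 0.0, noise=0.1, reward=2.0, baseline=1.0, lr=0.1)
print(s_up > 0.0, s_down < 0.0)
```

      In this simplified setting the variance grows whenever the reward prediction error and the term (noise^2 / sigma^2 - 1) share a sign, and shrinks otherwise, so both directions of change are possible, as the reviewer suggests.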

      Generalization is a major feature of human learning, but it is not discussed or studied here. In fact, in the de novo task simulations, there can be no generalization because the values are modeled as running averages for each target rather than derived from a critic network. Can the author discuss this point and, ideally, show generalization results in simulations, say in the skittle task?

      The application of the model to reproduce the Shmuelof et al. data is, at the same time, justified (because one of their main results is an improvement in precision, which Policy Gradient directly addresses) and somewhat "forced," as the author approximates curved movements with a series of straight-line movements. The author therefore needs to specify multiple via points with PG updating and a reward function that also enforces smoothness. The justification for the Guigon 2023 model seems somewhat artificial because it mainly applies to slow movements. Can the author comment and discuss alternatives that do not require via points, drawing from the robotics literature if needed (Schaal's Dynamic Movement Primitives come to mind, for example).

      Policy Gradient requires both a "noisy" and a clean "pass", making it non-biological in its simplest form. Legenstein et al. (2010) and Miconi (2017) provided biologically plausible forms for the mean update. Since Policy Gradient is proposed as a model of human motor learning, can the author discuss the biological plausibility of the proposed learning rules and possible biologically plausible extensions?

    1. eLife Assessment

      This study addresses an important gap in drug discovery by delivering a rigorous, large-scale evaluation of widely used co-folding methods for predicting ligand-bound protein complexes and virtual screening. A key strength is the comprehensive benchmarking framework, which leverages structures and chemical compounds that were absent from the AI models' training sets, thereby providing particularly compelling and unbiased evidence of co-folding performance. The findings clearly delineate the complementary roles of deep learning-based co-folding and physics-based docking, offering practical guidance for their rational integration into drug discovery workflows. Although the conclusions are convincing, improvements in the test cases, presentation, and usability can further strengthen the overall impact.

    2. Reviewer #1 (Public review):

      The authors conducted a comprehensive benchmarking and evaluation of co-folding platforms, including AlphaFold3, Boltz-2, Chai-1, and the docking algorithm DOCK3.7, which employs a physics-based scoring function that incorporates van der Waals interactions, electrostatics, and ligand desolvation energies. The system of interest was the SARS-CoV-2 NSP3 macrodomain (Mac1), an increasingly popular antiviral target, and the ligand sets comprised 557 unseen ligand poses (keeping the training for these co-folding platforms in mind). Additionally, the authors investigated whether the co-folding models could distinguish true ligands from non-binding small molecules. The study is thorough, with extensive statistical support and consensus across multiple metrics (chemoinformatics for quantifying ligand similarity and efficacy). The questions that the authors aim to address are whether the co-folding models struggle with memorization, whether they can distinguish between a true and a false binder, whether they replicate experimental binding affinities and efficacy, and how they compare to the physics-based docking algorithm (DOCK3.7).

      Strengths:

      Overall, this is a scientifically solid paper. The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment.

      Weaknesses:

      My main concern is that the study's aim is a bit unclear. Modern benchmarking studies comparing physics-based docking with deep learning-based co-folding approaches (e.g., AF3, Boltz-2, Chai-1, and others) are increasingly expected to go beyond aggregate performance metrics. In addition to rigorous dataset construction, transparent methodology, and appropriate statistical evaluation, high-impact benchmarks typically provide actionable guidance on when each method class is most appropriate, reflecting their distinct inductive biases and practical constraints. Failure-mode analyses that link performance differences to protein flexibility, ligand chemistry, or binding-site characteristics are particularly valuable, as they move comparisons beyond "scoreboard" assessments toward mechanistic understanding. While full biological validation is not expected, qualitative interpretation grounded in physical and biological principles strengthens conclusions. Providing reproducible workflows or reference pipelines is not mandatory, but it is increasingly viewed as a best practice because it facilitates adoption and helps contextualize results for practitioners.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Kim et al. evaluates the performance of three modern AI-based methods in predicting complex structures and binding affinities between proteins and chemical compounds. An honest 'prospective' evaluation is achieved by studying benchmark structures and chemical compounds that did not exist in the PDB at the time the AI structure prediction models (AlphaFold3, Chai-1, Boltz-2) were trained.

      Strengths:

      (1) The study addresses an important question in modern computational biology and drug discovery, and establishes the strengths and limitations of the three tools in solving various computational chemistry tasks, including compound pose prediction, active-inactive discrimination, and potency ranking.

      (2) The conclusions are based on examination of four separate targets and respective compound datasets, where for one of the targets, the authors also obtained numerous X-ray structures to serve as experimental answers for the binding pose prediction task.

      (3) The study reports relationships between structure prediction confidence, predicted energies (DOCK3.7), and affinity predictions (Boltz-2) with the geometric accuracy of compound pose prediction as well as the experimentally measured potency.

      (4) One of the key findings is the limited ability of co-folding methods to predict conformational rearrangements, which does not correlate with their ability to predict binding poses of the compounds inducing these rearrangements.

      (5) The findings could serve as useful guidelines for computational chemists in selecting appropriate software and scoring schemes for each task.

      Weaknesses:

      While I consider this a solid study, several aspects would need to be addressed to make it really strong:

      (1) DOCK3.7 docking and scoring experiments were performed using one experimental structure of Mac1, selected from dozens of structures based on a criterion that is not sufficiently well justified. For sigma2 receptor, dopamine D4 receptor, and AmpC β-lactamase, it is not clear which structures or models were selected for docking at all. It is well known that geometry predictions, scoring, and active-inactive ROC AUCs are all strongly influenced by the selected structure. It would be important to attempt Mac1 docking using all available experimental Mac1 structures, or at least against representative structures in various conformations; it would also be quite insightful to compare results to docking of the same compound sets to AF3, Boltz-2 and Chai-1 predicted structures of Mac1. Same goes for the docking studies of sigma2, D4, and AmpC β-lactamase.

      (2) For binding affinity predictions, as a control, authors should consider compound co-folding with an unrelated protein, or even with a pseudo-peptide that consists of a few random single amino acids - this would provide an honest baseline for such predictions.

      (3) ROC curves in Figure 3 and elsewhere should be shown, and AUCs quantified/reported, on a log- or square-root-scaled x-axis, to emphasize early enrichment, which is the area of practical significance for these predictions. For example, Figure 3A currently suggests that the pose prediction performance of AF3 exceeds that of Boltz-2, whereas the early enrichment is clearly better for Boltz-2.

      (4) 'Trained set' in figures and text should probably be 'training set'? Or otherwise explain this new term the first time it is introduced.

      (5) Figure 1 illustrates a projection onto the first two principal components of a space that apparently had only one (scalar) metric for each compound pair (% maximum common substructure or Tanimoto coefficient); the authors need to better explain the principle behind this analysis and visualization.
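
      The early-enrichment argument in point (3) can be illustrated with a small sketch of an AUC computed on a log10-scaled x-axis. This is one possible formulation, not the metric used in the manuscript: the function names, the lower bound `min_fpr`, the step-function reading of the ROC curve, and the toy rankings are all assumptions made for illustration.

```python
import math


def roc_points(scores, labels):
    """ROC points (fpr, tpr) obtained by ranking items from highest to lowest score.

    Ties are broken by input order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for _, label in ranked if label)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def log_scaled_auc(points, min_fpr=0.001):
    """Area under the ROC curve over log10(fpr) in [min_fpr, 1], normalised to [0, 1]."""
    # TPR already reached when FPR hits min_fpr (step-function ROC).
    start_tpr = max(tpr for fpr, tpr in points if fpr <= min_fpr)
    clipped = [(min_fpr, start_tpr)] + [(f, t) for f, t in points if f > min_fpr]
    area = 0.0
    for (f0, t0), (f1, t1) in zip(clipped, clipped[1:]):
        area += 0.5 * (t0 + t1) * (math.log10(f1) - math.log10(f0))
    return area / -math.log10(min_fpr)


if __name__ == "__main__":
    scores = [6, 5, 4, 3, 2, 1]
    early = roc_points(scores, [1, 0, 0, 0, 0, 1])  # one active ranked first
    late = roc_points(scores, [0, 0, 1, 1, 0, 0])   # actives ranked mid-list
    print(log_scaled_auc(early, min_fpr=0.01), log_scaled_auc(late, min_fpr=0.01))
```

      In this toy case both rankings have the same conventional (linear-axis) AUC, yet the ranking that places an active first scores higher on the log-scaled axis, which is the distinction the reviewer asks to be made visible.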

    4. Reviewer #3 (Public review):

      Summary:

      This study's core conclusions are well-supported by data. It is shown that co-folding outperforms docking in known ligand pose/affinity prediction (validated by RMSD and IC₅₀ correlation), struggles with false-positive discrimination in virtual screens (lower AUC values), and is complementary to docking (non-correlated errors, distinct strengths in drug discovery stages).

      Strengths:

      (1) Unprecedented prospective design with 557 novel Mac1-ligand complexes ensures rigorous, independent evaluation of co-folding methods.

      (2) Comprehensive comparison of 3 co-folding tools (AlphaFold3, Chai-1, Boltz-2) with DOCK3.7 across diverse targets and metrics enables nuanced performance assessment.

      (3) The study clearly demonstrates complementary roles of co-folding (superior pose/affinity prediction for known ligands) and docking (better hit prioritization), and addresses deep learning memorization concerns via ligand similarity analysis.

      Weaknesses:

      (1) Limited generalization to diverse protein families (e.g., no ion channels/transporters).

      (2) Ambiguity in the mechanism underlying co-folding's failure to predict rare conformational changes.

      (3) Virtual screen comparison is unbalanced (docking-prioritized hit lists bias results).

    1. eLife Assessment

      In this study, the authors describe the degradation of HDACs in late HSV-1 infection and attempt to link this phenomenon to HDAC export to the cytoplasm and to DNA damage response. However, the evidence is incomplete, as many of the experiments are lacking in rigor. As a result, mechanistic links to the proposed model are weak.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors propose that HSV-1 infection degrades the class I histone deacetylases HDAC1 and HDAC2. The MDM2 E3 ubiquitin ligase from the DNA damage response pathway is responsible for ubiquitinating these HDACs that are subsequently degraded via proteasomes. The authors hypothesize that HDAC degradation will cause hyperacetylation of viral chromatin and enable viral gene transcription.

      Strengths:

      The ubiquitination of HDAC1 & HDAC2 by Mdm2 and the mapping studies are clear.

      Weaknesses:

      (1) Degradation of HDACs is observed late, at least 12-24 h post-infection (1 PFU/cell). Viral genes have been transcribed by that point, and the virus has replicated its genome. The kinetics do not match the proposed model.

      (2) The authors need to connect these findings with their story. As of now, these findings are correlative. For example, what is the impact of MDM2 depletion on viral gene expression and progeny virus production? Leptomycin B is not specific to the HDAC cytoplasmic translocation, and its effect on the infection could be due to its effect on ICP27.

      (3) The time point when the inhibitors were added to the cultures has not been stated in any experiment. If inhibitors were added with the virus, viral gene expression would be blocked.

      (4) The authors need to present late gene expression data in all the experiments where drugs have been used.

      (5) In Figure 1A, ICP4 is not detected up to 12 hours post-infection of HeLa cells with 1 PFU/cell. This cannot be true.

      (6) Leptomycin B blocks nuclear/cytoplasmic shuttling of ICP27 that brings viral mRNAs to the cytoplasm to be translated. So, the effect of LMB is not specific to the HDACs.

      (7) The key experiment is to use the degradation-resistant form of HDAC1 to evaluate its impact on viral gene transcription.

      (8) In the experiment where Mdm2 was depleted, the authors need to demonstrate the effect on the infection. ICP4 expression is not enough. How about growth curves? After Mdm2 depletion, ICP4 expression increases, which may contradict the authors' findings. An analysis of alpha and gamma gene expression is important.

      (9) Why did the authors analyze a liver HSV-1 infection and not a more relevant skin infection?

    3. Reviewer #2 (Public review):

      Summary:

      The authors discovered that HDAC1/2 are degraded in HSV-1 and PRV infections. They attempted to establish a new mechanism by which HDAC1/2 are translocated to the cytoplasm to be degraded in HSV-1 infection, and the degradation causes changes in histone acetylation to affect the DDR pathway.

      Strengths:

      (1) Interesting findings of HDAC1/2 degradation during HSV-1 and PRV infection, and it may impact more than the virology field.

      (2) Significant work to identify the ubiquitin site in HDAC1/2 and K63 linkage.

      Weaknesses:

      (1) Insufficient evidence to support the mechanism described by the authors.

      (2) Expansion of the conclusion to alphaherpesvirus without studying the intended mechanism in PRV infection.

      Overall, there may be a correlation between HDAC1/2 level, ATM/ATR phosphorylation, and HDAC1 translocation during the HSV-1 infection. However, core evidence supporting the mechanism that a) HDAC1 export causes its degradation, b) degradation of HDAC1 causes histone acetylation changes and DDR activation has not been sufficiently demonstrated.

    4. Reviewer #3 (Public review):

      The authors state that infection of cells by the alphaherpesviruses HSV-1 or PRV leads to a proteasome-dependent reduction in levels of HDAC1 and HDAC2 and that this leads to chromatin hyperacetylation, a DNA damage response, and greater replication of these viruses. Previously, other authors reported no change in levels of HDAC1 and HDAC2 after HSV-1 infection of human cells, but this paper is neither cited nor commented on in this new submission. The experiments are poorly designed. For instance, most of the time points analysed are way beyond the time needed for HSV-1 replication and are therefore not biologically relevant. The infections are done with a dose of virus that does not ensure that all cells are infected synchronously, but rather infection spreads from cell to cell with multiple rounds of replication. Some essential controls are missing. Additionally, this reviewer feels that the data presented do not support the conclusions drawn. Currently, links are not established between a reduction in HDAC1/2 and other phenomena such as hyperacetylation of histones, a DDR, and altered virus replication. The paper does not identify which HSV or PRV protein(s) induce reduction in HDACs, nor how the HDACs mediate antiviral activity; what are the HSV-1 or PRV protein targets? Lastly, the paper is not well prepared, and it does not adequately refer to prior literature.

    1. eLife Assessment

      This useful study examines patterns of clonal reproduction and somatic mutations in 'Pando', a massive, quaking aspen clone consisting of ~47000 stems. Because the study relies on relatively low-coverage, reduced-representation genomic resequencing data for the detection of somatic mutations, the evidence provided for several of the primary conclusions about clone age and the relationship between mutation accumulation and geographic distance is incomplete.

    2. Reviewer #1 (Public review):

      Summary

      The authors use reduced-representation sequencing (GBS) across samples from the quaking aspen clonal stand Pando to identify putative somatic mutations, which were used to estimate clone age, and evaluate whether somatic variation shows spatial structure across the grove. This is a compelling and charismatic system to look at somatic mutation in plants. They report little sharing of putative somatic mutations as a function of distance and interpret this as evidence for weak mutation transmission or homogenization over time, potentially driven by rapid root growth and clonal spread dynamics. They use mutations to estimate clone age. The authors are generally upfront and commendably transparent about limitations in sequencing depth and mutation calling. The paper addresses an interesting research system, but struggles to overcome limitations in the suitability of the data.

      Strengths.

      This is a fantastic system and an interesting set of questions. The authors' GBS data does a great job distinguishing Pando from its neighbors, which is an important first step in studying the history of this clone.

      The manuscript is upfront and highlights the need for improved data to refine inference, for example: "Higher-coverage whole-genome sequencing, and ideally single-cell sequencing of defined meristem lineages, will be needed to refine mutational and evolutionary parameter estimates in this iconic organism."

      It also states that "either we are missing roughly 80% of true somatic mutations or only 20% of the mutations we detect are true positives."

      I appreciate that the authors report an age estimate range that considers the breadth of potential false negatives and positives.

      Weaknesses

      I am still not sure whether the paper overcomes issues with the use of GBS for somatic mutation calling.

      I found it difficult to reconcile the manuscript's description of the call set as "conservative" with the reported validation tests (calibrated by looking at retained variants detected in 2 of 8 technical replicates). How was this threshold determined? A mutation with 2/8 has quite low reproducibility, which could reflect either substantial false negatives under low depth (true variants frequently dropping out) or false positives that recur sporadically due to library- or sequencing-specific artifacts. Without stronger internal diagnostics or external validation, it is hard to determine which applies here.

      The GBS sequence space and genomic distribution could be more clearly explained. According to the methods, "The total number of base pairs sequenced (129,194,577) was estimated using angsd, and reduced following the proportion of base pairs that we filtered out because of low coverage (48%)." What do the 129M base pairs represent? Is that 129M/genome length, or is it the number of aligned base pairs (i.e., 1M genome covered x129 depth)? In addition, it would help to summarize where GBS loci fall across the genome (genic vs intergenic vs TE; repetitive vs unique), since these can have substantially different somatic mutation rates (Meyer et al. 2025). Without additional summary/descriptive statistics, it is hard to interpret both missingness and "rate".

      Statistical concerns about some results. In the Figure 3 legend, the authors state that the sample-level relationship between shared variants and distance is significant: "Pearson correlation coefficient ... is −0.02, 95% CI = [−0.05, 0.00], which is significantly different from a randomized distribution (P < 0.001) (B)." However, as plotted in Figure 3B, the observed correlation (−0.02) appears to fall well within the bulk of the randomized distribution of correlation coefficients. If the reported P value is intended to be permutation-based (i.e., the tail probability under the randomized null), it is unclear how P could be < 0.001 given that the observed value does not appear extreme relative to the null.
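
      The logic of this concern can be made concrete with a minimal sketch of a permutation test for a Pearson correlation. It is a generic illustration, not the authors' analysis: the function names, the two-sided test, the number of permutations, and the toy data are assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value: share of shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(x, shuffled)) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed arrangement itself, so P is never zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data with no monotonic trend.
    distance = [1, 2, 3, 4, 5, 6]
    shared = [2, 1, 2, 1, 2, 1]
    print(permutation_p_value(distance, shared))
```

      Because the P value is the tail fraction of the randomized distribution, an observed coefficient lying in the bulk of that distribution necessarily gives a large P value, and P < 0.001 cannot be reached at all with fewer than about 1,000 permutations.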

      The developmental program of plant stem cell layers is essential, but not discussed much. In a root-spreading clone, expectations about mutation sharing depend strongly on how new ramets arise developmentally (root-derived meristem initiation) and how layered meristems partition mutations across tissues (e.g., L1/L2/L3). I was surprised there was not a substantial discussion of the details about the layer specificity of somatic development and mutation accumulation in plants. Especially relating to mutations that would be shared between roots/shoots around potential layer-specific growth of roots. The current analysis seems to focus on comparisons within tissue types (e.g., leaves between ramets), but did not report informative tests between tissue and within-ramet (e.g., in heavily sampled trees, whether a ramet's root, shoot, leaves, share a subset of variants; whether neighboring ramets share root-lineage variants more than shoot-lineage variants). It would help to articulate expectations and clarify what the data can and cannot test. Relatedly, for "mutation rates," in aging material, it would be good to discuss which meristem layer(s) each tissue is likely sampling and how layer-specific mutation dynamics (e.g., reported differences between L1 vs L2 lineages) could influence rate and therefore age estimates (Goel et al. 2024, Amundson et al. 2025).

      Developmental mosaicism makes expected allele fractions lower than discussed in the paper. The supplement states, "However, because the Pando clone is triploid, it reduces our expectation for fixation of a mutation to 0.33", but this ignores layer-specific stem cells in plant development. It is true that if calls are made against a haploid reference, then a new somatic mutation in a triploid background is expected at an allele fraction of ~1/3, but only if it is fixed in 100% of cells. Layer-specificity (e.g., L1 vs L2 vs L3 restriction) or polyclonal founding events will push expected allele fractions substantially lower. Therefore, at ~12-14× depth (or a minimum of 4×), these allele fractions translate into only a handful (or even zero) of alternate reads (the expectation is well below 33%).
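
      The arithmetic behind this concern can be sketched with a simple binomial model. The snippet is illustrative only: it assumes alternate reads are drawn independently at the true allele fraction, ignores sequencing error and mapping bias, and uses a hypothetical two-read calling threshold and a hypothetical allele fraction of 1/9 (a mutation on one of three chromosome copies, confined to one of three equally contributing cell layers). None of these values are taken from the manuscript.

```python
from math import comb


def expected_alt_reads(depth, allele_fraction):
    """Mean number of alternate reads under a binomial sampling model."""
    return depth * allele_fraction


def prob_at_least_k_alt_reads(depth, allele_fraction, k=2):
    """P(X >= k) for X ~ Binomial(depth, allele_fraction), ignoring sequencing error."""
    p_fewer = sum(
        comb(depth, i) * allele_fraction ** i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    for depth in (4, 12):
        for allele_fraction in (1 / 3, 1 / 9):  # 1/9 is a hypothetical layer-restricted case
            mean = expected_alt_reads(depth, allele_fraction)
            power = prob_at_least_k_alt_reads(depth, allele_fraction, k=2)
            print(f"depth={depth}, AF={allele_fraction:.3f}: mean alt reads={mean:.2f}, "
                  f"P(>=2 alt reads)={power:.2f}")
```

      Under these assumptions, the chance of seeing enough alternate reads to support a call drops quickly as either depth or allele fraction falls, which is why layer-restricted mutations would be expected to drop out frequently at low coverage.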

      Within-tree replicate consistency was unclear. The manuscript hints at multiple samples/replicates per tree (e.g., Figure S2), but it is not clear how often the same putative somatic variants are recovered across samples from the same ramet and tissue. A simple reproducibility summary would be extremely helpful: for variants called in one sample, what fraction are recovered in other samples from the same tree (by tissue), what variant allele fractions, and how do their spectra compare to mutations unique to a single sample?

      The manuscript did not provide supplemental tables or mutation calls. Supplemental tables containing pre-filter and/or post-filter calls (or some other structured data file with flags indicating various quality metrics, REF vs ALT depths at minimum, REF call, and ALT call) would substantially improve transparency and ability to evaluate the work.

    3. Reviewer #2 (Public review):

      Summary:

      The topic of the paper is intriguing as it sets out to age one of the potentially largest living organisms, a tree clone (Pando), using shallow genome resequencing of a large number of replicate samples. The key result is that the Pando clone is several tens of thousands of years old, which is of high interest to plant genomics and evolutionary ecology.

      Weaknesses:

      Unfortunately, the claims are not matched by the available data and their analysis. Probably, the results can also not be resurrected using modified analyses, as the available data are not suited to reliably detect somatic genetic variation as a means to age clonal plants.

      In order to reliably age clones, one needs to consider the full process by which clone mates genetically diverge from one another over time, which starts with a plant's apical meristem (SAM). From this, all above-ground tissues such as twigs and branches, as well as leaves, are derived, which has been beautifully worked out now in oaks and many fruit trees (e.g., doi: 10.1101/2023.01.10.523380 ; 10.1101/2024.01.04.573414). For the accumulation and propagation of fixed somatic genetic variation, only the processes in the SAM matter. Hence, it makes little sense to look at tissue-specific mutations unless one is invoking non-cell division induced mutations through UV light. Those, however, would remain undetected with the present low-coverage sequencing as they cannot leave the mosaic status any more, as that tissue is essentially non-dividing.

      Somatic genetic drift (https://www.nature.com/articles/s41559-020-1196-4) is the foundation for the fixation of somatic genetic variation and hence, for ageing (plant) clones. It requires quantitative modeling of the processes at the cell-line level when new modules (here, aspen trees) are formed, in particular N (cell population size) and N0 (founder cell size).

      Calibrations have to be made using the mutation and fixation rate at the somatic cell lineage level, ideally also with some empirical data. In trees such as aspen, it would be very easy to obtain calibration points of branch tips that have physically and thus genetically diverged upon a defined TCA to directly determine the rate of accumulation of somatic genetic variation by direct dendrochronology (i.e., counting tree rings).

      Instead, in the present work, a mutation rate from another tree species is taken, which will introduce a lot of uncertainty into the estimates, given that tree SAMs divide at a very different pace (see doi 10.1093/evolut/qpae150). It is clear that a small difference in the assumed mutation rate, e.g., a higher one, would conversely reduce the age estimate considerably.

      I am doubtful that a conventional phylogenetic model based on coalescence, such as the one employed here, can be utilized, as it assumes a sexually recombining population and hence variable sites. A model simulation on an asexually evolving population would be needed to check this.

      In order to reliably call somatic genetic variation, a decent coverage of short-read sequences is needed, definitely > 15x, which was achieved in the present dataset. This is particularly relevant as a fixation in one of the three haploid chromosome sets would just amount to a read frequency of only 0.33. A coverage of only 4x reads per called site seems very low to me; in other words, the filtering steps do not seem to be very rigorous to me. It is also difficult to follow the logic of several ad hoc adjustments that were made to compensate for the low coverage of sequencing, in particular, the common panel and the replicate identical samples. Why choose 80% in the latter?

      There are alternative, non-sequencing-based ways to double-check the accuracy of somatic SNP calls (e.g., described here https://www.nature.com/articles/s41559-020-1196-4), which could have been employed at least once to evaluate the error rates for the specific sequencing strategy.

      I also suggest that any future study employ mutation callers developed for cancer somatic mutation detection, which are now increasingly used in clonal plants and trees for that purpose.

      What worries me is that there is a poor correlation between physical and genetic distance. This lack of correlation between spatial and genetic structure, for example, the star-like phylogeny presented in Figure 6d, indicates a large fraction of false positives rather than some special, as yet unexplained processes of local mutation accumulation that the authors claim to have discovered.

      Finally, the work is not properly embedded into the current literature. For example, recent developments of molecular clocks were not considered, such as the development of a dedicated somatic genetic clock that precisely addresses this question (https://www.nature.com/articles/s41559-024-02439-z). Also, older but nevertheless significant work that aged aspen clones using microsatellite markers is not mentioned (http://dx.doi.org/10.1111/j.1365-294X.2008.03962.x).

    1. eLife Assessment

      This important study explores whether complex structures that are lost during evolution can re-evolve, which is a long-standing debate in evolutionary and developmental biology. The authors demonstrate that re-evolution can occur if the gene regulatory network that underlies the development of complex traits is maintained. The evidence supporting its conclusions is solid and the work will be of interest to those studying the evolution and development of complex traits.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Vasquez-Correa and colleagues describes the expression pattern of the ocelli (simple eye) gene regulatory network in ants. They correlate the expression pattern of these genes with the presence and absence of ocelli in different classes and species of ants. The presence of ocelli is a polyphenic trait in ants - understanding the molecular and developmental underpinnings of polyphenic traits is of significant interest to evolutionary biologists, developmental biologists, and ecologists. The authors propose that the presence of the latent expression of the ocellar network in classes of ants that do not display ocelli in the adults may underlie the re-evolution of ocelli within the ant lineage.

      Strengths:

      The strengths of the manuscript are that it is well written, the images are of the highest quality, and the data support the conclusions of the authors.

      Weaknesses:

      One improvement that could be made is to include imaginal discs of the queen ants as well as scanning electron images of the ocelli of the queen ant to match the pupal stage images of the worker and soldier ants. A second improvement is to attempt a gene knockdown using RNAi or similar methods to ensure that the genes that are being studied are, in fact, responsible for ocelli development in the ant.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript titled "Latent gene network expression underlies partial re-evolution of a polyphenic trait in the worker caste of ants" by Vasquez-Correa et al. aimed to study genetic mechanisms underlying developmental plasticity, especially binary polyphenism in queen vs worker ant castes. This is an interesting question regarding the extent to which phenotypic traits were altered, lost or regained, and how molecular pathways (upstream vs. downstream) can facilitate this process.

      In ants, reproductive castes (queens and males) develop wings as well as 3 ocelli for mating flights and other activities, while worker castes are wingless, and in some species, they have either no or a reduced number of ocelli. The phylogenetic analysis showed that in the Camponotini ant clade, the one-ocellus phenotype re-evolved in three species independently. The authors analyzed the conserved developmental pathways between Drosophila (well-established) and ants using HCR (a high-quality in situ hybridization technique). They found that although upstream genes for the development of ocelli (otd and hh) showed similar expression between castes, downstream genes (toy, eya, and so) had reduced or no expression in workers of C. floridanus, and this differential expression may lead to partial or complete loss of ocelli. Consistently, workers develop rudimentary tissues, suggesting that they initiate the ocellus developmental process but somehow stop it before adulthood.

      Strengths:

      Evo-devo approaches to reveal conserved molecular pathways of ocellus development. High-quality HCR provided convincing evidence of the expression of key genes in ocelli, eyes and antenna throughout larval development.

      Using HCR, the authors showed differential expression of downstream genes in males vs. soldiers vs. minor workers of C. floridanus, which might explain phenotypic differences between castes.

      Weaknesses:

      Although the molecular pathway is conserved, the mechanism underlying the lack of ocelli in workers remains unclear. In C. floridanus, it could be explained by the evidence of no expression of certain developmental genes, but in other species, e.g. Polyrhachis rastellata, is their expression intact, or reduced? There is no control male.

      In addition, HCR in species with partial re-evolution (if their genomes have been sequenced) would be useful to understand the mechanism. For example, there might be differential spatial expression between medial and lateral ocelli.

    4. Reviewer #3 (Public review):

      Summary:

      This paper examines the loss and re-evolution of specific organs during the evolution of ants. The authors show that these organs, the ocelli, disappear and are re-evolved in different ant species and in different ant castes within these species. The authors show that this is linked to a conserved GRN discovered in Drosophila that appears to underlie the development of the ocelli, and demonstrate that this GRN appears to remain active in the developing heads of ants that have no ocelli, implying that it is the evolutionary latency of this GRN that allows loss and subsequent evolution.

      Strengths:

      This manuscript has outstanding imaging of a very difficult developing organ, and the key data, fluorescence in situ hybridisation, is done well and clearly shows what the authors wish to demonstrate. The methods are well described and underpin the whole work.

      The authors convincingly demonstrate that gene expression patterns imply the conservation of the ocellus gene regulatory network from Drosophila to ants. They further show that this network is present even in ants that don't produce an adult ocellus, but do show that in those species, loss of a developing nascent ocellus (which they identify) occurs at the same time as an interruption in the expression of the key genes in the GRN. All of this data is beautifully presented and explained.

      Weaknesses:

      There is one key weakness in that there are no functional studies that indicate that the GRN actually does make the ocellus, though the expression patterns are convincing. This applies to loss of the ocellus as well. It would be nice to see that transient loss of the ocelli GRN might lead to loss of ocelli in ant species that have them. These are very difficult things to achieve, as the key genes have earlier developmental roles, such that CRISPR knockouts would not be interpretable, and transient RNAi in the head capsules of developing pupal ants would be challenging.

    1. eLife Assessment

      This important study provides new insight into the regulation of cell organization and division in Trypanosoma brucei through the control of a kinesin motor protein by a polo-like kinase. The authors present solid evidence from rigorous biochemical and imaging analyses showing that phosphorylation modulates kinesin function and cellular organization. However, direct in vivo evidence that PLK phosphorylates kinesin-G is lacking.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript identifies the orphan kinesin KIN-G as a substrate of Polo-like kinase (TbPLK) in Trypanosoma brucei and demonstrates that phosphorylation of Thr301 inhibits KIN-G microtubule binding and disrupts its cellular function. Using a combination of in vitro kinase assays, phosphosite mapping, microtubule binding and gliding assays, and in vivo complementation with phosphomimetic and phosphodeficient mutants, the authors link TbPLK-mediated regulation of KIN-G to defects in centrin arm integrity, FAZ elongation, Golgi organization, flagellum positioning, and division plane placement. The study provides a mechanistic advance in understanding how TbPLK regulates centrin arm biogenesis and integrates KIN-G into the growing regulatory network controlling hook complex and FAZ assembly. Overall, the work is technically strong, internally consistent, and builds logically on previous studies from this group and others.

      Strengths:

      A major strength of the manuscript is the clear mechanistic link between phosphorylation of Thr301 and loss of microtubule binding activity. The use of phosphomimetic (T301D) and phosphodeficient (T301A) mutants in an RNAi-rescue framework provides a clean and convincing demonstration of functional relevance in vivo. The integration of biochemical assays with detailed cell biological phenotyping (centrin arm length, FAZ elongation, basal body segregation, and cytokinesis markers) is particularly effective and makes the central conclusion robust. The observed phenotypic cascade from centrin arm defects to FAZ and division plane abnormalities is also well aligned with existing models of trypanosome morphogenesis.

      Weaknesses:

      My main concern relates to the interpretation of the Golgi phenotype. The conclusion that phosphorylation of KIN-G "impairs Golgi biogenesis" is currently based on fluorescence microscopy using TbGRASP and Sec13 markers and on quantification of the number and distribution of Golgi/ERES puncta in binucleated cells. While these data convincingly demonstrate altered Golgi/ERES number and spatial organization, they do not distinguish between true defects in Golgi biogenesis or duplication and alternative possibilities such as fragmentation, vesiculation, or mislocalization of Golgi membranes. Given the central role of Golgi-centrin arm organization in the proposed model, ultrastructural analysis (for example, by EM or electron tomography) would greatly strengthen this aspect of the study by providing direct evidence for structural alterations of the Golgi and its association with the centrin arm and ERES. Such data would elevate this part of the manuscript from a descriptive fluorescence phenotype to a true structural cell biological insight. I appreciate that this experiment goes beyond the current dataset, but it would substantially enhance the mechanistic depth of the Golgi-related conclusions and strengthen the causal chain linking centrin arm defects to Golgi abnormalities. I have to confess that the inclusion of such data would make this reviewer particularly enthusiastic about the work. If this is not feasible, I would recommend tempering the wording of "Golgi biogenesis" to a more conservative description, such as altered Golgi organization or duplication, and explicitly acknowledging the limitations of fluorescence-based analysis for this conclusion.

      An additional conceptual point concerns the dual role of TbPLK in centrin arm regulation. TbPLK is known to promote centrin arm biogenesis through phosphorylation of TbCentrin2, yet in this study, TbPLK phosphorylation of KIN-G negatively regulates centrin arm assembly. This dual positive and negative regulatory role is intriguing but could be discussed more explicitly. The manuscript would benefit from a clearer conceptual framework addressing how phosphorylation of KIN-G might serve as a temporal or spatial switch to restrain KIN-G activity at specific stages of centrin arm assembly.

      Finally, a schematic model summarizing the proposed regulatory pathway from TbPLK phosphorylation of KIN-G to centrin arm assembly, FAZ elongation, division plane placement, and Golgi organization would aid the reader.

    3. Reviewer #2 (Public review):

      Summary:

      The authors identify KIN-G as an in vitro substrate for phosphorylation by TbPLK and show that several of the in vitro P-ated sites, including T301, overlap with P-ation sites seen in live cells. The authors further show that PLK-mediated P-ation inhibits KIN-G binding to microtubules in vitro, as does a KIN-G-T301D mutant, and that expression of a KIN-G-T301D phospho-mimic in T. brucei phenocopies KIN-G RNAi knockdowns, producing defects in cell division, morphogenesis of the centrin arm, FAZ and other cellular structures, as well as a misplaced cytokinesis furrow.

      Understanding cytoskeletal rearrangements that drive cell division in T. brucei is an important and unresolved problem, so the work addresses important questions that are of great interest. PLK and KIN-G have previously been shown to be important for cell division and morphogenesis of cytoskeletal structures that drive cell division in T. brucei. The current work advances our understanding by suggesting a potential mechanism by which PLK and KIN-G might participate, namely through PLK-dependent P-ation to control KIN-G MT binding activity.

      Strengths:

      The authors use a rigorous combination of biochemistry, phosphoproteomics, cell biology, and mutant analysis to support their conclusion that PLK-mediated P-ation of KIN-G negatively regulates KIN-G microtubule binding, and this may explain the observation that a KIN-G T301 phosphomimic mutant blocks cell division and perturbs biogenesis of cytoskeletal structures that drive cell division and morphogenesis. Combining rigorous and informative in vitro studies with mutant analysis in live cells is a great strength. The work is solid and important, though a few pieces are needed to fully connect the in vitro findings with the in vivo observations, as detailed below.

      Weaknesses:

      Overall, I find this work to be solid and to provide an important addition to our understanding of mechanisms controlling cell division in T. brucei. The biochemistry, in particular, is rigorous and convincingly demonstrates PLK can P-ate KIN-G, altering its MT-binding ability. Analysis of phospho-mutants of KIN-G in live T. brucei supports the conclusion that P-ation of KIN-G at T301 negatively affects KIN-G function in vivo. I think, however, that the results fall short of supporting the title, because, although the data convincingly show that PLK can phosphorylate KIN-G at T301 in vitro, and that T301 is P-ated in vivo, they do not formally demonstrate (nor even test) whether PLK is the kinase responsible for this phosphorylation in vivo (experiments to address this seem quite feasible). I also do not see where the authors try to reconcile the absence of phenotype for KIN-G-T301A with the implied importance of KIN-G phosphorylation by PLK in cell division, which calls into question the need for P-ation of KIN-G-T301 in cell division. Suggestions for addressing these concerns are provided below.

      My two main questions are:

      (1) What is the biological relevance of KIN-G P-ation at T301?

      a) The authors report no defect for the KIN-G-T301A mutant, so what then is the need for T301 P-ation, if the cell gets along fine without it? One step toward addressing this would be to ask what fraction of KIN-G shows P-ation at T301. Although published studies indicate P-ation at T301, it isn't known what percentage of KIN-G in the cell is P-ated. One might anticipate, for example, that T301-P is a small minority of the population in asynchronous cultures and that T301 P-ation increases at specific cell cycle stages.

      b) Published work links PLK to cell division, FAZ elongation, etc. The current work suggests that one role of PLK is to P-ate KIN-G at T301. In contrast, however, the current work also indicates that P-ation of KIN-G at T301 is unnecessary for normal cell division, FAZ elongation, etc.

      c) Some experiments or at least commentary on points a and b above would strengthen the paper.

      (2) Is PLK the kinase that P-ates KIN-G T301 in vivo?

      a) The authors show PLK P-ates T301 (and other residues) in vitro, and that T301 is P-ated in vivo. To bring the analysis full circle, it would be informative to examine KIN-G P-ation in a PLK mutant or upon inhibition of PLK with published inhibitors. This seems to be a very doable experiment with the tools available.

    4. Reviewer #3 (Public review):

      Summary:

      Here, the authors investigate the role of the Trypanosoma brucei polo-like kinase TbPLK in the function of flagellum-associated cellular structures in trypanosomes. They set out to test the hypothesis that a key substrate of TbPLK is the kinesin protein KIN-G, and that TbPLK phosphorylation of KIN-G regulates its functions in cells.

      Strengths:

      Using in vitro biochemistry with purified proteins, the authors convincingly demonstrate that TbPLK phosphorylates KIN-G at 29 sites. Moreover, they convincingly show that phosphorylation at one site, T301, impairs the binding of purified KIN-G to purified microtubules. Using immunofluorescence-based imaging approaches, they also show that TbPLK colocalizes with KIN-G at centrin arms during the early S-phase of the cell cycle. Centrin arms are structures that are located near the basal body and flagellum and are important for new flagellum biogenesis, Golgi positioning, and cell division. To evaluate the function of KIN-G phosphorylation in cells, they depleted KIN-G by RNAi, simultaneously expressed phospho-mimetic (T301D) and phospho-ablative mutant proteins, and used immunofluorescence to examine the impact on flagellum-associated cellular structures. They show that expression of the phospho-mimetic mutant KIN-G-T301D causes the following defects: reduced cell proliferation, disruption of centrin arm and Golgi biogenesis, impairment of FAZ elongation and flagellum positioning, and misplacement of the cell division plane. The data convincingly support the conclusion that KIN-G phosphorylation on T301 plays an important role in regulating the cellular functions of this kinesin motor protein.

      Weaknesses:

      Some of the broader conclusions are not directly supported by the data. For example, the title states "Polo-like kinase phosphorylation of the orphan kinesin KIN-G negatively regulates centrin arm biogenesis in Trypanosoma brucei," but the data do not directly address the specific role of TbPLK in phosphorylating KIN-G in cells. Moreover, some of the more specific conclusions in the paper, for example, that "phosphorylation of KIN-G" causes various cellular defects, are a bit of an overstatement. The supporting data rely on the expression of a phospho-mimetic mutant of KIN-G. Presumably, phosphorylation in cells is a normal part of KIN-G regulation, and it is not just phosphorylation, but rather hyperphosphorylation that is being mimicked by the mutant. Some rewording of the specific conclusions is warranted, and the broader conclusion would be better supported with additional experimental evidence.

    1. eLife Assessment

      This valuable study uses a large cohort of clinical malaria cases collected over 18 years to address a critical knowledge gap regarding the role of PfEMP1 variants across distinct severe malaria syndromes. The conclusions are potentially of importance and interest to those who study malaria severity, but the evidence is incomplete, largely due to a lack of clarity on data inclusion and the correct use of statistical tests. More up-to-date data analysis methods would further strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Severe childhood malaria is associated with three main overlapping syndromes: impaired consciousness (IC), respiratory distress (RD), and severe malaria anaemia (SMA). One central feature of severe malaria, driven by host and parasite factors, is the sequestration of parasitized red blood cells in vascular beds, leading to impaired tissue perfusion and lactic acidosis. The causative agent, the parasite ligand PfEMP1, is expressed on the surface of infected red blood cells, where it binds to a broad range of different endothelial receptors. Accumulation of parasite-infected erythrocytes in the host's microvasculature has been repeatedly confirmed for cerebral malaria, but there are scarce data on the extent of sequestration in the other severe malaria syndromes. However, the absence of effective adjunctive therapies for severe malaria implies that our understanding of its pathogenesis remains incomplete. Thus, by comparing var gene expression from a large Kenyan cohort (n=372 severe cases; n=340 non-severe cases), this study addresses a critical knowledge gap regarding the role of PfEMP1 across distinct severe malaria syndromes. The substantial sample size, phenotypic stratification, and use of two complementary methods (DBLα-tag sequencing and RT-qPCR), along with data about the parasite's ability to form rosettes and antibody level assessments, provide a strong setup. Var gene expression data - either proportions of different DBLα-tags classified by the number of cysteine residues and presence of particular motifs or relative expression RT-qPCR data from a set of primer pairs targeting conserved regions of var groups or particular domains - is associated with (a) severe malaria syndromes, (b) variant expression homogeneity, (c) rosetting ability, and (d) mortality using independent linear regression models, Spearman rank correlations, or logistic regression models.

      In summary, the study confirms that A-type and DC8-containing gene expression correlate with IC, that RD is associated with rosetting, and that SMA is linked to a high variant expression homogeneity (VEH) of var-A expression, which may indicate a longer infection duration. However, some findings remain inconclusive. For example, when analyzing pure syndromes, several associations changed: DC8 expression was also found to be significantly enriched in SMA (with multiple primer pairs) and RD, not exclusively with IC. Additionally, rosetting was associated with DC8 expression but not with IC, even though IC itself is linked to DC8 expression. Overall, the findings are significant and supported by a large dataset, though the reported evidence remains largely associative rather than mechanistic.

      Strengths:

      As the authors stated themselves, one of the key unresolved questions is whether severity-causing parasites are biologically different from parasites responsible for asymptomatic infections. This study is among the first to address this question using data from a large, phenotypically stratified cohort. The use of two complementary methods (DBLα-tag sequencing and RT-qPCR), together with data on the parasites' ability to form rosettes and assessments of antibody levels, provides an excellent experimental framework.

      Weaknesses:

      Even when assessing var gene expression using two different approaches - DBLα-tag sequencing and RT-qPCR targeting pre-defined variants - only a glimpse of the parasites' actual biology is captured. Moreover, a well-known confounder in gene expression studies of P. falciparum field isolates is variation in parasite age (hours post-invasion) or synchronicity, both of which significantly influence var gene expression. The methods employed in this study, unfortunately, do not allow for controlling or correcting for these factors. In addition, although the older classification system for DBLα-tag data developed by Bull et al. is certainly still valid, more recent advances in bioinformatic tool development now allow a more in-depth exploration of DBLα-tag datasets. Tools such as Varia (doi: 10.1186/s12859-022-04573-6), cUPS (https://doi.org/10.1371/journal.ppat.1012813), and upsML (doi: https://doi.org/10.1101/2025.05.19.654848) enable the prediction of DBLα-tag-connected PfEMP1 domains and var group affiliations.

      As A-type var gene expression has already been associated with severity, most expression studies (including this one) have a selection bias towards A- and B/A-type var genes. Here, A- and B/A-types are covered by 8 primer pairs (gpA1, gpA2, 4x DC8, DC13, DC4), whereas highly polymorphic B-types are targeted by only 2 primer pairs (b1, DC9) and C-types only by a single primer (c2). Thus, any association with A-type expression is more likely to be observed, although evidence is accumulating that parasites preferentially express B-type var genes at the onset of blood stage infection in naïve/less immune individuals; this is also consistent with the authors' observation that VEH is positively associated with immunity (measured as anti-IE) and negatively associated with temperature.

      I am not an expert in biostatistics, but to my understanding, independently performed regressions should be corrected for multiple testing.
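
      To illustrate the multiple-testing point, the following is a minimal sketch of a Benjamini-Hochberg false discovery rate correction, one common option among several (Bonferroni or Holm would also be reasonable). The function name and the p-values are hypothetical and are not taken from the manuscript; this is not the authors' analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per independently fitted regression model.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(example_p)

# Which associations survive at a 5% false discovery rate.
significant = [q <= 0.05 for q in adjusted_p]
```

      In this made-up example, four of the five raw p-values fall below 0.05, but only two associations remain below the 5% threshold after adjustment.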

      Overall, the authors largely achieved their aims, identifying specific var groups associated with different severity syndromes. However, due to the complexity of var gene data and the interdependence of parameters, the resulting picture is not entirely clear. Some opposite results between different analyses may also be difficult for the reader to interpret. Nevertheless, this study can be considered a pioneering effort, providing valuable insights into the complex interplay of var gene expression across different severity syndromes and offering useful data for the field. Follow-up studies will be important to validate these findings and further dissect the mechanisms linking parasite gene expression to clinical outcomes.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents results of a study using two complementary approaches (RT-qPCR and DBL) to analyze the putative relationship between var gene transcription (and hence, PfEMP1 expression) and clinical presentation among Kenyan children with Plasmodium falciparum malaria. Binary rosetting (yes/no) data are used in a similar way. The study includes samples collected over a period of almost 20 years from about 700 children presenting with either severe (impaired consciousness [IC], respiratory distress [RD], severe anemia [SA]) or non-severe malaria. During the study period, the study area experienced a remarkable drop in P. falciparum transmission intensity.

      Strengths:

      The study stands on the shoulders of many similar studies of this kind, both by the authors and by other research teams, and the inferences made largely confirm those made previously. The current study has analytical rigor and a large sample size. Disentangling the multiple parameters of the above-mentioned relationship is of obvious and crucial importance to an improved understanding of P. falciparum malaria pathogenesis and of the targets and mechanisms of protective immunity to the disease. The present study is a valuable effort towards that. The study is well-structured, and the figures are clear.

      Weaknesses:

      It is somewhat unclear to this reviewer to what extent the samples and data analyzed and reported here are new (i.e., not used/analyzed in previous studies). If there is substantial overlap with earlier studies, this is a weakness because of the risk of circular inferences. The Discussion section would benefit from less repetition of the results section and a more in-depth discussion of the findings obtained relative to the existing literature. Better inclusion of key primary references is recommended.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Ndugwa et al. attempt to link specific severe malaria manifestations with particular var gene expression patterns. This is an important question, and the dataset the authors have assembled over decades is impressive. However, greater clarity in the descriptions and statistics would, in my view, help this reviewer, and readers in general, develop a more precise understanding of the significance of the findings.

      Strengths:

      The study addresses a critically important question in malaria pathogenesis, and the dataset is extensive and represents a significant long-term effort by the authors.

      Weaknesses:

      The Results section often lacks clarity: clinical group definitions (NS, non-IC, non-SMA, mild vs. moderate) are sometimes ambiguous, and key methodological details, including the VEH index calculation, RT-qPCR quantification, antibody detection methods, and rosetting assays, are either missing from the results text or poorly explained in the figure legends. Additionally, figure presentation requires improvement, with inconsistent reporting of sample sizes, undefined colors, and p-values that overlap with data points rather than being clearly displayed above them.

    1. eLife Assessment

      This important study presents a novel immunotherapy strategy for cancer. The authors develop a whole-tumor cell vaccine comprised of senescent tumor cells and a COX2 inhibitor in a hydrogel matrix. They present convincing evidence of the efficacy of this approach in preclinical models, demonstrating that prostaglandin E2 (PGE2) modulates the senescence-associated secretory phenotype (SASP) toward an immunostimulatory state, although more mechanistic/functional work would strengthen their conclusions. This work is timely and will be of interest to immunologists and others interested in the development of novel cancer therapies.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to overcome the limitations of whole-tumor-cell vaccines, specifically the weak immunogenicity and rapid clearance often associated with them. They leveraged the unique properties of senescent tumor cells (STCs), which remain metabolically active and secrete chemokines, as a source of antigens. However, to counteract the secretion of the immunosuppressive lipid prostaglandin E2 (PGE2), which is part of the senescence-associated secretory phenotype (SASP), they engineered a hydrogel vaccine formulation (STCs+CLX-Lipo@Gel) containing STCs and liposomal celecoxib (a COX2 inhibitor).

      Strengths:

      (1) The study is conceptually strong in its approach to leveraging the SASP to improve immunotherapy responses. By selectively inhibiting COX2/PGE2 while preserving the secretion of recruitment chemokines (like CCL2 and CCL5) in the SASP, the authors successfully turn a potentially deleterious cellular state into a therapeutic asset.

      (2) Mechanistic Insight: The manuscript provides detailed evidence regarding the mechanism of action. The authors convincingly show that the vaccine restores activity in the NK-DC axis. Specifically, they demonstrate that reducing PGE2 levels enhances NK cell activation (upregulation of NKG2D and NKp46) and promotes the secretion of CCL5 and XCL1 by NK cells, which subsequently recruits cDC1 dendritic cells.

      (3) The therapeutic potential is tested across multiple models, including a subcutaneous melanoma model, a difficult-to-treat melanoma brain metastasis model, and an orthotopic pancreatic cancer model. The consistent efficacy across these distinct physiological contexts suggests broad applicability.

      Weaknesses:

      (1) While the authors successfully inhibit PGE2, the SASP is a complex cocktail of factors. The discussion regarding the long-term presence of these "live" senescent cells is somewhat limited. Although the hydrogel retains cells locally, the potential for other chronic inflammatory factors to eventually promote tumorigenesis or tissue damage in the surrounding niche warrants careful consideration when translating this approach to patients and may require additional preclinical testing.

      (2) The study posits that STCs serve as an antigen reservoir. However, the manuscript would benefit from a clearer distinction between whether the immune system is recognizing senescence-specific neoantigens or simply shared tumor antigens that are being presented more effectively due to the adjuvant effect. The authors briefly touch upon neoantigens in the discussion, but the experimental data primarily measure general anti-tumor responses.

      Impact:

      This work bridges material science and immunology, offering a practical solution to the immunosuppressive barriers of cell-based vaccines. It provides a platform that could potentially be adapted for various solid tumors.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. examined an engineered whole-tumor-cell vaccine based on senescent tumor cells co-encapsulated with liposomal celecoxib in a chitosan hydrogel. The authors propose that prolonged persistence of senescent cells, combined with COX2/PGE2 inhibition, restores NK-DC crosstalk, enhances cDC1 recruitment, and ultimately drives robust CD8⁺ T-cell-mediated antitumor immunity. The study is nicely executed and clearly presented, with extensive in vitro and in vivo validation across multiple tumor models, including melanoma brain metastases and orthotopic PDAC. While the overall concept is timely and of potential interest, several mechanistic conclusions are based primarily on correlative evidence and would benefit from additional functional experiments to strengthen causal interpretation and translational relevance.

      Strengths:

      (1) Strong conceptual framework.

      (2) Impressive breadth of in vivo models.

      (3) Clear immunological readouts.

      (4) Innovative combination of senescence biology and biomaterials.

      Weaknesses:

      (1) Mechanistic conclusions rely heavily on correlation.

      (2) Lack of functional immune cell depletion studies.

      (3) Limited exploration of long-term safety and antigenic specificity.

      Major Critiques:

      (1) The authors emphasize the expansion and activation of cDC1 as a key mechanism linking innate and adaptive immunity, yet they do not directly test whether cDC1 is required for the observed CD8⁺ T-cell responses and tumor control.

      The authors should perform experiments using Batf3-deficient mice or another cDC1-depletion strategy to provide important mechanistic validation. If such experiments are not feasible, this limitation should be more clearly acknowledged and discussed.

      (2) The authors note that senescence may generate neoantigens distinct from those present in proliferating tumor cells, but the extent to which STC-induced immunity cross-reacts with non-senescent tumor cells is not fully addressed. While it is appreciated that tumor challenge experiments are included, the authors should perform a more explicit analysis of antigenic overlap, which would strengthen the translational relevance of the approach. For example, they could compare senescence induced by different stimuli or directly assess immune recognition of non-senescent tumor targets, which would help clarify whether the vaccine primarily exploits senescence-specific antigens or broadly shared tumor antigens.

      (3) Hydrogel encapsulation clearly extends STC persistence in vivo; however, the study provides limited information on the eventual clearance of these cells and the potential implications of prolonged SASP exposure. Given general concerns regarding chronic inflammation associated with senescent cells, additional discussion of long-term local and systemic responses would be helpful. If extended safety analyses are beyond the scope of the current study, the authors should acknowledge the limitation.

      (4) The immunological effects are attributed to COX2/PGE2 inhibition, but it remains unclear whether these effects are specific to celecoxib or could reflect formulation-dependent or off-target mechanisms. The authors may perform additional experiments employing an alternative COX2 inhibitor, genetic COX2 suppression, or PGE2 rescue, which could further support the specificity of the COX2/PGE2-dependent mechanism.

    1. eLife Assessment

      This important work describes systematic computational and experimental approaches to turn a moderately stable α-helical bundle into a very stable fold. The authors advance our understanding of α-helix stabilization, providing a convenient framework that has general implications for the protein design field. The main claims have convincing support through a sound methodology, with strong specific conclusions for designing mechanically, thermally, and chemically stable α-helical bundles.

    2. Reviewer #1 (Public review):

      Summary:

      In the work from Qiu et al., a workflow aimed at stabilizing a simple small protein against mechanical and chemical stressors is presented.

      Strengths:

      The workflow makes use of state-of-the-art AI-driven structure generation and couples it with more classical computational and experimental characterizations in order to measure its efficacy.

      The work is well presented and results are thorough and convincing.

      The Methods description is quite precise, and some important details were added during review.

      Weaknesses:

      The pulling velocity is quite high; accordingly, the results were used only for comparative analyses.

      Following the review process, the authors have shown that the minimum distance between each protein and its periodic images was consistently above 1 nm, yet towards the end of some simulations the value crosses the non-bonded interaction cut-off distance.

      Comments on revisions:

      The authors did a good job in addressing the reviews.

    3. Reviewer #2 (Public review):

      Summary:

      Qiu, Jun et al. developed and validated a computational pipeline aimed at stabilizing α-helical bundles into very stable folds. The computational pipeline is a hierarchical computational methodology tasked to generate and filter a pool of candidates, ultimately producing a manageable number of high-confidence candidates for experimental evaluation. The pipeline is split into two stages. In stage I, a large pool of candidate designs is generated by RFdiffusion and ProteinMPNN, filtered down by a series of filters (hydropathy score, foldability assessed by ESMFold and AlphaFold). The final set is chosen by running a series of steered MD simulations. This stage reached unfolding forces above 100 pN. In stage II, targeted tweaks are introduced - such as salt bridges and metal ion coordination - to further enhance the stability of the α-helical bundle. The constructs undergo validation through a series of biophysical experiments. Thermal stability is assessed by CD, chemical stability by chemical denaturation, and mechanical stability by AFM.

      Strengths:

      A hierarchical computational approach that begins with high-throughput generation of candidates, followed by a series of filters based on specific goal-oriented constraints, is a powerful approach for a rapid exploration of the sequence space. This type of approach breaks down the multi-objective optimization into manageable chunks and has been successfully applied for protein design purposes (e.g., the design of protein binders). Here, the authors nicely demonstrate how this design strategy can be applied to successfully redesign a moderately stable α-helical bundle into an ultrastable fold. This approach is highly modular, allowing the filtering methods to be easily swapped based on the specific optimization goals or the desired level of filtering.

      Weaknesses:

      Assessing the change in stability relative to the WT α-helical bundle is challenging because an additional helix has been introduced, resulting in a comparison between a three-helix bundle and a four-helix bundle. Consequently, the appropriate reference point for comparison is unclear. A more direct and informative approach would have been to redesign the sequence of the original α-helical bundle of the human spectrin repeat R15, allowing for a more straightforward stability comparison.

      The three constructs chosen are 60-70% identical to each other, suggesting either over-constrained optimization of the sequence or a physical constraint inherent to designing ultrastable α-helical bundles. It would be interesting to explore whether choosing a different combination of filters would enable ultrastable α-helical bundle constructs with a more varied sequence content.

      While the use of steered MD is an elegant approach to picking the top N most stable designs, its computational cost may become prohibitive as the number of designs increases or as the protein size grows, especially since it requires simulating a water box that can accommodate a fully denatured protein.

      Comments on revisions:

      The authors have done a good job of addressing the comments.

    4. Reviewer #3 (Public review):

      Summary:

      Qiu et al. present a hierarchical framework that combines AI and molecular dynamics simulation to design an α-helical protein with enhanced thermal, chemical, and mechanical stability. Strategically, chemical modification by incorporating an additional α-helix, site-specific salt bridges, and metal coordination further enhanced the stability. The experimental validation using single-molecule force spectroscopy and CD melting measurements provides fundamental physical chemical insights into the stabilization of α-helices. Together with the group's prior work on super-stable β strands (https://www.nature.com/articles/s41557-025-01998-3), this research provides a comprehensive toolkit for protein stabilization. This framework has broad implications for designing stable proteins capable of functioning under extreme conditions.

      Strengths:

      The study represents a complete framework for stabilizing the fundamental protein elements, α-helices. A key strength of this work is the integration of AI tools with chemical knowledge of protein stability.

      The experimental validation in this study is exceptional. The single-molecule AFM analysis provided a high-resolution look at the energy landscape of these designed scaffolds. This approach allows for the direct observation of mechanical unfolding forces (exceeding 200 pN) and the precise contribution of individual chemical modifications to global stability. These measurements offer new, fundamental insights into the physicochemical principles that govern α-helix stabilization.

      Weaknesses:

      (1) While the initial manuscript lacked a detailed explanation for the stabilizing effect of the additional helix, the revised version now includes a clear structural basis for this improvement. The authors successfully attribute the increased unfolding force threshold to the reinforcement of the hydrophobic core and enhanced cooperative interactions, supported by relevant literature correlations between helix bundle size and stability.

      (2) The authors analyzed both thermal stability and mechanical stability. It would be helpful for the authors to discuss the relationship between these two parameters in the context of their design, since thermal melting probes equilibrium stability (ΔG), while mechanical stability probes the unfolding energy barriers along the pulling coordinate. While the integrative design approach successfully improved both stability types, a deeper exploration of how the specific structural modifications influence the unfolding energy barrier relative to the overall equilibrium stability would further strengthen the mechanistic impact of the work.

      (3) While the current study demonstrates a dramatic increase in global stability, the analysis focuses almost exclusively on the unfolding (melting) process. However, thermodynamic stability is a function of both folding (k<sub>f</sub>) and unfolding (k<sub>u</sub>) rates. The authors have clarified that the observed ultrastability likely originates from a significantly reduced unfolding rate, a hypothesis consistent with the observed unfolding forces. Direct measurements of the kinetics would provide deeper insights.

      (4) The authors chose the spectrin repeat R15 as the starting scaffold for their design. R15 is a well-established model known for its "ultra-fast" folding kinetics, with folding rates (k<sub>f</sub> ~ 10<sup>5</sup> s<sup>–1</sup>) nearly three orders of magnitude faster than those of its homologues such as R17 (Scott et al., Journal of Molecular Biology 344.1 (2004): 195-205). Measuring the folding rates of the newly designed proteins would provide additional insights into the design.

      Comments on revisions:

      I think the authors have addressed the comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the work from Qiu et al., a workflow aimed at obtaining the stabilization of a simple small protein against mechanical and chemical stressors is presented.

      Strengths:

      The workflow makes use of state-of-the-art AI-driven structure generation and couples it with more classical computational and experimental characterizations in order to measure its efficacy. The work is well presented, and the results are thorough and convincing.

      We are grateful to this reviewer for his/her thoughtful assessment and supportive feedback. In response, we have addressed each comment and incorporated the necessary revisions into the manuscript.

      Weaknesses:

      I will comment mostly on the MD results due to my expertise.

      The Methods description is quite precise, but is missing some important details:

      (1) Version of GROMACS used.

      We used GROMACS version 2023.2 (single-precision). All subsequent MD simulation procedures mentioned below have been consolidated and described in detail in the Supporting Information (SI).

      (2) The barostat used.

      Pressure coupling was applied using the C-rescale barostat (τ<sub>p</sub> = 5.0 ps, ref<sub>p</sub> = 1.0 bar).

      (3) pH at which the system is simulated.

      No explicit pH was defined during system construction. Proteins were modeled using standard protonation states as assigned by GROMACS preprocessing tools, corresponding to physiological, near-neutral pH (~ 7.0).

      (4) The pulling is quite fast (but maybe it is not a problem)

      The relatively high pulling velocity (1 nm/ns) was selected to enable efficient screening across a large number of designed proteins (211 candidates), while maintaining reasonable computational cost/time. Given the intrinsic orders-of-magnitude difference between simulation and experimental pulling rates, SMD results were used as a comparative screening tool, rather than for direct quantitative comparison with AFM data.

      (5) What was the value for the harmonic restraint potential? 1000 is mentioned for the pulling potential, but it is not clear if the same value is used for the restraint, too, during pulling.

      All positional restraints used in the simulations, including those applied during equilibration as well as the harmonic restraint on the N-terminus and the pulling umbrella restraint during SMD, employed the same force constant (k = 1000 kJ·mol<sup>–1</sup>·nm<sup>–2</sup>). We have clarified this point in the revised Methods section.

      (6) The box dimensions.

      Rectangular simulation boxes were used throughout. For equilibrium MD simulations, the box dimensions in each direction were set based on the maximum extent of the protein along that axis, with a minimum distance of 1.2 nm between the protein surface and the box boundary on all sides. For SMD simulations, the same box dimensions were applied in the x and y directions. Along the pulling (z) direction, the box length was extended to accommodate the theoretical stretching length, defined as the initial N–C terminal distance plus 0.36 nm per stretched residue, while maintaining a 1.2 nm buffer at both ends (2.4 nm total). These details have now been clarified in the revised Supporting Information.
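      The box-length rule for the pulling direction can be written out as a small calculation. The sketch below is illustrative only: the function name and the example inputs (a 2.0 nm initial N–C distance and 100 stretched residues) are assumptions, while the 0.36 nm per residue and the 1.2 nm buffer at each end come from the description above.

```python
def smd_box_length_z(initial_nc_distance_nm, n_stretched_residues,
                     per_residue_nm=0.36, buffer_nm=1.2):
    """Box length along the pulling (z) axis, in nm.

    Theoretical stretching length = initial N-C terminal distance
    + 0.36 nm per stretched residue; a 1.2 nm buffer is then added
    at both ends (2.4 nm in total).
    """
    stretch_length = initial_nc_distance_nm + per_residue_nm * n_stretched_residues
    return stretch_length + 2.0 * buffer_nm


# Hypothetical example values, not taken from the manuscript.
print(round(smd_box_length_z(2.0, 100), 2))
```

      Because the required length grows linearly with the number of stretched residues, the solvent volume, and with it the cost of each SMD run, grows with protein size.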

      From this last point, a possible criticism arises: Do the unfolded proteins really still stay far enough away from themselves to not influence the result?

      We analyzed the minimum atomic distance between each protein and its periodic images to assess potential artifacts from periodic boundary conditions. For all simulation stages used in screening and statistical analysis, the minimum protein–image separation remained above 1.0 nm for the majority of the simulation time, exceeding the nonbonded interaction cutoff and minimizing cross-boundary interactions. As shown in Author response image 1 for SpecAI89 (left), this separation during SMD simulations is consistently well above the threshold, indicating that the chosen box dimensions are appropriate. In the very late stages of annealing MD, highly unstable proteins may exhibit large conformational fluctuations and transient boundary proximity (right); however, these regimes are associated with large RMSD deviations and are excluded from analysis. Notably, the mechanically relevant unfolding events occur near the center of the simulation box and proceed along the pulling axis in SMD simulations, making boundary effects unlikely to influence the unfolding process or the relative mechanostability ranking.

      Author response image 1.

      Analysis of the minimum atomic distance between the protein and its periodic images under periodic boundary conditions. Left: SpecAI89 during SMD simulations, showing that the minimum protein–image distance remains above 1.0 nm for the majority of the simulation time. Right: WT during AMD simulations, where transient proximity to the periodic boundary is observed at very late stages due to large conformational fluctuations.
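      For readers unfamiliar with this check, the quantity plotted is the shortest distance between any atom of the protein and any atom of a periodic copy of the protein. The following is a minimal brute-force sketch for a single frame in a rectangular box. It is not the authors' analysis script: the function name and the two-atom example are assumptions, and it only considers the 26 nearest image cells, which is sufficient when the molecule is no larger than the box. GROMACS also ships a `gmx mindist` tool with a periodic-image mode that is normally used for trajectories.

```python
import itertools
import math


def min_periodic_image_distance(coords, box):
    """Shortest distance (same units as the input) between any atom and any
    periodic image of any atom, for a rectangular box (lx, ly, lz)."""
    # All neighbouring image cells, excluding the central cell itself.
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]
    best = math.inf
    for atom_i in coords:
        for atom_j in coords:
            for sx, sy, sz in shifts:
                image_j = (atom_j[0] + sx * box[0],
                           atom_j[1] + sy * box[1],
                           atom_j[2] + sz * box[2])
                best = min(best, math.dist(atom_i, image_j))
    return best


# Hypothetical two-atom "protein" stretched along z in a 4 nm cubic box.
toy_coords = [(0.5, 0.5, 0.5), (0.5, 0.5, 3.5)]
print(min_periodic_image_distance(toy_coords, (4.0, 4.0, 4.0)))
```

      In the toy example the two ends of the stretched chain are 3.0 nm apart inside the box but only 1.0 nm apart across the boundary, which is the situation the reviewer asks about.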

      Additionally, no time series are shown for the equilibration phases (e.g., RMSD evolution over time), which would empower the reader to judge the equilibration of the system before either steered MD or annealing MD is performed.

      We thank the reviewer for this suggestion. To assess equilibration, we analyzed the backbone RMSD evolution during the equilibration phase. Using SpecAI89 as a representative example (Author response image 2), the protein backbone RMSD converges rapidly and reaches a stable plateau within approximately 5 ps. The subsequent 125 ps equilibration period therefore sufficiently demonstrates that the system is well equilibrated prior to both steered MD and annealing MD simulations.

      Author response image 2.

      The backbone RMSD of SpecAI89 over time during simulation

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure S2, only one copy (or the average of the three copies; it is not clear from the caption) is shown; it would be better to show the individual traces for each repeat. Additionally, only the plot for the forces is shown, and not, similarly to the AMD, the RMSD plot. This could be a stylistic choice, but it just reports on how much force was applied and not on how the protein responded to the force. Moreover, horizontal lines at the maximum value reached by the force could be added in order to directly see the difference in force applied, since it is then remarked on.

      Figure S2 originally shows a representative single SMD trajectory, as the force–extension peak positions vary between independent simulations and averaging the force traces would obscure the characteristic force peaks. In the revised Supplementary Information, we have now added the force–extension traces from the other two independent SMD repeats for each construct (New Figure S2). In addition, horizontal lines indicating the maximum force reached in each trajectory have been included to facilitate direct comparison of force differences between designs.

      (2) In Figure S3, the plots have different y-axes. Maybe it could be valuable to modify it so that in figures b, c, and d the spectrin result is in the background (perhaps in gray) so that the y-axis is not changed to retain the information included in this plot, but one could still compare directly to the spectrin result. With a 0 to 1 nm y-axis, part of the spectrin run will be hidden, but in any case, plot a can be used to see the full behavior. Similarly to S2, the repeats (if any) could be shown.

      We have revised Figure S3 as suggested. The y-axis is now unified to 0–1.2 nm across all panels. For panels b–d, the natural spectrin trajectory is displayed in light gray in the background for direct comparison. Additionally, three independent MD replicates are now presented for each construct to demonstrate reproducibility.

      Finally, minor remarks that could nevertheless improve the paper:

      (3) In Figure S7, a bimodal distribution model for the number of events could be used to fit the data better.

      We thank the reviewer for the detailed suggestion. Following this advice, we explored a bimodal Gaussian distribution model for fitting the force-event data in Figure S7. Indeed, our analysis showed that a bimodal fit describes Figure S7, panel f, better (as shown in Author response image 3). The two peaks were centered at F<sub>1</sub> = 190 ± 4 pN and F<sub>2</sub> = 380 ± 6 pN. Interestingly, the force of the first, major peak is the same as the previously fitted value. The second peak lies at twice that force, which we speculate may arise from two molecules being stretched simultaneously, for reasons that remain unknown. Considering the very small number of events in the second peak and the unchanged value of the first peak (190 pN), we decided not to change the unfolding force value in the manuscript. We nevertheless thank the reviewer for this insightful comment.

      Author response image 3.

      The bimodal fit for the unfolding force of SpecAI88-49E102K-6H149H shows the same 190 pN unfolding force for the first peak as the previous fit.
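      To illustrate what a two-peak description of a force distribution involves, the sketch below fits a two-component Gaussian mixture to synthetic forces by expectation-maximization. This is an assumption-laden stand-in, not the authors' procedure: they may well have fitted a double Gaussian to the binned histogram instead. Only the peak centers (190 and 380 pN) are borrowed from the response; the widths, event counts, random seed, and all function names are invented for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(forces, n_iter=200):
    """EM fit of a two-component 1D Gaussian mixture.

    Returns [(mean, sd, weight), (mean, sd, weight)] sorted by mean.
    """
    n = len(forces)
    grand_mean = sum(forces) / n
    grand_sd = math.sqrt(sum((f - grand_mean) ** 2 for f in forces) / n)
    # Simple heuristic start: one component at each extreme of the data.
    means = [min(forces), max(forces)]
    sds = [grand_sd, grand_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each event belongs to each component.
        resp = []
        for f in forces:
            p = [weights[k] * normal_pdf(f, means[k], sds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and width of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * f for r, f in zip(resp, forces)) / nk
            var = sum(r[k] * (f - means[k]) ** 2 for r, f in zip(resp, forces)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return sorted(zip(means, sds, weights))


# Synthetic data only: a dominant peak near 190 pN and a sparse one near 380 pN.
random.seed(0)
synthetic_forces = ([random.gauss(190, 20) for _ in range(200)]
                    + [random.gauss(380, 20) for _ in range(20)])
for mean, sd, weight in fit_two_gaussians(synthetic_forces):
    print(f"mean = {mean:.0f} pN, sd = {sd:.0f} pN, weight = {weight:.2f}")
```

      A mixture weight well below one half for the second component is one way to quantify the statement that the high-force events are rare.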

      (4) The colors in the video are not very intuitive, as the spectrin is shown initially in light blue, but becomes grey in the variants, where light blue is reserved for the additional helix. A counter of elapsed time and/or force/temperature applied could help the readers orient. Maybe it could be useful to produce a video with spectrin and the three variants all shown together?

      We thank the reviewer for this comment. The videos have been revised accordingly to improve clarity and consistency. In all cases, the original protein scaffold is now shown in gray, while the additional helix in the designed variants is highlighted in blue. Real-time annotations have been added to aid interpretation: the instantaneous temperature is displayed during AMD simulations, and time is shown during SMD simulations. In addition, for ease of comparison, the AMD and SMD results of all four proteins are each compiled into a single combined video, allowing their behaviors to be viewed side by side.

      Reviewer #2 (Public review):

      Qiu, Jun et al. developed and validated a computational pipeline aimed at stabilizing α-helical bundles into very stable folds. The computational pipeline is a hierarchical computational methodology tasked to generate and filter a pool of candidates, ultimately producing a manageable number of high-confidence candidates for experimental evaluation. The pipeline is split into two stages. In stage I, a large pool of candidate designs is generated by RFdiffusion and ProteinMPNN, filtered down by a series of filters (hydropathy score, foldability assessed by ESMFold and AlphaFold). The final set is chosen by running a series of steered MD simulations. This stage reached unfolding forces above 100 pN. In stage II, targeted tweaks are introduced - such as salt bridges and metal ion coordination - to further enhance the stability of the α-helical bundle. The constructs undergo validation through a series of biophysical experiments. Thermal stability is assessed by CD, chemical stability by chemical denaturation, and mechanical stability by AFM.

      Strengths:

      A hierarchical computational approach that begins with high-throughput generation of candidates, followed by a series of filters based on specific goal-oriented constraints, is a powerful approach for a rapid exploration of the sequence space. This type of approach breaks down the multi-objective optimization into manageable chunks and has been successfully applied for protein design purposes (e.g., the design of protein binders). Here, the authors nicely demonstrate how this design strategy can be applied to successfully redesign a moderately stable α-helical bundle into an ultrastable fold. This approach is highly modular, allowing the filtering methods to be easily swapped based on the specific optimization goals or the desired level of filtering.

      We are thankful for the reviewer’s diligent evaluation and positive remarks. His/her concluding remarks, which encourage our future work at the intersection of AI-protein design and AFM-SMFS, are especially appreciated. All comments have been incorporated into our revisions.

      Weaknesses:

      Assessing the change in stability relative to the WT α-helical bundle is challenging because an additional helix has been introduced, resulting in a comparison between a three-helix bundle and a four-helix bundle. Consequently, the appropriate reference point for comparison is unclear. A more direct and informative approach would have been to redesign the original α-helical bundle of the human spectrin repeat R15, allowing for a more straightforward stability comparison.

      This is an insightful comment. Indeed, a direct comparison between three-helix bundles of the same architecture would be the most straightforward, with a clear reference point. We will take this advice and pursue it in our future work.

      In our case, a substantial fraction of the hydrophobic region is relatively shallow and partially solvent-exposed in the wild-type R15 α-helical bundle. So, the added fourth helix provides a new hydrophobic packing interface, increasing core burial, packing density, and strengthening the internal load-bearing network. Consistent with this design rationale, rSASA analysis shows that the designed proteins exhibit a higher degree of hydrophobic core burial compared to the wild-type R15. Specifically, the fraction of residues with rSASA < 0.2 exceeds 30% in the designs, compared to 23% in the natural spectrin repeat.

      While the authors have shown experimentally that stage II constructs have increased the mechanical stability by AFM, they did not show that these same constructs have increased the thermal and chemical stabilities. Since the effects of salt bridges on stability are highly context dependent (orientation, local environment, exposed vs buried, etc.), it is difficult to assess the magnitude of the effect that this change had on other types of stabilities.

      We agree that the effects of salt bridges are highly context-dependent and that different dimensions of stability do not always correlate. Following your suggestion, we evaluated the thermal and chemical stabilities of the Stage II constructs. The experimental results (now added as Figure S9) show that the Stage II designs retain high thermal stability and resistance to chemical denaturation, to different extents: the thermal stability remains as high as that of the Stage I designs, whereas the resistance to chemical denaturation is slightly reduced. We have added this result to the manuscript accordingly.

      The three constructs chosen are 60-70% identical to each other, either suggesting overconstrained optimization of the sequence or a physical constraint inherent to designing ultrastable α-helical bundles. It would be interesting to explore these possible design principles further.

      Yes, the observed sequence convergence likely arises from a combination of intrinsic physical constraints of the protein architecture and the applied design and screening criteria. In particular, the tightly packed hydrophobic core imposes strong constraints on side-chain size, packing complementarity, and the alignment of heptad-like motifs reminiscent of coiled-coil organization, which collectively reduce the accessible sequence space. In addition, the strong selection pressure imposed by foldability and stability filters further promotes convergence toward similar solutions. We agree with the reviewer that this represents an important direction for future work.

      While the use of steered MD is an elegant approach to picking the top N most stable designs, its computational cost may become prohibitive as the number of designs increases or as the protein size grows, especially since it requires simulating a water box that can accommodate a fully denatured protein.

      Yes, steered MD can become computationally expensive, particularly as the number of designs increases or as protein size grows. Considering the vast pool created by AI, SMD in this work was applied to a relatively small, high-confidence subset of candidates after multiple rounds of rapid prescreening, keeping the overall computational cost manageable. In future applications, this step could be further accelerated by integrating machine-learning–based predictors to improve scalability.

      Reviewer #2 (Recommendations for the authors):

      I am not convinced that the difference in rSASA between the designs and the natural spectrin repeat is meaningful. It would be helpful to report confidence intervals for the rSASA values of the designs to clarify whether any differences are statistically robust. Even if such differences prove statistically significant, it is not clear that they are large enough to be practically meaningful.

      In our analysis, rSASA values were calculated from equilibrated MD conformations and were consistently higher for all designed proteins that passed the simulation-based screening compared to the wild-type spectrin repeat. However, rSASA was used only as a supportive structural descriptor to indicate a trend toward a more compact and better-buried hydrophobic core, rather than as a standalone or decisive metric of stability.

      Protein stability is indeed influenced by multiple factors, including hydrogen bonding, salt bridges, metal coordination, and topology-dependent load-bearing interactions, none of which are captured by rSASA alone. Therefore, we agree with the reviewer that differences in rSASA alone should not be overinterpreted as a quantitative measure of protein stability. For this reason, rSASA was not used as a ranking criterion or a predictor of stability, but only as complementary evidence consistent with the overall design rationale and with the experimentally observed stability enhancements.

      The claim "The strong agreement between computational rankings and experimental measurements validates this approach for prioritizing designs based on relative mechanostability, offering a practical pipeline to bridge the gap between in silico design and experimental validation." should be substantiated by a citation or a figure. Since the authors have the experimental AFM data and steered MD data, I suggest adding a Spearman correlation plot of the two.

      Following this comment, we examined the Spearman rank correlation between SMD-derived unfolding forces and experimentally measured AFM forces (Author response image 4). The resulting correlation was modest (ρ = 0.4, p = 0.6), which is not unexpected given (i) the large difference in force and timescales between high-speed SMD simulations and single-molecule AFM experiments, and (ii) the limited number of designs and simulation repeats available.

      Nevertheless, qualitatively, the difference between the first point, from wild-type spectrin, and the other three, from the SpecAI designs, is clear. Considering the large computational cost, we performed only three simulations for each design to balance accuracy against cost and time. To avoid overinterpretation, we therefore did not include the correlation analysis in the main text and revised the manuscript to soften claims of strong agreement, emphasizing instead the qualitative and comparative role of SMD in the design pipeline.

      Author response image 4.

      Spearman correlation between SMD and AFM unfolding forces for natural spectrin and SpecAI designs. SMD force (x-axis) versus experimental AFM force (y-axis); each point represents one protein.
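      With only four proteins, the rank correlation can take very few distinct values, which is worth making concrete. The sketch below computes Spearman's ρ from ranks and an exact two-sided p-value by enumerating all permutations. The input numbers are placeholders chosen so that ρ = 0.4; they are not the authors' SMD or AFM forces, and the function names are assumptions. For four points, the commonly used Student-t approximation with two degrees of freedom reduces to p = 1 − |ρ|, which appears consistent with the reported p = 0.6, whereas exact enumeration over the 24 permutations gives 0.75 for this example.

```python
import itertools


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_exact(x, y):
    """Spearman rho and exact two-sided permutation p-value (small n only)."""
    rx, ry = ranks(x), ranks(y)
    rho = pearson(rx, ry)
    perms = list(itertools.permutations(ry))
    extreme = sum(1 for p in perms if abs(pearson(rx, p)) >= abs(rho) - 1e-12)
    return rho, extreme / len(perms)


# Placeholder values for four proteins (arbitrary units), not measured data.
placeholder_smd = [1.0, 2.0, 3.0, 4.0]
placeholder_afm = [1.0, 3.0, 4.0, 2.0]
rho, p_exact = spearman_exact(placeholder_smd, placeholder_afm)
print(rho, p_exact)
```

      Either way, four points cannot establish or rule out a monotonic relationship, which supports treating the SMD ranking as qualitative.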

      Reviewer #3 (Public review):

      Summary:

      Qiu et al. present a hierarchical framework that combines AI and molecular dynamics simulation to design an α-helical protein with enhanced thermal, chemical, and mechanical stability. Strategically, chemical modification by incorporating additional α-helix, site-specific salt bridges, and metal coordination further enhanced the stability. The experimental validation using single-molecule force spectroscopy and CD melting measurements provides fundamental physical chemical insights into the stabilization of α-helices. Together with the group's prior work on super-stable β strands (https://www.nature.com/articles/s41557-025-01998-3), this research provides a comprehensive toolkit for protein stabilization. This framework has broad implications for designing stable proteins capable of functioning under extreme conditions.

      Strengths:

      The study represents a complete framework for stabilizing the fundamental protein elements, α-helices. A key strength of this work is the integration of AI tools with chemical knowledge of protein stability.

      The experimental validation in this study is exceptional. The single-molecule AFM analysis provided a high-resolution look at the energy landscape of these designed scaffolds. This approach allows for the direct observation of mechanical unfolding forces (exceeding 200 pN) and the precise contribution of individual chemical modifications to global stability. These measurements offer new, fundamental insights into the physicochemical principles that govern α-helix stabilization.

      We appreciate the positive assessment of our manuscript from this reviewer and his/her support. We have answered all the comments as follows and modified the manuscript accordingly.

      Weaknesses:

      (1) The authors report that appending an additional helix increases the overall stability of the α-helical protein. Could the authors provide a more detailed structural explanation for this? Why does the mechanical stability increase as the number of helices increases? Is there a reported correlation between the number of helices (or the extent of the hydrophobic core) and the stability?

      In multi-helix bundle proteins, tight interhelical packing leads to the formation of a dense hydrophobic core, which substantially enhances overall structural stability. The introduction of an additional helix does not merely increase helix count, but expands the buried hydrophobic interface, improving packing density and cooperative side-chain interactions in the core. This, in turn, strengthens the internal load-bearing network that resists force-induced unfolding.

      From a mechanical perspective, adding a helix also increases topological interlocking among secondary-structure elements, which raises the energetic barrier for unfolding and shifts the unfolding pathway toward more cooperative rupture events, thereby increasing the unfolding force threshold. Consistent with this design principle, pioneering studies have reported a positive correlation between the number of helices (or the extent of the hydrophobic core) in helix bundles and their stability (Lim et al., Structure, 2008, 16:449; Minin et al., J. Am. Chem. Soc., 2017, 139, 16168; Bergues-Pupo et al., Phys. Chem. Chem. Phys., 2018, 20, 29105). Inspired by these works, our AI-protein design study uses the appended helix to reinforce the hydrophobic core rather than simply increasing secondary-structure content.

      (2) The authors analyzed both thermal stability and mechanical stability. It would be helpful for the authors to discuss the relationship between these two parameters in the context of their design, since thermal melting probes equilibrium stability (ΔG), while mechanical stability probes the unfolding energy barriers along the pulling coordinate.

      We agree this is a crucial distinction. Thermal and chemical stabilities report on the equilibrium free energy (ΔG), while mechanical stability probes the kinetic unfolding barrier (ΔG‡) along a force-dependent pathway. Their inherent difference makes concurrent improvement in all parameters a non-trivial task, which highlights the importance and success of our integrative design approach.

      (3) While the current study demonstrates a dramatic increase in global stability, the analysis focuses almost exclusively on the unfolding (melting) process. However, thermodynamic stability is a function of both folding (k<sub>f</sub>) and unfolding (k<sub>u</sub>) rates. It remains unclear whether the observed ultrastability is primarily driven by a drastic decrease in the unfolding rate (k<sub>u</sub>) or if the design also maintains or improves the folding rate (k<sub>f</sub>)?

      We agree with the reviewer that thermodynamic stability is determined by both the folding rate (k<sub>f</sub>) and the unfolding rate (k<sub>u</sub>). In the present study, we did not directly measure folding kinetics, and therefore cannot quantitatively deconvolute the respective contributions of k<sub>f</sub> and k<sub>u</sub> to the observed ultrastability. Based on the design strategy and the experimental observations, we propose that the enhanced stability primarily originates from a substantial reduction in the unfolding rate (k<sub>u</sub>), corresponding to an increased unfolding energy barrier. The reinforcement of the hydrophobic core, the introduction of stabilizing interactions such as salt bridges and metal coordination, and the additional helix that increases topological and packing constraints all raise the energetic cost of disrupting key interactions in the folded state.

      This interpretation is consistent with the high mechanical unfolding forces observed in both AFM experiments and SMD simulations. In contrast, these stabilizing features are not necessarily expected to accelerate folding and may even modestly increase folding complexity. Addressing folding kinetics explicitly would require dedicated kinetic experiments or simulations, which are beyond the scope of the present work but represent an interesting direction for future studies.
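      The link between the two rates and equilibrium stability can be made explicit under a strict two-state assumption, where the unfolding free energy is ΔG = RT ln(k<sub>f</sub>/k<sub>u</sub>). The sketch below is a worked illustration only: whether the designed proteins fold in a two-state manner is not established here, and the rate constants used are hypothetical, not measured values.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def two_state_unfolding_free_energy(k_f, k_u, temperature_k=298.15):
    """Unfolding free energy (kJ/mol) for a two-state folder.

    Positive values mean the folded state is favoured (k_f > k_u).
    """
    return R_KJ_PER_MOL_K * temperature_k * math.log(k_f / k_u)


# Hypothetical rates in s^-1: same folding rate, tenfold slower unfolding.
dg_reference = two_state_unfolding_free_energy(k_f=1e5, k_u=1.0)
dg_slower_unfolding = two_state_unfolding_free_energy(k_f=1e5, k_u=0.1)
print(round(dg_reference, 2), round(dg_slower_unfolding - dg_reference, 2))
```

      In this picture each tenfold reduction in k<sub>u</sub> at fixed k<sub>f</sub> adds RT ln 10, about 5.7 kJ/mol at room temperature, so a large gain in stability can come entirely from slower unfolding.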

      (4) The authors chose the spectrin repeat R15 as the starting scaffold for their design. R15 is a well-established model known for its "ultra-fast" folding kinetics, with folding rates (k<sub>f</sub> ~ 10<sup>5</sup> s<sup>–1</sup>) nearly three orders of magnitude faster than those of its homologues such as R17 (Scott et al., Journal of Molecular Biology 344.1 (2004): 195-205). Does the newly designed protein, with its additional fourth helix and site-specific chemical modifications, retain the exceptionally high folding rate of the parent R15?

      We did not directly measure the folding kinetics of the newly designed proteins, and therefore cannot determine whether they retain the exceptionally fast folding rate reported for the parent spectrin repeat R15. While R15 is known for its ultrafast folding behavior, the introduction of an additional fourth helix and site-specific chemical modifications, although beneficial for enhancing stability, may increase the complexity of the folding landscape and does not guarantee that the folding rate (k<sub>f</sub>) remains comparable to that of R15.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify the used Gaussian function to fit the unfolding force distribution (Figure 3-4). In Figure S8, the Bell-Evans model is used to analyze unfolding force. The authors should explain the choice of fitting methods and ensure consistency.

      The Gaussian fitting used in Figures 3–4 is intended as a descriptive statistical analysis to summarize the unfolding force distributions and to facilitate direct comparison between different designs. This approach provides a robust estimate of the most probable unfolding force and the distribution width, without invoking a specific physical unfolding model, and is commonly used in single-molecule force spectroscopy for comparative purposes.

      In contrast, the Bell-Evans model applied in Figure S8 is a kinetic framework that explicitly accounts for force-loading-rate dependence and is used to extract mechanistic insights into the unfolding process. Therefore, the two fitting approaches serve complementary roles: Gaussian fitting for quantitative comparison and ranking of mechanostability, and Bell-Evans analysis for mechanistic interpretation. We have clarified this distinction and the rationale for using both methods in the revised Supplementary Information to ensure consistency and transparency.

      (2) The authors utilized steered MD simulation to analyze the mechanical properties via ForceGen (Ni et al., 2024, Sci. Adv. 10, eadl4000). However, the significant discrepancy between the predicted unfolding force (~600 pN) and the experimental value (~50 pN for spectrin, line 376) requires further justification. Please clarify how the accuracy of these predictions can be established. Specifically, do the MD simulations successfully capture the relative ranking or trends in stability across the different designed variants?

      We agree with the reviewer that there is a substantial discrepancy between the absolute unfolding forces predicted by SMD simulations (~600 pN) and those measured experimentally by AFM (~50 pN for spectrin). This difference primarily arises from the orders-of-magnitude mismatch in loading rates between simulations and experiments. In our SMD simulations, the pulling velocity (~10<sup>9</sup> nm/s) is several orders of magnitude higher than that used in AFM experiments (~10<sup>3</sup> nm/s), which is known to systematically elevate the apparent unfolding force. In addition to loading-rate effects, limitations in force-field accuracy, finite system size, and restricted conformational sampling further contribute to deviations in absolute force values. As a result, the unfolding forces obtained from SMD are not intended to provide quantitative agreement with experimental measurements or absolute mechanical stability.

      Instead, SMD is employed here as a comparative screening tool to assess relative mechanostability across different designed variants under identical simulation conditions. Despite the limited number of repeats imposed by computational cost, the simulations consistently distinguish candidates with markedly different mechanical responses. Importantly, the variants identified by SMD as more mechanically stable were subsequently confirmed experimentally to exhibit enhanced mechanostability relative to the wild-type spectrin repeat. Therefore, while SMD does not yield quantitatively accurate unfolding forces, it successfully captures relative stability trends and provides a practical and effective means for prioritizing designs prior to experimental validation.
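
      The loading-rate argument in this response can be made concrete with the standard Bell-Evans expression for the most probable unfolding force, F* = (k<sub>B</sub>T/Δx) ln(rΔx/(k<sub>0</sub>k<sub>B</sub>T)). The sketch below is illustrative only: the values of `k0`, `dx`, and `stiffness` are hypothetical placeholders rather than fitted parameters from the study, the same effective stiffness is assumed for AFM and SMD (a simplification), and the Bell-Evans regime is not guaranteed to hold at SMD pulling speeds.

```python
import math

KT_PN_NM = 4.114  # thermal energy k_B*T near 298 K, in pN*nm


def most_probable_force(loading_rate, k0, dx, kT=KT_PN_NM):
    """Bell-Evans most probable unfolding force in pN.

    loading_rate: force loading rate r in pN/s
    k0: zero-force unfolding rate in 1/s (hypothetical value in the demo)
    dx: distance to the transition state in nm (hypothetical value in the demo)
    """
    return (kT / dx) * math.log(loading_rate * dx / (k0 * kT))


def force_shift(rate_ratio, dx, kT=KT_PN_NM):
    """Predicted increase in F* when the loading rate is multiplied by rate_ratio."""
    return (kT / dx) * math.log(rate_ratio)


# Hypothetical illustration parameters (assumptions, not values from the study).
k0 = 1e-2         # 1/s
dx = 0.25         # nm
stiffness = 10.0  # pN/nm, assumed identical in both setups

# Loading rate approximated as stiffness * pulling velocity.
f_afm = most_probable_force(stiffness * 1e3, k0, dx)  # ~10^3 nm/s (AFM)
f_smd = most_probable_force(stiffness * 1e9, k0, dx)  # ~10^9 nm/s (SMD)

print(f"AFM-like force: {f_afm:.1f} pN")
print(f"SMD-like force: {f_smd:.1f} pN")
print(f"Shift from a 10^6-fold faster pull: {f_smd - f_afm:.1f} pN")
```

      In this model the shift depends only on k<sub>B</sub>T/Δx and the ratio of loading rates, so it applies equally to every variant pulled under the same conditions, which is consistent with using SMD for relative ranking rather than absolute forces.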

    1. eLife Assessment

      This is an important study showing that movement vigor is not solely an individual property but emerges through interaction when two people are physically linked. The evidence is convincing, supported by a well-controlled experimental design and modeling that closely match the observed behavior. While the authors provided a helpful comparison of several candidate models of human-human interaction dynamics, the statistical power remains limited.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or, vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements.

      The authors adequately addressed several concerns that I raised in my initial review of the work, including clarity regarding analyses of movement vigor and inclusion of additional analyses of reaction time. The results are supported by both parametric and non-parametric statistical methods.

      The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts. This work answers several new, important questions about control of vigor during volitional movements, and in doing so it motivates future research into the topic.

      Weaknesses:

      My chief concern about the study is the relatively low number of dyad data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). However, it is important to note that most of the effects upon which the conclusions rest are associated with relatively large effect sizes.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner's vigor rather than by the faster partner's, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner's vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      Weaknesses:

      The revised manuscript now clearly explains why the proposed computational model successfully accounts for the observed dyadic behavior. In particular, the mechanisms by which uncertainty associated with the slower partner and time-related costs of the faster partner jointly shape dyadic vigor are now clear. I have no further comments to add.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling.

      The authors have addressed all of my previous comments. I appreciate the clarification of abbreviations, terminology, and key concepts, the expansion of the discussion, and the adjustments to some of the statistical analyses in response to both my earlier comments and those of Reviewer 1.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive and precise comments, which have helped us improve the consistency and clarity of our manuscript. Below, we provide a point-by-point response to each comment. In summary, the main changes introduced in the revised version are as follows:

      (1) We replaced all the statistical analyses with their non-parametric equivalents to ensure compliance with test assumptions and consistency of the results;

      (2) We compared the participants’ reaction times before and during connected practice, revealing a significant reduction in reaction times of both partners when connected;

      (3) We added, in the supplementary materials, a table reporting the vigor scores of each participant in each experimental condition, facilitating the assessment of individual and dyadic behaviors;

      (4) We have reviewed and refined the terminology throughout the manuscript and reduced the number of abbreviations to improve clarity.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of the movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics were assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements. The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts.

      We thank the reviewer for their in-depth analysis and constructive assessment of our manuscript.

      Weaknesses:

      (1) My chief concern about the study as it currently stands is the relatively low number of data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). Some of these analyses would benefit greatly, in terms of power, from the addition of more data points.

      We understand and appreciate the reviewer’s concern regarding the effective sample size at the dyad level (n=10). While our primary analyses focus on dyad-specific interactions, we note that the reported effects are consistent across multiple dynamic conditions and are associated with large effect sizes. To provide a conservative assessment, the Cohen’s D values reported correspond to the smallest effect size observed across the relevant statistical tests, thereby limiting the risk of false positives or overinterpretation. In addition, to ensure robustness given the sample size and distribution properties of the data, we have replaced all parametric tests with their non-parametric counterparts, as some analyses violated ANOVA assumptions. Friedman and Kruskal-Wallis tests are now used for paired and unpaired main effects, respectively, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons, respectively. Note that these changes did not alter the conclusions of the study.
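
      As an illustration of the paired main-effect test named above, the following minimal sketch computes the Friedman statistic and Kendall's W (a common effect size for it) using only the standard library. The data are hypothetical reaction times, not values from the study, and no tie correction is applied. The software the authors used is not stated; in practice, library routines such as `scipy.stats.friedmanchisquare`, `scipy.stats.kruskal`, `scipy.stats.wilcoxon`, and `scipy.stats.mannwhitneyu` would typically be used.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic (no tie correction).

    table: one row per participant or dyad, one column per condition.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def kendalls_w(table):
    """Kendall's coefficient of concordance, W = Q / (n * (k - 1))."""
    n, k = len(table), len(table[0])
    return friedman_statistic(table) / (n * (k - 1))


# Hypothetical reaction times (s) for 5 participants in 3 conditions.
reaction_times = [
    [0.42, 0.39, 0.35],
    [0.45, 0.41, 0.40],
    [0.38, 0.40, 0.33],
    [0.50, 0.44, 0.41],
    [0.47, 0.43, 0.39],
]
print("Friedman Q:", friedman_statistic(reaction_times))
print("Kendall's W:", kendalls_w(reaction_times))
```

      The statistic is compared against a chi-square distribution with k - 1 degrees of freedom to obtain a p-value; with only 10 dyads, this approximation is coarse, which is one reason to report effect sizes alongside it.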

      (a) The distribution of delta-vigor (Fast group vs Slow group) is highly skewed (see Figures 3D, S6D), with over half of the dyads exhibiting delta-vigor less than 0.2 (i.e., less than 20% of unit vigor). Given the relatively low number of dyads, it would be helpful for the authors to provide explicit listings of VigorFast, VigorSlow, and VigorCombined for each of the 10 separate dyads or pairings.

      We agree with this comment. However, we note that the distribution of vigor scores within a population is typically centered around 1, with large deviations observed only for the fastest and slowest participants [1]. As a result, the distribution of ∆-vigor is inherently skewed. Correcting for this skewness would (i) require pairing participants based on their vigor, which is logistically difficult, and (ii) lead to an atypical sampling of dyads, with an overrepresentation of pairs exhibiting very large vigor differences. The distributions of vigor scores for the fast and slow groups before and after the interaction are reported in Supplementary Fig. S21. In addition, as suggested by the reviewer, we have now included Table S.1 in the supplementary materials, listing the values VigorFast, VigorSlow, and VigorCombined for each of the 10 dyads. This table provides a complete view of the evolution of participants’ vigor throughout the experiment.

      (b) The authors concluded that the interactive adaptation hypothesis provided the best summary of the combined movement dynamics in the study. If this is indeed the case, then the relative degree of difference in vigor between the fast and slow participants in a dyad should matter. How well did the interactive adaptation model explain variance in the dyads with relatively low delta-vigor (e.g., less than 0.2) vs relatively high delta-vigor?

      We initially expected the magnitude of difference in individual vigor within a dyad to play a significant role. However, our analysis did not reveal any systematic effect of ∆-vigor on either the interaction force or the resulting dyadic vigor, as shown by the LMM analysis. Importantly, the interactive adaptation hypothesis does not per se imply that the magnitude of vigor differences between the two partners should matter, only that their respective roles in selecting the adapted behavior are different. Although the model includes several free parameters, we did not attempt to fit it to individual dyads, as would in principle be possible. Instead, we performed a sensitivity analysis to assess how variations in the difference in vigor between the partners influence model predictions. For this purpose, we simulated increasing values of µ and variations in the fast partner’s cost of time. In addition, we demonstrated that uncertainty in the estimated behavior of the slow partner, which is a priori specific to each individual, has a substantial impact on the optimal movement duration of the dyad. Overall, this analysis shows that the model captures the full range of qualitative trends observed in the experimental data. When applied to predict the behavior of the average dyad, the resulting movement time prediction error remains small, as detailed in the Results section.

      (2) The authors shared the results of one analysis of reaction time, showing that the reaction times of the slow partners and the fast partners did not differ during the initial passive block. Did the authors observe any changes in RT of either the slow or fast partner during the combined (primary task) blocks (KL, KH, etc.)? If the pairs of participants did indeed employ a form of interactive adaptation, then it is certainly plausible that this interaction would manifest in the initial movement planning phase (i.e., RT) in addition to the vigor and smoothness of the movements themselves.

      We thank the reviewer for this interesting question, which prompted us to extend our analysis of reaction times to the connected conditions. This additional analysis revealed a significant main effect of the condition on the reaction time for both the fast and slow groups (in both cases: W<sub>2</sub> > 0.39, p < 0.02). Post-hoc comparisons showed a significant reduction in reaction time between the initial null-field block (NF1) and the KH condition for the slow group (p = 0.03, D = 1.46), and a similar trend for the fast group (p = 0.06, D = 1.03). However, the reaction times remained comparable between the two groups, with no significant difference between them. We have incorporated these observations in the Results section (p.4, l.100–109) and expanded the Discussion (p.11, l.341–348) to address their implications for interactive adaptation in human-human and human-robot physical interactions.

      Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner’s vigor rather than by the faster partner’s, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner’s vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      We thank the reviewer for their thorough analysis of our manuscript and their constructive feedback.

      Weaknesses:

      (1) A key conceptual issue concerns the apparent asymmetry between partners in the computational framework. While dyadic vigor is empirically better predicted by the slower partner’s vigor, the model formulation appears to emphasize the faster partner’s time-related cost and interaction forces. Although the cost function includes an uncertainty-related component associated with the slower partner, it remains unclear from the current formulation and description how dyadic vigor is formally derived from the slower partner’s control policy within the same modeling framework. This raises an important question regarding whether the model offers a symmetric account of dyadic vigor formation for both partners or whether it is effectively anchored to the faster partner’s control architecture.

      We have modified our phrasing to clarify the principles according to which the computational framework was designed (p.7, l.226–231 and p.9, l.260–264). As stated in the Results section, the model is indeed asymmetric by design, which corresponds to the different roles of the fast and slow partners exhibited in the data. In that context, the uncertainty term associated with the slow partner should be understood as an overarching constraint that conditions the strategy of the dyad, while the fast partner’s cost of time acts as a contributor to the expected dyad strategy. Conceptually and numerically, as reported in the sensitivity analysis, this asymmetry corresponds to the role of the slow partners in setting the vigor ranking among the dyads and the role of the fast partner in setting the average dyadic behavior.

      (2) A second conceptual issue concerns the interpretation of the term "motor plan." It remains unclear whether this term refers primarily to movement-related characteristics such as speed or duration, or more broadly to the underlying optimization structure that governs these variables. This distinction is theoretically important, as it determines whether the reported interaction effects should be understood as adjustments in movement characteristics or as changes in the structure of the control policy itself.

      We agree with the reviewer that this terminology required clarification. In this paper, the term “motor plan” refers to the time series of control inputs planned by the CNS, rather than solely to kinematic descriptors such as speed or duration. These planned control signals are a direct consequence of the underlying optimization structure and cost functions that govern trajectory generation. We have clarified this definition in the Introduction (p.1, l.23–24).

      Reviewer #3 (Public review):

      Strengths:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.

      Thank you for this analysis and the insightful feedback.

      Major Comments:

      (1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figure 2.B and 3.E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners’ individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?

      We thank the reviewer for this thoughtful comment. We deliberately used different statistical approaches throughout the paper in order to address different types of questions. Note that the statistical tests were converted to their non-parametric equivalents for consistency (see answer to Reviewer 1).

      - Friedman tests were used in a limited number of cases to assess population- or group-level effects, such as differences in movement time, smoothness, or accuracy across the solo, connected, and after-effects conditions. Such tests provide a straightforward framework for these descriptive, condition-level comparisons.

      - The stability of individual and dyadic vigor scores across conditions was assessed using Pearson correlations across all condition pairs, which we consider the most direct and interpretable approach for evaluating consistency across sessions.

      - LMMs were employed to examine how dyadic vigor relates to the partners’ individual vigor measured in the solo conditions, which revealed the critical contribution of the slow partner.

      Rather than applying a single statistical framework throughout, we selected the method best suited to each question. While LMMs are well suited for modeling participant-specific variability when linking individual and dyadic measures, their systematic use in all analyses would be less intuitive and would not directly address several of the population-level comparisons central to this study.
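
      The correlation-based stability check described in the list above can be sketched as follows. This is a minimal standard-library illustration, assuming one vigor score per participant per condition; the scores are hypothetical, and only the condition labels (NF1, KL, KH) are taken from the text.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(scores):
    """Correlate vigor scores across every pair of conditions.

    scores: dict mapping condition name -> list of scores, with participants
    listed in the same order for every condition.
    """
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


# Hypothetical vigor scores for 5 participants in 3 conditions.
example_scores = {
    "NF1": [0.80, 0.90, 1.00, 1.10, 1.30],
    "KL": [0.85, 0.95, 1.00, 1.15, 1.25],
    "KH": [0.90, 0.92, 1.05, 1.12, 1.28],
}

for pair, r in pairwise_correlations(example_scores).items():
    print(pair, round(r, 3))
```

      High correlations across all condition pairs indicate that the ranking of participants by vigor is preserved, which is the sense of "stability" used here; it does not by itself test whether the mean vigor changed between conditions.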

      (2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by interindividual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.

      We thank the reviewer for this interesting question. We first note that the notion of reward in motor control is quite broad. Although our task did not include explicit external (e.g. monetary) rewards, we assumed that participants attribute an implicit value to completing the task in accordance with the experimenter’s instructions. This assumption has been shown to be appropriate for characterising baseline behavior in previous studies [2–5].

      As discussed in the Introduction, vigor is generally understood to emerge from a tradeoff between effort, accuracy, and time. The reviewer is correct in noting that inter-individual differences in vigor may reflect differences in reward sensitivity or in its discounting [3,6], given that time and reward are intrinsically coupled. Differences in vigor may also arise from inter-individual variability in sensitivity to effort or perceived task difficulty. Because these factors are intertwined (for example, increasing accuracy through co-contraction typically incurs greater effort [7]), it is challenging to disentangle their respective contributions based solely on behavioral data.

      In the present study, our inverse optimal control procedure to identify the cost of time (and thus predict individuals’ vigor) relies on a predefined effort-accuracy tradeoff under fixed final time across multiple movement amplitudes [8]. As a result, the model does not allow us to independently estimate individual sensitivities to effort, accuracy, and time. Such characterization of computational "phenotypes" would likely require experimental paradigms in which each of these factors is systematically manipulated while the others are held constant, which is beyond the scope of the current dataset. In practice, the main value of behavioral modeling lies in revealing the relative weighting of these criteria by the CNS during motor planning [5]. We have expanded the Discussion to clarify these limitations and considerations (see Discussion p.12, l.396–401 & l.407–412).

      Finally, we chose not to emphasize these broader issues in the present manuscript because (i) they are peripheral to our primary research question on how individual vigor influences human-human interaction, and (ii) although we do not yet have definitive and consensual answers, they have been addressed in multiple studies reviewed elsewhere [9,10].

      (3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.

      We also find that the preservation of accuracy and smoothness despite changes in vigor is an interesting result, and we therefore chose to report these measures in the Supplementary Materials. However, we believe it is preferable not to include them in the main figures for the following reasons:

      - We avoid framing our results in terms of a speed-accuracy trade-off, as Fitts’ work was initially designed to study fast movements [11], whereas our work focuses on self-paced movements. As outlined in the Introduction, vigor is more appropriately interpreted as reflecting a tradeoff between effort (related to movement speed), accuracy, and time. From this perspective, the reported changes of vigor already capture a shift in the underlying trade-off selected by the CNS, using a framework better suited to our experimental paradigm.

      - The manuscript is technically dense and reports multiple analyses that are essential to establish (i) the existence and definition of dyadic vigor, and (ii) how it emerges from interaction between partners. Although the observed preservation of accuracy and improvements in smoothness are informative, they are not central to these two primary questions and would risk diverting attention from the core contributions of the paper. In addition, accuracy is not a feature predicted by our deterministic modeling, and extensions would be needed to capture this aspect. Here we only attempted to replicate average behaviors.

      (4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.

      We thank the reviewer for this comment, which prompted us to verify the assumptions underlying our ANOVAs. We found that a few distributions in the original analysis, as well as in some of the new tests, did not meet these assumptions. To ensure consistency, all statistical analyses have now been replaced with non-parametric tests: Friedman and Kruskal-Wallis tests for paired and unpaired main effects, Wilcoxon and Mann-Whitney tests for paired and unpaired post-hocs. The updated results do not change any of the conclusions. The only minor change concerns accuracy, which appeared slightly improved in a restricted number of connected conditions and now appears mostly unaffected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) Lines 146-147. The authors state, "Whereas the fast partners maintained a similar duration". Figures S6H,I suggest that fast partners made slower movements during the paired task relative to the solo task, not movements with a similar duration.

      We agree that Fig. S.6H,I suggest slightly slower movements for the fast partners, although the difference is not significant. We have modified the sentence to be less assertive than in the previous version (see p.6, l.155).

      (2) In the Discussion (Lines 318-319), the authors state that their findings confirm and extend the "benefits of dyadic control in collaborative actions". What benefits are they referring to here, relative to individual control? It would be helpful if the authors would elaborate on this claim.

      We have modified this sentence to clarify that the benefits of dyadic control refer to previously reported advantages over individual control, namely reduced movement time [12] and improved tracking accuracy [13,14] (see p.11, l.336–337).

      (3) On Lines 87-89, the authors reference a decomposition of variance of vigor scores across the NF1, VL, and VH conditions; however, I did not see an explanation of how this decomposition was performed. The method used to estimate variance explained by inter-individual vs intra-individual differences in vigor should be outlined for the reader.

      Thank you for pointing out this missing information. We now explain in the statistical analysis section (see p.14, l.504–507) that the percentage of inter-individual variability in vigor is estimated from sum-of-squares values, which serve as estimates of inter- and intra-individual variability.
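
      As an illustration only, a sum-of-squares decomposition of this kind could be sketched as below. The exact formula used in the manuscript is not given here, so the between/within split, the function name `inter_individual_share`, and the example scores are all assumptions rather than the authors' procedure.

```python
import numpy as np

def inter_individual_share(scores):
    """Percentage of the total sum of squares explained by differences
    between participants (hypothetical helper, not the authors' code).

    scores: array of shape (n_participants, n_conditions), one vigor score
    per participant and condition (e.g. NF1, VL, VH).
    """
    scores = np.asarray(scores, dtype=float)
    grand_mean = scores.mean()
    participant_means = scores.mean(axis=1)
    n_conditions = scores.shape[1]
    # Between-participant (inter-individual) sum of squares.
    ss_inter = n_conditions * np.sum((participant_means - grand_mean) ** 2)
    # Within-participant (intra-individual) sum of squares.
    ss_intra = np.sum((scores - participant_means[:, None]) ** 2)
    return 100.0 * ss_inter / (ss_inter + ss_intra)

# Made-up scores for three participants across three conditions.
example_scores = np.array([[1.2, 1.1, 1.0],
                           [0.9, 0.8, 0.8],
                           [1.0, 1.0, 0.9]])
share = inter_individual_share(example_scores)
```

      With this definition, the share is 100% when each participant's scores are identical across conditions and 0% when all participants have the same mean score.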

      (4) How was the absolute interaction torque for a paired movement calculated? Was it an integral of the temporal profile of torque for some portion of the combined movement? The method for calculating the absolute interaction torque needs to be specified.

      We have now clarified in the Methods (see p.14, l.490–491) that the reported average interaction effort was computed as the absolute value of the interaction torque as a function of time averaged over the entire movement.
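
      A minimal sketch of this computation, assuming a uniformly sampled torque trace so that the time average reduces to a sample mean; the function name and the onset/offset indices are hypothetical and not taken from the manuscript.

```python
import numpy as np

def mean_abs_interaction_torque(torque, onset, offset):
    """Average of |interaction torque| between movement onset and offset.

    torque: 1-D array of interaction torque samples (uniform sampling assumed).
    onset, offset: sample indices delimiting the movement (offset exclusive).
    """
    torque = np.asarray(torque, dtype=float)
    return float(np.mean(np.abs(torque[onset:offset])))

# Made-up torque trace in arbitrary units.
effort = mean_abs_interaction_torque([1.0, -1.0, 2.0, -2.0], 0, 4)
```

      Taking the absolute value before averaging prevents torques of opposite sign from cancelling out over the movement.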

      (5) Lines 123-124: "... interaction torque showed no significant correlation with differences in individual vigor within dyads." This statement should be supported by appropriate statistical measures.

      This result is now supported by reporting the corresponding Pearson correlation analyses. No significant correlations were found between interaction torque and differences in individual vigor within dyads (KL conditions: |r| < 0.43, p > 0.22; KH conditions: |r| < 0.18, p > 0.61, see p.5, l.132–133).

      (6) For the analysis, presented in Figure 3C, and specified on lines 116-123, the text mentions the main effects of both condition and target. There doesn’t appear to be much of an effect of the target for the KH data. Should these results not be reported as an interaction effect between the two factors instead?

      We agree with the reviewer and have corrected our presentation of these results (see p.4, l.126–128). Consistent with the reviewer’s observation, no significant effect of the target is found in the KH condition.

      (7) Figures 3E and S6B. What is the purpose of including the averaged data for each pair in addition to both individuals’ data from each pair? It would be useful to distinguish the individual data from the average data for each pair. Frankly, the number of data points shown on this sub-figure is excessive.

      There may have been a misunderstanding. Because the partners of a dyad are connected by a virtual elastic band (rather than a rigid bar), they do not execute identical movements. Therefore Figs. 3E,S6B display the movement time of all individual participants, together with the corresponding 20 individual regression lines, like in Fig. 2B. The solid black line represents the average across all individuals, and the averaged behaviors of dyads are not included. We have clarified this point by revising the caption of Fig. 3E (see p.5).

      Noted mis-spellings:

      Figure S.3A caption: "trials towards this target."

      Page 10 Line 313: "Importantly, these findings show ...".

      These mis-spellings have been corrected at supplementary p.2 and main text p.11, l.331. Thank you!

      Reviewer #2 (Recommendations for the authors):

      (1) To illustrate the contribution of the three components used to calibrate the overall cost function, it would be informative to include simulation analyses in which each component is selectively removed (i.e., ablation analyses).

      We did not perform ablation analyses, as selectively removing components of the model can lead to instability or ill-suited control inputs, making the resulting simulations difficult to interpret. Instead, we conducted a sensitivity analysis of the key parameters shaping the overall cost function, including the estimated mean and deviation of the slow partner’s movement duration, the weight associated with uncertain torque minimization (Figs. S.18,S.19), and the fast partner’s cost of time (Fig. S20). This analysis reveals the predominant roles of the estimated slow partner movement patterns in determining the model predictions, in agreement with our experimental observations.

      (2) Although the authors refer to the motor-off condition as "passive," participants actively generated the movements in the absence of external forces. Thus, this condition corresponds to active, unassisted movement. A different term may therefore reduce potential confusion for readers.

      We agree that the term “passive” was not well chosen given the context of the paper. We have therefore replaced it with the “null-field” condition. Consequently, the P1 and P2 blocks are now referred to as NF1 and NF2.

      (3) Please clarify the instructions given to participants. Were they informed in advance that their movements would physically interact with those of their partner?

      Thank you for pointing out this missing clarification. We have now specified in the Methods (p.14, l.465–469) that participants were not informed prior to any condition that they would interact with a human partner; they were only told that the robot would provide assistance. When debriefed at the end of the experiment, only one out of the 20 participants reported having realized that they were connected to another human. Most participants believed they were interacting either with a version of themselves or with a robot with some randomness.

      (4) Line 475. Should "Fig. 2D" be "Fig. 2B"?

      Thank you for catching this error. The reference has been corrected to Fig. 2B (see p.15, l.522).

      Reviewer #3 (Recommendations for the authors):

      (1) The analysis of reaction times shows no difference between groups in the passive block, which challenges the assumption that movement vigor covaries with decision speed or action initiation speed. It may be worth discussing this in the context of recent literature.

      We agree that the initial analysis and discussion of reaction times were too superficial. In the revised manuscript, we now report that dyadic interaction leads to significantly shorter reaction times (p.4, l.100–109), concomitantly with improved movement velocity. We have also expanded the Discussion on the relationship between decision and action speeds/durations (p.11, l.340–348).

      (2) Many abbreviations are unusual for a non-expert. I would recommend using the full terms instead. At least initially, I found it difficult to follow the results because the abbreviations were not immediately clear (at least to me).

      We agree that the paper had too many abbreviations. Therefore, we have removed the abbreviated names of the models and, where possible without impairing readability, used the full names of the conditions.

      (3) Relatedly, the notation in Figure 1 may be confusing. The labels "S" and "F" (slow and fast) correspond to different concepts than "F" and "L" (follower and leader), so the same participant could be labeled "F" as fast but not "F" as a leader.

      Thank you for pointing out this potential source of confusion. We have therefore modified Fig. 1A (p.2) to avoid any potential confusion by using the full model names rather than abbreviations. In the remainder of the manuscript, "S" and "F" exclusively denote the slower and faster partners within a dyad, and we do not use abbreviations for "leader" or "follower" in the text.

      (4) In figures like 2.C and 3.I, keeping the same scales on the x and y axes and adding a diagonal reference line would make it easier to see shifts across conditions.

      As explained in the Methods, vigor scores in the low- and high-viscosity conditions were computed using the average movement durations from the NF1 condition as a reference. Consequently, because movements are slower in these conditions, the corresponding vigor values are lower than those in NF1. For this reason, using identical scales on the x- and y-axes and adding a 45° reference line could mislead the reader into thinking that the vigor scores are expected to be identical, and would reduce the readability of the figure.

      (5) Multiple hypotheses about dyadic regulation of vigor are nicely explained; it could help to indicate if any of these were a priori favored based on prior literature.

      Previous literature provides mixed evidence regarding how vigor might be regulated in dyadic interaction. For instance, Takagi et al. (2016) [15] reported that mechanically connected partners may rely on independent motor plans, which corresponds to the co-activity hypothesis considered here. However, in that study, movement duration was prescribed. We therefore expected that removing this constraint on movement duration could allow coordination strategies to emerge, particularly in view of findings on haptic communication during tracking of random targets while connected via an elastic band [13,14].

      At the same time, a large body of work on human–human and human–robot interaction has interpreted coordination through a leader–follower framework. In our context, vigor is understood as the outcome of a tradeoff between effort and elapsed time, with time being associated with a decaying reward. Based on this framework, we hypothesized a priori that a leader–follower scheme would emerge, in which the fast partner—being more sensitive to time costs and/or less sensitive to effort—would tend to drive the interaction, even at the expense of increased effort. For these reasons, the leader–follower hypothesis was formulated as the expected outcome throughout the manuscript.

      (6) In the introduction, statements such as "relative vigor of an individual is remarkably stable" appear true only in the solo condition. The same is true in the discussion, where it is said that vigor is a stable trait. The whole study shows that an individual can shift his/her vigor to match that of another individual, so under such conditions it appears adaptable rather than stable to me.

      Let us first clarify that when we describe vigor as “remarkably stable”, we do not imply that individuals do not adjust their movement timing in response to changes in external dynamics. For example, movement durations increase in visco-resistive conditions even during solo performance; nevertheless, individuals who move faster in the absence of resistance will remain faster relative to others when resistance is introduced. In this sense, stability refers to the preservation of relative rankings across conditions, rather than invariance of absolute movement timing. Because interaction with another individual constitutes a substantial change in task dynamics, an effect on individual pace is therefore expected.

      That said, and as pointed out by the reviewer, (i) dyadic interactions lead to the emergence of a dyadic vigor characterized by average movement durations close to those of the fast partners, while the ranking across dyads is largely imposed by the slow partners; and (ii) these adaptations persist after the interaction phase. Importantly, the observed vigor adaptations appear to last longer in our physical interaction task than in previous attempts to manipulate vigor using visual feedback [16]. To account for this adaptability of vigor, we have (i) clarified claims in the Introduction regarding the stability of vigor (see p.1, l.18–20), and (ii) expanded the Discussion to more explicitly address vigor adaptability and the possible resulting consequences for the concept of vigor (see p.12, l.407–412).

      References

      (1) O. Labaune, T. Deroche, C. Teulier, and B. Berret, “Vigor of reaching, walking, and gazing movements: on the consistency of interindividual differences,” Journal of Neurophysiology, vol. 123, pp. 234–242, Jan. 2020.

      (2) L. Rigoux and E. Guigon, “A model of reward-and effort-based optimal decision making and motor control,” PLoS Computational Biology, vol. 8, pp. 1–13, Jan. 2012.

      (3) R. Shadmehr, J. J. O. de Xivry, M. Xu-Wilson, and T.-Y. Shih, “Temporal discounting of reward and the cost of time in motor control,” Journal of Neuroscience, vol. 30, pp. 10507–10516, Aug. 2010.

      (4) B. Berret and G. Baud-Bovy, “Evidence for a cost of time in the invigoration of isometric reaching movements,” Journal of Neurophysiology, vol. 127, pp. 689–701, Feb. 2022.

      (5) D. Verdel, O. Bruneau, G. Sahm, N. Vignais, and B. Berret, “The value of time in the invigoration of human movements when interacting with a robotic exoskeleton,” Science Advances, vol. 9, Sept. 2023.

      (6) K. Jimura, J. Myerson, J. Hilgard, T. S. Braver, and L. Green, “Are people really more patient than other animals? Evidence from human discounting of real liquid rewards,” Psychonomic Bulletin & Review, vol. 16, pp. 1071–1075, Dec. 2009.

      (7) P. L. Gribble, L. I. Mullin, N. Cothros, and A. Mattar, “Role of cocontraction in arm movement accuracy,” Journal of Neurophysiology, vol. 89, pp. 2396–2405, May 2003.

      (8) B. Berret and F. Jean, “Why Don’t We Move Slower? The Value of Time in the Neural Control of Action,” Journal of Neuroscience, vol. 36, pp. 1056–1070, Jan. 2016.

      (9) R. Shadmehr and A. A. Ahmed, Vigor: Neuroeconomics of Movement Control. The MIT Press, 2020.

      (10) D. Thura, A. M. Haith, G. Derosiere, and J. Duque, “The integrated control of decision and movement vigor,” Trends in Cognitive Sciences, vol. 29, pp. 1146–1157, Dec. 2025.

      (11) P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement,” Journal of Experimental Psychology, vol. 47, pp. 381–391, June 1954.

      (12) K. B. Reed and M. A. Peshkin, “Physical collaboration of human-human and human-robot teams,” IEEE Transactions on Haptics, vol. 1, pp. 108–120, July 2008.

      (13) G. Gowrishankar, A. Takagi, R. Osu, T. Yoshioka, M. Kawato, and E. Burdet, “Two is better than one: physical interactions improve motor performance in humans,” Scientific Reports, vol. 4, Jan. 2014.

      (14) A. Takagi, G. Ganesh, T. Yoshioka, M. Kawato, and E. Burdet, “Physically interacting individuals estimate the partner’s goal to enhance their movements,” Nature Human Behaviour, vol. 1, pp. 1–6, Mar. 2017.

      (15) A. Takagi, N. Beckers, and E. Burdet, “Motion plan changes predictably in dyadic reaching,” PLOS ONE, vol. 11, p. e0167314, Dec. 2016.

      (16) P. Mazzoni, B. Shabbott, and J. C. Cortes, “Motor control abnormalities in Parkinson’s disease,” Cold Spring Harbor Perspectives in Medicine, vol. 2, pp. a009282–a009282, Mar. 2012.

    1. eLife Assessment

      This valuable work extends a previously published regression framework for trial-aligned photometry data incorporating functional variables. However, the evidence is generally incomplete, due to the way that within-trial changes in variables have been incorporated into an inherently cross-trial analysis framework, which will limit general adoption. The ideas in this work will be of interest to researchers analyzing photometry signals.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to extend a prior fiber photometry analysis process they developed by incorporating the new ability to determine instantaneous, within trial, relationships between the photometry signal and continuously changing variables. They present solid evidence via simulations and example use cases from published datasets highlighting that their approach can capture instantaneous relationships. Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation for many experimentalists remains challenging enough and may limit widespread adoption by the community.

      Strengths:

      This work builds on prior efforts to analyze photometry signals in a less biased and more statistically sound way. This work incorporates a very important aspect by avoiding the need to summarize individual trials with singular behavioral variables and instead allows for interactions with continuously changing variables to be investigated. The knowledge and expertise of the authors and the presentation provide strong validity and strength to the work. Examples from prior studies in the field are a necessary and important component of the work.

      Weaknesses:

      While use cases are provided from prior data, a clearer presentation of how common implementations in the field are performed (i.e. GLM) and how one could alternatively use the cFLMM approach would help. Otherwise, most may continue using common approaches of Pearson's correlations and GLMs.

    3. Reviewer #2 (Public review):

      The paper presents a regression-based approach for analysing fiber photometry data termed Concurrent Functional Linear Mixed Models (cFLMMs). The approach works by fitting linear mixed effect models separately to each time point in trial aligned data, then applying smoothing to the model coefficients (betas), and computing confidence intervals. The method extends the authors' previous work on using FLMMs for photometry data analysis by allowing for the inclusion of predictors whose value changes across timepoints within a trial, rather than just from trial to trial. As fiber photometry is a rapidly expanding field, developing principled methods to analyse photometry data is valuable, particularly as the authors have released an R package that implements their method to facilitate its use by other groups. The basic FLMM approach for using mixed effects models to analyse trial aligned photometry data, detailed by the authors in their previous manuscript (Loewinger et al. 2025, doi: 10.7554/eLife.95802) appears valuable. The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.
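
      The fit-per-timepoint-then-smooth procedure summarised above can be sketched roughly as follows. This is an illustration under strong simplifying assumptions, not the fastFMM implementation: ordinary least squares stands in for the linear mixed-effects fit, a moving average stands in for the smoother, confidence intervals are omitted, and the function names and simulated data are hypothetical.

```python
import numpy as np

def pointwise_betas(signal, predictor):
    """Fit signal ~ intercept + predictor separately at each timepoint.

    signal, predictor: arrays of shape (n_trials, n_timepoints).
    Returns an (n_timepoints, 2) array of [intercept, slope] estimates.
    OLS is used here as a stand-in for a mixed-effects fit.
    """
    n_trials, n_time = signal.shape
    betas = np.empty((n_time, 2))
    for t in range(n_time):
        X = np.column_stack([np.ones(n_trials), predictor[:, t]])
        betas[t] = np.linalg.lstsq(X, signal[:, t], rcond=None)[0]
    return betas

def smooth(values, width=5):
    """Moving-average smoother (odd width) with edge padding."""
    kernel = np.ones(width) / width
    padded = np.pad(values, width // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Simulated data: the predictor's effect grows linearly across the trial.
rng = np.random.default_rng(0)
n_trials, n_time = 200, 50
predictor = rng.normal(size=(n_trials, n_time))
true_slope = np.linspace(0.0, 1.0, n_time)
signal = 0.5 + true_slope * predictor + 0.1 * rng.normal(size=(n_trials, n_time))

betas = pointwise_betas(signal, predictor)
smoothed_slope = smooth(betas[:, 1])
```

      In this sketch the slope at each timepoint is estimated only from variation of the predictor across trials at that timepoint, which is the property discussed in the following paragraphs.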

      In the original FLMM approach, where predictors change only from trial-to-trial, fitting separate regressions at each timepoint generates a timeseries of betas for each predictor, indicating when and how the predictor explained variance across the trial. This makes a lot of sense and is widely used in neuroscience data analysis. In extending this approach to incorporate variables that change within trial, the authors have used the same method of fitting separate regression models at each timepoint, to obtain a timeseries of betas for each predictor. It is less clear that this approach makes sense for variables that change within trial. This is because the resulting betas only capture how variation in the predictor across trials at a given timepoint explains variance in the signal, but do not capture effects of variation in the predictor across timepoints within trials. This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modelled, and a within-trial component whose effect on the signal is not, is artificial in many experimental designs, and may yield hard to interpret results.

      Consider e.g. the experimental condition considered in Figure 3, taken from Machen et al. 2025 (doi: 10.1101/2025.03.10.642469) in which mice ran down a linear track to collect rewards. In analysing such data, one might want to know how neural activity covaried with the animal's position, but as this variable changes strongly within trial but will have a similar time-course across trials, the cFLMM analysis approach will not work to quantify these effects. This is because variance attributed to position would not capture how neural activity covaried with changes in the animal's position within trial, but rather how neural activity covaried with changes in the animal's position from trial-to-trial at a given timepoint, which could occur due to e.g. trial-to-trial differences in latency to start moving or running speed. As such, although significant effects of 'position' might be observed, they would not capture covariation between position and neural activity in a straightforwardly interpretable way.

      It is therefore not obvious to me that incorporating variables that change within trial into an analysis framework that runs separate regressions at each timepoint in trial aligned data is likely to be widely useful. If scientific questions require understanding how neural activity covaries as a function of variables that change both within and across trials, an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.
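
      A rough sketch of the alternative described above, offered only as an illustration: a single least-squares regression across all timepoints in which a discrete event timeseries is convolved with a set of temporal basis functions and a continuous variable enters directly. The Gaussian basis, the function names, and the simulated data are all assumptions made for this example and do not correspond to any specific published pipeline.

```python
import numpy as np

def gaussian_basis(n_basis, length, width):
    """Return a (length, n_basis) matrix of Gaussian bumps tiling lags 0..length-1."""
    centers = np.linspace(0, length - 1, n_basis)
    lags = np.arange(length)
    return np.exp(-0.5 * ((lags[:, None] - centers[None, :]) / width) ** 2)

def event_design(events, basis):
    """Causally convolve an event indicator with each basis function."""
    n = len(events)
    return np.column_stack(
        [np.convolve(events, basis[:, k])[:n] for k in range(basis.shape[1])]
    )

# Simulated session: sparse events plus a continuously varying "position".
rng = np.random.default_rng(0)
n_samples, kernel_len = 2000, 40
basis = gaussian_basis(n_basis=6, length=kernel_len, width=4.0)

events = np.zeros(n_samples)
events[rng.choice(n_samples - kernel_len, size=30, replace=False)] = 1.0
position = np.abs(np.sin(np.arange(n_samples) / 37.0))

# Ground truth: an event-locked response kernel and a linear position gain.
true_weights = np.array([0.0, 1.0, 0.6, 0.3, 0.1, 0.0])
true_kernel = basis @ true_weights
true_gain = 0.3
signal = (
    np.convolve(events, true_kernel)[:n_samples]
    + true_gain * position
    + 0.05 * rng.normal(size=n_samples)
)

# One regression over all timepoints: intercept, position, event basis columns.
X = np.column_stack([np.ones(n_samples), position, event_design(events, basis)])
coef = np.linalg.lstsq(X, signal, rcond=None)[0]
position_gain = coef[1]
estimated_kernel = basis @ coef[2:]
```

      Because events are placed on the continuous time axis rather than on a trial-aligned grid, variable event timing from trial to trial needs no special handling in this formulation. The fitted basis weights are read out as an event-locked response kernel.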

      One way that cFLMM is used in the manuscript is to handle variable timing of trial events in trial aligned data. In the Machen et al. data, the time when the animal reaches the reward varies from trial to trial, and this is represented in the cFLMM analysis by a binary variable which changes value at this timepoint. From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward, rather than on the start of the trial, allowing e.g. the effect of reward type to be visualised as a function of time relative to reward delivery, and hence to see the differential effects during approach vs consumption. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials. It is not obvious that using cFLMM with binary indicator variables that indicate when task states changed will yield a clearer picture of neural activity than these methods.

      It may be that I am missing some key strengths of cFLMM relative to the other approaches I have outlined, or that there are applications where this approach to implementing within-trial variable changes is a natural formalism. However, my impression is that while cFLMM represents a technical advance, it is not clear how widely useful the model formalism will be.

    4. Reviewer #3 (Public review):

      Summary:

      This work is an extension of their previous study (Loewinger et al 2025) describing a statistical framework for the analysis of photometry data using functional linear mixed models with joint confidence intervals, together with an open-source tool implemented in R. The present study extends it by adding the possibility of using 'concurrent' variables (variables that change within a trial) as regressors, for example, capturing the change of speed at each timepoint in the trial. The main claim is that using 'concurrent' regressors can identify associations between signal and behavior that could not be captured by 'non-concurrent' regressors (the value for a regressor on a specific trial is the same for each timepoint), which could lead to misleading conclusions. While the motivation for using time-varying covariates is useful and supported by previous literature (using fixed-effects models, although not cited in this manuscript), the reanalysis of previous studies does not clearly prove the benefit of using concurrent regressors as opposed to non-concurrent, and some of the results are difficult to interpret.

      Strengths:

      • The motivation for using time-varying covariates is well supported by previous literature using them on fixed-effects models, and here the authors are extending it to mixed-effects models.

      • The authors have included this new functionality in their previous open-source R package.

      Weaknesses:

      • The main weakness of this study is that it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other, especially in the reanalysis of Machen et al. (2025), where the choice of regressors is confusing. In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary 'reward zone vs corridor' (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.

      • Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.

      • From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.

      • The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.

      • This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

    5. Author response:

      Common responses:

      We thank the editors for considering our paper and the reviewers for their thoughtful and detailed feedback. Based on the comments, we will revise our manuscript to better describe how our approach differs from modeling strategies that are common in the field. We also aim to elaborate on the advantages of fastFMM and what scientific questions it is designed to answer. Finally, we will provide more background on our example analyses and the interpretation of the results.

      Within this response, “within-trial timepoints”, “time-varying predictors/behaviors”, and “signal magnitude” are used as specific examples of the general concepts of “functional domain”, “functional covariates”, and “functional outcome”, respectively. To make statements or examples more concrete, we may use the former neuroscience-specific terms when making general claims about functional models.

      - ncFLMM, cFLMM: non-concurrent or concurrent functional linear mixed models.

      - FUI: fast univariate inference. An approximation strategy to perform FLMM (Cui et al., 2022).

      - fastFMM: the R package that implements FUI.

      - CI: confidence interval.

      Before specific line-by-line responses, we provide a brief comparison between cFLMM and fixed effects encoding models. All three reviewers suggested that fixed effects models could be an existing alternative to cFLMM (Reviewer 1 (1B), Reviewer 2 (2C), Reviewer 3 (3A)). Their shared comments highlight that our revision should articulate the advantages and applications of cFLMM relative to existing analysis strategies.

      Functional regression methods like cFLMM produce functional coefficient estimates that quantify how the magnitude of predictor-signal associations evolves across an ordered functional domain such as within-trial timepoints. Standard scalar outcome regression methods, like the GLMs specified in Engelhard et al. (2019), model these associations and their corresponding coefficients as fixed across the functional domain. While GLM encoding models may include time-varying predictors, these analysis strategies do not model the predictor–signal association as changing over the functional domain.
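
      The distinction between a functional coefficient and a fixed coefficient can be illustrated with a minimal sketch. It assumes simulated data in which the true association grows across within-trial timepoints, and it uses ordinary least squares at each timepoint. It ignores random effects, smoothing, and inference, so it is not the fastFMM procedure; all names (`simulate`, `ols_slope`, `true_beta`) are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate(n_trials=400, n_timepoints=5, seed=0):
    """Hypothetical data: the true association beta(s) grows with timepoint s."""
    rng = random.Random(seed)
    true_beta = [0.5 * s for s in range(n_timepoints)]  # 0.0, 0.5, ..., 2.0
    X = [[rng.gauss(0, 1) for _ in range(n_timepoints)] for _ in range(n_trials)]
    Y = [
        [true_beta[s] * X[i][s] + rng.gauss(0, 0.1) for s in range(n_timepoints)]
        for i in range(n_trials)
    ]
    return X, Y, true_beta


X, Y, true_beta = simulate()
n_timepoints = len(true_beta)

# Functional view: one slope per within-trial timepoint.
pointwise = [
    ols_slope([row[s] for row in X], [row[s] for row in Y])
    for s in range(n_timepoints)
]

# Fixed-coefficient view: a single slope shared by every timepoint.
pooled = ols_slope([v for row in X for v in row], [v for row in Y for v in row])

for s in range(n_timepoints):
    print(f"s={s}: true={true_beta[s]:.2f} pointwise={pointwise[s]:.2f}")
print(f"single pooled slope: {pooled:.2f}")
```

      In this toy setting the pointwise slopes track the changing association, whereas the pooled slope reports one average value and hides the change across the functional domain.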

      Moreover, encoding models are less suited to hypothesis testing in clustered or longitudinal settings (e.g., repeated-measures datasets) and yield regression coefficient estimates that are only interpretable with respect to the units of the basis functions. In contrast, cFLMM provides time-varying coefficient estimates that are interpretable as statistical contrasts in terms of the original variables and produces hypothesis tests in clustered settings. cFLMM can be applied to datasets that define covariates in terms of the same flexible representations of covariates used in encoding models; this is a modeling choice rather than a methodological characteristic.

      The remainder of this provisional author response will respond to reviewers’ concerns line-by-line, approximately in the order they appear.

      Reviewer #1 (Public review):

      We thank Reviewer 1 for their comments, especially their efforts to provide first-hand experience with loading and applying fastFMM. We hope that recent improvements to fastFMM’s public release and vignettes address Reviewer 1’s concerns about ease-of-use.

      (1A) Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation for many experimentalists remains challenging enough and may limit widespread adoption by the community.

      We believe the reviewer may have experimented with an old version of fastFMM, so their experience may not reflect recent rewrites and improvements. fastFMM v1.0.0+ is now stable, validated on CRAN, and contains new example data and step-by-step tutorials. We designed fastFMM’s model-fitting code to be similar to common GLM packages in R to reduce the learning curve for new users.

      (1B) …a clearer presentation of how common implementations in the field are performed (i.e. GLM) and how one could alternatively use the cFLMM approach would help.

      We will provide a clearer description of existing methods in the revised manuscript. Briefly, inference with fastFMM can accommodate large datasets that contain clustered data, repeated measures, or complex hierarchical effects, e.g., experiments with multiple animals and multiple trials per animal. When encoding models are fit to each cluster (e.g., animal, neuron) separately, we are not aware of a principled method to pool these cluster-specific models together to quantify uncertainty or yield an appropriate global hypothesis test.

      Reviewer #2 (Public review):

      Reviewer 2’s thoughtful feedback helped structure our points in the common response above, which we will refer to when applicable. In our response, we aim to clarify the problems that cFLMM solves and characterize the advantages in interpretability.

      (2A) The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.

      We hope that the common response addresses these concerns. We were motivated to provide a concurrent extension of fastFMM based on our experience with statistical consulting in neuroscience research. Questions that benefit from a functional approach are common and often not adequately modeled with a non-concurrent approach, such as the variable trial length analysis we describe below.

      (2B) It is less clear that this approach makes sense for variables that change within trial…This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modeled, and a within-trial component whose effect on the signal is not, is artificial in many experiment designs, and may yield hard to interpret results.

      We thank Reviewer 2 for highlighting a point that we did not adequately explain and that we will address further in the revision. The pointwise and joint CIs estimated by fastFMM account for uncertainty in the coefficient estimates due to variation in the predictors across within-trial timepoints. cFLMM targets a statistical quantity, or estimand, that is defined by trial timepoint specific effects, so the first step of our estimation strategy fits separate pointwise mixed models. However, models from every within-trial timepoint are then combined to calculate uncertainty and smooth the coefficient estimates. Thus, the widths of the pointwise and joint CIs depend on the estimated between-timepoint covariance and a smoothing penalty. Loewinger et al. (2025a) provides further details in Appendices 2 and 3, describing the covariance structure and detailing the power improvements of FUI compared to multiple-comparisons corrections.
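
      As a rough illustration of why CI widths depend on the between-timepoint covariance and on the amount of smoothing, the sketch below propagates a covariance matrix through a linear smoother. The moving-average smoother and the AR(1)-style covariance are assumptions chosen for simplicity; they stand in for, and are not, the penalized smoother and covariance estimator used by FUI.

```python
import math


def moving_average_matrix(n, half_width=1):
    """Row-stochastic smoother S: each output is the mean of its neighbours."""
    S = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n - 1, i + half_width)
        weight = 1.0 / (hi - lo + 1)
        S.append([weight if lo <= j <= hi else 0.0 for j in range(n)])
    return S


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def ar1_covariance(n, variance, rho):
    """Hypothetical between-timepoint covariance of the raw pointwise estimates."""
    return [[variance * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def smoothed_half_widths(cov, S, z=1.96):
    """Pointwise CI half-widths after smoothing: z * sqrt(diag(S cov S^T))."""
    smoothed_cov = matmul(matmul(S, cov), transpose(S))
    return [z * math.sqrt(smoothed_cov[i][i]) for i in range(len(S))]


n = 7
S = moving_average_matrix(n)
raw_half_width = 1.96 * math.sqrt(0.04)
weakly_correlated = smoothed_half_widths(ar1_covariance(n, 0.04, 0.1), S)
strongly_correlated = smoothed_half_widths(ar1_covariance(n, 0.04, 0.9), S)

print("raw:", round(raw_half_width, 3))
print("smoothed, rho=0.1:", [round(w, 3) for w in weakly_correlated])
print("smoothed, rho=0.9:", [round(w, 3) for w in strongly_correlated])
```

      Under these assumptions, smoothing narrows the intervals most when neighbouring estimates are weakly correlated, and much less when they are strongly correlated, because averaging correlated quantities removes little noise.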

      Other functional regression estimation strategies jointly fit the entire model with a single regression, e.g., functional generalized estimating equations (Loewinger et al., 2025b). However, these methods use basis expansions of the coefficients. In contrast, the encoding models mentioned in 2C below and Reviewer 3 (3A) apply basis expansions of the covariates, and the resulting model does not capture how signal–covariate associations evolve across some functional domain. Although the first stage in the fastFMM approach fits pointwise linear models, this is only one of three steps in the estimation strategy. fastFMM yields coefficient estimates comparable to those that would be obtained from functional regression estimation strategies that jointly estimate the functional coefficients in a single regression. We mention this to distinguish between the target statistical quantity (functional coefficients) and the estimation strategy (pointwise vs. joint).

      (2C) …an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.

      Our understanding is that the suggested approach aims to quantify the association between the outcome and within-trial patterns in covariates. This is a great question and we will incorporate a discussion of this into the revision. However, temporal basis functions convolved with the covariate time series cannot directly characterize these relationships. Encoding models can detect the contribution of predictors to neural signals while remaining agnostic to the precise relationship, but this flexibility can come at the cost of interpretability. The coefficients of the convolutions may not be translatable into a clear statistical contrast in terms of the original covariates.

      In our paper, we provide examples of cFLMM models with simple signal-covariate relationships. The coefficient estimates quantify the expected change in signal given a one unit change in the original predictors. Let 𝑌(𝑠) be the outcome and 𝑋(𝑠) be some covariate at within-trial timepoint 𝑠. For brevity, we will suppress subject/trial indices and random effects in the following notation. The coefficient at time point 𝑠 can be captured by the generic mean model

      𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 1] − 𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 0].

      In contrast, the change in signal associated with patterns in within-trial covariates can be written as

      𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 1] − 𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 0]

      for all pairs of timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. While simple lagged or offset outcome-predictor associations can be incorporated as covariates in cFLMM, the approach does not capture all within-trial timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. Encoding models also do not target the above estimand. Instead, a full function-on-function regression could estimate the above. This topic can be incorporated into our revision and may be a future line of inquiry.
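
      To make the two estimands concrete, the following minimal sketch computes the empirical contrast for every pair of timepoints in a small, fully balanced toy dataset. The data are hypothetical: a binary covariate takes every possible pattern over three timepoints, and the outcome at each timepoint depends only on the covariate one step earlier. The function `contrast` is a plain difference of group means, not a function-on-function regression.

```python
from itertools import product


def contrast(Y, X, s1, s2):
    """Difference in mean Y(s1) between trials with X(s2) == 1 and X(s2) == 0."""
    treated = [y[s1] for y, x in zip(Y, X) if x[s2] == 1]
    control = [y[s1] for y, x in zip(Y, X) if x[s2] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


n_timepoints = 3

# Hypothetical balanced design: one trial per binary covariate pattern.
X = [list(bits) for bits in product([0, 1], repeat=n_timepoints)]

# Hypothetical outcome: Y(s) responds only to the covariate at s - 1.
Y = [[0.0, 2.0 * x[0], 2.0 * x[1]] for x in X]

# surface[s1][s2] estimates E[Y(s1) | X(s2) = 1] - E[Y(s1) | X(s2) = 0].
surface = [
    [contrast(Y, X, s1, s2) for s2 in range(n_timepoints)]
    for s1 in range(n_timepoints)
]

for s1, row in enumerate(surface):
    print(f"s1={s1}:", row)
```

      In this toy example the diagonal entries, where 𝑠<sub>1</sub> = 𝑠<sub>2</sub>, are all zero, while the entries one step below the diagonal equal 2. A purely concurrent contrast would therefore miss the lagged effect unless a lagged covariate were added explicitly.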

      (2D) In the Machen et al. data…From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials.

      In this experiment, mice waited in a trigger zone, ran through a linear corridor, then received a reward of either water or strawberry milkshake in the reward delivery zone (Machen et al., 2026). Mice received different rewards between sessions but the same reward within all trials of a given session. This design complicated the analysis, as the reward type produced prominent differences in average latency (water: 3.3 seconds, milkshake: 2.0 seconds). The authors wanted to disentangle whether mean differences in the signal across reward types reflected differences in motivation to obtain the reward or differences in reaction to reward receipt.

      We agree that performing a reward-aligned analysis would be an intuitive approach to visualize the differences in average signal for mice that received milkshake compared to water. In fact, we provide a ncFLMM reward-aligned analysis in Figure S1 of Machen et al. (2025). We will add this analysis to the revision and thank the reviewer for the suggestion. We emphasize, however, that this method answers a different question. It does not identify how the signal change associated with receiving the milkshake evolves with respect to latency, especially if the relationship is non-linear. Time warping faces similar obstacles in this setting, especially since sufficiently flexible curve registration can induce similarity due purely to noise. Generally, time warping does not lend itself to hypothesis testing as it is unclear how to propagate uncertainty from the time warping model into final hypothesis tests.
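
      The reward-aligned averaging discussed above can be sketched as follows. This is a minimal illustration that assumes each trial is a list of samples paired with the sample index at which the reward was obtained; the function names and the toy trials are hypothetical and are not taken from the Machen et al. analysis.

```python
def align_to_event(signal, event_index, n_before, n_after):
    """Return samples from event - n_before to event + n_after, or None if cut off."""
    start = event_index - n_before
    stop = event_index + n_after + 1
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]


def event_aligned_mean(trials, n_before, n_after):
    """Average the event-centred windows of all trials that contain a full window."""
    windows = []
    for signal, event_index in trials:
        window = align_to_event(signal, event_index, n_before, n_after)
        if window is not None:
            windows.append(window)
    if not windows:
        raise ValueError("no trial contains the full window")
    return [sum(col) / len(windows) for col in zip(*windows)]


# Two hypothetical trials with different latencies to reward.
trials = [
    ([0.0, 0.0, 0.0, 1.0, 0.0, 0.0], 3),  # slow trial: reward at sample 3
    ([0.0, 1.0, 0.0, 0.0], 1),            # fast trial: reward at sample 1
]
aligned = event_aligned_mean(trials, n_before=1, n_after=1)
print(aligned)
```

      Alignment recovers the shared response around reward receipt, but the latency of each trial no longer appears in the averaged trace, which is why this analysis addresses a different question from the one about how the reward effect changes with latency.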

      We believe cFLMM is an appropriate choice for the specific question, and we will revise the manuscript to better reflect its advantages. The functional coefficient estimates in Figures 3C-iii and 3C-iv provide insights that are not possible to derive from the proposed alternatives. For example, we can infer that for short latencies, we do not see a significant difference in signal magnitude for mice receiving water and mice receiving the milkshake. However, for latencies longer than around 2 seconds, receiving the milkshake is associated with an additional positive change in signal. We agree that we should make Figure 3C and the accompanying discussion more clear and thank Reviewer 2 for their feedback on interpretation.

      Reviewer 3 (Public review):

      (3A) …it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other…

      We assume Reviewer 3 is referencing “Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons” (Engelhard et al., 2019). We hope that the Common responses sufficiently contrast the settings where each approach can be applied. Because these models have different goals and assumptions, they are appropriate for answering different questions.

      (3B) In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary “reward zone vs corridor” (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.

      Thank you for pointing out that we were not clear. This was mentioned by multiple reviewers and highlights the need to elaborate on our motivation in the revision. In this example, we wanted to investigate the change in signal-reward association as a function of within-trial timepoints, not the association between instantaneous velocity and the signal. “Slow” or “fast” means “mouse with below or above average latency”. Please refer to our response to Reviewer 2 (2D), where we discuss why event alignment is an insufficient correction.

      The functional coefficient estimates in Figure 3C are interpreted as contrasts because the fixed effect coefficients capture the difference in expected signal between strawberry milkshake and water along the functional domain. An advantage of cFLMM is that it is easy to specify models in which the coefficients correspond to interpretable contrasts of the signal across conditions. The coefficient estimate shown in Figure 3B-ii also corresponds to a contrast because the estimates capture the difference in mean signal from strawberry milkshake and water. Equations (7) and (8) in the section “Materials and methods” and sub-section “Variable trial length analysis” provide additional details on the fixed effect coefficients. Based on this confusion, we will convert the two 1 x 4 sub-plots of 3B and 3C into two 2 x 2 sub-plots to avoid unintended direct comparisons.

      To contextualize how we “acknowledge the interpretational difficulties of [our] analysis”, we stated that a non-concurrent FLMM attempting to control for a time-based covariate is difficult to interpret. The concurrent FLMM provides a straightforward interpretation directly related to the question of interest, which we discuss above in Reviewer 2 (2D).

      (3C) Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.

      Thank you for this suggestion. All three reviewers raised this topic (see Reviewer 1 (1B), Reviewer 2 (2C), and the Common responses), and we will incorporate our response in the revision.

      (3D) From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.

      This is an important point that we mentioned implicitly. In our cFLMM specification of the Jeong et al. (2022) model, “we incorporated trial-specific covariates for trial number and session, modeling these as increasing numerical values rather than identical categorical variables”, which are also plotted in Appendix 3. In Box 1, “if the functional covariate of interest is a scalar constant across the domain, the models fit by the concurrent and non-concurrent procedure are identical”. We will explicitly point out that cFLMM can perform inference on combinations of functional and constant covariates.
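
      One way to picture a model that mixes functional and constant covariates is as a regressor x trial x timepoint array in which each trial-level scalar is repeated across timepoints. The sketch below is only a data-layout illustration: the nested-list structure and the helper names are hypothetical and are not the input format expected by fastFMM.

```python
def broadcast_trial_covariate(values, n_timepoints):
    """Repeat one scalar per trial across every within-trial timepoint."""
    return [[v] * n_timepoints for v in values]


def shape(matrix):
    """Return (n_trials, n_timepoints), rejecting ragged rows."""
    row_lengths = {len(row) for row in matrix}
    if len(row_lengths) != 1:
        raise ValueError("rows must all have the same number of timepoints")
    return len(matrix), row_lengths.pop()


def stack_covariates(covariates):
    """Stack trial x timepoint matrices into a regressor x trial x timepoint array."""
    names = list(covariates)
    shapes = {shape(covariates[name]) for name in names}
    if len(shapes) != 1:
        raise ValueError("all covariates must share one trial x timepoint shape")
    return names, [covariates[name] for name in names]


n_timepoints = 4

# Functional (concurrent) covariate: varies within each trial.
in_reward_zone = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]

# Trial-level (non-concurrent) covariate: constant within each trial.
trial_number = broadcast_trial_covariate([1, 2, 3], n_timepoints)

names, design = stack_covariates(
    {"in_reward_zone": in_reward_zone, "trial_number": trial_number}
)
print(names)
print(design[1])
```

      Seen this way, a non-concurrent covariate is the special case in which every row of its trial x timepoint slice is constant.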

      (3E) The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.

      Prior to our work described in this Research Advance, it was not obvious that the existing approximation approach in fastFMM could be generalized to cFLMM. During the writing of the article, a fastFMM user reached out for help with producing pseudo-concurrent FLMMs by duplicating rows in a non-concurrent model, which underscores both the unmet need for cFLMMs and the difficulty of fitting them with available tools.

      The “under-the-hood” differences are described in Appendix 4. Concurrent FLMM with fast univariate inference was theoretically possible as early as Cui et al. (2022). The univariate step was straightforward, but guaranteeing “fast” and “inference” was not. We needed to verify, for example, that the method-of-moments estimation of the random effects covariance matrix generalized to cFLMM, which is not a trivial step. Characterizing whether the method achieved asymptotic coverage required extensive simulation studies (Figure 4, Appendix 2). Future work may focus on fully characterizing the asymptotic convergence in high noise or high complexity regimes.

      (3F) This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

      We hope that the Common responses clarify how cFLMM compares to existing approaches and fills a gap in the data analysis landscape for neuroscience. The fastFMM R package vignettes contain example analyses, and we intend for these files to work in tandem with the manuscript. To provide more guidance for interested analysts, we can explicitly reference these tutorials within the revision.

      Planned revisions

      The following summary is not exhaustive.

      Writing additions:

      Per 1B, 2C and 3A, the Common responses will be incorporated in the revision.

      Per 2B, we will discuss function-on-function regression and explore how to estimate statistical contrasts for complex within-trial relationships. Relatedly, we will clarify that the CIs in fastFMM are constructed using an estimate of the within-trial covariance of the predictors, and clarify the definition of pointwise and joint CIs.

      Per 3D, we will explicitly state that concurrent FLMMs can include covariates that are constant over within-trial timepoints.

      Though we cannot prescribe a universally correct model selection procedure, we will mention that AIC, BIC, and other summary statistics can inform the specification of the random effects.
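
      For readers unfamiliar with these criteria, the sketch below applies the standard definitions AIC = 2k − 2 log L and BIC = k log n − 2 log L to two candidate random-effects specifications. The log-likelihoods, parameter counts, and sample size are made-up numbers for illustration only, and how parameters are counted for random effects is itself a modelling choice.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k log n - 2 log L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits of two random-effects specifications to the same data.
candidates = {
    "random intercept": {"log_likelihood": -512.4, "n_params": 4},
    "random intercept + slope": {"log_likelihood": -509.9, "n_params": 6},
}
n_obs = 300  # hypothetical number of observations

scores = {
    name: (
        aic(fit["log_likelihood"], fit["n_params"]),
        bic(fit["log_likelihood"], fit["n_params"], n_obs),
    )
    for name, fit in candidates.items()
}

best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print("AIC prefers:", best_by_aic)
print("BIC prefers:", best_by_bic)
```

      With these made-up numbers the two criteria disagree, because BIC penalizes the extra parameters more heavily. This illustrates why such statistics can inform, but not dictate, the specification.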

      Analysis modifications:

      Parts of Appendix 3 may be included in Figure 2 to directly address the question investigated by Jeong et al. (2022) and Loewinger et al. (2024).

      When discussing Machen et al. (2025) data, the supplementary analysis with reward-aligned ncFLMM models might be added to clarify the ncFLMM/cFLMM difference.

      Per Reviewer 2’s comments, the additional analysis aimed at disentangling latency and reward in Machen et al.’s variable trial length data may be incorporated as an additional sub-figure in Figure 3.

      Aesthetic changes:

      Figure 3 will be reorganized to avoid unintended direct comparisons between the coefficients of the non-concurrent and concurrent model.

      Citations for Machen et al. (2026) will be updated to reflect publication of the preprint.

      The version number for fastFMM will be updated.

      References

      Cui E, Leroux A, Smirnova E, Crainiceanu CM. Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics. 2022; 31(1):219–230. https://doi.org/10.1080/10618600.2021.1950006, doi: 10.1080/10618600.2021.1950006, PMID: 35712524.

      Engelhard B, Finkelstein J, Cox J, Fleming W, Jang HJ, Ornelas S, Koay SA, Thiberge SY, Daw ND, Tank DW, Witten IB. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature. 2019 Jun; 570(7762):509–513. https://www.nature.com/articles/s41586-019-1261-9, doi: 10.1038/s41586-019-1261-9.

      Jeong H, Taylor A, Floeder JR, Lohmann M, Mihalas S, Wu B, Zhou M, Burke DA, Namboodiri VMK. Mesolimbic dopamine release conveys causal associations. Science. 2022; 378(6626):eabq6740. https://www.science.org/doi/abs/10.1126/science.abq6740, doi: 10.1126/science.abq6740.

      Loewinger G, Cui E, Lovinger D, Pereira F. A statistical framework for analysis of trial-level temporal dynamics in fiber photometry experiments. eLife. 2025a Mar; 13:RP95802. doi: 10.7554/eLife.95802.

      Loewinger G, Levis AW, Cui E, Pereira F. Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets. arXiv. 2025b Jun; p. arXiv:2506.20437v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC12306803/.

      Machen B, Miller SN, Xin A, Lampert C, Assaf L, Tucker J, Herrell S, Pereira F, Loewinger G, Beas S. The encoding of interoceptive-based predictions by the paraventricular nucleus of the thalamus D2R+ neurons. iScience. 2026 Jan; 29(1):114390. doi: 10.1016/j.isci.2025.114390.

    1. eLife Assessment

      This manuscript explores the dynamic behaviors of Pol II and Pol III puncta that encompass the SL1 and 5S genes, following up on the authors' prior studies on ATTF-6. The authors show that ATTF-6 is required for RNA Pol II but not RNA Pol III foci, demonstrating that within the gene cluster, the regulation of RNA Pol II and RNA Pol III remain distinct from each other. The study is useful for analyzing understudied gene families, but it is incomplete and needs additional edits and experiments.

    2. Reviewer #1 (Public review):

      This study examines how two types of RNA polymerases organize themselves within the nucleus of C. elegans cells, building directly on the same group's prior publication and largely functioning as a companion to that earlier work. While the observation that the two polymerases occupy distinct but neighboring locations at the same genomic region adds nuance to our understanding of gene cluster regulation, the manuscript would benefit from more clearly delineating which findings are new versus continuations of previously published work. Protein localization, gene expression effects, and genomic mapping data appear to overlap substantially with the earlier paper.

      The condensate claims would also benefit from additional experimental support. Demonstrating fusion events and concentration-dependent assembly are now standard expectations in the field. Additionally, one measurement reported appears inconsistent with a condensate model, warranting further discussion.

      Some findings would benefit from more interpretive context. Why does polymerase clustering fluctuate with the cell cycle? What are the functional implications of ATTF-6 being required for one polymerase's foci but not the other's?

      The elevated-temperature experiments are intriguing but difficult to interpret, as the temperature used is well-established as a broad stress trigger in this organism. Acknowledging this and considering additional controls would help clarify whether the observed effects are specific to foci behavior.

      Finally, the manuscript would be strengthened by adding quantification to some figures and revising the model diagram to better reflect what the current data support.

    3. Reviewer #2 (Public review):

      Summary:

      The researchers analyzed GFP-tagged RNA Pol II and RNA Pol III catalytic subunits RPB-1 and RPC-1, and showed that they form foci in early embryo nuclei that overlap with the 5S rDNA loci and with foci formed by ATTF-6-RFP. They showed that the foci are round, dissolve upon hexanediol incubation, and are detected during S phase, lost during mitosis, and re-established afterward. The researchers performed FRAP and showed fast exchange of polymerases, unlike ATTF-6. They show that, unlike RNA Pol III foci, RNA Pol II foci are dependent on ATTF-6 and are temperature sensitive. The researchers propose that the two polymerases form distinct foci with different biochemical dependencies. This study shows that, although closely located within a gene cluster, the regulation of RNA Pol II and RNA Pol III is independent.

      Strengths:

      The researchers provide high-quality images that support the main results. The researchers' use of auxin-inducible and RNAi depletion work is validated in the same embryos by fluorescent analysis of the target protein.

      Weaknesses:

      Although the researchers propose the hypothesis that the RNA Pol II and RNA Pol III form distinct condensates, alternative hypotheses are not presented, and the criteria by which the other possibilities are ruled out are not discussed.

    4. Reviewer #3 (Public review):

      Wang et al. demonstrate that RNA polymerase II and RNA polymerase III form distinct nuclear foci at the 5S rDNA-SL1 gene cluster in C. elegans. By ChIP, Pol II is highly enriched at the SL1 gene, whereas Pol III is enriched at the 5S rRNA gene. Both polymerase foci are spherical, show rapid exchange in FRAP experiments, and assemble in a cell-cycle-dependent manner, predominantly during S phase. The transcription factors ATTF-6 and SNPC-4 are required for the formation of Pol II foci but are dispensable for Pol III foci. Pol II foci, but not Pol III foci, are temperature-sensitive and dissolve upon heat stress; dissolution correlates with a strong reduction of SL1 transcription, whereas 5S rRNA levels remain largely unaffected.

      Overall, this is a clean, well-organized, and well-controlled study, and I only have two comments.

      (1) Roundness measurements, FRAP, and sensitivity to 1,6-hexanediol are indicative but not sufficient to show that these foci are condensates. They could, for example, also be scaffolded/chromatin-anchored assemblies (see https://pubmed.ncbi.nlm.nih.gov/36526633/). Please either provide better evidence or rephrase/tone down the condensate statements.

      (2) Image quantification is only provided for Figure 5, but should also be reported for Figures 6 and 7. In addition to the number of foci, other measures, e.g., intensity over background (similar to a partition coefficient), should also be quantified.
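
      A measure of this kind could be computed roughly as sketched below, assuming a 2D intensity image and a binary mask marking the pixels that belong to foci. Both inputs, the function names, and the toy values are hypothetical, and corrections such as subtracting the camera offset are omitted.

```python
def mean(values):
    """Arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    if not values:
        raise ValueError("cannot average an empty region")
    return sum(values) / len(values)


def enrichment_ratio(image, mask):
    """Mean intensity inside the foci mask divided by mean intensity outside it."""
    foci, background = [], []
    for image_row, mask_row in zip(image, mask):
        for pixel, inside in zip(image_row, mask_row):
            (foci if inside else background).append(pixel)
    return mean(foci) / mean(background)


# Toy 3 x 3 nucleus with a single bright focus in the centre.
image = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
]
mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
ratio = enrichment_ratio(image, mask)
print(ratio)
```

      Reporting such a ratio per nucleus, alongside the number of foci, would allow conditions to be compared on enrichment as well as on counts.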

    5. Author response:

      Reviewer #1:

      We appreciate the reviewer’s suggestions. In the revision, we will clarify which results are new and better position this work relative to our earlier publication. We will also expand the discussion of the functional implications of polymerase clustering and its cell-cycle dynamics.

      Regarding the condensate interpretation, we agree that the current evidence is suggestive but not definitive. In the revised manuscript, we will clarify how our measurements relate to commonly used criteria for condensate assemblies and revise the text to avoid overstating this interpretation. We will also add quantification to additional figures and revise the model diagram to more accurately reflect the conclusions supported by the data.

      Reviewer #2:

      We thank the reviewer for the positive assessment of the imaging quality. We agree that the manuscript would benefit from a broader discussion of possible models for the observed polymerase foci. In the revision, we will expand the discussion to include alternative interpretations, such as scaffolded assemblies, as suggested by Reviewer 3, and further clarify the properties of the RNA Pol II and RNA Pol III foci.

      Reviewer #3:

      We thank the reviewer for the positive evaluation of the study and the helpful suggestions. We agree that the current evidence is indicative but not sufficient to definitively demonstrate condensate formation. In the revision, we will revise the language and discuss alternative interpretations, including scaffolded assemblies. We will also provide additional quantifications for the relevant figures.

      Overall, we appreciate the reviewers’ suggestions and believe that the planned revisions will improve the clarity and impact of the manuscript.

    1. eLife Assessment

      This fundamental work uncovers an unexpected lysosomal function for NINJ2 and links it to ferroptosis and cancer biology. The evidence supporting the conclusions appears to be convincing. Additional mechanistic clarification, particularly around the NINJ2-LAMP1 interaction and ferroptosis specificity, will further strengthen the manuscript. This work will be of general interest to the community of ferroptosis and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      This study reports a novel and potentially impactful role for NINJ2 in maintaining lysosomal integrity and regulating cellular susceptibility to ferroptosis. The authors demonstrate that NINJ2 localizes to lysosomes and interacts with LAMP1, a key lysosomal membrane glycoprotein involved in sensing lysosomal stress. Loss of NINJ2 increases lysosomal membrane permeabilization (LMP), resulting in selective leakage of lysosomal contents, including labile iron, into the cytosol. The authors further show that NINJ2 deficiency reduces the expression of ferritin storage proteins, thereby sensitizing cells to ferroptosis induced by RSL3 and erastin. Collectively, the work proposes a mechanistic link between NINJ2-mediated control of LMP, iron homeostasis, and ferroptotic vulnerability, with potential relevance to cancer biology.

      Strengths:

      This study identifies a novel role for NINJ2 in regulating lysosomal integrity and ferroptosis and establishes a mechanistic link between lysosomal membrane permeabilization, iron homeostasis, and ferroptotic sensitivity, with potential translational relevance in cancer.

      Weaknesses:

      The results overall support the authors' conclusions and provide a plausible mechanistic framework; however, additional quantification of Western blot data and further discussion of mechanistic questions would strengthen the study.

      The findings are likely to have a broad impact by linking lysosomal integrity to ferroptosis and iron homeostasis, both of which are relevant to cancer biology and therapeutic targeting.

    3. Reviewer #2 (Public review):

      This manuscript, "Nerve Injury-Induced Protein 2 preserves lysosomal membrane integrity to suppress ferroptosis", identifies a previously unrecognized function of NINJ2 as a regulator of lysosomal membrane integrity and iron homeostasis, thereby suppressing ferroptosis. The authors demonstrate that NINJ2 localizes to lysosomes, interacts with LAMP1, limits lysosomal membrane permeabilization (LMP), stabilizes ferritin, and protects cells from ferroptotic cell death. They further extend these mechanistic findings to human cancer datasets, showing co-overexpression and positive correlation of NINJ2 with ferritin genes in iron-addicted cancers.

      Overall, the study is conceptually interesting, technically solid, and integrates cell biology, iron metabolism, and ferroptosis in a coherent framework. The work expands the functional repertoire of the Ninjurin family beyond plasma membrane rupture and inflammation, which will be of interest to researchers in cell death, lysosome biology, and cancer metabolism.

      Strengths:

      (1) The identification of NINJ2 as a lysosome-associated protein that suppresses ferroptosis represents a meaningful advance beyond its previously described roles in inflammation, pyroptosis, and tumorigenesis.

      (2) The work distinguishes NINJ2 functionally from NINJ1, reinforcing the idea that structurally related Ninjurins have divergent membrane-related roles.

      (3) The study presents a logically connected pathway: NINJ2 loss → LMP → labile iron increase → ferritin degradation → ferroptosis sensitization, which is well supported by the data.

      (4) The link between LAMP1, ferritin turnover, and ferroptosis is particularly compelling and timely given recent interest in lysosomal contributions to ferroptotic signaling.

      (5) The authors use confocal microscopy, proximity ligation assays, biochemical IPs, iron measurements, protein half-life analyses, ferroptosis assays, and TCGA-based analyses, providing convergent evidence for their model.

      (6) Use of two distinct cell lines (MCF7 and Molt4) strengthens generalizability.

      (7) The integration of cancer expression datasets linking NINJ2 with ferritin expression in hepatocellular and breast carcinomas enhances translational relevance.

      (8) Assigning NINJ2 a lysosomal protective function, distinct from NINJ1-mediated plasma membrane rupture, is novel.

      (9) Linking NINJ2 to ferroptosis regulation via lysosomal iron handling, rather than canonical GPX4 or system Xc⁻ pathways, is also novel, along with proposing a NINJ2-LAMP1-ferritin axis as a buffering mechanism against iron-driven lipid peroxidation.

      (10) These insights are not incremental; they reframe how NINJ2 may function at the intersection of membrane biology, iron metabolism, and regulated cell death.

      Areas for improvement:

      While the study is strong, several issues should be addressed for mechanistic depth and general relevance.

      (1) Although NINJ2 is shown to interact with LAMP1 and LAMP1 knockdown rescues ferritin levels, it remains unclear whether the NINJ2-LAMP1 interaction is required for lysosomal protection. The authors could:

      a) Map the NINJ2 domain required for LAMP1 interaction and test whether an interaction-deficient mutant fails to protect against LMP and ferroptosis.

      b) Rescue NINJ2 KO cells with wild-type versus mutant NINJ2 to establish causality.

      (2) The conclusion that NINJ2 suppresses ferroptosis relies primarily on RSL3 and erastin sensitivity. A direct assessment of ferroptosis would enhance the study. The authors could:

      a) Include ferroptosis rescue experiments using ferrostatin-1 or liproxstatin-1.

      b) Assess lipid peroxidation directly (e.g., C11 BODIPY staining) to strengthen the ferroptosis claim.

      (3) The manuscript discusses lysosomal ferritin degradation but does not directly examine NCOA4, a central mediator of ferritinophagy. It would be good to test whether NCOA4 knockdown rescues ferritin loss and ferroptosis sensitivity in NINJ2 KO cells. This would clarify whether NINJ2 acts upstream of canonical ferritinophagy pathways or via an alternative mechanism.

      (4) The study is entirely cell-based, despite references to inflammatory and tumor phenotypes in Ninj2-deficient mice. While not strictly required, even limited in vivo validation (e.g., ferroptosis markers or iron accumulation in existing Ninj2 KO tissues) would substantially strengthen the manuscript.

      (5) Finally, most imaging data (e.g., Galectin 3/LAMP1 colocalization, PLA signals) and immunoblot data are presented qualitatively. The authors should provide quantification of the Western blots and other measurements.

    4. Author response:

      Reviewer #1:

      We appreciate the reviewer’s insightful suggestions. In the revised manuscript, we will provide quantitative analysis of Western blot data throughout the study to improve data robustness and reproducibility. In addition, we will expand the “Discussion” section to address the following points raised by Reviewer #1: (1) Potential mechanisms underlying the regulation of LAMP1 transcript levels by NINJ2; (2) Whether Ninjurin1 may play a similar role in regulating lysosomal membrane permeabilization (LMP); (3) The potential clinical implications of our findings, particularly in relation to cancer progression and therapeutic targeting.

      Reviewer #2:

      We thank the reviewer for the insightful and constructive suggestions, which would further deepen the mechanistic understanding of the NINJ2-LAMP1 pathway and its role in ferroptosis regulation. To address the reviewer’s concerns, we will clarify the interpretation of our findings, add quantitative analyses where appropriate, and expand the Discussion to acknowledge these important mechanistic questions and future research directions. Specifically, we will revise the Statistical Analysis section to clearly describe the statistical methods used, including whether corrections for multiple comparisons were applied where appropriate. We will further discuss the potential interaction domain(s) between NINJ2 and LAMP1. We will also discuss the potential role of NCOA4, a central mediator of ferritinophagy, in the NINJ2-FTH1-LAMP1 pathway. Finally, we will include a schematic model summarizing the proposed NINJ2-LAMP1-iron-ferroptosis axis to better illustrate the working model of our study.

    1. eLife Assessment

      This important study addresses the long-debated hypothesis that humans preferentially choose partners with dissimilar immune genes, using data from a small-scale society that allows comparison between arranged and self-chosen partnerships. Across multiple analyses controlling for genome-wide relatedness and examining functional immune diversity, the authors find no evidence of HLA/MHC-based (dis)assortative mating, suggesting that immune gene variation has limited influence on mate choice in this relatively homogeneous population and that the observed patterns instead reflect selection acting directly on immune loci. While the strength of the evidence is compelling for this population, several conclusions rely on indirect reconstruction methods and imputed data for a very complex region of the genome, which may limit how firmly some claims can be supported.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.
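
      The comparison described above, observed partners versus randomly generated pairs, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the genotype layout (a dict mapping each locus to two allele strings), and the simple allele-sharing score are hypothetical and are not the authors' pipeline, which also accounts for genome-wide relatedness and applies age criteria when drawing random pairs.

```python
import random
from statistics import mean


def shared_alleles(a, b):
    """Count alleles shared by two individuals, summed over common loci.

    a, b: dict mapping locus name -> (allele1, allele2). Hypothetical layout.
    """
    total = 0
    for locus in a.keys() & b.keys():
        remaining = list(b[locus])
        for allele in a[locus]:
            if allele in remaining:
                remaining.remove(allele)  # each allele copy is matched once
                total += 1
    return total


def random_pair_pvalue(observed_pairs, women, men, genotypes, n_perm=1000, seed=0):
    """One-sided empirical p-value for partners sharing FEWER alleles than chance.

    Returns the fraction of random pair sets whose mean sharing is less than
    or equal to the observed mean (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = mean(
        shared_alleles(genotypes[w], genotypes[m]) for w, m in observed_pairs
    )
    hits = 0
    for _ in range(n_perm):
        null = mean(
            shared_alleles(genotypes[rng.choice(women)], genotypes[rng.choice(men)])
            for _ in observed_pairs
        )
        if null <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      A small p-value under this sketch would indicate that observed partners share fewer alleles than random pairs; a value near 0.5 would be in line with random pairing with respect to the score used.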

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size: HLA allele calls from 366 individuals (102 unrelated), and 219 individuals who had HLA data and whole-genome SNP data and were involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

    3. Reviewer #2 (Public review):

      Summary:

      Evidence for the influence of MHC on mate choice in humans is challenging to obtain, as social structures and norms often confound the power of studying populations. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine if MHC diversity is increased when partners are freely chosen. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding is that there is no heterozygous dissimilarity difference between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least it may mask other selection forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that between populations and species, there are different pressures that have driven not only MHC evolution but also mate choice.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

    4. Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, l. 78), see the paper by Croy et al. (2020), which used the largest sample to date in this research area.

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      (4) How many pairs were due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      References:

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    5. Author response:

      Reviewer 1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size: HLA allele calls from 366 individuals (102 unrelated), and 219 individuals who had HLA data and whole-genome SNP data and were involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      We appreciate the reviewer's concern, but would like to clarify two important misunderstandings in this assessment.

      First, the reviewer suggests that our SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, and that IBD inference may therefore be biased toward haplotypes common between the Himba and Yoruba. This is not the case. Our SNP genotype data were generated from the H3Africa and MEGAex genotyping arrays, which incorporated diverse reference variation to minimize ascertainment bias in non-European ancestries. No read mapping to a Yoruba reference genome was involved in SNP discovery or genotyping. The Yoruba 1000 Genomes data were used solely to provide an ancestry-matched recombination map for phasing and IBD calling; this would not bias IBD inference toward common Yoruba haplotypes. The reviewer's concern about imputation-driven inflation of IBD sharing for common haplotypes should not be relevant in our case.

      Second, regarding HLA haplotype resolution: we trained a bespoke HIBAG model directly on the Himba SNP array genotype data paired with ground-truth HLA allele calls from our own targeted HLA capture sequencing. This Himba-specific model was then used to impute HLA alleles from pseudo-homozygous genotypes derived by extracting phased SNP-based haplotypes across the HLA region for the same individuals. In this way we resolved the phase of the HLA allele calls. To our knowledge, this paired-data approach to individual-level HLA haplotype resolution is novel; existing HLA haplotype resolution tools generally provide only population-level haplotype frequency estimates rather than individual-level phase assignments. We are confident in the reliability of the haplotypes we report. Resolved haplotypes were required to match the known targeted-sequencing HLA allele calls at a minimum of the first field for at least one allele, and both haplotypes could not be assigned to the same allele unless the individual's HLA allele calls were homozygous. Of 722 total haplotypes, 698 were successfully resolved under these criteria. We report results only on these confidently resolved haplotypes.
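
      The acceptance rule stated above could be sketched, for a single locus, roughly as follows. This is one possible reading of the criteria rather than the code actually used: the function names, the allele string format (for example `A*02:01`), and the decision to require every imputed allele to match a sequenced call at the first field are all assumptions made for illustration.

```python
def first_field(allele):
    """Reduce an allele name to gene plus first field, e.g. 'A*02:01:01' -> 'A*02'."""
    return allele.split(":")[0]


def haplotypes_resolved(imputed, sequenced):
    """Apply a hypothetical acceptance rule to one locus of one individual.

    imputed:   the two alleles imputed from the two phased haplotypes
    sequenced: the two allele calls from targeted sequencing
    """
    imp = [first_field(a) for a in imputed]
    seq = [first_field(a) for a in sequenced]
    # Assumed reading: each imputed allele must agree with a sequenced call
    # at the first field.
    if any(a not in seq for a in imp):
        return False
    # Both haplotypes may carry the same allele only if the sequenced calls
    # are themselves homozygous (compared here at the first field).
    if imp[0] == imp[1] and seq[0] != seq[1]:
        return False
    return True


def percent_resolved(resolved, total):
    """Share of haplotypes passing the rule, as a percentage with one decimal."""
    return round(100 * resolved / total, 1)
```

      With the counts reported above, `percent_resolved(698, 722)` gives 96.7, that is, about 96.7% of haplotypes were retained.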

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      We thank the reviewer for highlighting the difficulty in modeling selection at the HLA - a problem that deserves considerable attention. We acknowledge that demographic processes such as the documented Himba population bottleneck can result in elevated IBD sharing (Swinford et al. 2023, PNAS). However, our comparison of HLA IBD sharing rates against a genome-wide baseline is designed to address this: demographic processes affect all regions of the genome, so if the HLA region maintains elevated IBD sharing significantly above the genome-wide threshold, this provides meaningful evidence for a locus-specific effect beyond demographic history alone.
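
      The logic of comparing a region against a genome-wide baseline can be sketched as below. This is a simplified illustration under stated assumptions: per-window IBD sharing rates are taken as given, and the empirical 99th percentile used as the threshold is a hypothetical choice, not the threshold used in the manuscript.

```python
def empirical_threshold(genome_rates, percentile=99):
    """Return the value at the given empirical percentile of genome-wide rates."""
    ordered = sorted(genome_rates)
    # Integer arithmetic avoids floating-point surprises in the index.
    idx = min(len(ordered) - 1, (percentile * len(ordered)) // 100)
    return ordered[idx]


def windows_above_baseline(genome_rates, region_rates, percentile=99):
    """Indices of windows in a region whose IBD rate exceeds the genome-wide threshold."""
    threshold = empirical_threshold(genome_rates, percentile)
    return [i for i, rate in enumerate(region_rates) if rate > threshold]
```

      Because demographic events such as a bottleneck shift the whole genome-wide distribution, they raise the threshold itself, so only windows that stand out from that shifted background are flagged.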

      We agree with the reviewer that the recombination landscape of the HLA region is complex, but this complexity itself is consistent with the region being a frequent target of selection. Previous HLA analyses have found that at the allele level, frequencies are consistent with balancing selection, while multi-locus haplotype frequencies are consistent with purifying selection and positive frequency-dependent selection (Alter et al., 2017), patterns that contribute to the complex recombination rate heterogeneity observed in the region. Recombination rate can be both a cause of extended haplotypes and a consequence of selection against combinations of alleles.

      As Alter et al. note, the high levels of linkage disequilibrium observed among HLA alleles serve to limit the amount of diversity within HLA haplotypes, but balancing selection at the allelic level maintains multiple HLA haplotypes at high frequency across populations over long periods of time — so-called "conserved extended haplotypes" as we observe (Supplementary Figures 1 and 9). Regarding the specific selective mechanism, our results are not equally consistent with all forms of balancing selection. Albrechtsen et al. (2010) explicitly modeled overdominant balancing selection and demonstrated that equilibrium overdominance does not produce elevated IBD sharing as we observe — our results are therefore inconsistent with this mechanism. Instead, Albrechtsen et al. conclude that allele frequency change is required to generate elevated IBD, consistent with bouts of directional selection such as negative frequency-dependent or fluctuating positive selection. We will make explicit that while our findings do not support overdominance, they are consistent with these temporally dynamic forms of selection driving periodic allele frequency change at the HLA locus. We will also incorporate local recombination rate into Figure 4 to provide a comparison of local recombination rate across chromosome 6 with the observed areas of elevated IBD sharing.

      Alter, I., Gragert, L., Fingerson, S., Maiers, M., & Louzoun, Y. (2017). HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes. PLoS Computational Biology, 13(8), e1005693.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

      We will clarify the presentation of partnership counts and sample sizes throughout the manuscript and improve the scaling and annotation of the flagged figures. Regarding DRB copy number variation, we will add explicit discussion of our analytical choices and their potential limitations. As described in our responses to the main concerns above, we will also provide more nuanced framing of the selective mechanisms consistent with our IBD results, avoiding conclusions that go beyond what our analyses directly support.

      Reviewer #2 (Public review):

      Summary:

      Evidence for the influence of MHC on mate choice in humans is challenging to obtain, as social structures and norms often confound the power of studying populations. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine if MHC diversity is increased when partners are freely chosen. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding is that there is no heterozygous dissimilarity difference between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least it may mask other selection forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that between populations and species, there are different pressures that have driven not only MHC evolution but also mate choice.

      We will improve the framing of our project within the broader non-human MHC mate choice literature in our discussion.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

      We would like to clarify that we did assess the unique pathogen peptides bound across all HLA class I and class II genes by each population's common haplotypes (Figures S12–S13). We acknowledge the reviewer's point that non-pathogenic peptides are also important — for example, binding with self-produced proteins. However, binding with self-produced proteins is more relevant to autoimmune risk, and the selective pressures involved are outside the scope of our current work, which focuses on pathogen-induced fluctuating directional selection and heterozygote advantage. Furthermore, selection on non-pathogenic peptide binding repertoires likely operates in the opposite direction to pathogen repertoire; whereas broader pathogen peptide binding is advantageous, broader self-peptide binding risks excessive immune activation.

      Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      We thank the reviewer for this important clarification. Our claim was intended to be more specific: to our knowledge, this is the first study to investigate HLA-based mate preferences in a non-European small-scale society while explicitly controlling for genome-wide relatedness. Hedrick and Black (1997) did not include genome-wide relatedness controls, which is a critical distinction given that ancestry-assortative mating can produce spurious patterns of HLA similarity or dissimilarity in the absence of such correction. We will make this qualification explicit in the revised manuscript.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, l. 78), see Croy et al. (2020), which used the largest sample to date in this research area.

      We thank the reviewer for this reference. In our revision, we will incorporate Croy et al. (2020) into our discussion and use it as a reference for comparing the Himba’s probability of highly homozygous offspring given population allele frequencies. This comparison will help support our claim that background HLA diversity in the Himba is sufficiently high so that any unrelated partner is already likely to yield adequately dissimilar offspring—a scenario that would reduce the selective benefit of active HLA-based mate choice and could mask any such preference even if it exists.

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      The reviewer is correct that individuals appear multiple times in the dataset: some individuals are members of multiple known partnerships, and all individuals are additionally included many times across the full set of possible random heterosexual pairings that meet our age and relatedness criteria. This non-independence is explicitly addressed in our dyadic linear mixed models by including female ID and male ID as random effects, which account for each individual's unique contribution to their similarity scores across all pairings, both real and random. We explain this explicitly in the (n) Statistical Models section of the Methods.

      Regarding discovered partnerships: we grouped these with reported informal partnerships in the current analyses due to modest sample sizes. We agree this is worth examining more carefully and will test, in our revision, whether treating discovered partnerships as a separate category, or excluding them entirely, meaningfully affects our results. We will report these analyses as a sensitivity check.

      (4) How many pairs were due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      This information is reported in the (n) Statistical Models section of the Methods. No pairs were found to be closer than 3rd degree relatives. No arranged marriages were related at 3rd degree or closer; 1 love match marriage and 2 informal partnerships discovered through pedigree analysis were found to be 3rd degree relatives.

      Regarding the difference in relatedness thresholds: we used a 4th degree cutoff to define the unrelated set of individuals for allele and haplotype frequency analyses (n=102), as even 3rd degree relatives would inflate allele frequency estimates. In contrast, we permitted 3rd degree relatives in the background distribution for the partnership analyses to reflect the stated cultural preference for cousin marriages in arranged unions—excluding them would have made the background distribution less representative of the actual mating pool. We explain both decisions in Methods sections (d) and (n).

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      While HIV prevalence is indeed high in Namibia generally, the Himba are a relatively isolated population and, based on personal communication with Dr. Ashley Hazel—who has extensive field experience studying sexually transmitted infections in the Himba (see references 36, 52, 53, and 54)—there is no evidence of HIV transmission within this population. Dr. Hazel's expertise on this question was the basis for our exclusion of HIV from the pathogen list.

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      We will clarify this in our revision. We restricted random couples to have an age gap within the range observed in actual, known partnerships (the woman is at most 16 years older than the man and at most 53 years younger than the man). We included this criterion to ensure that random couples represented the best approximation of realistic background partners. Our age gap criterion was quite permissive owing to the large range observed in our actual pairs, and we do not expect that it significantly affected our results.
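
The age-gap restriction described in this response can be illustrated with a minimal sketch. Only the two bounds (woman at most 16 years older, at most 53 years younger) come from the response; the function name, the `(id, age)` tuple layout, and the example ages are hypothetical, and the relatedness filter that the authors also apply is omitted here.

```python
from itertools import product

# Bounds taken from the author response; everything else is illustrative.
MAX_WOMAN_OLDER = 16    # woman at most 16 years older than the man
MAX_WOMAN_YOUNGER = 53  # woman at most 53 years younger than the man


def eligible_pairs(women, men):
    """Return all (woman_id, man_id) pairs whose age gap is within bounds.

    `women` and `men` are lists of (id, age) tuples (hypothetical layout).
    """
    pairs = []
    for (w_id, w_age), (m_id, m_age) in product(women, men):
        gap = w_age - m_age  # positive when the woman is older
        if -MAX_WOMAN_YOUNGER <= gap <= MAX_WOMAN_OLDER:
            pairs.append((w_id, m_id))
    return pairs


# Made-up ages, chosen so that some pairs fall outside each bound.
women = [("F1", 25), ("F2", 60)]
men = [("M1", 30), ("M2", 80), ("M3", 40)]
print(eligible_pairs(women, men))
```

In this toy input, the pair F1/M2 is dropped because the woman would be 55 years younger, and F2/M1 and F2/M3 are dropped because the woman would be more than 16 years older.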

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      We would like to clarify that for each analysis we explicitly report both the effects of chosen and arranged partnerships relative to the background distribution intercept, and the pairwise contrast between chosen and arranged partnerships. The intercept of each model is derived from the full background distribution of random opposite-sex pairings meeting our age and relatedness criteria, providing a null expectation under random mating. A non-significant effect for both partnership types therefore indicates that neither arranged nor chosen partnerships differ from random mating with respect to the metric in question. We describe this explicitly in the Statistical Models section of the Methods, but we will ensure this interpretation is stated more prominently in the Results section of the revised manuscript to avoid any confusion.

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      We can incorporate separate HLA similarity/log odds of homozygous offspring analyses for class I and class II in our revision.

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      We will expand our discussion in the revision to provide a more detailed comparison with previous studies, including Croy et al. (2020), and will add an explicit limitations section incorporating suggestions from multiple reviewers on more careful framing of optimality and specific selective mechanisms. Regarding sample size, we acknowledge this as a genuine limitation given the extensive polymorphism of the MHC region. However, the unrelated sample used to estimate allelic diversity is comparable in size to those of previous studies in African populations (Figure 1), and our dataset is uniquely comprehensive in combining HLA class I, class II, genome-wide SNP data, and partnership data within the same individuals, a combination that enables the genome-wide relatedness correction that distinguishes our study from much of the prior literature.

      References

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    1. eLife Assessment

      This study presents a valuable finding on the condition dependence of autophagy-mediated lifespan regulation in C. elegans. The evidence is solid, as the data broadly support the main claims, although variability between biological replicates and limited mechanistic exploration leave some conclusions less firmly established. The work will be of interest to researchers studying autophagy, ageing, and intracellular trafficking.

    2. Reviewer #1 (Public review):

      Summary:

      Hsiung et al. investigated whether the effects of autophagy gene knockdown on the lifespan of long-lived C. elegans mutants depend on experimental conditions. The authors first compiled published data on autophagy-dependent lifespan regulation in daf-2 and wild-type backgrounds, highlighting that prior results are notably inconsistent and likely context-dependent. They then systematically tested the lifespan effects of RNAi knockdown of six autophagy genes (atg-2, atg-4.1, atg-9, atg-13, atg-18, and bec-1) in wild-type (N2), daf-2 (reduced insulin/IGF-1 signalling), and glp-1 (germlineless) animals, while varying temperature, daf-2 allele, FUDR concentration, and bacterial infection status.

      The key findings are as follows. In wild-type animals, lifespan suppression by most autophagy gene knockdowns was more pronounced at 20°C than at 25°C, where little or no effect was observed. In daf-2 mutants, stronger lifespan suppression was seen in the weaker daf-2(e1368) allele at 20°C, but not in the stronger daf-2(e1370) allele, and effects were largely absent at 25°C. In glp-1 mutants, four of six gene knockdowns suppressed lifespan to a greater extent than in N2, though again in a temperature-dependent manner. FUDR at a high concentration (800 µM) abolished the life-shortening effects of most knockdowns and, in the case of atg-9 and atg-13, led to lifespan extension. Kanamycin treatment to eliminate bacterial proliferation did not fully account for the lifespan effects, suggesting that increased susceptibility to infection is not the primary mechanism. The authors also tested the programmed aging hypothesis that autophagy promotes lifespan reduction through biomass repurposing, but found no changes in vitellogenin levels upon knockdown of any of the six genes.

      Altogether, among all genes tested, atg-18 knockdown produced the strongest and most consistent lifespan suppression across nearly all conditions, including both daf-2 and glp-1 backgrounds. The authors probed whether atg-18 acts through the FOXO transcription factor DAF-16 by examining dauer formation and ftn-1 expression, but found no evidence for this, suggesting a DAF-16-independent mechanism.

      Strengths:

      The primary strength of this work lies in its systematic and comprehensive approach to dissecting how experimental variables influence the outcome of autophagy-lifespan epistasis tests. The compilation of prior data alongside the authors' own multi-condition dataset is a genuinely useful resource for the field. The study raises a timely and important point about condition selection bias, which is relevant not only to autophagy research but to C. elegans aging studies more broadly. The finding that atg-18 behaves distinctly from other autophagy genes across all conditions is noteworthy and opens avenues for future mechanistic work.

      Weaknesses:

      Despite its breadth, the study has several weaknesses that limit the strength of some conclusions.

      (1) Variability in control lifespan data. The N2 lifespan values under ostensibly identical conditions (e.g., GFP RNAi at 20°C) differ substantially across experiments (compare Tables S2, S5, S6, S7, and S9). Since N2 serves as the baseline for calculating whether the effect is greater in long-lived mutants via Cox proportional hazard (CPH) analysis, this variability in controls directly affects the reliability of those comparisons.

      (2) Limited biological replication. Most experiments were performed with only two biological replicates. In several cases, the two replicates yield contradictory outcomes: one showing significant lifespan suppression and the other showing no effect or even extension. The authors combine these into cumulative datasets for analysis, which, while not incorrect in principle, may obscure genuine irreproducibility. Given that the central message of the paper concerns variability and condition dependence, additional replication would have substantially strengthened confidence in the reported results.

      (3) Low sample sizes in individual trials. A number of lifespan assays were conducted with only 40-50 worms per replicate, and in some cases, as few as 30. Such sample sizes are below the standard commonly used in the C. elegans aging field and are likely to contribute to the variability observed.

      (4) RNAi efficacy measured only in N2 at 20°C. The authors demonstrated that atg-2 and atg-4.1 RNAi did not significantly reduce target mRNA levels, which may explain their weaker lifespan effects. However, these same RNAi treatments significantly affected lifespan in several other conditions (e.g., daf-2(e1368) at 20°C, glp-1 at 20°C and 25°C, and N2 with 15 µM FUDR). Measuring RNAi efficacy across different genetic backgrounds and conditions would be needed to properly interpret these variable results.

      (5) Incomplete mechanistic exploration. The investigation of why atg-18 knockdown has uniquely strong effects was limited to DAF-16. Given published evidence that atg-18 may regulate HLH-30/TFEB, a master transcriptional regulator of autophagy and lysosomal biogenesis, testing whether atg-18 specifically affects HLH-30 nuclear localisation or activity could have provided valuable mechanistic insight and would distinguish atg-18 from the other genes tested.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how genes involved in cellular recycling (autophagy) influence lifespan under different experimental conditions. The findings help clarify why previous studies have reported conflicting results about whether blocking autophagy shortens or extends lifespan. The work will be of interest to researchers studying aging and cellular stress responses, particularly those using model organisms.

      Strengths:

      The findings are valuable, as they help resolve inconsistencies within a specific subfield of aging research. The evidence presented is solid, as the data broadly support the primary claims of the study. In addition, the discussion is thorough and thoughtfully integrates the findings within the broader context of the field.

      Weaknesses:

      Additional functional validation would further strengthen the conclusions.

    1. eLife Assessment

      This study establishes a methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. It has been difficult to study social interactions using artificial stimuli rather than genuine interactions between unrestrained animals. This study makes a fundamental contribution to social neuroscience research in a laboratory setting. The results convincingly show that the study of unrestrained social interactions is possible with detailed quantification of position and gaze. The methodology presented here is relevant to research in social neuroscience, neuroethology, and primatology.

    2. Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Comments on revisions:

      I thank the authors for their careful revisions of the manuscript. It has addressed all of my comments.

      One final suggestion would be to add a scale bar in Supplemental Figure 2A so that the size of the video/image stimuli is clear (in cm on the monitor) and also to report the range of distances (in cm) between the marmoset and the monitor while viewing these stimuli. This will enable calculation of the rough accuracy in visual degrees.
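
The conversion the reviewer has in mind can be sketched with the standard visual-angle formula, θ = 2·atan(s / 2d), where s is the stimulus size on the monitor and d is the viewing distance. The function name and the example sizes and distances below are hypothetical; the manuscript's actual values are not given here.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of `size_cm`
    viewed head-on from `distance_cm`."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Hypothetical example: a 10 cm wide stimulus viewed from three distances.
for d in (20.0, 40.0, 80.0):
    print(f"{d:5.1f} cm -> {visual_angle_deg(10.0, d):.2f} deg")
```

Reporting a range of viewing distances therefore gives a range of angular sizes, against which the spread of gaze points can be compared.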

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze, and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic about how the authors approach the cone of uncertainty in gaze, which can be both due to some error in head orientation measurements as well as variable eye position.

      Weaknesses:

      While there remains some degree of uncertainty in the precise accuracy of the gaze measure, the authors have done an excellent job accounting for these as well as they can, and appropriately acknowledge the limitations of their approach.

      Comments on revisions:

      I have no further recommendations. The authors addressed my previous suggestions or acknowledged them as topics for future investigation. This is excellent work.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Weaknesses:

      One weakness that should be easily addressed is that no data is provided to directly assess how accurate the estimated head gaze is based on calibrations of the animals, for example, when they are looking at discrete locations like faces or video on a monitor. This would be useful to get an upper bound on how accurate the 3D gaze vector is estimated to be, for planned use in other studies. Although the accuracy appears sufficient for the current results, it would be difficult to know if it could be applied in other contexts where more precision might be necessary.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic about how the authors approach the cone of uncertainty in gaze, which can be both due to some error in head orientation measurements as well as variable eye position.

      Weaknesses:

      There are a few technical points in need of clarification, both in terms of the robustness of the gaze estimate, and possible confounds by gaze to non-face targets which may have relevance but are not discussed. These are relatively minor, and more suggestions than anything else.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) It appears that the accuracy of the estimated gaze angle must be well under the size of the gaze cone (+/- 10 degrees), but I can't find any direct estimate of the accuracy even if it is just a ballpark figure. On Lines 219-233 is where performance is described for viewing images and video on a monitor, where it should be possible to reconstruct the point of gaze on the monitor while images and video are shown, in order to evaluate the accuracy of the system for where the marmoset is looking? Would you see eye position traces that would show fixation clusters around those images or videos with stationary points on the monitor much like that seen for head-fixed animals looking at faces on a screen (Mitchell et al, 2014)? If so, what is the typical spread of those clusters during fixations on an image, both in terms of the precision by RMS error during a fixation epoch and the spread around the images at different locations (accuracy of projection)? For example, if gaze clusters were always above the displayed images one would have an idea that the face plane is slightly offset above the true gaze direction. It is not completely clear how well the face plane and corresponding gaze cone do in describing gaze direction in space, but the monitor stimuli could be used as an initial validation of it.

      We thank the reviewer for this important suggestion regarding the quantitative validation of gaze accuracy. We agree that, when animals view stimuli presented on a monitor, the estimated gaze direction can be evaluated by examining the spatial distribution of gaze–monitor intersection points relative to stimulus locations.

      To address this, we generated a new figure (Fig. S2A) analyzing gaze behavior following the onset of video stimuli presented at different locations on the monitor. Specifically, we selected video clips in which human annotators verified that the marmosets were looking at the monitor. Consistent with prior work in head-fixed marmosets (Mitchell et al., 2014), we observe clustering of gaze–monitor intersection centers within and around the corresponding stimulus locations after stimulus onset. These clusters provide an empirical validation that the estimated gaze direction aligns with stimulus position in space.

      Importantly, unlike the head-fixed preparation used in Mitchell et al. (2014), marmosets in our study were freely moving. As a result, they do not exhibit prolonged, stationary fixations on the monitor, and fixation clusters are therefore more diffuse. This increased spread reflects natural head and body motion rather than limitations of the gaze estimation method itself. Despite this, gaze intersection points remain spatially localized to the vicinity of the presented stimuli across different monitor locations.

      We did observe small offsets in some gaze clusters relative to stimulus centers; however, these offsets were not systematic across stimulus locations or animals. Crucially, there was no consistent bias (e.g., clusters appearing uniformly above or below stimuli) that would indicate a systematic misalignment of the face plane or gaze cone relative to true gaze direction. Together, these observations support the conclusion that the face-plane-based gaze cone provides an accurate estimate of gaze direction in space, with precision well within the ±10° aperture of the gaze cone.

      While the freely moving component of the behavior precludes direct estimation of fixation RMS error comparable to head-fixed paradigms, the observed stimulus-locked clustering serves as an initial validation of both the accuracy and practical utility of our approach under naturalistic conditions.
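
The gaze–monitor intersection points discussed in this response can be sketched as a ray–plane intersection. This is a generic geometric illustration, not the authors' code: the function names, the coordinate frame, and the example numbers are all hypothetical, and the gaze direction is assumed to be given as a 3D vector from the head position.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def gaze_plane_intersection(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Intersect a gaze ray with the monitor plane.

    Returns the 3D intersection point, or None when the gaze is parallel
    to the plane or the plane lies behind the animal.
    """
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # gaze runs parallel to the monitor plane
    offset = [p - o for p, o in zip(plane_point, origin)]
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # monitor is behind the gaze origin
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical setup: monitor plane at z = 50 cm, facing the animal.
head = (0.0, 0.0, 0.0)
gaze = (0.1, 0.0, 1.0)
print(gaze_plane_intersection(head, gaze, (0.0, 0.0, 50.0), (0.0, 0.0, -1.0)))
```

Comparing such intersection points with the known stimulus location on the monitor gives the offset and spread that the reviewer asks about.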

      (2) A second major comment is about clarity in the writing of the results and discussion. At the end of the manuscript, a major takeaway is the difference between familiar and unfamiliar dyads, that males show more interest in viewing females including unfamiliar females, but for familiar females, this distinction is also associated with being likely to look at them if they look at the male, and then to engage in joint gaze with them after looking at them, which indicates more of a social interaction than simply monitoring them when they are unfamiliar. Those aspects of the results could be emphasized more in the topic sentences of paragraphs presenting data to support those features of the gaze data (at present is buried at the ends of results paragraphs and back in the discussion).

      We thank the reviewer for this insightful suggestion. We have restructured the Results and Discussion sections to lead with the primary social takeaways rather than technical descriptions (Tracked changes in Word). Specifically, we now emphasize the distinction between "social monitoring" (characteristic of unfamiliar dyads) and "active social coordination" (characteristic of familiar dyads).

      (1) Topic Sentences: We revised the topic sentences of all Results paragraphs to immediately highlight the findings regarding male interest and the influence of familiarity on reciprocation.

      (2) Conceptual Framework: We added a conceptual distinction in the Discussion, explaining that while unfamiliar marmosets maintain high social attention through "peripheral monitoring" and proximity-dependent joint gaze, familiar pairs exhibit sophisticated, distance-independent coordination and gaze reciprocation.

      (3) Clarification of Male Interest: We explicitly stated that while male interest in females is high regardless of familiarity, it manifests as persistent monitoring in unfamiliar pairs versus a more aware, reciprocal state in familiar pairs.

      Minor comments:

      (1) Methods:

      a) Lines 522-539: Are the 200 continuous frames used for validation of the model containing two marmosets sufficient to test how well it generalizes to other animals outside the training set? Does the reported RMSE vary for animals inside vs outside the training set? To what extent does the RMSE, in image pixels, translate into accuracy in estimating the gaze direction, for example, as assessed by estimating error when marmosets look at images or video on the monitor?

      To address the reviewer’s concern regarding generalization and the translation of pixel RMSE to angular accuracy, we emphasize that the six facial features selected are prominent, high-contrast features across the species. Consequently, we observed that the RMSE remained consistent for marmosets both inside and outside the training set. To quantify how pixel-level tracking error translates into gaze estimation accuracy, we performed a sensitivity analysis. We simulated landmark (i.e., feature) jitter by sampling perturbations from circular distributions based on our empirical data (2.4 pixels for eyes; 2.1 pixels for the central blaze). Our results, illustrated in Author response image 1, show that 90% of the resulting head gaze deviations fall within 10°, which is consistent with the angular threshold used for our gaze cone model. This confirms that the reported RMSE provides sufficient precision for reliable gaze estimation.

      Author response image 1.

      Probability distribution of gaze angular deviation under circular perturbation. The histogram (blue) represents the change in reconstructed gaze angle (degrees) following stochastic perturbation of facial features. To simulate real-world variance, noise was sampled from circular distributions with radii of 2.4 pixels (eyes) and 2.1 pixels (central blaze). The red curve represents an exponential fit to the empirical data (y = ae^(bx), a = 0.9591, b = 0.1813). Approximately 90% of the reconstructed gaze deviations remain below 10°, indicating the model’s localised stability under pixel-level coordinate jitter.
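
      The sensitivity analysis described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes that the "circular distribution" means uniform sampling within a disc, that jitter acts only in the image plane, and it uses hypothetical landmark coordinates expressed in pixel units. The function names are illustrative.

```python
import numpy as np

def sample_in_disc(rng, radius, n):
    """Draw n 2D offsets uniformly from a disc of the given radius (pixels)."""
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))  # sqrt gives uniform area density
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

def unit_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points."""
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)

def gaze_deviation_deg(landmarks, radii, n_trials=1000, seed=0):
    """Angular change (degrees) of the plane normal under in-plane landmark jitter."""
    rng = np.random.default_rng(seed)
    reference = unit_normal(*landmarks)
    offsets = [sample_in_disc(rng, r, n_trials) for r in radii]
    deviations = np.empty(n_trials)
    for i in range(n_trials):
        # Assumption: jitter perturbs x and y only; depth is left unchanged.
        jittered = [p + np.array([o[i, 0], o[i, 1], 0.0])
                    for p, o in zip(landmarks, offsets)]
        cos_angle = np.clip(np.dot(reference, unit_normal(*jittered)), -1.0, 1.0)
        deviations[i] = np.degrees(np.arccos(cos_angle))
    return deviations

# Hypothetical landmark positions (pixel units): left eye, right eye, central blaze.
landmarks = [np.array([-20.0, 0.0, 0.0]),
             np.array([20.0, 0.0, 0.0]),
             np.array([0.0, 15.0, 5.0])]
deviations = gaze_deviation_deg(landmarks, radii=[2.4, 2.4, 2.1])
fraction_within_10_deg = float(np.mean(deviations < 10.0))
```

      The fraction obtained depends entirely on the assumed landmark geometry, so it should not be compared directly with the 90% figure reported in the response.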

      b) Line 542-43: Is there any difference between a rigid model fit to the six facial points, versus using the plane defined by the two eyes and central blaze in terms of direction accuracy (in the ground truth validation)? How does the "semi-rigid" set of six points (mentioned also in lines 201-203) constrain the fit of the three points (two eyes and central blaze) that define the normal plane for the gaze cone?

      We thank the reviewer for the opportunity to clarify our geometric model. The plane used to define the gaze cone's origin was indeed determined by the two eyes and the central blaze. However, a plane defined by only three points was insufficient to determine a unique gaze direction, as the normal vector was ambiguous (it could point forward through the face or backward through the head).

      To resolve this, we utilized the relative positions of the two ear tufts. Because the tufts are anatomically situated behind the eyes and blaze, these additional points provide the necessary spatial context to orient the gaze vector correctly. In our validation, we found that the mouth does not alter the angular accuracy compared to a 3-point fit, supporting that the facial features are correctly identified.
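
      The polarity resolution described here can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, argument order, and the use of centroids to define "forward" are hypothetical.

```python
import numpy as np

def face_normal(left_eye, right_eye, blaze, left_tuft, right_tuft):
    """Unit normal of the eye-eye-blaze plane, oriented away from the ear tufts."""
    left_eye, right_eye, blaze, left_tuft, right_tuft = (
        np.asarray(p, dtype=float)
        for p in (left_eye, right_eye, blaze, left_tuft, right_tuft)
    )
    # Three points define the plane, but the sign of its normal is ambiguous.
    normal = np.cross(right_eye - left_eye, blaze - left_eye)
    normal = normal / np.linalg.norm(normal)
    # Assumption: the tufts lie behind the face, so "forward" points
    # from the tuft midpoint toward the centroid of the facial plane.
    face_center = (left_eye + right_eye + blaze) / 3.0
    tuft_center = (left_tuft + right_tuft) / 2.0
    if np.dot(normal, face_center - tuft_center) < 0:
        normal = -normal
    return normal

# Toy geometry: face at z = 1, tufts at z = 0, so forward is +z.
forward = face_normal([-1, 0, 1], [1, 0, 1], [0, -1, 1], [-2, 0.5, 0], [2, 0.5, 0])
# Swapping the eyes reverses the raw cross product but not the oriented result.
forward_swapped = face_normal([1, 0, 1], [-1, 0, 1], [0, -1, 1], [-2, 0.5, 0], [2, 0.5, 0])
```

      Because the sign is fixed by the tufts rather than by the order of the landmarks, the result does not depend on which eye is labelled left or right.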

      We use the term 'semi-rigid' to describe the six-point constellation because their relative spatial configurations remain stable across individuals and expressions, imposing a biological constraint on the model. This prevents unphysical warping of the face frame during 3D reconstruction and ensures the gaze cone remains anchored to the animal's true midline.

      (2) Results:

      a) Lines 203-205: What is the distinction between gaze orientation (defined by facial plane, 3D vector) and gaze direction (defined by ear tufts) ... is gaze direction in the 2D x-y plane? Why are two measures needed or different? It does not appear gaze orientation is used further in the manuscript and perhaps could be omitted.

      We appreciate the reviewer’s comment regarding the terminology. We have replaced all instances of ‘gaze orientation’ with ‘gaze direction’ to ensure consistency throughout the manuscript.

      To clarify, both terms referred to the same 3D unit vector. The ear tufts were not used to define a separate 2D measure; rather, they served as posterior anatomical anchors to resolve the 3D polarity of the normal vector (ensuring the vector points 'forward' from the face rather than 'backward'). Gaze direction was calculated in 3D space and was not restricted to a 2D x-y plane. We have clarified this in the revised Methods section (Lines 203–205) to avoid further ambiguity.

      b) Line 215-216: why is head-gaze velocity put in normalized units instead of degrees visual angle per second? How was the normalization performed (lines 549-557)? It would be simpler to see velocity as an angular speed (degrees angle per second) rather than a change in norms.

      We thank the reviewer for this suggestion. We agree that the expression is misleading.

      (1) We have replaced "face norm" with "face normal vector" (N) throughout the manuscript to clarify that we are referring to the 3D unit vector perpendicular to the facial plane.

      (2) Lines 224-225 and the corresponding Methods section (Lines 599-609) have been updated to reflect this change in units and terminology.

      We chose to use the change in the face normal vector in normalized units for our primary calculations because it allows for efficient spatiotemporal smoothing and is computationally robust at the very low thresholds required for our stability analysis. However, to address the reviewer's concern regarding interpretability, we have verified that our threshold of 0.05 normalized units corresponds to an angular displacement of 2.87 degrees per frame (frame duration 33 ms). Since we are operating at very small angular changes, the Euclidean distance between unit vectors is a near-linear proxy for the angular displacement in radians.
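
      The conversion quoted above can be checked with a short calculation. For two unit vectors separated by angle θ, the Euclidean (chord) distance is 2·sin(θ/2), so θ = 2·arcsin(d/2). The sketch below compares this exact relation with the small-angle approximation θ ≈ d; the function name is illustrative.

```python
import math

def chord_to_angle_deg(chord):
    """Angle in degrees between two unit vectors whose Euclidean distance is `chord`."""
    return math.degrees(2.0 * math.asin(chord / 2.0))

threshold = 0.05                              # normalized units per frame
exact_deg = chord_to_angle_deg(threshold)     # exact chord-to-angle conversion
approx_deg = math.degrees(threshold)          # small-angle approximation
frame_s = 0.033                               # frame duration stated in the response
deg_per_second = exact_deg / frame_s
```

      At this threshold the exact and approximate values differ by well under 0.001 degrees, which supports the near-linear proxy; dividing by the 33 ms frame duration gives roughly 87 degrees per second.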

      c) Lines 215-216: How do raw gaze traces appear over time ... are there gaze saccades and then stable fixations, or does it vary continuously? A plot of the gaze trace might be useful besides just showing velocity with a threshold, to evaluate to what extent stable fixation vs shifts are distinct.

      Author response image 2.

      Time course of gaze, angular velocity and stability, thresholding. The plot illustrates the temporal dynamics of the face normal vector velocity used to define stable gaze states. The blue trace represents the raw gaze velocity calculated in normalised units. The red dashed line denotes the empirical cutoff threshold of 0.05 units per frame.

      To clarify the temporal dynamics of marmoset head movements, we have provided a representative time course of head gaze velocity as shown in Author response image 2. The data clearly show a "saccade-and-fixate" pattern: large, distinct spikes in velocity (representing rapid head redirections) are separated by periods of relative stability.

      While minor high-frequency fluctuations in the raw trace (blue) may be attributed to facial feature detection noise, they remain significantly below our stability threshold (red dashed line). By applying this threshold, we successfully isolated biologically relevant "stable fixations" from "head saccades," ensuring that our subsequent social gaze analysis is based on periods of intentional head gaze direction.

      d) Lines 237-286: The writing in this section does not emphasize the main results. There seem to be three takeaway points that could be emphasized better in the topic sentences of each of the paragraphs: i) Marmosets tended to spend most of their time on either end of the elongated box, not in the middle, ii) Males spent more time near the front of the box near the other animal than females, iii) Familiar pairs spent more time closer to each other.

      To address this comment, we have reorganized this section to lead with the three key behavioral findings:

      (1) We now state clearly in the topic sentence that marmosets preferred the ends of the arena over the middle.

      (2) We have highlighted the finding that males spend significantly more time near the inner edge (closer to the partner) than females, irrespective of familiarity.

      (3) We emphasized that familiar pairs maintain closer and more dynamic social distances over time, whereas unfamiliar pairs tend to move further apart as a session progresses.

      e) Line 303: It would be useful to see time traces of head velocity of each member of the pair and categorization over time of the gaze event types. A stable epoch must be brief on the order of 100-200ms. It is unclear how distinct the stable fixation epochs are from the moments when the gaze is shifting. Also, the state transition analysis treats each stable epoch like one event, and then following a gaze movement by either of the pair, the state is defined again, is that correct?

      We defined stable epochs as continuous periods where the face normal vector velocity remained below 0.05 normalized units for both animals. This ensures that a "gaze state" is only categorized when both marmosets have relatively fixed head orientations. As shown in the provided time traces (Author response image 2), the velocity profile is characterized by sharp peaks (head saccades) and clearly defined troughs (fixations). Further, we generated a probability histogram of stable head-gaze epoch durations (Author response image 3). The median duration of these stable epochs is 200 ms, which aligns with biological expectations for fixation durations in primates and confirms that these states are distinct from the high-velocity shifts.
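
      The epoch definition above can be sketched as a simple run-length segmentation. This is a minimal illustration, assuming per-frame velocity arrays for the two animals, the 33 ms frame duration mentioned earlier in the response, and the 100 ms minimum duration given in the caption of Author response image 3; the function name and return format are hypothetical.

```python
import numpy as np

def stable_epochs(vel_a, vel_b, threshold=0.05, frame_ms=33.0, min_duration_ms=100.0):
    """Return (start, end) frame indices, end exclusive, of epochs in which both
    animals stay below the velocity threshold for at least min_duration_ms."""
    stable = (np.asarray(vel_a) < threshold) & (np.asarray(vel_b) < threshold)
    # Pad with False so every run has a detectable rising and falling edge.
    padded = np.concatenate([[False], stable, [False]])
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends)
            if (e - s) * frame_ms >= min_duration_ms]

# Toy traces: animal A makes head movements at frames 0, 5 and 8; animal B is still.
vel_a = [0.2, 0.01, 0.01, 0.01, 0.01, 0.3, 0.01, 0.01, 0.2]
vel_b = [0.01] * 9
epochs = stable_epochs(vel_a, vel_b)
```

      In this toy example the four-frame run (132 ms) is kept, while the two-frame run (66 ms) falls below the minimum duration and is discarded.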

      The reviewer’s interpretation is correct. Our Markov chain model treats each stable epoch as a single event. A transition occurs when at least one animal moves (exceeding the velocity threshold), resulting in a new stable epoch where the relative gaze state is re-evaluated. This approach allows us to model the sequence of social interactions as a series of discrete behavioral decisions.
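
      As a rough illustration of how such a sequence of epochs yields transition probabilities, the sketch below counts transitions between consecutive states and normalizes each row. The state labels are hypothetical placeholders, and this first-order estimate is an assumption about the general technique rather than a description of the authors' exact procedure.

```python
import numpy as np

def transition_matrix(states, labels):
    """Row-normalized first-order transition probabilities between gaze states."""
    index = {label: i for i, label in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for current, following in zip(states[:-1], states[1:]):
        counts[index[current], index[following]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical labels: one state per stable epoch.
labels = ["male_to_female", "female_to_male", "joint"]
sequence = ["male_to_female", "joint", "male_to_female", "male_to_female", "female_to_male"]
P = transition_matrix(sequence, labels)
```

      Each entry P[i, j] is the fraction of epochs in state i that were followed by state j, which is the kind of quantity reported as a transition probability in the response.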

      Author response image 3.

      Temporal characteristics of stable gaze, head gaze, epochs. The histogram illustrates the probability distribution of the duration (ms) of stable gaze behaviour epochs. A minimum duration threshold of 100 ms was applied to exclude transient, non-purposeful head gazes.

      f) Lines 316-326: Some general summarizing statements to lead this paragraph would be useful. It seems that familiar pairs are more likely to participate in joint gaze, especially when close to each other, and perhaps, that males tended to gaze at females more than the reverse. Is there any notion that males were following the gaze of females?

      We thank the reviewer for these suggestions. We have revised the topic sentences of this section to lead with a summary of the social takeaways, specifically highlighting the higher level of male interest and the shift toward reciprocal coordination in familiar pairs.

      The reviewer correctly identified an important dynamic. Our transition analysis (Fig. 4D) confirms that males in both familiar and unfamiliar dyads frequently follow the female's gaze. This is evidenced by a robust transition probability (~17%) from "Male-to-Female Partner Gaze" (blue node) to "Joint Gaze" (green node). We found that this gaze-following behavior was a general feature of the dyads and did not differ significantly by familiarity, which is why it was not previously emphasized. However, we have now added a statement to the Results (Lines 358-365) to explicitly describe this male-led gaze-following behavior.

      g) Lines 328-337: Can these findings in this paragraph be summarized more generally? It seems males view unfamiliar females longer, whereas for familiar females they are more likely to reciprocate viewing if being viewed by them and then to join in joint gaze with them. Would that event, viewing a female and then a transition to joint gaze, not be categorized as a gaze-following event?

      We have now summarized the paragraph to emphasize the transition from vigilant monitoring in unfamiliar pairs to reciprocal awareness in familiar pairs.

      Regarding "longer" viewing: We have clarified the text to specify that males' interest in unfamiliar females is persistent and robust rather than simply "longer" in a single duration. The high recurrence probability signifies that males consistently re-orient their gaze back to the unfamiliar female even if the interaction is briefly interrupted by movement.

      Regarding gaze following and joint gaze: The reviewer asks if the transition from viewing a female to joint gaze constitutes gaze following. We agree that a transition from "male-to-female gaze" to "joint gaze" is indeed a gaze-following event (as noted in our previous response regarding Fig. 4D). However, the specific transition discussed in this paragraph (female-to-male gaze to male-to-female gaze) is different: it describes a "reciprocal" event where the male responded to being looked at by looking back at the female, while the female simultaneously shifted her gaze away. Since the two gaze cones did not intersect on an external object or on each other's faces simultaneously at the end of this transition, it was not categorized as joint gaze or gaze following.

      h) Lines 339-351: It is not clear why gazing at the region surrounding a female's face (as opposed to the face itself) reflects "gaze monitoring tied to increased social attention" (Dal Monte et al., 2022). This hypothesis could be expanded to make the prediction clear in this paragraph.

      We thank the reviewer for identifying the need to clarify the hypothesis regarding the region surrounding the face. We have expanded this paragraph to explain why gazing at the peripheral facial region reflects social monitoring.

      In many primate species, direct and sustained eye contact can often be interpreted as a threat or a challenge, particularly between unfamiliar individuals. Peripheral monitoring (looking at the area immediately surrounding the face) allows an animal to stay highly attentive to the partner's head orientation, gaze direction, and facial expressions (all critical for anticipating future actions) while minimizing the risk of social conflict. By demonstrating that unfamiliar marmosets utilize this peripheral strategy significantly more than familiar ones, we provide evidence that social attention in novel dyads is characterized by a social monitoring strategy that balances the need for information with social caution.

      i) Lines 354-373: This section seems to suggest again that in a familiar male/female pair, the male is more likely to follow the female gaze and establish a joint gaze, and this occurs less with the unfamiliar pair only when closer in distance. Some summary sentences to begin the paragraph could help frame what to expect from the results.

      We have added summarizing topic sentences to this section to clarify the relationship between familiarity and the spatial distribution of joint gaze.

      (3) Discussion:

      Lines 380-463: This section reads more clearly than most of the results, where it is often hard to connect the data plots to their significance for behavior. Overall, I believe the manuscript could be improved by setting up a hypothesis before presenting results in the paragraphs demonstrating the data. Some of the main findings appear in text from lines 413-419 (somewhat hidden even in discussion).

      We sincerely appreciate the reviewer’s positive feedback on the clarity of the latter sections of our Discussion. We have taken the suggestion to heart and have performed a comprehensive restructuring of the Results and Discussion sections.

      (1) We have moved the key takeaways, specifically the distinction between vigilant monitoring in unfamiliar pairs and reciprocal coordination in familiar pairs, from the end of the Discussion to the topic sentences of the relevant Results paragraphs.

      (2) We established a unified framework throughout the manuscript that connects pixel-level tracking stability to the biological "saccade-and-fixate" movement pattern, and ultimately to the social dimensions of sex and familiarity.

      (4) A couple of additional questions to address in the discussion:

      a) Can you speculate why in this behavioral context the marmosets do not engage in reciprocal gaze where both are simultaneously looking at each other (lines 297-301)? How low is the incidence of this event, numerically, in comparison to the other events (1 in 1000 events, etc)?

      We appreciate the reviewer’s interest in the lack of reciprocal gaze (mutual eye contact).

      Numerically, reciprocal gaze events occurred with a frequency of approximately 1 in 500 social gaze events (comprising less than 0.2% of our social dataset). Given this extreme scarcity, we felt that any statistical comparisons across sex or familiarity would be underpowered and potentially misleading, leading to our decision to focus on partner and joint gaze states.

      We speculate that the rarity of reciprocal gaze is primarily due to our task-free experimental setup. Unlike directed cooperation tasks where animals must look at each other to coordinate actions for a reward (e.g., Miss & Burkart, 2018), our study focused on task-free interactions. In a free-moving context without a common goal, marmosets may prioritize monitoring the environment or the partner’s actions (joint or partner gaze) over direct, sustained mutual eye contact, which can sometimes be perceived as a confrontational or high-arousal signal in primate social hierarchies.

      b) Does a transition from a marmoset viewing their partner, to a joint gaze, count as a gaze-following event? It appears the authors are reluctant to use that terminology. What are the potential concerns in that terminology? Is there a concern that both animals orient to the same object that is salient to them without it being due to their gaze?

      A transition from a partner-directed gaze to a joint gaze is indeed a gaze-following event. We distinguish these events from a transition between partner-directed gazes (e.g., male-to-female to female-to-male). In these "reciprocation" cases, once the second animal looked at the first, the first animal shifted their gaze away. Because the two gaze cones did not intersect on a common object at the end of the transition, we classified such events as a social exchange of attention rather than a coordinated gaze-following event.

      Reviewer #2 (Recommendations for the authors):

      I do have a few questions/points for clarification:

      (1) While your approach appears to be able to track head orientation when the face is occluded or turned away from the primary cameras, how was the accuracy of this validated? Since you have multiple cameras, it should be possible to make the estimate using the occluded cameras and then validate using the non-occluded ones.

      We appreciate the reviewer's comment regarding the validation of our tracking during partial occlusions.

      We wish to clarify that our system does not utilize "primary" vs "auxiliary" cameras. Rather, any two or more cameras that capture facial features with high confidence are used to triangulate the points into 3D space. Thus, the "primary" cameras are dynamically determined frame-by-frame based on the animal's orientation.

      To validate the accuracy of our 3D reconstruction during occlusions, we utilized a "projection-validation" approach. As demonstrated in Figure 2B (left panel), when the face is turned away from a specific camera, leaving only the back of the head visible, we used the facial features triangulated from the other non-occluded cameras and projected them onto the image plane of the occluded camera. The fact that these projected points aligned precisely with the expected (but hidden) anatomical landmarks confirms the global accuracy of our 3D model.

      We previously benchmarked this approach using a three-camera system where we triangulated coordinates via two cameras and successfully projected them onto the third camera's image plane with high accuracy. This ensures that even when a camera is "blind" to the face, the 3D position estimated by the rest of the array remains robust.
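
      The projection step in this validation can be sketched with a standard pinhole camera model. The sketch assumes known intrinsics K and extrinsics (R, t) for the occluded camera and ignores lens distortion; the function names, the error metric, and the numerical values are illustrative, not taken from the manuscript.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project N x 3 world points to pixel coordinates with a pinhole camera."""
    points_3d = np.asarray(points_3d, dtype=float)
    cam = points_3d @ R.T + t      # world -> camera coordinates
    uvw = cam @ K.T                # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_rmse(projected, annotated):
    """Root-mean-square pixel distance between projected and annotated landmarks."""
    diff = np.asarray(projected, dtype=float) - np.asarray(annotated, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Hypothetical camera: 1000 px focal length, principal point (320, 240), at the origin.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
landmarks_3d = [[0.0, 0.0, 2.0], [0.1, 0.0, 2.0]]   # triangulated from other cameras
pixels = project_points(landmarks_3d, K, R, t)
error = reprojection_rmse(pixels, [[320.0, 240.0], [370.0, 240.0]])
```

      A small reprojection error on the held-out camera indicates that the 3D positions recovered from the remaining cameras are consistent with that camera's view.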

      (2) Marmosets, like other non-human primates, also look at other body postures for their social communication, though admittedly marmosets are far more likely to look others in the face than larger primates. The tail-raised genital displays come to mind. While the paper primarily focuses on shared vs deviant gaze, and I believe tracks not only the angle of viewing towards the target but also the distance from the face (please clarify if I am wrong), it would also be useful to know how often marmosets are looking at each other beyond just the face. This is particularly interesting if the gaze towards the partner varies depending on whether that partner was generally oriented towards the gazer, or not. For the joint gaze, were there conditions in which the two were looking at the same target, but had body postures that were not oriented toward one another (i.e. looking at a distant target beyond one of the animals, like looking over someone else's shoulder)?

      We thank the reviewer for highlighting the importance of body postures and non-facial social signals (e.g., genital displays) in marmoset communication.

      At the inception of this project, we explored tracking multiple body parts. However, due to the marmoset's dense fur and the lack of distinct skeletal markers under naturalistic lighting, human annotators and early automated tools struggled to achieve the precision required for high-resolution 3D kinematics. While recent advances in whole-body tracking now make these questions approachable, we chose to focus on the face normal vector because it provided the most robust and high-confidence signal for social orientation in our current dataset.

      Regarding the "looking over the shoulder" scenario, we utilized a hierarchical classification system to prevent wrong categorization. Intersection with the partner’s face always took priority. If one animal’s gaze cone contained the other’s face, the state was classified as "Partner Gaze", even if the two gaze cones happened to intersect at a distant point in space. This ensures that "Joint Gaze" specifically captures instances where both animals ignore one another’s face regions to focus on a shared external target.
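
      The priority rule described above can be written as a small decision function. The sketch below is an assumption-laden illustration: the state labels are placeholders, the cone test treats the 10° value mentioned earlier as a half-angle, and the intersection of the two gaze cones is taken as a precomputed boolean.

```python
import numpy as np

def in_cone(origin, direction, target, half_angle_deg=10.0):
    """True if `target` lies within a cone around `direction` starting at `origin`."""
    direction = np.asarray(direction, dtype=float)
    to_target = np.asarray(target, dtype=float) - np.asarray(origin, dtype=float)
    cos_angle = np.dot(direction, to_target) / (
        np.linalg.norm(direction) * np.linalg.norm(to_target))
    return bool(cos_angle >= np.cos(np.radians(half_angle_deg)))

def classify_gaze(a_sees_b_face, b_sees_a_face, cones_intersect):
    """Hierarchical classification: face intersection takes priority over joint gaze."""
    if a_sees_b_face and b_sees_a_face:
        return "reciprocal gaze"
    if a_sees_b_face or b_sees_a_face:
        return "partner gaze"
    if cones_intersect:
        return "joint gaze"
    return "no social gaze"

# "Looking over the shoulder": A's cone contains B's face and the cones also intersect.
a_sees_b = in_cone([0, 0, 0], [0, 0, 1], [0, 0.1, 1])   # about 5.7 degrees off axis
state = classify_gaze(a_sees_b, False, cones_intersect=True)
```

      Because the face checks come first, the over-the-shoulder case resolves to partner gaze even though the cones intersect, matching the priority stated in the response.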

      We agree that the relationship between body posture and head gaze is a fascinating area for future research. In our current setup, while "Joint Gaze" requires the head-gaze cones to intersect, the animals' bodies could indeed be oriented in different directions (e.g., looking at a distant target behind the partner). We have added a note to the Discussion acknowledging that incorporating whole-body gestures would further deepen the understanding of marmoset social ethology.

      (3) In the introduction, (line 70), you raise the question of ecological relevance, using rhesus in laboratory settings. This could use a little more expansion/explanation of the limitations of current/past approaches.

      We thank the reviewer for the suggestion to expand upon the ecological limitations of traditional laboratory paradigms.

      We have substantially revised the Introduction (Lines 70–82) to provide a more detailed critique of past approaches. Specifically, we now highlight how traditional head-fixed or screen-based paradigms decouple eye movements from natural head-body dynamics and lack the reciprocal, multi-agent complexity found in real-world social environments (e.g., Land, 2006; Shepherd, 2010). By contrasting these constraints with the spatially and socially embedded nature of marmoset interactions, we clarify why a more naturalistic, quantitative approach is necessary to understand the true dynamics of social gaze. These additions provide a stronger theoretical foundation for our move toward a free-moving experimental model.

    1. eLife Assessment

      This important work examines the effects of side-wall confinement on chemotaxis of swimming bacteria in a shallow microfluidic channel. The authors present convincing experimental evidence, combined with geometric analysis and numerical simulations of simplified models, showing that chemotaxis is enhanced when the distance between the side walls is comparable to the intrinsic radius of chiral circular swimming near open surfaces. This study should be of interest to scientists specializing in bacteria-surface interactions.

    2. Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

    3. Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

      Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

      Thanks to the referees' input and more work, we think our revised manuscript now meets the high standard of eLife.

      Recommendations for the authors:

      The importance of the circular swimming chirality for the observed phenomenon could be further emphasized by actually using the word "chiral" or "chirality" in the text. Also, indicating what would change if swimming were counterclockwise rather than clockwise would help the reader understand the key significance of chirality.

      We thank the reviewer for this insightful suggestion. We agree that the chirality of the surface interaction is central to the observed phenomenon and should be explicitly highlighted to improve the reader's understanding.

      In response, we have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming. We clarify that in such a case, the hydrodynamic interaction would cause cells to veer left, resulting in up-gradient accumulation along the left sidewall rather than the right. We believe these additions significantly improve the clarity of the underlying physical mechanism.

      Reviewer #1 (Recommendations for the authors):

      I still have several comments that the authors may want to consider for the last version.

      - The run and tumble behavior of the cells at the surface remains puzzling and would need some more explanation in the text. Tumbles with no significant reorientation angle amount largely to smooth swimmers. How can a model based on run-and-tumbles be used to explain the difference between LSW and RSW?

      We apologize for the lack of clarity regarding the surface run-and-tumble behavior. While it is true that surface tumbles often result in smaller reorientation angles compared to bulk swimming, they are not negligible and play a critical role in the observed asymmetry. As shown in the tumble angle distributions (Fig. 2E and 2F), the probability of a tumble angle exceeding π/2 is approximately 9% for sidewall trajectories and 30% for the middle area. This tumbling behavior leads to differences between the left sidewall (LSW) and right sidewall (RSW) in two key ways:

      First, as detailed in our geometric analysis (Fig. 6), running cells following stable clockwise circular paths are geometrically favored to reach the RSW. Because cells moving up-gradient (towards the RSW) experience suppressed tumbling, they maintain these stable circular trajectories and accumulate effectively. Conversely, cells moving down-gradient (towards the LSW) experience enhanced tumbling. These frequent interruptions distort the circular trajectories required to reach the LSW, resulting in fewer bacteria entering the LSW compared to the RSW.

      Second, once at the wall, the difference in tumbling frequency dictates retention. The majority of LSW cells are swimming down-gradient (LSW-DG) and thus tumble more frequently, increasing their probability of escaping the wall. The majority of RSW cells are swimming up-gradient (RSW-UG), which suppresses tumbles and increases their residence time at the wall.

      The relevant clarifications have been included in the last paragraph of “Results” in the manuscript.

      - Figure 5B would need more explanation. I still don't understand the different behaviors for the right and left side walls at small widths. Is it noise really or a more complex behavior? Since most of these calculations are based precisely on the shape of these curves it would be useful to discuss them in more detail.

      We apologize for the lack of clarity. The behavior observed at small widths in Figure 5B is not noise; rather, it reflects the idealized nature of our simulation model.

      In the simulation, bacteria were modeled as active particles without explicit steric exclusion for the flagella and cell body. Consequently, simulated cells retain the ability to reorient and turn freely even in very narrow lanes (w ≤ 6 μm), allowing the geometric sorting mechanism (which favors the RSW) to function efficiently even at small widths. This is why the simulation shows a distinct difference between LSW and RSW proportions in this regime.

      In the experimental reality, however, the finite size of the bacterial body and flagella creates steric hindrance. In narrow channels, this physical constraint restricts the cells' ability to turn, thereby disrupting the circular swimming mechanism required to sort cells into the RSW. As a result, experimental data shows that the proportions of LSW and RSW cells tend to equalize in narrow channels (e.g., w = 6 μm in Fig. 4B), leading to a lower chemotactic drift velocity than predicted by the simulation.

      We have added a discussion regarding these steric effects and the deviation at narrow widths to the Results section (the penultimate paragraph of subsection "Simulation of E. coli chemotaxis within lane confinement") in the revised manuscript.
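
      To make the idealized picture in this response concrete, the following is a minimal sketch of a chiral run-and-tumble particle, not the authors' simulation. It assumes a 2D point particle with no walls and no steric exclusion, a gradient along +x, a constant angular velocity (negative for clockwise circles), and Gaussian tumble reorientations; the function name and every parameter value are hypothetical.

```python
import math
import random


def simulate_chiral_swimmer(steps, dt, speed, omega, rate_up, rate_down,
                            tumble_sd, rng, x=0.0, y=0.0, theta=0.0):
    """Chiral run-and-tumble point particle in 2D (illustrative only).

    Assumptions (not taken from the manuscript): the attractant gradient
    points along +x, omega < 0 gives clockwise circles when x points right
    and y points up, and each tumble adds a Gaussian angle with standard
    deviation tumble_sd. rate * dt should stay well below 1.
    """
    path = [(x, y)]
    for _ in range(steps):
        # Tumbling is suppressed while heading up-gradient, enhanced otherwise.
        rate = rate_up if math.cos(theta) > 0 else rate_down
        if rng.random() < rate * dt:
            theta += rng.gauss(0.0, tumble_sd)
        # Using the mid-step heading keeps a tumble-free run on a closed circle.
        mid = theta + 0.5 * omega * dt
        x += speed * dt * math.cos(mid)
        y += speed * dt * math.sin(mid)
        theta += omega * dt
        path.append((x, y))
    return path


# Hypothetical parameters: 20 um/s speed, 1 rad/s clockwise rotation,
# one full circle in 360 steps, tumbling switched off.
omega = -1.0
steps = 360
dt = 2.0 * math.pi / (steps * abs(omega))
circle = simulate_chiral_swimmer(steps, dt, 20.0, omega, 0.0, 0.0,
                                 0.5, random.Random(0))
```

      With tumbling switched off the particle traces a closed clockwise circle of radius close to speed/|omega|. Setting rate_down above rate_up interrupts the down-gradient part of the circle more often, which is the asymmetry the response describes. Because the sketch omits body and flagella size, it shares the limitation discussed above for narrow lanes.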

      - The importance of the chirality of the circular trajectories, although essential, remains insufficiently mentioned in the text.

      We have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming.

      - It would be useful to color-code the trajectories of Figure 1B and alike with time.

      Thank you for the suggestion. Now the trajectories in Fig. 1B have been redrawn. Distinct colors denote individual trajectories, with color intensity darkening to indicate time progression.

    1. eLife Assessment

      This interesting study presents a multi-OMICs approach to unify different lines of evidence regarding the epigenetic regulation of the key virulence factor causing placental malaria during P. falciparum infection. Most results are confirmatory of previous observations; nonetheless, the claims are supported by convincing evidence. The combinatorial approach chosen here is unprecedented and therefore provides valuable new data. In addition, the comparative investigation of different DNA methylation modifications is novel and disproves a direct role in var gene regulation.

    2. Reviewer #2 (Public review):

      Summary:

      Dr Lenz and colleagues report on their in vitro studies comparing gene transcription and epigenetic modifications in Plasmodium falciparum NF54 parasites selected or not selected for adhesion of the infected erythrocytes (IEs) to the placental IE adhesion receptor chondroitin sulfate A (CSA).

      The authors report that selection led to preferential transcription of var2csa, the gene that encodes the VAR2CSA-type PfEMP1 well-established as the PfEMP1 mediating IE adhesion to CSA. They confirm that transcriptional activation of var2csa is associated with distinct depletion of H3K9me3 marks and that transcriptional activation is linked to repositioning of var2csa. Finally, they provide preliminary evidence potentially implicating 5mC in transcriptional regulation of var2csa.

      Strengths:

      The study confirms previously reported features of gene transcription and epigenetic modifications in Plasmodium falciparum.

      Weaknesses:

      No major new finding is reported.

      Comments on revisions:

      I suggest replacing the term "pregnancy-associated malaria (PAM)" with the more current and more precise term "placental malaria (PM)" throughout the manuscript.

      L. 59-60: "... shielding of the parasite antigens expressed on pRBC surfaces by leukocytes...". It is unclear to me what this means - I suggest a rephrasing for improved clarity.

      L. 144-6: Please provide a reference for the primary antibody reagent used.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript by Lenz et al. seeks to investigate molecular mechanisms directing virulence gene expression in the malaria parasite Plasmodium falciparum. The report provides a detailed characterization of the phenotypic and epigenetic features of a var2csa expressing parasite population, the key virulence gene causing the clinical syndrome of placental malaria. Novel evidence supporting the concept that active expression of this gene is associated with nuclear repositioning away from suppressive regions of chromatin is presented. In addition, the authors conducted a preliminary characterization of different forms of DNA methylation, suggesting that 5-methylcytosine is enriched in virulence genes, but does not correlate with their activation or repression. However, a trend towards higher enrichment of 5-methylcytosine in highly active as opposed to inactive genes from the core genome was reported, although this observation requires further validation.

      Strengths:

      The concise study provides a well documented and controlled set of experiments utilizing state-of-the-art OMICs methodologies including ChIPseq, RNAseq, chromatin-conformation capture (Hi-C) and DNA methylation (MeDIPseq) to generate deep insight into the epigenetic regulation of the key virulence factor of P. falciparum. The study unifies different lines of evidence and thereby contributes to a clearer understanding of the mechanisms underlying active expression of var2csa.

      Weaknesses:

      Although all experiments appear to have been rigorously conducted and documented with appropriate replicates and controls, the study is overall lacking statistical support from individual analyses of the biological replicates. In particular, the key novel result suggesting increased distance of the active var2csa gene from regions of heterochromatin as assessed by chromatin conformation capture would benefit from further analysis by comparison with other genetic loci. This also applies to the differential DNA methylation patterns, which should be dissected in more detail to support any association with gene expression or intron function.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Lenz and colleagues describes a detailed examination of the epigenetic changes and alterations in subnuclear arrangement associated with the activation of a unique var gene associated with placental malaria in the human malaria parasite Plasmodium falciparum. The var gene family has been heavily studied over the last couple of decades due to its importance in the pathogenesis of malaria, its role in immune avoidance, and the unique transcriptional regulation that it displays. Aspects of how mutually exclusive expression is regulated have been described by several groups and are now known to include histone modifications, subnuclear chromosomal arrangement, and in the case of var2csa, regulation at the level of translation. Here the authors apply several methods to confirm previous observations and to consider a possible role for DNA methylation. They demonstrate that the histone mark H3K9me3 is found at the promoters of silent genes, var2csa moves away from other var gene clusters when activated, and while DNA methylation is detectable at var genes, it does not seem to correlate with transcriptional activation/silencing. Overall, the data and approach appear sound.

      Strengths:

      The authors employ the latest methods for epigenetic analysis of histone marks, transcriptomic analysis, DNA methylation, and chromosome conformation. They also use strong selection pressure to be able to examine the gene var2csa in its active and silent state. This is likely the only paper that has used all these methods in parallel to examine var gene regulation. Thus, the paper provides readers with confidence in the interpretation of independent methods that address a similar subject.

      We thank the reviewer for this positive assessment. We appreciate the recognition that our study combines complementary approaches including histone mark profiling, transcriptomic analysis, DNA methylation mapping, and chromosome conformation capture in parallel to the use of strong population selection that enables a controlled comparison of var2csa in active versus silent states. We agree that the convergence of independent methods strengthens confidence in the interpretation.

      Weaknesses:

      The primary weakness of the paper is that none of the conclusions are novel and the overall conclusions do not shed much new light on the topic of var gene regulation or antigenic variation in malaria parasites. The paper is largely confirmatory. The roles of H3K9me3 and subnuclear localization in var gene regulation are well established by many groups (including for var2csa), albeit in some cases using alternative methods. The only truly unique aspect of the manuscript is the description of 5mC at var2csa when the gene is transcriptionally active or silent. Here the authors demonstrate that the mark has no clear role in transcriptional activation or silencing, however, this will not be surprising to many in the field who have previously cast doubt on a regulatory role for this modification.

      While we agree that some individual features of var gene regulation, including H3K9me3 enrichment, have been described previously, our study integrates, for the first time, several layers of gene regulation at the clinically important var2csa locus using phenotypically homogeneous placental-binding parasite populations. As expected, var2csa activation coincided with a loss of H3K9me3 at the locus. However, using high-resolution chromatin conformation capture (to our knowledge, this experiment had never been applied to phenotypically homogeneous parasite populations), we quantified the repositioning of var2csa relative to heterochromatic telomeric clusters. We further assessed DNA methylation in this framework and show that 5-methylcytosine is broadly present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Reviewer #2 (Public Review):

      Summary:

      Dr Lenz and colleagues report on their in vitro studies comparing gene transcription and epigenetic modifications in Plasmodium falciparum NF54 parasites selected or not selected for adhesion of the infected erythrocytes (IEs) to the placental IE adhesion receptor chondroitin sulfate A (CSA).

      The authors report that selection led to preferential transcription of var2csa, the gene that encodes the VAR2CSA-type PfEMP1 well-established as the PfEMP1 mediating IE adhesion to CSA. They confirm that transcriptional activation of var2csa is associated with distinct depletion of H3K9me3 marks and that transcriptional activation is linked to repositioning of var2csa. Finally, they provide preliminary evidence potentially implicating 5mC in the transcriptional regulation of var2csa.

      Strengths:

      The study confirms previously reported features of gene transcription and epigenetic modifications in Plasmodium falciparum.

      As stated in our response to Reviewer 1, our study combines, for the first time, complementary approaches, including transcriptomic analysis, histone mark profiling, DNA methylation mapping, and chromosome conformation capture, together with strong population selection to enable a controlled comparison of var2csa in active versus silent states.

      Weaknesses:

      No major new finding is reported. The strength of the evidence presented is mostly solid, although certain elements, e.g., the role of 5mC in transcriptional regulation of var2csa, appear preliminary and incomplete.

      While we agree that no major new finding is reported, we were able to use, for the first time, a high-resolution chromatin conformation capture method to quantify the repositioning of var2csa relative to heterochromatic telomeric clusters. We also showed that 5-methylcytosine is present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate, for the first time, transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) In the second paragraph of the introduction, the authors state "....such as the shielding of the parasite antigens expressed on pRBC surfaces by other cells and the evasion of splenic clearance (8)." What does "other cells" mean here?

      We thank the reviewer for this comment. We have clarified the cell type in the text.

      (2) In their interpretation of the Hi-C data, the authors conclude that the var2csa expressing parasites display "tighter heterochromatin control of var gene regions" and "interactions around other silent var genes were increased" and "an overall compaction of telomere ends and var gene-containing intrachromosomal regions". While the data appear to show that this is true when they compare the two parasite populations, I am concerned that the authors might be misinterpreting the data. It is important to note that the NF54CSAh line is heavily selected to be nearly entirely homogeneous for var gene expression while the NF54 line is exceptionally heterogeneous. This is shown in Figure 1G. Thus, any chromosomal arrangement specific for var gene expression in the unselected NF54 population will be similarly heterogeneous and therefore could appear less tight. In other words, interactions around silent var genes and overall compaction of telomere ends might be identical between individual parasites within these populations, but appear tighter or more compact in the var2csa expressing line simply because it is a homogeneous population. Perhaps this is what the authors meant to convey, however as currently written, it seems that they conclude the expression of var2csa results in a unique change in chromosome organization. A better comparison would be two populations homogeneously expressing different var genes, one expressing var2csa and one expressing an alternative var gene. Such lines can be generated through clonal isolation or selection for binding to a different host receptor.

      We thank the reviewer for this comment. The reviewer is correct, and we have revised the Discussion section of the manuscript to clarify this issue.

      (3) The title of the last section of the Results is "Distribution of DNA methylation influences gene expression overall but does not mediate transcriptional activation and switching in antigenic variation". This is an overstatement. The authors show that DNA methylation is absent at var gene promoter regions and enriched in coding regions, but they provide no evidence that it "influences gene expression overall". This is speculation. Lastly, when the authors examined 5mC occupancy across genes, did they normalize for GC content of the DNA sequences? GC content is known to increase dramatically in coding regions (particularly in var genes) and thus could explain the distribution of this mark. If the authors corrected for this, they should directly state this in the results section. If they did not, they should explain why they don't think this property of the P. falciparum genome explains the distribution of 5mC.

      There is often a misconception in the field that DNA methylation is primarily confined to CpG islands in promoter regions and functions mainly as a repressor of transcription. However, in contrast to promoter methylation, methylation within gene bodies is generally associated with higher levels of gene expression, suggesting a role in facilitating transcription elongation. Gene-body methylation can also repress internal promoters, thereby preventing spurious transcription initiation within the gene. In addition, it has been shown to influence alternative splicing by affecting RNA polymerase II elongation kinetics.

      We propose that, in Plasmodium, DNA methylation may be associated with priming genes for transcriptional activity rather than repressing transcription. Specifically, higher methylation levels may facilitate recruitment of the RNA polymerase II transcriptional machinery to enable transcription. In Figure 4B, we observe higher levels of DNA methylation in the first exon of highly expressed genes in both the NF54 and NF54CSAh lines. Interestingly, we also detect high levels of methylation across most introns of the var genes, introns that must be transcribed, cannot be degraded, and are essential for var gene regulation, suggesting a possible sequence-recognition function. We have edited the manuscript to improve clarity.

      (4) In the legend to Figure 3D, the authors state that the centromeres are shown in blue, however in the figure they appear to be grey while var2csa is blue.

      We have revised the figure legend accordingly.

      Reviewer #2 (Recommendations For The Authors):

      I recommend using the term "transcription" rather than "expression" when discussing events at the gene level.

      We have revised the manuscript accordingly.

      I also recommend using the term "adhesion" to describe the physical interaction between infected erythrocytes and adhesion receptors rather than "adherence", which should be reserved to describe non-physical affinity (e.g., beliefs, faith).

      We have revised the manuscript accordingly.

      Important new evidence regarding transcriptional regulation of var genes in general and var2csa in particular should be discussed and cited.

      We have revised the manuscript accordingly.

    1. eLife Assessment

      This important research investigates the precision of numerosity perception in two types of tasks and concludes that human performance aligns with an efficient coding model optimized for current environmental statistics and task goals. The proposed model receives compelling evidence from two numerosity perception experiments and a reanalysis of an existing dataset of risky decision-making. These findings have theoretical implications for our understanding of numerosity perception and decision-making as well as the ongoing debate on different efficient coding models.

    2. Reviewer #3 (Public review):

      Summary:

      This work investigates whether human imprecision in numeric perception is a fixed structural constraint or an endogenous property that adapts to environmental statistics and task objectives. By measuring behavioral variability across different uniform prior distributions in both estimation and discrimination tasks, the authors show that perceptual imprecision increases sublinearly with prior width. They demonstrate that the specific exponents of this scaling (1/2 for estimation and 3/4 for discrimination) can be derived from an efficient-coding model, wherein decision-makers optimally balance task-specific expected rewards against the metabolic costs of neural coding. The revised manuscript expands this framework to accommodate logarithmic representations and validates the core model against an independent dataset of risky choices.
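
      As an illustration of what the scaling claim in this summary amounts to, the sketch below recovers a power-law exponent from how imprecision grows with prior width, using a least-squares fit in log-log coordinates. It is not the authors' analysis pipeline: the function name, the prior widths, and the prefactors are hypothetical, and the data are synthetic and noise-free.

```python
import math


def fit_scaling_exponent(widths, imprecision):
    """Fit imprecision = c * width**alpha by least squares on log-log axes.

    Returns (alpha, c). Inputs must be positive.
    """
    xs = [math.log(w) for w in widths]
    ys = [math.log(s) for s in imprecision]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return alpha, c


# Hypothetical prior widths (not the values used in the study).
widths = [10.0, 20.0, 40.0, 80.0]

# Synthetic, noise-free imprecision following the two predicted power laws.
sd_estimation = [0.8 * w ** 0.5 for w in widths]
sd_discrimination = [0.3 * w ** 0.75 for w in widths]

alpha_est, c_est = fit_scaling_exponent(widths, sd_estimation)
alpha_disc, c_disc = fit_scaling_exponent(widths, sd_discrimination)
```

      An exponent below 1 corresponds to the sublinear growth described above; a fixed, prior-independent representation would instead give an exponent of 0, and imprecision proportional to the prior width would give an exponent of 1.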

      Strengths:

      The authors have effectively addressed my previous concerns with rigorous additions:

      (1) The mathematical formulation has been revised into a discrete signal accumulation framework, making the objective function and resource trade-offs much more transparent and mathematically tractable.

      (2) The incorporation of the logarithmic representation resolves prior ambiguities regarding structural constraints.

      (3) The new split-half analysis effectively addresses the temporal dynamics of adaptation. The stability of the sublinear scaling across the experiment provides solid evidence that human subjects utilize rapid, top-down modulation to adjust their encoding strategy when explicitly informed about the environment.

      (4) Validating the derived scaling exponents on an independent risky-choice dataset robustly supports the generalizability of the theoretical framework beyond a single cognitive domain.

      Comments on revisions:

      The authors have addressed my remaining theoretical concern regarding the model's predictions for mean estimation bias. I have no further comments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The "number sense" refers to an imprecise and noisy representation of number. Many researchers propose that the number sense confers a fixed (exogenous) subjective representation of number that adheres to scalar variability, whereby the variance of the representation of number is linear in the number.

      This manuscript investigates whether the representation of number is fixed, as usually assumed in the literature, or whether it is endogenous. The two dimensions on which the authors investigate this endogeneity are the subject's prior beliefs about stimuli values and the task objective. Using two experimental tasks, the authors collect data that are shown to violate scalar variability and are instead consistent with a model of optimal encoding and decoding, where the encoding phase depends endogenously on prior and task objectives. I believe the paper asks a critically important question. The literature in cognitive science, psychology, and increasingly in economics, has provided growing empirical evidence of decision-making consistent with efficient coding. However, the precise model mechanics can differ substantially across studies. This point was made forcefully in a paper by Ma and Woodford (2020, Behavioral & Brain Sciences), who argue that different researchers make different assumptions about the objective function and resource constraints across efficient coding models, leading to a proliferation of different models with ad-hoc assumptions. Thus, the possibility that optimal coding depends endogenously on the prior and the objective of the task opens the door to a more parsimonious framework in which assumptions of the model can be constrained by environmental features. Along these lines, one of the authors' conclusions is that the degree of variability in subjective responses increases sublinearly in the width of the prior. Importantly, the degree of this sublinearity differs across the two tasks, in a manner that is consistent with a unified efficient coding model.

      Comments on revisions:

      The authors have done an excellent job addressing my main concerns from the previous round. The new analyses that address the alternative model of "no cognitive noise and only motor noise" are compelling and provide quantitative evidence that bolsters the paper's overall contribution. The authors also went above and beyond by reanalyzing the Frydman and Jin (2022) dataset to provide new and very interesting analyses that provide an additional out of sample test of the model proposed in the current paper.

      Reviewer #2 (Public review):

      Summary:

      This paper provides an ingenious experimental test of an efficient coding objective based on optimizing task success. The key idea is that different tasks (estimation vs discrimination) will, under the proposed model, lead to a different scaling between the encoding precision and the width of the prior distribution. Empirical evidence in two tasks involving number perception supports this idea.

      Strengths:

      - The paper provides an elegant test of a prediction made by a certain class of efficient coding models previously investigated theoretically by the authors. The results in experiments and modeling suggest that competing efficient coding models, optimizing mutual information alone, may be incomplete by missing the role of the task.

      - The paper carefully considers how the novel predictions of the model interact with the Weber/Fechner law.

      Weaknesses:

      The claims would be even more strongly validated if data were present at more than two widths in the discrimination experiment (also noted in Discussion).

      Reviewer #3 (Public review):

      Summary:

      This work investigates whether human imprecision in numeric perception is a fixed structural constraint or an endogenous property that adapts to environmental statistics and task objectives. By measuring behavioral variability across different uniform prior distributions in both estimation and discrimination tasks, the authors show that perceptual imprecision increases sublinearly with prior width. They demonstrate that the specific exponents of this scaling (1/2 for estimation and 3/4 for discrimination) can be derived from an efficient-coding model, wherein decision-makers optimally balance task-specific expected rewards against the metabolic costs of neural coding. The revised manuscript expands this framework to accommodate logarithmic representations and validates the core model against an independent dataset of risky choices.

      Strengths:

      The authors have effectively addressed my previous concerns with rigorous additions:

      (1) The mathematical formulation has been revised into a discrete signal accumulation framework, making the objective function and resource trade-offs much more transparent and mathematically tractable.

      (2) The incorporation of the logarithmic representation resolves prior ambiguities regarding structural constraints.

      (3) The new split-half analysis effectively addresses the temporal dynamics of adaptation. The stability of the sublinear scaling across the experiment provides solid evidence that human subjects utilize rapid, top-down modulation to adjust their encoding strategy when explicitly informed about the environment.

      (4) Validating the derived scaling exponents on an independent risky-choice dataset robustly supports the generalizability of the theoretical framework beyond a single cognitive domain.

      Weaknesses:

      The methodological and theoretical issues raised in the first round have been thoroughly resolved, and the evidence supporting the claims regarding response variance is convincing.

      There is one remaining theoretical point that warrants discussion to provide a complete picture of the proposed generative model. The manuscript exquisitely models and predicts response variance (imprecision), but it remains largely silent on the closed-form predictions for the mean estimation (i.e., bias). Under the assumption of optimal Bayesian decoding combined with specific encoding schemes (e.g., linear vs. logarithmic), the model implicitly generates mathematical predictions for the subjects' mean estimates. Specifically, varying the scaling exponent (α) and the prior width (w) should systematically alter the predicted bias in different conditions.

      While fitting or explicitly explaining this mean bias is not strictly necessary for the core claims regarding variance scaling, acknowledging what the optimal decoder analytically predicts for the mean estimation, and how it aligns or contrasts with typical empirical observations, would strengthen the theoretical transparency of the paper.

      We thank the reviewers for their attention to our revised manuscript. We are very glad that the reviewers seem satisfied with how we have addressed their concerns. The paper is now stronger than in its first iteration.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have no further requests for the authors, I congratulate the authors on a great paper.

      Reviewer #2 (Recommendations for the authors):

      No further suggestions.

      Reviewer #3 (Recommendations for the authors):

      In the Figure 2b caption, the phrase "from which the numbers of dots are sampled" appears to be a typo carried over from the estimation task. It should likely read "from which the numbers are sampled", as the discrimination task uses Arabic numerals rather than dot arrays.

      Reviewer #3 points out that we have focused on the subjects’ response variability, and we did not report the mean estimates. We agree that the reader could reasonably expect to see this. We now include this in Figure 6.

      The subjects exhibit the typical patterns observed in numerosity-estimation tasks (most notably, the ‘central tendency of judgment’). The dotted line shows the predictions of the best-fitting model (with 𝛼 = 1/2) with the logarithmic encoding, which reproduces the subjects’ main behavioral patterns.

      We have slightly revised the manuscript. The revised version includes this Figure, in Methods (p. 28). We have modified the text of the Methods accordingly (bottom of p. 27), and we now refer to this analysis in the main text (line 6 of p. 5). We have also corrected the typo noted by Reviewer #3 (caption of Fig. 2b).

    1. eLife Assessment

      This valuable study presents an approach to integrating and comparing single-cell genomics data across species. The evidence supporting the conclusions of this work is solid, and ANTIPODE presents an updated methodological approach to determining how gene expression at the cell-type level has evolved. Thus, ANTIPODE should provide broad utility to studies of comparative neurogenomics and be of use to neuroscientists and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The integration of single-cell datasets across species is a powerful approach to understanding how cell types and patterns of gene expression have evolved. Current methods to perform such integrations require multiple steps: clustering, the integration itself, and downstream differential expression analysis. In this study, the authors describe a new approach, called ANTIPODE, that combines these steps by integrating deep learning with interpretable decoding and linear modeling. This method builds on previous deep learning approaches to dataset integration, namely scVI and scANVI, that employ a variational autoencoder to model single-cell RNA-sequencing datasets. However, gene expression estimates from these previous methods are challenging to interpret due to non-linear decoding from the modeled latent space. ANTIPODE seeks to address this issue by using a single-layer decoder coupled to a linear model to estimate patterns of differential expression, e.g. differential expression by coexpression module, across cell types, etc.

      The authors apply their framework to a large single-cell RNA-seq dataset (~1.8M cells) containing cells from the central nervous systems of humans, macaques, and mice spanning in utero developmental time points. They identify a consensus set of cell clusters across each species. They find that ANTIPODE performs at least as well as scVI in terms of species integration and batch correction. The authors demonstrate several use cases of this integrated approach by analyzing differential expression that correlates with gene structure, the evolution of expression differences in neuropeptide systems, and the anatomical and phylogenetic variation in neurodevelopmental timing.

      Strengths:

      ANTIPODE is a welcome addition to techniques that integrate large single-cell RNA-seq datasets across multiple species. The approach's simultaneous inference of cell clusters, integration manifolds, and differential expression should streamline analysis pipelines whose elements are often disjointed and sometimes work at cross purposes.

      Weaknesses:

      The authors note several limitations to their method that will be targets for future development. First, clustering "resolution" is inferred from the data and cannot be tuned as with other approaches. Second, because of the linear decoding, ANTIPODE does not accommodate combining datasets obtained from different modalities (e.g. single-cell with single-nucleus RNA-seq). Third, as currently implemented, ANTIPODE does not explicitly model phylogenetic relationships. However, the authors describe an extension that could enable this, enhancing the power of multiple species integrations. A weakness with the current manuscript is the organization and readability of the figures. The supplemental figures in particular need to be restructured and reformatted to increase their interpretability.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents ANTIPODE, a bilinear generative model developed for the simultaneous integration and identification of cell types across species and developmental stages using single-cell RNA-seq data. ANTIPODE is inspired by scANVI, a well-established semi-supervised framework for single-cell transcriptomics. After describing its implementation, the authors use ANTIPODE to integrate data from 15 species comprising 1,854,767 cells. Then, the authors benchmark ANTIPODE against commonly used methods (scVI, Harmony, and Scanorama) using two snRNAseq datasets and report comparable or superior performance. They then return to the initial integrated dataset and analyse patterns of gene expression evolution. Finally, they leverage the model to study the "later-is-larger" concept, evaluating the relationship between gene expression, developmental timing and structure size and finding gene expression signatures of this concept.

      Strengths:

      A major strength of the paper is that ANTIPODE employs a bilinear decoding architecture, which produces more interpretable model parameters while performing at least as well as existing, more opaque nonlinear integration approaches.

      The authors demonstrate the utility of ANTIPODE by integrating single-cell mRNA sequencing data from mouse, macaque, and human brains and confirming general principles regarding developmental timing and cell-type-specific gene expression divergence.

      They also propose a conceptually interesting framework for studying gene expression evolution: instead of focusing solely on differentially expressed genes between homologous cell types, they jointly model gene expression across developmental states and species-specific divergence, allowing them to define and analyse four categories of differential expression.

      Finally, the authors' conclusions are well supported by the analyses presented, although these conclusions remain relatively conservative and reinforce already established principles.

      Weaknesses:

      A central weakness of the paper is its limited accessibility to a broad audience. Despite attempting to keep computational details in the supplement, the main text still uses substantial jargon, undermining the goal of providing an intuitive explanation of the model. The figures are also insufficiently annotated (e.g., colour schemes in Figure 2 heatmap, bubble plot details in Figure 3, entropy definition in Figure 3), and the figure legends are overly brief and lack essential information. I strongly recommend that the authors revise both text and figures to improve clarity and readability.

      Similarly, the materials and methods lack a lot of information about the implementation of the model, the statistical tests used, the calculations of entropy, etc.

      The study sits between tool development and biological discovery but does not fully commit to either. As a result, it cannot be evaluated as a full benchmarking study, yet it also does not provide new biological insights that are validated experimentally.

      Finally, the GitHub repository for ANTIPODE is not yet functional and lacks documentation or tutorials, making it impossible to assess usability or reproducibility.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The integration of single-cell datasets across species is a powerful approach to understanding how cell types and patterns of gene expression have evolved. Current methods to perform such integrations require multiple steps: clustering, the integration itself, and downstream differential expression analysis. In this study, the authors describe a new approach, called ANTIPODE, that combines these steps by integrating deep learning with interpretable decoding and linear modeling. This method builds on previous deep learning approaches to dataset integration, namely scVI and scANVI, that employ a variational autoencoder to model single-cell RNA-sequencing datasets. However, gene expression estimates from these previous methods are challenging to interpret due to non-linear decoding from the modeled latent space. ANTIPODE seeks to address this issue by using a single-layer decoder coupled to a linear model to estimate patterns of differential expression, e.g. differential expression by coexpression module, across cell types, etc.

      The authors apply their framework to a large single-cell RNA-seq dataset (~1.8M cells) containing cells from the central nervous systems of humans, macaques, and mice spanning in utero developmental time points. They identify a consensus set of cell clusters across each species. They find that ANTIPODE performs at least as well as scVI in terms of species integration and batch correction. The authors demonstrate several use cases of this integrated approach by analyzing differential expression that correlates with gene structure, the evolution of expression differences in neuropeptide systems, and the anatomical and phylogenetic variation in neurodevelopmental timing.

      Strengths:

      ANTIPODE is a welcome addition to techniques that integrate large single-cell RNA-seq datasets across multiple species. The approach's simultaneous inference of cell clusters, integration manifolds, and differential expression should streamline analysis pipelines whose elements are often disjointed and sometimes work at cross purposes.

      Weaknesses:

      The authors note several limitations to their method that will be targets for future development. First, clustering "resolution" is inferred from the data and cannot be tuned as with other approaches. Second, because of the linear decoding, ANTIPODE does not accommodate combining datasets obtained from different modalities (e.g. single-cell with single-nucleus RNA-seq). Third, as currently implemented, ANTIPODE does not explicitly model phylogenetic relationships. However, the authors describe an extension that could enable this, enhancing the power of multiple species integrations. A weakness with the current manuscript is the organization and readability of the figures. The supplemental figures in particular need to be restructured and reformatted to increase their interpretability.

      We thank this reviewer for their positive feedback regarding the utility of the model and how it may simplify challenging evolutionary analysis.

      We acknowledge that the figures are a bit difficult to read, and we will improve annotation and tidiness to make them more accessible to the reader.

      We have implemented changes for ANTIPODE version 0.2, which includes regression of gene expression differences on a phylogeny, and we have updated the GitHub repository with this “antipode.phylo” module. For this study, the three-species case is equivalent under flat or phylogenetic regression (for example, mouse up is equivalent to primate down), so we do not plan to redo the analyses in the text using this new version.

      We have already provided examples for running ANTIPODE on our own and public datasets (https://github.com/mtvector/scANTIPODE/tree/main/real_examples), as well as in-line documentation of classes and functions; however, these may not give new users enough information. We will provide true explanatory tutorials for both to address the reviewer’s concerns. ANTIPODE version 0.1 is currently installable from either GitHub or PyPI.

      Reviewer #2 (Public review):

      Summary:

      This work presents ANTIPODE, a bilinear generative model developed for the simultaneous integration and identification of cell types across species and developmental stages using single-cell RNA-seq data. ANTIPODE is inspired by scANVI, a well-established semi-supervised framework for single-cell transcriptomics. After describing its implementation, the authors use ANTIPODE to integrate data from 15 species comprising 1,854,767 cells. Then, the authors benchmark ANTIPODE against commonly used methods (scVI, Harmony, and Scanorama) using two snRNAseq datasets and report comparable or superior performance. They then return to the initial integrated dataset and analyse patterns of gene expression evolution. Finally, they leverage the model to study the "later-is-larger" concept, evaluating the relationship between gene expression, developmental timing and structure size and finding gene expression signatures of this concept.

      Strengths:

      A major strength of the paper is that ANTIPODE employs a bilinear decoding architecture, which produces more interpretable model parameters while performing at least as well as existing, more opaque nonlinear integration approaches.

      The authors demonstrate the utility of ANTIPODE by integrating single-cell mRNA sequencing data from mouse, macaque, and human brains and confirming general principles regarding developmental timing and cell-type-specific gene expression divergence.

      They also propose a conceptually interesting framework for studying gene expression evolution: instead of focusing solely on differentially expressed genes between homologous cell types, they jointly model gene expression across developmental states and species-specific divergence, allowing them to define and analyse four categories of differential expression.

      Finally, the authors' conclusions are well supported by the analyses presented, although these conclusions remain relatively conservative and reinforce already established principles.

      Weaknesses:

      A central weakness of the paper is its limited accessibility to a broad audience. Despite attempting to keep computational details in the supplement, the main text still uses substantial jargon, undermining the goal of providing an intuitive explanation of the model. The figures are also insufficiently annotated (e.g., colour schemes in Figure 2 heatmap, bubble plot details in Figure 3, entropy definition in Figure 3), and the figure legends are overly brief and lack essential information. I strongly recommend that the authors revise both text and figures to improve clarity and readability.

      Similarly, the materials and methods lack a lot of information about the implementation of the model, the statistical tests used, the calculations of entropy, etc.

      The study sits between tool development and biological discovery but does not fully commit to either. As a result, it cannot be evaluated as a full benchmarking study, yet it also does not provide new biological insights that are validated experimentally.

      Finally, the GitHub repository for ANTIPODE is not yet functional and lacks documentation or tutorials, making it impossible to assess usability or reproducibility.

    1. eLife Assessment

      This manuscript identifies temperature-dependent alternative splicing of PIF4 in Arabidopsis thaliana and shows that heat stress promotes the accumulation of a short exon 5-skipping isoform that is predicted to encode a non-functional protein. This finding is important, and it provides an intriguing new layer of regulation for PIF4; however, the strength of the mechanistic conclusions is limited, and several key conclusions rely on indirect evidence. As a result, while the data robustly demonstrate heat-regulated alternative splicing of PIF4, the causal role of PIF4 isoforms' balance in shaping heat-induced developmental responses remains only partially supported and the strength of the evidence presented is incomplete. This work will be of interest to biologists working on alternative splicing.

    2. Reviewer #1 (Public review):

      This manuscript by Niño-González and collaborators shows that PIF4 undergoes alternative splicing in response to elevated temperature, generating distinct isoforms that may contribute to early seedling responses of Arabidopsis thaliana to heat stress (37 °C). This work provides an intriguing perspective on how PIF activity may be modulated under stress conditions.

      The authors report rapid heat-induced changes in seedling morphology, with cotyledon angle and hypocotyl length altered as early as 3 hours after transfer to 37 °C. These responses correlate with a transient increase in PIF4 transcript levels, followed by a return to control values at later time points. Notably, heat induces preferential production of an exon 5-skipping isoform of PIF4. The resulting short protein variant (PIF4-S) lacks part of the bHLH domain and is therefore unlikely to be transcriptionally active.

      To explore functional consequences, the authors expressed the exon 5 inclusion (functional) isoform, PIF4-L, in the pif4-101 mutant background. Some heat-induced phenotypes, such as protochlorophyllide accumulation and subsequent photobleaching, were reduced or absent in these lines. Interestingly, pif4-101 mutants themselves largely resemble WT plants for most heat-responsive traits, with the exception of hypocotyl length. PIF4-L expression specifically attenuates the cotyledon angle response to heat, without strongly affecting hypocotyl elongation.

      An important point is that PIF4 itself is not essential for the observed heat responses, as pif4 mutants respond largely like wild-type plants. This implies that the phenotypes described are likely controlled by multiple PIFs acting redundantly. In this context, the generation of the PIF4-S isoform may represent one of several mechanisms by which heat stress reduces overall functional PIF levels, rather than a PIF4-specific regulatory switch.

      Other caveats should be considered when interpreting the work. The functional relevance of the PIF4-S isoform under heat stress is not tested, as heat responses of these transgenic lines were not examined. Transcriptome analysis of heat-stressed WT, pif4-101 mutant, and PIF4-L-expressing plants revealed an enrichment of PIF-regulated genes, supporting a possible role for this family of transcription factors in the heat stress response. Notably, the heat responsiveness of the mutant and of the transgenic lines differs only marginally from that of WT plants. In addition, the study relies primarily on total transcript-level analyses, without quantitative assessment of individual PIF isoforms or direct measurement of PIF protein abundance. Given that other PIFs are also expressed and may be subject to alternative RNA processing, it needs to be determined whether PIF4-S alone could exert a dominant effect, counteracting all the other functional PIFs by itself, under heat stress. Hence, the proposed model is a plausible but still incomplete framework that requires further experimental validation and analysis.

      Altogether, the results presented in this manuscript could also be interpreted as follows: multiple PIFs contribute to the observed phenotypes in response to heat, with overlapping (redundant) functions. Heat stress may reduce functional PIF levels through different mechanisms, one of which is the regulation of alternative splicing, as shown here for PIF4, leading to the production of non-functional proteins or protein variants that could act as negative competitors (such as PIF4-S). Restoring PIF levels to values of control conditions could therefore reverse heat-induced phenotypes, as observed in the PIF4-L expression lines.

      Main concerns:

      (1) The existence of a shorter isoform of PIF4 and PIF6 is relevant, and PIF4 could indeed play a role in the context of heat stress, as it does in thermomorphogenesis. In this sense, the interplay between PIF4-S and PIF4-L might be linked to plant morphological responses to heat; however, the present work requires further investigation to determine whether this is indeed the case. It is important to note that pif4 mutants behave similarly to WT plants, indicating that PIF4 is not necessary for the observed responses. These phenotypes are therefore most likely related to several PIFs rather than to one specific family member. The results obtained with the transgenic lines expressing PIF4-L or PIF4-S support this interpretation, as increasing a functional PIF (PIF4-L) reduces some phenotypes, while expressing a dominant-negative version mimics heat-induced phenotypes under control conditions. Thus, it is reasonable to interpret that under heat stress, functional PIF levels are reduced through multiple mechanisms, alternative splicing and PIF4-S generation being one of them in the case of PIF4, but likely with additional effects on other family members. This clearly requires further study.

      (2) RT-qPCR quantification of total PIF4 transcripts, as well as the long and short isoforms under the tested conditions, is necessary. While we agree with the authors that PIF4-S could act as a dominant-negative factor, demonstrating this requires comparison of phenotypes under heat versus control conditions using the PIF4-S transgenic lines. Importantly, for the authors' hypothesis to be valid, PIF4-S must be able to outcompete other PIFs; therefore, accurate quantification of its expression levels across conditions is crucial. Combining the results shown in Figures 2A and 2G suggests that the levels of the functional PIF4-L isoform are unchanged or even reduced after 3 h of heat treatment, as the increase in total PIF4 does not fully compensate for the diversion toward PIF4-S. Additionally, it would be equally relevant to quantify the expression of other PIFs (or at least those shown in Suppl. Fig. 6) to determine whether PIF4-S could exert such a strong effect even when expressed at relatively low levels. By "accurate quantification", we refer specifically to functional protein-coding variants, as in the PIF4-L case. Supplemental Figure 6 shows that PIF3 and PIF5 appear unaffected by heat, while PIF1 expression is increased. However, JBrowse data for dark-grown seedlings indicate that PIF1 is subject to alternative transcription initiation, alternative splicing, and alternative polyadenylation at its 3′ end. A similar situation occurs for PIF3, at least at the 5′ end of the transcriptional unit. Therefore, alternative RNA processing mechanisms may play a key role in modulating functional PIF protein levels in response to heat. Without considering diverted isoforms of other PIFs, the interpretation becomes problematic, as PIF1 is upregulated by heat, and PIF4-S would therefore need to overcome its activity as well. This is particularly relevant given that the cotyledon angle phenotype at 37 °C appears even stronger than in the pif1pif3pif5 triple mutant, if such a comparison is feasible.

      (3) In addition, PP2A is a well-established housekeeping gene for normalization across different light regimes, as its expression is not affected by light. However, we are not convinced this holds true under heat stress conditions (see Li et al., Plant Cell 2019 Jul 29;31(10):2353-2369. doi:10.1105/tpc.19.00519).

      (4) Furthermore, the mechanistic conclusions would be strengthened by directly assessing PIF protein levels, for example, by western blot analysis, to determine whether changes in transcript isoform abundance translate into corresponding changes in protein accumulation under heat stress.

      (5) Importantly, the authors' interpretation that "PIF4-L.1 expresses the long isoform at levels similar to those of WT plants (Supplemental Figure 9A), ruling out the possibility that the suppression of heat-induced phenotypes (cotyledon opening and Pchlide accumulation) is due to elevated PIF4 expression levels" is not correct. The RT-qPCR assay quantifies all isoforms containing exon 6, which include both long and short variants with respect to exon 5 inclusion. Since WT plants at 37 °C express both isoforms (L/S ≈ 60/40), the PIF4-L lines actually express 2-4-fold higher levels of the functional PIF4 isoform, based on the values shown in the figures.

      (6) Figure 3B should include a statistical analysis, as it appears that PIF4-L expression does not significantly reduce photobleaching. Cotyledon angle is not affected by either the pif4 mutation or PIF4-L expression under 22 °C conditions (Figure 3C). However, after 24 h at 37 °C, there is a clear effect, with cotyledon angles closer to those observed in WT plants at 22 °C. Regarding hypocotyl length, although statistical testing was not performed, it is evident that pif4-101 affects this parameter, while PIF4-L expression in this background does not substantially alter the mutant response.
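
      The arithmetic behind concern (5) can be made explicit with a minimal sketch. It assumes that the exon 6 assay reports total PIF4 transcript, that WT seedlings at 37 °C carry the long isoform at about 60% of that total (the L/S ≈ 60/40 ratio quoted in that concern), and that every transcript in a PIF4-L line is the long isoform. The function name and the total-transcript ratios are hypothetical placeholders chosen for illustration, not values read from the manuscript's figures.

```python
def functional_fold(total_ratio, wt_long_fraction):
    """Fold difference in long (functional) isoform, transgenic line vs. WT.

    total_ratio: exon 6 (total PIF4) signal in the line divided by that in WT.
    wt_long_fraction: share of WT transcripts that include exon 5.
    Assumption: every transcript in the transgenic line is the long isoform.
    """
    if not 0 < wt_long_fraction <= 1:
        raise ValueError("wt_long_fraction must be in (0, 1]")
    return total_ratio / wt_long_fraction


# L/S of about 60/40 in WT at 37 C, as stated in the review.
WT_LONG_FRACTION = 0.6

# Hypothetical total-transcript ratios; not measurements from the figures.
for ratio in (1.0, 1.2, 2.4):
    print(ratio, round(functional_fold(ratio, WT_LONG_FRACTION), 2))
```

      Under these assumptions, equal total signal already implies about 1.67-fold more functional isoform in the transgenic line than in WT, and the 2-4-fold range quoted in (5) would correspond to total-signal ratios of 1.2 to 2.4.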

      Other comments:

      (1) We do not believe that Figure 3E is an optimal way to demonstrate attenuation of transcriptional changes by PIF4-L expression in pif4 mutants. A heat map representation would likely be more direct and informative.

      The authors should consider expressing another functional PIF in the pif4 mutant background to determine whether the observed effects are specific to PIF4, as proposed, or whether they reflect a general PIF function.

      (2) It would also be informative to examine the response under Light + 37 °C conditions. Since PIF4 mRNA accumulation is induced by light, the authors should test whether plants incubated in light show a similar response to heat or whether it is attenuated. Potential cross-regulation between light and heat responses would be worth exploring.

      (3) As the authors acknowledge in the introduction, most of our knowledge regarding PIFs in temperature signalling has focused on thermomorphogenesis. Therefore, we believe it is important to place these new findings (exon 5 skipping) within that framework, as they could help explain observations made under better-characterized conditions. In addition, it would be interesting to see the phenotypes of the pifq mutant under heat stress. Even though this mutant line displays a heat-stress-like phenotype under control conditions, it may still respond to heat treatment. If so, this would indicate that PIFs are not fully determinative of this response.

      (4) The authors should clearly state the genetic background of the PIF4-S expression lines, which appear to be in the pif4-101 background but are not explicitly described as such in the manuscript.

    3. Reviewer #2 (Public review):

      The manuscript "Alternative splicing of PIF4 regulates plant development under heat stress" by Niño-González et al. describes a heat-responsive alternative splicing (AS) event in PIF4 in Arabidopsis and its potential impact on seedling development. The authors observe that etiolated seedlings exposed to heat respond with a more photomorphogenic developmental behaviour, as reflected, for example, by increased cotyledon opening and reduced hypocotyl elongation. They propose that the AS event in PIF4 may contribute to this response, due to reduced formation of the full-length PIF4 protein and an increase in the shorter PIF4 protein with potentially dominant negative functions.

      Expressing the individual variants in a pif4 mutant background was used to further examine their function. In the case of the full-length PIF4 variant, some of the heat-induced phenotypes were suppressed. For the lines overexpressing the shorter PIF4 variant, heat responses were not examined.

      The authors describe an interesting phenotype and present an appealing model of how AS of PIF4, a well-known key regulator of developmental processes including light- and temperature responses, might be involved. However, I don't think that the authors provide strong evidence for their model, and the unaltered heat response of pif4 mutants argues against a major role of this gene and its AS event under these conditions. Regarding the heat responses, it remains open how distinct those are from thermomorphogenesis.

      Weaknesses:

      (1) In the manuscript, it is emphasized that previous studies on PIFs' role in temperature responses have mainly focused on thermomorphogenesis under high ambient temperature and not under hot temperatures causing heat stress. How do the authors know that the effects they are looking at are specific to hot temperatures and do not also occur at more moderate temperature increases? So, what would PIF4 splicing look like upon a shift from 22 °C to 28 °C (instead of 37 °C as used in the manuscript)?

      (2) The potential role of PIF4 and its AS event in the heat response is the key point of this manuscript, as also reflected by the title. As summarized above, I don't see direct evidence for this and a functional characterization of the AS event is lacking. First, the pif4 mutant doesn't show an altered response, which argues against its requirement under these conditions, and in particular against the proposed model that a shortened version of PIF4 acts in a dominant negative manner. Second, the impact of AS on PIF4 protein levels remains open. Antibodies against PIF4 exist and have been used before, e.g. in Lee et al. (2021), Nat Comm, and Fan et al. (2025), Nat Comm - both studies address the role of PIF4 in thermomorphogenesis and should also be discussed in this manuscript. Detecting PIF4 proteins would allow testing if indeed both PIF4 protein variants are detectable and whether, upon heat stress, the longer variant decreases while the shorter variant increases. This could be expected based on transcript data; however, due to regulation at multiple steps, a correlation between transcript and protein levels might not exist. Third, the transgenic lines expressing either the short or long PIF4 variant do not really reflect the situation in the wild type and might be/are overexpression lines. Specifically, constructs for both variants lack the UTRs according to the description in the method section. Furthermore, is the short version expressed as GFP fusion, as I understood from the method description? The PIF4-L mutants have similar PIF levels as the WT (SFig. 9); however, this refers to total transcripts, which makes a difference in the wild type, in particular under heat stress. Comparing here only the PIF4-L levels would be more informative. Accordingly, the transgenic lines may overexpress PIF4-L compared to the wild type. All the PIF4-S lines show 4 to 5-fold overexpression (again for total transcripts) compared to WT. Including lines with lower overexpression levels would be needed for a direct comparison to the wild type. Moreover, immunoblot analysis of the PIF4 protein would be needed for a direct comparison between the wild type and the two types of mutants.

      (3) Apart from the question of what level of (over)expression the transgenic lines have, several aspects of the phenotyping experiments are not in line with a simple model of PIF4 regulation or have not been addressed. Expressing the long PIF4 variant in the pif4 mutant background suppresses some of the heat-induced changes, but not the hypocotyl shortening, suggesting that the hypocotyl effect is not caused by a heat-induced lack of PIF4.

      When expressing the short variant, the authors observe increased cotyledon opening in darkness, consistent with a suppression of skotomorphogenesis due to a negative function of PIF4-S, at least when it is overexpressed. For hypocotyl length, no consistent difference between wild type and PIF4-S lines was observed: seedlings grown for 3 d in darkness had identical lengths, and for 4-d-old seedlings the PIF4-S lines did not give consistent results. PIF4-S.1 (which has the highest transgene expression) had the same length as the wild type; a pronounced difference was only seen for PIF4-S.3, which is the line with the lowest expression. Have the experiments been reproduced with independent seed batches? I'm also wondering why the authors haven't performed the heat stress experiments with these PIF4-S lines, as they did for the PIF4-L mutants. According to the authors' model, the PIF4-S lines might show an opposite response compared to the PIF4-L lines, i.e. an even more pronounced heat effect compared to the wild type.

      (4) Why was the heat effect on AS of PIF6 not further analysed? Previous work showed the role of PIF6 in seed development and germination; in line with this, PIF6 expression is particularly high in embryos and seeds, but it is also expressed and alternatively spliced in other tissues and conditions, as shown in Figure 1 and SFigure 2. From the data in Figure 1, it looks like the AS pattern in heat might also be different from other conditions. So, it would be interesting to see how AS of PIF6 changes in the control and heat samples that the authors analysed for PIF4 AS, in particular, if this response is distinct for PIF4 versus PIF6.

      (5) The presentation of the RNA-seq data is incomplete. According to the method section, WT, pif4-101, PIF4-L.1 and PIF4-L.2 seedlings upon 3 h heat/control treatment were analysed. Why are DE and DAS genes and comparisons of different genotypes not shown? The FC data displayed in Figure 2E and the overlap between heat-regulated genes (Fig. 3D; only in WT) and PIF regulation show only some aspects of the data.

    4. Reviewer #3 (Public review):

      Summary:

      PIFs play a pivotal role not only in light and temperature signaling pathways, but in many other signaling pathways regulating plant development by modulating transcription of a large number of genes both directly and indirectly. Similarly, alternative splicing (AS) plays a critical role in shaping the splice isoforms of thousands of genes under different environmental conditions to regulate plant development. In fact, AS of PIF6 has been shown to be involved in seed development. PIF4 is a central transcription factor integrating light and temperature signaling pathways. However, AS of PIF4 has not previously been implicated in any pathway. This study describes for the first time how AS of PIF4 is regulated by heat stress, and how this regulation is involved in heat stress signaling to regulate plant development. This is an important finding of general interest.

      Strengths:

      The authors describe for the first time that AS of PIF4 is regulated by heat stress, and that this regulation is involved in heat stress signaling to regulate plant development.

      Weaknesses:

      There are many loose ends in this story that need to be tied up.

      Major points:

      (1) The authors show only the AS transcripts by PCR, but no protein data. Given that the hypothesis is that the short form of PIF4 is functioning in a dominant negative fashion, the authors need to show that this short isoform expresses a protein. In addition, they need to show that this form is functioning in a dominant negative fashion with other PIFs, by showing that it reduces the DNA binding and/or transcriptional responses of other PIFs.

      (2) The two mutant alleles used for this study (pif4-100 and pif4-2) have T-DNA insertions after the AS exon. Do these alleles express any short version of the protein? The previous studies showed no protein production, and thus, they may not function as a dominant negative form. Usually, T-DNA insertion alleles may express truncated transcripts, but many do not express any protein due to the lack of a stop codon and/or degradation of the transcripts. But in this case, the mutants are behaving like WT. The authors need to show that these alleles are expressing a truncated version of the PIF4 protein.

      (3) Figure 4 shows phenotypes of independent lines expressing the PIF4 short version. The authors analyzed only the cotyledon and hypocotyl phenotypes, but not Pchlide or bleaching assays. The authors need to do a thorough phenotype analysis, including heat-stress phenotypes of these lines, to test if the data make sense with their hypothesis.

    5. Author response:

      We would like to thank the Editor and the three Reviewers for their detailed assessment of our manuscript and their constructive feedback. We found the suggestions valuable for refining our work. Before presenting the fully updated manuscript, we would like to clarify a few points in this initial response. This manuscript identifies a heat-induced, alternatively spliced short isoform of PIF4 (PIF4-S) that contributes to the physiological responses observed in heat-stressed etiolated seedlings. First, we agree with all Reviewers that including PIF4 protein data will strengthen our findings and more definitively demonstrate the generation of a protein-coding alternative isoform under heat stress. Therefore, this will be one of our main priorities in the revision. Evidence for the functionality of this alternative isoform is clearly demonstrated by the distinct phenotypes exhibited by transgenic lines expressing either the long or the short versions of PIF4. Nevertheless, we agree that a more comprehensive characterization of these lines, as well as of the pif4 mutant lines, will further strengthen the demonstration of the functional relevance of this alternative splicing event. In addition, we will extend the phenotypic analysis of the PIF4-S lines to heat stress conditions. Importantly, the phenotypes observed in these lines suggest that additional molecular mechanisms may act in parallel with this alternative splicing event to regulate development in heat-stressed etiolated seedlings. As proposed by Reviewer #1, other PIFs may be involved in this response, and we will address this possibility. We will also provide new experimental data to show that alternative splicing in this gene is specific to heat stress and does not occur in other PIFs. Finally, we would like to clarify that the main scope of this manuscript is to demonstrate the functional relevance of the alternative isoform generated by splicing in PIF4 under heat stress. A detailed investigation of its molecular mode of action is beyond the scope of the present study. We sincerely appreciate the thoughtful feedback provided by all Reviewers. We will carefully consider their suggestions and use them to guide the inclusion of additional experiments and analyses in our revised manuscript to reinforce and clarify our conclusions.

    1. eLife Assessment

      The revised manuscript by Shukla et al. provides important mechanistic insights into kinesin-1 autoinhibition and cargo-mediated activation. Through a convincing integration of protein engineering, computational modeling, biophysical assays, HDX-MS, and electron microscopy, the study delineates how cargo binding induces an allosteric transition that propagates along the coiled-coil stalk to the motor domains, enhancing MAP7 engagement. The revisions substantially improve clarity, figure annotation, and methodological transparency, leaving the remaining limitations, primarily those inherent to conformational heterogeneity and resolution, appropriately acknowledged. Overall, the updated manuscript presents a coherent mechanism for kinesin-1 activation that will be of broad interest to the motor protein, structural biology, and cell biology communities.

    2. Reviewer #1 (Public review):

      The authors aim to interrogate the sets of intramolecular interactions that cause kinesin-1 hetero-tetramer autoinhibition and the mechanism by which cargo interactions via the light chain tetratricopeptide repeat domains can initiate motor activation. The molecular mechanisms of kinesin regulation remain a key question with respect to intracellular transport and this study adds important perspectives to our understanding. It has implications for the accuracy and efficiency of motor transport by different motor families, for example the steering of cargos in one direction or the other on microtubules.

      The authors focus on the response of inactivated kinesin-1 to peptides found in cargos and the cascade of conformational changes that are induced. They also test the effects of the known activator of kinesin-1 - MAP7 - in the context of their model. The study benefits from multiple complementary, albeit relatively low-resolution, methods - structural prediction using AlphaFold3, 2D and 3D analysis of (mainly negative stain) TEM images of several engineered kinesin constructs, biophysical characterisation of the complexes, peptide design, hydrogen/deuterium-exchange mass spectrometry and simple cell-based imaging. Each set of experiments is carefully designed and the intrinsic limitations of each method are offset by other approaches, such that the assembled data convincingly support the authors' regulatory model of kinesin activation.

      This study benefits from prior work by the authors on this system and the tools and constructs they previously accrued, as well as from other recent contributions to the field. This work will be of broad interest to cell and structural biologists, especially those seeking to tackle small and flexible macromolecular complexes, as well as biophysicists and those interested in protein engineering.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Shukla, Cross, Kish, and colleagues investigate how binding of a cargo-adaptor mimic (KinTag) to the TPR domains of the kinesin-1 light chain, or disruption of the TPR docking site (TDS) on the kinesin-1 heavy chain, triggers release of the TPR domains from the holoenzyme. This dislocation provides a plausible mechanism for transition out of the autoinhibited lambda-particle toward the open and active conformation of kinesin-1. Using a combination of negative-stain electron microscopy, AlphaFold modeling, biochemical assays, hydrogen-deuterium exchange mass spectrometry (HDX-MS) and other methods, the authors show how TPR undocking propagates conformational changes through the coiled-coil stalk to the motor domains, increasing their mobility, and enhances interactions with the microtubule-bound cofactor MAP7. Together, they propose a model in which the TDS on CC1 of the heavy chain forms a "shoulder" in the compact, autoinhibited state. Cargo-adaptor binding, mimicked here by KinTag, dislodges this shoulder, liberating the motor domains and promoting MAP7 association, driving kinesin-1 activation.

      Strengths:

      Throughout the study, the authors use clever construct design - e.g. delta-Elbow, ElbowLock, CC-Di and the high-affinity KinTag - to test specific mechanisms by directly perturbing structural contacts or affecting interactions. The proposed mechanism of releasing autoinhibition via adaptor-induced TPR undocking is also interrogated with a number of complementary techniques that converge on a convincing model for activation that can be further tested in future studies.

      Weaknesses:

      These reflect limits of what the current data can establish rather than flaws in execution. It remains to be tested if the open state of kinesin-1 initiated by TPR undocking is indeed an active state of kinesin-1 capable of processive movement and/or cargo transport. It also remains to be determined what the mechanism of motor domain undocking from the autoinhibited conformation is. But this important study provides the groundwork for testing these open questions.

      Comments on revisions:

      My original minor concerns have been addressed in the revision.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shukla and colleagues presents a comprehensive study that addresses a central question in kinesin-1 regulation - how cargo binding to the kinesin light chain (KLC) tetratricopeptide repeat (TPR) domains triggers activation of full-length kinesin-1 (KHC). The authors combine AlphaFold3 modeling, biophysical analysis (fluorescence polarization, hydrogen-deuterium exchange), and electron microscopy to derive a mechanistic model in which the KLC-TPR domains dock onto coiled-coil 1 (CC1) of the KHC to form the "TPR shoulder," stabilizing the autoinhibited (λ-particle) conformation. Binding of a W/Y-acidic cargo motif (KinTag) or deletion of the CC1 docking site (TDS) dislocates this shoulder, liberating the motor domains and enhancing accessibility to cofactors such as MAP7. The results link cargo recognition to allosteric structural transitions and present a unified model of kinesin-1 activation. I recommend acceptance of the manuscript subject to the following additions:

      Strengths:

      (1) The study addresses a fundamental and long-standing question in kinesin-1 regulation using a multidisciplinary approach that combines structural modeling, quantitative biophysics, and electron microscopy.

      (2) The mechanistic model linking cargo-induced dislocation of the TPR shoulder to activation of the motor complex is well supported by both structural and biochemical evidence.

      (3) The authors employ elegant protein-engineering strategies (e.g., ElbowLock and ΔTDS constructs) that enable direct testing of model predictions, providing clear mechanistic insight rather than purely correlative data.

      (4) The data are internally consistent and align well with previous studies on kinesin-1 regulation and MAP7-mediated activation, strengthening the overall conclusion.

      Weaknesses:

      (1) While the EM and HDX-MS analyses are informative, the conformational heterogeneity of the complex limits structural resolution, making some aspects of the model (e.g., stoichiometry or symmetry of TPR docking) indirect rather than directly visualized.

      (2) The dynamics of KLC-TPR docking and undocking remain incompletely defined; it is unclear whether both TPR domains engage CC1 simultaneously or in an alternating fashion.

      (3) The interplay between cargo adaptors and MAP7 is discussed but not experimentally explored, leaving open questions about the sequence and exclusivity of their interactions with CC1.

      Comments on revisions:

      The authors have addressed my comments satisfactorily.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The manuscript by Shukla et al. provides important mechanistic insights into kinesin-1 autoinhibition and cargo-mediated activation. Using a convincing combination of protein engineering, computational modeling, biophysical assays, HDX-MS, and electron microscopy, the authors reveal how cargo binding induces an allosteric transition that propagates to the motor domains and enhances MAP7 binding. Despite limitations arising from conformational heterogeneity and structural resolution, the study presents a unified mechanism for kinesin-1 activation that will be of broad interest to the motor protein, structural biology, and cell biology communities.

      We are grateful for the time and effort from the reviewers and editors in providing fair and constructive comments that have helped to improve the manuscript. Our point-by-point response is provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to interrogate the sets of intramolecular interactions that cause kinesin-1 hetero-tetramer autoinhibition and the mechanism by which cargo interactions via the light chain tetratricopeptide repeat domains can initiate motor activation. The molecular mechanisms of kinesin regulation remain an important question with respect to intracellular transport. It has implications for the accuracy and efficiency of motor transport by different motor families, for example, the direction of cargos towards one or other microtubules.

      Strengths:

      The authors focus on the response of inactivated kinesin-1 to peptides found in cargos and the cascade of conformational changes that occur. They also test the effects of the known activator of kinesin-1 - MAP7 - in the context of their model. The study benefits from multiple complementary methods - structural prediction using AlphaFold3, 2D and 3D analysis of (mainly negative stain) TEM images of several engineered kinesin constructs, biophysical characterisation of the complexes, peptide design, hydrogen/deuterium-exchange mass spectrometry, and simple cell-based imaging. Each set of experiments is thoughtfully designed, and the intrinsic limitations of each method are offset by other approaches such that the assembled data convincingly support the authors' conclusions. This study benefits from prior work by the authors on this system and the tools and constructs they previously accrued, as well as from other recent contributions to the field.

      Weaknesses:

      It is not always straightforward to follow the design logic of a particular set of experiments, with the result that the internal consistency of the data appears unconvincing in places.

      For example, i) the Figure 1 AlphaFold3 models do not include motor domains whereas nearly all of the rest of the data involve constructs with the motor domains;

      We appreciate the reviewer’s comment regarding the absence of the motor domains in the AlphaFold3 models shown in Figure 1. These domains were intentionally excluded to improve visual clarity and to better highlight the interaction between the TPR domains and CC1 in the inhibited kinesin-1 conformation. We felt that this simplified presentation in the main figure helps readers focus on the key mechanistic advance introduced in this work at the outset of the paper. For completeness, we have provided full-length kinesin-1 AlphaFold3 models that include the motor domains in the Supplementary Information (Fig. S1), and they are described in detail in the main text. In addition, we have added a note to the Figure 1 legend to explicitly direct readers to these full-length models.

      ii) the kinesin constructs are chemically cross-linked prior to TEM sample preparation - this is clear in the Methods but should be included in the Results text, together with some discussion of how this might influence consistency with other methods where crosslinking was not used.

      Thank you. Chemical crosslinking is typically important for obtaining high-quality negative-stain TEM grids of kinesin-1 complexes and has been employed in all prior EM studies by our group and others. While this was described in the Methods, we agree that it should also be stated explicitly in the Results. Accordingly, we have added a sentence to the Results section noting that the proteins were stabilized using the amine-to-amine crosslinker BS3 (“Proteins were also stabilised using the amine-to-amine crosslinker BS3 that was important for achieving reproducibly high-quality samples for imaging.”).

      Please see point below for acknowledgement of risks of using crosslinker.

      Can those cross-links themselves be used to probe the intramolecular interactions in the molecular populations by mass spec?

      We had considered this; however, cross-linking mass spectrometry (XL-MS) has been applied extensively to essentially identical kinesin-1 complexes by Tan et al. (eLife 2023). That work provided important insights into the overall architecture of the complex, including the new head–CC1 interactions. However, as fully acknowledged by the authors, significant ambiguity remained with respect to the positioning of the TPR domains, with many cross-links that could not be straightforwardly rationalized in a single model. These unresolved aspects provided part of the motivation for the present study, as highlighted in the Introduction.

      We believe that this ambiguity likely reflects an underlying conformational equilibrium of the kinesin-1 complex (e.g. opening/closing transitions) and/or dynamic docking and undocking of the TPR domains, together with lysine-rich features of the TPR domains (most notably the loops that connect the TPR alpha helices) that may make them prone to being locked in non-native states. This limits the interpretability of static cross-linking data in this system. In this context, we therefore feel that XL-MS has already been thoroughly explored for kinesin-1 and that its practical limitations in resolving these TPR interactions have been reached.

      This consideration was a primary motivation for pursuing cross-linker-free, solution-based approaches, particularly HDX-MS, which we argue provide the most relevant new insights into the assembly and conformational dynamics of the complex. To make this rationale clearer, we have added an explicit note in the HDX-MS section emphasizing that this is a cross-linker-free method. The added text reads:

      “To determine how the local structural changes from adaptor binding and shoulder dislocation affected the dynamics of kinesin-1 complexes in solution, as directly and least invasively as possible, and without the risk of cross-linker artefacts.”

      In general, the information content of some of the figure panels can also be improved with more annotations (e.g. angular relationship between views in Figure 1B, approximate interpretations of the various blobs in Fig 3F, and more thought given to what the reader should extract from the representative micrographs in several figures - inclusion of the raw data is welcome but extraction and magnification of exemplar particles (as is done more effectively in Fig S5) could convey more useful information elsewhere).

      We appreciate these suggestions. We have modified the figures throughout the manuscript in line with the reviewer’s points. Raw data is now provided at higher magnification throughout so the reader can better distinguish individual particles, angular relationships have been added and further annotations provided on 2D class averages. We do not want the reader to draw too many conclusions from images of single closed particles (with the exception of open vs closed in Fig S7) as these require averaging and 2D classification to obtain meaningful insights, and so we have not added zoom panels in these cases. Figure 3F has been annotated as requested.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Shukla, Cross, Kish, and colleagues investigate how binding of a cargo-adaptor mimic (KinTag) to the TPR domains of the kinesin-1 light chain, or disruption of the TPR docking site (TDS) on the kinesin-1 heavy chain, triggers release of the TPR domains from the holoenzyme. This dislocation provides a plausible mechanism for transition out of the autoinhibited lambda-particle toward the open and active conformation of kinesin-1. Using a combination of negative-stain electron microscopy, AlphaFold modeling, biochemical assays, hydrogen-deuterium exchange mass spectrometry (HDX-MS), and other methods, the authors show how TPR undocking propagates conformational changes through the coiled-coil stalk to the motor domains, increasing their mobility and enhancing interactions with the microtubule-bound cofactor MAP7. Together, they propose a model in which the TDS on CC1 of the heavy chain forms a "shoulder" in the compact, autoinhibited state. Cargo-adaptor binding, mimicked here by KinTag, dislodges this shoulder, liberating the motor domains and promoting MAP7 association, driving kinesin-1 activation.

      Strengths:

      Throughout the study, the authors use a clever construct design - e.g., delta-Elbow, ElbowLock, CC-Di, and the high-affinity KinTag - to test specific mechanisms by directly perturbing structural contacts or affecting interactions. The proposed mechanism of releasing autoinhibition via adaptor-induced TPR undocking is also interrogated with a number of complementary techniques that converge on a convincing model for activation that can be further tested in future studies. The paper is well-written and easy to follow, though some more attention to figure labels and legends would improve the manuscript (detailed in recommendations for the authors).

      Weaknesses:

      These reflect limits of what the current data can establish rather than flaws in execution. It remains to be tested if the open state of kinesin-1 initiated by TPR undocking is indeed an active state of kinesin-1 capable of processive movement and/or cargo transport. It also remains to be determined what the mechanism of motor domain undocking from the autoinhibited conformation is, and perhaps this could have been explored more here. The authors have shown by HDX-MS that the motor domains become more mobile on KinTag binding, but perhaps molecular dynamics would also be useful for modelling how that might occur.

      We are grateful for the reviewer’s comments. We agree that the weaknesses the reviewer has outlined define the limitations of the study and establish important priorities for future work, which includes molecular dynamics simulations. An important prerequisite for the latter is a starting model that one has confidence in. We think that our study and earlier work now provide a good experimentally supported foundation for using AF3-generated assemblies for this purpose, by ourselves and others.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Shukla and colleagues presents a comprehensive study that addresses a central question in kinesin-1 regulation - how cargo binding to the kinesin light chain (KLC) tetratricopeptide repeat (TPR) domains triggers activation of full-length kinesin-1 (KHC). The authors combine AlphaFold3 modeling, biophysical analysis (fluorescence polarization, hydrogen-deuterium exchange), and electron microscopy to derive a mechanistic model in which the KLC-TPR domains dock onto coiled-coil 1 (CC1) of the KHC to form the "TPR shoulder," stabilizing the autoinhibited (λ-particle) conformation. Binding of a W/Y-acidic cargo motif (KinTag) or deletion of the CC1 docking site (TDS) dislocates this shoulder, liberating the motor domains and enhancing accessibility to cofactors such as MAP7. The results link cargo recognition to allosteric structural transitions and present a unified model of kinesin-1 activation.

      Strengths:

      (1) The study addresses a fundamental and long-standing question in kinesin-1 regulation using a multidisciplinary approach that combines structural modeling, quantitative biophysics, and electron microscopy.

      (2) The mechanistic model linking cargo-induced dislocation of the TPR shoulder to activation of the motor complex is well supported by both structural and biochemical evidence.

      (3) The authors employ elegant protein-engineering strategies (e.g., ElbowLock and ΔTDS constructs) that enable direct testing of model predictions, providing clear mechanistic insight rather than purely correlative data.

      (4) The data are internally consistent and align well with previous studies on kinesin-1 regulation and MAP7-mediated activation, strengthening the overall conclusion.

      Weaknesses:

      (1) While the EM and HDX-MS analyses are informative, the conformational heterogeneity of the complex limits structural resolution, making some aspects of the model (e.g., stoichiometry or symmetry of TPR docking) indirect rather than directly visualized.

      We agree with the reviewer’s point. Conformational heterogeneity is a significant challenge, and the model has been developed from multiple complementary approaches. A higher resolution cryoEM study remains a priority but is challenging because of the size, shape and flexibility of the particle. We hope that some of the approaches used here (e.g. nanobody TPR stabilisation, ElbowLock) will provide a path to achieve this.

      (2) The dynamics of KLC-TPR docking and undocking remain incompletely defined; it is unclear whether both TPR domains engage CC1 simultaneously or in an alternating fashion.

      We agree that this is a limitation. We strongly suspect that the TPR domains are dynamic, and we are working to overcome experimental challenges to resolve this important outstanding question. We have expanded the discussion section to better highlight this important priority.

      (3) The interplay between cargo adaptors and MAP7 is discussed but not experimentally explored, leaving open questions about the sequence and exclusivity of their interactions with CC1.

      We agree that this is a limitation, and addressing it will be an important priority for future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a number of places where the text could be more precise or clear, or the figures could be designed to be more informative:

      (1) The word "unitarily" is used in several places, and I don't know what it means in this context.

      We have changed the phrasing throughout the manuscript to remove this term. We were attempting to contrast with presumed cooperative multivalent interactions in the context of the kinesin-1 tetramer but agree that this choice of word doesn’t quite achieve that.

      (2) On page 5 the phrase "We focused on the ElbowLock background" is introduced and needs to be explained more clearly.

      Thank you. We have amended the text to read “This KIF5C construct contains a short 5 amino acid deletion that restricts flexibility around the elbow and helps maintain particles in their lambda conformation, providing homogenous samples, and facilitating subsequent analysis (34).”

      (3) On page 6, the phrase "To improve the resolution of our images, we turned to single-particle cryoEM analysis" is imprecise - what do the authors mean by the resolution of the images? Cryo-EM data does not always guarantee a higher resolution structure, but it offers the possibility of visualising finer structural features. This is probably what is meant here, but needs to be stated more precisely.

      We have amended the text to ‘visualise finer structural details’ as suggested.

      (4) Page 7 - "suggesting that TPR domains had loosely dissociated from the core" - I don't think the evidence points to dissociation of KLCs from the complex, but the phrase "loosely dissociated" implies this - would benefit from rephrasing.

      We have changed this to ‘undocked’ for consistency with other descriptions in the manuscript.

      (5) Was the effect of the CC-Di insertion (ΔTDS) detectable by AlphaFold prediction? It would be interesting to include this, partly for completeness and partly because a slightly imperfect and maybe a more dynamic coiled-coil in this region of the molecule may be important in supporting the conformational changes required for activation.

      Thank you for this suggestion. Modelling of the deltaTDS complex indeed shows displacement of the TPR domains. In the standard 5 output models, the TPR domains now occupy a variety of different positions, all with essentially zero confidence (high position error). Consistent with biochemical data, the CCDi insertion is modelled with no overall disruption to the architecture or length of CC1, as expected. We think that this is a valuable addition to the study and have included it as a new supplementary figure (Fig S5), with the main text reading:

      …. “Supporting this, models of ΔTDS complexes using AF3 showed the expected seamless insertion of CCDi into CC1, with displacement of the TPR domains to a variety of different positions, in 5 models, all with high position error with respect to KHC (Fig S5).”

      (6) Figure S1 has two sections designated (C) in the legend.

      Corrected

      (7) Figure S3 - given the resolution and level of interpretation of the 3D reconstructions, it is not relevant to include an FSC curve, but other standard information, such as angular distribution and any evidence of variability from 3D classifications (and how many particles per 3D class) should be included for all structures.

      Thank you, a complete workflow for all complexes has now been provided in Figure S8 with the information requested. In each case there were typically two ‘good’ classes. For ElbowLock, this included one without a prominent shoulder, consistent with 2D classification and quantification. We assume this may reflect a docking/undocking equilibrium. For the deltaTDS and KinTag particles, neither class showed the shoulder feature. The main text has been modified to reflect this and reads “For ElbowLock complexes, this resulted in classes with and without a prominent shoulder, in agreement with 2D classification. For ElbowLock-ΔTDS and ElbowLock-KinTag complexes, no prominent shoulder containing classes were observed.”

      Reviewer #2 (Recommendations for the authors):

      Overall, the figures would benefit from more labels for clarity, some examples and suggestions below:

      (1) Figure 1A - Connect motors to the rest of the structure e.g., wiggly lines.

      Corrected.

      (2) Figure 1B - Add arrows and angles to indicate different views of the model.

      Corrected.

      (3) Figure 1B - Label TPR1-6 (e.g., inset zoom in).

      Corrected.

      (4) Figure 2D and 3D - Label the lack of a shoulder in all averages (perhaps with an arrow instead of a circle to not obscure density), include an example average which shows prominent shoulder density.

      Corrected. Full sets of classes showing shoulder-like features for deltaTDS and KinTag complexes are now shown in Figure S4.

      (5) Figure 3D: Label motor domains and elbow as in other figures.

      Corrected.

      (6) Methods: Include more information on how EM classes were compared to AF projections (e.g., Figure 1D). Was this done visually or computationally? Likewise, more information is needed on how classes were judged to have prominent/weak shoulder density (Figure 2D). In the figure legend, there is a statement that "Full sets of classes are provided in Fig. S4" but this is absent in the supplement.

      Thank you. This information has been added to the methods.

      “For comparison to the AF3 model, simulated density was generated using the molmap command in ChimeraX (73) filtering to 15 Å, and projections were generated/selected automatically using the Reference Based Auto Selected 2D function in CryoSPARC”.

      Full sets of classes are now provided in Figure S4.

      (7) Figure 1-3 - Raw micrographs are a very useful inclusion but would benefit from being a more zoomed-in view (e.g., Figure S5 scale). Particularly useful for 3C, where the mixture of open and closed would be good to see.

      Higher zoom micrographs have been provided throughout.

      (8) Figure 5D: Panels too small to see the result, suggest making full width and moving E below.

      Thank you. We have expanded the panel and moved the model to a new Figure 6.

      (9) Figure S1: PAE plot convincing, but pLDDT colour models needed.

      A representative model coloured for pLDDT has been added to Figure S1. Most of the structure sits within the light blue confident range (90 > pLDDT > 70) with the exception of the disordered regions and neck coil.

      (10) Figure 5B: Reason for the variable inputs?

      The reviewer raises an interesting point. The slightly reduced expression of deltaElbow and slightly increased expression of ElbowLock is a consistent feature of these experiments. We note that this effect is in the ‘opposite direction’ to the impact on binding to MAP7 and so does not affect our conclusions from the experiment. However, we wonder whether opening and closing of the complex may impact on turnover of kinesin proteins, which could have implications for their normal homeostasis and possible degradation after transport in polarised cells. We are considering how to explore this going forwards. We have added a note to the results section to highlight this interesting observation to the reader.

      “We also noted slightly elevated expression of ElbowLock complexes and slightly lower expression of DeltaElbow complexes, suggesting that opening/closing of the complex could impact on kinesin-1 turnover”

      (11) Figure legend 5B: Insufficient detail, the end result is stated, but the three separate gels are not described.

      Legend has been expanded.

      (12) Figure 3F: Currently somewhat problematic. It is unclear if the models are in the same view, and so comparison is difficult. Figure 1C (bottom right) shows class averages with a clear, separate CC density, so the relatively featureless model in this region is puzzling. A statement on how the three model views are related to each other, if aligned with each other, would be useful.

      We appreciate the reviewer's point. Models were aligned in Chimera, using the fit in map command. Because of the limited features of the models, presumably due to flexibility, achieving a good alignment for all three models was challenging, but we think that showing the 180-degree rotations is probably about the best we can achieve here.

      (13) The following statement is too strong: "Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length 'side' views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features which enabled us to identify CC1 confidently (Fig. 1D)". Given that the negative-stain EM data were collected primarily to validate the AlphaFold model, the assignment of CC1 should be described as consistent with rather than confidently identified from the class averages. The resolution of the EM data does not independently support such an assignment, and the wording needs to be softened.

      We appreciate the reviewer's point and have softened the wording as suggested. The paragraph now reads:

      “To visualise finer structural details, we turned to single-particle cryoEM analysis of frozen-hydrated samples. We were unable to obtain optimal samples suitable for determining the complete structure. Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length ‘side’ views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features (Fig. 1D). The motor domains were poorly resolved in these classes, suggesting that the head assembly is somewhat flexible relative to the coiled coil/TPR body. A comparison to low-pass filtered back-projections from the AF3 model (without motor domains) revealed density at a position concurrent with the docked TPR domains (Fig. 1D).”

      (14) There is a typo in the figure legend of Figure 3 - (E) and (F) should be (F) and (G).

      Corrected

      Reviewer #3 (Recommendations for the authors):

      I recommend the following additions:

      (1) Figure 1 labeling - In panel A, please label the "linker domain" and the "KLC subunits" explicitly to help orient the reader. In panel B, please mark the "TPR shoulder" corresponding to the docked TPR domains on CC1; this will help the reader connect parts B and C.

      Thank you, we have modified Figure 1A with this additional information.

      (2) The TPR docking site (TDS) is a central structural element, and its sequence boundaries are provided in the Methods. It would help to visualize this directly in Figure 2A or in an inset.

      We hope that the reviewer agrees that the zoomed in model in Figure 5A (alongside MAP7) provides a sufficiently detailed view of the structural interface to highlight the orientation of TPR1 with respect to CC1. The side chain contacts in the model are very plausible and confidently predicted (and can be straightforwardly reproduced in AF3 using the sequence information provided in the methods), but as our study has not explored this interaction at the single residue level, we would prefer not to imply this to the reader at this stage.

      (3) The authors' model of cargo-induced TPR dislocation is convincing. However, the Discussion could benefit from a clarification on whether both KLC-TPR domains are expected to be bound simultaneously or if a dynamic exchange occurs, as the EM data suggest potential asymmetry.

      Thank you, please see point 5 below where we have modified the discussion to reflect the reviewer’s thoughtful comments.

      (4) The HDX-MS analysis is comprehensive, but the authors may want to briefly comment on the coverage of low-signal regions (especially within CC2-CC3) to enhance clarity.

      We have added an additional supplementary figure (S10) showing sequence coverage. Overall, this is 88% but with some lower coverage around KHC-CC0 (neck) and the acidic linker that connects the KLC coiled-coil to the TPR. We have added a note to the main text to reflect this.

      “Sequence coverage was high (overall 88%) with the exception of KHC-CC0 (neck coil) and the acidic-linker region that connects the KLC coiled-coil to the TPR domains where coverage was lower”

      (5) In the Discussion, the proposed interplay between MAP7 and cargo adaptors is intriguing, especially considering the results from Anna Akhmanova's lab showing that MAP7 activates kinesin-1 processivity. Do the authors suggest that competition for CC1 is mutually exclusive or sequential? The answer has mechanistic implications.

      We have been considering these questions for some time, and the short answer is that we don't fully understand the dynamics yet. However, we appreciate the reviewer's prompt to clarify our thinking on this. We have attempted to do this in a revised discussion section where we more explicitly outline these outstanding questions.

    1. eLife assessment

      This manuscript provides an important contribution to the field of platelet biogenesis, and the convincing evidence will advance our understanding of signal transduction driving the development of late megakaryopoiesis and platelet reactivity that results in bleeding diathesis. The paper is noteworthy for analyzing two related, either singly or in combination, tyrosine phosphatases in this conditional, stage development gene knockout. Because SHP1 is a negative regulator and SHP2 is an activator, the synergistic effects found in the double knockout were surprising.

    2. Reviewer #1 (Public review):

      Barré et al. investigated the role of Shp1 and Shp2 in megakaryocytes (MKs) and platelets by conditional knock-out of Shp1, Shp2, or both under the control of the Gp1ba promoter. Deletion of Shp1 and Shp2 in MKs and platelets was almost complete. The Shp1/Shp2 double knock-out mice displayed macrothrombocytopenia and increased bleeding, whereas the single knock-outs did not show significant defects. Platelet function was aberrant in DKOs, but not in single knock-outs, and so was ligand-induced signaling, particularly Syk phosphorylation.

      Megakaryocyte maturation was impaired in Shp1/Shp2 DKO mice. Ligand-induced signaling was impaired in Shp2 knock-out and DKO. Ex vivo formation of platelets and in vivo maturation of MKs were impaired in DKO mice. Pharmacological inhibitors of Shp1 and Shp2 had largely similar effects as observed in the single knock-outs. The authors conclude that Shp1 and Shp2 have synergistic functions in the MK/platelet lineage, and that Shp2 may be a potential therapeutic target in myeloproliferative neoplasms.

      Strengths:

      The data clearly show effects of the Shp1/Shp2 double knock-out on MKs and platelets.

      Weaknesses:

      There appears to be a discrepancy between the results with the Shp2 single knock-out and the Shp2 inhibitor: the Shp2 knock-out does not affect MKs and platelets, except Erk1/2 signaling, whereas the Shp2 inhibitors appear to affect MK function.

      This work is interesting and may have potential from a therapeutic point of view.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Barré et al. investigate the roles of the phosphatases Shp1 and Shp2 in the megakaryocyte and platelet lineage using genetic depletion in mice. By employing Gp1ba-Cre-based models, the study builds on the authors' previous work and addresses some limitations associated with earlier Pf4-Cre approaches. The authors report relatively mild alterations in megakaryocyte and platelet parameters in mice lacking either Shp1 or Shp2 alone, whereas combined deletion of both phosphatases results in macrothrombocytopenia, mild bleeding, and impaired GPVI-dependent platelet aggregation accompanied by reduced Syk phosphorylation. The functional platelet defects are linked to reduced expression of GPVI and integrin α2, while thrombocytopenia is associated with impaired megakaryocyte maturation, reduced ploidy, defective proplatelet formation, and altered TPO-dependent Ras/MAPK signaling. Similar effects on megakaryopoiesis are also observed in vitro following treatment with newly developed Shp2 inhibitors.

      Strengths and Weaknesses:

      The study addresses an important biological question and presents a substantial dataset that could contribute to a better understanding of Shp1 and Shp2 function in platelet biology. However, several aspects of data presentation and interpretation would benefit from additional clarification. In particular, while the authors conclude that single genetic deletion or pharmacological inhibition of Shp1 has a limited impact and that the major phenotypes are specific to combined Shp1/2 deletion or Shp2 inhibition, some of the data suggest more nuanced effects that may warrant further discussion.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Barré et al utilize the Gp1ba-Cre transgenic mouse model to build upon previous findings in a Pf4-Cre system to investigate the effects of individual and combined Shp1 and Shp2 deletion in megakaryocytes and platelets. They report decreased megakaryocyte maturation, macrothrombocytopenia, and increased bleeding primarily in association with the Shp1/Shp2 double-knockout condition. The authors further show that this phenotype appears to be driven primarily by Shp2 and implicate dysregulation of Mpl signaling and downstream Ras/MAPK pathways, including ERK1/2. Given the key role of these pathways in human diseases such as myeloproliferative neoplasms and the challenges associated with modulating such a central pathway, identification of a specific regulator of Mpl signaling poses intriguing questions for future studies on clinical applicability.

      Strengths:

      Overall, the experiments combine in vitro, in vivo, and ex vivo approaches and appear to have been carefully designed and carried out, with multiple technical and biological replicates where relevant. The authors make a compelling argument for using the Gp1ba-Cre as opposed to the Pf4-Cre system and demonstrate both the dose- and stage-dependent effects of Shp1 and Shp2 on megakaryopoiesis and thrombopoiesis. They find that Shp1 and Shp2 are required in late-stage megakaryocyte maturation and that even low levels of expression compared to baseline are likely sufficient to yield generally normal megakaryocytes. Their findings also lead to specific future directions, such as the mechanism by which Shp1 regulates megakaryopoiesis and thrombopoiesis that is distinct from TPO-mediated signaling.

      Weaknesses:

      While the experiments have been thoughtfully designed and carried out, there is limited background explanation on relatively complex or niche pathways/mechanisms, such as the relationship between P-selectin, CRP, and PAR4p; the interactions between SFK, Syk, GPVI, and CLEC-2; and TPO, MPL, ERK1/2, AKT, and STAT3, which, while likely intuitive to experts in their respective fields, may be less obvious to a reader approaching this manuscript with a global interest in megakaryopoiesis/thrombopoiesis and thus detract from the impact of the findings.

      With regard to the science itself, some of the conclusions feel premature based on the available data.

      (1) The section "Aberrant ITAM signaling in Shp1- and Shp2-deficient platelets" is challenging to follow for those not well-versed in ITAM signaling and associated pathways, and may take additional outside reading to follow the conclusion that Syk-dependent signaling is modulated downstream of GPVI and CLEC-2 based on lack of change in Src p-Tyr418, especially considering that Src p-Tyr418 was previously introduced as a measure of SFK rather than Syk. In the introduction, Shp1 is specifically mentioned as a negative regulator of the ITAM/Syk/phospholipase pathway. However, in Figure 4Ai and Bi, Syk phosphorylation/activation in Shp1 knockout cells did not appear to be different from Shp2 knockout cells, and is lower than the control, which is surprising for a negative regulator. It is also not clear why, in the section (Figure 4A-B), there is reduced Syk activation in Shp1 and Shp2 single knockout cells upon CLEC2 stimulation (but apparently not with CRP) when there was no difference in response to CLEC2 (but a difference in response to CRP) in the previous section (Figure 3A, C).

      (2) In the section "Reduced Tpo signaling in Shp1/2-deficient MKs," only Western blot data for (p)ERK1/2, AKT, and STAT3 are presented before concluding that decreased ERK1/2 activity is a mechanistic explanation for thrombocytopenia seen in the Shp1/2 double-knockout condition. Such a statement would benefit from additional experiments, such as protein or transcriptional levels of ERK1/2 targets specifically relevant to megakaryopoiesis, such as ETS, FOS, and JUN, to assess the consequences of decreased phosphorylated ERK1/2.

      (3) Suggesting that "inhibiting Shp2 will not hav[e] any bleeding consequence in patients" and that Shp2 may be a therapeutic target in myeloproliferative neoplasms when none of these studies have been carried out in a human model is a bold conclusion. There are no data presented on, for example, whether Shp2 inhibition can help reverse the MPL/JAK/STAT pathway in the setting of gain-of-function mutations specifically associated with myeloproliferative neoplasms.

    5. Author response:

      eLife Assessment

      This manuscript provides an important contribution to the field of platelet biogenesis, and the convincing evidence will advance our understanding of signal transduction driving the development of late megakaryopoiesis and platelet reactivity that results in bleeding diathesis. The paper is noteworthy for analyzing two related, either singly or in combination, tyrosine phosphatases in this conditional, stage development gene knockout. Because SHP1 is a negative regulator and SHP2 is an activator, the synergistic effects found in the double knockout were surprising.

      We thank the reviewer for acknowledging the importance and novelty of our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Barré et al. investigated the role of Shp1 and Shp2 in megakaryocytes (MKs) and platelets by conditional knock-out of Shp1, Shp2, or both under the control of the Gp1ba promoter. Deletion of Shp1 and Shp2 in MKs and platelets was almost complete. The Shp1/Shp2 double knock-out mice displayed macrothrombocytopenia and increased bleeding, whereas the single knock-outs did not show significant defects. Platelet function was aberrant in DKOs, but not in single knock-outs, and so was ligand-induced signaling, particularly Syk phosphorylation.

      Megakaryocyte maturation was impaired in Shp1/Shp2 DKO mice. Ligand-induced signaling was impaired in Shp2 knock-out and DKO. Ex vivo formation of platelets and in vivo maturation of MKs were impaired in DKO mice. Pharmacological inhibitors of Shp1 and Shp2 had largely similar effects as observed in the single knock-outs. The authors conclude that Shp1 and Shp2 have synergistic functions in the MK/platelet lineage, and that Shp2 may be a potential therapeutic target in myeloproliferative neoplasms.

      Strengths:

      The data clearly show effects of the Shp1/Shp2 double knock-out on MKs and platelets.

      Weaknesses:

      There appears to be a discrepancy between the results with the Shp2 single knock-out and the Shp2 inhibitor: the Shp2 knock-out does not affect MKs and platelets, except Erk1/2 signaling, whereas the Shp2 inhibitors appear to affect MK function.

      This work is interesting and may have potential from a therapeutic point of view.

      Pharmacological effects do not always correlate with congenital anomalies arising from genetic defects. The Shp2 allosteric inhibitors used in our study only inhibit catalytically inactive Shp2, whereas targeted deletion of Ptpn11 results in a loss of total Shp2 expression, including catalytic and non-catalytic related functions, with developmental consequences. Further, Gp1ba-Cre+;Shp2fl/fl megakaryocytes express approximately 22% of normal Shp2 levels, which likely also contributes to differences observed between pharmacological inhibition and genetic ablation of Shp2.

      We thank the reviewer for recognizing the therapeutic potential of our findings.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Barré et al. investigate the roles of the phosphatases Shp1 and Shp2 in the megakaryocyte and platelet lineage using genetic depletion in mice. By employing Gp1ba-Cre-based models, the study builds on the authors' previous work and addresses some limitations associated with earlier Pf4-Cre approaches. The authors report relatively mild alterations in megakaryocyte and platelet parameters in mice lacking either Shp1 or Shp2 alone, whereas combined deletion of both phosphatases results in macrothrombocytopenia, mild bleeding, and impaired GPVI-dependent platelet aggregation accompanied by reduced Syk phosphorylation. The functional platelet defects are linked to reduced expression of GPVI and integrin α2, while thrombocytopenia is associated with impaired megakaryocyte maturation, reduced ploidy, defective proplatelet formation, and altered TPO-dependent Ras/MAPK signaling. Similar effects on megakaryopoiesis are also observed in vitro following treatment with newly developed Shp2 inhibitors.

      Strengths and Weaknesses:

      The study addresses an important biological question and presents a substantial dataset that could contribute to a better understanding of Shp1 and Shp2 function in platelet biology. However, several aspects of data presentation and interpretation would benefit from additional clarification. In particular, while the authors conclude that single genetic deletion or pharmacological inhibition of Shp1 has a limited impact and that the major phenotypes are specific to combined Shp1/2 deletion or Shp2 inhibition, some of the data suggest more nuanced effects that may warrant further discussion.

      We thank the reviewer for raising this point. The manuscript is being revised accordingly, including highlighting the potential role of Shp1 in megakaryopoiesis and thrombopoiesis under steady-state and stressed conditions, requiring more detailed investigation.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Barré et al utilize the Gp1ba-Cre transgenic mouse model to build upon previous findings in a Pf4-Cre system to investigate the effects of individual and combined Shp1 and Shp2 deletion in megakaryocytes and platelets. They report decreased megakaryocyte maturation, macrothrombocytopenia, and increased bleeding primarily in association with the Shp1/Shp2 double-knockout condition. The authors further show that this phenotype appears to be driven primarily by Shp2 and implicate dysregulation of Mpl signaling and downstream Ras/MAPK pathways, including ERK1/2. Given the key role of these pathways in human diseases such as myeloproliferative neoplasms and the challenges associated with modulating such a central pathway, identification of a specific regulator of Mpl signaling poses intriguing questions for future studies on clinical applicability.

      We thank the reviewer for acknowledging the importance and novelty of our findings.

      Strengths:

      Overall, the experiments combine in vitro, in vivo, and ex vivo approaches and appear to have been carefully designed and carried out, with multiple technical and biological replicates where relevant. The authors make a compelling argument for using the Gp1ba-Cre as opposed to the Pf4-Cre system and demonstrate both the dose- and stage-dependent effects of Shp1 and Shp2 on megakaryopoiesis and thrombopoiesis. They find that Shp1 and Shp2 are required in late-stage megakaryocyte maturation and that even low levels of expression compared to baseline are likely sufficient to yield generally normal megakaryocytes. Their findings also lead to specific future directions, such as the mechanism by which Shp1 regulates megakaryopoiesis and thrombopoiesis that is distinct from TPO-mediated signaling.

      Weaknesses:

      While the experiments have been thoughtfully designed and carried out, there is limited background explanation on relatively complex or niche pathways/mechanisms, such as the relationship between P-selectin, CRP, and PAR4p; the interactions between SFK, Syk, GPVI, and CLEC-2; and TPO, MPL, ERK1/2, AKT, and STAT3, which, while likely intuitive to experts in their respective fields, may be less obvious to a reader approaching this manuscript with a global interest in megakaryopoiesis/thrombopoiesis and thus detract from the impact of the findings.

      We thank the reviewer for raising this point. The manuscript is being revised to better explain the rationale and molecular mechanisms linking these pathways and functions.

      With regard to the science itself, some of the conclusions feel premature based on the available data.

      (1) The section "Aberrant ITAM signaling in Shp1- and Shp2-deficient platelets" is challenging to follow for those not well-versed in ITAM signaling and associated pathways, and may take additional outside reading to follow the conclusion that Syk-dependent signaling is modulated downstream of GPVI and CLEC-2 based on lack of change in Src p-Tyr418, especially considering that Src p-Tyr418 was previously introduced as a measure of SFK rather than Syk. In the introduction, Shp1 is specifically mentioned as a negative regulator of the ITAM/Syk/phospholipase pathway. However, in Figure 4Ai and Bi, Syk phosphorylation/activation in Shp1 knockout cells did not appear to be different from Shp2 knockout cells, and is lower than the control, which is surprising for a negative regulator. It is also not clear why, in the section (Figure 4A-B), there is reduced Syk activation in Shp1 and Shp2 single knockout cells upon CLEC2 stimulation (but apparently not with CRP) when there was no difference in response to CLEC2 (but a difference in response to CRP) in the previous section (Figure 3A, C).

      We thank the reviewer for raising these important points. The manuscript is being revised accordingly, including clarifying the roles of SFKs, Shp1 and Shp2 in the ITAM-Syk-PLCγ2 signaling pathway.

      Briefly, SFKs are essential for phosphorylating ITAMs, allowing SH2-dependent docking of Syk. Reduced reactivity of Shp1/2 DKO platelets to CRP and collagen is likely due to downregulation of the ITAM-containing GPVI-FcR γ-chain complex and integrin α2 subunit, and concomitant reduction in Syk phosphorylation.

      However, the cause of the marginal, albeit significant, reduction in Syk phosphorylation downstream of CLEC-2 in Shp1 and Shp2 KO platelets was not determined, and the reduction was insufficient to impact CLEC-2-mediated platelet aggregation under the conditions tested.

      Differences in the stoichiometry and docking of Syk to phosphorylated GPVI-FcR γ-chain and CLEC-2 likely contribute to the differences in platelet reactivity and Syk phosphorylation downstream of the two receptors in the absence of Shp1 and Shp2.

      (2) In the section "Reduced Tpo signaling in Shp1/2-deficient MKs," only Western blot data for (p)ERK1/2, AKT, and STAT3 are presented before concluding that decreased ERK1/2 activity is a mechanistic explanation for thrombocytopenia seen in the Shp1/2 double-knockout condition. Such a statement would benefit from additional experiments, such as protein or transcriptional levels of ERK1/2 targets specifically relevant to megakaryopoiesis, such as ETS, FOS, and JUN, to assess the consequences of decreased phosphorylated ERK1/2.

      We thank the reviewers for these constructive comments. Further experiments are being planned to determine the biological and transcriptional consequences of reduced ERK1/2 phosphorylation during megakaryopoiesis and thrombopoiesis.

      (3) Suggesting that "inhibiting Shp2 will not have any bleeding consequence in patients" and that Shp2 may be a therapeutic target in myeloproliferative neoplasms when none of these studies have been carried out in a human model is a bold conclusion. There are no data presented on, for example, whether Shp2 inhibition can help reverse the MPL/JAK/STAT pathway in the setting of gain-of-function mutations specifically associated with myeloproliferative neoplasms.

      This conclusion is being tempered in the revised manuscript. Genetic- and pharmacological-based approaches will be used to establish the therapeutic potential of inhibiting Shp1 and Shp2 in mouse models of MPN, including Jak2 gain-of-function mice. Bleeding and thrombotic complications of inhibiting Shp1 and Shp2 will be explored as part of these studies.

    1. eLife Assessment

      This study provides valuable findings in the study of enhancer biology by identifying and dissecting a minimal enhancer regulating dlx2b expression during zebrafish tooth development, supported by promoter dissection, reporter assays, and genome-editing approaches. The work offers a resource and extends previous findings but has limited broader impact, with several conclusions about general cis-regulatory principles and functional consequences remaining only partially supported. Accordingly, the strength of evidence is at present incomplete, as additional functional validation would be needed to fully substantiate some of the claims.

    2. Reviewer #1 (Public review):

      Summary:

      Jackman et al report the analysis of a cis-regulatory region upstream of the dlx2b gene in zebrafish, that is hypothesised to control gene expression in the developing tooth. To demonstrate this, the authors performed solid promoter bashing analysis to assess the gene expression driven by the regulatory region, and validated the expression against a GFP-reporter knock-in. They narrowed down the tooth-specific enhancer activity to the MTE, which was sufficient to drive gene expression. Interestingly, they have identified a vertebrate conserved region which contained four predicted transcription factor binding sites, which when mutated individually, did not alter the reported gene expression. However, in combination, the expression was disrupted. The authors propose a putative upstream regulator cebpa binding one of the predicted TFBS, using in situ hybridisation to show overlapping gene expression domains.

      Strengths:

      The experiments presented in this paper were rigorously executed and the authors' effort to systematically dissect the different elements of the enhancer are commendable. The discussion and limitations of the study were very well-balanced.

      First, the results represent important findings for the enhancer biology field, providing evidence of the role of redundant TFBSs. Too often, only TFBS mutations that are sufficient and necessary to drive gene expression patterns are reported, whereas work providing evidence that some TFBSs are necessary but not sufficient by themselves to drive expression is rarer. TFBS redundancy is a crucial concept in enhancer biology, but it is difficult to prove, which hinders the accurate prediction of enhancer function. In an era where increasingly powerful machine learning models are developed to predict enhancer function, this work is a reminder of the complexity of enhancer biology and provides ground truths for experimental validation.

      Second, the results present valuable findings for the field of tooth development. While the authors have comprehensively described work performed in this space, there are still not many tooth-specific enhancers identified and accurately described. The work also presents further avenues for studying upstream regulators.

      Weaknesses:

      It seems to me that one of the greatest outcomes of this work is demonstrating the collective action of mutated TFBSs where individual mutations are not affecting gene expression. These findings fall into the realm of enhancer redundancy but this concept was not thoroughly discussed in the introduction of the paper.

      The claimed results are generally well-supported by the experiments performed, and hypotheses and speculations have been clearly stated. However, some speculative statements remain that should be addressed, for example in the abstract line 33 "These findings suggest that loss of MTE function permits alternative cis-regulatory elements to gain control of the promoter". There is no data indicating what these cis-regulatory elements could be, hence this sentence might be better suited to the discussion.

      The manuscript could be strengthened by further exploration of the wider region upstream of dlx2b to support the recruitment of other TFBSs: Were there any other vertebrate-conserved regulatory regions just outside of the MTE? Were there any other family members of the predicted TFs expressed in the tooth? The identity of the transcription factor binding sites remains a prediction; it could be expanded to other TFs within the same family.

    3. Reviewer #2 (Public review):

      The manuscript by Jackman et al. explores the role of a candidate enhancer of dlx2b in zebrafish tooth formation.

      They have mapped the dental epithelium and mesenchyme activity of a 4kb promoter-proximal region previously identified as a candidate enhancer region. They identified candidate TFBSs and candidate transcription factors regulating this enhancer and proposed that their findings reveal principles of enhancer function during vertebrate organogenesis (tooth development) and the power of dissecting cis-regulatory architecture. The study offers a valuable genetic tagging resource for studying tooth development, while further experiments and analyses would be needed to support the suggestion of novel findings on cis-regulatory principles of tooth development. In the absence of functional evidence on the endogenous target gene or tooth development, some of the claims of the paper may need rephrasing.

      (1) The candidate enhancer region has previously been published; this study narrows the enhancer effect to a well-conserved region within it. To what degree the element is unique in the locus for tooth development, and to what degree this element is required for tooth morphogenesis, is not addressed.

      (2) The knock-in approach is convenient for reporter activity-based analyses; however, it lacks the precision that would be necessary to conclude on enhancer-autonomous effects or enhancer effects on the endogenous target promoter. The HSP promoter inserted within a 5kb(?) insert in the UTR region of dlx2b creates a chimeric E-P context. The expression profile of the knock-in reporter is substantially different from that of the endogenous gene (Figure 1B and C), suggesting an E-P interaction-dependent expression profile, which may confuse what in the expression comes solely from the enhancer and what results from the HSP promoter's interaction with the enhancer. An alternative heterologous promoter would help in defining the enhancer-specific effects.

      (3) Function of the candidate enhancer: The MTE enhancer effect is measured by gain of function towards dlx2b regulation. The deletion assays are limited to plasmids designed to test the enhancer in isolation from the endogenous enhancer architecture, or to a deletion in the knock-in, which may be impacted by the chimeric regulatory interaction with a heterologous HSP promoter. As a result, we do not learn whether the enhancer targets, or is needed for, endogenous target gene activity. This design allows a conclusion on the tissue activity of the enhancer but not on its requirement for tooth development.

      (4) Since the locus is scattered with candidate enhancers (see genome annotation resources), it is feasible that additional E-P interactions lead to potential enhancer redundancies with the MTE. For a conclusive functional test/requirement of the MTE enhancer, the authors would need to delete it in the endogenous locus context. The knock-in could theoretically be used to test enhancer function on dlx2b activity, if the authors show that there is interaction with the endogenous promoter (3C-type experiment) and that the MTE enhancer-driven GFP activity is identical to the endogenous tagged dlx2b activity. This does not appear to be the case, as ectopic expression is shown in Fig 1C compared to B. Of note, RNA detection by WISH would be more precise for comparisons. The figure likely compares protein (the legend is unclear, but the text suggests protein) to mRNA, which is imprecise.

      (5) There is an experimental design question arising from the generation of the MTE deletion in the knock-in (line 391): the authors describe generating the transgenic lines by screening for reduced reporter activity first. This suggests the authors pre-emptively looked for the effect they predicted when generating the transgenic lines, which would create a circular argument. All transgenic lines carrying the deletion (tested by sequencing first) would need to be assayed for activity change, and only then could a conclusion be made on the effect of MTE loss by statistical analysis of reporter activities in the generated lines.

      (6) Most of the transgenic work described is based on single transgenic lines. Enhancer-promoter contexts may be affected either by position effects (in the case of the reporter constructs) or by the heterologous promoter context of the knock-in, which may be affected by unexpected recombination events. Such unintended confounding effects can be excluded by replicates.

      (7) GFP protein detection does not allow precise spatio-temporal resolution due to varying protein stability in tissues, which potentially impacts endogenous gene activity comparison, and accurate determination of activity dynamics towards conclusions on lineage determining/maintenance roles of the dlx2b enhancer.

      (8) The expression pattern change upon MTE loss (retention of mesenchyme, loss of epithelium) is an interesting observation, which would benefit from a more comprehensive analysis of how the grammar (TFBS contributions) shapes the pattern variation, by dissecting combinations of TFBS contributions. Without such analysis, the enhancer grammar remains mostly unclear; thus, principles of morphogenesis may not have been uncovered.

      (9) The diagrammatic models of the conclusions illustrate simple logic that does not add to the text.

      (10) Author contributions need to be explained in more detail to be sufficiently granular for fair credit.

    4. Reviewer #3 (Public review):

      In the manuscript entitled "A Minimal tooth Enhancer Regulates dlx2b Expression During Zebrafish Tooth Formation: Insights into Cis-Regulatory Logic in Organogenesis", the authors explore the cis-regulatory logic of a dlx2b minimal enhancer capable of directing dlx2b gene expression to the developing tooth germs. The study combines (1) CRISPR-mediated GFP knock-in to track endogenous gene expression; (2) a promoter-bashing approach to identify a minimal tooth enhancer (MTE); (3) site-directed mutagenesis coupled with transgenesis to assess the individual role of conserved TF binding sites; and (4) in vivo deletion of the MTE to examine the consequences for gene expression. Overall, this is a technically solid study that provides some novel insights into tooth development and extends previous observations by the authors (Jackman & Stock, 2006; PNAS). However, the added value of the manuscript is limited by both the narrow experimental scope and the relatively modest impact of the findings for the broader field of developmental biology.

      Main concerns:

      (1) My main concern is that the study restricts the search for cis-regulatory information to the 5' region 4kb upstream of the TSS of the gene, rather than encompassing the full genomic locus. This is particularly limiting given that a knock-in allele was generated, which in principle allows interrogation of regulatory elements across the entire locus, and that the authors acknowledge the availability of genome-wide regulatory datasets (e.g. DANIO-CODE) in the Discussion. Despite this, no systematic effort is made to test additional regulatory elements beyond the proximal promoter/enhancers.

      This has important implications for the interpretation of the current work, as: (a) dlx2b, like many developmental genes, resides in a gene desert enriched in open chromatin regions that may function as distal enhancers, and (b) the deletion of the MTE unmasked a cis-regulatory activity whose nature cannot be explained with the information provided, and that may be relevant for the expression of the gene in the dental mesenchyme.

      (2) A second concern is the absence of information on the functional consequences of deleting the gene or the MTE on tooth primordium development. From the description of the KI strategy, it is unclear whether the GFP insertion results in a functional fusion protein. The cytoplasmic GFP distribution and the schematic in Figure S1 instead suggest the presence of a terminal stop codon in the GFP sequence, which would result in a dlx2b loss-of-function allele. If this interpretation is correct, the manuscript does not describe the developmental consequences in homozygous embryos. Similar concerns apply to the MTE deletion: it remains unclear whether loss of this enhancer results in any detectable morphological or developmental defects.

    1. eLife Assessment

      This fundamental study presents experimental evidence on how geomagnetic and visual cues are integrated in a nocturnally migrating insect. The evidence supporting the conclusions is compelling. The work will be of broad interest to researchers studying animal migration and navigation.