    1. eLife Assessment

      This important study addresses the contribution of pericytes to the organization and permeability control of the zebrafish blood-brain barrier (BBB). By analyzing pdgfrb mutant zebrafish that lack brain pericytes, the authors reveal that the resulting cerebrovascular network is abnormally patterned. Remarkably, however, the barrier retains its restrictive permeability during larval and juvenile stages. More pronounced vascular defects become evident in adults, where localized BBB leakage coincides with hemorrhages and aneurysm formation. Based on convincing and beautifully documented imaging data, the authors argue that, unlike what has been reported in rodent systems, pdgfrb-dependent pericytes are not essential for maintaining BBB integrity in the zebrafish brain.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, the authors show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood, as focal leakage, basement membrane disruption, and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with, for example, the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages, from larval to adult zebrafish, the study provides an unprecedentedly comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults, the presented findings add novel insights into how mural cell loss may lead to vascular instability.

    3. Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult mice using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092).

      The Reviewing Editor has carefully reviewed the revised manuscript and is fully satisfied with the authors' revisions.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, they show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood as focal leakage, basement membrane disruption, and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with, for example, the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages, from larval to adult zebrafish, the study provides an unprecedentedly comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults, the presented findings add novel insights into how mural cell loss may lead to vascular instability.

      Weaknesses:

      The study uses quantitative tracer assays with multiple molecular weight dyes to evaluate blood-brain barrier (BBB) permeability. The study normalizes the intensity of tracer signals (e.g., 10 kDa, 70 kDa dextrans) in the brain parenchyma to the vascular signal of a 2000 kDa dextran tracer (assumed to remain within vessels). Intensity normalization is used to control for variations in tracer injection efficiency or vascular density. This method doesn't directly assess the absolute amount of tracer present in the parenchyma, potentially underestimating leakage severity. As the lack of BBB impairment is a "negative" finding, more rigorous controls or other methods might be needed to corroborate it.

      In response to these and comments from other reviewers, we have now performed further carefully controlled analyses to test leakage of tracers with molecular weights ranging from 1 to 2000 kDa. We have applied an additional normalisation approach (new data in Fig. 2a–d), imaging tracer extravasation together with vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and using these transgenic reporters for normalisation (as suggested by Reviewer #2). The results of these experiments all supported our initial conclusions (revised Extended Data Fig. 3a–d), further validating the reliability of our method. Furthermore, as suggested by the reviewer, we also analysed the raw tracer intensity in the parenchyma with no normalisation at all (see Author response image 1). This also supports our conclusion that the BBB is intact in young animals. Finally, we now use our methods to demonstrate that we can detect an immature, leaky BBB at 3 dpf and a mature, functional BBB at 7 dpf (Fig. 2e–f), a suitable positive control showing that our methods and analyses are reliable.

      Author response image 1.

      Raw intensity values from the parenchyma confirm findings in Figure 2 and Extended Data Figure 3. a–d, Raw mean fluorescence intensity values of extravasated tracers in the midbrain. (a–b) show unnormalized values corresponding to Extended Data Fig. 3a–d, and (c–d) show unnormalized values corresponding to Fig. 1a–d. Unpaired t-tests were used for 70 and 10 kDa at 14 dpf in (a–b), and for 10 kDa at 7 dpf and 70 kDa at 14 dpf in (c–d). Mann-Whitney tests were used for 70 and 10 kDa at 7 dpf in (a–b), and for 70 kDa at 7 dpf and 10 kDa at 14 dpf in (c–d), due to non-normal distribution. These data were all generated in genotype-blind assays; they display embryo-to-embryo variance in signal arising from injection differences and show no difference in BBB integrity between the genotypes analysed. Comparison of these data with data normalised to the 2000 kDa tracer or to kdrl expression in endothelial cells (Fig. 2 and Extended Data Fig. 3) confirms that normalisation improves the analysis, effectively controlling for embryo-to-embryo differences in tracer delivery and imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult mice using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092). The study by Ando et al, 2021 did not report experiments assessing BBB leakage in pdgfrb mutants but in the review article by Ando et al (PMID: 34685412) it is stated that "indicating that endothelial cells can produce basic barrier integrity without pericytes in zebrafish."

      We thank the reviewer for their comments and for pointing out literature that we had not cited (this has been corrected in our revised manuscript).

      As noted by other reviewers, our study goes beyond simply confirming previous literature. The section quoted by the reviewer from Ando et al 2021 regarding intact barrier integrity in pdgfrb mutants is a conclusion based on an apparent lack of haemorrhages in pdgfrb mutants[1]. Our work shows haemorrhages in older animals and as such is in line with these previously published results, but it also extends previous work, for the first time reporting detailed functional analysis to assess BBB integrity. Our study uses definitive tracer assays (now including extensive revisions) to identify an intact BBB in pdgfrb mutants in live animals. This has not been previously described and is important because it offers a new perspective on the evolutionary conservation (or otherwise) of pericyte control of BBB function. Furthermore, our study investigates the nature of hotspot leakage and haemorrhages in more detail than previous work.

      Weaknesses:

      (1) The authors should avoid using violin plots, which show distribution. Instead, they should replace all violin plots in the figures with graphs showing individual data points and standard deviation. For Figure 2f specifically, the standard deviation in the analyzed cohort should be shown.

      This is a good point, and we have replaced the violin plots with individual data points and now show all data as mean ± SEM.
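      The reviewer asked for standard deviation, while the response reports mean ± SEM; the two describe different things. The sketch below shows how they relate for a single group of per-animal values. The function name `summarize` and the numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sd, sem) for one group of per-animal measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator): spread of the data
    sem = sd / math.sqrt(n)        # SEM: uncertainty of the mean, shrinks as n grows
    return mean, sd, sem


# Hypothetical normalised intensities, one value per animal (illustrative only).
group = [0.20, 0.24, 0.22, 0.26, 0.18]
mean, sd, sem = summarize(group)
print(f"mean = {mean:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

      Because SEM is SD divided by the square root of the group size, error bars drawn as SEM are always narrower than SD bars for the same data, which is worth keeping in mind when reading the revised plots.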

      (2) The authors have not shown the reduced PDGFRB protein or the effect of mutation on mRNA level in their zebrafish mutant.

      Our pdgfrb<sup>uq30bh</sup> mutant allele introduces a mutation predicted to generate a truncated protein very similar to previously validated alleles (see detail in revised Extended Data Fig. 1a and methods). Our pdgfrb<sup>uq30bh</sup> mutant also phenocopies previous pdgfrb mutants (sa16389 and um148 alleles)[2,3], displaying mural cell loss with multiple markers (Fig. 1a, new data in Extended Data Fig. 1b–c, Fig. 3b–c; Extended Data Fig. 4c–d) and the same typical morphological defects and survival rates (new data in Extended Data Fig. 1d–f). This phenocopy gives us confidence that our allele is most likely a null, in line with previous papers studying presumed null alleles[1].

      We believe this provides sufficient confidence in this allele of pdgfrb. Moreover, considering that our manuscript focusses on loss of mural cells and we show definitively that this mutant has robust loss of mural cells in the brain, our mutant is suitable for this study.

      (3) Statistical data analysis: Did the authors perform analyses to investigate whether the data has a normal distribution (e.g., Figures 1d, e)?

      We thank the reviewer for raising this and apologise for this oversight. All data have now been assessed for normality using the Shapiro-Wilk test, and further statistical analyses have been performed accordingly. The specific quantifications referred to by the reviewer, now in Extended Data Fig. 3a–d (previously Fig. 1d–e), are normally distributed except for the quantification of 70 kDa extravasation at 7 dpf; a Mann-Whitney test has therefore been used for this comparison. Further information can be found in the figure legends and methods.

      (4) Analysis of tracer extravasation. The use of 2000 kDa dextran intensity as an internal reference is problematic because the authors have not provided data demonstrating that the 2000 kDa dextran signal remains consistent across the entire vasculature. The authors have not provided data demonstrating that the 2000 kDa dextran signal in vessels exhibits acceptable variance across the vasculature to serve as a reliable internal reference. The variability of this signal within a single animal remains unknown. The presented data do not address this aspect.

      We thank the reviewer for their comment and agree that analysis was needed for showing 2000 kDa dextran as a reliable normalization signal.

      We now show the data in the following Figures that demonstrate the consistency of signal throughout the vasculature using this 2000-kDa tracer: Extended Data Fig. 2b, Extended Data Fig. 3a and c, Extended Data Fig. 5a, Extended Data Fig. 6. In fact, we observe that this 2000 kDa tracer provides a very reliable marker of large and small calibre vessels in larval, juvenile and adult animals, even in fixed and cleared whole tissues and animals (e.g. Extended Data Fig. 2d-e, Extended Data Fig. 5 and 6).

      Our further experiments and analysis support the use of this tracer as an ideal way to normalise for variation between animals. Coupled with improved masking of vessels using transgenic labels (e.g. Extended Data Fig. 2b), this lets us quantify across whole vascular networks, reducing the concern about variation within individual animals. We also find that the 2000 kDa tracer shows negligible leakage through the brain vessels at 2 hours post-injection (hpi) (new data, Extended Data Fig. 2b–c), and we provide images in Extended Data Fig. 6b–b′′ showing detectable signals even at 6 hpi. Finally, results generated with this approach, with normalisation to transgenic markers, or even with raw parenchymal values of tracer intensity all lead to the same conclusions. In addition, we point the reviewer to a recent pre-print from our team that further validates this method[4].

      Overall, we find the use of this tracer an ideal way to normalise for differences in injection volumes between animals and we recommend the use of this method to other groups assessing BBB leakage in zebrafish.

      Additionally, it's intriguing that the signal intensity in the parenchyma of the tested tracers presents a substantial range, varying by 20-30% in the analysed cohort (Figure 1g, Extended Figure 1e). Such large variability raises the question of its origin. Could it be a consequence of the normalization to 2000 kDa dextran intensity which differs between different fish? Or is it due to the differences in the parenchymal signal intensity while the baseline 2000 kDa intensity is stable? Or is the situation mixed?

      This is a good point raised by the reviewer.

      To address this, we have used the following approaches:

      (1) We provide additional experiments and normalisation methods that support the utility of our tracer studies (new data in Fig 2a–f and Extended Data Fig. 2b–c), discussed in detail below.

      (2) We provide graphs of the raw parenchymal distribution of tracer not normalised at all (also requested by reviewer 1). This is provided in Author response image 1 and further supports all our conclusions, showing that our normalisation methods generate meaningful data.

      Overall, the range of parenchymal intensity that we see after tracer injection and live imaging reflects variation introduced during microinjection. However, these ranges are in line with previous publications using similar methods (see studies by O’Brown et al 2019 and 2023)[5,6], allow reliable statistical comparisons to be drawn between controls and mutants, and allow us to detect both immature and functional BBB states during zebrafish development (new data in Fig. 2e–f).

      Of note, the variability we see is likely introduced during the injection process into tiny larval blood vessels and is the reason why we perform normalization of parenchymal tracers to a vascular dextran signal that doesn’t leak from brain vessels. In our studies, 2000-kDa dextran has been co-injected with the smaller size tracers, therefore any potential differences in injection volumes as well as imaging conditions (however consistent) should be reduced by this method.
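      The rationale for normalising to a co-injected vascular tracer can be sketched numerically. The example below is a minimal illustration, not the authors' analysis pipeline: the function name, the pixel values, and the simplifying assumption that both tracer signals scale linearly with injected volume are all hypothetical.

```python
from statistics import fmean


def normalized_extravasation(parenchyma_pixels, vessel_reference_pixels):
    """Ratio of mean parenchymal tracer signal to mean vascular reference signal."""
    reference = fmean(vessel_reference_pixels)
    if reference <= 0:
        raise ValueError("vascular reference signal must be positive")
    return fmean(parenchyma_pixels) / reference


# Hypothetical pixel intensities for one larva (illustrative values only).
parenchyma_10kda = [12.0, 15.0, 9.0, 14.0]       # small tracer in the parenchyma
vessel_2000kda = [200.0, 220.0, 180.0, 200.0]    # large tracer retained in vessels

baseline = normalized_extravasation(parenchyma_10kda, vessel_2000kda)

# Assumption: a larva receiving 1.5x the injected volume shows both co-injected
# signals scaled by the same factor, so the factor cancels in the ratio.
scale = 1.5
scaled = normalized_extravasation(
    [scale * v for v in parenchyma_10kda],
    [scale * v for v in vessel_2000kda],
)
print(baseline, scaled)
```

      Under that linearity assumption the ratio is unchanged by injection volume, whereas the raw parenchymal mean would differ by the full scaling factor between the two larvae.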

      An alternative and potentially more effective approach would be to cross the pdgfrb mutant line with a line where endothelial cells are genetically labeled to define vessels (e.g. the line kdrl used in acquiring data presented in Figure 2a). Non-injected controls could then be used as a baseline to assess tracer extravasation into the parenchyma.

      We thank the reviewer for this suggestion.

      In response, we have performed new tracer leakage experiments at 7 and 14 dpf in siblings and pdgfrb mutants and quantified parenchymal tracer extravasation by normalizing to vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). The results were in line with the previously presented, independent experiments and showed indistinguishable phenotypes between siblings and pdgfrb mutants (new data, Fig. 2a–d). We also used uninjected controls to assess baseline and saw consistent values approaching zero in these images, so we did not include them in the revised paper.

      Furthermore, we have also used this approach in wild-type larvae at 3 dpf (immature BBB) and 7 dpf (functional BBB)[5]. We detected significantly higher parenchymal extravasation of 10 and 70 kDa tracers at 3 dpf compared to 7 dpf, demonstrating that our method can detect leakage (new data, Fig. 2e–f).

      We believe that both normalization approaches have advantages (as discussed above), therefore showing the same results with these two different approaches has further strengthened our findings.

      How is the data presented in Figure 3e generated? How was the dextran intensity calculated? It looks like the authors have used the kdrl line to define vessels. Was the 2000 kDa still used as in previous figures? If not, please describe this in the Materials and Methods section.

      We have moved this data to Fig. 4e (previously Fig. 3e).

      Previously, we had plotted raw data because the experiment was conducted on vibratome-sectioned tissue. The 2000 kDa tracer was not used. In response to this query, and to be consistent with the new approach suggested by the reviewer, we have revised the quantification by normalizing the 10 kDa tracer extravasation to Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup> for this and the new experiments on juveniles (Fig. 5h–i). Please see the corresponding figure legends or revised methods (lines 464–472).

      (5) The authors state that both controls and mutants show extravasation of 1 kDa NHS-ester into the parenchyma. However, the presented images do not illustrate this; it is not obvious from these images (Extended Data Figure 1c). Additionally, the presented quantification data (Extended Data Figure 1e) do not show that, at 7 dpf, the vasculature is permeable to this tracer. Note that the range of signal intensity of the 1 kDa NHS-ester is similar to the 70 kDa dextran (Figure 1g and Extended Figure 1e). Would one expect an increase in the ratio in case of extravasation, considering that the 2000 kDa dextran has the same intensity in all experiments? Please explain.

      We thank the reviewer for raising this important point.

      To clarify, we have never claimed that “2000-kDa dextran has the same intensity in all experiments”. On the contrary, vascular 2000 kDa normalization has been used to account for potential differences caused by injection, as stated in the submitted supplementary materials and now made more clear in the revision.

      In response to this query, we conducted a more detailed analysis of tracer extravasation patterns based on molecular weight (new data, Extended Data Fig. 2b–c). This analysis showed that 1- and 10-kDa tracers have a much higher extravasation rate than 70- and 2000-kDa tracers. Interestingly, we did not find a significant difference between 1 and 10 kDa extravasation. Therefore, in the revised manuscript we used only 10 kDa in further experiments and have removed 1 kDa from the figures.

      To assess the tracers individually (new data in Extended Data Fig. 2c), parenchymal extravasation of each tracer was normalised to its own vascular signal (e.g., mean intensity of 10 kDa in midbrain / mean intensity of 10 kDa in vasculature) to account for potential differences in injection volume. This provides a suitable method to assess leakage in wild-type animals and is now in line with how previous studies have analysed such tracer injections[5,6]. Please see revised figure legends and supplementary materials for details.

      (6) The study would be strengthened by a more detailed temporal analysis of the phenotype. When do the aneurysms appear? Is there an additional loss of VSMC?

      We thank the reviewer for this suggestion, and we have now performed staged imaging of the pdgfrb mutants and siblings between 7 and 21 dpf using the TgBAC(acta2:EGFP)<sup>uq17bh</sup> transgene (new data, Fig. 3b–c; Extended Data Fig. 4a–d). Consistent with previous results, acta2:EGFP-positive cells surrounding the middle mesencephalic central arteries (MMCtA) were missing in pdgfrb mutants. At 21 dpf, we also observed a mild dilation of these vessels, likely the earliest changes to generate aneurysms (new data, Fig. 3c).

      To extend the number of stages analysed in this study, we have also performed new tracer leakage experiments in juveniles (30 dpf) and found that aneurysms can be detected at this age when the 10 kDa tracer is used (new data in Fig. 5b–b′). Consistent with the adult stage phenotype, aneurysms were limited to the larger calibre vessels (arteries) in the brain. We also observed hotspots and, upon quantification, found fewer in juveniles than in adults, suggesting that the severity of aneurysms and hotspots increases with age.

      Taken together, our results show that the aneurysms in pdgfrb mutants start appearing at late larval/early juvenile stages (~21 dpf) as observable dilations. By 30 dpf, aneurysms accompanied by small numbers of hotspots are observed, and the hotspots increase significantly in number by adulthood. This also correlates with reduced development and survival rate of pdgfrb mutants after 30 dpf (new data, Extended Data Fig. 1d–e).

      (7) The authors intended to analyze the BBB at later stages (line 128), but there is not a significant time difference between 2 months (Figure 2) and 3 months (Figure 3) considering that zebrafish live on average 3 years. Therefore, the selection of only two time-points, 2 and 3 months, to analyze BBB changes does not provide a comprehensive overview of temporal changes throughout the zebrafish's lifespan. How long do the pdgfrb mutants live?

      Respectfully, zebrafish transition from juvenile stages to adulthood between 2 and 3 months and there are many significant differences in the physiology of this organism at these two ages. At 2 months, zebrafish are still juveniles undergoing metamorphosis with rapid growth and ongoing skeletal and vascular development. By 3 months, they are sexually mature adults and have much more developed cranioskeletal and vascular systems. Having said that, we take the reviewer's important point that further temporal resolution would improve the study.

      We have performed new experiments in 1-month-old animals and provided comprehensive analysis of the vascular phenotypes occurring in pdgfrb mutants. These were very informative experiments analysing leakage using 10-kDa tracer injections and have significantly improved the study. We had previously provided experiments in 5-month-old adults as well (previously Fig. 4a–b and Extended Data Fig. 4a), so the study now includes larval stages (7, 14 dpf), juveniles at 1 and 2 months, and adults at 3 and 5 months. While the additional timepoints did not yield any new conclusions, they significantly strengthened the body of work overall.

      Of further note, we provided survival data up to 90 dpf where survival of the pdgfrb mutants is significantly reduced compared to siblings (Extended Data Fig. 1e). We believe this is associated with the severity of the aneurysms and haemorrhages which probably lead to lethality in these mutants.

      (8) Why is there a difference in tracer permeability between 2 and 3 months (Figures 2 and 3)? Are hemorrhages not detected in 2-month-old zebrafish?

      In response to this and other queries, we have added experiments that provide more detailed temporal analysis of tracer accumulation (new data in Fig. 5b–c, Fig. 5f–g).

      In short, we do not see obvious haemorrhages in 1- or 2-month fish at a gross level during dissections (not shown). We find that using 10-kDa tracer, we can detect small hotspots at aneurysms as early as 1 month, likely representing the earliest loss of integrity. We do not see obvious hotspots in 2-month-old animals when we use the 70-kDa tracer, which suggests to us that it is less sensitive for hotspot detection (in line with new Extended Data Fig. 2c). Finally, we find that the number of hotspots increases dramatically from juvenile to adult stages in our datasets, which we take as indicative of a progressive phenotype.

      Overall, tracer size matters for detecting hotspots, and they become more apparent in older animals. We have added a note in the main text to cover these points (lines 200–205).

      (9) Figure 3: The capillary bed should be presented in magnified images as it is not clearly visible. Figure 3e shows that in the pdgfrb mutant the dextran intensity is higher also in regions 6-10. How do the authors explain this?

      We thank the reviewer for raising this important point.

      Firstly, we now include enlarged views of the capillary beds for this experiment (Fig. 4d′) and new experiments mentioned below.

      Secondly, in relation to why there is higher tracer in lateral locations and not just medial sites of haemorrhage, we believe that this is most likely due to the progressive spread of tracer from the medial hotspots. To test whether this is likely, we performed additional experiments and tested tracer accumulation at 2 different timepoints in brains collected at 0.5 or 6 hpi (new data in Fig. 5f–g, Extended Data Fig. 6a–b′′). Tracer accumulation at 0.5 hpi was very minimal and was primarily limited to hotspots and nearby regions (new data in Fig. 5h), whereas a higher tracer accumulation in brains was observed across medial to lateral regions at 6 hpi (new data in Fig. 5i) in pdgfrb mutants. Comparing the data in Figure 4 (2 hpi) and new data in Figure 5i (6 hpi), the 10 kDa tracer appears to have spread to more lateral locations given the increased time allowed post injection.

      We cannot formally exclude the possibility that tracer leakage occurs more slowly through capillaries than at major hotspots, which might fit with the proposed model of slow leakage via increased EC transcytosis[7-9]. However, considering that we cannot detect increased tracer accumulation in pdgfrb mutants that lack aneurysms and haemorrhages at 7 and 14 dpf, such a scenario would require capillary transcytosis to be active at later juvenile and adult stages but not in larval and late larval animals. Thus, we believe the most plausible explanation is that aneurysm/haemorrhage-associated leakage is the primary cause of the vascular integrity defects in zebrafish pdgfrb mutants.

      We have added discussions addressing this in the revised manuscript (lines 220–230, 300–302).

      (10) In general, the manuscript would benefit from a more detailed description of the performed experiments. How long did the tracer circulate in the experiments presented in Figures 2, 3, and 4?

      We thank the reviewer for this suggestion and have now ensured that this is clearly described in the figure legends and methods (lines 391–395).

      (11) How do the authors explain the poor signal of the 70 kDa dextran from the vasculature of 5-month-old zebrafish presented in Extended Data Figure 3?

      We agree that the dextran signal was reduced compared to the other experiments in that Figure. This is likely due to sample preparation and clearing causing reduced fluorescence. Given that the additional experiments using 10 kDa tracers provide further validation for our claims, we decided to remove these data from the paper.

      (12) The study would benefit from a clear separation of the phenotypes caused by the loss of VSMC. The title implies that capillaries also present hemorrhages, which is not the case. How do vascular mural cells differ from mural cells? Are there any other mural cells?

      We take the reviewer's point and have now updated the title to "Mural cells protect the adult brain from haemorrhage but do not control the blood-brain barrier in developing zebrafish."

      (13) I have a few comments about how the authors have interpreted the literature and why, in my opinion, they should revise their strong statements (e.g., the last sentence in the abstract).

      Scientists have their own insights and interpretations of data. However, when citing published data, it should be clearly indicated whether the statement is a direct quote from the original publication or an interpretation. In the current manuscript, the authors have not correctly cited the data presented in the two published papers (references 5 and 6). These papers do not propose a model where pericytes suppress "adsorptive transcytosis" (lines 73-76). While increased transcytosis is observed in pericyte-deficient mice, the specific type of vesicular transport that is increased or induced remains unknown.

      Similarly, lines 151-152 refer to references 5 and 6 and use the term "adsorptive transcytosis," but the authors of both papers did not use this term. Attributing this term to the original authors is inaccurate. Additionally, lines 152-153 do not accurately represent the findings of references 5 and 6. These papers do not state that there is an induction of "caveolae" in endothelial cells in pericyte-deficient mice. In the absence of pericytes, many vesicles can be observed in endothelial cells, but these vesicles are relatively large. It is more likely that there is some form of uncontrolled transcytosis, perhaps micropinocytosis. Please refer to the original papers accurately.

      We thank the reviewer for these comments. We take the point and have rewritten the manuscript carefully to improve accuracy and avoid misrepresenting any previous claims made in specific papers.

      Also, the authors have missed the fact that in mice, the extent of pericyte loss correlates with the extent of BBB leakage. To a certain extent, the remaining pericytes can compensate for the loss by making longer processes, thereby ensuring full longitudinal coverage of the endothelium. This was shown in the initial work of Armulik et al (reference 5) and later in other studies.

      We certainly did not miss this important point (as we are also working with these mouse models) and we now include reference to this in our expanded discussion. Of note, we do think it would be worthwhile assessing if the extent of BBB leakage and pericyte coverage also correlates with the presence of microhaemorrhages in these hypomorphic mouse models, although this is more challenging to do in mice than in zebrafish.

      The bold assertion on lines 183-187 that a lack of specific BBB phenotype in pdgfrb zebrafish mutant invalidates mouse model findings is unfounded. Despite the notion that zebrafish endothelium possesses a BBB, I present a few examples highlighting the differences in brain vascular development and why the authors' expectation of a straightforward extrapolation of mouse BBB phenotypes to zebrafish is untenable.

      In mice Pdgfrb knockout is lethal, but in zebrafish, this is not the case. In marked contrast to mice, however, zebrafish pdgfrb null mutants reach adulthood despite extensive cerebral vascular anomalies and hemorrhage. Following the authors' argumentation about the unlikely divergence of zebrafish and mice evolution, does it mean that the described mouse phenotype warrants a revisit and that the Pdgfrb knockout in mice perhaps is not lethal? Another example where the role of a gene product is not one-to-one, which relates to pericyte development, is Notch3. Notch3-null mice do not show significant changes in pericyte numbers or distribution, suggesting a less prominent role in pericyte development compared to zebrafish.

      Although many aspects of development are conserved between species, there are significant differences during brain vascular development between zebrafish and mice. These differences could reveal why the BBB is not impaired in zebrafish pdgfrb mutants. There is a difference in the temporal aspect when various cellular players emerge. The timing of microglia colonization in the brain differs. In mice, microglia colonization starts before the first vessel sprouts enter the brain, while in zebrafish, microglia enter after. Additionally, microglia in zebrafish and mice have a different ontogeny. In mice, astrocytes specialize postnatally and form astrocyte endfeet postnatally. In zebrafish, radial glia/astrocytes form at 48 hpf, and as early as 3 dpf, gfap+ cells have a close relationship with blood vessels. Thus, these radial glia/astrocyte-like cells could play an important role in BBB induction in zebrafish. It's worth noting that in Drosophila, the blood-brain barrier is located in glial cells. While speculative, these cells might still play a role in zebrafish, while the role of pericytes does not seem to be crucial. Pericytes enter the brain and contact with developing vasculature (endothelium) relatively late in zebrafish (60 hpf). In mice, the situation is different, as there is no such lag between endothelium and pericyte entry into the brain. I suggest that the authors approach the observed data with curiosity and ask: Why are these differences present? Are all aspects of the BBB induced by neural tissue in zebrafish? What is the contribution of microglia and astrocytes?

      Another interesting aspect to consider is the endothelial-pericyte ratio and longitudinal coverage of pericytes in the zebrafish brain, and how this relates to what is observed in mice. How similar is the zebrafish vasculature to the mouse vasculature when it comes to the average length of pericytes in the zebrafish brain? Does the longitudinal coverage of pericytes in the zebrafish brain reach nearly 100%, as it does in mice?

      Based on the preceding arguments, it is recommended that the authors present a balanced discussion that provides insightful discussion and situates their work within a broader framework.

      Overall, we agree with most of the points made by the reviewer above. As we have now extended the format of this paper to be a full article, we have space to provide an extended discussion and introduction. We now try to capture many of the points made by the reviewer and we think that this has significantly improved the paper. We thank the reviewer for this contribution.

      We do want to point out that we did not state that our findings using zebrafish pdgfrb mutants invalidate mouse model findings. We suggest that a deeper analysis to understand the nature of the hotspots in mural cell deficient mammalian models could be very interesting in light of the zebrafish observations. We hope that the revised discussion better reflects this.

      Reviewer #3 (Public review):

      This manuscript examines the role of pdgfrb-positive pericytes in the establishment and maintenance of the blood-brain barrier (BBB) in the zebrafish. Previous studies in PDGFB- or PDGFRB-deficient mice have suggested that loss of pericytes results in disruption of the BBB. The authors show that zebrafish pdgfrb mutant larvae have an intact BBB and that pdgfrb mutant adult fish show large vessel defects and hemorrhage but do not exhibit substantial leakage from brain capillaries, suggesting loss of pericytes is not sufficient to "open" the BBB. The authors use beautiful and compelling images and rigorous quantification to back up most of their conclusions. The imaging of the adult brain is particularly nice. The authors rigorously document the lack of BBB leakage in pdgfrb<sup>uq30bh</sup> mutant larvae and large vessel phenotypes (eg, enlargement and rupture) in pdgfrb<sup>uq30bh</sup> mutant adults. A few points would help the authors to further strengthen their findings contradicting the current dogma from rodent models.

      We appreciate the reviewer's comments on the manuscript overall and agree that addressing the raised points was needed to strengthen our findings. We have addressed the main points below and believe that this revision greatly improves this study.

      Major point:

      The authors document pericyte loss using a single TgBAC(pdgfrb:egfp)<sup>ncv22</sup> transgenic line driven by the promoter of the same gene mutated in their pdgfrb<sup>uq30bh</sup> mutants. Given their findings on the consequences of pericyte loss directly contradict current dogma from rodent studies, it would be useful to further validate the absence of brain pericytes in these mutants using one of several other transgenic lines marking pericytes currently available in the zebrafish. This could be done using pdgfrb crispants, which the authors show nicely phenocopy the germline mutants, at least in larvae. This would help nail down the absence of any currently identifiable pericyte population or sub-population in the loss of pdgfrb animals and substantially strengthen the authors' conclusions.

      We thank the reviewer and agree that examination of pdgfrb<sup>uq30bh</sup> mutants using another transgenic line labelling pericytes would further validate the absence of brain pericytes. We generated a transgenic line, TgBAC(abcc9:abcc9-T2A-mCherry)<sup>uom139</sup>, to visualise pericytes and validated the absence of brain pericytes in the pdgfrb mutants (revised Extended Data Fig. 1b). The loss of brain pericytes matched our findings using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line as well as previously published data by Ando et al (2016 and 2021), where brain pericytes were missing except at the metencephalic artery[2,3].

      Other issues:

      The authors should provide more information about the pdgfrbuq30bh mutant and how it was generated (including a diagram in a supplemental figure would be useful).

      We thank the reviewer for this suggestion. In addition to the explanations provided in the supplementary materials, we have added a schematic and provided Sanger sequencing results showing the mutation, as well as the predicted effect of the mutation on the protein domains (Extended Data Fig. 1a).

      It would be helpful to show some data on whether mutants show morphological phenotypes or developmental delay at 7 and 14 dpf, to provide some context to better assess the reduced branching and vessel length vascular phenotypes (see Figures 1c-e).

      We thank the reviewer for this suggestion. We have provided further details on body length and survival of the pdgfrb mutants until 90 dpf. As reported by Ando et al 2021, we did not observe any distinguishing feature until about 30 dpf[1,3]. The adult anatomy of our mutant allele matches that of previously described null mutants and is now shown (Extended Data Fig. 1f).

      If available, it would be helpful to have a positive control for the tracer leakage experiments - a genetic manipulation that does cause disruption of the BBB and leakage at 2 hours post-tracer injection (see Figures 1f and g).

      We thank the reviewer for this suggestion and agree that a positive control would validate reliability of our method. We have performed new experiments at 3 dpf when BBB integrity is not yet established and at 7 dpf when BBB is functional in zebrafish[5], testing both 10 and 70 kDa tracers (new data in Fig. 2e–f). We detected significantly higher tracer accumulation at 3 dpf, showing that our methods can detect tracer leakage in the brain.

      Quantification of the findings in Figure 4c, d would be useful, as would the use of germline fish for these experiments if these are now available. If this is not possible, it would be helpful to document that the crispants used in these experiments lack pdgfrb:egfp pericytes at adult stages (this is only shown for 5 dpf larvae, in Extended Data Figure 4b).

      We thank the reviewer for this comment. Using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line, we have imaged coronal brain sections collected from 10-week-old pdgfrb crispants and uninjected siblings (age-matched animals used in Fig. 5d–e, previously Fig. 4c–d). We have now included data showing that adult pdgfrb crispants lack brain mural cells, phenocopying pdgfrb<sup>uq30bh</sup> mutants (new data, Extended Data Fig. 6f). These particular crispants are very reliable in our hands and nicely reproduce stable mutant phenotypes, giving us confidence to use the faster F0 approach in this experiment.

      Adult mutants clearly show less dye leakage in the more superficial capillary regions than WT siblings, but dextran intensity is a bit higher, although this could well be diffusion from more central brain regions where overt hemorrhage is occurring. Along similar lines though, the authors' TEM data in Extended Data Figure 4d hints that there may be more caveolae in mutant brain capillaries, although the N number was lower here than for the measurements from TEM of larger central vessels (Figure 4g). It would be useful to carry out additional measurements to increase the N number in Figure 4d to see whether the difference between wild-type sibling and mutant capillary caveolae numbers remains as not significant.

      We thank the reviewer for raising these important points and suggestions.

      Firstly, in relation to signal in capillary regions and likely diffusion from hotspots, please see the response to reviewer 3 point 9 above.

      Secondly, we have imaged and analysed more capillaries in both pdgfrb mutants and siblings (Extended Data Fig. 7a–b, previously Extended Data Fig. 4d). The results showed no significant difference between these groups, suggesting that capillary EC transcytosis is unchanged in our pdgfrb mutants.

      It might be helpful to include some orienting labels and/or additional descriptions in the figure legends to help readers who are not used to looking at zebrafish brain vessels have an easier time figuring out what they are looking at and where it is in the brain.

      We thank the reviewer for this suggestion and agree that adding further information in the figure legends and illustrations about orientation would make it easier for readers. In addition to the information provided in the figure legends of the submitted version, we have added an illustration, added more labels to the revised figures, and extended the descriptions in the figure legends, main text and methods.

      We have added a schematic depicting the tracer leakage assay workflow, orientation of live imaging and analysed region of interest (Extended Data Fig. 1a–b).

      All figure legends have been updated with the anatomical position and microscopy view.

      Additional labels on figures have been added to understand the referenced vessel names (new data in Fig. 3c and Extended Data Fig. 4a–b′).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study uses the intensity of tracer signals within the vessels to analyze BBB permeability, potentially underestimating leakage severity. The dye intensity is measured 2 hours after injection; however, other studies have already observed leakage after 30 minutes, by imaging directly in the brain parenchyma. The overall intensity should also decrease through leakage from the other vessels of the body, e.g. in the trunk and tail. Probably the loss of intra-vascular dye intensity from leakage in barrier-free vessels is already so high (after 2 hours) that the smaller amount of leakage across the BBB cannot be observed.

      We thank the reviewer for this comment and suggestion. We agree that small tracers leak from the vasculature, particularly through fenestrated vessels in the trunk and tail. We have based our timing on previous studies and our own experience. In zebrafish, the study by O’Brown et al 2019 also used 2 hpi[5] for detection of leakage in mfsd2aa mutants; mfsd2aa has also been proposed to regulate BBB integrity by controlling EC transcytosis. Therefore, we believe that performing experiments at 2 hpi is appropriate to investigate the roles of pericytes in BBB integrity. Our data suggest that this timing works.

      In response to this and other comments, we performed further experiments and analyses to test leakage of tracers with molecular weights ranging from 1 to 2000 kDa, each tested individually. We showed that these tracers can reliably be detected in brain parenchyma and vasculature when imaged at 2 hpi. In another study, we showed that medium-sized tracers such as 40 kDa Dextran can be reliably detected in the vasculature at similar timepoints[10]. Considering that our experiments using 10 and 70 kDa tracers detect both parenchymal tracer accumulation and tracer still within the vessels, we believe this timepoint is appropriate for assessing BBB integrity in zebrafish.

      In addition to these experiments, see our tracer leakage experiments in 1-month-old animals, at 0.5 and 6 hpi, which test the leakage pattern described above (Fig. 5 and Extended Data Fig. 6).

      Therefore, the authors will need to validate their method of choice by showing an impairment of the BBB caused by other agents (known to affect the BBB) and at 48 hpf, when the BBB is not yet tightened. One example of BBB impairment can be found in O'Brown et al (2019), eLife 8:e47326. doi: 10.7554/eLife.47326

      We thank the reviewer for this suggestion. As in O’Brown et al 2019, we have performed experiments at 3 dpf, when BBB integrity is not mature, and at 7 dpf, when the BBB is functional[5], testing both 10 and 70 kDa tracers. We detected significantly higher tracer accumulation at 3 dpf, showing that our additional method (see below) can detect tracer leakage in the brain (new data in Fig. 2e–f).

      Ideally, the authors would also supplement the method with additional approaches in the younger developmental stages to validate their findings.

      The validation of the method and the findings is particularly important for the claims of lack of BBB impairment in the absence of mural cells, as this is a "negative" finding.

      In response to this and comments from other reviewers, we performed additional tracer leakage experiments (new data in Fig. 2a–d) where we imaged 10 and 70 kDa tracers with a vascular reporter (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this reporter for normalisation. Both this approach as well as the experiments provided in the first submission (updated as Extended Data Fig. 3a–d) showed that pdgfrb mutants at 7 and 14 dpf have indistinguishable BBB integrity compared to siblings. See also Author response image 1 that further addresses this.
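
      To make the idea of normalising a tracer signal to a vascular reporter concrete, here is a minimal Python sketch. It is not the authors' analysis pipeline: the function name `normalised_tracer_signal`, the boolean vessel mask (assumed to be segmented from the reporter channel), and the choice of mean intensities are all assumptions made for illustration.

```python
import numpy as np


def normalised_tracer_signal(tracer, reporter, vessel_mask):
    """Mean parenchymal tracer intensity divided by mean vascular reporter intensity.

    Hypothetical helper: `tracer` and `reporter` are same-shaped intensity
    arrays, and `vessel_mask` is True wherever a pixel lies inside a vessel.
    """
    tracer = np.asarray(tracer, dtype=float)
    reporter = np.asarray(reporter, dtype=float)
    vessel_mask = np.asarray(vessel_mask, dtype=bool)

    if not (tracer.shape == reporter.shape == vessel_mask.shape):
        raise ValueError("tracer, reporter and vessel_mask must share one shape")
    if not vessel_mask.any() or vessel_mask.all():
        raise ValueError("mask must contain both vessel and parenchyma pixels")

    # Tracer that has left the vessels is measured outside the mask.
    parenchymal_tracer = tracer[~vessel_mask].mean()
    # The reporter is measured inside the vessels and serves as the reference.
    vascular_reporter = reporter[vessel_mask].mean()
    if vascular_reporter == 0:
        raise ValueError("vascular reporter signal is zero; cannot normalise")

    return parenchymal_tracer / vascular_reporter
```

      Dividing by the reporter signal is one way to reduce the influence of imaging depth and vessel density on comparisons between animals; other normalisations (for example, to intravascular tracer) would be equally plausible readings of the response.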

      I also strongly suggest rephrasing and toning down the claim that vascular mural cells do not control the blood-brain barrier in developing zebrafish.

      As a negative finding cannot be proven completely and lots of the previously shown effects on murine BBB impairment are rather weak (when caused by single agents such as Claudin5 deficiency or Sphingosine-phosphate receptor1 knockout), it might be important to only claim that in zebrafish no strong impairment (as observed in the mural cell-deficient mouse) could be observed. Or rephrase it to "no impairment as severe as/comparable to ... could be observed" and then provide an impairment control for the developmental stages.

      We thank the reviewer for this comment and agree that negative findings are very challenging to prove. However, we find no evidence of leakage of the BBB in animals lacking mural cells at 7 and 14 dpf and believe that our data is robust on this point. As such, we believe we show that a vertebrate with a largely conserved EC BBB, can have intact barrier function in the absence of mural cells.

      As suggested, we have revised our claims throughout the manuscript to provide a more nuanced discussion of this, but we do not want to water down our claims too much as we believe they are important. We hope that the reviewer will appreciate our carefully worded and expanded discussion section.

      Additional items of interest to the readers and therefore suggestions to improve the manuscript could be

      (1) To include more molecular analysis: while the study identifies caveolae induction and basement membrane thickening as potential contributors to focal leakage, the exact molecular mechanisms linking mural cell loss to these structural changes are not deeply investigated.

      (2) Also, the study primarily associates BBB disruption in the adult with aneurysms. Therefore other subtle or diffuse changes to BBB permeability that might occur even without overt vascular lesions are potentially underrepresented.

      However, following up experimentally on these might exceed the scope of the manuscript.

      We thank the reviewer for these suggestions and agree with both points. However, as stated by the reviewer, these experiments are beyond the scope of the manuscript and represent future directions for our lab and others.

      Reviewer #2 (Recommendations for the authors):

      (1) Mouse genes should be written as follows: Pdgfb, Pdgfrb and be in italics. See line 70: it should be written "Pdgfb and Pdgfrb (italics)" and not "PdgfB and Pdgfrβ".

      We have updated the text according to the reviewer’s suggestion.

      (2) Please state the age of the fish analyzed in Figure 1f and 1g.

      We have moved this data to Extended Data Fig. 3a–d (previously Fig. 1f–g) and have placed age information on the images and in the figure legends.

      (3) Is the reduced vascular complexity in pdgfrb mutants due to reduced angiogenesis or due to excessive pruning?

      This is a good question, and we do not know at this stage. We have unpublished data that suggest pericytes secrete angiogenic growth factors, but this question warrants a thorough investigation that we believe is beyond the scope of this current study.

      (4) Please check that the figure legends state the correct number of fish analysed. For example, Figure 1 d, e N=8 but there seem to be 9 data points per group - 14dpf.

      We apologise for this mistake and thank the reviewer for raising this. We have updated the graphs and figure legends accordingly.

      (5) Please indicate in the figures the genotypes (wt, het) of a sibling presented alongside a pdgfrb mutant.

      Wild-type and heterozygous mutants are commonly used together in zebrafish research as a collective control group termed siblings. Since we didn’t see any difference between wild-type and pdgfrbuq30bh/- groups in any experiments, we reported these groups together. This is now stated in the supplementary materials.

      One exception to this was examination of the growth and survival rates where we show the genotypes separately (new data in Extended Data Fig. 1b-f).

      (6) Please explain clearly what region is shown in Figure 2B. I do not understand the explanation "approximate location of dotted line". Is the image in the panel "a" top view of a brain?

      We have moved this data to Fig. 3a′ (previously Fig. 2b) and replaced the dotted line in Figure 3a (previously Fig. 2a) with a white box indicating the location of the restricted region in the whole brain image.

      We have revised the text as below:

      “Subset of z-slices from the whole brain imaging in (a) and (b) (white boxes) indicating mural cell loss and abnormal capillary network patterning. 100-μm-thick maximum intensity projections (MIP) were generated using the continuation of the left middle mesencephalic central artery (MMCtA, arrow) as an anatomical landmark.”

      In addition, we have updated all our figure legends clearly stating the view and anatomical position of the imaged sample.

      (7) Figure 2e: Note that the dotted areas do not correspond to the areas magnified. Please adjust.

      We have moved this data to Extended Data Fig. 5a (previously Fig. 2e–e′) and updated the location of the white box in 5a shown in enlarged view in 5a′.

      (8) Lines 112 and 114 - Should the indicated figure be Figure 2b-d and Figure 2c-d, respectively, and not Figure 1?

      We thank the reviewer for pointing out this mistake. All figures are now referred to appropriately in the revised manuscript.

      (9) Data presented in Figure 2 and Figure 3 can be consolidated and presented as one Figure.

      We thank the reviewer for this suggestion. After addition of new data and revising the manuscript we have decided to keep these data presented separately.

      (10) Note that Figure 2a,b shows 5-month-old fish, not 2-month-old fish. Additionally, Extended Data Figure 3 shows 5-month-old fish, not 3-month-old fish.

      The stages noted by the reviewer were correctly indicated.

      (11) Figure 2d: Please clarify the definition of a "large vessel".

      We have observed normal morphology in capillaries and noted aneurysms and hotspots in large calibre vessels such as arteries, which become more severe over time. We have revised this across the manuscript accordingly.

      (12) Figure 4a, b: Please explain how the hotspots of leakage were defined based on the extravasated tracer.

      Hotspots of leakage are scored when fluorescent tracer aggregates are clearly observed outside the vessels. Vessel borders were defined using the transgenic lines (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). We have added a clear description in the methods section (lines 473–475).

      Figure 4c: Why were Pdgfrb crispants used and not the mutant line?

      Crispants were used because they reliably phenocopy both the lack of brain mural cells (Extended Data Fig. 5e, previously Extended Data Fig. 4b) and the mutant phenotype, and for practical reasons: they allow faster experiments and reduce fish usage.

      Figure 4e: The magnification of the electron microscopy images does not make it possible to clearly identify caveolae. What was the magnification of the collected images for caveolae analysis? How did the authors ensure that they quantified only caveolae and not other types of vesicles?

      Respectfully, we disagree that the magnification is insufficient, as our images were captured and analysed consistent with previous ultrastructural descriptions[11,12]. We based our quantification of caveolae on the size of the vesicles observed: caveolae were defined as circular profiles of less than 100 nm in diameter and were scored as luminal or abluminal based on proximity to each surface membrane (within 500 nm of each surface or, in a thin-walled vessel, the caveolae closest to each surface) (lines 398–409). Importantly, comparable analyses at similar magnifications have been independently validated in multiple caveola-deficient zebrafish genetic models[4,13]. Interestingly, given the reviewer's comments above, we do see increased vesicular structures that are larger than caveolae, but we only provide quantification of the caveolae here.
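
      The scoring rule described in this response (profiles under 100 nm in diameter, assigned to the luminal or abluminal side when within 500 nm of that membrane) can be sketched in Python as below. This is an illustrative reading only, not the authors' code: the `VesicleProfile` record, the function name `classify_profile`, and the tie-break toward the luminal side are hypothetical, and the sketch assumes that circularity has already been judged by eye.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CAVEOLA_DIAMETER_NM = 100.0   # profiles must be smaller than this
MAX_SURFACE_DISTANCE_NM = 500.0   # profiles must lie within this of a membrane


@dataclass(frozen=True)
class VesicleProfile:
    """One circular profile measured on a TEM section (hypothetical record)."""
    diameter_nm: float
    dist_to_luminal_nm: float
    dist_to_abluminal_nm: float


def classify_profile(profile: VesicleProfile) -> Optional[str]:
    """Return 'luminal', 'abluminal', or None if the profile is not scored."""
    # Larger vesicular structures are excluded from the caveolae count.
    if profile.diameter_nm >= MAX_CAVEOLA_DIAMETER_NM:
        return None

    nearest = min(profile.dist_to_luminal_nm, profile.dist_to_abluminal_nm)
    # Profiles far from both membranes are not assigned to either surface.
    if nearest > MAX_SURFACE_DISTANCE_NM:
        return None

    # In a thin-walled vessel both distances can be under 500 nm, so the
    # closer membrane decides; ties go to the luminal side (an assumption).
    if profile.dist_to_luminal_nm <= profile.dist_to_abluminal_nm:
        return "luminal"
    return "abluminal"
```

      Keeping the size cut-off and the distance cut-off as named constants makes it straightforward to check how sensitive the counts are to either threshold.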

      Reviewer #3 (Recommendations for the authors):

      Congratulations to the authors on their really beautiful imaging and rigorous quantitative documentation of phenotypes - this is a really nicely done study, and could be very important to the field with just a few additional experiments to buttress the key conclusions.

      We thank the reviewer for their kind comments.

      In addition to the comments noted in the public review, I would only point out that there are two mislabeled call-outs in the text (Lines 112 and 114; says Figure 1, should say Figure 2).

      We thank the reviewer for this point and have now revised the text accordingly.

      (1) Ando, K., Ishii, T. & Fukuhara, S. Zebrafish Vascular Mural Cell Biology: Recent Advances, Development, and Functions. Life (Basel) 11 (2021). https://doi.org/10.3390/life11101041

      (2) Ando, K. et al. Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development 143, 1328-1339 (2016). https://doi.org/10.1242/dev.132654

      (3) Ando, K. et al. Conserved and context-dependent roles for pdgfrb signaling during zebrafish vascular mural cell development. Dev Biol 479, 11-22 (2021). https://doi.org/10.1016/j.ydbio.2021.06.010

      (4) Lim, Y. W. et al. Trans-Endothelial Trafficking in Zebrafish: Nanobio Interactions of Polyethylene Glycol-Based Nanoparticles in Live Vasculature. ACS Nano (2026). https://doi.org/10.1021/acsnano.5c21042

      (5) O'Brown, N. M., Megason, S. G. & Gu, C. Suppression of transcytosis regulates zebrafish blood-brain barrier function. Elife 8 (2019). https://doi.org/10.7554/eLife.47326

      (6) O'Brown, N. M. et al. The secreted neuronal signal Spock1 promotes blood-brain barrier development. Dev Cell 58, 1534-1547 e1536 (2023). https://doi.org/10.1016/j.devcel.2023.06.005

      (7) Armulik, A. et al. Pericytes regulate the blood-brain barrier. Nature 468, 557-561 (2010). https://doi.org/10.1038/nature09522

      (8) Daneman, R., Zhou, L., Kebede, A. A. & Barres, B. A. Pericytes are required for blood-brain barrier integrity during embryogenesis. Nature 468, 562-566 (2010). https://doi.org/10.1038/nature09513

      (9) Mae, M. A. et al. Single-Cell Analysis of Blood-Brain Barrier Response to Pericyte Loss. Circ Res 128, e46-e62 (2021). https://doi.org/10.1161/CIRCRESAHA.120.317473

      (10) Lim, Y.-W. et al. A Standardized Protocol to Investigate Trans-Endothelial Trafficking in Zebrafish: Nano-bio Interactions of PEG-based Nanoparticles in Live Vasculature. bioRxiv, 2025.07.23.666282 (2025). https://doi.org/10.1101/2025.07.23.666282

      (11) Parton, R. G. & Simons, K. The multiple faces of caveolae. Nat Rev Mol Cell Biol 8, 185-194 (2007). https://doi.org/10.1038/nrm2122

      (12) Parton, R. G. & del Pozo, M. A. Caveolae as plasma membrane sensors, protectors and organizers. Nat Rev Mol Cell Biol 14, 98-112 (2013). https://doi.org/10.1038/nrm3512

      (13) Lim, Y. W. et al. Caveolae Protect Notochord Cells against Catastrophic Mechanical Failure during Development. Curr Biol 27, 1968-1981 e1967 (2017). https://doi.org/10.1016/j.cub.2017.05.06

    1. eLife Assessment

      The authors aim to understand why Kupffer cells (KCs) die in metabolic-associated steatotic liver disease (MASLD). This is a valuable study using in vitro studies and an in vivo genetic mouse model, suggesting that increased glycolysis contributes to KC death in MASLD. The data presented are now convincing and adequately revised. This work will be of interest to researchers in the immunology and metabolism fields.

    2. Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. A few things require clarification.

      Strengths:

      • The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      • The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      • The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      • The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      • The TUNEL staining in the overview in Figure 2 is not convincing. Typically the signal overlaps with DAPI, which is mostly not the case in the figures shown.

      • The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      • Figure S5 shows ΔΔCt values, not relative expression values. What are the housekeeping genes? There should be at least 2, and they should not have metabolically related functions such as Gapdh.

      • Figure 1C: shows WT and KO gating side by side

      • The following point has not been answered: "While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself." Expression of certain genes that indicate function does not show whether BMDMs isolated from these KO mice are fully differentiated. Here, counting BM input/ BMDM output, flow cytometry on BMDMs, morphology etc. should be tested.
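
      For the Figure S5 point above, the conversion from ΔΔCt to relative expression with more than one housekeeping gene can be sketched as follows. This is a generic illustration of the standard 2^(-ΔΔCt) calculation, not the authors' analysis: the function name `relative_expression` and the example Ct values are hypothetical, and the sketch assumes roughly 100% amplification efficiency for every primer pair.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_control, ct_refs_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    The Ct values of the reference (housekeeping) genes are averaged, which
    corresponds to a geometric mean of their expression levels.
    """
    # dCt: target Ct minus the averaged reference Ct, per condition.
    delta_ct_sample = ct_target_sample - mean(ct_refs_sample)
    delta_ct_control = ct_target_control - mean(ct_refs_control)

    # ddCt: difference between the condition of interest and the control.
    delta_delta_ct = delta_ct_sample - delta_ct_control

    # One cycle corresponds to a two-fold change at 100% efficiency.
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier in
# the sample than in the control, while both reference genes are unchanged.
fold_change = relative_expression(24.0, [20.0, 22.0], 26.0, [20.0, 22.0])
```

      Reporting the fold change rather than the raw ΔΔCt puts the data on a linear scale in which a control value of 1 means no change.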

    3. Reviewer #4 (Public review):

      Summary:

      In this study, He et al. investigate the mechanisms underlying Kupffer cell (KC) loss during metabolic stress. It has long been observed that embryonically derived KCs decline in obesity and liver disease, a loss that is compensated by monocyte recruitment, although the underlying mechanisms remain unclear. The authors propose that metabolic reprogramming, particularly excessive glycolysis, drives KC death. Using an original murine genetic model to modulate glycolysis, they further demonstrate that enhanced glycolytic activity exacerbates KC damage.

      Strengths:

      Overall, the study is extremely clearly presented, with a convincing and simple message destined to a vast audience.

      Weaknesses:

      This manuscript has already undergone one round of revisions in which I was not involved. The authors have tried to address several points raised by the previous reviewers, notably regarding the unexpectedly high level of TUNEL staining observed in KCs. However, I share the concerns expressed by the three reviewers that the reported levels remain difficult to reconcile with the biology. A TUNEL positivity rate of ~60% at week 16 of the HFHC diet would imply massive KC death, which should have led to a near-complete depletion of the KC population, something that is not observed. While I agree that the KC compartment is clearly affected under this dietary challenge, I would strongly encourage the authors to carefully rule out potential technical biases that could account for this implausibly high rate of cell death.

      Considering the new in-vivo experiment with 2-DG, it is definitely convincing and clearly adds some value to the full study.

      So the full story deserves publication.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the mechanisms underlying Kupffer cell death in metabolic-associated steatotic liver disease (MASLD). The authors propose that KCs undergo massive cell death in MASLD and that glycolysis drives this process. However, there appears to be a discrepancy between the reported high rates of KC death and the apparent maintenance of KC homeostasis and replacement capacity.

      Strengths:

      This is an in vivo study.

      Weaknesses:

      There are discrepancies between the authors' observations and previous reports, as well as inconsistencies among their own findings.

      Before presenting the percentage of CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cells, the authors should have first shown the number of CLEC4F<sup>+</sup> cells per unit area in Figure 1. At 16 weeks of age, the proportion of TUNEL<sup>+</sup> KCs is extremely high (~60%), yet the flow cytometry data indicate that nearly all F4/80<sup>+</sup> KCs are TIMD4<sup>+</sup>, suggesting an embryonic origin. If such extensive KC death occurred, the proportion of embryonically derived TIMD4<sup>+</sup> KCs would be expected to decrease substantially. Surprisingly, the proportion of TIMD4<sup>+</sup> KCs is comparable between chow-fed and 16-week HFHC-fed animals. Thus, the immunostaining and flow cytometry data are inconsistent, making it difficult to explain how massive KC death does not lead to their replacement by monocyte-derived cells.

      We thank the reviewer for the insightful comment and the opportunity to clarify this important point. To ensure consistency between our methodologies, we replaced Clec4f staining with TIM4 staining results as requested by the reviewer. We first showed the number of TIM4<sup>+</sup> cells per unit area in Figure 1B. The results showed a significant and progressive loss of TIM4<sup>+</sup> cells per unit area in the liver parenchyma, decreasing from approximately 60 cells/FOV at baseline (0w) to nearly 50 at 4w and further to about 30 at 16w post-HFHC diet. This finding is fully consistent with our flow cytometry data. The percentage of the embryonically derived KC population (CD11blow F4/80hi TIM4hi) among CD45<sup>+</sup> cells dropped from 30.2% (0w) to 24.3% (4w) and 17.6% (16w) (Revised Figure 1C). The absolute number per gram of liver decreased from roughly 12 x 10<sup>5</sup> (1w) to 9 x 10<sup>5</sup> (4w) and 5 x 10<sup>5</sup> (16w) (Revised Figure 1D).

      These data suggest that despite the reported high rate of cell death among CLEC4F<sup>+</sup>TIMD4<sup>+</sup> KCs, the population appears to self-maintain, with no evidence of monocyte-derived KC generation in this model, which contradicts several recent studies in the field.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the remaining embryonic KCs (EmKCs) are maintained through self-renewal, as the proportion of Ki67<sup>+</sup>TIM4<sup>+</sup> cells remains low at all time points (Revised Figure S2D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure S2B, S2C), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4<sup>-</sup> MoKCs. This timing is supported by prior studies, where TIM4<sup>-</sup> KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (Ref. 1,2). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      Moreover, there is no evidence that TIM4<sup>+</sup>CLEC4F<sup>+</sup> KCs increase their proliferation rate to compensate for such extensive cell death. If approximately 60% of KCs are dying and no monocyte-derived KCs are recruited, one would expect a much greater decrease in total KC numbers than what is reported.

      Thank you for raising this point, which allows for an important clarification. The interpretation that approximately 60% of KCs are dying is correct, but this refers to the proportion of the remaining KC population at 16 weeks that is TUNEL<sup>+</sup>, not to 60% of the original KC pool. Since our data show that over half of the EmKCs are lost by 16 weeks (Revised Figure 1B), the 60% of dying cells at this late time point corresponds roughly to only 25-30% of the total original KC population at baseline. This distinction reconciles the high rate of apoptosis observed late in disease with the overall progressive depletion of the EmKC pool.
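      The arithmetic behind this estimate can be checked with a minimal sketch. It assumes only the approximate values quoted in this response (about 60 TIM4<sup>+</sup> cells/FOV at baseline, about 30 at 16 weeks, and about 60% TUNEL<sup>+</sup> among the remaining cells); the function and variable names are illustrative and are not taken from the study.

```python
# Illustrative arithmetic only: inputs are the approximate values quoted in the
# response, and all names here are hypothetical.

def dying_fraction_of_baseline(baseline_count, remaining_count, tunel_fraction):
    """Express TUNEL+ cells in the remaining pool as a fraction of the baseline pool."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return tunel_fraction * remaining_count / baseline_count

baseline_kcs_per_fov = 60     # ~TIM4+ cells/FOV at 0 weeks
week16_kcs_per_fov = 30       # ~TIM4+ cells/FOV at 16 weeks
tunel_fraction_week16 = 0.60  # ~fraction of remaining KCs that are TUNEL+

fraction_of_baseline = dying_fraction_of_baseline(
    baseline_kcs_per_fov, week16_kcs_per_fov, tunel_fraction_week16
)
print(f"TUNEL+ KCs at 16 weeks as a share of the baseline pool: {fraction_of_baseline:.0%}")
```

      With these rounded inputs the share comes out at the upper end of the 25-30% range stated above; a remaining pool somewhat below half of baseline would give a correspondingly lower value.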

      It is also unexpected that the maximal rate of KC death occurs at early time points (8 weeks), when the mice have not yet gained substantial weight (Figure 1B). Previous studies have shown that longer feeding periods are typically required to observe the loss of embryo-derived KCs.

      We appreciate the reviewer’s insightful observation. We think KC death is a continuous event during MASLD. Because previous studies aimed to induce MASH, they typically assessed the loss of EmKCs after longer feeding periods, which may give the impression that longer feeding periods are required to observe substantial loss of embryonically derived KCs. In our HFHC model, the proportion of dying KCs was already elevated by 8 weeks, and this high rate was sustained through the 16-week endpoint. In a separate MCD dietary model characterized by rapid MASLD progression, a high rate of KC death was detectable as early as 6 weeks (Revised Figure 1F). Collectively, these data suggest that the onset of significant KC death depends on the pace of MASLD pathogenesis and is more likely an early-initiated event that continues throughout MASLD progression.

      Furthermore, it is surprising that the HFD induces as much KC death as the HFHC and MCD diets. Earlier studies suggested that HFD alone is far less effective than MASH-inducing diets at promoting the replacement of embryonic KCs by monocyte-derived macrophages.

      We appreciate the reviewer’s insightful comment. In our study, we observed significant KC death under both HFD and HFHC feeding, for 20 and 16 weeks, respectively. Moreover, both HFHC and HFD induced similar stages of MASLD (characterized by significant lipid accumulation without fibrosis development) by these time points (Author response image 1). Therefore, these data support that the onset of substantial KC death may be an early MASLD event, before the progression to MASH. Additionally, this finding aligns with existing literature showing that 16 weeks of HFD feeding alone is sufficient to cause a marked reduction in the TIM4<sup>+</sup> KC population (Ref. 1).

      Author response image 1.

      Detection of liver fibrosis in MASLD mouse models. Male wild-type C57BL/6J mice were fed a high-fat, high-cholesterol (HFHC) diet for 16 weeks or a high-fat diet (HFD) for 20 weeks to induce MASLD. Mice fed a normal chow diet (NCD) served as controls. (A) Sirius Red staining of liver sections was performed to assess collagen deposition and fibrosis during MASLD progression. Scale bar, 20 μm. (B) Western blot analysis of liver tissue lysates showing α-smooth muscle actin (α-SMA) expression as a marker of hepatic stellate cell activation and liver fibrosis.

      In Figure 2D, TIMD4 staining appears extremely faint, making the results difficult to interpret. In contrast, the TUNEL signal is strikingly intense and encompasses a large proportion of liver cells (approximately 60% of KCs, 15% of hepatocytes, 20% of hepatic stellate cells, 30% of non-KC macrophages, and a proportion of endothelial cells is also likely affected). This pattern closely resembles that typically observed in mouse models of acute liver failure. Given this apparent extent of cell death, it is unexpected that ALT and AST levels remain low in MASH mice, which is highly unusual.

      Thank you for this important feedback. To address concerns about the clarity of our imaging, we have provided high-resolution split-channel raw images for Figure 2D (Revised Figure 2D), which distinctly show the localization of TIM4, TUNEL, and GS. These confirm the progressive reduction of TIM4<sup>+</sup> KCs and the increase in TUNEL<sup>+</sup>TIM4<sup>+</sup> cells over time. We agree that the high proportion of TUNEL<sup>+</sup> cells seems at odds with the modest ALT/AST elevation. This discrepancy might be explained by the distinct nature of cell death in MASLD. Unlike the acute necrosis with membrane rupture seen in acute liver failure, which causes massive, rapid enzyme release, obesity-related liver injury is a chronic process dominated by apoptosis (Ref. 4,5). Apoptosis preserves membrane integrity until late stages (Ref. 6), with dying cells packaged into apoptotic bodies for efficient phagocytic clearance by neighboring macrophages (Ref. 7,8). This controlled disposal system minimizes the leakage of intracellular enzymes. Therefore, the coexistence of widespread apoptosis (high TUNEL signal) with limited enzyme release (low ALT/AST) is a recognized feature of chronic MASLD pathogenesis.

      No statistical analysis is provided for Figure 5D, and it is unclear which metabolites show statistically significant changes in Figure 5C.

      We thank the reviewer for raising this statistical problem. We have now included statistical analysis in Revised Figure 5D.

      In addition, there is no evaluation of liver pathology in Clec4f-Cre × Chil1flox/flox mice. It remains possible that the observed effects on KC death result from aggravated liver injury in these animals. There is also no evidence that Chil1 deficiency affects glucose metabolism in KCs in vivo.

      We thank the reviewer for these important points. We previously characterized the liver pathology of Clec4f<sup>ΔChil1</sup> mice in detail (preprint: eLife 2025, DOI: 10.7554/eLife.107023.1, Fig. 2). On a normal chow diet, these mice showed no differences in body weight, hepatic lipid deposition, metabolic parameters, or glucose tolerance compared to controls. However, on an HFHC diet, Clec4f<sup>ΔChil1</sup> mice developed significantly worse metabolic and histological phenotypes. Crucially, our in vitro data demonstrate that recombinant Chi3l1 directly reduces KC death (preprint, Fig. 6E-F), indicating that the aggravated MASLD in knockout mice is a consequence of increased KC loss, not its cause.

      Regarding glucose metabolism, we have previously shown that Chi3l1 deficiency leads to increased glucose uptake by KCs in vivo using the fluorescent glucose analog 2-NBDG. This effect was reversed by supplementing knockout mice with recombinant Chi3l1 (preprint Fig. 6G-H). This provides direct evidence that Chi3l1 modulates glucose uptake in KCs in vivo.

      Finally, the authors should include a more direct experimental approach to modulate glycolysis in KCs and assess its causal role in KC death in MASH.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) in the HFHC-induced MASLD model (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for four weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity during active disease development. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, He et al. set out to investigate the mechanisms behind Kupffer Cell death in MASLD. As has been previously shown, they demonstrate a loss of resident KCs in MASLD in different mouse models. They then go on to show that this correlates with alterations in genes/metabolites associated with glucose metabolism in KCs. To investigate the role of glucose metabolism further, they subject isolated KCs in vitro to different metabolic treatments and assess cleaved caspase 3 staining, demonstrating that KCs show increased Cl. Casp 3 staining upon stimulation of glycolysis. Finally, they use a genetic mouse model (Chil1KO) where they have previously reported that loss of this gene leads to increased glycolysis and validate this finding in BMDMs (KO). They then remove this gene specifically from KCs (Clec4fCre) and show that this leads to increased macrophage death compared with controls.

      Strengths:

      As we do not yet understand why KCs die in MASLD, this manuscript provides some explanation for this finding. The metabolomics is novel and provides insight into KC biology. It could also lead to further investigation; here, it will be important that the full dataset is made available.

      Weaknesses:

      Different diets are known to induce different amounts of KC loss, yet here, all models examined appear to result in 60% KC death. One small field of view of liver tissue is shown as representative to make these claims, but this is not sufficient, as anything can be claimed based on one field of view. Rather, a full tissue slice should be included to allow readers to really assess the level of death.

      Thank you for raising this point regarding data presentation. We analyzed full tissue slices and found that a view of the entire slice at standard magnification makes individual KCs difficult to resolve (Author response image 2). To clearly represent the extent and distribution of KC death across the liver tissue slice, we now include lower-magnification images that provide a wider field of view, allowing readers to assess the pattern across a larger tissue area (Revised Figures 1, 2, 6F).

      Author response image 2.

      Assessment of KC death on a full liver tissue slice. (A) Immunofluorescence staining was performed to detect Kupffer cell (KC) death in liver sections from mice fed an MCD diet for 6 weeks. Cell death was assessed by TUNEL staining (green), and KCs were identified by TIM4 staining (red). Nuclei were counterstained with DAPI (blue). A representative whole-tissue view is shown. Scale bar, 1 mm.

      Additionally, there is no consistency between the markers used to define KCs and moMFs, with CLEC4F being used in microscopy, TIM4 in flow, while the authors themselves acknowledge that moKCs are CLEC4F+TIM4-. As moKCs are induced in MASLD, this limits interpretation. Additionally, Iba1 is referred to as a moMF marker but is also expressed by KCs, which again prevents an accurate interpretation of the data. Indeed, the authors show 60% of KCs are dying but only 30% of IBA1+ moMFs, as KCs are also IBA1+, this would mean that KCs die much more than moMFs, which would then limit the relevance of the BMDM studies performed if the phenotype is KC specific. Therefore, this needs to be clarified.

      We thank the reviewer for the constructive comments. For consistency, we have standardized our KC marker to TIM4 for all immunostaining data, aligning it with our flow cytometry analysis (Revised Figures 1, 2D, 6F). We have also clarified that IBA1 is expressed by hepatic macrophages (both KCs and MoMFs) (Revised Figure 2C; Revised manuscript, page 5, lines 182-183). Moreover, we have clarified that the finding that 60% of TIM4<sup>+</sup> KCs are TUNEL<sup>+</sup>, versus 30% of total IBA1<sup>+</sup> cells, further supports that KCs undergo death more readily than MoMFs (Revised manuscript, page 5, lines 186-189). We have also acknowledged the limitation of the BMDM studies in the Revised manuscript, page 8, lines 332-340.

      The claim that periportal KCs die preferentially is not supported, given that the majority of KCs are peri-portal. Rather, these results would need to be normalised to KC numbers in PP vs PC regions to make meaningful conclusions.

      We thank the reviewer for this important point. We included the normalized data. At 8 weeks, the normalized death rate was significantly higher in periportal versus pericentral regions (p = 0.041), supporting increased periportal KC susceptibility during early MASLD. By 16 weeks, proportional death rates became comparable between zones (Revised Figure 2D, Revised manuscript, page 6, lines 194-201).

      Additionally, KCs are known to be notoriously difficult to keep alive in vitro, and for these studies, the authors only examine cl. Casp 3 staining. To fully understand that data, a full analysis of the viability of the cells and whether they retain the KC phenotype in all conditions is required.

      We appreciate the reviewer’s suggestions. To confirm the identity and health of isolated KCs in our in vitro studies, we showed that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      Finally, in the Cre-driven KO model, there does not seem to be any death of KCs in the controls (rather numbers trend towards an increase with time on diet, Figure 6E), contrary to what had been claimed in the rest of the paper, again making it difficult to interpret the overall results.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f-Cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f-Cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f-Cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f-Cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Additionally, there is no validation that the increased death observed in vivo in KCs is due to further promotion of glycolysis.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity in KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC death in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. Still, several interpretations require moderation to avoid overstatement, and certain experimental details, particularly those concerning flow cytometry and population gating, need further clarification.

      Strengths:

      (1) The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      (2) The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      (3) The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      (4) The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      (1) The novelty is questionable. The presented work has considerable overlap with a study by the same lab, which is currently under review (citation 17), and it should be considered whether the data should not be presented in one paper.

      We thank the reviewer for the opportunity to clarify the relationship between the two studies. In our previous work (citation 17), we focused on the transcriptional metabolic differences between Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs) and identified Chi3l1 as a selective protective factor that limits glucose uptake and shields KCs from metabolic stress–induced cell death, with minimal effects on MoMFs. That study directly motivated the current work. The observation that KCs are uniquely protected from metabolic stress led us to hypothesize that excessive glycolytic activation itself may be a primary driver of KC death, which forms the central question of the present study. Accordingly, the current manuscript shifts the focus from Chi3l1-mediated protection to the mechanistic role of hyperglycolysis in driving KC mortality, using distinct experimental approaches and addressing a different biological question. Because the two studies address conceptually distinct aims (one defining a protective regulator of KC survival and the other dissecting glycolysis-driven KC death mechanisms), we believe they are best presented as separate manuscripts. Combining them into a single study would dilute the mechanistic depth and clarity of each story.

      (2) The authors report that 60% of KCs are TUNEL-positive after 16 weeks of HFHC diet and confirm this by cleaved caspase-3 staining. Given that such marker positivity typically indicates imminent cell death within hours, it is unexpected that more extensive KC depletion or monocyte infiltration is not observed. Since Timd4 expression on monocyte-derived macrophages takes roughly one month to establish, the authors should consider whether these TUNEL-positive KCs persist in a pre-apoptotic state longer than anticipated. Alternatively, fate-mapping experiments could clarify the dynamics of KC death and replacement.

      We thank the reviewer for this astute observation. As shown in revised Figure 2D, the proportion of TIM4<sup>+</sup>TUNEL<sup>+</sup>KCs peaks at 8 weeks after HFHC feeding and remains elevated at 16 weeks. However, examination of the corresponding single-channel TIM4 staining during this period reveals that the overall density of TIM4<sup>+</sup> KCs does not undergo abrupt or synchronous depletion. This temporal dissociation between sustained TUNEL positivity and relatively gradual KCs loss suggests that TUNEL-positive KCs do not undergo immediate clearance. Based on these observations, we agree with the reviewer that a substantial fraction of TUNEL-positive KCs likely persists in a prolonged pre-apoptotic or stressed state rather than undergoing rapid cell death. This interpretation is consistent with the absence of extensive KCs depletion or compensatory monocyte infiltration at these time points. Importantly, previous studies (Ref. 1,2) indicate that KCs are eventually lost as MASLD progresses, supporting the notion that KC death is a gradual process that unfolds over an extended time frame rather than acutely.

      (3) The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity of KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      (4) The study does not address the polarization or ontogeny of KCs during early MASLD. Given that pro-inflammatory macrophages preferentially utilize glycolysis, such data could provide valuable insight into the reason for increased KC death beyond the presented hyperreliance on glycolysis.

      We thank the reviewer for this insightful comment. Regarding KC ontogeny, flow cytometry analysis (Revised Figure 1C) shows that KCs remain uniformly TIM4<sup>hi</sup> during early MASLD, indicating that monocyte-derived KCs (TIM4<sup>low</sup>) have not yet emerged at these stages. To address KC polarization, we assessed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in KCs isolated from WT mice fed an HFHC diet for 0, 8, and 16 weeks. As shown in revised Figure S5A, M1 markers progressively increase over time, whereas M2 markers remain unchanged or slightly decrease. This polarization shift is consistent with the increased glycolytic activity observed in KCs during early MASLD. Together, these data indicate that embryonically derived KCs undergo a pro-inflammatory polarization accompanied by enhanced glycolytic metabolism during early MASLD, providing mechanistic context for their increased susceptibility to metabolic stress–induced cell death beyond hyperreliance on glycolysis alone (Revised manuscript, pages 7-8, lines 307-321).

      (5) The gating strategy for monocyte-derived macrophages (moMFs) appears suboptimal and may include monocytes. A more rigorous characterization of myeloid populations by including additional markers would strengthen the study's conclusions.

      We thank the reviewer for raising this important point. To improve the rigor of our analysis, we adopted gating strategies established in previous studies (PMID: 41131393; PMID: 32562600). Specifically, Kupffer cells were defined as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> cells, while monocyte-derived macrophages (MoMFs) were defined as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/−</sup> cells, thereby excluding contaminating neutrophils and minimizing inclusion of circulating monocytes. Using this refined gating strategy, we observed a progressive reduction of KCs accompanied by a corresponding increase in MoMFs in WT mice during HFHC feeding (Revised Figures 1C and S2B–C), (Revised manuscript, page 4, line 154-163).

      (6) While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself.

      We thank the reviewer for this important question. To determine whether Chi3l1 deficiency affects macrophage differentiation, we analyzed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in Kupffer cells isolated from WT and Chil1<sup>-/-</sup> mice fed a HFHC diet for 0, 8, and 16 weeks. At baseline (0 weeks), Chi3l1 deficiency was associated with elevated expression of multiple M1 markers, whereas M2 marker expression was comparable between WT and Chil1<sup>-/-</sup> KCs. During MASLD progression, the pro-inflammatory signature in Chil1<sup>-/-</sup> KCs was further enhanced, while anti-inflammatory marker expression became dysregulated (revised Figure S5C). Together, these data indicate that Chi3l1 deficiency does not impair macrophage differentiation per se but biases KCs toward a partially pro-inflammatory, M1-like phenotype, providing additional context for the enhanced glycolytic flux observed in Chi3l1-deficient macrophages (Revised manuscript, page 7-8, line 307-321).

      (7) The authors use the PDK activator PS48 and the ATP synthase inhibitor oligomycin to argue that increased glycolytic flux at the expense of OXPHOS promotes KC death. However, given the high energy demands of KCs and the fact that OXPHOS yields 15-16 times more ATP per glucose molecule than glycolysis, the increased apoptosis observed in Figure 4C-F could primarily reflect energy deprivation rather than a glycolysis-specific mechanism.

      We thank the reviewer for highlighting this important point. We agree that KCs are highly metabolically active and that perturbations of OXPHOS can influence overall cellular energy balance. As noted in our response to comment #3, we further performed an in vivo glycolysis inhibition assay using 2-DG. The protection of KCs observed following 2-DG treatment in vivo (Revised Figure 4E-G) provides further evidence that increased glycolytic flux is not merely correlated with, but functionally contributes to, KC loss in MASLD.

      (8) In Figure 1C, KC numbers are significantly reduced after 4 and 16 weeks of HFHC diet in WT male mice, yet no comparable reduction is seen in Clec4f-Cre control mice, which should theoretically exhibit similar behavior under identical conditions.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f-Cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f-Cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f-Cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f-Cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the concerns raised in the public review, the authors should:

      (1) Reassess their conclusions using the same panels in flow and microscopy, e.g., the combination of CLEC4F, TIM4, and IBA1. This will allow resKCs (CLEC4F+TIM4+IBA1+), moKCs (CLEC4F+TIM4-IBA1+), and moMFs (CLEC4F-TIM4-IBA1+) to be accurately defined and hence their viability and numbers correctly assessed.

      We thank the reviewer for this insightful suggestion. In our flow cytometry analysis, we did not detect a CD45<sup>+</sup>CD11b<sup>low</sup>F4/80<sup>hi</sup>TIM4<sup>low</sup> population, indicating that monocyte-derived KCs (moKCs) have not emerged in our model at this stage. To more accurately quantify resident KCs (resKCs) in the current study, we replaced CLEC4F with TIM4 staining and enumerated TIM4<sup>+</sup> as well as TIM4<sup>+</sup>TUNEL<sup>+</sup> cells. These data were highly consistent with CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cell counts, confirming that moKCs are not involved in KC death during early MASLD (Revised Figure 1A,B,E,F).

      (2) Investigate why the number of KCs in controls and MASLD are so distinct between Figures 1 and 6.

      We appreciate the reviewer’s suggestion. As explained above, Cre insertion promotes KC self-renewal (Revised manuscript, Figure S8). This enhanced proliferative capacity likely accounts for the relative preservation of KC numbers in Clec4f-Cre mice during HFHC feeding, explaining the apparent discrepancy with WT mice (Revised manuscript, Figure 6D-E).

      (3) Normalise the tunel+ cells based on the number of KCs in PP vs PC regions.

      After normalizing KC death to KC numbers in periportal (PP) versus pericentral (PC) regions, we found that the proportion of dying KCs was significantly higher in periportal (portal vein, PV) regions than in pericentral (central vein, CV) regions at 8 weeks of HFHC feeding. We have revised the text accordingly (Revised manuscript, page 5, lines 194-201).
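      The normalization requested here can be sketched in a few lines. This is only an illustration of the arithmetic: the function name and the per-zone counts below are hypothetical and are not data from the study. It shows why raw TUNEL<sup>+</sup> counts and normalized fractions can differ when the two zones contain different numbers of KCs.

```python
# Minimal sketch of per-zone normalization of TUNEL+ KCs.
# All counts are made-up placeholders, not values from the manuscript.

def tunel_fraction(tunel_pos_kcs: int, total_kcs: int) -> float:
    """Return the fraction of KCs in a zone that are TUNEL+."""
    if total_kcs <= 0:
        raise ValueError("total_kcs must be positive")
    return tunel_pos_kcs / total_kcs

# Hypothetical counts per zone (PP = periportal, PC = pericentral).
zones = {
    "PP": {"tunel_pos": 12, "total": 80},
    "PC": {"tunel_pos": 6, "total": 60},
}

normalized = {
    zone: tunel_fraction(c["tunel_pos"], c["total"]) for zone, c in zones.items()
}

# Raw counts differ 2-fold (12 vs 6), the normalized fractions 1.5-fold.
for zone, frac in normalized.items():
    print(f"{zone}: {frac:.1%} of KCs are TUNEL+")
```

      With these placeholder numbers the periportal excess shrinks from 2-fold to 1.5-fold after normalization, which is the kind of shift the reviewer's request is meant to expose.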

      (4) Demonstrate the viability of KCs in vitro across conditions.

      To confirm the identity and health of isolated KCs in our in vitro studies, we show that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      (5) Confirm previous studies demonstrating different degrees of KC loss depending on the model of MASLD.

      We thank the reviewer for highlighting this point. Previous studies have reported varying degrees of KC loss depending on the MASLD model used, reflecting the heterogeneity of hepatic macrophages, marker choice, mouse husbandry, and diet regimen. For example, in a 6-week MCD feeding model, ~10% of CLEC4F<sup>+</sup> KCs were TUNEL<sup>+</sup> (Figure 4A, Ref. 9). Another 6-week MCD study reported a drop from 66% to 26% TIM4<sup>+</sup> KCs (Figure 2A, Ref. 12). In an HFD model, TIM4<sup>+</sup> KCs decreased by ~20% after 16 weeks (Figure 1G, Ref. 1). In a Western diet model, TIM4<sup>+</sup> KCs decreased by >50% at 36 weeks (Figures 1J and 2C, Ref. 2). Together, these studies underscore the model-dependent nature of KC loss and highlight the importance of experimental context and marker selection when assessing KC dynamics in MASLD. We have included these studies in our discussion section (Revised manuscript, pages 9-10, lines 393-402).

      (6) Demonstrate in vivo that loss of CHIL1 drives further glycolysis in KCs.

      In Figure 6G-H of our previous study, we treated mice with the fluorescent glucose analog 2-NBDG and showed that Chi3l1 deficiency leads to greater glucose uptake by KCs in vivo, whereas supplementing KO mice with recombinant Chi3l1 significantly reduced glucose uptake by KCs. We have included the related figure here as Author response image 3.

      Author response image 3.

      Chi3l1 limits glucose uptake by Kupffer cells in vivo. (A) Measurement of 2-NBDG (a fluorescent glucose analog) uptake by KCs in vivo. WT and Chil1<sup>-/-</sup> mice, either untreated or supplemented with rChi3l1, were injected intraperitoneally with 12 mg/kg 2-NBDG. After 45 min, KCs were isolated and glucose uptake was assessed by spectrophotometry. (B) Representative immunofluorescence images of liver sections stained for TIM4 (red) and 2-NBDG uptake (green) to visualize glucose uptake by KCs in situ. Scale bar = 10 µm (zoom). Quantification is shown as the percentage of TIM4<sup>+</sup> cells that are also 2-NBDG<sup>+</sup>. One-way ANOVA was performed in A and B. P values are as indicated.

      (7) There is no mention of the publication of the metabolomics dataset; this should be released with the manuscript.

      We have now included the raw metabolomics dataset as Tables S1 and S2.

      Reviewer #3 (Recommendations for the authors):

      (1) Methods: Reconsider which methods are described in the main text versus the Supplementary Information to improve readability and consistency.

      Thank you for your valuable suggestion. We have reevaluated and adjusted the placement of the methods section between the main text and the supplementary materials.

      (2) Line 34: Check for grammar issues.

      L34 has been revised as follows: Additionally, using Chi3l1-deficient mice, we further demonstrated that increased glucose utilization accelerates KCs death in vivo.

      (3) Lines 101, 110: Explicitly reference the corresponding Supplementary Methods sections.

      We have included the references for these two methods sections (Revised supplementary materials and methods, Line 30, 65, respectively).

      (4) Figure 2: Iba1 marks all macrophages, not only monocyte-derived macrophages; both figure and text (line 205) require correction.

      We have corrected this to state that Iba1 marks hepatic macrophages, including both KCs and MoMFs (Revised Figure 2C, manuscript page 5, line 182).

      (5) Line 218-219: Avoid overinterpretation, as only KCs, hepatocytes, and hepatic stellate cells were assessed - not all hepatic populations.

      We appreciate the reviewer’s valuable suggestion and rephrased our description accordingly (Revised manuscript, page 5, line 186-189).

      (6) Line 262: Use abbreviations consistently throughout the manuscript.

      We have gone through the whole manuscript and double checked the abbreviations.

      (7) Line 264: Include the palmitic acid (PA) concentration used.

      We have stated the PA concentration (800 µM) in the revised manuscript (Revised manuscript, page 6, line 250).

      (8) Lines 316-317: Check for grammar errors.

      The grammar errors have been corrected (Revised manuscript, page 8, lines 340-341).

      (9) Line 337-338: See comment above on gating strategy.

      We have updated the gating strategy accordingly (Revised manuscript, page 9, lines 361-362).

      (10) Line 343-344: Note that Chi3l1 is not exclusively expressed by KCs.

      We have rephrased the text accordingly (Revised manuscript, page 9, lines 374-378).

      (11) Lines 355-358: The statement that "sustained glycolytic hyperactivation culminates not in sustained activation, but in apoptotic cell death" is unsupported by data or literature, as macrophage polarization was not analyzed in this study.

      We removed the statement from the revised manuscript.

      (12) Lines 375-379: Rephrase to clarify that while KCs are metabolically active and glucose-demanding, excessive glycolytic flux accelerates apoptosis.

      We have rephrased to clarify (Revised Manuscript, page 10, lines 405-407).

      (13) Lines 375-385 & 387-397: Consolidate overlapping statements for conciseness and coherence.

      We have consolidated the overlapping statements (Revised manuscript, page 10, lines 405-425).

      References

      1. Daemen, S. et al. Dynamic Shifts in the Composition of Resident and Recruited Macrophages Influence Tissue Remodeling in NASH. Cell Rep 34, 108626, doi:10.1016/j.celrep.2020.108626 (2021).

      2. Remmerie, A. et al. Osteopontin Expression Identifies a Subset of Recruited Macrophages Distinct from Kupffer Cells in the Fatty Liver. Immunity 53, 641-657.e614, doi:10.1016/j.immuni.2020.08.004 (2020).

      3. Ozer, J., Ratner, M., Shaw, M., Bailey, W. & Schomaker, S. The current state of serum biomarkers of hepatotoxicity. Toxicology 245, 194-205, doi:10.1016/j.tox.2007.11.021 (2008).

      4. Malhi, H. & Gores, G. J. Molecular mechanisms of lipotoxicity in nonalcoholic fatty liver disease. Semin Liver Dis 28, 360-369, doi:10.1055/s-0028-1091980 (2008).

      5. Ibrahim, S. H., Hirsova, P. & Gores, G. J. Non-alcoholic steatohepatitis pathogenesis: sublethal hepatocyte injury as a driver of liver inflammation. Gut 67, 963-972, doi:10.1136/gutjnl-2017-315691 (2018).

      6. Kerr, J. F., Wyllie, A. H. & Currie, A. R. Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. Br J Cancer 26, 239-257, doi:10.1038/bjc.1972.33 (1972).

      7. Poon, I. K., Lucas, C. D., Rossi, A. G. & Ravichandran, K. S. Apoptotic cell clearance: basic biology and therapeutic potential. Nat Rev Immunol 14, 166-180, doi:10.1038/nri3607 (2014).

      8. Krenkel, O. & Tacke, F. Liver macrophages in tissue homeostasis and disease. Nat Rev Immunol 17, 306-321, doi:10.1038/nri.2017.11 (2017).

      9. Tran, S. et al. Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.e625, doi:10.1016/j.immuni.2020.06.003 (2020).

      10. O'Neill, L. A. & Pearce, E. J. Immunometabolism governs dendritic cell and macrophage function. J Exp Med 213, 15-23, doi:10.1084/jem.20151570 (2016).

      11. Vander Heiden, M. G. & DeBerardinis, R. J. Understanding the Intersections between Metabolism and Cancer Biology. Cell 168, 657-669, doi:10.1016/j.cell.2016.12.039 (2017).

      12. Zhang, J. et al. Reactive oxygen species regulation by NCF1 governs ferroptosis susceptibility of Kupffer cells to MASH. Cell Metab 36, 1745-1763.e6, doi:10.1016/j.cmet.2024.05.008 (2024).

    1. eLife Assessment

      The present manuscript by Cordeiro et al. shows convincing evidence that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as a strong activator of large-conductance potassium (BK) channels. The authors suggest that α-mangostin activation of the BK channel is state-independent, and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Additionally, the authors show that α-mangostin can relax arteries, further suggesting the plausibility of the proposed effects of this compound. These are valuable findings that should be of interest to channel biophysicists and physiologists alike.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain the antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium BK channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Caᵥ nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      The authors' rebuttal provides AlphaFold3 models for the mutants. While these are interesting preliminary observations, the authors decided not to include them in the main manuscript, awaiting further structural validation. I concur.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      In their response, the authors acknowledge the importance of measuring Ca2+ sparks in smooth muscle cells to further validate their findings. However, this is not provided in the manuscript. Part of my earlier comment alludes to the possibility of α-Mangostin directly affecting Cav1.2 or ryanodine receptor activity, and therefore BK activity would go up. With the current provided evidence, these possibilities cannot be excluded and need to be acknowledged.

      (3) The work has impact for ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter however are preliminary in nature.

      The authors acknowledge that additional vascular physiology experiments would strengthen the argument they make. They are, however, unable to provide such evidence in the present manuscript. Therefore, I strongly suggest that the authors tone down the physiological implications of α-Mangostin that they include in the manuscript. I'd also suggest that "vasorelaxation" be removed from the manuscript title, given the preliminary nature of the findings.

    3. Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of the BK channels. The authors arrive at this conclusion by examining the effects of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca2+ (CaV1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent, and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates noradrenaline-induced contracture. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      In this revised version of the manuscript by Cordeiro et al., the authors have adequately answered my previous concerns. However, as I stated in my comments, without determining the probability of opening across a wide range of voltages, any conclusion about the drug's mechanism of action can be questioned. For example, the statement in Discussion line 481: "The higher shift observed in 1 μM Cai2+ may reflect the steep Cai2+-dependence of the closed-open equilibrium (Cui, Cox and Aldrich, 1997) and the allosteric coupling of voltage and Cai2+ signals (Horrigan and Aldrich, 2002; Magleby, 2003; Clay, 2017), which are effective in this concentration range, which may lead to a higher apparent activation when voltage activation is facilitated by Cai2+ (Sun and Horrigan, 2022)." has no support in the data and is not predicted by the allosteric model. In order to have a larger shift induced by the drug in the presence of Ca2+, you need either to alter the Ca2+ binding or the allosteric coupling factor C. Please note that in the manuscript, there are several problems with the English in this sentence.

      Minor

      In Figure 1E, BKa should read BKalpha.

    4. Reviewer #3 (Public review):

      Summary:

      This research shows that α-mangostin, a proposed nutraceutical with cardiovascular-protective properties, could act through the activation of large-conductance potassium-permeable (BK) channels. The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces a potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of smooth muscle in vascular tissue, this activation leads to muscle relaxation, possibly explaining the cardiovascular-protective effects.

      Strengths:

      The authors have satisfactorily answered my previous comments and present evidence based on several lines of experiments that α-mangostin is a potent activator of BK channels. The quality of the experiments and the analysis is high and represents an appropriate level of analysis. This research is timely and provides a basis to understand the physiological effects of natural compounds with proposed cardioprotective effects.

      Weaknesses:

      The identification of the binding site continues to be the least developed point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and show that point mutations reduce the magnitude of the negative voltage shift of activation produced by α-mangostin. This binding site should be demonstrated in the future using structural techniques such as cryo-EM.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain the antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium BK channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Ca<sub>v</sub> nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.

      We thank the reviewer for acknowledging the strength of the different aspects investigated in our study.

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      In an attempt to improve structural insight into the presented mutagenesis data, we have used AlphaFold3 (AF3; Abramson et al., 2024) to generate models of the I308A, L312M and A316P substitutions and repeated the docking for each (Fig. R1). According to these predictive models:

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: C<sub>α</sub> of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower, especially in the two best poses, which are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses, one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions with the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to only contact F315 and Y318, and overall, any I308 or L312 contacts only occurred in 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P probably has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift).

      Author response image 1.

      AlphaFold3 models of BK I308A, L312M, and A316P with α-Mangostin docked to the mutant structures. The upper row shows an overview of the mutant pore helices (AF3 models) used for molecular docking. The lower row shows the binding region with the wildtype structure overlaid in gray. Only 3 helices are shown for clarity.

      Although these results provide interesting tentative explanations for the effect of the mutations and conclusions from AF3 models become increasingly robust, we think that definitive statements of their mechanistic contributions would require experimental studies of mutant channels, i.e., cryo-EM or crystallography, that are beyond our means. Therefore, we have decided not to include this data in the manuscript; however, it is accessible for the interested reader within the public review. Hopefully, as cryo-EM structures have been obtained for the wildtype channel, there will be studies on mutations of this gating-relevant S6 segment in the future.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      We are not sure if a global elevation of cellular calcium concentration would be informative. We rather expect that the relevant local Ca<sup>2+</sup> elevation would occur as sparks in the BK-Ca<sub>v</sub> nanodomains, close to the membrane. We would anticipate a change in spark duration, as the Ca<sup>2+</sup> inward current would be stopped faster by the enhanced repolarization via α-Mangostin-activated BKα/β1 channels. Capturing spark activity would require fast Ca<sup>2+</sup> imaging. We concur that this would be an informative experiment to investigate a more native situation. However, we would have to accomplish such methodologically challenging measurements in a separate project, which could fruitfully be combined with a more extensive characterization of aortic contraction as also suggested in the following remark (3).

      (3) The work has an impact on ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter, however, are preliminary in nature.

      We completely agree with the reviewer that there is ample room for further studies that could characterize different tissues important in blood pressure regulation (such as resistance arteries), elucidate even more physiological detail (such as modulatory effects of the endothelium), or look deeper into the pharmacology using chemically altered Mangostin derivatives. While we would very much like this to happen in future projects, in this study we focused on the functional aspects of α-Mangostin in BK channel gating. We present our tension recordings as a proof-of-concept to underline the activity of α-Mangostin in native tissues, and we clearly show the importance of the BK channel by using iberiotoxin as a specific inhibitor, which abolished relaxation.

      References:

      Abramson, J. et al. (2024) “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, 630(8016), pp. 493–500. Available at: https://doi.org/10.1038/s41586-024-07487-w.

      Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of the BK channels. The authors arrive at this conclusion through the effect of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca2+ (CaV1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates the contracture promoted by noradrenaline. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      Strengths:

      The set of results combining electrophysiological measurements, mutagenesis, and molecular docking reveals α-mangostin as a potent activator of BK channels and the putative location of the α-mangostin binding site. Moreover, experiments conducted on aortic preparations from mice suggest that α-mangostin can aid in developing drugs to treat a myriad of diverse diseases involving the BK channel.

      We thank the reviewer for pointing out the significance of our study.

      Weaknesses:

      Major:

      (1) Although the results indicate that α-mangostin is modifying the closed-open equilibrium, the conclusion that this can be due to a stabilization of the voltage sensor in its active configuration may prove to be wrong. It is more probable that, as has been demonstrated for other activators, the α-mangostin is increasing the equilibrium constant that defines the closed-open reaction (L in the Horrigan, Aldrich allosteric gating model for BK). The paper will gain much if the authors determine the probability of opening in a wide range of voltages, to determine how the drug is affecting (or not), the channel voltage dependence, the coupling between the voltage sensor and the pore, and the closed-open equilibrium (L).

      We would like to take the opportunity to clarify this potential misunderstanding. In our manuscript, we have discussed three mechanistic explanations for the Mangostin activation: (1) an electrostatic effect at the selectivity filter, (2) structural and electrostatic changes of S6 that facilitate the opening of a putative lower gate, and (3) hydrophobic gating, i.e., counteracting dewetting of the pore. All possibilities would impact S6 and lower the free energy for pore opening, and we concur that therefore Mangostin most likely affects the closed-open equilibrium (L) of the BKα channel.

      The sentence at the original lines 470-471, “(…) caused by an enhanced shift of the closed-open equilibrium toward the open state, such as the stabilization of the voltage sensor in an active conformation” refers to the observation that the presence of the β1 subunit enhances this closed-open shift. The stabilization of the voltage sensor domain was mentioned as one example of how it achieves this. We recognize that this example was an unfortunate choice, as β1 rather facilitates Ca<sup>2+</sup>-dependent allosteric pore opening unrelated to the discussed mechanisms of Mangostin. We have therefore removed this statement.

      As to the suggestion to dissect the effect of Mangostin on C, D, and L, we agree with the reviewer that this would surely add to a full biophysical characterization. However, in our project, we strove towards including more experiments showing the physiological implications of Mangostin activation to emphasize the implication for vasodilation. We hope the reviewer understands that, with limited resources, this came at the expense of a full investigation of the different gating components, which could pose a separate project by itself.
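      For orientation, the allosteric gating scheme referred to in this exchange is commonly written as below. The expression is reproduced from the general literature on the Horrigan-Aldrich model (Horrigan and Aldrich, 2002) as a reading aid; it is not an analysis performed in the study, and the symbol conventions are assumed to match those used by the reviewer.

```latex
% Steady-state open probability in the Horrigan-Aldrich allosteric model
P_o = \frac{L\,(1 + KC + JD + JKCDE)^{4}}
           {L\,(1 + KC + JD + JKCDE)^{4} + (1 + J + K + JKE)^{4}}
% with
%   L = L_0 \exp\!\left(\frac{z_L F V}{RT}\right)   closed-open equilibrium of the gate
%   J = J_0 \exp\!\left(\frac{z_J F V}{RT}\right)   voltage-sensor resting-activated equilibrium
%   K = \frac{[\mathrm{Ca}^{2+}]}{K_D}              Ca^{2+} occupancy of each binding site
%   C, D, E : coupling factors (Ca^{2+} sensor-gate, voltage sensor-gate,
%             voltage sensor-Ca^{2+} sensor)
```

      In the limit of zero Ca<sup>2+</sup> (K = 0) and very negative voltages (J approaching 0), the expression reduces to P<sub>o</sub> = L/(1 + L), which is why open-probability measurements over a wide voltage range are needed to separate an effect on L from effects on J, C, or D.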

      (2) Apparently, the molecular docking was performed using the truncated structure of the human BK channel. However, it is unclear which one, since the PDB ID given in the Methods (6vg3), according to what I could find, corresponds to the unliganded, inactive PTK7 kinase domain. Be as it may, the apo and Ca2+ bound structures show that there is a rotation and a displacement of the S6 transmembrane domain. Therefore, the positions of the residues I308, L312, and A316 in the closed and open configurations of the BK channel are not the same. Hence, it is expected that the strength of binding will be different whether the channel is closed or open. This point needs to be discussed.

      We apologize for the typing error and thank the reviewer for pointing out the erroneous PDB ID (“6vg3”). It should have read PDB ID 6v3g, as in the legend to Fig. 4B. The reviewer appropriately points out that there are differences in the S6 segment addressed in our study between the two available cryo-EM structures obtained in the presence (PDB ID 6v38) and absence of Ca<sup>2+</sup> (PDB ID 6v3g) (Tao and MacKinnon, 2019).

      We had actually performed the docking with both structures, but chose to show the Ca<sup>2+</sup>-free structure to better visualize the I308 position. α-Mangostin is found in the same S6 region in both, not obstructing the K<sup>+</sup> conduction pathway. The binding energies of the favored poses are very similar; the binding energy of the best-ranking conformational cluster in the Ca<sup>2+</sup>-bound structure was even slightly lower (-8.64 kcal mol<sup>-1</sup>) than in the docking with the Ca<sup>2+</sup>-free channel (-8.58 kcal mol<sup>-1</sup>; Fig. 4B), a difference that is probably not relevant.
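      To put the 0.06 kcal mol<sup>-1</sup> difference between the two docking scores into perspective, the following minimal sketch converts it into an implied ratio of dissociation constants. It assumes that the docking scores can be read as binding free energies (a strong simplification for docking output) and a temperature of 25 °C, which is not stated in the text; the function name `affinity_ratio` is hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.15   # assumed temperature (25 degrees C); not given in the text


def affinity_ratio(dg_a, dg_b, temperature=T_KELVIN):
    """Return Kd_b / Kd_a implied by two binding free energies (kcal/mol).

    Uses Kd = exp(dG / RT), so the ratio depends only on dG_b - dG_a.
    """
    return math.exp((dg_b - dg_a) / (R_KCAL * temperature))


# Docking scores quoted in the response (treated here as free energies).
dg_ca_bound = -8.64
dg_ca_free = -8.58

ratio = affinity_ratio(dg_ca_bound, dg_ca_free)
print(f"Implied Kd ratio (Ca-free / Ca-bound): {ratio:.2f}")
```

      Under these assumptions the implied affinities differ by roughly 10%, well within the uncertainty usually attributed to docking scores, which is in line with the statement that the difference is probably not relevant.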

      We compared the residue interactions in both dockings (Author response table 1). S317 and Y318, which did not reduce the shift in V<sub>½</sub> upon substitution, were not predicted to contact α-Mangostin in either structure. In both structures, L312 and F315 were predicted to interact in virtually all poses analyzed. In the docking to the Ca<sup>2+</sup>-free state, I308 was also predicted to interact in 17/20 poses, while contacts to A316 occurred in 5/20 poses. In the Ca<sup>2+</sup>-bound state, predicted interactions shifted from I308 (which is expected, as it is buried in the protein) to A316, and the isoprenyl moiety close to I308 rotated downwards. This could indicate that α-Mangostin adopts a more horizontal position following the upward reorientation of S6 in the Ca<sup>2+</sup>-bound state when the channel moves from one conformation to the other (Fig. S4).

      Author response table 1.

      Number of interactions of S6 residues in 20 analyzed α-Mangostin poses in the molecular dockings to the Ca2+-free and Ca2

      These docking results are consistent with our functional measurements. Recent structures of the BK/γ1 complex showed that the VSD and Ca<sup>2+</sup>-bowl are stabilized in an active-like conformation that corresponds to the conformation seen in the Ca<sup>2+</sup>-bound state (Kallure et al., 2023; Yamanouchi et al., 2023; Redhardt, Raunser and Raisch, 2024), indicating that very likely the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free structures indeed represent open and closed conformations of the channel. We observed that α-Mangostin can bind to both of these states to activate the channel (Fig. 3C, D), showing the presence of a binding site in both conformations. Further, α-Mangostin induced a left-shift in V<sub>½</sub> also in higher Ca<sup>2+</sup> concentration (Fig. 2D), indicating that it still binds to and activates the channel after the conformational change in S6. As we could not determine affinity for the mutants due to limited solubility, we have no information on the nature of the contribution of the substitutions, i.e., reduced binding or allosteric effect. As I308 is buried in the Ca<sup>2+</sup>-bound state, its contribution is likely mostly allosteric. We have also proposed dewetting as possible activation mechanism, which we expect to be less sensitive to the exact pose of a molecule (as shown for NS11021, Nordquist et al., 2024). Therefore, α-Mangostin could, e.g., change solvent accessibility of the I308 sidechain, energetically favoring the buried (open) state.

      We have now included both dockings and Author response table 1 in Fig. S4, and we have added passages to the results section (starting at line 373) and discussion section (starting at lines 544, 588).

      Minor:

      (1) From Figure 3A, it is apparent that the increase in Po is at the expense of the long periods (seconds) that the channel remains closed. One might suggest that α-mangostin increases the burst periods. It would be beneficial if the authors measured both closed and open dwell times to test whether α-mangostin primarily affects the burst periods.

      We thank the reviewer for this valuable suggestion, which we have implemented. In the single-channel measurements shown in our original Fig. 3, we did not observe burst behavior of the BKɑ channels. This can be explained by the fact that we measured under resting conditions (100 nM free Ca<sub>i</sub><sup>2+</sup>) and with rather mild depolarisation (+40 mV), where Po was very low. We have therefore analyzed measurements in 5 µM free Ca<sub>i</sub><sup>2+</sup>, where we recorded sufficient burst activity also in the basal state.

      The burst analysis showed that ɑ-Mangostin indeed prolongs bursts and shortens the interburst closures. Within bursts, both closed times and open times were increased, and we recorded a higher number of opening events per burst. We conclude that ɑ-Mangostin acts in both the closed and the open state, where it slows open-closed transitions resulting in less flicker, and stabilizes the open state via longer open times and a higher probability for closed-open transitions.
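      As a hedged illustration of what such a burst analysis involves, the sketch below groups an idealized single-channel dwell-time sequence into bursts, using a critical closed time to separate intraburst from interburst closures. The dwell times, the threshold `t_crit_ms`, and the function names are hypothetical and are not taken from the manuscript, whose actual analysis procedure is described in its methods section.

```python
def split_bursts(dwells, t_crit_ms):
    """Group (state, duration_ms) dwells into bursts.

    state is "O" (open) or "C" (closed). A closure of at least t_crit_ms
    ends the current burst; shorter closures stay inside the burst.
    """
    bursts, current = [], []
    for state, duration in dwells:
        if state == "C" and duration >= t_crit_ms:
            if current:              # a long closure terminates the burst
                bursts.append(current)
                current = []
        elif state == "C" and not current:
            continue                 # ignore a short closure before any opening
        else:
            current.append((state, duration))
    # A record may end inside a burst; drop a trailing closure if present.
    while current and current[-1][0] == "C":
        current.pop()
    if current:
        bursts.append(current)
    return bursts


def summarize(burst):
    """Return (burst duration in ms, number of openings) for one burst."""
    duration = sum(d for _, d in burst)
    openings = sum(1 for s, _ in burst if s == "O")
    return duration, openings


# Hypothetical idealized record: (state, duration in ms).
record = [("C", 500), ("O", 2), ("C", 0.5), ("O", 3), ("C", 800),
          ("O", 1.5), ("C", 0.4), ("O", 2), ("C", 0.3), ("O", 1), ("C", 600)]

bursts = split_bursts(record, t_crit_ms=5.0)
summaries = [summarize(b) for b in bursts]
for duration, openings in summaries:
    print(f"burst: {duration:.1f} ms, {openings} openings")
```

      In practice the critical closed time is usually derived from the closed dwell-time distribution rather than fixed by hand; the fixed value here only keeps the example short.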

      We now show this data in Fig. 3D-F and Table S8, and have accordingly added passages to the results section (starting at line 285), the discussion (line 510), and the methods section (starting at line 746).

      (2) In several places, the authors make similarities in the mode of action of other BK activators and α-mangostin; however, the work of Gessner et al. PNAS 2012 indicates that NS1619 and Cym04 interact with the S6/RCK linker, and Webb et al. demonstrated that GoSlo-SR-5-6 agonist activity is abolished when residues in the S4/S5 linker and in the S6C region are mutated. These findings indicate that binding of the agonist is not near the selectivity filter, as the authors' results suggest that α-mangostin binds.

      We will gladly clarify our ideas concerning the binding sites of other activators and ɑ-Mangostin. We first hypothesized that ɑ-Mangostin may share characteristics and mode of action with the class of negatively charged activators (NCA) that we have described before (Schewe et al., 2019). NCA were found to occupy a common fenestration site that is located close to the selectivity filter in TREK K2P channels, and in this manuscript we have shown by THexA competition and mutagenesis experiments that ɑ-Mangostin also binds in this fenestration region in TREK-1 channels (Fig. S3).

      The existence of this common NCA binding site was also proposed for BK channels, as a docking placed the NCA NS11021 in an equivalent binding region, and, among others, NS11021 and GoSlo-SR-5-6 competed with THexA for binding in the pore (Schewe et al., 2019). These results were indeed not fully in agreement with the proposed binding site of GoSlo-SR-5-6 in Webb et al. (2015), although the most effective (double) mutants were located at S317 and I323, at the intracellular end of the cleft between neighboring S6 segments. In this manuscript, we have shown that α-Mangostin is present in the pore of BK channels by molecular docking, a THexA competition assay, and two mutations that reduced the shift in V<sub>½</sub> induced not only by ɑ-Mangostin but also by GoSlo-SR-5-6 (Fig. 4). While the docking was rather a starting point, both functional tests argue against a binding site in the S4/5 linker/S6C region; however, allosteric mechanisms could still reduce activation also in mutants in the S4/5 linker/S6C region far from the pore binding region proposed by us in the 2019 study and the present manuscript.

      To summarize, we did not mean to imply that all BK activators should bind to this site, especially if they are not part of the NCA class (such as NS1619 and Cym04, as well as BC5, whose different binding site enabled us to use it as a control in our THexA competition assay). However, the cleft close to the gating-relevant S6 residues may well be a region especially susceptible to modulator binding (as for BL-1249, GoSlo-SR-5-6, and ɑ-Mangostin). We have separated the initial GoSlo references from the reference to the pore binding site in the paragraph (lines 329, 358) to improve clarity.

      (3) The sentence starting in line 452 states that there is a pronounced allosteric coupling between the voltage sensors and Ca2+ binding. If the authors are referring to the coupling factor E in the Horrigan-Aldrich gating model, the references cited, in particular, Sun and Horrigan, concluded that the coupling between those sensors is weak.

      We are grateful for the opportunity to improve this passage. We intended to express that observed effects (in this case the shift in V<sub>½</sub>) are pronounced around 1 µM Ca<sup>2+</sup>. As the reviewer states, the coupling factor between the voltage and calcium sensors (E; 2.4) is weak compared to the coupling of Ca<sup>2+</sup> (C; 8) and voltage (D; 25) to the pore in the Horrigan-Aldrich model. However, the shape of the Ca<sup>2+</sup>-dependence of V<sub>½</sub> cannot be completely described when E is neglected, with the highest difference around 1-2 µM Ca<sup>2+</sup> (Horrigan and Aldrich, 2002). Deletion of the gating ring underlines the allosteric sensor coupling (Clay, 2017). This together with the steep Ca<sup>2+</sup>-dependence in this concentration range (meaning high Po changes upon occupancy increase; Cui, Cox and Aldrich, 1997) explains the higher apparent activation, visible as the higher shift in V<sub>½</sub> observed at the 1 µM Ca<sup>2+</sup>. Speaking with the model of Sun and Horrigan (2022), the suppressing “molecular logic gate” is already relieved by the presence of intermediate Ca<sup>2+</sup>, and the direct “gating lever” pathway via voltage acts synergistically and achieves the observed higher V<sub>½</sub> shift upon depolarization. We have adapted the sentence and separated the citations for better understanding (lines 503-507).
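      The role of the coupling factors can be made concrete with a small numerical sketch of the Horrigan-Aldrich open probability. Only C = 8, D = 25 and E = 2.4 are taken from the text above; the remaining parameters (`L0`, `zL`, `J0`, `zJ`, `Kd_um`, and kT/e) are illustrative values of the order commonly quoted for this model and should be treated as assumptions, as should the definition of V<sub>½</sub> as the voltage at which Po = 0.5.

```python
import math

KT_MV = 25.4  # kT/e in mV near room temperature (assumed)


def po_ha(v_mv, ca_um, C=8.0, D=25.0, E=2.4,
          L0=9.8e-7, zL=0.3, J0=0.03, zJ=0.58, Kd_um=11.0):
    """Open probability in the Horrigan-Aldrich allosteric scheme."""
    L = L0 * math.exp(zL * v_mv / KT_MV)   # closed-open equilibrium
    J = J0 * math.exp(zJ * v_mv / KT_MV)   # voltage-sensor equilibrium
    K = ca_um / Kd_um                      # Ca2+ occupancy term
    open_w = L * (1 + K * C + J * D + J * K * C * D * E) ** 4
    closed_w = (1 + J + K + J * K * E) ** 4
    return open_w / (open_w + closed_w)


def v_half(ca_um, lo=-400.0, hi=400.0, **params):
    """Voltage (mV) at which Po = 0.5, found by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if po_ha(mid, ca_um, **params) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Compare V1/2 with the sensor-sensor coupling E included or neglected.
for ca in (0.1, 1.0, 10.0, 100.0):
    with_e = v_half(ca)
    without_e = v_half(ca, E=1.0)
    print(f"{ca:6.1f} uM Ca: V1/2 = {with_e:7.1f} mV (E=2.4), "
          f"{without_e:7.1f} mV (E=1)")
```

      Scaling `L0` in the same function is one way to explore how a change in the closed-open equilibrium alone would move V<sub>½</sub>, which is the kind of dissection the reviewer's first comment refers to.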

      References:

      Clay, J.R. (2017) “Novel description of the large conductance Ca2+-modulated K+ channel current, BK, during an action potential from suprachiasmatic nucleus neurons,” Physiological Reports, 5(20), p. e13473. Available at: https://doi.org/10.14814/phy2.13473.

      Cui, J., Cox, D.H. and Aldrich, R.W. (1997) “Intrinsic Voltage Dependence and Ca2+ Regulation of mslo Large Conductance Ca-activated K+ Channels,” Journal of General Physiology, 109(5), pp. 647–673. Available at: https://doi.org/10.1085/jgp.109.5.647.

      Horrigan, F.T. and Aldrich, R.W. (2002) “Coupling between voltage sensor activation, Ca2+ binding and channel opening in large conductance (BK) potassium channels,” The Journal of General Physiology, 120(3), pp. 267–305. Available at: https://doi.org/10.1085/jgp.20028605.

      Kallure, G.S. et al. (2023) “High-resolution structures illuminate key principles underlying voltage and LRRC26 regulation of Slo1 channels.” bioRxiv, p. 2023.12.20.572542. Available at: https://doi.org/10.1101/2023.12.20.572542.

      Nordquist, E.B., Jia, Z. and Chen, J. (2024) “Small Molecule NS11021 Promotes BK Channel Activation by Increasing Inner Pore Hydration,” Journal of Chemical Information and Modeling, 64, pp. 7616–7625. Available at: https://doi.org/10.1021/acs.jcim.4c01012.

      Redhardt, M., Raunser, S. and Raisch, T. (2024) “Cryo-EM structure of the Slo1 potassium channel with the auxiliary γ1 subunit suggests a mechanism for depolarization-independent activation,” FEBS Letters, 598(8), pp. 875–888. Available at: https://doi.org/10.1002/1873-3468.14863.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Sun, L. and Horrigan, F.T. (2022) “A gating lever and molecular logic gate that couple voltage and calcium sensor activation to opening in BK potassium channels,” Science Advances, 8(50), p. eabq5772. Available at: https://doi.org/10.1126/sciadv.abq5772.

      Tao, X. and MacKinnon, R. (2019) “Molecular structures of the human Slo1 K+ channel in complex with β4,” eLife, 8, p. e51409. Available at: https://doi.org/10.7554/eLife.51409.

      Webb, T.I. et al. (2015) “Molecular mechanisms underlying the effect of the novel BK channel opener GoSlo: Involvement of the S4/S5 linker and the S6 segment,” Proceedings of the National Academy of Sciences, 112(7), pp. 2064–2069. Available at: https://doi.org/10.1073/pnas.1400555112.

      Yamanouchi, D. et al. (2023) “Dual allosteric modulation of voltage and calcium sensitivities of the Slo1-LRRC channel complex,” Molecular Cell, 83(24), pp. 4555-4569.e4. Available at: https://doi.org/10.1016/j.molcel.2023.11.005.

      Reviewer #3 (Public review):

      Summary:

      This research shows that a-mangostin, a proposed nutraceutical, with cardiovascular protective properties, could act through the activation of large conductance potassium permeable channels (BK). The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces a potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of smooth muscle in vascular tissue, this activation leads to muscle relaxation, possibly explaining cardiovascular protective effects.

      Strengths:

      The authors present evidence based on several lines of experiments that a-mangostin is a potent activator of BK channels. The quality of the experiments and the analysis is high and represents an appropriate level of analysis. This research is timely and provides a basis to understand the physiological effects of natural compounds with proposed cardio-protective effects.

      We sincerely thank the reviewer for appraising the achievements of our study.

      Weaknesses:

      The identification of the binding site is not the strongest point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and show that point mutations reduce the magnitude of the negative voltage shift of activation produced by a-mangostin. However, these experiments do not demonstrate binding to these sites, and could be explained by allosteric effects on gating induced by the mutations themselves.

      We are aware that our functional data are unfortunately not sufficient to clearly distinguish between effects due to affinity loss or due to allosteric mechanisms. Our attempts to generate complete dose–response curves for the mutants to determine accurate apparent IC<sub>50</sub> values were unfortunately limited by the solubility of the compound. Consequently, we have avoided making claims about affinity loss in the mutant analysis, and have instead only reported the reduction in potency, expressed as the shift in V<sub>½</sub>. To reduce confounding effects from the mutations themselves, we selected substitutions that preserved the most wildtype-like GV-relationships, based on the extensive mutagenesis work of (Chen, Yan and Aldrich, 2014). We address this matter also in our answer to Recommendation (6) below, and we have replaced the word “binding” in the title of the manuscript. Nevertheless, we consider the proposed binding region to be well supported by the THexA competition experiments in combination with molecular docking, even though the specific mechanistic contributions of individual residues cannot yet be resolved.

      Reviewer #3 (Recommendations for the authors):

      (1) Natural xanthones as α-Mangostin induce vasorelaxation via binding to key gating residues in the S6 domain of BK channels.

      (2) If α-Mangostin occupies a similar binding site to quaternary ammoniums, what is the explanation for not observing a reduction in the single-channel current (fast blocking effect)? The α-Mangostin site proposed here is in a region of the channel that should occlude ion permeation. The authors should discuss possible explanations for this apparently contradictory observation.

      As the reviewer states, we indeed have not observed a reduced single channel amplitude in any measurement. The THexA competition assay showed that ɑ-Mangostin is present in the pore cavity and interferes with THexA access to its binding site. However, we do not think that their binding sites are similar, as QA ions bind directly below the filter entrance to block permeation, while our studies suggest that ɑ-Mangostin binds in the upper portion of the cleft between S6 helices. In this position, it would clearly overlap with the QA binding site and hinder access, but not block permeation. We would therefore not expect to see an amplitude reduction by intermittent α-Mangostin block. Consistently, all binding poses in our dockings were close to the cavity wall, without interfering with the central ion conduction pathway. To better illustrate this, we have added updated intracellular views of the dockings in the Ca<sup>2+</sup>-free and Ca<sup>2+</sup>-bound state (which we have also now included as suggested by another reviewer) to the supplementary information (Fig. S4A).

      (3) In Figure 2D, it is difficult to appreciate the differences between the symbols representing the G-V relationships of BKa channels at different intracellular Ca concentrations, before and after activation with 10 μM a-Mangostin. A clearer distinction between the curves would help to interpret the data more easily.

      We thank the reviewer for the suggestion to improve figure accessibility. We have changed the line appearance for better discrimination of the overlying portions.

      (4) Both THexA and TPA block BK channels through voltage and state-dependent mechanisms. Therefore, their apparent affinity could change if a-Mangostin simply increases open probability or alters dwell times rather than physically blocking access to the binding site.

      The reviewer addresses valid limitations that can affect the meaningfulness of competition experiments under certain conditions. However, we think that this does not apply to our results:

      Previous studies have shown that the voltage dependence of quaternary ammonium blockers up to C<sub>10</sub> is rather weak in BK channels, and only a slight increase in block is present in the voltage range +30 mV to +100 mV (Li and Aldrich, 2004; Thompson and Begenisich, 2012). Hence, THexA voltage dependence has already reached a plateau in the competition assay (at +40 mV), and its voltage dependence would have little effect on our results.

      Controversy exists about the nature of the state dependence of different quaternary ammonium blockers, but TBA is often recognized as an open channel blocker of BK channels, which probably also applies to THexA (Wilkens and Aldrich, 2006; Tang, Zeng and Lingle, 2009; Thompson and Begenisich, 2012; Posson, McCoy and Nimigean, 2013). Assuming such an open-channel block, apparent IC<sub>50</sub> values would be inversely proportional to Po. The THexA IC<sub>50</sub> was about 80 nM in the basal state, when Po is very low (0.024 at +40 mV as derived from the GV-relationship); an increase of open dwell times, respectively Po, in the presence of α-Mangostin to, e.g., 0.3 would therefore lead to a ≈10-fold decrease in apparent IC<sub>50</sub>. However, the apparent THexA IC<sub>50</sub> strongly increased rather than decreased (more than 20-fold to around 1.6 µM). This cannot arise from Po change and must reflect the altered access of THexA to its binding site caused by α-Mangostin. Assuming a pure closed channel block where apparent IC<sub>50</sub> would correlate with the closed times, an increase of about 1.4-fold is expected. However, we recorded a much stronger 20-fold increase. Therefore, we are convinced that we have conclusively shown that α-Mangostin is present in the BK pore irrespective of the state dependence of THexA block.
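      The expected fold changes in this argument can be reproduced with a few lines of arithmetic. The sketch assumes the simplest state-dependent block schemes, in which the apparent IC<sub>50</sub> scales with 1/Po for a pure open-channel blocker and with 1/(1 - Po) for a pure closed-channel blocker; the Po values and IC<sub>50</sub> values are those quoted above, and the function names are hypothetical.

```python
def fold_change_open_block(po_before, po_after):
    """Apparent IC50 ratio (after/before) if IC50 is proportional to 1/Po."""
    return po_before / po_after


def fold_change_closed_block(po_before, po_after):
    """Apparent IC50 ratio (after/before) if IC50 is proportional to 1/(1 - Po)."""
    return (1.0 - po_before) / (1.0 - po_after)


po_basal = 0.024      # Po at +40 mV in the basal state (from the text)
po_activated = 0.3    # example Po in the presence of alpha-Mangostin (from the text)

open_fold = fold_change_open_block(po_basal, po_activated)
closed_fold = fold_change_closed_block(po_basal, po_activated)
observed_fold = 1600.0 / 80.0   # about 1.6 uM versus about 80 nM

print(f"open-channel block:   IC50 x {open_fold:.2f} (a {1 / open_fold:.1f}-fold decrease)")
print(f"closed-channel block: IC50 x {closed_fold:.2f}")
print(f"observed:             IC50 x {observed_fold:.0f}")
```

      Neither limiting case comes close to the observed increase of about 20-fold, which is the basis for attributing the change to altered THexA access rather than to the change in Po.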

      (5) The pH dependence of the V1/2 shift supports the idea that α-Mangostin becomes more negatively charged at higher pH (enhancing its effect.) However, although the data are consistent with this interpretation, additional controls such as using a non-ionizable analog or assessing solubility changes with pH would be needed to confirm that the shift is caused specifically by ionization of α-Mangostin and not by indirect pH effects on channel gating.

      We agree with the reviewer that the pH experiment by itself is not sufficient to clearly tie the existence of a charge to a possible activation mechanism. We still think that this is an interesting observation and should be made known, as we have investigated the mechanism of negatively charged activators in different K<sup>+</sup> channel families before (Schewe et al., 2019). Unfortunately, we do not have access to uncharged derivatives mimicking the 3D conformation. Of the commercially available substances, the bare xanthone backbone is completely insoluble in water. We have therefore tested the derivative 3-hydroxyxanthone as an example with a minimal number of hydroxyl substituents (Author response image 2, Author response table 2). The 3-hydroxyxanthone indeed shows reduced activation compared to α-Mangostin. The shift in V<sub>½</sub> induced by 10 µM 3-hydroxyxanthone was only 14.99 ± 5.67 mV (≈50 mV for α-Mangostin). This supports the idea that the presence of several (potentially) charged substituents is important for the activation mechanism. However, we have no knowledge about the efficacy of the compound or the local pK<sub>a</sub> of the different hydroxyl groups. As the reviewer stated, systematic chemical modifications would be necessary to elucidate the importance of the number and positions of charged substituents, which is not within our capabilities.

      Author response image 2.

      Activation of BKα by 3-hydroxyxanthone. (A) GV-relationship before and after application of 10 µM 3-hydroxyxanthone. (B) V<sub>½</sub> before and after application of 10 µM 3-hydroxyxanthone compared to α-Mangostin and the resulting difference in V<sub>½</sub> (ΔV<sub>½</sub>). Measurements were conducted as described in the main manuscript with 100 nM free Ca<sub>i</sub><sup>2+</sup>.

      Author response table 2.

      Comparison of the V<sub>½</sub> ± SEM and ΔV<sub>½</sub> ± SEM before and after activation by 10 µM α-Mangostin or 10 µM 3-hydroxyxanthone in BKα channels. Unpaired t-test, two-tailed P values (α=0.05)

      (6) The reduced V1/2 shifts observed in the I308A, L312M, and A316P mutants may result from intrinsic gating alterations rather than a true loss of a-Mangostin binding. The GoSlo-SR-5-6 control is informative, but the persistence of activation in A316P does not fully resolve this. A more convincing test would be employing double or triple mutants.

      As stated above, we acknowledge that our functional data do not allow us to definitively separate effects arising from a true loss of binding affinity from those due to potential allosteric effects. We tried to minimize intrinsic gating alterations caused by the substitutions by not conducting a pure alanine or cysteine scanning mutagenesis. Instead, substitutions were chosen to be closest to the wildtype GV-relationship in Chen, Yan and Aldrich (2014) where possible. While L312M was virtually identical to the wildtype, A316P showed a change in slope in high Ca<sup>2+</sup> concentrations, which could indicate a changed voltage sensitivity. Additionally, A316P completely abolished α-mangostin activation. We therefore also used A316G to ensure that the channel is functional and retains voltage sensitivity, even if its V<sub>½</sub> was shifted more strongly. As we have conducted paired measurements and assessed the V<sub>½</sub> before and after activation, we are confident that we can attribute a reduced shift to the reduced action of α-mangostin.

      Following the reviewer’s suggestion, we have generated and measured the double mutants I308A/L312M, I308A/A316G, and L312M/A316G (the triple mutant I308A/L312M/A316G did not produce measurable currents). The mutants I308A/L312M and I308A/A316G showed a moderate energy-additive effect and reduced the shift in V<sub>½</sub> by further ≈7 mV compared to the single mutation with the stronger shift. The combination L312M/A316G, however, did not further reduce the shift seen in the single mutations and did not even produce the shift induced by A316G alone.

      Author response image 3.

      Double Mutants I308A/L312M, I308A/A316G and L312M/A316G compared to the single mutations in the main manuscript. The V½ before and after activation with 10 µM α-Mangostin, the resulting shift in V½, and the GV-relationships are shown (n=6-7), measurements were made as in Fig. 4.

      Author response table 3.

      Summary of the V<sub>½</sub> before and after Mangostin activation and the resulting shifts in V<sub>½</sub> for the double mutants compared to the single mutants shown in the main manuscript.

      Following a suggestion by another reviewer, we have generated AlphaFold3 (AF3) models for I308A, L312M and A316P and repeated the Mangostin docking. We learned that the mutations are all predicted to substantially impact the structure of the S6 helix, therefore altering the binding region, and A316P especially impacted the nature of residue interactions. This could explain why the double mutants do not show a clear and consistent additive effect.

      Unfortunately, this outcome is not conclusive and the double mutants do not reveal further information compared to the single mutants. We have therefore decided not to include these measurements in the manuscript.

      As we do not know if our answers will be sent to all reviewers, we repeat the relevant part about the AF3 models here:

      (…) According to these predictive models,

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: C<sub>α</sub> of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower especially in the best 2 poses that are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions to the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to only contact F315 and Y318, and overall, any I308 or L312 contacts only occurred in 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P probably has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift). (…)

      (7) The subtraction approach used to isolate BK currents (difference before and after a-Mangostin) assumes that the compound affects only BK channels. However, a-Mangostin could also modulate Cav currents directly, as reported for other polyphenolic compounds. No vehicle (DMSO) control is shown.

      We agree with the reviewer that α-Mangostin could also modulate Ca<sub>v</sub> currents; however, this would not interfere with the conclusions drawn from this nanodomain experiment. We intended to show the overall current modulation by ɑ-Mangostin in the voltage range relevant for Ca<sub>v</sub>-BK coupling, as this would be the determinant for the membrane potential mediating the vasoactive effect. In native tissue, BK and Ca<sub>v</sub> channels (among others) would likewise contribute to the net membrane conductance, with BK channels being a major contributor when activated. In fact, a concomitant inhibition of Ca<sub>v</sub> channels could act synergistically in favor of vasodilation. This could therefore be a subject for the further investigation of potential ɑ-Mangostin targets. However, the fact that iberiotoxin prevented relaxation in aortic preparations conclusively showed that BK channels are the major player in native tissue.

      We have reformulated some sentences to prevent misunderstandings that we refer to isolated BK currents instead of α-Mangostin activated currents.

      DMSO controls were conducted and did not impact BK or Ca<sub>v</sub>1.2 currents or the aortic tissue contraction. We have added representative measurements as Fig. S6 and stated the DMSO concentration in the Methods section (line 655).

      (8) Most kinetic fits were obtained at strong depolarizations (around +100 mV), which limits how well these results can be extrapolated to physiological voltages. Although the BK-Cav experiments show facilitation between -50 and +50 mV, providing plots for activation and deactivation in that range would strengthen the physiological relevance.

      We thank the reviewer for this valuable suggestion. We now additionally show that the impact of ɑ-Mangostin on activation is high at lower depolarisation, indeed underlining its physiological relevance. To address the activation time course in a more physiological voltage range, we have used our measurements of BKɑ channels in 10 µM Ca<sub>i</sub><sup>2+</sup> (where the V<sub>½</sub> shift induced by ɑ-Mangostin is equal to that in 100 nM Ca<sub>i</sub><sup>2+</sup>; Fig. 2D). The outward currents already present in the lower voltage range under these conditions allowed us to fit a monoexponential function to the traces of 0 mV to 100 mV prepulses. The τ of activation decreased from 29.6 ± 3.1 ms at 0 mV to 2.4 ± 2 ms at +100 mV. After ɑ-Mangostin activation, the time course was accelerated, with a τ of activation of 9.5 ± 4.7 ms at 0 mV to 2 ± 0.6 ms at +100 mV. This faster activation was particularly effective in the lower voltage range far from high Po, e.g., ɑ-Mangostin caused a decrease of more than half of the τ of activation at +20 mV (from 12.2 ± 0.6 ms to 4.98 ± 1.6 ms).
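      For readers unfamiliar with how an activation time constant is extracted, the sketch below fits a monoexponential rise to a synthetic, noise-free trace by linear regression on the log-transformed residual. It is not the fitting procedure used in the manuscript: the trace is simulated, the steady-state current is simply taken as the last sample, and the function names are hypothetical. Only the two τ values at +20 mV come from the text.

```python
import math


def simulate_activation(tau_ms, i_max=1.0, dt_ms=0.1, duration_ms=100.0):
    """Return (times, currents) for I(t) = i_max * (1 - exp(-t / tau))."""
    n = int(round(duration_ms / dt_ms)) + 1
    times = [k * dt_ms for k in range(n)]
    currents = [i_max * (1.0 - math.exp(-t / tau_ms)) for t in times]
    return times, currents


def fit_tau(times, currents, fraction=0.9):
    """Estimate tau from the slope of ln(1 - I/I_ss) versus t.

    I_ss is approximated by the last sample; only points below
    fraction * I_ss are used so the logarithm stays well conditioned.
    """
    i_ss = currents[-1]
    xs, ys = [], []
    for t, i in zip(times, currents):
        if i < fraction * i_ss:
            xs.append(t)
            ys.append(math.log(1.0 - i / i_ss))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -1.0 / slope


# Time constants at +20 mV quoted in the response (control, alpha-Mangostin).
tau_control = fit_tau(*simulate_activation(12.2))
tau_mangostin = fit_tau(*simulate_activation(4.98))
speedup = tau_control / tau_mangostin
print(f"tau control ~ {tau_control:.2f} ms, tau alpha-Mangostin ~ {tau_mangostin:.2f} ms")
print(f"activation is about {speedup:.1f}-fold faster")
```

      Real recordings are noisy and contain capacitive artifacts, so a nonlinear least-squares fit over a chosen time window is the more usual choice; the log-linear form is used here only to keep the example dependency-free.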

      Our data consist of families of different prepulse voltages and a fixed repolarisation step (to -50 mV for 100 nM free Ca<sub>i</sub><sup>2+</sup>, and to -100 mV for 10 µM free Ca<sub>i</sub><sup>2+</sup>). Thus, we are not able to add plots for the voltage-dependence of deactivation in the same way as for activation. However, we can present the deactivation time constants of lower prepulse voltage steps that produce outward currents in symmetrical ion conditions with 10 µM free Ca<sub>i</sub><sup>2+</sup>. For -20 mV and +20 mV prepulse voltages, which better reflect physiological depolarisation, the deactivation time constant shows a 3- to 5-fold increase after α-Mangostin activation.

      We now show the plot for the voltage dependence of activation in Fig. S2A and a bar graph for activation/deactivation time constants at +20 mV as Fig. S2B; data are summarized in Table S5. We hope this adds to illustrating the effect of α-Mangostin under physiological conditions.

      (9) Minor: In several parts of the paper, induced shifts to negative voltages are referred to as "leftward shifts". It would be useful to be consistent and employ a more specific reference to negative or positive directions.

      We thank the reviewer for the careful reading and have harmonized the terminology.

      References

      Chen, X., Yan, J. and Aldrich, R.W. (2014) “BK channel opening involves side-chain reorientation of multiple deep-pore residues,” Proceedings of the National Academy of Sciences, 111(1), pp. E79–E88. Available at: https://doi.org/10.1073/pnas.1321697111.

      Li, W. and Aldrich, R.W. (2004) “Unique Inner Pore Properties of BK Channels Revealed by Quaternary Ammonium Block,” Journal of General Physiology, 124(1), pp. 43–57. Available at: https://doi.org/10.1085/jgp.200409067.

      Posson, D.J., McCoy, J.G. and Nimigean, C.M. (2013) “The voltage-dependent gate in MthK potassium channels is located at the selectivity filter,” Nature Structural & Molecular Biology, 20(2), pp. 159–166. Available at: https://doi.org/10.1038/nsmb.2473.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Tang, Q.-Y., Zeng, X.-H. and Lingle, C.J. (2009) “Closed-channel block of BK potassium channels by bbTBA requires partial activation,” The Journal of General Physiology, 134(5), pp. 409–436. Available at: https://doi.org/10.1085/jgp.200910251.

      Thompson, J. and Begenisich, T. (2012) “Selectivity filter gating in large-conductance Ca2+-activated K+ channels,” Journal of General Physiology, 139(3), pp. 235–244. Available at: https://doi.org/10.1085/jgp.201110748.

      Wilkens, C.M. and Aldrich, R.W. (2006) “State-independent block of BK channels by an intracellular quaternary ammonium,” The Journal of General Physiology, 128(3), pp. 347–364. Available at: https://doi.org/10.1085/jgp.200609579.

    1. eLife Assessment

      This study resolves a cryo-EM structure of the GPCR, human GPR30, which responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. Understanding the ligand and the mechanism of activation is important to the field of receptor signaling and potentially facilitates drug development targeting this receptor. Structures and functional assays provide solid evidence for a potential bicarbonate binding site.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers.]

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, in the presence of bicarbonate, which the author's lab recently identified as the physiological ligand. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. This solid study provides important insight into the overall structure and suggests a possible bicarbonate binding site.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling are solid. Based on the structure, the authors identify a binding pocket that might accommodate bicarbonate. Although assignment of the binding pocket is speculative, extensive mutagenesis of residues in this pocket identifies several that are important to G-protein signaling. The structure shows some conformational differences with a previous structure of this protein determined in the absence of bicarbonate (PMC11217264). To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study an important contribution to the field. However, the current study provides novel and important circumstantial evidence for the bicarbonate binding site based on mutagenesis and functional assays.

      Weaknesses:

      Bicarbonate is a challenging ligand for structural and biochemical studies, and because of experimental limitations, this study does not elucidate the exact binding site. Higher resolution structures would be required for structural identification of bicarbonate. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. However, biochemical binding assays are challenging because the binding constant is weak, in the mM range.

      The authors appropriately acknowledge the limitations of these experimental approaches, and they build a solid circumstantial case for the bicarbonate binding pocket based on extensive mutagenesis and functional analysis. However, the study does fall short of establishing the bicarbonate binding site.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work in Nature Communications (PMID: 38413581). In the current body of work, they solved the cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.15 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of a class A GPCR and also identified 3 extracellular pockets created by ECLs (Pockets A-C). Based on the polarity, location, size, and charge of each pocket, the authors hypothesized that pocket A is a good candidate for the bicarbonate binding site. To identify the bicarbonate binding site, the authors performed an exhaustive mutant analysis of the hydrophilic residues in Pocket A and analyzed receptor reactivity via calcium assay. In addition, the human GPR30-G-protein complex model also enabled the authors to elucidate the G-protein coupling mechanism of this special class A GPCR, which plays a crucial role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication, the authors used cryo-EM coupled with mutagenesis and functional studies to elucidate bicarbonate-GPR30 interaction. This work provided atomic-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 3 extracellular pockets created by ECLs (Pockets A-C). The authors were able to filter out 2 of them and hypothesized that pocket A was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they mapped out amino acids that are critical for receptor reactivity.

      Weaknesses:

      When we see a reduction of a GPCR-mediated downstream signaling, several factors could potentially contribute to this observation: 1) a reduced total expression of this receptor due to the mutation (transcription and translation issue); 2) a reduced surface expression of this receptor due to the mutation (trafficking issue); and 3) a dysfunctional receptor that doesn't signal due to the mutation.

      Altogether, the wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

    4. Reviewer #3 (Public review):

      Summary

      GPR30 responds to bicarbonate and plays a role in regulating cellular pH and ion homeostasis. However, the molecular basis of bicarbonate recognition by GPR30 remains unresolved. This study reports the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate, revealing mechanistic insights into its G-protein coupling. Nonetheless, the study does not identify the bicarbonate-binding site within GPR30.

      Strengths

      The work provides strong structural evidence clarifying how GPR30 engages and couples with Gq.

      Weaknesses

      Several GPR30 mutants exhibited diminished responses to bicarbonate, but their expression levels were also reduced. As a result, the mechanism by which GPR30 recognizes bicarbonate remains uncertain.

    5. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, in the presence of bicarbonate, which the author's lab recently identified as the physiological ligand. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. This solid study provides important insight into the overall structure and suggests a possible bicarbonate binding site.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling are solid. Based on the structure, the authors identify a binding pocket that might accommodate bicarbonate. Although assignment of the binding pocket is speculative, extensive mutagenesis of residues in this pocket identifies several that are important to G-protein signaling. The structure shows some conformational differences with a previous structure of this protein determined in the absence of bicarbonate (PMC11217264). To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study an important contribution to the field. However, the current study provides novel and important circumstantial evidence for the bicarbonate binding site based on mutagenesis and functional assays.

      Weaknesses:

      Bicarbonate is a challenging ligand for structural and biochemical studies, and because of experimental limitations, this study does not elucidate the exact binding site. Higher resolution structures would be required for structural identification of bicarbonate. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. However, biochemical binding assays are challenging because the binding constant is weak, in the mM range.

      The authors appropriately acknowledge the limitations of these experimental approaches, and they build a solid circumstantial case for the bicarbonate binding pocket based on extensive mutagenesis and functional analysis. However, the study does fall short of establishing the bicarbonate binding site.

      We thank the reviewer for this thoughtful and constructive assessment of our revised manuscript. We are grateful for the recognition of the overall quality of the cryo-EM structure and the proposed mechanism of G-protein coupling, as well as for highlighting the importance of identifying bicarbonate as a physiological ligand for GPR30 and the contribution this work makes to the receptor signaling field. We also appreciate the reviewer’s careful and balanced discussion of the inherent challenges posed by bicarbonate as a low-affinity, small, negatively charged ligand, and we fully agree that, given current experimental limitations, our data provide circumstantial, rather than definitive, evidence for the binding site and that higher-resolution structures would be required for direct visualization. Importantly, we value the reviewer’s acknowledgement that we transparently describe these limitations and that our extensive mutagenesis and functional analyses nonetheless build a solid case for the proposed bicarbonate-binding pocket, which we believe will serve as a useful framework for future biochemical and structural investigation.

      Reviewer #1 (Recommendations for the authors):

      Overall, the authors do a good job responding to the previous review, with updated structures and experimental data. I have two comments on the current version:

      (1) When the authors compare their structure to a previously published structure of the same receptor, they say that the previous structure came out while the current manuscript was in revision (line 255). This is not correct. The previous manuscript was published May 14, 2024, and the current manuscript was received by eLife on May 20, 2024. This sentence should be corrected to "During the preparation of this manuscript..."

      We corrected the sentence accordingly (line 259).

      (2) Line 173: what other structures are the authors referring to? Citations should be included here.

      Is Line 193 correct? We added citations (line 190).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work in Nature Communications (PMID: 38413581). In the current body of work, they solved the cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.15 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of a class A GPCR and also identified 3 extracellular pockets created by ECLs (Pockets A-C). Based on the polarity, location, size, and charge of each pocket, the authors hypothesized that pocket A is a good candidate for the bicarbonate binding site. To identify the bicarbonate binding site, the authors performed an exhaustive mutant analysis of the hydrophilic residues in Pocket A and analyzed receptor reactivity via calcium assay. In addition, the human GPR30-G-protein complex model also enabled the authors to elucidate the G-protein coupling mechanism of this special class A GPCR, which plays a crucial role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication, the authors used cryo-EM coupled with mutagenesis and functional studies to elucidate bicarbonate-GPR30 interaction. This work provided atomic-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 3 extracellular pockets created by ECLs (Pockets A-C). The authors were able to filter out 2 of them and hypothesized that pocket A was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they mapped out amino acids that are critical for receptor reactivity.

      Weaknesses:

      When we see a reduction of a GPCR-mediated downstream signaling, several factors could potentially contribute to this observation: 1) a reduced total expression of this receptor due to the mutation (transcription and translation issue); 2) a reduced surface expression of this receptor due to the mutation (trafficking issue); and 3) a dysfunctional receptor that doesn't signal due to the mutation. In the current revision, based on the gating strategy, the surface expression of the HA-positive WT GPR30-expressing cells is only 10.6% of the total population, while the surface expression levels of the mutants range from 1.89% (P71A) to 64.4% (D111A). Combining this information with the functional readout in Figure 3F and G, as well as their previous work, the authors concluded that mutations at P71, E115, D125, Q138, C207, D210, and H307 would decrease bicarbonate responses. Among those sites, E115, Q138, and H307 were from their previous Nature Comm paper.

      The authors claim that P71 and C207 make a structural-stability contribution, as their mutations result in a significant reduction in surface expression: P71A (1.89%) and C207A (2.71%). However, relative to the 10.6% of the total population in the WT (P71A is 17.8% of the WT, and C207A is 25.6% of the WT), this doesn't rule out the possibility that the mutated receptor is also dysfunctional: at 10 mM NaHCO3, the RFU of the WT is ~500, whereas the RFU of P71 and C207 is ~0.

      The authors also interpret "The D125ECL1A mutant has lost its activity but is located on the surface" and only mention "D125 is unlikely to be a bicarbonate binding site, and the mutational effect could be explained due to the decreased surface expression". Again, compared to 10.6% of the total population in the WT, D125A (3.94%) is 37.2% of the WT. At 10 mM NaHCO3, the RFU of the WT is ~500, the RFU of D125 is ~0. This doesn't rule out the possibility that the mutated receptor is also dysfunctional. It is not clear why D125A didn't make it to the surface.

      Other mutants that the authors didn't mention much in their text: D111A (64.4%, 607.5% of WT surface expression), E121A (50.4%, 475.5% of WT surface expression), R122 (41.0%, 386.8% of WT surface expression), N276A (38.9%, 367.0% of WT surface expression) and E218A (24.6%, 232.1% of WT surface expression) all have similar RFU as WT, although the surface expression is about 2-6 times more. On the other hand, Q215A (3.18%, 30% of WT surface expression) has similar RFU as WT, with only a third of the receptor on the surface.
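
      The "% of WT surface expression" figures in the paragraphs above follow from dividing each mutant's HA-positive fraction by the WT value of 10.6%. A minimal sketch of that normalization is given below, using only the percentages quoted in this review; the names are illustrative, and "R122" is kept as it is written in the review.

```python
# Surface expression (% HA-positive cells) as quoted in the review text.
WT_SURFACE_PCT = 10.6
MUTANT_SURFACE_PCT = {
    "P71A": 1.89,
    "C207A": 2.71,
    "Q215A": 3.18,
    "D125A": 3.94,
    "E218A": 24.6,
    "N276A": 38.9,
    "R122": 41.0,  # written without the substituted residue in the review
    "E121A": 50.4,
    "D111A": 64.4,
}


def percent_of_wt(mutant_pct, wt_pct=WT_SURFACE_PCT):
    """Express a mutant's surface expression as a percentage of the WT value."""
    return 100.0 * mutant_pct / wt_pct


if __name__ == "__main__":
    for name, pct in MUTANT_SURFACE_PCT.items():
        print(f"{name:>6}: {pct:5.2f}% of cells -> {percent_of_wt(pct):6.1f}% of WT")
```

      This reproduces the quoted values (for example, 17.8% of WT for P71A and 607.5% for D111A) and makes the roughly 30-fold spread in surface expression across mutants explicit.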

      Altogether, the wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

      We sincerely thank the reviewer for their careful reading and thoughtful evaluation of our manuscript on the cryo-EM structure of the bicarbonate receptor GPR30. We greatly appreciate the reviewer’s positive assessment of the overall significance of combining structural determination with extensive mutagenesis and functional assays to advance understanding of bicarbonate–GPR30 interactions and G-protein coupling, as well as their recognition that these atomic-level insights will be valuable for future mechanistic studies and drug-development efforts. We are also grateful for the reviewer’s constructive critique regarding the interpretation of reduced signaling in the context of variable surface expression across mutants, which highlights an important point about disentangling effects of expression/trafficking from intrinsic receptor dysfunction; these comments are highly insightful and will help us strengthen the clarity and rigor of our presentation and conclusions in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      In this revision, the authors have made a significant effort to improve and validate the structural observations, as well as address the comments in the previous submission. They updated the functional assays and evaluated the receptor function by measuring intracellular calcium mobilization, which is a more direct measurement for the downstream signaling of hGPR30-Gq signaling. They also used flow cytometry with an HA-antibody for a more direct measurement of the surface expression of the receptor, replacing their previous assay that normalized to the housekeeping gene Na-K-ATPase.

      I appreciate the effort the authors made to address the previous comments made by the reviewers. However, there are still some concerns about the current data.

      (1) The authors have addressed my previous comment on untangling the mixture of their previous and new data in the "insights into bicarbonate binding" section. They have made it clear that the importance of E115, Q138, and H307 in the receptor-bicarbonate interaction was shown in their Nature Communications paper.

      (2) The authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, or referring more to their previous work about the rationale to select the bicarbonate dose in their functional assay.

      (3) The authors have updated Figure 3.

      (4) The authors have updated Supplemental Figure 1 to show the full gel with molecular weight markers in the supplemental data to demonstrate the sample purity.

      (5) The authors have updated the predicted model using AF3.

      (6) The authors added E218A as suggested before.

      Some new suggestions for this R1:

      (1) The wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

      We acknowledge this limitation. The wide range of surface expression among cell lines, together with differences in assay modalities, may introduce variability that complicates direct quantitative comparisons and therefore only partially supports the structural observations. Future work using more standardized expression systems and matched functional readouts will be important to strengthen the structure–function linkage.

      (2) Line 101, "ICL1 and ECL1 contain short α helices", no α helix of ICL1 is shown in Figure 2C

      We removed the word “ICL1” (line 98).

      (3) For the unsolved region of ECL2, could the author put a dashed line connecting ECL2 with TM4? In the current Figure 2B, it looks like ECL2 connects TM3 and TM5.

      According to the suggestion, we corrected Figure 2B.

      (4) I appreciate that the authors updated the predicted model with AF3, but they didn't make it clear why they had the comparison between their cryo-EM structure (bicarbonate-activated G-protein-incorporated GPR30) and the predicted AF3 model (inactive GPR30)

      We wished to highlight the value of experimental structures over predictions alone. This includes structural features that are independent of receptor activation, such as disulfide (SS) bonds.

      (5) I appreciate that the authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, but it was still not clear to me why they picked 11 mM in Figure 3G for the bar graph. Also, since a dose-response curve was made in Figure 3F, why not just calculate and report the EC50 of NaHCO3 for each mutant?

      Thank you for the comment. We have calculated the EC50 of the calcium response and assessed its correlation with the receptors’ cell surface expression. We chose 11 mM in Fig. 3G because our previous paper in Nature Communications showed that the EC50 value in the IPs assay was around 11 mM. However, the calcium response was more sensitive and gave a lower value than expected. Therefore, following your advice, we deleted the bar graph of the 11 mM responses, calculated the EC50 values, and plotted the correlations among cell surface expression, EC50, and maximum responses (Figure 3F-I, Supplementary File 1). Moreover, we revised the explanation of this mutagenesis study (lines 139-154 and 217-230).
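
      For readers unfamiliar with the EC50 calculation discussed above, the following is a minimal sketch of how an EC50 and a maximum response can be estimated from a dose-response series. It assumes a Hill equation with a fixed Hill coefficient and zero baseline, and uses a simple grid search rather than the authors' (unspecified) fitting procedure; the concentrations and responses in the usage example are synthetic, not data from the manuscript.

```python
import math


def hill(conc, top, ec50, n=1.0):
    """Hill equation: response at agonist concentration `conc` (zero baseline)."""
    return top * conc**n / (ec50**n + conc**n)


def fit_ec50(concs, responses, n=1.0, steps=2000):
    """Least-squares estimate of (top, ec50) for positive concentrations.

    EC50 candidates are scanned on a log-spaced grid spanning one decade
    below and above the tested range. For each candidate, the best `top`
    has a closed form because the model is linear in `top`.
    """
    lo = math.log10(min(concs)) - 1.0
    hi = math.log10(max(concs)) + 1.0
    best = None
    for i in range(steps + 1):
        ec50 = 10 ** (lo + (hi - lo) * i / steps)
        basis = [hill(c, 1.0, ec50, n) for c in concs]
        top = sum(b * r for b, r in zip(basis, responses)) / sum(b * b for b in basis)
        sse = sum((top * b - r) ** 2 for b, r in zip(basis, responses))
        if best is None or sse < best[0]:
            best = (sse, top, ec50)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic example: noise-free responses generated with top=500, EC50=3 mM.
    concs_mM = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
    responses = [hill(c, 500.0, 3.0) for c in concs_mM]
    top, ec50 = fit_ec50(concs_mM, responses)
    print(f"estimated top = {top:.1f}, EC50 = {ec50:.2f} mM")
```

      Reporting EC50 and maximum response separately, as in the revised Figure 3F-I, helps distinguish a shift in sensitivity from a change in response amplitude, which a single-concentration bar graph cannot do.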

      (6) In the previous submission and comments, E218 was in close contact with bicarbonate in the previous Figure 4D (the bicarbonate is deleted in the new structure). I thank the authors for making an E218A mutant and performing the functional assay. As mentioned above, E218A (24.6%, 232.1% of WT surface expression) has a similar functional readout as WT. Doesn't this also indicate that E218A is partially broken, so you will need twice as much as WT to have the same downstream signal?

      Thank you for your comment. In our revised manuscript, we describe the relationship between cell surface expression and EC50 and find that cell surface expression and the response to bicarbonate are not correlated, as you noted in your review comment (Figure 3F-I, Supplementary File 1). Several possibilities could explain this: GPR30 localization to specific sites on the plasma membrane (PM) might limit the response stoichiometry; GPR30 might also act intracellularly and blunt the increase in response expected from higher GPR30 expression on the PM; the excess GPR30 on the PM might be non-functional; or E218A might be less functional, so that twice as much receptor as WT is needed. We will examine the cell surface expression of GPR30 and its response to bicarbonate in a future study.

      I would suggest that the authors in future studies consider using the Tet-on inducible cell lines, such as HEK293 Flp-In Trex. These cell lines will allow the authors to fine-tune the surface expression of their mutants to the same level with different doses of Tetracycline in their stable cell lines.

      We appreciate your advice. We’ll introduce Tet-on inducible cell lines for future research.

      Reviewer #3 (Public review):

      Summary

      GPR30 responds to bicarbonate and plays a role in regulating cellular pH and ion homeostasis. However, the molecular basis of bicarbonate recognition by GPR30 remains unresolved. This study reports the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate, revealing mechanistic insights into its G-protein coupling. Nonetheless, the study does not identify the bicarbonate-binding site within GPR30.

      Strengths

      The work provides strong structural evidence clarifying how GPR30 engages and couples with Gq.

      Weaknesses

      Several GPR30 mutants exhibited diminished responses to bicarbonate, but their expression levels were also reduced. As a result, the mechanism by which GPR30 recognizes bicarbonate remains uncertain, leaving this aspect of the study incomplete.

      We sincerely thank the reviewer for this thoughtful and balanced assessment of our manuscript, including the clear summary of the central advance and the constructive identification of remaining limitations. We particularly appreciate the recognition that our cryo-EM analysis provides strong structural evidence for how GPR30 engages and couples with Gq, and we agree that pinpointing the bicarbonate-binding site remains a critical open question. In the revised manuscript, we will make this point more explicit, clarify the interpretation of the mutagenesis results in light of reduced receptor expression for some variants, and further strengthen the presentation and discussion of what our current data do, and do not, allow us to conclude regarding bicarbonate recognition by GPR30.

      Reviewer #3 (Recommendations for the authors):

      The authors have removed the bicarbonate assignment from their model and have addressed all of my concerns. In this study, or in future work, it would be advisable for the authors to explore the use of bicarbonate mimetics with higher binding affinity to facilitate more definitive structural characterization.

      Thank you for this constructive suggestion. We agree that exploring bicarbonate mimetics with higher binding affinity would be an important next step to enable more definitive structural characterization of GPR30 and to strengthen mechanistic conclusions. In future work, we plan to pursue the identification and/or design of such mimetics, guided by the architecture and mutational landscape of the extracellular pocket described here, and to combine these ligands with optimized cryo-EM sample preparation and complementary functional assays to better stabilize and visualize the bound state.

    1. eLife Assessment

      This study introduces a valuable toolkit for zebrafish transgenesis, significantly enhancing the flexibility and efficiency of transgene generation for immunological applications. The authors provide convincing evidence through well-designed experiments, demonstrating the toolkit's utility in generating diverse and functional transgenic lines.

    2. Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Even small DNA fragments, such as the viral 2a sequence, can be cloned into a multi-component plasmid in one step. The components can be assembled from PCR fragments or synthesized DNA fragments, forgoing the need for "entry" vectors. Further, the authors show that the existing PaqCI sites can be domesticated to improve the versatility of the system. The validation provided in the manuscript is convincing, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      Comments on revisions:

      The authors have addressed all the concerns raised in the first review. Congratulations to the authors for their effort.

    3. Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach relies on only a short 4-base-pair overhang sequence in the final construct, and the insertion construct and backbone vector can be assembled in a single-tube reaction using PaqCI and a ligase. This approach is also expandable by introducing new overhang sequences while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.
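
      Two design checks implied by this description can be sketched in code: scanning a fragment for internal recognition sites that would need to be domesticated, and checking that a set of 4-nt overhangs is mutually compatible before adding a new one. This is an illustrative sketch, not part of the ImPaqT toolkit. The recognition sequence CACCTGC is an assumption that should be verified against the enzyme supplier's documentation, and the overhangs in the usage example are hypothetical rather than the ones used by the authors.

```python
# Assumed recognition sequence for the Type IIS enzyme; verify with the supplier.
ASSUMED_SITE = "CACCTGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase or lowercase A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_internal_sites(seq, site=ASSUMED_SITE):
    """Return sorted 0-based start positions of the site on either strand."""
    seq = seq.upper()
    hits = []
    for motif in {site, revcomp(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.append(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def overhangs_compatible(overhangs):
    """True if every overhang is 4 nt, non-palindromic, and does not
    duplicate (or reverse-complement) another overhang in the set."""
    seen = set()
    for overhang in (o.upper() for o in overhangs):
        rc = revcomp(overhang)
        if len(overhang) != 4 or overhang == rc or overhang in seen or rc in seen:
            return False
        seen.add(overhang)
    return True


if __name__ == "__main__":
    print(find_internal_sites("AAACACCTGCTTT"))           # site on the top strand
    print(overhangs_compatible(["AATG", "GGAT", "CTTA"]))  # hypothetical overhang set
```

      A fragment with any hit from `find_internal_sites` would be cut internally during assembly unless the site is removed, and a candidate overhang that fails `overhangs_compatible` against the existing set risks mis-joining of fragments.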

      The generation of several transgenic zebrafish lines for immunological studies demonstrates the feasibility of the ImPaqT in vivo. Lineage tracing of macrophages via LPS injection demonstrates the approach's functionality and validates its use in vivo.

      Comments on revisions:

      The authors have addressed all my concerns.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful reading of our manuscript and thoughtful comments on it. We appreciate the overall positive opinion on our manuscript and helpful comments and suggestions from the reviewers. Overall, the main points identified by reviewers were 1) further broadening of the system to a range of inputs as well as the construct types that can be generated with the system and 2) further consideration of any off-target joining or off-target effects on genes/proteins and the limits to the expandability of the kit. To address these concerns, we have added new data in Figure 6, illustrating the generation of a new construct using PCR and dsDNA fragments, new constructs for mpeg1.1 and for CRISPR gRNA expression and have revised the text to further address concerns and limitations of the toolkit. We thank the reviewers and editors for these suggestions and feel that they have substantially improved the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Furthermore, the modular design ensures expandability, enabling researchers to customize constructs for diverse experimental designs. The validation provided in the manuscript is solid, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      We appreciate the overall positive evaluation of our toolkit and the time and effort in evaluating it.

      Weaknesses:

      While the toolkit's technical capabilities are well-demonstrated, there are several areas where additional validation and examples could enhance its impact. One limitation is the lack of data showing whether the toolkit can be directly used for rapid cloning and testing of enhancers or promoters, particularly cloning them directly from PCR using PaqCI overhangs without needing an entry vector. Similarly, the feasibility of cloning genes directly from PCR products into the system is not demonstrated, which would significantly increase the utility for researchers working with genomic elements.

      This is an excellent point. Given the increased use of gene synthesis and dsDNA fragments, we also thought it was good to demonstrate incorporation of these as well. We have added a new figure, Figure 6, which demonstrates the generation of two new transgene constructs by direct cloning of three PCR products along with a synthetic dsDNA fragment into a Tol2-flanked backbone plasmid as an alternative, rapid approach to generating transgenes. The resulting plasmids, encoding the mpeg1.1 promoter, a separate p2a, and a tdTomato fluorescent protein along with either wildtype or dominant negative rac2, were properly assembled. In transient transgenic zebrafish injected with these constructs, dominant negative rac2 prevented macrophage recruitment to tail wounds, indicating that this approach worked for the generation of functional transgenes. These results are discussed in new text (lines 304-391) describing this new experiment and the finding that both PCR products and synthesized dsDNA could be efficiently incorporated in constructs generated with our approach, as well as in the discussion (lines 494-499).

      The authors discuss potential applications such as using the toolkit for tissue-specific knockout applications by assembling CRISPR/Cas9 gRNA constructs. However, they do not demonstrate the cloning of short fragments, such as gRNA sequences downstream of a U6 promoter, which would be an important proof-of-concept to validate these applications. Furthermore, while the manuscript focuses on macrophage-specific promoters, the widely used mpeg1.1 promoter is not included or tested, which limits the toolkit's appeal for researchers studying macrophages and microglia.

      Yes, in the new figure described above, we have now shown that this method works with shorter PCR fragments such as the p2a fragment cloned within the tdTomato-p2a-rac2 constructs. This fragment is ~70 bp and, while this is somewhat longer than a simple gRNA targeting sequence (though smaller than a complete sgRNA), we believe that this indicates that smaller fragments can still be incorporated within these constructs. We also agree with the general idea of increasing functionality to incorporate CRISPR/Cas9 and now include a 3E encoding the zebrafish U6 promoter. As CRISPR expression constructs frequently require complex assemblies, for instance, expression of tagged Cas9 along with the U6-driven gRNA as in Zhou et al., 2018 or along with rescue constructs as in Wang et al., 2021, we have given these constructs the non-standard 5’ end O3c to enable multiplexing in these complex constructs.

      We agree that it is important to include mpeg1.1 given the broad use of this promoter within the field, and we’ve now included a 5E mpeg1.1 construct within the toolkit.

      Another potential limitation is the handling of sequences containing PaqCI recognition sites. Although the authors discuss domestication to remove these sites, a demonstration of cloning strategies for such cases or alternative methods to address these challenges would provide practical guidance for users.

      Absolutely, we have now included a new figure (Supplementary Figure 6) that illustrates one domestication approach using PCR and homology-based cloning as an easy approach to domestication. In addition, we have also mentioned alternative approaches for domestication in the discussion (lines 439-444).

      Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach relies on only a short 4-base-pair overhang sequence in the final construct, and the insertion construct and backbone vector can be assembled in a single-tube reaction using PaqCI and ligase. This approach can also be expanded by introducing new overhang sequences while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.

      Strengths:

      The generation of several lines of transgenic zebrafish for the immunologic study demonstrates the feasibility of the ImPaqT in vivo. The lineage tracing of macrophages by LPS injection shows this approach's functionality, validating its usage in vivo.

      We appreciate the positive sentiments for our toolkit and the effort put into reviewing our manuscript.

      Weaknesses:

      (1) There is no quantitative data analysis showing the percentage of off-target based on these 4bp overhang sequences.

      While we agree that this is an important variable for the method, we feel that previous studies that have broadly tested off-target effects of all potential 4 bp overhang sequences have already given an effective overview of interactions between each of these overhangs (Potapov et al., 2018; Pryor et al., 2020). The results from these studies were incorporated into the NEB ligase fidelity viewer that we used to predict the overhangs that would have minimal off-target ligation with each other; the tool also reports the expected off-target ligation of individual 4 bp overhangs. In all cases, we selected overhangs that would have minimal off-target ligation, with each of the overhangs showing 1% or less off-target ligation with any of the other overhangs chosen. We have added new text, lines 119-124, that further clarifies our selection of these ends.
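      A minimal sketch of the most basic compatibility check behind overhang selection, assuming a hypothetical set of four overhangs (the sequences below are illustrative and are not the ImPaqT overhangs). It does not reproduce the empirical ligation-fidelity data referred to above; it only flags palindromic overhangs and pairs that are reverse complements of each other, both of which would be expected to ligate.

```python
# Basic sanity checks for a set of 4-nt Golden Gate overhangs.
# All overhang sequences used with this sketch are hypothetical examples.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_overhangs(overhangs):
    """Return a list of human-readable problems; the list is empty if none are found."""
    problems = []
    seen = set()
    for oh in (o.upper() for o in overhangs):
        if len(oh) != 4 or set(oh) - set("ACGT"):
            problems.append(f"{oh}: not a 4-nt A/C/G/T sequence")
            continue
        rc = reverse_complement(oh)
        if oh == rc:
            # A palindromic overhang can pair with another copy of itself.
            problems.append(f"{oh}: palindromic")
        elif oh in seen:
            problems.append(f"{oh}: duplicate overhang")
        elif rc in seen:
            # Two fragments carrying these ends could join out of order.
            problems.append(f"{oh}: reverse complement of {rc}, which is already in the set")
        seen.add(oh)
    return problems


# Hypothetical overhang set, for illustration only.
EXAMPLE_SET = ["GGAG", "TACT", "AATG", "GCTT"]
```

      Passing this check is necessary but not sufficient: mismatch-tolerant ligation between non-complementary overhangs is exactly what the empirical fidelity data quantify, and it cannot be inferred from sequence identity alone.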

      (2) There is no statement for the upper limitation of the expandability.

      Yes, we’ve been curious as well. While our cloning of 6 distinct fragments in Figure 5 and the new 5-fragment cloning added in revision (Figure 6) suggest that 5-6 fragments can be readily assembled, in the course of revisions we also attempted to generate a larger product of 11 fragments, which ultimately failed. It is unclear whether this failure was due to the constructs chosen, the potential size of the plasmid, or a failure of the technique/enzymes themselves. Published descriptions of PaqCI Golden Gate cloning approaches have found that PaqCI can assemble at least 32 fragments and can produce large sequences (e.g. in Sikkema et al., 2023, where they assemble the ~40 kbp T7 genome from 12, 24 and 32 distinct fragments using a PaqCI Golden Gate reaction). We therefore suspect that our issues with the 11-fragment assembly are likely due to complications with the specific group of constructs that were combined; however, we have not been able to exhaustively test a range of constructs and assemblies of varying complexity. To recognize this, we have added additional text (lines 490-493) to the discussion stating that we have only combined 6 constructs, that we think this likely encompasses many of the applications needed for this system, and that expansion beyond this number may be possible.

      (3) There is no data about any potential side effect on their endogenous function of promoter/protein of interest with the ImPaqT method.

      Absolutely, we have added new text (lines 457-470) to our discussion describing the potential side effects on protein function. For instance, users need to be aware of whether the N- or C-termini of proteins can be modified and of the potential for affecting or creating ectopic transcription factor binding sites.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The data presented in the manuscript is robust and well-supported. However, to fully demonstrate the broad applicability of the toolkit and strengthen its impact, a few additional experiments could be beneficial. Specific suggestions for these experiments and areas of improvement are outlined in the 'Weaknesses' section of the Public Review. Additionally, Figures 2-4 illustrate the same concept - cloning three fragments from entry vectors-which comes across as repetitive. Incorporating a more diverse range of use cases would better highlight the versatility of the toolkit.

      As we described in our replies to your public points above, we have now added new Figure 6 and new Supplementary Figure 6 addressing the cloning of PCR fragments, short fragments as well as a mechanism of domestication. We have also included the mpeg1.1 promoter within the toolkit. In addition, your point on the repetition of assay is fair and in our new Figure 6, we instead used wild type and dominant-negative Rac2 expression and failure of macrophage recruitment to the tail wound.

      Reviewer #2 (Recommendations for the authors):

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT. It is interesting and potentially efficient, but I have a few concerns:

      (1) The author claimed that the ImPaqT system is more efficient than other existing systems. The authors should provide such data to support their claim.

      Our argument wouldn’t be that the ImPaqT system is strictly speaking more efficient, but rather that the combination of minimal added sequence, the ability to expand or contract the fragments used, and, in our new Figure 6, the ability to directly utilize PCR products and dsDNA fragments, while retaining the ability to combinatorially build constructs from a suite of existing sequences is the main point of the method. We now explicitly state that Golden Gate cloning isn’t more efficient than existing techniques in the text (lines 534-537), but rather the particular strength of the method is the flexibility and minimal added sequence.

      (2) The ImPaqT is theoretically less prone to have off-target effects than existing systems, the authors should provide such data to validate their claim.

      Good point, we have now searched the zebrafish genome for PaqCI sites as well as for BsaI and BsmBI which are the 6-base cutters most commonly used for Golden Gate cloning. We found that PaqCI cuts every ~17 kb in the zebrafish genome while BsaI and BsmBI cut every ~9 kb or ~13 kb respectively, further supporting that PaqCI sites are rarer in the genome and should generally require domestication less often. We have now added new text describing this in lines 129-132.
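      The kind of site search described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually run for the manuscript: it counts a recognition sequence on both strands of a single in-memory sequence and reports the mean spacing between sites. The toy sequence and the function names are made up; a real analysis would read the zebrafish genome assembly chromosome by chromosome.

```python
# Count Type IIS recognition sites on both strands of a DNA sequence
# and estimate how often, on average, each enzyme would cut.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences (5'->3') of the three enzymes being compared.
SITES = {"PaqCI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; characters other than A/C/G/T pass through."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a site on either strand (overlapping matches included)."""
    sequence = sequence.upper()
    site = site.upper()
    total = 0
    # A set avoids double counting if a site equals its own reverse complement.
    for motif in {site, reverse_complement(site)}:
        start = sequence.find(motif)
        while start != -1:
            total += 1
            start = sequence.find(motif, start + 1)
    return total


def mean_spacing(sequence: str, site: str) -> float:
    """Average number of bases per site; infinity if the site never occurs."""
    n = count_sites(sequence, site)
    return len(sequence) / n if n else float("inf")


# Toy 32-nt sequence (illustrative only): one PaqCI site on each strand,
# one BsaI site, and no BsmBI site.
TOY = "AAACACCTGCAAAGCAGGTGAAAGGTCTCAAA"

if __name__ == "__main__":
    for enzyme, site in SITES.items():
        print(enzyme, count_sites(TOY, site), mean_spacing(TOY, site))
```

      Counting both strands matters because none of these recognition sequences is palindromic, so a site on the reverse strand does not appear as the same string on the forward strand.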

      (3) The authors should mention any potential side effects of this system on the endogenous function of the promoter/protein of interest, at least in their discussion part.

      Yes, this should absolutely be expanded, as we said in your public comments above, we have now added new text describing potential pitfalls that this method may have on promoter or gene expression.

      (4) The authors are suggested to provide a balanced discussion about the expandable usage of this system beyond the immune system.

      We agree, this is also a good point that we should have emphasized more. We’ve added new text (lines 537-541) recognizing that in principle, many of the components we’ve derived should be useful in non-immune systems, but we also recognize that adapting this to new tissues will require the development of new promoters within the Golden Gate system which can be combined with these already developed tools.

      References

      Potapov, V., Ong, J.L., Kucera, R.B., Langhorst, B.W., Bilotti, K., Pryor, J.M., Cantor, E.J., Canton, B., Knight, T.F., Evans, T.C., Jr., et al. (2018). Comprehensive Profiling of Four Base Overhang Ligation Fidelity by T4 DNA Ligase and Application to DNA Assembly. ACS Synth Biol 7, 2665-2674.

      Pryor, J.M., Potapov, V., Kucera, R.B., Bilotti, K., Cantor, E.J., and Lohman, G.J.S. (2020). Enabling one-pot Golden Gate assemblies of unprecedented complexity using data-optimized assembly design. PLoS One 15, e0238592.

      Sikkema, A.P., Tabatabaei, S.K., Lee, Y.J., Lund, S., and Lohman, G.J.S. (2023). High-Complexity One-Pot Golden Gate Assembly. Curr Protoc 3, e882.

      Wang, Y., Hsu, A.Y., Walton, E.M., Park, S.J., Syahirah, R., Wang, T., Zhou, W., Ding, C., Lemke, A.P., Zhang, G., et al. (2021). A robust and flexible CRISPR/Cas9-based system for neutrophil-specific gene inactivation in zebrafish. J Cell Sci 134.

      Zhou, W., Cao, L., Jeffries, J., Zhu, X., Staiger, C.J., and Deng, Q. (2018). Neutrophil-specific knockout demonstrates a role for mitochondria in regulating neutrophil motility in zebrafish. Dis Model Mech 11.

    1. eLife Assessment

      This study maps the genotype-phenotype landscapes of three E. coli transcription factors and the topographical features of these landscapes. It shows that ruggedness and epistasis do not hinder the evolution of strong transcription factor binding sites. These convincing findings contribute important insights into fitness landscape theories and highlight the role of chance, contingency, and evolutionary biases in gene regulation.

    2. Reviewer #1 (Public review):

      Summary:

      For each of three key transcription factor (TF) proteins in E. coli, the authors generate a large library of TF binding site (TFBS) sequences on plasmids, such that each TFBS is coupled to the expression of a fluorescence reporter. By sorting the fluorescence of individual cells and sequencing their plasmids to identify each cell's TFBS sequence (sort-seq), they are able to map the landscape of these TFBSs to the gene expression level they regulate. The authors then study the topographical features of these landscapes, especially the number and distribution of local maxima, as well as the statistical properties of evolutionary paths on these landscapes. They find the landscapes to be highly rugged, with about as many local peaks as a random landscape would have, and with those peaks distributed approximately randomly in sequence space. This is quite different from previous work on landscapes for eukaryotic TFBSs, which tend to be rather smooth. The authors find that there are a number of peaks that produce regulation stronger than that of the wild-type sequence for each TF, and that it is not too unlikely to reach one of those "high peaks" from a random starting sequence. Nevertheless, the basins of attraction for different peaks have significant overlap, which means that chance plays a major role in determining which peak a population will evolve to.
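      The notion of a local peak used in this summary can be made concrete with a short sketch. It assumes a landscape stored as a dictionary from TFBS sequence to measured regulation strength and treats a genotype as a peak when it is strictly higher than every measured single-mutation neighbor. The toy landscape and function names are hypothetical, and the study's own peak definition may additionally account for measurement noise.

```python
# Identify local peaks in a genotype-phenotype landscape stored as
# {sequence: phenotype}. Neighbors differ by exactly one nucleotide.

BASES = "ACGT"


def neighbors(seq):
    """Yield every sequence one point mutation away from seq."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def local_peaks(landscape):
    """Return genotypes that are strictly higher than all their measured neighbors.

    Neighbors missing from the data are ignored, so a genotype with no
    measured neighbors is trivially reported as a peak.
    """
    peaks = []
    for seq, value in landscape.items():
        measured = [landscape[n] for n in neighbors(seq) if n in landscape]
        if all(value > v for v in measured):
            peaks.append(seq)
    return peaks


# Toy two-site landscape (illustrative only): the phenotype is the number
# of 'A' nucleotides, which is additive and therefore single-peaked.
SMOOTH = {a + b: (a == "A") + (b == "A") for a in BASES for b in BASES}

# Raising one distant genotype creates a second, isolated peak.
RUGGED = dict(SMOOTH, CC=3)
```

      Because unmeasured neighbors are skipped, peak counts from an incompletely sampled landscape can be inflated, which is one reason the fraction of missing genotypes matters when interpreting ruggedness.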

      Strengths:

      (1) The apparent differences in landscape topography between prokaryotic TFBSs and other molecular landscapes is a fascinating discovery to add to the field of genotype-phenotype maps. I am really excited to learn the molecular mechanisms of this in the future.

      (2) The experiments and analysis of this paper are very well-executed and, by and large, very thorough. I appreciated the systematic nature of the project, both the large-scale experiments done on three TFs with replicates, and the systematic analysis of the resulting landscapes. This not only makes the paper easy to follow, but also inspires confidence in their results since there is so much data and so many different ways of analyzing it. It's a great recipe for other studies of genotype-phenotype landscapes to follow.

      (3) Considering how technical the project was, I am really impressed at how easy to read I found the paper, and the authors deserve a lot of credit for making it so. They do a great job of building up the experiments and analyses step-by-step, and explaining enough of the basics of the experimental design and essence of each analysis in the main text without getting too complicated with details that can be left to the Methods or SI.

      Weaknesses:

      (1) Regarding the effect of measurement uncertainties, one way in which they attempt to test their effect is to simulate dynamics on noisy and noise-free versions of the landscape and measure visitation frequencies. While they show that visitation frequencies are highly correlated between these cases, I'd prefer a more direct test of epistasis or navigability (e.g., number of local peaks), since that's how they are characterizing the landscapes, and the connection between that and visitation frequency of individual states is unclear.

      (2) I am still a little concerned about the fraction of sequences missing from the data due to filtering, although I appreciate the difficulties in testing the importance of this (requiring additional assumptions) and the authors' good-faith efforts to do their best with the data they have.

    3. Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.
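      For readers unfamiliar with adaptive walks, the simplest variant can be sketched as below. This is an assumption-laden caricature, not the simulation used in the study: each step moves to a uniformly chosen fitter single-mutation neighbor, and the walk stops when none exists. The toy landscape and all names are hypothetical.

```python
# Random adaptive walk on a genotype-phenotype landscape stored as
# {sequence: phenotype}. Each step moves to a randomly chosen neighbor
# with a strictly higher phenotype; the walk ends at a local peak.
import random

BASES = "ACGT"


def neighbors(seq):
    """Yield every sequence one point mutation away from seq."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def adaptive_walk(landscape, start, rng):
    """Return the list of genotypes visited, from start to a local peak."""
    path = [start]
    current = start
    while True:
        fitter = [
            n for n in neighbors(current)
            if n in landscape and landscape[n] > landscape[current]
        ]
        if not fitter:
            return path
        # Phenotype strictly increases at every step, so the walk terminates.
        current = rng.choice(fitter)
        path.append(current)


# Toy two-peak landscape (illustrative only): phenotype is the number of
# 'A' nucleotides, except that "CC" is raised to form a second peak.
TOY = {a + b: (a == "A") + (b == "A") for a in BASES for b in BASES}
TOY["CC"] = 3

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(adaptive_walk(TOY, "GC", rng))
```

      On this toy landscape, walks starting from "GC" can end at either peak depending on which mutation happens to occur first, which is a small-scale version of the contingency described above.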

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      Approximately 40% of the theoretical TFBS genotype space remains uncharacterized after quality filtering. The authors now discuss this limitation more explicitly and provide analyses suggesting that undersampling does not strongly bias their conclusions at the landscape level. Nevertheless, predictive modeling approaches could further extend these landscapes in future work.

      Simplified Regulatory Architecture:

      The study considers a minimal system consisting of a single TFBS upstream of a reporter gene. While this simplification allows clean interpretation and high-throughput measurement, natural promoters often involve combinatorial regulation and chromosomal context effects that may alter landscape topography.

      Lack of Experimental Evolution Validation:

      The evolutionary conclusions are based on simulations rather than direct experimental evolution. The authors provide a reasonable justification for this choice and frame their conclusions at the statistical level rather than for specific trajectories, but experimental validation would be a valuable future extension.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more complex biologically and more influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The main weakness of this paper, in my view, is that it felt disconnected from the larger body of work on fitness and genotype-phenotype landscapes, including previous data on TFBSs in E. coli, genotype-phenotype maps of TFBSs in other systems, protein sequence landscapes (e.g., from mutational scans or combinatorially-complete libraries), and fitness landscapes of genomic mutations (e.g., combinatorially-complete landscapes of antibiotic resistance alleles). I have no doubt the authors are experts in this literature, and they probably cite most of it already given the enormous number of references. But they don't systematically introduce and summarize what was already known from all that work, and how their present study builds on it, in the Abstract and Introduction, which left me wondering for most of the paper why this project was necessary. Eventually, the authors do address most of these points, but not until the end, in the Discussion. Readers who have no familiarity with this literature might read this paper thinking that it's the first paper ever to study topography and evolutionary paths on genotype-phenotype landscapes, which is not true.

      There were two points that made this especially confusing for me. First, in order to choose which nucleotides in the binding sites to vary, the authors invoke existing data on the diversity of these sequences (position-weight matrices from RegulonDB). But since those PWMs can imply a genotype-phenotype map themselves, an obvious question I think the authors needed to have answered right away in the Introduction is why it is insufficient for their question. They only make a brief remark much later in the Results that the PWM data is just observed sequence diversity and doesn't directly reflect the regulation strength of every possible TFBS sequence. But that is too subtle in my opinion, and such a critical motivation for their study that it should be a major point in the Introduction.

      The second point where the lack of motivation in the Introduction created confusion for me was that they report enormous levels of sign epistasis in their data, to the point where these landscapes look like random uncorrelated landscapes. That was really surprising to me since it contrasts with other empirical landscape data I'm familiar with. It was only in the Discussion that I found some significant explanation of this - namely that this could be a difference between prokaryotic TFBSs, as this paper studies, and the eukaryotic TFBSs that have been the focus of much (almost all?) previous work. If that is in fact the case - that almost all previous studies have focused on eukaryotic TFBSs or other kinds of landscapes, and this is the first to do a systematic test of prokaryotic TFBSs - then that should be a clear point made in the Abstract and Introduction. (I find a comparable statement only in the very last paragraph of the Discussion.) If that's the case, then I would also find that point to be a much stronger, more specific conclusion of this paper to emphasize than the more general result of observing epistasis and contingency (as is currently emphasized in the Abstract), which has been discussed in tons of other papers. This raises all sorts of exciting questions for future studies - why do the landscapes of prokaryotic TFBSs differ so dramatically from almost all the other landscapes we've observed in biology? What does that mean for the evolutionary dynamics of these different systems?

      We thank the reviewer for this thoughtful and detailed critique. We agree that the original version of the manuscript did not sufficiently motivate the study early on, nor did it clearly position our work within the broader literature on genotype–phenotype (GP) and fitness landscapes. We also agree that two specific issues, the role of PWMs and the unexpectedly high levels of sign epistasis, were insufficiently explained early on, which could lead to confusion for readers not already familiar with this field.

      Positioning within the broader landscape literature

      In response, we have substantially revised the Abstract and Introduction to explicitly situate our work within existing empirical studies of GP and fitness landscapes, including TFBS landscapes in bacteria, eukaryotic TFBS genotype–phenotype maps, in vitro TF–DNA binding studies, deep mutational scans of proteins, and combinatorially complete fitness landscapes such as antibiotic resistance alleles (Abstract; Introduction, lines 64–85). We now make clear that our study builds directly on this extensive body of work, rather than introducing the landscape framework itself. For example, we write in the introduction:

      “Over the last decade, genotype–phenotype (GP) maps and fitness landscapes have become central tools for understanding how molecular systems evolve under mutation and selection[22–25]. Such maps and landscapes have been experimentally studied for DNA[6,8,18,19,26,27], protein[28–32] and RNA[33–35] molecules, revealing key topographical properties that shape evolutionary outcomes, including epistasis[24,36]—the non-additive effects of multiple mutations on phenotype—landscape ruggedness, reflected in the number and distribution of fitness peaks, and constraints on adaptive evolution.”

      At the same time, we clarify what remains rare in the literature: large-scale, in vivo genotype–phenotype landscapes for bacterial transcription factor binding sites that are sufficiently dense to support explicit evolutionary analyses. While numerous high-throughput studies have characterized bacterial regulatory elements, these datasets typically do not provide quantitative regulatory phenotypes across large genotype spaces, nor do they analyze evolutionary accessibility. To our knowledge, only one such in vivo TFBS landscape had previously been characterized at comparable resolution for a bacterial local regulator (TetR). Our work extends this approach to three global regulators, enabling systematic comparisons across prokaryotic systems (Abstract, Introduction, lines 64–85). For example, we write in the introduction:

      “For transcription factor binding sites, most pertinent large-scale studies are based on in vitro binding assays, such as protein-binding microarrays (PBMs), and they focus predominantly on eukaryotic transcription factors[6]. While these studies have been instrumental in characterizing transcription factor binding preferences, they typically do not measure regulatory output in a native cellular context. In contrast, comprehensive in vivo data for bacterial TFBSs remain extremely rare. To our knowledge, only two high-resolution in vivo landscapes have been previously mapped for bacterial regulators, those of the local regulators TetR[18] and LacI[27]. As a result, it remains unclear whether principles inferred from protein landscapes, eukaryotic TFBSs, or in vitro binding assays generalize to transcriptional regulation in bacteria, particularly for global regulators[11] that integrate multiple physiological signals.”

      Why PWMs are insufficient for our question.

      We agree with the reviewer that our original explanation of the role of PWMs was too cursory and should have been addressed explicitly in the Introduction. We have now revised the Introduction to clearly explain why PWMs derived from RegulonDB cannot substitute for empirical GP landscapes in our study (Introduction, lines 102–113).

      In this passage we now explain that, first, PWMs are inferred from a limited number of naturally occurring binding sites—typically on the order of hundreds of sequences—whose diversity reflects evolutionary history and genomic context rather than systematic exploration of sequence space. As a result, PWMs sample only a small and biased subset of the possible TFBS variants, whereas our libraries probe tens of thousands of sequences in a controlled manner, providing substantially broader and more uniform coverage of genotype space (Introduction, lines 102–113).

      Second, PWM scores are not direct measurements of regulatory strength. Instead, they represent probabilistic or heuristic scores that are primarily used for identifying candidate binding sites in genomes. Numerous studies have shown that PWM scores often correlate weakly with in vivo binding affinity or regulatory output, where DNA shape, cooperative interactions, and chromosomal context play important roles. As such, PWMs do not provide quantitative genotype–phenotype relationships for regulation strength (Introduction, lines 102–113).

      Third, PWMs assume independent and additive contributions of individual nucleotide positions. This assumption excludes epistatic interactions by construction. Because epistasis is central to landscape ruggedness, peak structure, and evolutionary accessibility, PWM-based models are fundamentally unsuited to address the evolutionary questions we study here (Introduction, lines 102–113). We now explicitly state this limitation early in the manuscript, rather than only alluding to it later in the Results.
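
      To illustrate why an additive model excludes epistasis by construction, the following minimal Python sketch scores sequences with a small position weight matrix and computes the pairwise epistasis of a double mutant. The three-position matrix, its weights, and the function names are hypothetical and are not taken from RegulonDB or from the manuscript.

```python
# Hypothetical 3-position PWM with illustrative log-odds weights (assumed values).
PWM = [
    {"A": 1.2, "C": -0.5, "G": -0.8, "T": -1.0},
    {"A": -0.7, "C": 0.9, "G": -0.2, "T": -1.1},
    {"A": -0.3, "C": -0.9, "G": 1.5, "T": -0.6},
]


def pwm_score(seq):
    """Additive PWM score: the sum of independent per-position weights."""
    return sum(PWM[i][base] for i, base in enumerate(seq))


def mutate(seq, changes):
    """Return seq with each (position, base) substitution applied."""
    bases = list(seq)
    for pos, base in changes:
        bases[pos] = base
    return "".join(bases)


def pairwise_epistasis(wild_type, mut_a, mut_b, score=pwm_score):
    """Double-mutant effect minus the sum of the two single-mutant effects."""
    w0 = score(wild_type)
    wa = score(mutate(wild_type, [mut_a]))
    wb = score(mutate(wild_type, [mut_b]))
    wab = score(mutate(wild_type, [mut_a, mut_b]))
    return (wab - w0) - ((wa - w0) + (wb - w0))


# Mutations at positions 0 and 2 of the hypothetical site "ACG".
eps = pairwise_epistasis("ACG", (0, "T"), (2, "A"))
```

      For any pair of mutations at distinct positions, `eps` is zero up to floating-point error, because each position contributes to the score independently. Any measured deviation from zero in an empirical landscape is therefore invisible to a model of this form.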

      Sign epistasis and contrast with prior TFBS landscapes.

      We also agree with the reviewer that the extensive sign epistasis we observe—approaching levels expected for uncorrelated random landscapes—is surprising in light of much of the existing empirical landscape literature. Importantly, as the reviewer notes, most previous TFBS landscape studies have focused on in vitro binding systems or on eukaryotic transcription factors, which tend to exhibit smoother and more additive landscapes.

      To address this concern, we have revised the Abstract, Introduction, and Discussion to explicitly frame this contrast as a central result of the study (Abstract; Introduction, lines 151–153; Discussion, lines 652–668). For example, we write in the discussion:

      “We showed that the regulatory landscapes of all three TFs are highly rugged and have multiple peaks. The ruggedness of all three landscapes is also supported by the prevalence of epistasis between pairs of TFBS mutations (Supplementary Table S5). A particularly important form of epistasis is sign epistasis[24,93,94], because it can lead to multiple adaptive peaks [24,93,94] (see Supplementary Methods 7.5). Our landscapes contain up to 65% of mutation pairs with sign epistasis, a value that is especially high compared to the almost exclusively additive interactions of mutations in eukaryotic TFs[6,125].”

      We now emphasize that prokaryotic TFBS landscapes, particularly for global regulators, appear to be substantially more rugged and epistatic than most previously characterized TFBS landscapes, and that this difference likely reflects fundamental biological distinctions between regulatory systems.
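
      For readers less familiar with the terminology, the sketch below classifies the interaction between two mutations from the four phenotype values of a wild type, its two single mutants, and the double mutant. It uses a common textbook definition of sign epistasis (the effect of one mutation changes sign depending on the presence of the other). The function name and the example values are hypothetical, and the exact criteria in Supplementary Methods 7.5 may differ.

```python
def classify_epistasis(w00, w10, w01, w11, tol=1e-12):
    """Classify the interaction between mutations a and b.

    w00: wild type, w10: mutant a only, w01: mutant b only, w11: double mutant.
    """
    effect_a_alone = w10 - w00   # effect of a on the wild-type background
    effect_a_with_b = w11 - w01  # effect of a when b is already present
    effect_b_alone = w01 - w00
    effect_b_with_a = w11 - w10

    a_flips = effect_a_alone * effect_a_with_b < 0
    b_flips = effect_b_alone * effect_b_with_a < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    if abs(w11 - w10 - w01 + w00) > tol:
        return "magnitude"
    return "none"


# Hypothetical regulation strengths for four genotype quartets.
examples = {
    "additive": classify_epistasis(1.0, 2.0, 3.0, 4.0),
    "magnitude": classify_epistasis(1.0, 2.0, 2.0, 5.0),
    "sign": classify_epistasis(1.0, 2.0, 0.5, 3.0),
    "reciprocal": classify_epistasis(1.0, 2.0, 2.0, 1.5),
}
```

      Reciprocal sign epistasis, in which both mutations change sign, is the case that permits two separate peaks within a single quartet of genotypes.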

      Revised emphasis and conclusions.

      Following the reviewer’s suggestion, we have adjusted the emphasis of the manuscript accordingly. Rather than highlighting epistasis and contingency as generic evolutionary phenomena, we now present the extreme ruggedness of prokaryotic TFBS landscapes as a system-specific finding with important implications for the evolution of gene regulation. We explicitly note that this finding raises new questions for future work, such as why prokaryotic regulatory landscapes differ so markedly from eukaryotic ones and how these differences shape evolutionary dynamics. We now highlight these questions in the Introduction and Discussion (Abstract; Introduction, lines 151–153; Discussion, lines 652–668). For example, we write in the discussion:

      “… A possible reason for this greater incidence of epistasis lies in the nature of prokaryotic TFBSs. Specifically, prokaryotic TFBSs are, at approximately 20 bp, twice as long as eukaryotic TFBSs[80,128] and exhibit symmetries that reflect the dimeric state of their cognate TFs[129–131]. These factors may increase the likelihood of intramolecular epistasis. Our observations raise important questions for future work, such as why the landscapes of prokaryotic TFBSs differ so dramatically from those of eukaryotic ones. And what do these differences imply for the evolutionary dynamics of gene regulation?”

      We believe that these revisions substantially improve the clarity, motivation, and positioning of the manuscript, and directly address the reviewer’s concerns by making both the necessity and the novelty of the study clear from the outset.

      (2) I am a bit concerned about the lack of uncertainties incorporated into the results. The authors acknowledge several key limitations of their approach, including the discreteness of the sort-seq bins in determining possible values of regulation strength, the existence of a large number of unsampled sequences in their genotype space, as well as measurement noise in the fluorescence readouts and sequencing. While the authors acknowledge the existence of these factors, I do not see much attempt to actually incorporate the effect of these uncertainties into their conclusions, which I suspect may be important. For example, given the bin size for the fluorescence in sort-seq, how confident are they that every sequence that appears to be a peak is actually a peak? Is it possible that many of the peak sequences have regulation strengths above all their neighbors but within the uncertainty of the fluorescence, making it possible that it's not really a peak? Perhaps such issues would average out and not change the statistical nature of their results, which are not about claiming that specific sequences are peaks, just how many peaks there are. Nevertheless, I think the lack of this robustness analysis makes the results less convincing than they otherwise would be.

      We thank the reviewer for raising this important concern. We fully agree that uncertainties arising from experimental resolution, measurement noise in fluorescence and sequencing, and incomplete sampling of genotype space should be incorporated explicitly into the analysis. While these limitations were acknowledged qualitatively in the original manuscript, we recognize that a direct, quantitative assessment of their impact on our conclusions is essential to strengthen the robustness of the study.

      We first clarify that regulation strength is not discretized in our analysis. For each TFBS, regulation strength is calculated as a continuous weighted average of fluorescence across all sorting bins, based on the sequencing read-count distribution of each sequence across bins. We clarified this information in the main text (Results, lines 201-203). Nevertheless, finite binning resolution and experimental noise introduce uncertainty in these estimates, which could in principle affect the identification of local peaks.
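
      The idea of a continuous, read-count-weighted average can be sketched as follows. This is a minimal illustration under stated assumptions: the bin fluorescence values, the read counts, and the function name are hypothetical, and the actual pipeline may apply additional normalization (for example, for the number of cells sorted into each bin) that is not shown here.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Read-count-weighted mean fluorescence of one TFBS variant across bins."""
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("one read count is needed per sorting bin")
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("variant has no reads in any bin")
    weighted_sum = sum(c * f for c, f in zip(read_counts, bin_fluorescence))
    return weighted_sum / total_reads


# Hypothetical representative fluorescence of four sorting bins (arbitrary units).
bins = [100.0, 300.0, 1000.0, 3000.0]

# Hypothetical read counts of one variant in those four bins.
strength = regulation_strength([10, 30, 50, 10], bins)
```

      Because the weights vary continuously with the read distribution, the resulting value can fall anywhere between the lowest and highest bin fluorescence, even though the number of bins is small.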

      Importantly, our study does not aim to assert that specific TFBS sequences are definitively peaks. Rather, our focus is on landscape-level statistical and topological properties—such as ruggedness, the abundance and distribution of peaks, and the evolutionary accessibility of strong regulation. We therefore centered our new analyses on testing whether these conclusions are robust to experimentally plausible sources of uncertainty, rather than on the identity of individual peaks.

      To address the reviewer’s concern, we performed two complementary analyses. The first evaluates whether the observed ruggedness of the landscapes could arise as an artifact of incomplete sampling. It addresses the effects of missing genotypes and the possibility of spurious peak identification due to unsampled neighbors. Sparse sampling can introduce opposing biases: true peaks may be missed, while other genotypes may be falsely classified as peaks because fitter neighbors are absent. As shown for uncorrelated random (House-of-Cards) landscapes (Kauffman & Levin, 1987), these effects can partially cancel.

      In this analysis, we constructed a null model by randomly permuting regulation strengths across the mapped genotype network while preserving its topology. The number of peaks in these randomized landscapes is only modestly higher than in the empirical data, indicating that the measured landscapes are close to the maximal ruggedness compatible with the sampled network (Results, lines 308–320).
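
      A permutation null model of this kind can be sketched in a few lines of Python. The sketch assumes that a peak is a genotype whose value strictly exceeds that of every measured single-mutant neighbor and that genotypes without any measured neighbor are not counted. The toy two-position landscape and all names are hypothetical.

```python
import random

ALPHABET = "ACGT"


def neighbors(seq):
    """Yield all single-nucleotide mutants of seq."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def count_peaks(landscape):
    """Count genotypes that exceed every measured single-mutant neighbor."""
    peaks = 0
    for seq, value in landscape.items():
        measured = [landscape[n] for n in neighbors(seq) if n in landscape]
        if measured and all(value > v for v in measured):
            peaks += 1
    return peaks


def shuffled_landscape(landscape, rng):
    """Permute values across genotypes; the set of genotypes is unchanged."""
    genotypes = list(landscape)
    values = list(landscape.values())
    rng.shuffle(values)
    return dict(zip(genotypes, values))


# Toy additive landscape over all 16 two-letter genotypes (assumed scores).
SCORE = {"A": 0, "C": 1, "G": 2, "T": 3}
toy = {a + b: SCORE[a] + SCORE[b] for a in ALPHABET for b in ALPHABET}

n_empirical = count_peaks(toy)
null = shuffled_landscape(toy, random.Random(0))
n_null = count_peaks(null)
```

      Because only the values are permuted, the randomized landscape has the same genotypes and the same missing neighbors as the original, so any difference in peak number reflects the arrangement of values rather than the sampling pattern.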

      In addition, we quantified potential sampling bias by analyzing genotype connectivity. Here we defined the relative connectivity of a genotype as the fraction of possible single-mutant neighbors for which we had measured regulation strength. We observed only a very weak correlation between connectivity and regulation strength (R = -0.1, -0.1, and 0.01 for the CRP, Fis, and IHF landscapes, respectively; Figures S13-S15). Similarly, the relative connectivity of peak genotypes is only weakly correlated with their regulation strength (R = -0.05, -0.04, and 0.06 for the CRP, Fis, and IHF landscapes, respectively). These results indicate that strongly regulating genotypes are not preferentially oversampled or undersampled (Results, lines 321–330).
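
      The definition of relative connectivity translates directly into code. The following sketch assumes a four-letter alphabet, so that a site of length L has 3L possible single-mutant neighbors. The function name and the small set of measured genotypes are hypothetical.

```python
def relative_connectivity(seq, measured, alphabet="ACGT"):
    """Fraction of the possible single-mutant neighbors of seq in `measured`."""
    possible = len(seq) * (len(alphabet) - 1)
    present = sum(
        1
        for i, base in enumerate(seq)
        for alt in alphabet
        if alt != base and seq[:i] + alt + seq[i + 1:] in measured
    )
    return present / possible


# Hypothetical set of genotypes with a reliable regulation-strength estimate.
measured = {"AAA", "AAC", "ACA", "TAA"}

# "AAA" has 9 possible neighbors, of which 3 were measured.
connectivity = relative_connectivity("AAA", measured)
```

      Correlating this quantity with regulation strength across genotypes then tests whether well-connected genotypes are systematically stronger or weaker regulators than poorly connected ones.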

      The second, and most important, analysis directly addresses the reviewer’s concern that experimental uncertainty could affect peak classification and, consequently, landscape navigability. We explicitly incorporated experimentally measured, genotype-specific noise estimates from biological replicates when comparing fitness values between neighboring genotypes. Using these uncertainty-aware comparisons, we then recomputed adaptive-walk dynamics and genotype visitation frequencies on the resulting noisy landscapes.

      We observe strong correlations between visitation frequencies in the noise-free and noisy landscapes across all three transcription factors (new Supplementary Figure S35), indicating that evolutionary accessibility patterns are robust to realistic levels of experimental uncertainty. These analyses are described in the revised Results (lines 622–636) and in a new Supplementary Methods section (“Incorporation of experimental uncertainty into adaptive walks”).
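
      One way to make neighbor comparisons uncertainty-aware is sketched below. It is an illustrative assumption, not the procedure in the Supplementary Methods: a step is accepted only if the neighbor exceeds the current genotype by more than `z` combined standard deviations, and the walk chooses uniformly among acceptable neighbors instead of using fixation probabilities. The landscape, the noise values, and all names are hypothetical.

```python
import math
import random


def clearly_higher(mean_new, sd_new, mean_cur, sd_cur, z=1.0):
    """True if the difference exceeds z combined standard deviations."""
    return mean_new - mean_cur > z * math.sqrt(sd_new ** 2 + sd_cur ** 2)


def adaptive_walk(start, landscape, noise, rng, z=1.0, alphabet="ACGT"):
    """Uphill walk that accepts only steps distinguishable from noise."""
    current = start
    path = [current]
    while True:
        uphill = []
        for i, base in enumerate(current):
            for alt in alphabet:
                if alt == base:
                    continue
                cand = current[:i] + alt + current[i + 1:]
                if cand in landscape and clearly_higher(
                    landscape[cand], noise[cand],
                    landscape[current], noise[current], z,
                ):
                    uphill.append(cand)
        if not uphill:
            return path  # no neighbor is clearly higher: treat as a peak
        current = rng.choice(uphill)
        path.append(current)


# Hypothetical mean regulation strengths and replicate standard deviations.
landscape = {"AA": 1.0, "AC": 1.05, "CA": 2.0, "CC": 3.0}
noise = {genotype: 0.1 for genotype in landscape}

path = adaptive_walk("AA", landscape, noise, random.Random(0))
```

      In this toy example the step from `AA` to `AC` is rejected, because the difference of 0.05 lies within the noise, whereas the steps to `CA` and then to `CC` are accepted. Comparing visitation frequencies between walks with `z = 0` and `z > 0` would give one measure of how sensitive accessibility is to measurement noise.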

      Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      While the quality filtering of the data ensures robustness, ~40% of the TFBS space remains uncharacterized. The authors acknowledge this limitation but could improve the analysis by employing subsampling or predictive modeling.

      We thank the reviewer for raising this point. We agree that undersampling of genotype space is an important limitation of our dataset and that, in principle, subsampling or predictive modeling approaches could be used to address missing genotypes. We have now clarified in the manuscript why these approaches are not straightforward in the context of our analyses and why we did not pursue them here.

      Although approximately 40% of TFBS genotypes were removed during the filtering step due to lack of reliable measurements, this filtering step was necessary to ensure robust estimation of regulation strength from sort-seq data. Importantly, random subsampling of the genotypes in our data set would not alleviate this limitation, because many of our key analyses—such as peak identification, quantification of epistasis, and assessment of evolutionary accessibility—require combinatorially complete local neighborhoods in genotype space. Subsampling would remove mutational neighbors from many neighborhoods, and thus further limit our ability to characterize landscape topology.

      Predictive modeling approaches could, in principle, be used to infer missing genotypes and reconstruct more complete landscapes. However, developing, experimentally validating, and benchmarking such models would not only substantially expand the scope of an already long paper, it would also require additional assumptions about genotype–phenotype relationships that entail their own limitations. Our primary goal in this work was to provide the first large-scale empirical in vivo regulatory landscapes for global bacterial transcription factors, comprising tens of thousands of experimentally measured variants. We view these empirical landscapes as a necessary foundation upon which predictive modeling and landscape completion can be built in future, complementary studies.

      We have now revised the Discussion (lines 760-770) to explicitly articulate these points and to clarify that, while undersampling remains a limitation, it does not invalidate the landscape-level conclusions we draw from the combinatorially complete neighborhoods present in our data. There we also outline predictive modeling as an important direction for future work.

      For a more detailed answer regarding subsampling and peak classification, please also see our response to comment (2) of Reviewer #1.

      Simplified Regulatory Architecture:

      The study considers a minimal system of a single TFBS upstream of a reporter gene. While this may have been necessary for clarity, this simplification may not reflect the combinatorial complexity of transcriptional regulation in vivo.

      Point well taken. We have added a paragraph to state explicitly that the system we use to study gene regulation is much simpler than most in vivo regulatory circuits (Discussion, lines 797-802).

      Lack of Experimental Validation of Simulations:

      The adaptive walks are based on simulated dynamics rather than experimental evolution. Incorporating in vivo experimental evolution studies would strengthen the conclusions. Although this is a large request for the paper, that would not prevent publication.

      We thank the reviewer for this important point. We fully agree that in vivo experimental evolution would provide a valuable and complementary way to validate the evolutionary dynamics inferred from our simulations. However, we ask for the reviewer's understanding that adding experimental evolution to an (already long) paper would go far beyond the scope of our study.

      Also, the goal of our study was not to reproduce evolutionary trajectories experimentally, but to characterize the structure of large empirical regulatory landscapes, and to use these landscapes as a data-driven basis for exploring evolutionary accessibility under well-defined population-genetic assumptions. The adaptive walks we employ are parameterized directly from experimentally measured genotype–phenotype maps, and incorporate established fixation probabilities. Such walks have been widely used to study evolutionary dynamics on empirical landscapes when experimental evolution is not tractable, because it would involve tens of thousands of genotypes that represent small mutational targets and would thus take a long time to evolve.

      An additional issue related to the feasibility of experimental evolution is that performing in vivo experimental evolution for the regulatory landscapes analyzed here would require tracking large populations across a combinatorially vast TFBS space, while simultaneously measuring regulatory phenotypes for thousands of evolving lineages, which is currently not experimentally feasible. This is another reason why simulation-based approaches have been the standard method for linking large-scale empirical landscapes to evolutionary dynamics in both theoretical and experimental studies.

      Furthermore, our conclusions are intentionally framed at the level of statistical and landscape-wide properties (e.g., accessibility of high peaks, contingency, and evolutionary bias), rather than at the level of specific mutational trajectories. As such, they do not rely on the precise reproduction of any single evolutionary path, but on aggregate patterns that are robust to reasonable variation in population-genetic parameters.

      In sum, we do not view experimental evolution as essential for the conclusions we draw, but as an important and exciting direction for future work that may be enabled by the landscapes we have experimentally mapped.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more biologically complex and influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

      We thank the reviewer for their thorough evaluation and for their supportive opinion of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 (Abstract): "Landscape ruggedness does not prevent the evolution of strong regulation, because more than 10% of evolving populations can attain one of the highest peaks." I did not find this interpretation very convincing; only 10% of populations being able to achieve strong regulation sounds to me like ruggedness DOES impede adaptation in the vast majority of cases.

      We thank the reviewer for this thoughtful comment and agree that our original phrasing in the Abstract overstated this conclusion. We did not intend to imply that landscape ruggedness has only a minor effect on adaptation. On the contrary, our results clearly show that ruggedness strongly constrains evolutionary outcomes and prevents the majority of evolving populations from reaching the globally highest regulatory peaks. We have therefore toned down the wording in both the Abstract and the Discussion (lines 670-679) to reflect this more accurately. For example, in the abstract we now state:

      “Nonetheless, evolutionary simulations show that ~10% of evolving populations can reach a peak of strong regulation, a proportion that is significantly greater than in comparable random landscapes.”

      In the discussion we state:

      “… Specifically, our evolutionary simulations show that 10% of populations with a size typical of E. coli reach one of the highest peaks. This percentage is significantly higher than in randomized landscapes (Supplementary Methods 9; Supplementary Figure S30).”

      Our intended interpretation was more limited: namely, that ruggedness does not fully preclude the evolution of strong regulation. In highly rugged landscapes with extensive sign epistasis—whose topological properties approach those of uncorrelated random landscapes—the a priori expectation is that access to the strongest peaks could be vanishingly rare or effectively impossible under Darwinian evolution. In this context, observing that a non-negligible fraction of populations (on the order of 10%) can reach one of the highest peaks suggests that strong regulation remains evolutionarily attainable, even though it is far from guaranteed.

      Motivated by the reviewer’s suggestion, we also added a null-model analysis that makes this point more explicitly and quantitatively. Specifically, we constructed randomized landscapes by permuting regulation-strength values across genotypes while preserving the experimentally sampled genotype network topology and all parameters of the evolutionary simulations (Supplementary Methods 9, “Randomized landscape null model for peak accessibility”). We then repeated the adaptive-walk simulations on these shuffled landscapes. This null model provides an expectation for peak accessibility in landscapes with identical sampling, neighborhood structure, and evolutionary dynamics, but without genotype–phenotype correlations.

      Using this null model, we find that the fraction of populations that reach high peaks in the empirical landscapes is substantially higher than expected by chance alone (new Supplementary Figure S30; Results, lines 504–516). Specifically, across the three transcription factors, empirical landscapes exhibit on average a ~3-fold higher accessibility of high regulatory peaks than shuffled landscapes. This comparison does not weaken the conclusion that ruggedness strongly impedes adaptation; rather, it shows that the structure of the measured genotype–phenotype landscapes enables greater accessibility of strong regulation than would be expected in equally rugged but unstructured landscapes.

      In response to the reviewer’s concern, we have revised the abstract and main text to avoid the phrase “does not prevent” and to more accurately convey this balance between constraint and accessibility. We now emphasize that ruggedness strongly constrains adaptation, while still allowing access to strong regulatory peaks at rates that exceed null expectations (Discussion, lines 512-516). For example, in the discussion we state:

      “… In sum, rugged regulatory landscapes strongly constrain evolutionary trajectories, yet do not render the evolution of strong regulation vanishingly rare. Instead, strong regulatory phenotypes remain evolutionarily attainable at levels that exceed null expectations, even though they are reached by only a minority of evolving populations.”

      We believe that the revised wording, together with the added null-model analysis, more faithfully represents our results and strengthens the quantitative interpretation of accessibility in these landscapes.

      (2) Line 123: I found the explanation of the plasmid system and the accompanying SI figures (Figures S1 and S2) confusing in terms of how many plasmids there were. In particular, the Figure S1 graphics show the plasmid specifically with CRP but the text in the graphic and in the caption refers to the plasmid pCAW-Sort-Seq-V2 (which, according to Table S1, isn't that just the base plasmid without any TF?). Figure S2 also shows the plasmid with CRP and does specify pCAW-Sort-Seq-V2-CRP-CRP0 in the graphic, but then the caption refers again only to the base plasmid pCAW-Sort-Seq-V2. I recommend the authors clarify these items for readers who might want to reproduce or build upon their system. In particular, I recommend the main text explain more explicitly that they generate three versions of this plasmid (one for each TF), and then on the backgrounds of each of those three plasmids, a whole library with all the binding site variants.

      We thank the reviewer for pointing out this lack of clarity. We agree that the original description of the plasmid system and the accompanying Supplementary Figures S1 and S2 could be confusing with respect to how many plasmids were used and how they differ.

      To clarify the experimental design, we start from a common backbone plasmid, pCAW-Sort-Seq-V2, which contains all shared regulatory and reporter elements but does not encode any transcription factor. From this backbone, we generated three distinct TF-specific plasmids, each carrying one of the transcription factors studied here—CRP, Fis, or IHF—resulting in pCAW-Sort-Seq-V2-CRP, pCAW-Sort-Seq-V2-Fis, and pCAW-Sort-Seq-V2-IHF. On the background of each TF-specific plasmid, we then constructed a complete library of plasmids containing all variants of the corresponding TF binding site cloned upstream of the reporter gene.

      We have revised the main text to explicitly describe this plasmid hierarchy and library construction strategy and to clarify that three TF-specific plasmids were generated prior to TFBS library construction (Results, Landscape mapping section; lines 159–193). In addition, we have redesigned Supplementary Figures S1 and S2 to facilitate understanding of the plasmid system. Specifically, these figures now clearly distinguish between the base plasmid backbone and the TF-specific plasmid derivatives. Also, the plasmid names shown in the graphics and captions are now consistent with those listed in Supplementary Table S1. Upon final publication, we will also deposit the sequences of all plasmids in Addgene to further facilitate reproducibility.

      (3) Line 135: Can the authors clarify whether these TFs are essential in these media conditions and, if not, why? I was expecting them to be so given the core functions of these TFs as described in the Introduction, but then Figure S3 appears to show that all knockouts are viable.

      We thank the reviewer for raising this important point and apologize for the lack of clarity in the original version of the manuscript. The transcription factors CRP, Fis, and IHF are not essential for viability under the growth conditions used in this study, but they are important for optimal growth and cellular fitness, consistent with their roles as global regulators.

      Under our experimental conditions, single-gene knockout strains (Δcrp, Δfis, and Δihf) are viable but exhibit slower growth dynamics compared to the wild-type strain, reflecting impaired regulation of core cellular processes (Supplementary Figure S3). This behavior is consistent with previous work showing that many global transcriptional regulators in E. coli are conditionally essential or strongly fitness-affecting, rather than absolutely essential under standard laboratory conditions.

      Importantly, while single knockouts remain viable, double mutants involving these global regulators are not viable, indicating substantial functional redundancy and network-level essentiality among global transcription factors. This explains why each TF can be studied individually in isolation, while combinations of deletions cannot be maintained.

      We have now clarified this point in the Results section by explicitly stating that the knockout strains show reduced growth rates but reach comparable cell densities during late exponential or early stationary phase, the growth phase at which all measurements were performed (Results, Landscape mapping section; lines 185–193). This clarification reconciles the apparent discrepancy between the biological importance of these transcription factors discussed in the Introduction and the viability of the single-knockout strains shown in Supplementary Figure S3.

      (4) Lines 141 and 227: The authors appear to refer to two different citations for different versions of RegulonDB (refs. 47 and 66). Did they actually use both versions for different purposes (if so, why?), or is this a typo?

      We thank the reviewer for noticing this inconsistency. We did not use two different versions of RegulonDB. The two separate references were an error. We have now corrected this by using a single, consistent RegulonDB citation in both locations.

      (5) Line 166 (Figure 1 caption): I think 2^8 here should be 4^8.

      Thank you. We have corrected “2^8” to “4^8” in the Figure 1 caption.

      (6) Figure 2: Are the distributions in Figure 2a (regulation strengths across all TFBSs in the libraries) equivalent to the distributions in Figures S4-S6 (direct fluorescence readout from cell sorting), just transformed from fluorescence to regulation strength? If so I think that would be helpful to clarify, perhaps in the captions to Figures S4-S6 so that it's clear these contain the same information.

      No. Figures S4–S6 and Figure 2a do not show the same distributions. Figures S4–S6 display the raw fluorescence distributions obtained from cell sorting, whereas Figure 2a shows regulation strengths (S), which are derived quantities computed from these fluorescence data. Specifically, regulation strength is calculated as a weighted average over fluorescence bins using the sequencing read distribution for each TFBS (see Methods, “Regulation strengths”).

      To clarify this relationship, we have revised the main text (lines 201-203 and Figure 1b-c) to explicitly state how regulation strengths (S) were calculated.
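      As a rough illustration of the weighted average described in this response, the following sketch computes a read-weighted mean fluorescence for a single TFBS. It is only a sketch under stated assumptions: the function name, the inputs (per-bin read counts and one representative fluorescence value per sorting bin), and the absence of any normalization or log transform are hypothetical, and the exact formula in the Methods section “Regulation strengths” may differ.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Read-weighted mean fluorescence across sorting bins for one TFBS.

    read_counts      -- sequencing reads for this TFBS in each bin (hypothetical input)
    bin_fluorescence -- representative fluorescence of each bin (hypothetical input)
    """
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("need one read count per fluorescence bin")
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("TFBS has no reads in any bin")
    # Each bin contributes in proportion to the share of reads it holds.
    weighted_sum = sum(c * f for c, f in zip(read_counts, bin_fluorescence))
    return weighted_sum / total_reads


# Illustrative numbers only: most reads fall in the brightest bin.
example_strength = regulation_strength([10, 30, 60], [1.0, 2.0, 3.0])
```

      Under this reading, the raw fluorescence histograms and the regulation strengths carry related but not identical information, because each TFBS's read distribution across bins is collapsed into a single number.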

      (7) Figure 2b: Can the authors label each logo/frequency matrix with its corresponding TF name in the graphic itself? I think this is only implied in the caption.

      We have updated Figure 2b to label each sequence logo / frequency matrix directly in the graphic with its corresponding transcription factor name (CRP, Fis, or IHF), in addition to mentioning these names in the caption. This change clarifies the figure and makes the TF identity immediately apparent to the reader.

      (8) Lines 290 and 298 (Figure 2 caption): The labels for panels b and c appear to be swapped in the caption.

      We thank the reviewer for pointing this out. The labels for panels b and c in the Figure 2 caption were indeed swapped. This has now been corrected.

      (9) Line 379: There is a missing period at the end of this line.

      We have added the missing period at the end of this line.

      (10) Line 400 (Figure 3 caption): There is a missing subtitle for panel c in the caption for this figure (all other panels seem to have bolded subtitles in their captions).

      We have added the missing subtitle for panel c in the Figure 3 caption to match the formatting of the other panels.

      (11) Line 583: There is a missing period after "Methods 7.5)".

      We have added the missing period after “Methods 7.5)”.

      (12) Line 641: "All three landscapes highly rugged" should probably be "All three landscapes are highly rugged".

      We have corrected the sentence to read “All three landscapes are highly rugged.”

    1. eLife Assessment

      Findings from this study are considered fundamental because they identify amino acid uptake, cholesterol synthesis, and protein prenylation as key metabolic regulators of B cell activation, proliferation, and survival, advancing understanding of T-independent immune responses. The study links metabolic reprogramming directly to B cell function, highlighting how cellular metabolism supports immune fitness. The evidence is compelling, combining unbiased proteomic profiling with genetic and pharmacological validation to demonstrate causal roles for these pathways.

    2. Reviewer #1 (Public review):

      The work presented by Cheung et al. used a quantitative proteomics method to capture molecular changes in B cells exposed to LPS and IL-4, a combination of stimuli activating naive B cells. Amino acid transporters, cholesterol biosynthetic enzymes, ribosomal components, and other proteins involved in cell proliferation were found to increase in stimulated B cells. Experiments involving genetic loss-of-function (SLC7A5), pharmacological inhibition (HMGCR, SQLE, prenylation), and functional rescue by metabolites (mevalonate, GGPP) validated the proteomics data and revealed that amino acid uptake, cholesterol/mevalonate biosynthesis, and cholesterol uptake played a crucial role in B cell proliferation, survival, biogenesis, and immunoglobulin class switching. Experiments involving cholesterol-free medium showed that both biosynthesis and LDLR-mediated uptake catered to the cholesterol demand of LPS/IL-4-stimulated B cells. A role for protein prenylation in LDLR-mediated cholesterol uptake was postulated and backed by divergent effects of GGPP rescue in the presence and absence of cholesterol in culture medium.

      Strengths:

      The discovery was made by proteome-wide profiling and unbiased computational analysis. The discovered proteins were functionally validated using appropriate tools and approaches. The metabolic processes identified and prioritized from this comprehensive survey and systematic validation highly likely represent mechanisms of high importance and influence. Analysis of immune cell metabolism at the protein level is relatively uncommon compared to transcriptomic and metabolomic analysis.

      The conclusions from functional validation experiments were supported by clear data and based on rational interpretations. This was enabled by well-established readouts/analytical methods used to determine cell proliferation, viability, size, cholesterol content, and transporter/enzyme function. The data generated from these experiments strongly support the conclusions.

      This work reveals a complex, yet intriguing, relationship between cholesterol metabolism and protein prenylation as they serve to promote B cell activation. The effects of pharmacological inhibition and metabolite replenishment on the cholesterol content and activation of B cells were determined and logically interpreted.

      Weaknesses:

      The findings of this study were obtained almost exclusively from ex vivo B cell stimulation experiments. Their contribution to B cell state and B cell-mediated immune responses in vivo was not explored. Without in vivo data, the study still provides valuable mechanistic information and insights, but it remains unknown, and there is no discussion about, how the identified mechanisms may play out in B cell immunity.

      The role of HMGCR, SQLE, and prenylation in B cell activation was assessed using pharmacological inhibitors. Evidence from other loss-of-function approaches, which could strengthen the conclusions, does not exist. This is a moderate weakness and somewhat offset by other data, including those obtained from the tests involving multiple distinct pharmacological inhibitors and the metabolite replenishment experiments.

    3. Reviewer #2 (Public review):

      This study uses mass spectrometry to quantify how LPS + IL-4 modify the mouse B cell proteome as naïve cells undergo blastogenesis and enter the cell cycle. This analysis revealed changes in key proteins involved in amino acid transport and cholesterol biosynthesis. Genetic and pharmacological experiments indicated important roles for these metabolic processes in B cell proliferation.

      This work provides new information about the regulation of TI B cell responses by changes in cell metabolism and also a comprehensive mass spectrometry dataset, which will be an important general resource for future studies. The experiments are thorough and carefully carried out. The majority of conclusions are backed up by data that are shown to be highly statistically significant.

      After revision, the study now includes new data showing that the upregulation of amino acid uptake and cholesterol metabolism is not restricted to LPS + IL-4 (TLR4 + IL4R) stimulation but is also observed after stimulation of TLR7, TLR9, CD40 and the BCR. This increases the impact of this work and shows that this metabolic rewiring is a common feature of B cell activation. The inclusion of inhibitor data showing important roles for mTOR and ERK/p38a MAP kinases in the metabolic changes identified provides preliminary insights into the mechanisms involved.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We agree with the reviewer that a limitation of our study is its focus on cell-based assays rather than in vivo experiments. We did consider evaluating the effects of statins on B cell responses in vivo; however, this approach is complicated by findings that statins can influence antigen presentation by dendritic cells, thereby impacting antibody responses (Xia et al, 2018). We have revised the discussion section to acknowledge this point.

      The reviewer also noted that our study assessed the roles of HMGCR, SQLE, and prenylation in B cell activation using pharmacological inhibitors rather than genetic knockdown/knockout approaches. Loss-of-function techniques such as RNAi, siRNA, and CRISPR can be challenging to apply to primary B cells, but we are exploring their feasibility for future revisions. While we acknowledge the limitations of using pharmacological inhibitors, we have taken several steps to mitigate these, including targeting multiple steps in the cholesterol biosynthetic pathway using structurally distinct inhibitors and conducting rescue experiments by supplementing downstream metabolites. To strengthen the results on prenylation further, we have added data using two further distinct prenylation inhibitors (revised Figure 6). To further investigate potential off-target effects of statins, we performed proteomic analysis of B cells treated with and without fluvastatin. The data suggest that fluvastatin primarily affects cholesterol metabolism and does not cause widespread off-target effects (new Supplementary Figure 9).

      Reviewer #1 (Recommendations for the authors):

      What signalling mechanisms link LPS sensing to proteomic and metabolic changes? Do these changes depend on specific signalling modules downstream of TLR4 (e.g., MyD88, TRIF, NF-kappaB, MAPKs)? Other receptors found to produce similar effects (TLR7, TLR9, CD40) may share these modules. This information could strengthen the conclusion by showing the chain of molecular events through which immune stimuli reprogram B cell metabolism.

      Signalling through most TLRs, including TLR4, TLR7 and TLR9, requires the adaptor protein MyD88. To determine if MyD88 is required for LPS-induced signalling, we carried out immunoblotting to compare signalling in B cells between WT mice and MyD88-deficient mice. We found that phosphorylation of key downstream proteins, including p38 and ERK1/2 (MAPK signalling), Akt, p70S6K and S6 (mTOR signalling) was diminished in MyD88-deficient mice (Figure S11). These results have been added to the manuscript as Supplementary Figure 11.

      We assessed the requirement of these signalling pathways for LPS-induced proliferation by treating B cells with rapamycin to block mTORC1, PD184352 for MEK1/MEK2 (the upstream activators of ERK1/2), VX745 for p38 or a combination of PD184352 and VX745. These results have been added to the manuscript as the new Figure 9. Rapamycin demonstrated the strongest inhibitory effect on proliferation, and combinatorial blocking of MAPK signalling mildly reduced proliferation (Figure 9A-B). In terms of cholesterol metabolism, treatment with all of these inhibitors reduced cholesterol levels; however, treatment with PD184352 and VX745 reduced cholesterol to the same level as naïve B cells (Figure 9F).

      Other activating stimuli appear to have similar effects. We showed originally that TLR7 and TLR9 activation had a similar effect on proliferation and cholesterol to TLR4, as did activation of CD40 and the BCR (Figure 10). We have now expanded this and shown that these other receptors can also promote protein synthesis (new Supplementary Figure 4).

      There seem to be errors in the manuscript text.

      (1) Page 6, line 232: ssRNAseq?

      We thank the reviewer for spotting these issues. This has been amended to scRNAseq.

      (2) Page 13, line 490: SC7A5?

      This has been amended to SLC7A5.

      (3) The abbreviation CF (cholesterol-free?) is not defined when it first appears.

      This has been amended to cholesterol-free (CF) on page 9, line 411.

      Reviewer #2 (Public review):

      The reviewer suggested that the study would be strengthened by determining whether the observed changes are specific to LPS + IL-4 stimulation or represent a more general B cell response to mitogenic signals. We believe that these effects are not specific to LPS and also occur with other mitogenic stimuli. We have expanded on the data in the original draft showing that other TLR agonists as well as CD40 and BCR stimulation increase both B cell proliferation and cholesterol levels and also looked at the effects of these stimuli on protein synthesis.

      Reviewer #2 (Recommendations for the authors):

      (1) One of the most highly enriched processes is 'response to interferon alpha'. This stands out as most of the other processes identified involve more general cellular processes (i.e., cell proliferation, cell metabolism, etc...). Minimally, interferon alpha should be discussed. It would also be interesting to test whether type I interferons regulate any of the metabolic changes identified.

      Response to interferon alpha has the highest fold enrichment of 6.78. To look at this further, we compiled a list of proteins upregulated by IFN-α stimulation in murine B cells, derived from (Mostafavi et al, 2016), and compared these with our proteome. We found that most of the IFN-α-regulated genes were not significantly upregulated following LPS + IL-4 stimulation compared to naïve B cells (Figure S3A). We also measured phosphorylation of the transcription factor STAT1, which is induced by IFNα and IFNβ signalling, and found that LPS stimulation did not induce p-STAT1 (Figure S3B-C). These results have been added to the manuscript as Supplementary Figure 3. Despite this, as discussed further in the manuscript, we cannot rule out a weak interferon response in the proteomics.

      (2) The proteome of BCR-stimulated B cells has been analyzed by mass spectrometry. This dataset should be compared with the LPS + IL-4 dataset of the current study. This may reveal whether these two stimulations have similar or different effects on B-cell function. In particular, it is interesting to know whether BCR stimulation induces SLC7A5 expression and whether proteins involved in cholesterol metabolism are altered by BCR stimulation.

      A similar study using anti-IgM and anti-CD40 to activate murine B cells has found an upregulation of amino acid transporters, including SLC7A5, in their proteomic data, suggesting that this is not a stimulus-specific effect. This has been added to the text subsection “Protein synthesis in LPS + IL-4 stimulated B cells is dependent on the uptake of amino acids.” In line with this, we have also shown that stimulation of the BCR upregulates protein synthesis (new Supplementary Figure 4). We have added data on HMGCR, SQLE and LDLR from the BCR proteomics experiments to the new Supplementary Figure 13. As the BCR proteome published as a preprint (James et al 2024) is about to be resubmitted as a distinct paper that does not deal with cholesterol metabolism, we have not expanded on this dataset further.

      (3) A role for mTORC1 has been shown for proteome remodelling following BCR stimulation of naïve B cells, regulating the expression of amino acid transporters. Is mTORC1 involved in any of the changes detected following LPS + IL-4 stimulation? (i.e., cell proliferation, ribosome biogenesis, amino acid transport, cholesterol biogenesis).

      To determine the importance of mTORC1 for B cell function, we treated B cells with rapamycin. We found that rapamycin treatment slightly reduced protein synthesis (Figure S12A) and amino acid uptake (Figure S12B). These results have been added to the manuscript as Supplementary Figure 12. Rapamycin reduced cholesterol to almost the levels in naïve B cells (new Figure 9F) and had a significant inhibitory effect on proliferation (new Figure 9A-B).

      (4) Analysis of Slc7a5 knockout B cells showed that SLC7A5 is required for LPS-induced proliferation (Figure 4G). Is SLC7A5 required for B cell growth following LPS + IL-4 stimulation? Is SLC7A5 required for BCR-induced B cell proliferation/growth?

      There appears to be a misunderstanding, as Figure 4G compares proliferation between WT and SLC7A5 KO B cells following LPS + IL-4 stimulation and not LPS stimulation alone.

      Unfortunately, we no longer have access to Slc7a5fl/fl/Vav-iCre+/- mice and will not be able to measure CTV staining for proliferation following BCR stimulation. However, a similar study using anti-IgM and anti-CD40 to activate murine B cells found that B cells from Slc7a5fl/fl/Vav-iCre+/- mice were significantly smaller, had reduced expression of the chaperone protein CD98 and impaired expression of the transferrin receptor CD71, which is required for iron uptake, compared to WT B cells (James et al, 2024).

      (5) The expression of several key proteins (regulating proliferation/amino acid transport/cholesterol metabolism) is shown to be significantly upregulated by LPS + IL-4 stimulation of naïve B cells. It would be interesting to determine whether these increases result from induced transcription of the relevant genes. This could initially be assessed by qRT-PCR analysis of LPS + IL-4 stimulated primary B cells, or alternatively, mining of online RNAseq datasets.

      We mined RNA-Seq data from C57BL/6 mice (Tesi et al, 2019) which compared naïve B cells and B cells after 2, 4, or 8 hours of LPS stimulation. We found that the transcription of genes that coded for the amino acid transporter SLC7A5/SLC3A2 (Figure S6A-B) and key genes involved in cholesterol metabolism followed the same pattern of upregulation as our proteomic data (Figure S6C-F). These results have been added to the manuscript as a new Supplementary Figure 6.

      (6) Cholesterol levels are shown to be increased following resiquimod, CpG, anti-IgM, and CD40L stimulation (Figure 9). What effect do these agonists have on levels of HMGCR, SQLE, and LDLR in B cells? Is B-cell growth by these agonists impaired by Fluvastatin?

      We found that stimulation of murine B cells with either IL-4, anti-IgM or anti-CD40 could increase levels of HMGCR, SQLE and LDLR, with the largest increase seen with a combination of these stimuli (Figure S13A-D) (James et al, 2024). These results have been added to the manuscript as Supplementary Figure 13.

      Figures 10C-E show that B cell growth, survival and proliferation are impaired by Fluvastatin after Resiquimod, CpG, anti-IgM, and CD40L stimulation, although we do not have proteomic data from these stimuli to confirm the levels of HMGCR, SQLE and LDLR.

      We carried out proteomics after 24 hours of LPS + IL-4 stimulation in normal/CF media, with or without Fluvastatin. We found that Fluvastatin treatment in normal media increased the expression of HMGCR, SQLE and LDLR. Fluvastatin treatment in CF media had the highest increase in the expression of these key proteins (Figure S9G-J). These results have been added to the manuscript as Supplementary Figure 9.

      (7) Do Fluvastatin or FGTI-2734 affect early activation of signaling pathways by LPS + IL-4 stimulation of B cells? (eg. MAPKs, STATs, PI3K/AKT).

      This is an interesting question that we will pursue in our future work.

      References:

      James O, Sinclair LV, Lefter N, Salerno F, Brenes A & Howden AJM (2024) A proteomic map of B cell activation and its shaping by mTORC1, MYC and iron. bioRxiv 2024.12.19.629506 doi:10.1101/2024.12.19.629506

      Xia Y, Xie Y, Yu Z, Xiao H, Jiang G, Zhou X, Yang Y, Li X, Zhao M, Li L, et al (2018) The Mevalonate Pathway Is a Druggable Target for Vaccine Adjuvant Discovery. Cell 175: 1059-1073.e21

    1. eLife Assessment

      This study represents a useful finding on the social modulation of the complex repertoire of vocalizations made across a variety of strains of lab mice. The evidence supporting the claims is, at present, incomplete, as numerous concerns regarding the appropriate categorization of vocalizations, the averaging of data points with disparate levels of occurrence, the interpretation of the function of noisy calls, and a general lack of adequate analyses of experimental data were raised. With these issues addressed, the work will be of importance to scientists studying rodent vocal communication.

    2. Reviewer #1 (Public review):

      Summary:

      Adult laboratory mice produce ultrasonic vocalizations during free social interactions, as well as lower-frequency, voiced calls (squeaks) during aversive contexts. The question of whether mice possess a more complex repertoire of vocalizations has been of great interest to scientists studying rodent vocal behavior. In the current study, the authors analyze the rates and acoustic features of vocalizations produced by pairs of mice that are allowed to interact across a barrier, which prevents direct physical interaction. In this context, they find that same-sex (but not opposite-sex) pairs of mice produce vocalizations that are lower in frequency than the typical 70 kHz ultrasonic vocalizations produced during free interactions and that are also distinct from squeaks. These lower frequency vocalizations were observed in both male-male and female-female pairs, as well as in same-sex pairs from multiple mouse strains. The authors also report that call rates and acoustic features are not affected in male-male pairs that have been treated with the anxiolytic drug buspirone, suggesting that anxiety is not a major driver of vocalization in this behavioral context.

      Strengths:

      (1) The observation that same-sex pairs of mice produce lower frequency (<70 kHz) vocalizations in this behavioral context is novel.

      (2) The consideration of multiple types of pairs (female-female, male-male, and female-male), as well as the inclusion of multiple strains of mice and barriers with different hole diameters, are all strengths of the study.

      (3) The authors include detailed analyses of vocalization acoustic features, as well as detailed tracking of mouse positions relative to the barrier.

      Weaknesses:

      The categorization applied to vocalizations based on their mean frequencies is poorly supported and ignores the distinction in laryngeal production mechanism between voiced and ultrasonic vocalizations. Specifically, the authors are likely lumping together voiced and ultrasonic vocalizations into their "low frequency" (< 30 kHz) category, while they reserve the term "ultrasonic" exclusively for the subset of ultrasonic vocalizations with the highest mean frequencies (> 50 kHz). This categorization scheme also does not align well with past work on lower frequency rodent vocalizations, which complicates the comparison of the present findings to that past work.

      In some analyses, the authors report that different groups of mice produce different relative proportions of vocalization types (as defined by mean frequency) but then compare acoustic features of vocalizations between groups after pooling all vocalizations together. The analyses of acoustic features conducted in this way may be confounded by the different proportions of vocalization types across groups.
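      The confound raised in this point can be made concrete with a small worked example. The sketch below uses entirely hypothetical numbers (an arbitrary acoustic feature, two call types, two groups) and hypothetical function names; it is not the authors' analysis. Both groups have identical per-type feature means, yet their pooled means differ solely because the proportions of call types differ.

```python
from statistics import mean


def pooled_and_per_type(calls):
    """Return (pooled mean, {call_type: mean}) for a list of (call_type, value)."""
    pooled = mean(value for _, value in calls)
    by_type = {}
    for call_type, value in calls:
        by_type.setdefault(call_type, []).append(value)
    per_type = {call_type: mean(values) for call_type, values in by_type.items()}
    return pooled, per_type


# Hypothetical data: the same feature value per call type in both groups,
# but group A is dominated by LFVs and group B by USVs.
group_a = [("LFV", 20.0)] * 8 + [("USV", 60.0)] * 2
group_b = [("LFV", 20.0)] * 2 + [("USV", 60.0)] * 8

pooled_a, per_type_a = pooled_and_per_type(group_a)
pooled_b, per_type_b = pooled_and_per_type(group_b)

print(pooled_a, pooled_b)        # pooled means differ between groups
print(per_type_a == per_type_b)  # per-type means are identical
```

      Comparing features within each call type, or including call type as a factor in the statistical model, would avoid attributing a difference in repertoire composition to a difference in acoustic features.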

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine vocal communication during same-sex dyadic interactions in mice, comparing periods of physical separation (with limited sensory access) to direct social contact. They report that separation dramatically alters the vocal repertoire, shifting it away from canonical ultrasonic vocalizations (USVs) toward low-frequency vocalizations (LFVs) and broadband "noisy" calls. While LFVs and noisy calls have been described previously, largely in aversive contexts, this study provides a detailed, systematic characterization of these vocalizations during social interactions, thereby extending prior work.

      The authors explore several experimental manipulations and analyses, including divider hole size, strain and sex differences, anxiolytic drug treatment, and correlations with spatial proximity, to infer potential functions of these call types. Although the dataset is rich, the results are largely descriptive, and many conclusions remain tentative. Several experimental variables are not fully controlled, and in some cases, the interpretation exceeds what the data can clearly support. Nonetheless, with improved experimental framing, additional analyses of existing data, and a clearer discussion of limitations, this work has the potential to make a valuable contribution by broadening the field's focus beyond USVs to understand a wider vocal repertoire relevant to social context.

      Strengths:

      Much work on mouse vocal communication focuses almost exclusively on USVs. This manuscript convincingly demonstrates that non-USV vocalizations (LFVs and noisy calls) are prominent and systematically modulated by social context, highlighting an underappreciated dimension of mouse communication. Furthermore, the authors employ several experimental manipulations, including sensory access, strain, sex, and pharmacological treatment, to assess changes in vocalization repertoire. This provides a valuable resource for the field and reveals robust context dependence of vocalization. The discussion is thoughtful and integrative, particularly in its consideration of potential communicative roles of LFVs and noisy calls and their relationship to sensory constraints and signal propagation, although these ideas will require further experimental validation.

      Weaknesses:

      There are several concerns regarding experimental design and data interpretation that could be addressed to strengthen the manuscript.

      (1) The terminology used for vocalization types is confusing and needs better clarification. The authors refer to Grimsley et al. (2016) multiple times, yet they use the same names for their vocalizations while applying different definitions. This makes it very difficult to compare the two papers. Since this study and Grimsley et al. use different mouse strains (FVB vs CBA), a direct comparison of absolute frequencies may also not be appropriate. Please explicitly clarify the definitions of the call types (e.g., frequency range, voiced vs. USV) and explain how they relate to those in the previous study earlier in the manuscript.

      (2) In the initial experiment, mice always experience separation first (15 minutes), followed by unification (5 minutes), using novel same-sex dyads. Multiple factors besides physical contact could influence vocalization across this sequence, including habituation to the arena, reduced anxiety over time, or increasing familiarity with the partner despite physical separation. It is unclear whether the authors have tested the reverse order (unification first, followed by separation). If not, this limitation should be explicitly acknowledged. In addition, examining whether vocalizations or behaviors change over the course of the 15-minute separation period, for example, by comparing early vs late phases, could help disentangle effects of habituation from those of physical separation per se.

      (3) The conclusion that separation-induced LFVs are unlikely to be anxiety-driven may overinterpret the buspirone experiment (Figure 8). Vehicle injections themselves produced large changes in call rate and call-type distribution, raising concerns about stress or arousal induced by the injection procedure. Comparisons between buspirone-treated animals and untreated animals are therefore problematic, as these groups differ in their experimental histories, including the number of exposures. The manuscript would benefit from independent measures confirming the anxiolytic efficacy of buspirone compared to vehicle injection in this paradigm, such as behavioral readouts of anxiety. In addition, the experimental design requires a clearer description. It is not always clear whether the same dyads were tested twice, or how social familiarity, contextual familiarity, and habituation to injections were handled. Male data comparing first and second exposures should also be included as supplementary figures to allow direct comparison with the excluded female dataset.

      (4) The idea that noisy calls function to attract conspecific attention is intriguing. However, in Figure 5, all call types, including LFVs and USVs, are most likely to occur when mice are already in close proximity during separation, which seems inconsistent with a long-distance signaling role. Analyses of the temporal relationship between vocalizations and behavior would strengthen this claim. For example, it would be informative to test whether bouts of noisy calls precede approach behavior or a reduction in inter-animal distance. Examining whether calls occur before, during, or after orientation toward the partner could further clarify whether these vocalizations actively modulate social behavior.

      (5) The effects of divider hole size on vocal repertoire are striking but difficult to interpret. Unexpectedly, small holes and no holes yield similar call distributions, whereas large holes produce a markedly different profile dominated by LFVs, which also differs from free interactions. If large holes allow greater tactile or close-range interaction, the reduction in USVs and MFV is counterintuitive. Incorporating behavioral metrics such as distance, orientation, or specific interaction types alongside call classification would greatly aid interpretation and help link vocal output to interaction quality rather than divider type alone.

      (6) Throughout the study, vocalizations are pooled across both animals in the dyad. Because the arena is neutral rather than a home cage, either animal could be initiating vocalization. Assigning calls to individuals, where possible, using spatial or acoustic cues, would substantially strengthen functional interpretations. Even limited analyses, e.g., identifying which animal vocalizes first or whether calls precede approach by the partner, could provide important insight into the communicative role of different call types.

    4. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Adult laboratory mice produce ultrasonic vocalizations during free social interactions, as well as lower-frequency, voiced calls (squeaks) during aversive contexts. The question of whether mice possess a more complex repertoire of vocalizations has been of great interest to scientists studying rodent vocal behavior. In the current study, the authors analyze the rates and acoustic features of vocalizations produced by pairs of mice that are allowed to interact across a barrier, which prevents direct physical interaction. In this context, they find that same-sex (but not opposite-sex) pairs of mice produce vocalizations that are lower in frequency than the typical 70 kHz ultrasonic vocalizations produced during free interactions and that are also distinct from squeaks. These lower frequency vocalizations were observed in both male-male and female-female pairs, as well as in same-sex pairs from multiple mouse strains. The authors also report that call rates and acoustic features are not affected in male-male pairs that have been treated with the anxiolytic drug buspirone, suggesting that anxiety is not a major driver of vocalization in this behavioral context.

      Strengths:

      (1) The observation that same-sex pairs of mice produce lower frequency (<70 kHz) vocalizations in this behavioral context is novel.

      (2) The consideration of multiple types of pairs (female-female, male-male, and female-male), as well as the inclusion of multiple strains of mice and barriers with different hole diameters, are all strengths of the study.

      (3) The authors include detailed analyses of vocalization acoustic features, as well as detailed tracking of mouse positions relative to the barrier.

      Weaknesses:

      The categorization applied to vocalizations based on their mean frequencies is poorly supported and ignores the distinction in laryngeal production mechanism between voiced and ultrasonic vocalizations. Specifically, the authors are likely lumping together voiced and ultrasonic vocalizations into their "low frequency" (< 30 kHz) category, while they reserve the term "ultrasonic" exclusively for the subset of ultrasonic vocalizations with the highest mean frequencies (> 50 kHz). This categorization scheme also does not align well with past work on lower frequency rodent vocalizations, which complicates the comparison of the present findings to that past work.

      We thank the reviewer for their assessment. Firstly, we did not use mean frequencies, but peak frequencies of each single call.

      The distinction between ‘voiced’ and ‘whistled’ vocalizations based on their spectral-temporal features is hardly possible. While evidence in the form of audio recordings made from both deer mouse and grasshopper mouse in helium-enriched air suggests that vocalizations with lower fundamental frequency are ‘voiced’ (Pasch et al., 2017; Riede et al., 2022), a computational model considering the laryngeal anatomy of Mus musculus estimates fundamental frequencies of vocalizations at subglottal phonation threshold pressures usual for USVs to be in the range of 1 – 5 kHz and approaching 10 kHz for higher subglottal pressures usually found in the production of ‘voiced’ vocalizations (Pasch et al., 2017). Furthermore, a recent study in the singing mouse (Scotinomys teguina) found minimal fundamental frequencies of single song notes, produced by a whistle mechanism, to be about 4 kHz (Zheng et al., 2025). Thus, the presence of low fundamental (peak) frequencies in mouse vocalizations alone appears to be insufficient for deducing the production mechanism of these vocalizations.

      We did not observe differences in acoustic features clearly separating our ‘LFV’ calls into two groups suggestive of different production mechanisms. Thus, we cannot rule out that our ‘LFV’ class contains vocalizations produced by different mechanisms. However, we did not observe any squeaks in our experiments and can therefore rule out that this prominent type of ‘voiced’ call is lumped together with other calls in the ‘LFV’ calls.

      While the questions regarding the production mechanism, the neurocircuitry involved, and the context-dependent choice of which mechanism to use are intriguing, the distinction between ‘voiced’ and ‘whistled’ vocalizations lies beyond the scope of our manuscript. Instead, the neurocircuitry involved in mouse vocalization production, particularly of USVs and squeaks, has been revealed by other laboratories. Optogenetic activation of RAm Nts neurons elicited emission of both audible vocalizations (fundamental frequencies of 10 kHz and below) and USVs in awake mice in a stimulus-dependent manner (Veerakumar et al., 2023). Furthermore, optogenetic activation of RAm-vocalization neurons led to immediate measurable adduction of vocal folds and emission of canonical USVs (Park et al., 2024). While different populations of PAG neurons are responsible for the production of squeaks and USVs (Ziobro et al., 2024), the two input streams seem to converge on RAm vocalization neurons, as silencing the output of these neurons abolished both squeak and USV emission completely (Park et al., 2024). Thus, while near-complete closing of the vocal folds is necessary for the production of canonical USVs (Mahrt et al., 2016; Park et al., 2024), it is not clear which degree of vocal fold opening would result in which fundamental frequencies.

      We will add a paragraph on this issue to the discussion in the next version of the manuscript.

      In some analyses, the authors report that different groups of mice produce different relative proportions of vocalization types (as defined by mean frequency) but then compare acoustic features of vocalizations between groups after pooling all vocalizations together. The analyses of acoustic features conducted in this way may be confounded by the different proportions of vocalization types across groups.

      We displayed the relative distribution of the different call classes, demonstrating that 80% of the call repertoire during the separation consisted of noisy calls and ‘LFV’. Thus, the acoustic features averaged per individual (e.g., peak frequency) would be predominantly shaped by the features of these two call classes. However, we agree with the reviewer’s criticism and will provide a more detailed display and analysis of the acoustic features of each call class.
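
      The pooling concern raised above can be illustrated with a minimal sketch. All numbers below are hypothetical and are not taken from the manuscript; the helper names (`make_group`, `pooled_mean`, `per_class_means`) are likewise illustrative. Two groups are given identical per-class peak frequencies but different class proportions, so any difference in the pooled mean arises from the class mix alone.

```python
from statistics import mean

# Hypothetical peak frequencies in kHz (illustrative values, not study data).
# Both groups draw on exactly the same per-class values.
LFV_PEAKS = [20.0, 22.0, 24.0]  # class mean: 22 kHz
USV_PEAKS = [68.0, 70.0, 72.0]  # class mean: 70 kHz


def make_group(n_lfv_repeats, n_usv_repeats):
    """Build a list of (call_class, peak_khz) tuples with a chosen class mix."""
    lfv = [("LFV", f) for f in LFV_PEAKS] * n_lfv_repeats
    usv = [("USV", f) for f in USV_PEAKS] * n_usv_repeats
    return lfv + usv


def pooled_mean(calls):
    """Mean peak frequency over all calls, ignoring call class."""
    return mean(f for _, f in calls)


def per_class_means(calls):
    """Mean peak frequency computed separately within each call class."""
    classes = sorted({c for c, _ in calls})
    return {c: mean(f for k, f in calls if k == c) for c in classes}


group_a = make_group(4, 1)  # 80% LFV, 20% USV
group_b = make_group(1, 4)  # 20% LFV, 80% USV

print(per_class_means(group_a), per_class_means(group_b))
print(pooled_mean(group_a), pooled_mean(group_b))
```

      In this toy case the per-class means are identical across groups, yet the pooled means differ substantially, which is why a per-class comparison is the more informative one when class proportions differ between groups.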

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine vocal communication during same-sex dyadic interactions in mice, comparing periods of physical separation (with limited sensory access) to direct social contact. They report that separation dramatically alters the vocal repertoire, shifting it away from canonical ultrasonic vocalizations (USVs) toward low-frequency vocalizations (LFVs) and broadband "noisy" calls. While LFVs and noisy calls have been described previously, largely in aversive contexts, this study provides a detailed, systematic characterization of these vocalizations during social interactions, thereby extending prior work.

      The authors explore several experimental manipulations and analyses, including divider hole size, strain and sex differences, anxiolytic drug treatment, and correlations with spatial proximity, to infer potential functions of these call types. Although the dataset is rich, the results are largely descriptive, and many conclusions remain tentative. Several experimental variables are not fully controlled, and in some cases, the interpretation exceeds what the data can clearly support. Nonetheless, with improved experimental framing, additional analyses of existing data, and a clearer discussion of limitations, this work has the potential to make a valuable contribution by broadening the field's focus beyond USVs to understand a wider vocal repertoire relevant to social context.

      Strengths:

      Much work on mouse vocal communication focuses almost exclusively on USVs. This manuscript convincingly demonstrates that non-USV vocalizations (LFVs and noisy calls) are prominent and systematically modulated by social context, highlighting an underappreciated dimension of mouse communication. Furthermore, the authors employ several experimental manipulations, including sensory access, strain, sex, and pharmacological treatment, to assess changes in vocalization repertoire. This provides a valuable resource for the field and reveals robust context dependence of vocalization. The discussion is thoughtful and integrative, particularly in its consideration of potential communicative roles of LFVs and noisy calls and their relationship to sensory constraints and signal propagation, although these ideas will require further experimental validation.

      Weaknesses:

      There are several concerns regarding experimental design and data interpretation that could be addressed to strengthen the manuscript.

      (1) The terminology used for vocalization types is confusing and needs better clarification. The authors refer to Grimsley et al. (2016) multiple times, yet they use the same names for their vocalizations while applying different definitions. This makes it very difficult to compare the two papers. Since this study and Grimsley et al. use different mouse strains (FVB vs CBA), a direct comparison of absolute frequencies may also not be appropriate. Please explicitly clarify the definitions of the call types (e.g., frequency range, voiced vs. USV) and explain how they relate to those in the previous study earlier in the manuscript.

      The existence and use of various distinct classification systems for mouse vocalizations is well known, and the need to agree on a common classification system is widely acknowledged in the field. Thus, it was not our intention to complicate mouse call classification even more.

      Grimsley et al. (2016) reserve the ‘low frequency’ band solely for squeaks (or “low frequency harmonics”). Hence, it appears straightforward to name mouse calls with “mean dominant frequencies” falling between squeaks and USVs “mid-frequency tonal vocalizations (MFVs)” (Grimsley et al., 2016). We did not observe the emission of squeaks in our experiments, but instead we observed tonal vocalizations in a peak frequency spectrum encompassing both squeaks and Grimsley and colleagues’ ‘MFVs’, representing the lowest peak frequencies we observed (< 32 kHz). Furthermore, we observed vocalizations in the range of 32 – 50 kHz (which were not low frequency components of canonical USVs) and of > 50 kHz (corresponding to canonical USVs). Leaning on the terminology of Grimsley and colleagues (2016), we thought it to be straightforward to name these call classes according to their location on the frequency spectrum: low frequency vocalizations (LFVs; up to 32 kHz), encompassing squeaks, but also Grimsley and colleagues’ MFVs, middle frequency vocalizations (MFVs; 32 – 50 kHz), and finally canonical USVs (> 50 kHz). Admittedly, choosing ‘MFVs’ for mouse calls with different acoustic features than those described by Grimsley and colleagues (2016) has caused unnecessary confusion. We therefore consider adapting our classification scheme for the next version of the manuscript.
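
      As a minimal sketch, the frequency-band scheme described above can be written as a simple threshold rule on the peak frequency of a tonal call. The function name and the handling of values exactly at 32 kHz and 50 kHz are assumptions made for illustration; noisy calls are not covered, since they are not defined by these frequency bands.

```python
# Band edges in kHz, taken from the ranges stated in the text above.
LFV_UPPER_KHZ = 32.0  # LFVs: peak frequency below 32 kHz
USV_LOWER_KHZ = 50.0  # canonical USVs: peak frequency above 50 kHz


def classify_tonal_call(peak_khz):
    """Assign a tonal call to LFV, MFV, or USV by its peak frequency in kHz.

    Boundary handling is an assumption: exactly 32 kHz and exactly 50 kHz
    are both treated as MFV here.
    """
    if peak_khz <= 0:
        raise ValueError("peak frequency must be positive")
    if peak_khz < LFV_UPPER_KHZ:
        return "LFV"
    if peak_khz <= USV_LOWER_KHZ:
        return "MFV"
    return "USV"


for example_khz in (12.0, 40.0, 70.0):
    print(example_khz, classify_tonal_call(example_khz))
```

      Under this rule a 12 kHz call, which would fall among the ‘MFVs’ of Grimsley and colleagues, is labelled ‘LFV’, which illustrates the source of the terminological clash.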

      Regarding the comparison of call classes between different mouse strains, strain differences of spectral-temporal features of call classes have been described for canonical USVs (e.g. Scattoni et al., 2008). However, the acoustic features as well as call repertoire are still quite comparable. Furthermore, we have additionally tested both CBA/J and C57BL/6J mice in our study, confirming the presence of noisy calls, ‘LFVs’, ‘MFVs’, and ‘USVs’ in the vocal repertoire of these two strains.

      We will provide a more detailed display and analysis of the acoustic features of the call classes with the next version of the manuscript.

      (2) In the initial experiment, mice always experience separation first (15 minutes), followed by unification (5 minutes), using novel same-sex dyads. Multiple factors besides physical contact could influence vocalization across this sequence, including habituation to the arena, reduced anxiety over time, or increasing familiarity with the partner despite physical separation. It is unclear whether the authors have tested the reverse order (unification first, followed by separation). If not, this limitation should be explicitly acknowledged. In addition, examining whether vocalizations or behaviors change over the course of the 15-minute separation period, for example, by comparing early vs late phases, could help disentangle effects of habituation from those of physical separation per se.

      We had not tested mice in the reverse order, beginning with 5 minutes of unification followed by 15 minutes of separation. Therefore, we acknowledge this limitation of our study and will address it explicitly in the next version of our manuscript. We appreciate the reviewer’s note regarding the inclusion of vocalizations over time and aim to provide this analysis in the next version of the manuscript.

      (3) The conclusion that separation-induced LFVs are unlikely to be anxiety-driven may overinterpret the buspirone experiment (Figure 8). Vehicle injections themselves produced large changes in call rate and call-type distribution, raising concerns about stress or arousal induced by the injection procedure. Comparisons between buspirone-treated animals and untreated animals are therefore problematic, as these groups differ in their experimental histories, including the number of exposures. The manuscript would benefit from independent measures confirming the anxiolytic efficacy of buspirone compared to vehicle injection in this paradigm, such as behavioral readouts of anxiety. In addition, the experimental design requires a clearer description. It is not always clear whether the same dyads were tested twice, or how social familiarity, contextual familiarity, and habituation to injections were handled. Male data comparing first and second exposures should also be included as supplementary figures to allow direct comparison with the excluded female dataset.

      We agree with the reviewer’s point that the injection procedure itself appeared to have an impact on vocalization behavior. In fact, we had included the ‘untreated’ cohort in Fig. 8 despite their different experimental history to appreciate the potential impact of injection on vocal behavior.

      Furthermore, we appreciate the reviewer’s point of confirming the anxiolytic effect of buspirone treatment with further behavioral readouts and aim to provide such analysis in the next version of the manuscript.

      Regarding the reviewer’s query for clearer experimental design description, the same dyads were tested twice. All mice lived in groups in their home cage, however, they had not met the individual they would face during the experiment before the first experiment. We will improve the description of the experimental design addressing the reviewer’s points in the next version of the manuscript.

      (4) The idea that noisy calls function to attract conspecific attention is intriguing. However, in Figure 5, all call types, including LFVs and USVs, are most likely to occur when mice are already in close proximity during separation, which seems inconsistent with a long-distance signaling role. Analyses of the temporal relationship between vocalizations and behavior would strengthen this claim. For example, it would be informative to test whether bouts of noisy calls precede approach behavior or a reduction in inter-animal distance. Examining whether calls occur before, during, or after orientation toward the partner could further clarify whether these vocalizations actively modulate social behavior.

      We appreciate the reviewer’s remarks regarding the apparent inconsistencies between noisy calls as conspecific attraction calls and their occurrence in close mouse-to-mouse proximity. We must concede that the size of our testing arena limited the maximum distances mice could achieve. Thus, we aim to provide a more extensive analysis including approach behavior and changes of inter-animal distances for resubmission of the manuscript as suggested by the reviewer.

      (5) The effects of divider hole size on vocal repertoire are striking but difficult to interpret. Unexpectedly, small holes and no holes yield similar call distributions, whereas large holes produce a markedly different profile dominated by LFVs, which also differs from free interactions. If large holes allow greater tactile or close-range interaction, the reduction in USVs and MFV is counterintuitive. Incorporating behavioral metrics such as distance, orientation, or specific interaction types alongside call classification would greatly aid interpretation and help link vocal output to interaction quality rather than divider type alone.

      We agree with the reviewer that the interpretation of the divider-hole-size experiment is difficult and, following this reviewer’s input, aim to provide additional behavioral analysis for the effect of divider hole size with the next version of the manuscript.

      (6) Throughout the study, vocalizations are pooled across both animals in the dyad. Because the arena is neutral rather than a home cage, either animal could be initiating vocalization. Assigning calls to individuals, where possible, using spatial or acoustic cues, would substantially strengthen functional interpretations. Even limited analyses, e.g., identifying which animal vocalizes first or whether calls precede approach by the partner, could provide important insight into the communicative role of different call types.

      We agree with the points raised by the reviewer regarding the importance of assigning recorded calls to the respective individual for deciphering the communicative role of different call types. Unfortunately, our system was only equipped with one condenser microphone; therefore, we are not able to assign calls to individual mice.

      Literature:

      Grimsley, J. M. S., Sheth, S., Vallabh, N., Grimsley, C. A., Bhattal, J., Latsko, M., Jasnow, A., & Wenstrup, J. J. (2016). Contextual Modulation of Vocal Behavior in Mouse: Newly Identified 12 kHz "Mid-Frequency" Vocalization Emitted during Restraint. Frontiers in Behavioral Neuroscience, 10, 38. https://doi.org/10.3389/fnbeh.2016.00038

      Mahrt, E., Agarwal, A., Perkel, D., Portfors, C., & Elemans, C. P. H. (2016). Mice produce ultrasonic vocalizations by intra-laryngeal planar impinging jets. Current Biology: CB, 26(19), R880–R881. https://doi.org/10.1016/j.cub.2016.08.032

      Park, J., Choi, S., Takatoh, J., Zhao, S., Harrahill, A., Han, B.-X., & Wang, F. (2024). Brainstem control of vocalization and its coordination with respiration. Science (New York, N.Y.), 383(6687), eadi8081. https://doi.org/10.1126/science.adi8081

      Pasch, B., Tokuda, I. T., & Riede, T. (2017). Grasshopper mice employ distinct vocal production mechanisms in different social contexts. Proceedings. Biological Sciences, 284(1859), 20171158. https://doi.org/10.1098/rspb.2017.1158

      Riede, T., Kobrina, A., Bone, L., Darwaiz, T., & Pasch, B. (2022). Mechanisms of sound production in deer mice (Peromyscus spp.). The Journal of Experimental Biology, 225(9), jeb243695. https://doi.org/10.1242/jeb.243695

      Scattoni, M. L., Gandhy, S. U., Ricceri, L., & Crawley, J. N. (2008). Unusual repertoire of vocalizations in the BTBR T+tf/J mouse model of autism. PloS One, 3(8), e3067. https://doi.org/10.1371/journal.pone.0003067

      Veerakumar, A., Head, J. P., & Krasnow, M. A. (2023). A brainstem circuit for phonation and volume control in mice. Nature Neuroscience, 26(12), 2122–2130. https://doi.org/10.1038/s41593-023-01478-2

      Zheng, X. M., Harpole, C. E., Davis, M. B., & Banerjee, A. (2025). Vocal repertoire expansion in singing mice by co-opting a conserved midbrain circuit node. Current Biology: CB, 35(23), 5762-5778.e6. https://doi.org/10.1016/j.cub.2025.10.036

      Ziobro, P., Woo, Y., He, Z., & Tschida, K. (2024). Midbrain neurons important for the production of mouse ultrasonic vocalizations are not required for distress calls. Current Biology: CB, 34(5), 1107-1113.e3. https://doi.org/10.1016/j.cub.2024.01.016

    1. eLife Assessment

      In this important study, the authors demonstrate that generative AI techniques (restricted Boltzmann machine) can be used effectively to design and characterize mutational pathways of WW domains with different binding specificities. The computational studies are complemented by experimental validations, and the results provide solid evidence supporting the idea that sequence landscape holds significance in understanding protein evolution from a transition path perspective. The minor weakness of the study in the current form concerns limited success in designing variants with smoothly varying binding specificities. Nevertheless, the work will likely have a major impact on research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely non-functional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

    3. Reviewer #2 (Public review):

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      (4) On page 12, it is stated that the temperature was set to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

    1. eLife Assessment

      This important paper substantially advances our understanding of how Molidustat may work, beyond its canonical role, by identifying its therapeutic targets in cancer. This study presents a compelling and well-structured investigation into the therapeutic vulnerabilities of APC-mutant colorectal cancer. This work will be of broad interest to the cancer community in studying small molecules and their therapeutic targets.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to uncover novel therapeutic vulnerabilities in APC-mutant colorectal cancer (CRC), which constitutes the majority of CRC cases. They hypothesized that modulating oxygen-sensing pathways (via PHD inhibition) could disrupt adaptive stress responses in these tumours.

      Strengths:

      The study employs a powerful, two-pronged approach to identify Molidustat's targets. By using both Thermal Proteome Profiling (TPP) and an orthogonal chemical proteomic competition assay, the authors provide compelling evidence that GSTP1 is a genuine, direct off-target, effectively addressing the common limitation of indirect effects in proteomic screens.

      Weaknesses:

      (1) In Figure 1, the current data rely on a single guide RNA (sgRNA). To make the data solid, at least two independent sgRNAs targeting different regions of PHD2 should be used.

      (2) Figure 3E: The Asn205 site should be mutated to test whether Molidustat inhibits GSTP1 activity via Asn205.

      (3) Figure 5B and 5C: The metabolic imbalance phenotype observed upon dual knockout of PHD2 and GSTP1 requires rescue experiments to confirm on-target specificity.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine Molidustat targets and the potential utility of these findings. They clearly demonstrate that Molidustat interferes with GSTP1 and some other proteins on top of PHD2. They also demonstrate that PHD2 deletion is not sufficient to recapitulate Molidustat effects in cells and proteomes. Finally, they demonstrate synthetic lethality in organoids for Molidustat and APC deletion.

      Strengths:

      The data on Molidustat proteomes, GSTP1 binding, inhibition and metabolic health of organoids is really clear. All biochemical, docking and omic data are really strong. The potential impact of these findings could be the use of Molidustat in APC null tumours and awareness of potential off-target effects.

      Weaknesses:

      The main, albeit minor, weakness is that Molidustat also inhibits other PHDs, although these are less expressed. PHD1 has been shown to control the cell cycle and be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. The MTT assay is a limited readout of viability because it measures metabolic activity; this also needs to be discussed and perhaps supplemented with colony or cell number measurements.

      Reviewer #3 (Public review):

      In this paper, the authors revealed that Molidustat can induce a dose-dependent increase in Caspase-3/7 activity in the HT29 cell line, which is an APC-mutant colorectal cancer cell line. More importantly, they found that targeting PHD2 alone cannot cause cell death. By using thermal proteome profiling (TPP) and orthogonal chemical proteomic competition assays, they identified GSTP1 as a previously undiscovered off-target of Molidustat. They also revealed that combined PHD2 and GSTP1 loss leads to an increase in intracellular ROS and apoptosis. Moreover, they evaluated the effects of Molidustat in colonic organoids and showed that Molidustat has a high selectivity for colonic organoids with activated WNT signaling and/or KRAS pathway alterations, and this effect is not reproduced by hydroxylase inhibition alone, providing a new potential approach to targeting both PHD2 and GSTP1 for the treatment of APC-mutant CRC.

      Specific comments:

      (1) What is the possible molecular mechanism by which dual GSTP1/PHD2 loss induces cell death?

      (2) Can the authors mutate the binding site of Molidustat on GSTP1 to verify the in silico docking results?

      (3) Evidence for Molidustat inhibiting PHD2 activity or stabilising HIF-1α should be provided.

    1. eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modelling, and model-based fMRI analyses provides solid support for the main claims. The addition of new model-comparison figures in revision effectively addresses the previously noted potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, while the computational model captures the pattern of "system neglect" well, qualitatively distinct mechanisms, such as hyper-prior attraction toward experiment-wise mean parameters, reporting biases, or probability-outlier underweighting, could produce similar behavioural signatures and cannot be fully disambiguated with the current design alone; however, converging evidence from the authors' prior work partially mitigates this concern.

    2. Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

    3. Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
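
      To make the dependence on prior probability and strength of evidence concrete, the following is a minimal sketch of an ideal Bayesian observer for a task of this kind. It is not the authors' fitted model. It assumes that the sequence starts with the mainly-red urn, that a shift can occur before each draw with a fixed probability `q`, that a shift is irreversible, and that each urn contains a proportion `d` of its majority colour; the function and parameter names are illustrative.

```python
def shift_posteriors(draws, q, d):
    """Return P(regime has shifted | draws so far) after each draw.

    draws: sequence of "red" / "blue" ball colours.
    q: assumed per-draw probability of a shift from the red to the blue urn.
    d: assumed proportion of the majority colour in each urn (0 < d < 1).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie strictly between 0 and 1")

    p_shift = 0.0  # belief that the shift has already happened
    posteriors = []
    for ball in draws:
        if ball not in ("red", "blue"):
            raise ValueError("draws must be 'red' or 'blue'")
        # Prediction step: a shift may occur just before this draw.
        prior = p_shift + (1.0 - p_shift) * q
        # Likelihood of the observed colour under each regime.
        lik_shifted = d if ball == "blue" else 1.0 - d
        lik_not_shifted = d if ball == "red" else 1.0 - d
        # Update step: Bayes' rule over the two regimes.
        numerator = prior * lik_shifted
        p_shift = numerator / (numerator + (1.0 - prior) * lik_not_shifted)
        posteriors.append(p_shift)
    return posteriors


# Hypothetical sequence of draws with illustrative parameter values.
print(shift_posteriors(["red", "red", "blue", "blue"], q=0.05, d=0.8))
```

      An observer of this kind provides the benchmark against which over- and under-reaction are judged: with uninformative evidence (`d = 0.5`) the posterior is driven by `q` alone, whereas with highly diagnostic evidence a single blue ball shifts the belief strongly.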

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples.
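
      The 'attraction to the mean' point in Weakness (2) can be illustrated with a short Python sketch. It fits a straight line through the three ground-truth and subjective prior values quoted above and reports the slope and the value at which the two would coincide. The helper `least_squares_line` is a hypothetical name, and the fit is an illustration only, not an analysis reported in the paper.

```python
# Values quoted in the review: instructed and subjective prior probabilities.
objective = [0.01, 0.05, 0.10]
subjective = [0.037, 0.052, 0.069]


def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(objective, subjective)
# Value at which the subjective and instructed probabilities would be equal.
fixed_point = intercept / (1.0 - slope)

print(f"slope = {slope:.3f}, fixed point = {fixed_point:.3f}")
```

      A slope well below 1 with a fixed point near the middle of the instructed range is what compression towards a central value looks like; on its own, such a fit cannot say whether the compression comes from insensitivity to the system parameters, a hyper-prior, or a reporting bias.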

    4. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we addressed Reviewer 3’s remaining concern on the potential confound between posterior probability and time in neuroimaging results. First, as suggested by the reviewer, we provided images of activations for the effect of Pt and delta Pt after controlling for intertemporal prior in GLM-2. Second, we compared the effect of Pt and delta Pt between GLM-1 (without intertemporal prior) and GLM-2 (with intertemporal prior) and showed the results in a new figure (Figure 4).

      Regarding the issue of reliance on explicitly instructed probabilities, we wish to point out that most of the concerns, such as response mode and regression to the mean, were addressed in the original behavioral paper by Massey and Wu (2005). Please see our detailed response to this point under Weakness (2) raised by Reviewer 3.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we focused on addressing your concern on the potential confound between posterior probability and time in neuroimaging results. First, we presented whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we compared the effect of probability estimates (Pt) on vmPFC and ventral striatum activity—which we found to correlate with Pt—with and without including intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4) in the revised manuscript.

      As suggested by the reviewer, we also added slice-by-slice images of the whole-brain results on Pt and delta Pt in the supplement in addition to the Tables of Activation so that the activated brain regions can be clearly seen through these images.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (having to enter probability estimates, making binary predictions, etc.).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      We thank the reviewer for this comment and for pointing out that there are alternative models that can describe the over- and underreaction seen in the dataset. Massey and Wu (2005) dealt with this possibility in their original paper. Their concern was not so much about alternative ways of modeling their results as about alternative psychological processes. For example, asymmetric noise accounts have been posited in the judgment and decision making literature as possible explanations of phenomena like over-confidence. They addressed what might be crudely called “regression/attraction to the mean” in two ways. First, they looked at median responses as well as mean responses (because medians are less affected by the regressive effect) and found the same patterns of over- and underreaction. Second, they also generated sequences that matched particular posterior probabilities (so that over- and underreaction cannot be explained by regression to the mean) and still found under- and overreaction.

      We also wish to point out that in the judgment and decision making literature, starting from Edwards (1968), there is a long history of using the normative Bayesian model as the starting point and subsequently developing quasi-Bayesian models (like the system-neglect model) to describe systematic deviations from the normative Bayesian model.

      Finally, we want to clarify that our primary goal is not to engage in a model-fitting exercise that examines different possible models. To us, what is more important is that system neglect is a psychologically motivated hypothesis. It is built on the idea that the lack of sensitivity to the system parameters is due to the fact that people focus primarily on the signals and secondarily on the system parameters that generate the signals. Massey and Wu (2005) dealt with a host of other potential explanations through experimental manipulations and data analysis. In this paper, we built on Massey and Wu to examine the neurocomputational basis that gives rise to over- and underreactions.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we added a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2, where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also showed the results for the intertemporal prior in vmPFC and ventral striatum from GLM-2.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect probability estimates that the current regime is the blue regime, and therefore not be specifically about change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> may not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’ (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. 
By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
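
      The coding of the action-handedness regressor quoted above can be sketched in a few lines of Python. The sketch assumes that digits 1 to 5 were entered with the left hand and digits 6 to 9 and 0 with the right hand; this mapping is consistent with the three examples in the quote ("23", "75", "90") but is not stated there. The name `action_handedness` is hypothetical.

```python
# Assumed digit-to-hand mapping (not stated in the quoted passage).
RIGHT_HAND_DIGITS = set("67890")


def action_handedness(response):
    """Parametric code for a two-digit response entered with two key presses.

    Returns -1 if both presses used the left hand, 0 if one press used each
    hand, and 1 if both presses used the right hand.
    """
    if len(response) != 2 or not response.isdigit():
        raise ValueError("expected a two-digit string such as '23'")
    right_presses = sum(digit in RIGHT_HAND_DIGITS for digit in response)
    return right_presses - 1


# The three examples given in the quoted passage.
for estimate in ("23", "75", "90"):
    print(estimate, action_handedness(estimate))
```

      A regressor built this way takes only three values, so it can absorb variance tied to which hand was used without standing in for the probability estimate itself.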

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that showed results of Pt and delta Pt from GLM-2. We also compared the effect of Pt and delta Pt between GLM-1 and GLM-2. We found that the effect of Pt and delta Pt did not differ between GLM-1 and GLM-2. GLM-1 and GLM-2 differed on whether various task-related regressors contributing to Pt, including the intertemporal prior, were included in the model. In GLM-1, those task-related regressors were not included. In GLM-2, the task-related regressors were included in addition to Pt and delta Pt.

      The reason we kept results from GLM-1 (Figure 3) was primarily because we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were respectively probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      Here is the section in the main text where we discussed the new Figure 4 on pages 19-22:

      We further examined the robustness of P<sub>t</sub> and ∆P<sub>t</sub> representations in vmPFC and ventral striatum in three follow-up analyses. In the first analysis, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub> and ∆P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors. Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, ln[(1 − (1 − q)<sup>t</sup>)/(1 − q)<sup>t</sup>], where q is the transition probability and t = 1, …, 10 is the period (Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. We found that the results of P<sub>t</sub> and ∆P<sub>t</sub> in the vmPFC and ventral striatum in GLM-2 were identical to those in GLM-1 (Fig. 4). Fig. 4A depicts the results in slices identical to those shown in Fig. 3B for results based on GLM-1. For slice-by-slice results, see Fig. S7 in SI for results based on GLM-1 and Fig. S9 for GLM-2. For Tables of activations, see Tables S1-S3 in SI for GLM-1 and Tables S7-S9 for GLM-2. In a separate, independent region-of-interest (ROI) analysis on vmPFC and ventral striatum (Fig. 4BC; see Independent regions-of-interest (ROIs) analysis in Methods for details), we further compared the effect of both P<sub>t</sub> and ∆P<sub>t</sub> between GLM-1 and GLM-2. 
For P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.72, p = 0.47 in vmPFC; t(58) = −0.21, p = 0.83 in ventral striatum), while the effect of P<sub>t</sub> was significant in both GLM-1 (one-sample t-test, t(29) = −3.82, p < .01 in vmPFC; t(29) = −3.06, p < .01 in ventral striatum) and GLM-2 (one-sample t-test, t(29) = −2.69, p = .01 in vmPFC; t(29) = −2.50, p = .02 in ventral striatum). For ∆P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.07, p = 0.94 in vmPFC; t(58) = −0.14, p = 0.88 in ventral striatum), while the effect of ∆P<sub>t</sub> was significant in both GLM-1 (one-sample t-test, t(29) = −3.12, p < .01 in vmPFC; t(29) = −4.14, p < .01 in ventral striatum) and GLM-2 (one-sample t-test, t(29) = −2.92, p < .01 in vmPFC; t(29) = −3.59, p < .01 in ventral striatum). For the intertemporal prior, activity in both vmPFC and ventral striatum did not correlate significantly with the intertemporal prior (one-sample t-test, t(29) = −0.07, p = 0.95 in vmPFC; t(29) = −0.53, p = 0.60 in ventral striatum). All the t-tests described above were two-tailed. Taken together, these results suggest that vmPFC and ventral striatum represented P<sub>t</sub> and ∆P<sub>t</sub> regardless of whether the intertemporal prior and other task-related regressors contributing to P<sub>t</sub> were included in the GLM. We also did not find vmPFC and ventral striatum to represent the intertemporal prior. In the second analysis, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln(P<sub>t</sub>/(1 − P<sub>t</sub>)) (Fig. S10 in SI). In the third analysis, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S11 in SI). 
Each of these analyses showed significant correlation with P<sub>t</sub> in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.
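
      The intertemporal prior described in this passage can be tabulated with a short Python sketch. It assumes that the prior is the log odds that a shift has occurred by period t when each period carries an independent shift probability q, i.e. ln[(1 − (1 − q)^t)/(1 − q)^t]; this form follows from the verbal description but should be checked against Eq. 1 in the paper's Methods. The function name is hypothetical, and the three q values are the ground-truth levels quoted in the review.

```python
import math


def intertemporal_prior(q, t):
    """Log odds that a regime shift has occurred by period t (assumed form)."""
    p_no_shift = (1.0 - q) ** t  # probability that no shift has happened yet
    return math.log((1.0 - p_no_shift) / p_no_shift)


# Tabulate the prior over the 10 periods for each transition probability.
for q in (0.01, 0.05, 0.10):
    values = [intertemporal_prior(q, t) for t in range(1, 11)]
    print(q, [round(v, 2) for v in values])
```

      Because this quantity rises with t whatever signals are shown, entering it alongside P<sub>t</sub> in the same GLM is a way to check whether an apparent P<sub>t</sub> effect is really an effect of elapsed time.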

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      As suggested by the reviewer, we added slice-by-slice images showing the effect of Pt and delta Pt (Figure S9 in SI for GLM-2 and Figure S7 for GLM-1). The clusters in blue represent the Pt effect, and the clusters in orange represent the delta Pt effect. As can be seen, both Pt and delta Pt are represented in the vmPFC and ventral striatum.

    1. eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage-gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism with convincing strength of support. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in the development of selective inhibitors for protein targets.

    2. Reviewer #3 (Public review):

      Summary

      In this manuscript, Zhang et al. investigate the conduction and inhibition mechanisms of the Kv2.1 channel, with a particular focus on the distinct effects of TEA and RY785 on Kv2 potassium channels. Using microsecond-scale molecular dynamics simulations, the authors characterize K⁺ ion permeation and RY785-mediated inhibition within the central pore. Their results reveal an inhibition mechanism that differs from those described for other Kv channel inhibitors.

      Strengths

      The study identifies a distinctive inhibitory mode for RY785, which binds along the channel walls in the open-state structure while still permitting a reduced level of K⁺ conduction. In addition, the authors propose a long-range allosteric coupling between RY785 binding in the central pore and changes in the structural dynamics of Kv2.1. Overall, this is a well-organized and carefully executed study, employing robust simulation and analysis methodologies. The work provides novel mechanistic insights into voltage-gated potassium channel inhibition and may offer useful guidance for future structure-based drug design efforts.

      Weaknesses:

      As noted in the Discussion, this study focuses primarily on the major binding site within the central pore and was not designed to systematically assess other potential allosteric binding sites for RY785. A more comprehensive structural and biophysical evaluation of possible additional binding sites would be a valuable direction for future investigations.

      Comments on revisions:

      The authors have addressed my comments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors were seeking to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, the authors sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). The authors used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. They observed that while TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple non-polar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. They confirmed this mechanism using an additional set of simulations and used it to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The authors develop their own forcefield parameters for the RY785 molecule based on extensive QM based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single channel conductance. The authors have performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The authors conclude that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits K+ current. This conclusion is plausible given that RY785 makes stable contacts with multiple hydrophobic residues in the S6 helix, which they can also validate using a recently published closed-state Kv2.1 channel cryo-EM structure. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.

      Weaknesses:

      The authors, however, did not directly observe this semi-closed channel conformation and in fact acknowledge that more direct simulation evidence would require extensive enhanced-sampling simulations beyond the scope of this study. They have not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the authors quantified K+ permeation, they have not made any estimates of the ligand binding affinities or rates, which could have been potentially compared to experiment and used to validate their models.

      However, despite those relatively minor weaknesses, the conclusions of the study are convincing, and overall this is a solid study helping us to understand two distinct molecular mechanisms of the voltage-gated potassium channel Kv2.1 inhibition by TEA and RY785, respectively.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, Zhang et al. investigate the conduction and inhibition mechanisms of the Kv2.1 channel, with a particular focus on the distinct effects of TEA and RY785 on Kv2 potassium channels. Using microsecond-scale molecular dynamics simulations, the authors characterize K⁺ ion permeation and RY785-mediated inhibition within the central pore. Their results reveal an inhibition mechanism that differs from those described for other Kv channel inhibitors.

      Strengths

      The study identifies a distinctive inhibitory mode for RY785, which binds along the channel walls in the open-state structure while still permitting a reduced level of K⁺ conduction. In addition, the authors propose a long-range allosteric coupling between RY785 binding in the central pore and changes in the structural dynamics of Kv2.1. Overall, this is a well-organized and carefully executed study, employing robust simulation and analysis methodologies. The work provides novel mechanistic insights into voltage-gated potassium channel inhibition and may offer useful guidance for future structure-based drug design efforts.

      Weaknesses:

      The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSDs in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously.

      Reviewing Editor: 

      The comments of the reviewers seem thoughtful and constructive. The weaknesses noted in reviews mainly concern mismatch between expectations, created by reading the Abstract, and data in the manuscript. The mismatch could be reconciled by either new simulations examining a semi-open state of the gate and additional RY785 binding sites, or by adjusting wording of the Abstract and Discussion to make it more clear that such simulations were not done. 

      The Abstract and Discussion have been revised to make clear that the computer simulations presented in our study were designed specifically to validate or refute the hypothesis that RY785 is recognized by the pore domain, not the voltage sensors. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      The authors addressed all the major issues in the original submission identified by the reviewers. I noticed a few minor issues, listed below, which can potentially fix small errors and further improve the readability of the manuscript. 

      p.3 tetramethyl-ammonium -> tetraethylammonium 

      p.7 "Snapshot of the final snapshot" -> "Snapshot of the final simulation coordinates" 

      p. 8 "sigma value" - please spell out what it is. 

      p. 9 "one or other subunit of the tetramer" -> "one or another subunit of the tetramer" or "one or more subunits of the tetramer" 

      p. 15 "(the net charge of these constructs is thus zero)." -> "(the net charge of these constructs is zero for these systems)." Please note that using ionizable amino acid residues in their default protonation state does not guarantee net zero charge of the system since the number of cationic and anionic residues is generally not the same. 

      p. 15 "Two K+ ions were initially positioned in the selectivity filter, one coordinated by residues 373..." Please indicate at which ion binding sites (e.g., S_1, S_2) the K+ ions were located and what the residue names are. 

      SI Figs. S3-S20. Please indicate in the figure captions that all those data are for RY785 

      SI Fig. S22 and SI Table S1 captions "shown in Fig. S20" -> "shown in Fig. S21" 
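
      The point about net charge on p. 15 can be illustrated with a small sketch. It counts formally charged side chains at default protonation and shows that the totals need not cancel. The residue sets, the function name, and the example sequence are assumptions made for illustration (histidine is treated as neutral and the termini are ignored); they are not taken from the manuscript.

```python
# Side chains treated as charged at default protonation (assumption:
# Lys/Arg are +1, Asp/Glu are -1, His is neutral, termini are ignored).
POSITIVE = {"K", "R"}
NEGATIVE = {"D", "E"}


def net_charge(sequence):
    """Return the formal net side-chain charge of a one-letter sequence."""
    seq = sequence.upper()
    positives = sum(residue in POSITIVE for residue in seq)
    negatives = sum(residue in NEGATIVE for residue in seq)
    return positives - negatives


# Hypothetical peptide: three cationic (K, R, K) and two anionic (D, E)
# residues, so counter-ions would be needed to neutralize the system.
example_charge = net_charge("MKRDEK")
```

      A nonzero result such as this one is why simulation systems are usually neutralized explicitly with counter-ions rather than assumed to be neutral.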

      We thank the Reviewer for this thorough proofreading. We have made the necessary corrections. 

      Reviewer #2 (Recommendations for the authors): 

      The authors have addressed most of my comments satisfactorily, with the exception of the first point. Below, I provide further clarification regarding my concern. 

      First, it appears that the authors may have misunderstood what is meant by the possibility of multiple binding sites for RY785. This does not imply that the central pore is excluded as a binding site. Rather, it refers to the possibility that, in addition to a pore-domain site, the ligand may interact with additional binding sites, either simultaneously or in a state-dependent manner. Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously. Therefore, even if one assumes that there is no precedent for a small-molecule inhibitor that simultaneously acts on both the voltage sensor and pore domain, this does not exclude the possibility that a ligand may bind to both regions in different functional states. 

      The Reviewer’s opinion came across clearly in the previous version. However, we disagree that a computational investigation of the possibility that RY785 binds to the voltage sensors is well-advised at this point, given that the model we propose seemingly offers a rationale for the inhibitory effects observed experimentally. Our opinion is also that there is no compelling precedent for the mechanism of inhibition envisaged by the Reviewer, and we would argue that neither of the two studies referenced above is a compelling example. As we stated in our previous response to the Reviewer, we believe that the logical next step in this research will be to validate or refute, experimentally, the computational prediction we have put forward. 

      In addition, the present computational study does not provide direct mechanistic evidence to explain the statement that RY785 accelerates voltage-sensor deactivation. Specifically, no simulations were performed to model pore-domain closure or voltage-sensor motion upon RY785 binding. Moreover, alternative binding sites were neither explored nor explicitly excluded, as the simulations only involved placing a single molecule of TEA or RY785 approximately 10 Å below the cytoplasmic gate. Under these conditions, conclusions regarding effects on voltage-sensor dynamics remain speculative. 

      That is a fair characterization. 

      These concerns do not detract from the overall quality of this otherwise strong computational study. There are several straightforward ways to address this issue. For example: 

      (1) Perform molecular docking or related screening approaches to evaluate potential ligand-binding sites beyond the central pore, particularly in regions proximal to the voltage sensor. This should not impose a substantial additional computational burden for a computational chemistry group. 

      (2) Revise the abstract and discussion to clarify that the current work focuses exclusively on pore-domain binding and does not explore possible additional binding sites near the voltage sensor. Explicitly stating this limitation would help prevent potential overinterpretation by readers.

      We have opted for (2), as noted above.

    1. eLife Assessment

      In this manuscript, based on electron microscopy observations of C. elegans embryos, the authors make the bold claim that the plasma membrane ruptures during cell division and that closure of this opening by membrane extension contributes to cytokinesis. Although the findings are potentially valuable, the evidence in support of the authors' claims is inadequate.

    2. Reviewer #1 (Public review):

      Summary:

      Using electron microscopy, the authors report discontinuities in the plasma membrane of C. elegans embryos. They associate these discontinuities with cell division and speculate that membrane rupture and subsequent resealing contribute to cytokinesis. They further discuss the proximity of these sites to vesicles and propose a role for vesicle-mediated membrane extension.

      Weaknesses:

      (1) The possibility that the membrane discontinuity is an artifact

      Although the authors focus on discontinuities in the plasma membrane, similar discontinuities are also observed in mitochondria, the nuclear envelope, and yolk granules. This raises concerns about whether the electron micrographs presented are suitable for assessing membrane continuity.

      Electron micrographs result from a lengthy sample preparation process, including high-pressure freezing, freeze substitution in acetone containing OsO4, gradual warming, uranyl acetate staining, resin embedding, and ultrathin sectioning. In general, lipids are soluble in acetone at temperatures above −30 °C, and preservation of membrane structures relies heavily on efficient OsO4 fixation. Insufficient OsO4 treatment would be expected to reduce membrane contrast.

      C. elegans embryos are encapsulated by an eggshell that forms at fertilization and gradually develops during the first few cell divisions. It is unclear how efficiently OsO4 in acetone penetrates the eggshell during freeze substitution, raising further concern about plasma membrane preservation under the conditions used.

      (2) Lack of evidence linking membrane discontinuity to cell division

      The reported plasma membrane discontinuities are not specific to mitotic cells. If this were a physiological process playing an important role in cytokinesis, it should occur in a temporally and spatially coordinated manner with nuclear division. However, it remains unclear at what stage of the cell cycle the membrane rupture occurs and where it is located relative to chromosomes and the mitotic spindle.

      (3) Lack of evidence for extension of the separated membrane

      Although the authors speculate that resealing of the ruptured membrane occurs via extension of the separated membrane, no direct evidence supporting this mechanism is presented. Proximity to vesicles alone does not demonstrate that membrane extension occurs through vesicle fusion. More direct evidence is required to support this claim.

      (4) Inconsistency with published work

      Numerous studies have examined cell division in developing C. elegans embryos using the GFP::PH(PLC1δ1) marker expressed from the ltIs38 transgene [pAA1; pie-1::GFP::PH(PLC1δ1) + unc-119(+)], generated by the Oegema lab (https://wormbase.org/species/c_elegans/transgene/WBTransgene00000911#01--10). To date, no study has reported membrane ruptures of the magnitude described here. The complexity of cell surface morphology from the 8- to 12-cell stages onward has been well documented, for example, by Fu et al. (2016) using light-sheet microscopy and 3D reconstruction (doi:10.1038/ncomms11088).

      Supplementary Movies 5, 6, and 10 of this paper illustrate how single-plane images can easily produce apparent membrane discontinuities, for example, due to membrane orientations nearly parallel to the imaging plane.

      The three single-plane images from only three embryos presented in Figure 6 are insufficient to support the authors' strong conclusions. Raw 3D data should be provided.

    3. Reviewer #2 (Public review):

      Summary:

      Liang et al. explore an unusual observation of membrane discontinuities in dividing C. elegans embryonic cells. This report is the first to demonstrate that, instead of the classical invagination of membranes during cytokinesis, cells in the early embryos of C. elegans exhibit separation of sister membranes that extend independently. TEM images of high-pressure-frozen samples provide strong evidence for the presence of Membrane Openings (MOs) in cells at various stages of the cell cycle, predominantly during mitosis. High-resolution images (×30,000) clearly show the wrinkled plasma membrane and smooth MOs. The electron microscopy data are supported by the live cell imaging of strains with fluorescently tagged membrane markers. This study opens up the possibility of tracking MOs at other stages of C. elegans development, and also asks if it might be a common phenomenon in other species that exhibit rapid embryonic growth and divisions.

      Strengths:

      (1) Thorough verification of Membrane Openings (MO) by several methods:

      (a) 4 independent sample batches.

      (b) Examined historical collections.

      (c) Analysed embryos at different stages of development. The absence of MOs in later stages (comma) serves as a negative control and gives confidence that MOs are genuine and not technical artifacts.

      (2) Live cell imaging of strain with fluorescently labelled membranes provides real-time dynamics of membrane rupture.

      (3) After observing the membrane rupture, the next obvious question is - what prevents the cytosol from leaking out? The EM images showing PBL and PEL - extracellular matrix serving as barriers for the cytosol are convincing.

      Weakness:

      (1) The association of membrane discontinuities with cell division is not convincing, as there are 159 cells out of 425 showing MOs, but it is not mentioned clearly how many of these are undergoing cell division. Also, it's not clear whether the 20 dividing cells analysed for MOs are a part of the 159 cells or a separate dataset. A graphical representation of the number of samples and observed frequencies would be helpful to understand the data collection workflow.

      (2) In Figures 3A and 3B, the resolution of the images is not enough to verify 3A as classical membrane invagination and 3B as detached sister membranes.

      (3) Figure 6 lacks controls. How does the classical invagination look in this strain? Also, adding nuclear dye would be informative, in order to correlate the nuclear division with membrane rupture, as claimed.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors challenge a dogma in cell biology, namely that cells are at any time point engulfed by a continuous plasma membrane. Liang et al. find that during C. elegans embryogenesis, a high number of cells are not entirely surrounded by a plasma membrane but show membrane openings (MOs). These openings are enriched at the embryo's periphery, towards the eggshell. The authors propose that plasma membrane discontinuities emerge during metaphase of mitosis and that independent extension of "sister membranes" engulfs the daughter cells.

      Strengths:

      On the positive side, the authors find plasma membrane discontinuities not only by electron microscopy but also by fluorescence microscopy and provide information about the dynamics of membrane openings and their emergence. While this is assuring, the authors conclude that MOs emerge during metaphase. From what the authors show, this particular information cannot be deduced, as there is no dynamic capture of a membrane scission event together with a chromatin marker that would indicate mitosis. The authors could, however, attempt to find such events in live movies, given the high incidence of MOs reported from their EM data.

      Weaknesses:

      In order to convincingly demonstrate the absence of any plasma membrane in the respective regions of the embryonic periphery or between cells of the embryo, the authors would have to show consecutive serial TEM sections where MOs are detected over more z-planes, beyond the mere 3D reconstructions. Although the authors state in the methods section that continuous ultrathin sections were cut for the metaphase sample (page 21, line 472), consecutive sections are never shown in TEM. While we do see the 3D reconstructions, better documentation of the underlying TEM data is missing. It would be necessary to show a membrane opening in consecutive z sections. Alternatively, the authors could seek the possibility to convincingly back up their claims with volume imaging by focused ion beam scanning EM (FIBSEM), where cellular volumes can be sectioned in almost isotropic resolution.

      Another critical issue concerns the detection of the membrane discontinuities in electron micrographs, which, in my opinion, is ambiguous. How do the authors reliably discriminate in their TEM images whether there is a plasma membrane or not? The absence - or weak appearance - of the stain of the electron dense material at membranes, which seems to be their criterion for MOs, is also apparent at other, intracellular membranes, like at the NE or at the ER (for example, see Figure 1C). Also, the plasma membrane itself appears unevenly stained in regions that the authors delineate as intact (for example, Figure 1C, 2B/1).

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using electron microscopy, the authors report discontinuities in the plasma membrane of C. elegans embryos. They associate these discontinuities with cell division and speculate that membrane rupture and subsequent resealing contribute to cytokinesis. They further discuss the proximity of these sites to vesicles and propose a role for vesicle-mediated membrane extension. 

      Weaknesses:

      (1) The possibility that the membrane discontinuity is an artifact

      Although the authors focus on discontinuities in the plasma membrane, similar discontinuities are also observed in mitochondria, the nuclear envelope, and yolk granules. This raises concerns about whether the electron micrographs presented are suitable for assessing membrane continuity.

      Electron micrographs result from a lengthy sample preparation process, including high-pressure freezing, freeze substitution in acetone containing OsO4, gradual warming, uranyl acetate staining, resin embedding, and ultrathin sectioning. In general, lipids are soluble in acetone at temperatures above −30 °C, and preservation of membrane structures relies heavily on efficient OsO4 fixation.

      Insufficient OsO4 treatment would be expected to reduce membrane contrast.

      C. elegans embryos are encapsulated by an eggshell that forms at fertilization and gradually develops during the first few cell divisions. It is unclear how efficiently OsO4 in acetone penetrates the eggshell during freeze substitution, raising further concern about plasma membrane preservation under the conditions used.

      We thank the reviewer for raising this important technical concern. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts. Below, we present multiple lines of evidence—ranging from technical reproducibility to orthogonal imaging approaches—that collectively demonstrate the biological reality of these structures.

      (1) Technical expertise and standard protocols

      Our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well-established. Thus, the observed membrane discontinuities are unlikely to stem from technical inexperience or idiosyncratic methods.

      In addition to membrane discontinuities, we would like to emphasize that a large number of single plasma membranes separating adjacent cytoplasmic domains were also detected under EM (Figures 1, 3, and 4, for instance). This observation is particularly significant because the invagination model cannot generate single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes could explain the formation of cytoplasm-enclosed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, which indicates successful EM processing and argues against inefficient fixation or other technical issues.

      (2) Reproducibility across independent preparations and techniques

      To test whether the discontinuities were preparation-specific, we examined four independent sample batches collected in the lab over the years. Membrane discontinuities, as well as cytoplasm-immersed membranes, on embryonic cells were consistently observed across all batches, indicating that the phenomenon is not dependent on a single preparation method. Furthermore, we validated our findings using two EM techniques: transmission electron microscopy (HPF-TEM) and dual-beam scanning electron microscopy (SEM). Membrane discontinuities were clearly identifiable with both techniques, further supporting their robustness.

      (3) Validation using an independent public dataset

      We examined the publicly available C. elegans embryo EM collection (WormAtlas). In several instances, particularly at the embryonic periphery where plasma membrane discontinuities are more readily visualized (https://www.wormimage.org/image.php?id=140265&page=1), we identified similar structures. The presence of these features in an independent dataset generated by different researchers confirms that they are not artifacts unique to our sample preparation.

      (4) Developmental regulation of membrane discontinuities

      We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage—a period marked by the onset of elongation and reduced cell proliferation—the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, the eggshell is present throughout the embryonic stage of C. elegans; therefore, the dramatic reduction of membrane discontinuities in comma-stage embryos argues against the possibility that the eggshell poses a fixation problem.

      (5) Rigorous criteria for identifying membrane discontinuities

      To ensure unbiased analysis, we systematically collected images from early embryonic cells using the following criteria:

      (1) Random section selection: For each sample, we randomly selected one section containing the largest number of embryos or cells (Sup Figure 2) for initial analysis. We found membrane discontinuities in 159 cells distributed across 57 embryos, representing 95% of the total sampled embryos. This portion of the data is summarized in Figure 1.

      (2) Whole-membrane examination: Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, aside from the discontinuity, the remainder of the plasma membrane remained intact. Moreover, in most cells, only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation.

      (3) Neighboring section verification: Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Again, the same membrane discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods and will provide additional images of consecutive sections if needed.

      (4) Serial section reconstruction: To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments.

      First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that each cell contains only one discontinuity from a 3D point of view, further supporting that the phenomenon is not due to random sample preparation damage.

      (6) Orthogonal validation by live imaging: In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduces spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, with single-plane live imaging, both membrane ruptures and free-ended sister membrane structures could be detected (Figure 6), providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase or metaphase. Therefore, live imaging further validates the membrane rupture phenomena in dividing embryonic cells in C. elegans.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      (2) Lack of evidence linking membrane discontinuity to cell division 

      The reported plasma membrane discontinuities are not specific to mitotic cells. If this were a physiological process playing an important role in cytokinesis, it should occur in a temporally and spatially coordinated manner with nuclear division. However, it remains unclear at what stage of the cell cycle the membrane rupture occurs and where it is located relative to chromosomes and the mitotic spindle.

      Thank you for this insightful comment. We agree that establishing a direct link between plasma membrane discontinuities and mitotic progression is critical, and we appreciate the opportunity to clarify this point.

      In C. elegans embryos, the early stages of development are characterized by rapid and extensive cell division. Within approximately 100 minutes, a two-cell embryo develops into an embryo containing nearly 30 cells. The majority of the electron microscopy analyses in our study were performed on embryos at stages with fewer than 30 cells, where most cells are actively dividing. Thus, it is reasonable to infer that the cells exhibiting membrane discontinuities are predominantly mitotic cells.

      Supporting this notion, as embryos reached the comma stage—a period marked by the onset of elongation and reduced cell proliferation—the incidence of membrane discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly suggests that membrane discontinuities are tightly linked to cell division.

      Importantly, mitotic features such as metaphase chromosomes aligned at the equatorial plane or two (or more) nuclei sharing common cytoplasm can be identified in EM images. In our single random EM section analysis, we captured membrane discontinuities in cells at metaphase, anaphase (characterized by fewer than 10 chromosomal clumps), and telophase (defined by two nuclei sharing cytoplasm). Hence, membrane discontinuities are indeed present in mitotic cells. In addition, published work by Fu et al. (Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) using light-sheet microscopy captured similar membrane discontinuities in cells displaying classical mitotic features, including anaphase or telophase.

      To further investigate the spatial relationship between membrane ruptures and chromosome organization, we performed three-dimensional reconstructions on a metaphase cell. As shown in Figure 2 and Video S1, the membrane discontinuities largely encircled the condensed chromosome disc and were spatially aligned with the future segregation zone, further revealing the relative location of membrane discontinuities to chromosomes, at least at metaphase.

      We further collected 3D information for a telophase cell containing three nuclei. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site, and these sites merged to form a single rupture. The observation that membrane ruptures are present in a tri-nucleated cell is particularly informative. The tri-nucleated feature indicates that this cell underwent two rounds of cell division and that both divisions were at telophase. The presence of a single membrane rupture suggests that membrane discontinuities may persist throughout the cell cycle, as the second cell cycle began from a mother cell that still shared cytoplasm with its sister cell and already had one membrane rupture. Therefore, in addition to the mitotic phase, membrane discontinuities (at least in this context) also exist during the DNA synthesis stage.

      (3) Lack of evidence for extension of the separated membrane 

      Although the authors speculate that resealing of the ruptured membrane occurs via extension of the separated membrane, no direct evidence supporting this mechanism is presented. Proximity to vesicles alone does not demonstrate that membrane extension occurs through vesicle fusion. More direct evidence is required to support this claim.

      Thank you for raising this important point. We appreciate the opportunity to clarify our conclusion.

      In our study, EM analysis revealed the presence of cellular vesicles in close proximity to both free membrane edges and the already separated sister plasma membranes (Figure 4). However, we acknowledge that without advanced live-cell imaging, it is not possible to conclusively determine whether the extension of these separated sister membranes occurs through vesicle fusion.

      We realize that a statement in the Discussion section, “The expansion of the plasma membrane is generally driven by vesicle fusion” (page 16), may have inadvertently led the reviewer to interpret this as our own conclusion regarding the mechanism of membrane extension in this context. In fact, that statement was intended to reflect the current general understanding of membrane expansion, not to imply that we had demonstrated such a mechanism for the free-ended sister membranes. As we subsequently noted, “However, this remains speculative and requires further experimental validation.”

      To avoid any misunderstanding, we will revise this section to clearly state that the mechanism by which the separated sister membranes extend remains unknown and that further investigation is needed to determine how existing models of membrane expansion may apply to or be adapted for this novel context.

      We thank the reviewer again for their thoughtful comment, which has helped us improve the clarity of our manuscript.

      (4) Inconsistency with published work

      Numerous studies have examined cell division in developing C. elegans embryos using the GFP::PH(PLC1δ1) marker expressed from the ltIs38 transgene [pAA1; pie-1::GFP::PH(PLC1δ1) + unc-119(+)], generated by the Oegema lab (https://wormbase.org/species/c_elegans/transgene/WBTransgene00000911#01--10). To date, no study has reported membrane ruptures of the magnitude described here. The complexity of cell surface morphology from the 8- to 12-cell stages onward has been well documented, for example, by Fu et al. (2016) using light-sheet microscopy and 3D reconstruction (doi:10.1038/ncomms11088).

      Supplementary Movies 5, 6, and 10 of this paper illustrate how single-plane images can easily produce apparent membrane discontinuities, for example, due to membrane orientations nearly parallel to the imaging plane.

      The three single-plane images from only three embryos presented in Figure 6 are insufficient to support the authors' strong conclusions. Raw 3D data should be provided.

      Thank you for this important comment. We fully agree that the GFP::PH(PLC1δ1) marker, generated by the Oegema lab, has been widely and effectively used to study various aspects of C. elegans embryonic development. In fact, we also employed this same marker in our study to assess membrane integrity.

      However, while live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures such as free-ended extensions being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescent protein-based live imaging analysis alone could not serve as a convincing approach to challenge the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using this same GFP marker have in fact revealed membrane discontinuities that went unnoticed. For example, Fu et al. (Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088), using light-sheet microscopy and 3D reconstruction, captured membrane discontinuities in cells undergoing mitotic phases such as anaphase or telophase. Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Nevertheless, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively identifying such membrane discontinuities.

      We acknowledge that our findings are surprising. We did not set out to challenge the long-held view of membrane integrity during cell division. In fact, this study began when our dedicated EM technician, Jingjing Liang, first observed membrane discontinuity phenomena in control samples—wild-type embryos. Had she not come across this observation, we likely would never have pursued this line of inquiry.

      We appreciate the opportunity to clarify these points and thank the reviewer for thoughtful engagement with our work.

      Reviewer #2 (Public review):

      Summary:

      Liang et al. explore an unusual observation of membrane discontinuities in dividing C. elegans embryonic cells. This report is the first to demonstrate that, instead of the classical invagination of membranes during cytokinesis, cells in the early embryos of C. elegans exhibit separation of sister membranes that extend independently. TEM images of high-pressure-frozen samples provide strong evidence for the presence of Membrane Openings (MOs) in cells at various stages of the cell cycle, predominantly during mitosis. High-resolution images (×30,000) clearly show the wrinkled plasma membrane and smooth MOs.

      The electron microscopy data are supported by the live cell imaging of strains with fluorescently tagged membrane markers. This study opens up the possibility of tracking MOs at other stages of C. elegans development, and also asks if it might be a common phenomenon in other species that exhibit rapid embryonic growth and divisions. 

      Strengths:

      (1) Thorough verification of Membrane Openings (MO) by several methods: 

      (a) 4 independent sample batches.

      (b) Examined historical collections.

      (c) Analysed embryos at different stages of development. The absence of MOs in later stages (comma) serves as a negative control and gives confidence that MOs are genuine and not technical artifacts. 

      (2) Live cell imaging of a strain with fluorescently labelled membranes provides real-time dynamics of membrane rupture.

      (3) After observing the membrane rupture, the next obvious question is - what prevents the cytosol from leaking out? The EM images showing PBL and PEL - extracellular matrix serving as barriers for the cytosol are convincing.

      We thank the reviewer for the encouragement, which we highly appreciate.

      Weakness:

      (1) The association of membrane discontinuities with cell division is not convincing, as there are 159 cells out of 425 showing MOs, but it is not mentioned clearly how many of these are undergoing cell division. Also, it's not clear whether the 20 dividing cells analysed for MOs are a part of the 159 cells or a separate dataset. A graphical representation of the number of samples and observed frequencies would be helpful to understand the data collection workflow.

      We sincerely thank the reviewer for raising this important question and appreciate the opportunity to clarify these points.

      (1) Relationship between membrane discontinuities and cell division

      In C. elegans embryos, early development is characterized by rapid and extensive cell division: within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. Most of our electron microscopy (EM) analyses were performed on embryos at stages with fewer than 30 cells, in which the majority of cells are actively dividing. Therefore, it is reasonable to infer that the cells exhibiting membrane discontinuities (MOs) are predominantly mitotic. Supporting this, as embryos reached the comma stage, when cell proliferation declines and elongation begins, the incidence of MOs dropped sharply (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly links MOs to cell division.
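
      As a rough illustration of how much the zero counts at the comma stage can exclude, the sketch below pools the three samples (0/13, 0/17, 0/30) and computes an exact one-sided 95% upper bound on the true MO incidence, alongside the 159/425 frequency quoted by the reviewer for early embryos. Pooling the samples and treating cells as independent observations are assumptions made here for illustration only; this is not an analysis taken from the manuscript, and the function name is hypothetical.

```python
# Illustrative only: exact (Clopper-Pearson) upper bound when zero events are observed.
# Assumptions: the three comma-stage samples can be pooled and cells are independent.

def upper_bound_zero_events(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper bound on a proportion when 0 of n are positive.

    With zero events, the bound p solves (1 - p) ** n = alpha.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - alpha ** (1.0 / n)

comma_stage_counts = [13, 17, 30]    # cells examined per sample, none with MOs
pooled_n = sum(comma_stage_counts)   # total comma-stage cells examined
bound = upper_bound_zero_events(pooled_n)

early_fraction = 159 / 425           # MO frequency quoted for early-stage cells

print(f"comma stage: 0/{pooled_n}, 95% upper bound = {bound:.3f}")
print(f"early stage: 159/425 = {early_fraction:.3f}")
```

      Under these assumptions, zero events in 60 cells remain compatible with a true incidence of up to roughly 5%, well below the roughly 37% observed in early embryos, although the two datasets were not necessarily sectioned in comparable ways.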

      Moreover, in single random EM sections, we observed MOs in cells displaying clear mitotic features, such as metaphase chromosomes aligned at the equatorial plate, or anaphase/telophase configurations (fewer than 10 chromosomal clumps or two nuclei sharing common cytoplasm). Thus, MOs are indeed present in mitotic cells.

      From our 3D reconstruction (Figure 5), we identified a telophase cell containing three nuclei, each enclosed by its own plasma membrane, with each membrane harboring a single rupture that converged into a single opening. This tri-nucleated configuration indicates that the cell had undergone two rounds of division and was at telophase in both. The presence of a single membrane rupture in this context suggests that MOs can persist beyond mitosis, as the second cell cycle initiated from a mother cell that already shared cytoplasm with its sister and already contained a rupture. Thus, in this case, MOs were also present during the DNA synthesis stage.

      (2) Clarification of sample numbers and datasets

      In Figure 1, we present results from a single EM section per embryonic cell, with sections randomly selected per embryo as detailed in Sup Figure 2. This initial dataset (425 cells) forms the basis of Figure 1.

      From the same pool of 425 cells, we used additional EM sections—distinct from those shown in Sup Figure 2—to locate 20 dividing cells for analysis of membrane discontinuities. Thus, while these 20 cells originated from the same set of embryos, they were not derived from the sections used in Figure 1 or Sup Figure 2.

      A graphical summary of sample numbers from the single-section analysis is already provided in Figure 1. Notably, cells with two clearly visible nuclei are more likely to be sectioned through or near their maximal diameter. In contrast, the randomly selected sections used for Figure 1 captured cells at variable planes, reducing the likelihood of observing MOs. Consistent with this, in the three embryos where no MOs were detected (one example is Sup Figure 2N), the sections likely passed through peripheral regions of the cells. Consequently, the frequency of MOs in randomly sectioned cells (Figure 1) is not directly comparable to that observed in the 20 dividing cells, which were analyzed using sections more likely to capture cells near their maximal diameter. These 20 dividing cells should therefore be considered a separate analysis. We will add detailed explanations in the Methods section to ensure this distinction is clearly understood.

      We are grateful for the reviewer’s thoughtful feedback and believe these clarifications will improve the clarity and rigor of the manuscript.

      (2) In Figures 3A and 3B, the resolution of the images is not enough to verify 3A as classical membrane invagination and 3B as detached sister membranes.

      Thank you for your valuable comment. In the revised manuscript, we will provide additional images at higher magnification to better illustrate the classical membrane invagination in Figure 3A and the detached sister membranes in Figure 3B.

      (3) Figure 6 lacks controls. How does the classical invagination look in this strain? Also, adding nuclear dye would be informative, in order to correlate the nuclear division with membrane rupture, as claimed. 

      Thank you for this important comment. As we addressed how we correlated nuclear division with membrane rupture in response to weakness (1), below we will focus on how we may distinguish classical invagination from membrane rupture.

      While live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures such as free-ended extensions being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescent protein-based live imaging analysis alone could not serve as a convincing approach to challenge the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using GFP::PH or similar markers have in fact revealed membrane discontinuities that went unnoticed. For example, using light-sheet microscopy and 3D reconstruction, Fu et al. captured membrane discontinuities in cells undergoing division, such as at anaphase or telophase (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088).

      Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Here, to capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. With single-plane live imaging, both membrane ruptures and free-ended sister membrane structures (Figure 6) could be detected, providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos.
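
      To make the temporal sampling concrete, the sketch below counts how many 4-second acquisition intervals fit within the membrane-extension durations quoted above (20–30 seconds for metaphase, under 3 minutes for cytokinesis). Treating each event as a fixed-length window is a simplifying assumption, and the function and variable names are illustrative rather than taken from the authors' imaging pipeline.

```python
# Illustrative only: frames available per event at a fixed acquisition interval.
# The durations and the 4-second interval are the figures quoted in the response;
# modelling each event as a fixed-length window is a simplifying assumption.

def frames_within(duration_s: float, interval_s: float) -> int:
    """Number of complete acquisition intervals that fit inside an event."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return int(duration_s // interval_s)

INTERVAL_S = 4  # seconds between single-plane acquisitions

events = {
    "metaphase extension (20 s)": 20,
    "metaphase extension (30 s)": 30,
    "cytokinesis extension (upper limit, 180 s)": 180,
}

frame_counts = {name: frames_within(d, INTERVAL_S) for name, d in events.items()}
for name, n in frame_counts.items():
    print(f"{name}: about {n} frames")
```

      Under this simplification, metaphase membrane extension is sampled by only about 5 to 7 single-plane frames, which illustrates why brief events can be missed or appear ambiguous.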

      However, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively distinguishing invagination from membrane discontinuities.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors challenge a dogma in cell biology, namely that cells are at any time point engulfed by a continuous plasma membrane. Liang et al. find that during C. elegans embryogenesis, a high number of cells are not entirely surrounded by a plasma membrane but show membrane openings (MOs). These openings are enriched at the embryo's periphery, towards the eggshell. The authors propose that plasma membrane discontinuities emerge during metaphase of mitosis and that independent extension of "sister membranes" engulfs the daughter cells.

      Strengths:

      On the positive side, the authors find plasma membrane discontinuities not only by electron microscopy but also by fluorescence microscopy and provide information about the dynamics of membrane openings and their emergence. While this is assuring, the authors conclude that MOs emerge during metaphase. From what the authors show, this particular information cannot be deduced, as there is no dynamic capture of a membrane scission event together with a chromatin marker that would indicate mitosis. The authors could, however, attempt to find such events in live movies, given the high incidence of MOs reported from their EM data.

      We thank the reviewer for the encouragement, which we highly appreciate.

      Weaknesses:

      In order to convincingly demonstrate the absence of any plasma membrane in the respective regions of the embryonic periphery or between cells of the embryo, the authors would have to show consecutive serial TEM sections where MOs are detected over more z-planes, beyond the mere 3D reconstructions. Although the authors state in the methods section that continuous ultrathin sections were cut for the metaphase sample (page 21, line 472), consecutive sections are never shown in TEM. While we do see the 3D reconstructions, better documentation of the underlying TEM data is missing. It would be necessary to show a membrane opening in consecutive z sections. Alternatively, the authors could seek to back up their claims convincingly with volume imaging by focused ion beam scanning EM (FIB-SEM), where cellular volumes can be sectioned at almost isotropic resolution.

      We thank the reviewer for raising these important technical concerns. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts.

      First of all, in addition to membrane discontinuities, we would like to highlight that a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3, and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      Second, we applied rigorous criteria for identifying membrane discontinuities:

      (1) To test whether the discontinuities were preparation specific, we examined four independent sample batches and validated our findings using two EM techniques: transmission electron microscopy (HPF-TEM) and dual-beam scanning electron microscopy (SEM).

      (2) We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage, a period marked by the onset of elongation and reduced cell proliferation, the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, the eggshell is present throughout the embryonic stage of C. elegans; therefore, the dramatic reduction of membrane discontinuities in comma-stage embryos argues against the possibility that the eggshell poses a fixation problem.

      (3) Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, aside from the discontinuity, the remainder of the plasma membrane remained intact. Moreover, in most cells, only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation. Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Again, the same membrane discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods, and additional images of consecutive sections can be provided if needed.

      To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments using consecutive sections, as the reviewer suggested. First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that each cell contains only one discontinuity from a 3D point of view, further supporting that the phenomenon is not due to random sample preparation damage.

      (4) In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduces spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, single-plane live imaging detected both putative membrane ruptures (Figure 6A) and free-ended sister membrane structures (Figures 6B and 6C), providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase and metaphase. Therefore, live imaging further validates the membrane rupture phenomena in dividing embryonic cells in C. elegans.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      Another critical issue concerns the detection of the membrane discontinuities in electron micrographs, which, in my opinion, is ambiguous. How do the authors reliably discriminate in their TEM images whether there is a plasma membrane or not? The absence - or weak appearance - of the stain of the electron dense material at membranes, which seems to be their criterion for MOs, is also apparent at other, intracellular membranes, like at the NE or at the ER (for example, see Figure 1C). Also, the plasma membrane itself appears unevenly stained in regions that the authors delineate as intact (for example, Figure 1C, 2B/1).

      We thank the reviewer for raising this important concern.

      First, our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well established. Thus, the observed membrane discontinuities are unlikely to result from technical inexperience or idiosyncratic methods.

      Second, because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Specifically, a membrane discontinuity was confirmed only after inspecting the entirety of the plasma membrane in neighboring sections. We will include this verification protocol in the revised Methods section, and additional images of consecutive sections can be provided if needed.

      Third, in addition to membrane discontinuities, a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3, and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      EM-related publications by Jingjing Liang:

      Chen D, Jian Y, Liu X, Zhang Y, Liang J, Qi X, Du H, Zou W, Chen L, Chai Y, Ou G, Miao L, Wang Y, Yang C. 2013. Clathrin and AP2 Are Required for Phagocytic Receptor-Mediated Apoptotic Cell Clearance in Caenorhabditis elegans. PLoS Genetics 9:e1003517. DOI: https://doi.org/10.1371/journal.pgen.1003517

      Ding L, Yang X, Tian H, Liang J, Zhang F, Wang G, Wang Y, Ding M, Shui G, Huang X. 2018. Seipin regulates lipid homeostasis by ensuring calcium‐dependent mitochondrial metabolism. The EMBO Journal 37:e97572. DOI: https://doi.org/10.15252/embj.201797572

      Guan L, Yang Y, Liang J, Miao Y, Shang A, Wang B, Wang Y, Ding M. 2022. ERGIC2 and ERGIC3 regulate the ER‐to‐Golgi transport of gap junction proteins in metazoans. Traffic 23:140–157. DOI: https://doi.org/10.1111/tra.12830

      Li Y, Zhang Y, Gan Q, Xu M, Ding X, Tang G, Liang J, Liu K, Liu X, Wang X, Guo L, Gao Z, Hao X, Yang C. 2018. C. elegans-based screen identifies lysosome-damaging alkaloids that induce STAT3-dependent lysosomal cell death. Protein & Cell 9:1013–1026. DOI: https://doi.org/10.1007/s13238-018-0520-0

      Miao Y, Du Y, Wang B, Liang J, Liang Y, Dang S, Liu J, Li D, He K, Ding M. 2024. Spatiotemporal recruitment of the ubiquitin-specific protease USP8 directs endosome maturation. eLife 13:RP96353. DOI: https://doi.org/10.7554/eLife.96353

      Qin J, Liang J, Ding M. 2014. Perlecan Antagonizes Collagen IV and ADAMTS9/GON-1 in Restricting the Growth of Presynaptic Boutons. Journal of Neuroscience 34:10311–10324. DOI: https://doi.org/10.1523/JNEUROSCI.5128-13.2014

      Wang Z, Zhang L, Zhou B, Liang J, Tian Y, Jiang Z, Tao J, Yin C, Chen S, Zhang W, Zhang J, Wei W. 2026. A single MYB transcription factor GmMYB331 regulates seed oil accumulation and seed size/weight in soybean. Journal of Integrative Plant Biology 68:470–485. DOI: https://doi.org/10.1111/jipb.70101

      Xu J, Chen S, Wang W, Man Lam S, Xu Y, Zhang S, Pan H, Liang J, Huang Xiahe, Wang Yu, Li T, Jiang Y, Wang Yingchun, Ding M, Shui G, Yang H, Huang Xun. 2022. Hepatic CDP-diacylglycerol synthase 2 deficiency causes mitochondrial dysfunction and promotes rapid progression of NASH and fibrosis. Science Bulletin 67:299–314. DOI: https://doi.org/10.1016/j.scib.2021.10.014

      Xu M, Ding L, Liang J, Yang X, Liu Y, Wang Y, Ding M, Huang X. 2021. NAD kinase sustains lipogenesis and mitochondrial metabolism through fatty acid synthesis. Cell Reports 37:110157. DOI: https://doi.org/10.1016/j.celrep.2021.110157

      Yang L, Liang J, Lam SM, Yavuz A, Shui G, Ding M, Huang X. 2020. Neuronal lipolysis participates in PUFA‐mediated neural function and neurodegeneration. EMBO reports 21:e50214. DOI: https://doi.org/10.15252/embr.202050214

      Yang X, Liang J, Ding L, Li X, Lam S-M, Shui G, Ding M, Huang X. 2019. Phosphatidylserine synthase regulates cellular homeostasis through distinct metabolic mechanisms. PLOS Genetics 15:e1008548. DOI: https://doi.org/10.1371/journal.pgen.1008548

      Zhu J, Lam SM, Yang L, Liang J, Ding M, Shui G, Huang X. 2022. Reduced phosphatidylcholine synthesis suppresses the embryonic lethality of seipin deficiency. Life Metabolism 1:175–189. DOI: https://doi.org/10.1093/lifemeta/loac02

    1. eLife Assessment

      This manuscript uncovers the importance of Vinculin in the maintenance of junctional integrity during neural tube closure in regions of increased mechanical stress, by using sophisticated methods such as laser ablation and live imaging. The manuscript also reports a novel application of an established embryonic stem cell protocol to efficiently generate mutant and transgenic embryos for analysis. The findings are fundamental in nature, significantly improve our understanding of a major research question, and are backed by compelling evidence. Whilst there is much to appreciate in this work, exactly how Vinculin mediates neural fold elevation remains unclear, and addressing this lacuna will significantly improve the strength of the manuscript; in addition, some rewriting for better clarity (including technical/methodological details) and inclusion of possible consequences of the increased number of tight junction gaps in the vinculin mutant would be pertinent.

    2. Reviewer #1 (Public review):

      Summary:

      In many vertebrates, the neural tube closes by folding, elevation, and fusion of bilateral neural folds. Loss of the actin-binding protein Vinculin causes failed cranial neural tube closure in mice and is associated with neural tube defects in human patients, but it was not known how Vinculin contributes to neural tube closure. Here, Prudhomme and colleagues find that neural fold elevation and the apical constriction that drives it initiate normally in Vinculin-deficient mouse embryos, but both arrest before the neural folds fuse. The time of failure coincides with increased mechanical tension within the cranial neural plate. They find that Vinculin localizes to areas of high mechanical stress in the WT neural plate, including multi-cellular junctions and dividing cells, and in the absence of Vinculin, recruitment of Myosin and apical junction proteins is reduced at these sites. These data support a model in which Vinculin recruits junctional proteins to high-stress areas to maintain junctional integrity during neural tube closure.

      Strengths:

      The data presented are thorough, rigorous, and convincing. The combination of live imaging and transgenic fluorescent reporters enables direct observation of junctional behaviors within the mouse cranial neural plate and detailed analysis of how these behaviors are disrupted upon loss of Vinculin. The authors make good use of an ESC transplant approach to efficiently generate mutant and transgenic embryos for analysis.

      Weaknesses:

      Although the loss of junctional integrity, especially at multi-cellular junctions, is clearly and convincingly demonstrated in Vinculin-deficient embryos, it is not clear precisely how this disrupts the elevation of the neural folds to cause exencephaly.

    3. Reviewer #2 (Public review):

      Summary:

      Using mouse embryos early in development, this excellent paper from Prudhomme et al. shows that Vinculin's recruitment to adherens junctions during mammalian cranial neural tube closure is essential for maintaining junctional integrity in response to increased tension during this process. Previous work had shown that during neural tube elevation, planar polarity of Myosin II and mechanical forces in the tissue are increased. Additionally, mouse embryos lacking Vinculin were known to display neural tube closure failure, and mutations in human Vinculin had been associated with increased risk of neural tube defects, but the mechanism remained unclear. Here, the authors utilize a high-throughput embryonic stem cell (ESC)-based pipeline to generate Vinculin-depleted embryos, complemented by a conditional mutant lacking Vinculin in the embryonic lineages, to investigate this question. The authors show that Vinculin is not required for force generation, but Vinculin is recruited to cell-cell junctions in a tension-dependent manner and is needed to transmit actomyosin-mediated tension to junctions - particularly tricellular and higher-order multicellular junctions - so that apical constriction can happen during neural fold elevation. Furthermore, they find that Vinculin is required to maintain adhesion during high force events (e.g., rosette resolution and cell division) during neural tube closure. The research builds on previous studies about Vinculin's role in mechanotransduction at cell-cell junctions carried out in cultured epithelial cells, zebrafish cardiomyocytes, or early Xenopus embryos, and investigates how physiological forces required for mouse neural tube closure challenge junction integrity and the important role that Vinculin plays in maintenance of junction integrity and translation of mechanical forces into changes in tissue structure during this process.

      Strengths:

      This study stands out for its sophisticated use of laser ablation and live imaging in neurulating mouse embryos, enabling quantification of junctional tension, Vinculin recruitment to multicellular junctions, and assessment of junction integrity during neural tube elevation. The authors' use of ESC-derived Vinculin mutant embryos, complemented by a second conditional mutant of Vinculin, convincingly demonstrates that their findings are specific to the loss of Vinculin. Additionally, the authors demonstrated proof-of-principle for their ESC-based pipeline with a Shroom3 mutant known to be important for neural tube closure. The Zallen lab's application of the genetically engineered ESC-derived mouse embryo pipeline to efficiently generate larger numbers of mutant mouse embryos exhibiting neural tube closure defects (compared with traditional genetic crossing strategies) that can be utilized for live imaging and mechanical perturbations like laser ablation will be valuable for future work in the field. The authors show that Vinculin depletion disrupts tricellular and multicellular junctions. Notably, over 75% of higher-order (5+) vertices in Vinculin mutant embryos display gaps, but interestingly, about one third of 5+ cell junctions in Control embryos also display gaps, indicating that transient vertex remodeling events are needed for normal neural tube closure. Overall, this is a well-written paper that places the authors' findings within the context of prior literature; the beautiful, robustly analyzed data and clear figure presentation will make the authors' exciting findings accessible to readers.

      Weaknesses:

      The criteria for selection of junctions targeted by laser ablation, including specifics of location, Myosin II intensity, and initial junction length, should be more clearly described in the Methods, especially given the use of different reporter strains (MyoIIB-GFP vs. GFP-Plekha7) across figures, which may influence junction selection for laser ablation. Analysis of Myosin II in Vinculin mutant embryos would benefit from staining for active Myosin II (pMRLC), and further examination of actomyosin organization at different stages of neural fold elevation in controls vs. Vinculin mutants would be informative. Although the authors note that ZO-1 gaps are limited to a subset of vertices where adherens junction gaps are detected, the increased frequency of tight junction gaps in Vinculin mutants could have functional significance that should be noted. Finally, inclusion of schematics to detail how the adherens and tight junction gaps were defined and measured at cell vertices, as well as how cell division completion was defined, would improve transparency and strengthen readers' understanding of how the data were quantified.

    4. Reviewer #3 (Public review):

      Summary:

      Prudhomme et al report a detailed analysis of the role of vinculin in maintaining neuroepithelial integrity during cranial neurulation.

      Strengths:

      The authors use complementary experiments involving super-resolution microscopy, laser ablation, and live imaging of conditional knockout and ESC-derived embryos to demonstrate that loss of vinculin produces wide gaps between the adherens junctions of neuroepithelial cells at later stages of cranial neural fold elevation. The data presented are of extremely high quality, logically presented in a compelling story, and represent a very substantial contribution.

      Weaknesses:

      The authors are invited to consider the largely minor questions recommended below.

      (1) The laser ablations reported are a correlate of cell border, or 'junctional' tension. Please avoid broad statements such as 'mechanical forces are upregulated' (abstract), which invoke gene-like regulation of tissue-level forces (in Newtons). Changes in junctional tension are likely to relate to changes in force generated, but their relationship is not simple: higher tensile stress withstood by the shorter length of junctions in cells with smaller apical surfaces does not necessarily translate into greater force being produced by that cell. The junctional tension readout measured is perfectly relevant to the paper, more so than tissue-level forces would have been.

      (2) What is the mechanical mechanism by which loss of vinculin prevents neural fold elevation? The authors present exciting findings about the cellular consequences of losing Vcl at the late elevation stages when the tissue is quantifiably dysmorphic. A clear argument of how Vcl loss could lead to this dysmorphology would strengthen the paper, particularly given that junctional tension defects are excluded and apical non-constriction at the late stage is only mild.

      (3) Can the authors comment on the likely impacts of Vcl deletion on the basal domain of the cell? For example, they could cite live-imaging of distinct behaviours in Williams et al Dev Cell 2014, and the NTD phenotypes of some integrin/focal adhesion mutant mice.

      (4) The apparent uncoupling of apical area (larger in Vcl KO) from junctional tension (equivalent) in this model is noteworthy. Can the authors speculate on its potential basis?

      (5) Live imaging in Figure 7C appears to show a marked reduction in apical area before cleavage furrow formation (T0-18min), suggesting a large apical constriction event (post-mitotic?), as previously reported (e.g., Ampartzidis et al Dev Biol 2023). Do junctional gaps appear during these constrictions?

      (6) The live imaging setup used is clearly sufficient to identify differences between genotypes, so this is only a minor point. The gassing conditions listed in the methods specify 5% CO2, but E8.5 embryos also need low O2 to complete cranial closure. Was the O2 level controlled? Was tissue-level shape change observed to be consistent with ongoing neurulation during live-imaging?

      (7) Neither the multi-cell laser ablations in the pre-print by De La O cited here, nor the narrower junctional ablations in Bocanegra-Moreno et al., Nat Phys, (2023), identified differences in recoil between developmental stages. Why might those results be different from the findings reported here (e.g., analysis region - not specified in the latter paper)? Limitations to interpreting junctional ablations between cells with different junction lengths include more of the recoil being dissipated by retraction of the longer ablated border.

      (8) Is a truncated Vcl expressed in the ESC model, which could bind catenin without an F-actin anchor? The very high-contrast western shown is cropped so it is not clear whether the catenin-binding N-terminus is present. Does the antibody used recognise the head domain (this reviewer could not readily find the information)?

    1. eLife Assessment

      This important study explores how the phase of neural oscillations in the alpha band affects visual perception, indicating that perceptual performance varies due to changes in sensory precision rather than decision bias. The evidence is convincing in its experimental design and analytical approach. This work should interest cognitive neuroscientists who study perception and decision-making.

    2. Reviewer #1 (Public review):

      [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      In their paper entitled "Alpha-Band Phase Modulates Perceptual Sensitivity by Changing Internal Noise and Sensory Tuning," Pilipenko et al. investigate how pre-stimulus alpha phase influences near-threshold visual perception. The authors aim to clarify whether alpha phase primarily shifts the criterion, multiplicatively amplifies signals, or changes the effective variance and tuning of sensory evidence. Six observers completed many thousands of trials in a double-pass Gabor-in-noise detection task while an EEG was recorded. The authors combine signal detection theory, phase-resolved analyses, and reverse correlation to test mechanistic predictions. The experimental design and analysis pipeline provide a clear conceptual scaffold, with SDT-based schematic models that make the empirical results accessible even for readers who are not specialists in classification-image methods.

      Strengths:

      The study presents a coherent and well-executed investigation with several notable strengths. First, the main behavioral and EEG results in Figure 2 demonstrate robust pre-stimulus coupling between alpha phase and d′ across a substantial portion of the pre-stimulus interval, with little evidence that the criterion is modulated to a comparable extent. The inverse phasic relationship between hit and false-alarm rates maps clearly onto the variance-reduction account, and the response-consistency analysis offers an intuitive behavioral complement: when two identical stimuli are both presented at the participant's optimal phase, responses are more consistent than when one or both occur at suboptimal phases. The frontal-occipital phase-difference result suggests a coordinated rather than purely local phase mechanism, supporting the central claim that alpha phase is linked to changes in sensitivity that behave like changes in internal variability rather than simple gain or criterion shifts. Supplementary analyses showing that alpha power has only a limited relationship with d′ and confidence reassure readers that the main effects are genuinely phase-linked rather than a recasting of amplitude differences.

      Second, the reverse-correlation results in Figure 3 extend this story in a satisfying way. The classification images and their Gaussian fits show that at the optimal phase, the weighting of stimulus energy is more sharply concentrated around target-relevant spatial frequencies and orientations, and the bootstrapped parameter distributions indicate that the suboptimal phase is best described by broader tuning and a modest change in gain rather than a pure criterion account. The authors' interpretation that optimal-phase perception reflects both reduced effective internal noise and sharpened sensory tuning is reasonable and well-supported. Overall, the data and figures largely achieve the stated aims, and the work is likely to have an impact both by clarifying the interpretation of alpha-phase effects and by illustrating a useful analytic framework that other groups can adopt.

    3. Reviewer #2 (Public review):

      Summary:

      The study of Pilipenko et al evaluated the role of alpha phase in a visual perception paradigm using the framework of signal detection theory and reverse correlation. Their findings suggest that phase-related modulations in perception are mediated by a reduction in internal noise and a moderate increase in tuning to relevant features of the stimuli in specific phases of the alpha cycle. Interestingly, the alpha phase did not affect the criterion. Criterion was related to modulations in alpha power, in agreement with previous research.

      Strengths:

      The experiment was carefully designed, and the analytical pipeline is original and suited to answer the research question. The authors frame the research question very well and propose several models that account for the possible mechanisms by which the alpha phase can modulate perception. This study can be very valuable for the ongoing discussion about the role of alpha activity in perception.

      Conclusion:

      This study addresses an important and timely question and proposes an original and well-thought-out analytical framework to investigate the role of alpha phase in visual perception. While the experimental design and theoretical motivation are strong, the very limited sample size substantially constrains the strength of the conclusions that can be drawn at the group level.

      Bibliography:

      Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365-376 (2013). https://doi.org/10.1038/nrn3475

      Tamar R Makin, Jean-Jacques Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript eLife 8:e48175 https://doi.org/10.7554/eLife.48175

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper entitled "Alpha-Band Phase Modulates Perceptual Sensitivity by Changing Internal Noise and Sensory Tuning," Pilipenko et al. investigate how pre-stimulus alpha phase influences near-threshold visual perception. The authors aim to clarify whether alpha phase primarily shifts the criterion, multiplicatively amplifies signals, or changes the effective variance and tuning of sensory evidence. Six observers completed many thousands of trials in a double-pass Gabor-in-noise detection task while an EEG was recorded. The authors combine signal detection theory, phase-resolved analyses, and reverse correlation to test mechanistic predictions. The experimental design and analysis pipeline provide a clear conceptual scaffold, with SDT-based schematic models that make the empirical results accessible even for readers who are not specialists in classification-image methods.

      Strengths:

      The study presents a coherent and well-executed investigation with several notable strengths. First, the main behavioral and EEG results in Figure 2 demonstrate robust pre-stimulus coupling between alpha phase and d′ across a substantial portion of the pre-stimulus interval, with little evidence that the criterion is modulated to a comparable extent. The inverse phasic relationship between hit and false-alarm rates maps clearly onto the variance-reduction account, and the response-consistency analysis offers an intuitive behavioral complement: when two identical stimuli are both presented at the participant's optimal phase, responses are more consistent than when one or both occur at suboptimal phases. The frontal-occipital phase-difference result suggests a coordinated rather than purely local phase mechanism, supporting the central claim that alpha phase is linked to changes in sensitivity that behave like changes in internal variability rather than simple gain or criterion shifts. Supplementary analyses showing that alpha power has only a limited relationship with d′ and confidence reassure readers that the main effects are genuinely phase-linked rather than a recasting of amplitude differences.

      Second, the reverse-correlation results in Figure 3 extend this story in a satisfying way. The classification images and their Gaussian fits show that at the optimal phase, the weighting of stimulus energy is more sharply concentrated around target-relevant spatial frequencies and orientations, and the bootstrapped parameter distributions indicate that the suboptimal phase is best described by broader tuning and a modest change in gain rather than a pure criterion account. The authors' interpretation that optimal-phase perception reflects both reduced effective internal noise and sharpened sensory tuning is reasonable and well-supported. Overall, the data and figures largely achieve the stated aims, and the work is likely to have an impact both by clarifying the interpretation of alpha-phase effects and by illustrating a useful analytic framework that other groups can adopt.

      Weaknesses:

      The weaknesses are limited and relate primarily to framing and presentation rather than to the substance of the work. First, because contrast was titrated to maintain moderate performance (d′ between 1.2 and 1.8), the phase-linked changes in sensitivity appear modest in absolute terms, which could benefit from explicit contextualization. Second, a coding error resulted in unequal numbers of double-pass stimulus pairs across participants, which affects the interpretability of the response-consistency results. Third, several methodological details could be stated more explicitly to enhance transparency, including stimulus timing specifications, electrode selection criteria, and the purpose of phase alignment in group averaging. Finally, some mechanistic interpretations in the Discussion could be phrased more conservatively to clearly distinguish between measurement and inference, particularly regarding the relationship between reduced internal noise and sharpened tuning, and the physiological implementation of the frontal-occipital phase relationship.

      We appreciate the reviewer’s thoughtful and constructive feedback, particularly regarding clarity and framing. In response, we have made several revisions to improve transparency and contextualization throughout the manuscript.

      First, we now explicitly contextualize the relatively modest change in sensitivity by adding discussion of the contrast-titration procedure and its implications for effect size interpretation. Second, we address the coding error that led to unequal numbers of double-pass stimulus pairs across participants earlier in the manuscript by reporting the average number of pairs per participant in the Results (as well as the Methods), allowing readers to interpret the results more appropriately. Third, we have provided additional detail, including precise stimulus timing parameters, electrode selection criteria, and a clearer explanation of the rationale for phase alignment in the Results (in addition to the Methods) section. Finally, we have revised portions of the Discussion to adopt more conservative language when interpreting our results, which more clearly distinguishes between empirical observations and mechanistic inferences, and we offer additional interpretations for the frontal-occipital phase relationship.

      We believe these revisions substantially improve the clarity, transparency, and interpretability of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study of Pilipenko et al evaluated the role of alpha phase in a visual perception paradigm using the framework of signal detection theory and reverse correlation. Their findings suggest that phase-related modulations in perception are mediated by a reduction in internal noise and a moderate increase in tuning to relevant features of the stimuli in specific phases of the alpha cycle. Interestingly, the alpha phase did not affect the criterion. Criterion was related to modulations in alpha power, in agreement with previous research.

      Strengths:

      The experiment was carefully designed, and the analytical pipeline is original and suited to answer the research question. The authors frame the research question very well and propose several models that account for the possible mechanisms by which the alpha phase can modulate perception. This study can be very valuable for the ongoing discussion about the role of alpha activity in perception.

      Weaknesses:

      The sample size collected (N = 6) is, in my opinion, too small for the statistical approach adopted (group level). It is well known that small sample sizes result in an increased likelihood of false positives; even in the case of true positives, effect sizes are inflated (Button et al., 2013; Makin and Orban de Xivry, 2019), negatively affecting the replicability of the effect.

      Although the experimental design allows for an accurate characterization of the effects at the single-subject level, conclusions are drawn from group-level aggregated measures. With only six subjects, the estimation of between-subject variability is not reliable. The authors need to acknowledge that the sample size is too small; therefore, results should be interpreted with caution.

      Conclusion:

      This study addresses an important and timely question and proposes an original and well-thought-out analytical framework to investigate the role of alpha phase in visual perception. While the experimental design and theoretical motivation are strong, the very limited sample size substantially constrains the strength of the conclusions that can be drawn at the group level.

      Bibliography:

      Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365-376 (2013). https://doi.org/10.1038/nrn3475

      Tamar R Makin, Jean-Jacques Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript eLife 8:e48175 https://doi.org/10.7554/eLife.48175

      We thank the reviewer for their supportive remarks on our design and analysis, and for raising this important statistical concern about our sample size (n=6). Our choice of a small sample size was driven by methodological considerations. Specifically, our reverse correlation analysis requires a large number of trials per participant, as it estimates perceptual tuning by regressing behavioral responses against fluctuations in the energy of stimulus features (orientation and spatial frequency). This approach, as well as the computation of signal detection theory (SDT) metrics such as d′ and criterion, depends on high trial counts to obtain reliable estimates, particularly given that our analysis further subdivides trials across eight phase bins. For this reason, we prioritized collecting a large number of trials per participant (∼5,000), which is consistent with established practices in psychophysical research.
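
      The SDT metrics mentioned above can be illustrated with a short sketch. It assumes a yes/no detection task and the standard equal-variance formulas d′ = z(H) − z(FA) and c = −(z(H) + z(FA))/2. The function name, the log-linear correction, and the trial counts are illustrative assumptions and are not taken from the manuscript.

```python
from statistics import NormalDist


def sdt_metrics(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one phase bin of a yes/no task.

    Assumption: a log-linear correction (add 0.5 to each count) keeps the
    rates away from 0 and 1 so that the z-transform stays finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for a single phase bin (not data from the study).
d_prime, criterion = sdt_metrics(hits=210, misses=90,
                                 false_alarms=60, correct_rejections=240)
print(f"d' = {d_prime:.2f}, criterion = {criterion:.2f}")
```

      Because each of the eight phase bins receives only a fraction of the trials, the hit and false-alarm rates in every bin rest on fewer observations, which is why high per-participant trial counts matter for stable estimates.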

      Importantly, our approach means that our design is reliable at the individual level, which motivated us to include a new binomial probability test in our revised paper. This binomial test helps address concerns about the generalizability of our results. Binomial testing considers each participant as an independent replication of the effect and then computes the p-value associated with the probability of having observed the given number of statistically significant participants by chance, with a false positive rate of 0.05. In our data, 3 out of 6 participants showed significant effects, which corresponds to a probability of 0.002 of having observed these effects by chance alone. We believe this converging evidence supports the replicability and generalizability of our results. To improve the transparency of the single-subject data, we have included single-participant results in the Supplemental Materials to allow readers to directly assess the consistency of effects across individuals and to better contextualize between-subject variability.
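
      For readers who wish to check the arithmetic, the binomial calculation can be sketched as follows. This is a minimal illustration, assuming that the reported value is the upper-tail probability of observing at least 3 significant participants out of 6 when each has an independent false positive rate of 0.05. The function name `binomial_tail` is hypothetical and does not come from the manuscript.

```python
from math import comb


def binomial_tail(k, n, alpha):
    """Probability of at least k significant results among n independent
    tests when each test has false positive rate alpha."""
    return sum(comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
               for i in range(k, n + 1))


p_value = binomial_tail(3, 6, 0.05)
print(round(p_value, 4))  # 0.0022
```

      Under these assumptions the tail probability is approximately 0.0022, in line with the value of 0.002 quoted above.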

      Thank you again for your suggestions. We believe that these additions have greatly improved our manuscript by demonstrating the robustness of our findings and increasing the transparency of our single-subject results.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The issue of generalizability arose during the review process, as your results are based on a small sample of participants who undertook a very large number of trials. In the revised version, it would be useful to discuss why this approach is valid, especially in the context of linking EEG with modeling (i.e., why it is more powerful than having many participants with fewer trials), and the extent to which your results can generalize to the population.

      We sincerely appreciate all of the helpful comments provided by the reviewers and hope we can address the concerns about our experimental approach. In the introduction, we have emphasized the importance of our current small sample size design, which allows us to reliably compute our signal detection theory metrics across 8 phase bins in addition to including the reverse correlation analysis. In the methods section, we have added a description of the binomial probability statistical framework, which addresses the generalizability of our results. In this framework, each participant is viewed as an independent replication and the p-value reflects the probability of having observed the number of individually significant subjects from the total sample size by chance. In this regard, observing a significant effect in 3 out of 6 participants (as in our study) from chance alone has a 0.002 probability, which we believe is unlikely and instead reflects a true effect present in the general population.

      Below we have copied our changes to the introduction and methods sections.

      “... in a large number of trials (6,020 per observer, n = 6) across multiple EEG sessions. This approach ensures a sufficient number of trials in order to reliably compute signal detection theory (SDT) metrics across multiple alpha phase bins while also affording enough statistical power for reverse correlation analysis (Xue et al., 2024), making it preferred over having a larger sample size with fewer trials.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024).”

      Reviewer #1 (Recommendations for the authors):

      My suggestions are intended to be light-touch and focused on strengthening the clarity and durability of the Reviewed Preprint rather than on additional experimentation or major new analyses.

      (1) Limitation statement for the double-pass coding error:

      Add a short statement in the Methods or Results acknowledging that the coding error led to markedly fewer repeated stimulus pairs for the first three participants than for the last three. For the response-consistency result in Figure 2E, a simple acknowledgement that the available evidence is stronger for some participants than others will help readers calibrate their confidence without detracting from the main story.

      Thank you for this suggestion, we have now added a statement to this effect in the Results section, in addition to the description already mentioned in the Methods section.

      “To examine this, we implemented a double-pass stimulus presentation (~600 stimulus pairs for participants 1-3 and ~2,500 pairs for participants 4-6) and analyzed participants’ response consistency (Xue et al., 2024) to two identical stimuli.”

      (2) Contextualizing the titrated performance level:

      In the Discussion, explicitly note that contrast was titrated to keep d′ between approximately 1.2 and 1.8, which intentionally maintains moderate performance. This contextualization will help readers understand that while the phase-linked changes appear modest in absolute terms, they are mechanistically informative within this design.

      Thank you. We have added a sentence to the Discussion that speaks to this point.

      “We also note that the observed modulation of d’ between optimal and suboptimal phases was relatively modest in absolute terms (0.21) in our study and could therefore require many trials per subject to detect. Two reasons for this modest effect size could be related to specific features of our task design. First, we titrated stimulus contrast to maintain consistent task performance. This titration could have reduced the magnitude of the phase effect on d’ that would otherwise be apparent if the stimulus intensity were kept constant. Additionally, the use of (relatively) high-contrast random noise likely means that trial-to-trial variability in perception is largely driven by random fluctuations in the noise properties and, to a lesser extent, internal brain state. Although both of these choices were necessary to perform SDT and reverse correlation analysis, they differ from many previous studies investigating alpha phase using only near-threshold detection in the absence of external noise and may contribute to an underestimation of the true effect size.”

      (3) Methods clarifications:

      (a) Replace placeholder text such as "{plus minus}" and "{degree sign}" with the appropriate symbols, and ensure that any equations implied in the reverse-correlation section are fully present.

      Thank you for bringing this to our attention. These placeholder texts are an artifact of the conversion process, and we will correct them.

      (b) State explicitly that the 8 ms stimulus duration corresponds to a single frame on your 120 Hz display, which will clarify the timing in Figure 1A and the pre-stimulus windows in the phase analyses.

      Thank you, we have added language to both the Method and Results sections explicitly indicating that the 8 ms stimulus choice corresponds to a single screen refresh. Additionally, we changed the text in Figure 1A to include inter-trial interval timing (as opposed to merely saying “Start Trial”):

      “(A) Task design. Each trial contained a brief, filtered-noise stimulus (8 ms; one screen refresh) presented to the right or left of fixation with equal probability.”

      “Each participant (n = 6) completed 5-6 EEG sessions of a Yes/No detection paradigm whereby participants reported the presence or absence of a brief (8 ms; one screen refresh) vertical Gabor target (2 cycles per degree) with concurrent confidence judgments (see Figure 1A), along with an additional imagination judgement (reported in the supplemental materials).”

      (c) In the description of the post-stimulus taper, consider phrasing the rationale in terms of minimizing contamination from evoked responses rather than asserting that the taper ends before the earliest evoked response, which keeps the argument correct without committing to a precise latency boundary.

      Thank you for this suggestion. We have changed our rationale for the taper to “minimizing”, rather than avoiding, the evoked response.

      “This resulted in the post-stimulus data being flat after 70 ms, which is intended to minimize the evoked response in our data.”

      (4) Analysis transparency:

      (a) In the description of posterior electrode selection, explicitly note that channels were chosen solely on the basis of alpha power, independent of behavioural performance, and that the same electrodes were used for each participant across sessions.

      We have gladly made this clarification to the methods.

      “This was individually determined by rank-ordering 17 of the posterior channels (Pz, P3, P7, O1, Oz, O2, P4, P8, P1, P5, PO7, PO3, POz, PO4, PO8, P6, and P2) and algorithmically choosing the three with the highest power. This ensured that electrode selection was made independent of performance and instead was based upon maximizing alpha signal strength.”

      (b) Describe the phase-alignment step used to center each participant's optimal bin before group averaging as a device for visualization and summary, and clarify that inferential statistics are based on the underlying, non-aligned data as appropriate. This will reassure readers who are cautious about circularity.

      We agree that this should be made more explicit throughout the manuscript and have added statements clarifying this aspect in the Figure 2B caption, the Results, and Method sections.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and is averaged across -450 ms to stimulus onset. This graph is for visualization purposes only. Error bars represent ± 1 SEM. The pattern shows a clear phasic modulation of d’ across bins.”

      “... requiring us to phase-align the performance data across participants in order to visualize the underlying phasic effects. To this end, we aligned all metrics (d’, c, HR, and FAR) by circularly shifting the data so that the bin with the highest d’ was assigned to bin 8, which was then omitted from further visualizations.”

      “Bin 8 was then omitted from further visualizations. The shifted data were then averaged across all time points from -450 ms to 0 ms, based on significant effects at the group level, and averaged across participants. No statistics were conducted on these shifted variables and instead are for visualization purposes only.”
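
      The alignment step described in these passages can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name, the zero-based index 7 standing in for "bin 8", and the example values are all hypothetical.

```python
import numpy as np


def align_to_best_bin(dprime_by_bin, target_index=7):
    """Circularly shift per-bin values so the maximum lands at target_index.

    target_index=7 is the zero-based position of "bin 8" for 8 phase bins.
    """
    values = np.asarray(dprime_by_bin, dtype=float)
    best = int(np.argmax(values))
    # np.roll moves the element at index i to (i + shift) mod n.
    return np.roll(values, target_index - best)


# Hypothetical d' values across 8 phase bins for one participant.
example = [1.2, 1.6, 1.9, 1.5, 1.3, 1.1, 1.0, 1.1]
aligned = align_to_best_bin(example)
for_plot = aligned[:-1]  # drop the alignment bin before visualization
print(aligned)
```

      Dropping the alignment bin matters because that bin is the maximum by construction, so including it would build the apparent peak into the plot.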

      (c) Add a short note on the number of permutations and the cluster-forming threshold in the phase-coupling analyses, if not already stated in the Results or captions, to complete the description of your non-parametric testing procedure.

      Thank you, we agree that reiterating this information in the Results section is helpful for the reader to clarify the analysis procedure.

      “After smoothing the resultant vector length over time with a 50 ms moving average, we compared the observed vector lengths to a permuted threshold (95th percentile of 1,000 permutations) at each time point from –700 to 0 ms and performed cluster correction (95th percentile of the permuted cluster size) to account for multiple comparisons.”
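
      A simplified sketch of the thresholding idea is given below. It assumes that the resultant vector length is the magnitude of the mean of the per-bin values weighted by their phase angles, and that the null distribution is built by shuffling values across bins. Both are assumptions made for illustration; the response does not specify the exact statistic or shuffling scheme. Temporal smoothing and cluster correction are omitted.

```python
import numpy as np


def resultant_length(values, angles):
    """Magnitude of the mean phase-weighted vector (assumed definition)."""
    return float(np.abs(np.mean(values * np.exp(1j * angles))))


def permutation_threshold(values, angles, n_perm=1000, percentile=95, seed=0):
    """Percentile of the statistic after shuffling values across phase bins."""
    rng = np.random.default_rng(seed)
    null = [resultant_length(rng.permutation(values), angles) for _ in range(n_perm)]
    return float(np.percentile(null, percentile))


# Hypothetical, perfectly sinusoidal d' profile across 8 phase bins.
angles = np.arange(8) * 2 * np.pi / 8
dprime = 1.5 + 0.3 * np.cos(angles)

observed = resultant_length(dprime, angles)
threshold = permutation_threshold(dprime, angles)
print(observed > threshold)
```

      In the analysis described by the authors, this comparison is repeated at each time point from -700 to 0 ms, and runs of supra-threshold time points are then compared against the permuted cluster sizes.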

      (5) Discussion framing:

      Make one or two small adjustments to your mechanistic phrasing so that the distinctions between measurement and interpretation are fully explicit:

      (a) State that the combination of phase-d′ coupling, counterphased hit and false-alarm rates, response consistency, and phase-dependent classification images is "consistent with" a reduction in effective internal noise and sharper estimated tuning at optimal alpha phase, within the assumptions of your SDT and reverse-correlation framework.

      Thank you for this suggestion. We have changed the language in the discussions to reflect this framing and interpretation of the results.

      “Moreover, our data are consistent with a model in which the variability of internal responses changes systematically across the alpha cycle, as reflected in the inverse relationship between hit rate and false alarm rate.”

      (b) Emphasize that reduced effective internal noise and sharpened sensory tuning are two complementary descriptions of a better match between sensory evidence and decision template rather than fully separable mechanisms.

      Thank you, we have added this language for clarity of our interpretation.

      “Together with decreases in the variance of sensory tuning during the optimal phase, our results suggest that alpha phase impacts sensitivity by shaping trial-to-trial variation in internal noise during perceptual decision making, leading to better matches between sensory evidence and decision templates as opposed to a change in the gain of internal sensory responses.”

      (c) Note that the frontal-occipital phase relationship is consistent with a coordinated, possibly top-down component to the alpha-phase effect, while remaining agnostic about the precise physiological implementation.

      Thank you for raising this additional interpretation. We have added this as a plausible alternative to the single-source account in the Discussion section.

      “Moreover, our results suggest that prior literature reporting phasic effects in the alpha-band range from both frontal and occipital regions may plausibly be reporting the same effect from a single projected dipole source; however, these results are also consistent with two synchronized alpha sources which are anti-phase.”

      Reviewer #2 (Recommendations for the authors):

      Major issues:

      Given that collecting more data may not be doable, the authors should take some actions to test the reliability of their results. For instance, simulations could be run to test the robustness of the results with such a small sample size (Zoefel, 2019). It would also be of interest to include in the report statistics and plots at the individual level, not only the aggregates. It is also important to report which electrodes were used in the analysis for each of the subjects, in the Methods section, it is clearly stated that these electrodes differed between subjects.

      Thank you for these suggestions. To assess the reliability of our results at the single-subject level, we have included a new binomial probability test, a framework suited to small-sample experiments with large trial numbers (Schwarzkopf & Huang, 2024). Binomial testing views each individual as an independent replication: given the total number of participants tested, it returns the probability of having observed the number of individually significant participants by chance. We believe this framework adequately addresses the reviewer’s concern about generalizability in addition to being well suited to the design of our study.

      To assess individual significance, we averaged the resultant vector length and permutations over the analysis window from -450 to 0 ms. If the resultant vector length exceeded the 95th percentile of the permutations for that participant, that participant was considered significant. In total, 3 out of 6 participants (participants 1, 4, and 5) showed significant d’ coupling. The binomial probability (equivalent to a p-value) of having observed this outcome as a result of three false positives at the individual-subject level is very small (p = 0.002), which is sufficiently low for psychological studies.

      Below is the text which we have added to the Results and Methods sections.

      “To interrogate the robustness of our findings at the single-subject level, we adopted a test of binomial probability, which is a statistical framework that treats each individual as an independent replication and is ideal for small sample size studies that utilize a large number of trials per observer (Schwarzkopf & Huang, 2024). For our data, we assessed individual significance by averaging the actual and permuted resultant vector lengths across time (-450 to 0 ms) and comparing the real vector length to the 95th percentile of the permuted datasets. With this approach, 3 out of 6 participants showed significant d’-phase coupling, which corresponds to a binomial probability of p = 0.002, indicating a very low probability that we observed these results by chance alone.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024). To assess significance at the participant level, we averaged the participant’s resultant vector length and permutations from -450 to 0 ms and obtained the 95th percentile of the time-averaged permutations. We then compared the averaged resultant vector lengths to the permutation thresholds for each subject, which revealed 3 out of 6 significant subjects. We then used the MATLAB function myBinomTest.m (Nelson, 2026) to compute the p-value associated with the probability of having observed 3 out of 6 significant subjects by chance (with a false-positive rate of 0.05).”
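
      The reported value can be checked with a short calculation. The sketch below assumes a one-sided upper-tail binomial probability (at least k significant participants out of n, each with a false-positive rate of 0.05). It is not the MATLAB function named above, whose exact tail convention is not stated in this response.

```python
from math import comb


def binomial_tail(k: int, n: int, alpha: float = 0.05) -> float:
    """P(at least k of n participants are significant by chance alone)."""
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )


p_value = binomial_tail(3, 6)
print(f"{p_value:.4f}")
```

      Under these assumptions the probability is roughly 0.0022, in line with the reported p = 0.002.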

      To address the reviewer's second request, we now include a supplemental figure which has each individual’s results for the main analysis (see Supplementary figure 3). These graphs, in addition to the methods, now provide the reader with each participant’s set of analysis electrodes.

      “Each participant had a different combination of electrodes which were used in the analyses; however, the same three channels were used across sessions within a participant (participant 1: POz, PO3, O1; participant 2: P7, PO7, PO4; participant 3: P2, P1, Pz; participant 4: O1, Oz, O2; participant 5: O2, PO8, PO4; participant 6: Oz, O2, O1).”

      As an alternative approach, linear mixed models (LMM) could be used for statistics, as they are more suitable for small sample sizes (Wiley et al., 2019). LMM improve generalization by modelling subject-specific random effects. Although raw circular data is not suitable for LMM, the sine and cosine of the phases could have been used as predictors, for instance. Given that data were collected for 6 different sessions, sessions could be included as a factor in the model to improve statistical power.

      We appreciate the suggestion but feel that LMMs would be a challenge in this case not only because the main predictor variables are circular, but because the main outcome variables are not defined on the single-trial level and require many trials to be computed (e.g., classification images, SDT measures, response consistency). As such, computing these measures within a session may also lead to noisier estimates than we had designed our experiment for. We therefore prefer the more straightforward approach we have taken in the paper, which has now been supplemented by a binomial test of individual-subject level significance.

      Given that the number of subjects is quite small, I believe that individual data should be presented (either in the main text or supplementary materials) also for figures: 2A, B, C and D.

      Thank you, we have included all of these results in the individual graphs in the Supplemental Materials (see Supplementary figure 3).

      In plot 2B (HR and FAR) a p-value = 0.015 appears. However, in the text you write:

      "Indeed, this showed that the difference between the HR and FAR vector angle was significantly clustered around a mean of 180{degree sign} (v = 3.78, p = 0.01), indicating that the phase angle associated with the greatest hits was counterphase to the phase angle associated with the greatest false alarms."

      Which one is correct? Or do they refer to different tests?

      We appreciate you catching this confusing discrepancy. The two values refer to the same test which has a p-value of 0.0145. In the figure, this value was rounded to the thousandths decimal place (i.e., 0.015), whereas in the text it was rounded to the hundredths value (0.01). We now consistently report p-values out to three decimal places throughout the manuscript.

      Did you perform any statistical test for phasic modulation of dprime and criterion? I say that because in Figure 2B, you state that the data shows a "clear phasic modulation of d' across bins", but no statistic is mentioned. On the other hand, in Figure 2D, you state, "We did not observe any significant phase-dependent relationship between phase and criterion." Is this sentence referring to both 2C and 2D panels or only to 2C?

      Figure 2B and 2D show the phase-behavior relationship across bins after aligning the phase bins to each participant's “best” d’ bin. This bin is omitted from the plots because it is used for alignment, making the analysis circular. Accordingly, these panels were intended purely for visualization and were not used for statistical inference. Additional language has been added to the figure caption highlighting this aspect.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and is averaged across -450 ms to stimulus onset. This graph is for visualization purposes only.”

      The primary statistical test for phase-behavior coupling was performed using permutation testing of the resultant vector length, which quantifies the magnitude of phase-dependent modulation. These results are shown in Figures 2A (for d′) and 2C (for criterion). In the original manuscript, we reported only the time points that survived cluster-based correction, but did not explicitly report the cluster p-values. We have now added these cluster p-values to the manuscript for completeness.

      “The data revealed significant cluster-corrected coupling between alpha phase and d’ in the prestimulus window from -220 ms until stimulus onset (cluster p = 0.046),...”

      Additionally, we have changed the caption of Figure 2 to be separate for C) and D).

      “(C) No evidence for the coupling of criterion to pre-stimulus alpha-band phase. Graph C reveals the time course of the resultant vector lengths for alpha phase-criterion coupling, which shows no significant phase-dependent relationship between phase and criterion.

      (D) The underlying shifted c across phase bins (shifted to participants’ optimal phase, as in graph B) did not visually demonstrate a phasic modulation pattern.”

      Minor issues:

      In general, the paper is very clear. I found a statement confusing in the Response consistency section:

      "To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. This procedure was done for each channel at each time point (from -450 to 0 ms) and then averaged."

      Which makes no sense, as response consistency is independent of channel and time point. I believe here you refer to the phase, maybe by just changing the order (start with response consistency and then proceed to phase), the paragraph would be clearer.

      We appreciate you catching this mistake. We have clarified the Methods section in the following way:

      “To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. Since the optimal phase changes over time, the set of trials were classified as either both having occurred during the optimal phase (or otherwise) for each time point (from -450 to 0 ms) and channel. The proportion of consistent responses was then averaged across channels and time.”
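
      The core quantity in this passage, the proportion of double-pass pairs that received the same response, can be sketched as follows. The function name and the example responses are hypothetical; the classification of pairs by optimal phase and the averaging over channels and time points described above are not shown.

```python
def response_consistency(first_pass, second_pass):
    """Proportion of identical-stimulus pairs that received the same response."""
    if len(first_pass) != len(second_pass) or len(first_pass) == 0:
        raise ValueError("Both passes must be non-empty and of equal length.")
    same = sum(a == b for a, b in zip(first_pass, second_pass))
    return same / len(first_pass)


# Hypothetical yes (1) / no (0) responses to four repeated stimulus pairs.
consistency = response_consistency([1, 0, 1, 1], [1, 1, 1, 0])
print(consistency)
```

      In the procedure the authors describe, this proportion would be computed separately for pairs in which both presentations fell in the optimal phase and for all other pairs, at each channel and time point, before averaging.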

      Could you include a plot of the power spectrum used for IAF estimation of all the subjects?

      Thank you for the suggestion. In Supplemental Figure 3 we have included the power spectrum that was used to estimate IAF in addition to a topoplot of alpha power (IAF +/- 2 Hz) that has the analysis electrodes labelled.

      Bibliography:

      Wiley RW, Rapp B. Statistical analysis in Small-N Designs: using linear mixed-effects modeling for evaluating intervention effectiveness. Aphasiology. 2019;33(1):1-30. doi: 10.1080/02687038.2018.1454884.

      Zoefel B, Davis MH, Valente G, Riecke L, How to test for phasic modulation of neural and behavioural responses, NeuroImage, Volume 202, 2019,116175, https://doi.org/10.1016/j.neuroimage.2019.116175.

    1. eLife Assessment

      This is an important study that addresses the role of fever as a conserved response to viral infection. It demonstrates that the heat-shock factor, HSF1, is activated by increased temperature during fever to enhance the anti-viral immune response. The data provides compelling evidence for the conclusions and the work will be of interest to virologists, immunologists, and cell biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods," Xiao and colleagues examine the role of the shrimp Litopenaeus vannamei HSF1 ortholog (LvHSF1) in the response to viral infection. The authors provide compelling support for their conclusions that the activation of LvHSF1 limits viral load at high temperatures. Specifically, the authors convincingly show that (i) LvHSF1 mRNA and protein are induced in response to viral infection at high temperatures, (ii) increased LvHSF1 levels can directly induce the expression of the nSWD (and directly or indirectly other antibacterial peptides, AMPs), (iii) nSWD's antimicrobial activities can limit viral load, and (iv) LvHSF1 protects survival at high temperatures following virus infection. These data thus provide a model by which an increase in HSF1 levels limits viral load through the transcription of antimicrobial peptides, and provide a rationale for the febrile response as a conserved response to viral infection.

      Strengths:

      The large body of careful time series experiments, tissue profiling, and validation of RNA-seq data is convincing. Several experimental methodologies are used to support the authors' conclusions that nSWD is an LvHSF1 target and increased LvHSF1 alone can explain increased levels of nSWD. Similar carefully conducted experiments also conclusively implicate nSWD protein in limiting WSSV viral loads.

      Weaknesses:

      As with any complex biological phenomenon, several aspects remain incompletely explained. Nevertheless, in their revision, the authors provide additional analyses supporting their model that losing LvHSF1 is not detrimental to survival, by more directly altering viral loads. In addition, their revised manuscript clarifies the complex interactions between infection, the role of HSF1, and hormesis. These revisions increase the impact of their findings.

      Comments on revisions:

      The authors have addressed all comments, and the manuscript is very much improved.

    3. Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      Comments on revisions:

      Some aspects of the initial study design, regarding the selection of representative candidate genes and the logical flow, raised concerns. However, these issues have been addressed in the revised manuscript through additional validations and clarifications. Most of my comments and concerns were sufficiently addressed in the revised manuscript. The results support the authors' conclusion that HSF1-dependent regulation of AMP expression contributes to antiviral defense under febrile conditions.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Despite this compelling data regarding the protective role of HSF1 in the febrile response, what remains unexplained and complicates the authors' model is the observation that losing LvHSF1 at 'normal' temperatures of 25 ℃ is not detrimental to survival, even though viral loads increase and nSWD is likely still subject to LvHSF1 regulation. These observations suggest that WSSV infection may have other detrimental effects on the cell not reflected by viral load and that LvHSF1 may play additional roles in protecting the organism from these effects of WSSV infection, such as perhaps, perturbations to protein homeostasis. This is worth discussing, especially in light of the rather complicated roles of hormesis in protection from infection, the role of HSF1 in hormesis responses, and the findings from other groups that the authors discuss.

      We are grateful for the reviewer's unbiased advice. We have added a description of the role of HSF1 in hormesis responses to the Discussion (Lines 422-425 of the revised manuscript). Thank you.

      Reviewer #2 (Public review):

      Temperature is a critical factor affecting the progression of viral diseases in vertebrates and invertebrates. In the current study, the authors investigate mechanisms by which high temperatures promote anti-viral resistance in shrimp. They show that high temperatures induce HSF1 expression, which in turn upregulates AMPs. The AMPs target viral envelope proteins and inhibit viral infection/replication. The authors confirm this process in Drosophila and suggest that there may be a conserved mechanism of high-temperature mediated anti-viral response in arthropods. These findings will enhance our understanding of how high temperature improves resistance to viral infection in animals.

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be clarified and extended. Further investigation on how WSSV infection is affected by AMP would have strengthened the study.

      We are grateful for the reviewer's unbiased advice. We have provided additional experimental evidence and supplementary explanations in the revised manuscript. Thank you.

      Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      We are grateful for the reviewer's positive comments and unbiased advice. We have improved the logical flow of the paper and added corresponding explanations in the revised manuscript. Thank you.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: The analysis compares Group TW to Group W (not the other way around).

      Thank you very much. To uncover the molecular mechanisms by which high temperature restricts WSSV infection, two shrimp groups, Group TW and Group W, were cultured at 25 °C. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing (Figure 1A). RNA-seq was used to identify genes responsive to high temperature, particularly those encoding potential transcriptional regulators. Thank you.

      (2) The RNA-seq data in Figure 1 focus only on the TFs. The manuscript would benefit from showing all the RNA-seq data and the differentially expressed genes. In particular, are the AMPs upregulated at the same time point? This should not be the case if LvHSF1 were responsible for the transcription of the AMPs, given the time lag between transcription and translation.

      Thank you for your suggestion. As shown in Author response image 1, the RNA-seq comparison of Group TW with Group W in our previous study showed that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024).

      We also analyzed AMP expression between Group TW and Group W. The results show that some antimicrobial peptides, such as Lysozyme and C-type lectin, are upregulated, whereas we did not detect upregulated expression of SWD. We agree with the reviewer that there is a time lag between transcription and translation. Supplementary experimental evidence shows that the expression level of LvHSF1 is strongly induced by WSSV stimulation and that the expression level of SWD begins to increase afterwards. We have added a description in Lines 136-138 of the revised manuscript.

      Author response image 1.

      The Figure of the heat shock proteins in Group TW and Group W

      Author response image 2.

      Transcriptional expression levels of HSF1 and SWD after WSSV stimulation

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (3) The data showing the tissue distribution of LvHSF1 and nSWD is a rigorous approach and adds to the manuscript. A similar approach to understanding the time course of expression of AMPs in relationship to LvHSF1 expression levels would strengthen the authors' conclusions that LvHSF1 induction in response to high temperatures and viral infection, in turn, upregulates SWD and other antibacterial genes.

      Thank you for your suggestion. Following it, we measured the transcriptional expression levels of HSF1 and SWD at 0, 2, 4, 6, 8, 12, 16, 20, and 24 hours after WSSV stimulation. The transcriptional expression level of SWD at 0 h was set to 1.00. In the early stage of WSSV infection (0-12 h, except 6 h), the expression of LvHSF1 is strongly induced, and the expression of SWD then begins to increase. These results show that LvHSF1 induction in response to viral infection, in turn, upregulates SWD and other antibacterial genes. Thank you.
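
      The baseline normalization described above (the 0 h value set to 1.00) can be illustrated with a minimal sketch. The function name and the input numbers below are hypothetical and are not data from the study; only the sampling time points are taken from the response. The sketch does not assume any particular qPCR quantification method.

```python
def normalize_to_baseline(levels, baseline_time=0):
    """Express each time point as a fold change relative to the baseline time point."""
    baseline = levels[baseline_time]
    if baseline <= 0:
        raise ValueError("baseline expression must be positive")
    return {hour: value / baseline for hour, value in levels.items()}


# Hypothetical relative quantities in arbitrary units (NOT measurements from the study).
# Keys are hours post WSSV stimulation, matching the sampling times in the response.
swd_raw = {0: 2.0, 2: 2.2, 4: 2.6, 6: 2.4, 8: 3.0, 12: 4.0, 16: 6.0, 20: 7.0, 24: 8.0}

swd_fold = normalize_to_baseline(swd_raw)
for hour in sorted(swd_fold):
    print(f"{hour:>2} h: {swd_fold[hour]:.2f}")
```

      With this convention the 0 h value is 1.00 by construction, so every later value reads directly as a fold change over the uninduced state.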

      (4) The data (Figures 3 and 4) show that LvHSF1 is necessary to survive WSSV infection at high temperatures but does not affect survival at lower temperatures, even though LvHSF1 limits VP28 levels, and viral load at both temperatures is confusing. Does this suggest that LvHSF1 is not primarily important for protection against the virus but instead, for protection from the heat-induced damage caused by high temperatures, which would not be surprising? The manuscript would benefit if the authors could address this point. How do the authors envision the protection conferred by LvHSF1 only at high temperatures?

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection.

      Notably, the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. When infected with WSSV, shrimp use behavioral fever to elevate their body temperature (~32 °C), thereby inhibiting WSSV infection (Rakhshaninejad et al., 2023; Xiao et al., 2024). This temperature (~32 °C) does not cause heat-induced damage to the shrimp. Our results demonstrate that febrile temperatures induce HSF1, which in turn upregulates antimicrobial peptides (AMPs) that target viral envelope proteins and inhibit viral replication.

      Only at high temperatures, we observed that knockdown of HSF1 did not affect shrimp survival rate (Figure 4A). Thank you again for your valuable feedback.

      Reference:

      Rakhshaninejad, M., Zheng, L., Nauwynck, H., 2023. Shrimp (Penaeus vannamei) survive white spot syndrome virus infection by behavioral fever. Sci Rep 13, 18034.

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (5) Related to the previous comment, the authors do not clearly distinguish between basal effects of LvHSF1 or nSWD induction and heat-induced effects and the differences related to the requirement of LvHSF1 for protection. Simply increasing LvHSF1 levels can result in increased nSWD. SWD levels increase upon WSSV infection even at 25 ℃, and the knockdown experiments suggest that this could also occur through LvHSF1. It would be useful to explicitly differentiate between basal functions of HSF1 and induced functions.

      Thank you for your suggestion. In previous responses, we have distinguished between basal effects of LvHSF1 or nSWD induction and heat-induced effects.

      Following your suggestion, we injected GST or rHSF1 protein into shrimp. The results showed that recombinant HSF1 protein significantly induced the expression of SWD (Supplementary Fig. 5C). Further, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figure 5C&5D and added a description in Lines 253-255 and Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Two temperatures are used in the experiments of shrimp. It seems that HSF1 is also upregulated by WSSV infection at 25 ℃. However, this upregulation seems not to be able to protect the animals. The authors compare the infection at 25 and 32 ℃ but did not discuss the findings.

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection. We have added a discussion of this finding in Lines 461-464 in the revised manuscript. Thank you.

      (2) In the abstract the authors say that "These insights provide new avenues for managing viral infections in aquaculture and other settings by leveraging environmental temperature control." However, this point has not been discussed in the main text.

      We appreciate your comments. We have added a discussion of environmental temperature control in Lines 512-514 of the revised manuscript. Thank you.

      (3) Line 142: "These results suggest that LvHSF1 may play a key role in enhancing shrimp resistance to WSSV at elevated temperatures." Although this type of conclusion has been made in many studies, I think it is impossible to see a "KEY role" based mainly on change in expression.

      Thank you for your suggestion. We have revised this conclusion in the revised manuscript. Thank you.

      (4) Section 2.1 Induction of Heat Shock Factor 1 in Response to WSSV at High Temperature

      Figure 1. Identification of HSF1 as a key factor induced by high temperature.

      The two titles are confusing. Whether the upregulation of HSF1 is a response to high temperature or WSSV infection? I think it is more likely a response to high temperature. Did the authors see the difference in HSF1 expression in shrimp with and without WSSV infection at high temperatures?

      Thank you for your comment. We have modified the title of Section 2.1 in the revised manuscript. Following your suggestion, we measured the expression of LvHSF1 after WSSV challenge at high temperature (32 ℃) (revised Figure 2F-2H; Line 122 of the revised manuscript). The results demonstrate that the expression of LvHSF1 is strongly induced by WSSV stimulation at high temperature (32 ℃). Thank you.

      (5) Figure 2. Upregulation of LvHSF1 in shrimp challenged by WSSV at both low and high temperatures. Results for WSSV challenge at high temperatures are not included in this figure.

      Thank you for your suggestion. Following it, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃) (revised Figure 2C-2H). The results demonstrate that the expression of LvHSF1 is strongly induced by Poly (I: C) and WSSV stimulation at high temperature (32 ℃). We have added a description in Lines 168-179 of the revised manuscript. Thank you.

      (6) Section 2.2 Expression Profiles of LvHSF1 in Shrimp Under Varied Temperature Conditions and WSSV Challenge. Did the authors try poly IC and WSSV challenge at 32℃, and compare with the unchallenged group? Why was only low temperature analyzed?

      Thank you for your suggestion. Following it, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃) (revised Figure 2C-2H). We have added a description of these results in Lines 168-179 of the revised manuscript. Thank you.

      (7) Figure 2: Please indicate the temperature used in C-E and F-H in the figure legend. Statistical significance: compared with which group? Please provide information in the legend or show it in the bar chart.

      Thank you for your suggestion. We have added the description of temperature used in revised Figures 2C-2E. The expression changes of HSF1 were compared with those of PBS control group at the corresponding time and we modified the comparison method of significance in revised Figures 2C-2E. Thank you.

      (8) Figure 3H: There are two groups (dsGFP+PBS; dsHSF1+PBS) showing with the same symbol (dot line).

      Thank you for your comment. The revised Figure 3H has used different symbols to distinguish the two groups. Thank you.

      (9) Line 205: qPCR

      Thank you for your careful checks. We have corrected this error in the revised manuscript. Thank you.

      (10) Figure 5d and f: Please indicate the sample in each row.

      Thank you for your suggestion. We have marked the samples in each row in the revised Figures 5d&5f.

      (11) Figure 3 and Figure 4: Why different tissues were analyzed in the two experiments? Low temperature: gill and hemocytes. High temperature: gill and muscle? It is better to use the same tissues so that they can be compared. Please indicate the tissue analyzed in D and d.

      Thank you for your suggestion. We have repeated the experiment to detect the copy number of WSSV in hemocytes at high temperature (32 °C) after LvHSF1 knockdown. The results showed that LvHSF1 knockdown increased viral loads in shrimp hemocytes (Figure 4C). We have supplemented the tissue information in Figure 4D&4d. Thank you.

      (12) Figure 2A The time for temperature treatment? hours or days?

      Thank you for your comment. The transcriptional expression of LvHSF1 was measured in different tissues of healthy shrimp subjected to low (25 °C) and high (32 °C) temperatures for 12 hours. We have supplemented this information in the legend of Figure 2A in Lines 840-841 of the revised manuscript. Thank you.

      (13) Line 249: purified by SDS-PAGE gel?

      Thank you for your comment. We have modified this description in Lines 272-274 in current manuscript. Thank you.

      (14) Line 258 "Next, to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperature". I think it is confusing to use "mediated" here. It seems that HSF1 is downstream of nSWD. Actually, HSF1 controls the expression of nSWD and thus regulates the anti-WSSV effect of shrimp at high temperatures.

      We appreciate your comments. We have modified this description in Lines 282-283 of the current manuscript. Thank you.

      (15) Line 458 "The most probable anti-WSSV mechanism of nSWD is its direct interaction with WSSV envelope proteins VP24 and VP26, potentially inhibiting viral entry into target cells." I suggest the author analyze the entry of WSSV to see whether nSWD blocks this process.

      Thank you for your comment. In general, the antimicrobial mechanism of action of AMPs is thought to involve direct membrane disruption, especially for enveloped viruses (such as WSSV) (Wilson et al., 2013).

      We thank the reviewers for their valuable comments. Our manuscript mainly focuses on the febrile temperature-inducible HSF in host antiviral immunity and on the role of HSF1 in regulating antimicrobial effectors (such as SWD). Because of limits on the manuscript's length, we will investigate the specific anti-WSSV mechanism of SWD in future studies. Thank you.

      Reference:

      Wilson, S.S., Wiens, M.E., Smith, J.G., 2013. Antiviral Mechanisms of Human Defensins. Journal of Molecular Biology 425, 4965-4980.

      (16) Line 435-456 The author discusses the difference between two shrimp species. Did the two studies measure the same immune parameters? I wonder whether the different observation is due to true differences or different methods they used to evaluate the response. If no immune response was promoted in the previous study, what's the possible anti-viral mechanism?

      We appreciate your comments. Firstly, the shrimp in the two studies differ in their adaptability to temperature. The optimal water temperature for M. japonicus growth ranges from 25 to 32 °C, and the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. Secondly, the experimental environmental factors differ between the two studies. Ammonia is a key stress factor in aquatic environments that usually increases the risk of pathogenic diseases in aquatic animals, whereas high temperatures (32 °C) have been shown to inhibit the replication of WSSV and reduce mortality in WSSV-infected shrimp. Thirdly, the two studies tested different immune indicators. Ammonia-induced Hsf1 suppressed the production and function of MjVago-L, an arthropod interferon analog. In this study, our findings revealed the molecular mechanism through which the HSF-AMPs axis mediates host resistance to viruses induced by febrile temperature. Taken together, the benefits of HSF1 can be attributed to either the host or the pathogen, depending on the nature and context of the host-virus-environment interaction.

      (17) Line 472 "directly bind to WSSV envelope proteins and inhibit WSSV proliferation"

      I think it is confusing to use "proliferation" here. It seems that the binding of HSF affects the replication process. However, based on the authors' discussion, HSF may likely block viral entry.

      Thank you for your suggestion. We have modified this description in Lines 505-507 in the current manuscript. Thank you.

      Reviewer #3 (Recommendations for the authors):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved. Following are my specific concerns.

      Major comments

      (1) The study design is pretty good, but the logical flow is not. The following should be improved.

      (a) In Figure 1, the reason for selecting HSF1 as the focus of the study is not clearly explained.

      Thank you for your comment. In a previous study, we revealed that heat shock proteins play a significant role in enhancing the resistance of shrimp to WSSV at elevated temperature (32 ℃) (Xiao et al., 2024). GO functional enrichment analysis of DEGs between Group TW and Group W indicated that most DEGs were involved in biological processes such as protein refolding, chaperone-mediated protein folding, and heat response. Therefore, we paid special attention to heat shock factor 1 (HSF1), the master regulator of the heat shock response. We have added the description in Lines 136-138 of the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (b) As the authors draw models in Figure 9, the established activation mechanism of HSF1 is via trimerization by the release of HSP90, which binds to misfolded proteins under stress conditions, such as heat shock. Therefore, the increase in the HSF1 mRNA level in Figure 1 is strange. The authors need to clarify this issue by explaining this established activation mechanism of HSF1 and also must provide the basis of upregulation of HSF1 by mRNA increase via citing papers in the Introduction.

      We appreciate your comments. Under non-stress conditions, HSF monomers are retained in the cytoplasm in a complex with HSP90. During a stress response, such as to high temperature, HSF dissociates from the complex, trimerizes, and converts into a DNA-binding conformation that binds regulatory upstream promoter elements known as heat shock elements (HSEs) (Andrasi et al., 2021). Previous studies have demonstrated that the expression of HSF1 is remarkably induced by stresses such as high temperature (Ren et al., 2025), virus infection (Merkling et al., 2015), and ammonia stress (Wang et al., 2024). Our results also showed that the expression of LvHSF1 was significantly induced by WSSV infection and high temperature (Figure 2). Therefore, the increase in the HSF1 mRNA level in Figure 1 is not surprising.

      In response, we have revised the proposed model to better reflect our experimental findings and the accompanying description. This revision ensures that the schematic is consistent with our data and accurately represents the proposed mechanism. We appreciate your careful review and constructive feedback.

      Reference:

      Andrasi, N., Pettko-Szandtner, A., Szabados, L., 2021. Diversity of plant heat shock factors: regulation, interactions, and functions. J Exp Bot 72, 1558-1575.

      Ren, Q., Li, L., Liu, L., Li, J., Shi, C., Sun, Y., Yao, X., Hou, Z., Xiang, S., 2025. The molecular mechanism of temperature-dependent phase separation of heat shock factor 1. Nature Chemical Biology.

      Merkling, S.H., Overheul, G.J., van Mierlo, J.T., Arends, D., Gilissen, C., van Rij, R.P., 2015. The heat shock response restricts virus infection in Drosophila. Sci Rep 5, 12758.

      Wang, X.X., Zhang, H., Gao, J., Wang, X.W., 2024. Ammonia stress-induced heat shock factor 1 enhances white spot syndrome virus infection by targeting the interferon-like system in shrimp. mBio 15, e0313623.

      (c) For RNA seq analysis in both in Figures 1 and 5, they need to provide changes in conventional HSF1 target chaperones (many HSPs) to validate their RNA seq data.

      Thank you for your suggestion. As shown in Author response image 1, our previous study revealed by RNA-seq that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced between Group TW and Group W, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024). We have added the description in Lines 136-138 of the revised manuscript.

      For Figure 5, we have added to Supplementary table 2 the heat shock protein genes found among the downregulated DEGs in the transcriptome comparison of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃). The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 of the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (d) In Figure 5, they did experiments by focusing on the changes by HSF1 knockdown at 32 ℃. However, the logical flow should be focusing on genes whose expression was increased by 32 ℃ compared with 25 ℃ (in figure 1), among them they need to characterize HSF1 target genes. Here as mentioned above, classical HSP genes must be included in addition to those AMP genes.

      Thank you for your suggestion. Following it, we have added to Supplementary table 2 the heat shock protein genes found among the downregulated DEGs in the transcriptome comparison of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃). The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 of the revised manuscript. Thank you.

      (e) What is the logical basis of just picking nSWD? It is another example of cherry-picking similar to picking HSF1 in Figure 1.

      We appreciate your comments. To determine how temperature-induced LvHSF1 restricts WSSV infection, RNA-seq was performed to identify target genes regulated by HSF1. By analyzing the differentially expressed genes (DEGs), we screened eight candidate immune effector molecules, including SWD, CrustinⅠ, C-type lectin, Anti-lipopolysaccharide factor (ALF), and Vago. CrustinⅠ has been shown to play an important role in antiviral immunity (Li et al., 2020); C-type lectin (CTL1) can bind to VP28, VP26, VP24, VP19, and VP14, thereby inhibiting WSSV infection (Zhao et al., 2009); Anti-lipopolysaccharide factor (ALF3) exerts its anti-WSSV activity by binding to the envelope protein WSSV189 (Methatham et al., 2017); Vago can inhibit WSSV infection by activating the Jak/Stat pathway in shrimp (Gao et al., 2021). However, the detailed regulatory mechanism of SWD against WSSV was unclear, so we paid particular attention to SWD. We have added the description in Lines 215-220 of the revised manuscript. Thank you for your valuable comments; the logic of the manuscript has been improved.

      Reference:

      Li, S., Lv, X., Yu, Y., Zhang, X., Li, F., 2020. Molecular and Functional Diversity of Crustin-Like Genes in the Shrimp Litopenaeus vannamei. Marine Drugs 18, 361.

      Zhao, Z.Y., Yin, Z.X., Xu, X.P., Weng, S.P., Rao, X.Y., Dai, Z.X., Luo, Y.W., Yang, G., Li, Z.S., Guan, H.J., Li, S.D., Chan, S.M., Yu, X.Q., He, J.G., 2009. A novel C-type lectin from the shrimp Litopenaeus vannamei possesses anti-white spot syndrome virus activity. Journal of Virology 83, 347-356.

      Methatham, T., Boonchuen, P., Jaree, P., Tassanakajon, A., Somboonwiwat, K., 2017. Antiviral action of the antimicrobial peptide ALFPm3 from Penaeus monodon against white spot syndrome virus. Dev Comp Immunol 69, 23-32.

      Gao, J., Zhao, B.R., Zhang, H., You, Y.L., Li, F., Wang, X.W., 2021. Interferon functional analog activates antiviral Jak/Stat signaling through integrin in an arthropod. Cell Rep 36, 109761.

      (f) Likewise, choosing Atta in S2 cells needs logic.

      We appreciate your comments. Our manuscript revealed that febrile temperature-inducible HSF1 confers virus resistance by regulating the expression of antimicrobial peptides (AMPs) in L. vannamei. We further wanted to know whether HSF1 regulation of antimicrobial peptides is a conserved defense mechanism induced by elevated temperature in arthropods, so experiments were performed in an invertebrate model system (Drosophila S2 cells). A previous study showed that DmAMPs (such as Attacin A, Cecropins A, Defensin, Metchnikowin, and Drosomycin) play a significant role in antiviral immunity in Drosophila (Zhu et al., 2013). Our results showed that the expression of Attacin A, Cecropins A, and Defensin was remarkably induced by DmHSF, with Attacin A induced most strongly. Therefore, DmAtta was chosen as a representative to further demonstrate that DmHSF1 exerts its anti-DCV function by regulating DmAMPs. We have added the description in Lines 328-330 and Lines 361-364 of the revised manuscript. Thank you for your valuable comments; the logic of the manuscript has been improved.

      Reference:

      Zhu, F., Ding, H., Zhu, B., 2013. Transcriptional profiling of Drosophila S2 cells in early response to Drosophila C virus. Virol J 10, 210.

      (2) From Figure 6I to 6K, the authors aimed to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperatures. However, what they showed was just showing that nSWD plays anti-WSSV function downstream of HSF1. The authors should show additional data for dsControl+rnSWD.

      Thank you for your suggestion. Following it, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figure 5C&5D and added a description in Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      (3) For the physical interaction between nSWD and WSSV, it will be great if the authors perform Alphafold3 prediction analysis (Abramson et al PMID: 38718835).

      Thank you for your suggestion. As you suggested, we performed Alphafold3 prediction analysis on SWD and the WSSV envelope proteins VP24 and VP26. The predicted template modeling (pTM) score measures the accuracy of the entire structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure. The Alphafold3 prediction results indicate a possible interaction between SWD and WSSV. Notably, our manuscript demonstrated by pulldown assays and confocal analysis that rSWD could interact with VP24 and VP26.

      Author response image 3.

      Alphafold3 prediction analysis of SWD and VP24 (pTM = 0.64)

      Author response image 4.

      Alphafold3 prediction analysis of SWD and VP26 (pTM = 0.53)
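
      The pTM reading described in the response can be written out as a short check. This is only an illustrative sketch: the two scores and the 0.5 threshold are the values quoted above, while the function and variable names are hypothetical. Passing the threshold indicates only that the overall predicted fold might resemble the true structure; it is not by itself evidence of binding.

```python
PTM_THRESHOLD = 0.5  # threshold quoted in the response


def fold_may_resemble_truth(ptm, threshold=PTM_THRESHOLD):
    """Return True when a pTM score is strictly above the threshold."""
    if not 0.0 <= ptm <= 1.0:
        raise ValueError("pTM is expected to lie between 0 and 1")
    return ptm > threshold


# pTM values reported for Author response images 3 and 4.
ptm_scores = {"SWD-VP24": 0.64, "SWD-VP26": 0.53}

summary = {pair: fold_may_resemble_truth(score) for pair, score in ptm_scores.items()}
for pair, passed in summary.items():
    print(f"{pair}: pTM = {ptm_scores[pair]:.2f}, above threshold: {passed}")
```

      Both complexes pass, but the SWD-VP26 score sits close to the threshold, which is one reason the pulldown and confocal data remain the stronger evidence for the interaction.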

      Minor comments

      (1) In the Abstract and many other places, the authors need to specifically write "Drosophila S2 cells" instead of "Drosophila" because conventionally Drosophila implies fruit fly as an organism. We don't say cultured human cells as "human" or "Homo sapiens" in papers.

      Thank you for your suggestion. We have modified the description of Drosophila in the revised manuscript. Thank you.

      (2) Figure numbers can be reduced for better readability. I would combine Figures 1 and 2, and Figures 3 and 4. If the combined figures are too crowded, some can go into supplementary figures.

      Thank you for your suggestion. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. However, we have added some experimental data to Figures 1, 2, 3, and 4. Therefore, we did not combine Figure 1 and Figure 2, and Figures 3 and 4. Thank you.

      (3) One of the best-understood roles of HSF1 in physiology other than heat shock response is longevity, in particular with C. elegans. The authors need to mention this in the Discussion by citing the following recent review paper (Lee PMID: 36380728).

      Thank you for your suggestion. We have supplemented the description of HSF1 regulating longevity and aging of organisms and cited the above reference in the revised manuscript (Lee and Lee, 2022). Thank you.

      Reference:

      Lee, H., Lee, S.V., 2022. Recent Progress in Regulation of Aging by Insulin/IGF-1 Signaling in Caenorhabditis elegans. Mol Cells 45, 763-770.

      (4) Please make your own label for small letter panels or transfer small letter panels to supplementary figures.

      Thank you for your suggestion. We have adjusted the relevant letter labels. The uppercase letters represent the main image of the Figure, and the small letter panels are the corresponding supplementary instructions in the revised manuscript. Thank you.

      (5) In the introduction part, I recommend changing the references for HSFs and HSR with recent ones.

      Thank you for your suggestion. We have added the latest references for HSFs and HSR in the Introduction part of the revised manuscript. Thank you.

      (6) In Figure 1, it is not intuitive to understand the name groups W and TW.

      We appreciated your comments. We have added the description of Group W and Group TW in revised Figure 1. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing. Thank you.

      (7) Please add some kinds of sequence comparisons of SWD and nSWD for readers to understand the homology.

      We appreciate your comments. We have added a multiple sequence alignment of SWD proteins from shrimp species in revised Supplementary Figure 3. Highly conserved amino acid residues and cysteine residues are highlighted in red, indicating that LvSWD is a conserved antimicrobial peptide of the Crustin family. Thank you.

      (8) Naming nSWD with "newly identified" is strange as it will not be new anymore as time goes by. Please change the name.

      Thank you for your suggestion. We have modified the name of nSWD to SWD in the revised manuscript. Thank you.

      (9) Please write the full name for Lv (Litopenaeus vannamei), Dm (Drosophila melanogaster), ds (double-stranded) before using LvHSF1, DmHSF1, and dsLvHSF1.

      Thank you for your comments. We have added the full name of LvHSF1, DmHSF1, and dsLvHSF1 in the revised manuscript. Thank you.

      (10) In Figure 2, it will be better to transfer poly I:C data to supplementary figures.

      Thank you for your comments. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. Thank you.

      (11) The label for pGL3-nSWD-M12 is confusing. M1 and M2 are OK. Please change M12 with M1/2 or another one.

      Thank you for your suggestion. We have changed pGL3-nSWD-M12 with pGL3-nSWD-M1/2 in the revised manuscript. Thank you.

    1. eLife Assessment

      This article presents valuable findings on how the timing of cooling affects autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is very well considered, while indicating some factors that may temper the authors' claims. The factorial experiments offer solid support.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

    3. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I think the experiments are interesting, but note that the treatments are extreme compared to natural conditions. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This article presents useful findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims, while the designs of experiments offer incomplete support for the current claims as they rely on one population under extreme conditions for only one year each while a confounding effect (time in a chamber) sometimes lacks a control.

      We thank the editor and reviewers for their consideration of our revised manuscript and for their constructive suggestions. In response to the editor’s guidance, we have ensured that: 1) the experimental design is clearly presented as physiological forcing, 2) the Solstice-as-Phenology-Switch concept is explicitly defined, limited, and framed as inferred, 3) conclusions are strictly aligned with the scope of the evidence, and limitations are acknowledged transparently.

      We hope these revisions fully address the remaining concerns and clarify both the conceptual framework and the appropriate scope of inference.

      Public Review:

      Reviewer #1 (Public review):

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      We agree that the concept of “flexibility” required clearer definition and a more explicit link to the experimental results. In the Introduction, we now explicitly define flexibility as the capacity for the effective timing of the phenological switch to shift earlier or later depending on developmental progression, rather than occurring at a fixed calendar date. This switch occurs at the compensatory point between the antagonistic influences of early-season development [ESD effect] and late-season temperature [LST effect] (L92-98). We have extended and clarified our explanation of the summer solstice’s role in this framework (L69-90). We propose that the solstice acts as an environmental switch that initiates the LST effect, as declining daylengths signal trees to become responsive to late-season cooling (L92-94). The compensatory point then occurs where the advancing ESD effect is balanced by the delaying LST effect. This point should therefore not be fixed to a calendar date but instead vary with developmental progression each year (L75-95).

      In the Discussion, we clarify that flexibility is demonstrated experimentally by the observation that the magnitude of July cooling effects (LST effect) on autumn phenology depends on prior developmental rate (ESD effect) [3.4 times greater delay in late-leafing trees], indicating that the position of the compensatory point is development-dependent rather than fixed to June 21 (L398-410). We have made consistent edits throughout the Discussion, in particular in the ‘Support for the Solstice-as-Phenology-Switch Hypothesis’ subsection (L514-530).

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      We fully agree and have clarified this in the revised manuscript. In the Discussion, we now clearly state that the compensatory point is a conceptual node inferred from responses to cooling before the solstice (June), directly after it (July), or later in the growing season (August) rather than a directly observed phenological event (L352-358 & L405-406).

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency.

      This inconsistency reflects biological complexity. In the Discussion, we now expand our interpretation to note that terminal and lateral buds may differ in developmental status, resource allocation and hormonal context. We emphasize that bud-type effects are therefore expected to be context-dependent and to interact with whole-plant developmental state, which plausibly explains why effects differ across leafing groups and models (L390-396).

      In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

      We now discuss the explained and unexplained variance in more detail. We also make it clear that our experiment was designed to test specific mechanistic pathways rather than to fully explain all phenological variability or maximise predictive power (L417-419).

      In the Discussion, we acknowledge that a substantial fraction of variation remains unexplained (L419-421). We discuss the possibility of other physiological mechanisms, such as photosynthetic assimilation, contributing to the unexplained variation (L421-427). However, large inter-individual variability is commonplace in autumn phenology. A low intra-class correlation coefficient (ICC = 0.26; see L276-280 for methods) suggests much of the remaining variation is attributable to individual-level differences rather than missing explanatory variables (L429-431). In line with the literature, we suggest that genetic and epigenetic differences likely contributed significantly to inter-individual variation, even within a single provenance population (L431-434). In this context of high individual variability, leaf-out timing (ESD effect) and summer cooling treatment (LST effect) together explaining 23.4% of variation in bud set timing is biologically meaningful and demonstrates the mechanistic importance of these processes (L438-441). For completeness, we also briefly discuss alternate sources of within-treatment variability (L434-437).

      Reviewer #2 (Public review):

      I think the experiments are interesting, but I found the exact methods of them somewhat extreme compared to how the authors present them.

      We appreciate this concern and have substantially revised the manuscript to clarify the experimental logic. In the Introduction, we now state explicitly that the study uses temperature regimes that were designed as strong physiological forcing treatments, intended to deeply constrain development and isolate mechanisms rather than to simulate natural or future climatic conditions (L113-115).

      In the Methods, we have enhanced our description of the non-linear effects of temperatures below 10°C on physiological processes (L154-158).

      At the start of the Discussion, we have added a dedicated paragraph clarifying the scope of inference: the experiment tests causality and constraint (i.e. whether specific physiological processes can drive phenological shifts), not quantitative responses under realistic climate scenarios (L346-363). Throughout the Discussion, we have revised language that could be read as scenario-based interpretation, replacing it with mechanistic phrasing.

      Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

      Given the large individual variation expected in phenological experiments, we used single experimental populations of single-provenance beech saplings to minimise uncontrolled variation arising from genetic differences (L358-360). This allowed us to elucidate mechanisms despite the noisy biological heterogeneity associated with phenology.

      In the last round of revision, we toned down statements of generalisation. In the Discussion, we now go further to clarify what mechanistic understanding can be gleaned directly from our findings and then cautiously suggest how these mechanisms may play out in natural systems. We repeatedly state the intention of the study as mechanistic inference rather than predictive power, e.g. “However, extrapolations to more complex natural ecosystems should be made with caution as our experimental design prioritised mechanistic inference over generalisability and predictive power.” (L417-419). Alongside our previous calls for tests on other species, we now additionally call for tests on other provenances of beech (L511-512).

      I was also very concerned by the revisions.

      If this concern stems from the confusion regarding line numbers and the two submitted versions of the manuscript (with tracked changes and without tracked changes, as required by eLife), then we hope that situation is now clarified. Otherwise, we do not understand why our previous revisions would be perceived as concerning. Regardless, we have made every attempt to address the remaining comments comprehensively.

      Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      We appreciate this important conceptual point. The Solstice-as-Phenology-Switch hypothesis is central to our conceptual model and therefore requires clear explanation. In concert with our changes in response to Reviewer 1’s comment regarding flexibility, we have substantially revised and improved our description of this hypothesis (L69-108).

      Whilst the summer solstice is fixed to a calendar date (June 21), the timing of when trees change their autumn phenological responses to temperature is not (L88-90 & L515-517). This change occurs when the compensatory point of two antagonistic effects is crossed. Higher early-season development rates (which are driven by temperature) have an advancing (negative) effect on autumn phenology, which we now refer to as the ESD effect (L71-78). Warmer late-season temperatures have a delaying (positive) effect because trees become phenologically susceptible to cooling, i.e. overwintering responses are induced in response to cooling, which we now refer to as the LST effect (L78-82). The point in time when these two effects balance each other out, i.e. the net effect = 0, is the compensatory point (L95-97 & L523-525). This point occurs after the solstice because the LST effect only becomes active when days begin to shorten (L92-94 & L522-523). The solstice acts as an environmental switch, initiating trees’ susceptibility to cooling. Therefore, the solstice is referenced in the hypothesis because it forms a daylength barrier. In this framework, the compensatory point cannot occur earlier than the solstice because day lengths are still increasing (L517-519).
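
      The balance described above can be sketched numerically. The following is a minimal illustration only: the functional forms, the parameter values, and the names `esd_effect`, `lst_effect` and `compensatory_point` are hypothetical and are not taken from the manuscript. It assumes a constant advancing ESD effect and an LST effect that is zero up to the solstice and grows linearly afterwards, and it finds the first day of year on which the net effect reaches zero.

```python
# Illustrative sketch only: functional forms and numbers are hypothetical,
# not taken from the manuscript.
SOLSTICE_DOY = 172  # June 21 as day of year in a non-leap year


def esd_effect(esd_days: float) -> float:
    """Advancing (negative) effect of early-season development, in days."""
    return -abs(esd_days)


def lst_effect(doy: int, slope: float = 0.25) -> float:
    """Delaying (positive) effect; zero until day length starts to decline."""
    return max(0, doy - SOLSTICE_DOY) * slope


def compensatory_point(esd_days: float, slope: float = 0.25, last_doy: int = 365):
    """First day of year on which the net effect (ESD + LST) reaches zero.

    Returns None if the two effects never balance within the year.
    """
    for doy in range(SOLSTICE_DOY, last_doy + 1):
        if esd_effect(esd_days) + lst_effect(doy, slope) >= 0:
            return doy
    return None


# The balance date shifts with the size of the ESD term,
# but the search never starts before the solstice.
for esd in (0.0, 5.0, 10.0):
    print(esd, compensatory_point(esd))
```

      Under these assumed forms, the balance date moves as the ESD term changes while never falling before the solstice, which is the qualitative property the hypothesis relies on. The sketch says nothing about the real magnitudes or shapes of either effect.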

      In the Introduction and Discussion, we clarify that the solstice is referenced as a biologically meaningful photoperiodic cue, not as a fixed threshold date. We now emphasise that the hypothesis concerns a seasonal reversal in responses to temperature structured around photoperiod, whose effective timing depends on developmental state, rather than a reversal occurring precisely on June 21. To avoid confusion, we have reworded phrases such as “summer solstice effect reversal” to “reversal of phenological responses to temperature after the summer solstice” (L371). In accordance, we have also changed the title to “Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice”.

      The following comments stem from the first round of review. We have previously revised the manuscript in accordance with these comments. For most of these points we do not see further cause for changes except for any overlap with comments above. We therefore predominantly copy our previous responses in quotes for clarity, the exception being the comment regarding the framing of our results in relation to natural systems.

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper.

      “We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.”

      The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe in which I have worked. For example, a low of 2 deg C at night and 7 deg C during the day through end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We appreciate the reviewer’s concern regarding the use of relatively extreme temperature treatments and the need to ensure that our conclusions are consistent with the motivation for using them. The manuscript was also revised in this regard in the previous round, and we copy the relevant responses at the bottom of this response. Despite this, we agree that further explanation of how our experimental treatments suited the aims of our study was still required.

      The aim of these treatments was not to reproduce typical ambient conditions, but to act as a mechanistic probe. Such mechanisms are not readily identifiable from observations or mild manipulations, because the expected effects are small relative to natural variability; stronger perturbations are therefore required to generate a diagnostic contrast. By strongly constraining development in the early-season, and by providing a robust cooling signal in the late-season, we sought to reveal the causal structure underlying the observed solstice-related reversal in temperature effects on autumn phenology.

      Temperatures below 10°C strongly slow cell division and mitotic rates, which then rapidly and non-linearly approach 0 as temperatures drop towards 0°C (Körner, 2021). As reflected in L152-158 of the revised manuscript, we selected a spring cooling regime of 2–7 °C to strongly slow developmental processes while maintaining a clear thermal safety margin that eliminates the risk of frost damage. Although a milder cooling regime (e.g. 5–10 °C) would be less extreme, it would also be expected to produce only a comparatively small reduction in developmental rates, thereby substantially reducing our ability to generate distinct early- and late-developing individuals and to detect carry-over effects on autumn phenology. Applying strong cooling therefore increases signal-to-noise and allows us to detect the underlying mechanism, which would not be possible with temperature treatments that represent average contemporary climatic variation.

      The use of conditions outside the norm is standard practice for elucidating mechanisms in ecology, where organisms are often pushed to their physiological limits or transplanted into environments fundamentally different from those to which they are adapted (Somero, 2010; Berend et al., 2019). Experiments targeting autumn phenology have utilised a broad range of environmental conditions from moderate to extreme manipulations (Tanino et al., 2010). For example, to test the controls of growth cessation and dormancy induction in Prunus species, one study applied a range of treatments including constant 9°C temperature and 24 hour photoperiod between April and July (Heide, 2008).

      Our experimental design aimed to reduce rates of development, cell division and maturation. In the Methods, we describe this aim and clearly state that the experimental design was not intended to mimic natural climatic variation (L154-156 & L181-186). Importantly, our conclusions are framed at the level of direction, timing, and interaction of effects, rather than the magnitude expected under contemporary or future field conditions (L360-363).

      This framing intends to reflect the primary inference of this study, which concerns when and why temperature effects reverse around the solstice, and how this timing depends on developmental state and diel temperature exposure, rather than making quantitative predictions for present-day or future climates. This aligns our conclusions with the experimental design. We have further revised the Discussion to explain these aims and conclusions more clearly, including the addition of a subsection at the beginning titled “Experimental forcing and scope of inference” (L346-363). We have also set up this expectation in the Introduction (L113-115).

      Additionally, we have improved the Discussion in a number of related aspects.

      We explicitly separate mechanistic conclusions and any relation to natural systems, remaining cautious to not overgeneralise or overstate our findings (L417-419).

      We now include a dedicated paragraph explaining that, although these specific conditions are not likely to be found in beech’s range, analogous developmental constraints can arise during cold springs, late cold spells following budburst, or at high-elevation and continental sites where temperatures remain low despite increasing photoperiod (L540-545, L583-588). We further explain that because developmental progression integrates temperature cumulatively over time, even short episodes of strong cooling can exert lasting carry-over effects on seasonal timing, thereby linking the forced experimental responses to processes relevant under natural, fluctuating conditions (L545-550).

      We explicitly state that the decoupling of day and night temperatures was not intended to represent realistic meteorological states (L458-460). We explain that this design was used diagnostically to isolate inherently diel physiological processes (e.g. nocturnal growth, cell division and expansion versus daytime carbon assimilation), and that the observed responses demonstrate the importance of diel timing of temperature exposure rather than the realism of the imposed cycles (L460-468).

      Previous response:

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants. We have added text in the Methods to clarify this aim.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods and Discussion.

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions.

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that carbon assimilation is an important component of forest carbon dynamics. However, the primary aim of this study was to identify how developmental state and diel cycles mediate temperature effects on autumn phenology, rather than to quantify carbon assimilation per se. Assessing photosynthetic controls on autumn phenology would require a substantially different experimental design and is therefore beyond the scope of the present study.

      That said, we were able to include measurements of photosynthetic assimilation during pre-solstice cooling (now presented as Fig. S12 for all treatments). These data show that cooling strongly reduced assimilation across all treatments, despite their markedly different phenological outcomes. This supports our interpretation that variation in assimilation alone cannot explain the observed phenological responses, consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1, our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously and highlight the need for further research across species.

      And the referenced response to Reviewer one:

      “We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”) and explicitly call for follow-up studies across species and forest contexts. At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.”

      As described in responses above, we have further clarified what can be directly concluded from our study, avoiding overgeneralisation.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker. On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, budset occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R² = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech”.

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. Photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      Following the addition of an analysis of leaf senescence data, we also revised the terminology in places (including the title) from “primary growth cessation/bud set” to the broader term “autumn phenology.” This term is intended to encompass two distinct but related physiological processes—bud set and leaf senescence—both of which are commonly used as markers of autumn phenology and the end of the growing season.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9) and inferences are not altered. We also report the bud type effects for experiment 1 and experiment 2.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?

      Our responses to the main comments in this new round of revision have comprehensively covered this topic.

      References

      Berend K, Haynes K, MacKenzie CM. 2019. Common garden experiments as a dynamic tool for ecological studies of alpine plants and communities in northeastern North America. Rhodora 121: 174.

      Heide OM. 2008. Interaction of photoperiod and temperature in the control of growth and dormancy of Prunus species. Scientia Horticulturae 115: 309–314.

      Körner C. 2021. Alpine Plant Life: Functional Plant Ecology of High Mountain Ecosystems. Cham: Springer International Publishing.

      Somero GN. 2010. The physiology of climate change: how potentials for acclimatization and genetic adaptation will determine ‘winners’ and ‘losers’. Journal of Experimental Biology 213: 912–920.

      Tanino KK, Kalcsits L, Silim S, Kendall E, Gray GR. 2010. Temperature-driven plasticity in growth cessation and dormancy development in deciduous woody plants: a working hypothesis suggesting how molecular and cellular function is affected by temperature during dormancy induction. Plant Molecular Biology 73: 49–65.

    1. eLife Assessment

      This important study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide convincing evidence in support of a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. While the findings advance our understanding of cognitive mechanisms at the group level, the observation that computational phenotype is predictive of suicidal behavior only in the clinical sample and not in the online sample limits its applicability for individual prediction, early detection and prevention of suicidality.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in a general online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents diagnosed with depression or anxiety with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide) regardless of the potential losses. However, such a result was only found in the clinical sample and cannot be generalized more broadly based on the current findings.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling of adolescents at high risk can help target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group gives low statistical power, which is explicitly mentioned as a study limitation.

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. Thus, the implications of this work are more relevant to a basic-science understanding of the etiology of suicidal behavior than they are useful as a predictor of suicidal behavior, and it is not clear that a psychiatrist or psychologist could use this task to potentially determine who is at higher risk of attempting suicide and must be more closely monitored. Indeed, relationships between task parameters/behavior and suicidal behavior were limited to the clinical sample with a diagnosis of depression or anxiety disorder, and did not extend to the online sample. Therefore, the claim that these findings provide "computational markers for general suicidal tendency among adolescents" is unwarranted.

    3. Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioral effect whose mechanisms they can then explain and refine through computational modeling. This work is important because currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level, before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      - Large sample size
      - Replication of their own findings
      - Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling

      Questions, based on revised manuscript and replies to other reviewers:

      (1) Replies to reviewers in general: Bayes Factors have been added; it would be good to also use common verbal terms to describe them (e.g., 'anecdotal', 'moderate', etc.). For example, my reading of Table S8 would be that for gambling rate there is only anecdotal evidence that it does not relate to PSWQ or BDI, and moderate evidence that it does not relate to TAI.

      (2) Reply to reviewer 1 Q2 (Predicting STB): For the regression predicting suicidal ideation, it seems to me that what you did was a regression STB ~ gambling behaviour + approach + mood? Could you clarify? As a test of whether the task can predict STB risk, I had expected something slightly different: a cross-validation (LOO or maybe 5-fold in the large sample) of STB ~ gambling behaviour + approach [parameter from model] + mood [parameter from model], then computing the predicted STB in the left-out participants, and then checking the correlation between STB and predicted STB. This would allow testing whether the diverse task measures together predict STB (with the caveat that it is cross-validated rather than tested on a hold-out sample, unless you could train on one sample (in lab) and test on the other (online)).

      (3) Reply to reviewer 2 Q1 (parameter recovery): I'm looking at Figure S3; it still seems to show only the scatter plots and not the correlation matrices, which are now added as text notes. Can you actually show these matrices? An off-diagonal correlation of 0.63 appears quite high. I think it needs to be discussed exactly which parameters those are, and whether that impacts the interpretation of the results.

      (4) Reply to reviewer 3 Q3 (mood model): I would have imagined that the response would involve changing the mood equations (equation 8 main text) to include a term for whether the participant gambled or not, independent of the gamble value.
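
      The cross-validated test requested in question (2) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' analysis: the three predictor columns (standing in for gambling rate, the approach parameter, and mood sensitivity), the simulated effect sizes, and the helper names `fit_ols`, `pearson`, and `cross_validated_predictions` are all assumptions made for the example.

```python
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def cross_validated_predictions(X, y, n_folds=5, seed=0):
    """Predict each participant's score from a model fitted without that participant's fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    y_hat = [None] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i in test:
            y_hat[i] = beta[0] + sum(b * x for b, x in zip(beta[1:], X[i]))
    return y_hat


# Simulated stand-ins for (gambling rate, approach parameter, mood sensitivity) and STB score.
rng = random.Random(1)
X_sim = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y_sim = [0.5 * a + 0.8 * b - 0.6 * c + rng.gauss(0, 1) for a, b, c in X_sim]
y_pred = cross_validated_predictions(X_sim, y_sim, n_folds=5)
print("correlation between observed and out-of-sample predicted scores:",
      round(pearson(y_sim, y_pred), 3))
```

      Setting `n_folds` to the number of participants gives leave-one-out cross-validation. Training on the in-lab sample and predicting in the online sample would replace the fold loop with a single fit and a single prediction step.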

    4. Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.

      Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Comments on revisions:

      The authors adequately addressed my comments and I find the manuscript substantially strengthened.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. Evidence for the specificity of the effect to suicide, however, is incomplete, which would require additional analyses.

      We thank the editors and reviewers for this important assessment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in the severity of symptoms (see Table 1). To address the request for evidence on specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and scores for anxiety and depression as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed a Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.

      Moreover, as Reviewer 3 pointed out, “absence of evidence” cannot be taken as “evidence of absence”. Although we median-split patients by their scores on general symptoms (e.g., depression- and anxiety-related questionnaires) and found no significant differences between the resulting subgroups (Figure S11), we additionally conducted Bayesian analyses of gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.

      Beyond these specific findings, this work highlights the broader utility of computational modelling and mood measures for understanding behavioral effects, showing how both mood and choice data can be used to better comprehend a psychiatric issue.
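
      As a reading aid for BF<sub>01</sub> values, the sketch below maps a Bayes factor onto verbal labels. It assumes one widely used convention with cut-offs at 3, 10, 30 and 100 (anecdotal, moderate, strong, very strong, extreme); the function name `describe_bf01` and the example values are hypothetical and are not taken from Table S7 or Table S8.

```python
# Upper bounds (exclusive) for each verbal category, applied to max(BF01, 1/BF01).
THRESHOLDS = [(3, "anecdotal"), (10, "moderate"), (30, "strong"), (100, "very strong")]


def describe_bf01(bf01):
    """Return (favoured model, verbal label) for a Bayes factor BF01 = p(data | M0) / p(data | M1)."""
    if bf01 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf01 == 1:
        return ("neither", "no evidence")
    favoured = "M0" if bf01 > 1 else "M1"
    strength = max(bf01, 1.0 / bf01)  # strength of evidence, irrespective of direction
    for upper, label in THRESHOLDS:
        if strength < upper:
            return (favoured, label)
    return (favoured, "extreme")


# Hypothetical values, for illustration only.
for value in (2.5, 5.0, 0.2, 150.0):
    print(value, describe_bf01(value))
```

      Under this convention, a BF<sub>01</sub> between 1 and 3 favours the null only anecdotally, so it is weak support for the absence of a group difference.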

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in an undifferentiated online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide), regardless of the potential losses.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling choice is optimal, with adolescents at high risk; an ideal cohort to target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group was justified based on previous studies (lines 181-183); however, all three papers cited mention that their sample was low powered as a study limitation.

      We thank the Reviewer for raising this concern. We agree that the sample size for the S<sup>-</sup> group (n=25) is modest, and the prior studies we cited also acknowledged limited power. We wanted to point out that we obtained a sample size comparable to that of a prior study. In the revision, we therefore updated the section justifying this sample size and acknowledged the limited power of our study in the limitations section. Please see our clarification below:

      Page 32:

      “Third, despite replicating our main results in an independent dataset (n=747), the modest S<sup>-</sup> subgroup size (n=25) has a limited statistical power.”

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. However, the prediction of clinical interest is of suicidal behaviors from task parameters/behavior - as a psychiatrist or psychologist, I would want to use this task to potentially determine who is at higher risk of attempting suicide and therefore needs to be more closely watched rather than the other way around (predicting behavior in the task from their symptom profile). Unfortunately, the analyses presented do not show that this prediction can be made using the current task. I was left wondering: is there a correlation between beta_gain and STB? It is also important to test for the same relationships between task parameters and behavior in the healthy control group, or to clarify that the recommendations for potential clinical relevance of these findings apply exclusively to people with a diagnosis of depression or anxiety disorder. Indeed, in line 672, the authors claim their results provide "computational markers for general suicidal tendency among adolescents", but this was not shown here, as there were no models predicting STB within patient groups or across patients and healthy controls.

      Thank you for these thoughtful comments. Our study focuses on why adolescent patients with suicidality show increased risk behavior, aiming to provide a mechanism-based target for suicide prevention. Therefore, the dependent variable in our mediation model was gambling behavior. We also agree that the clinically relevant question is whether suicidality can be predicted from task-derived behavior/parameters. We thus used risky behavior and the model-derived parameters to predict STB. Linear regressions showed that gambling behavior, as well as the value-insensitive approach parameter, can predict suicidal symptom scores among patients (former: β = 9.189, t = 2.004, p = 0.048; latter: β = 5.587, t = 2.890, p = 0.005). In healthy controls, these predictions failed (gambling behavior: β = 1.471, t = 0.825, p = 0.411; approach: β = 0.874, t = 1.178, p = 0.241). These results suggest that the clinical relevance of these findings applies exclusively to people with a diagnosis of depression or anxiety disorder. We found the same pattern for the mood parameter (mood sensitivity to certain rewards: patients: β = -28.706, t = -2.801, p = 0.006; healthy controls: β = -2.204, t = -0.528, p = 0.599). In sum, we believe that our statement of “computational markers for general suicidal tendency among adolescents” is reasonable now. Please see our revisions below:

      Page 17:

      “Furthermore, linear regression showed that gambling rate can predict the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048) among patients, but not among HC (β = 1.471, t = 0.825, p = 0.411), suggesting that gambling behavior has patient-specific predictive utility for suicidal symptoms.”

      Page 19:

      “Furthermore, linear regression showed that approach parameter can predict the current suicidal ideation score (β = 5.587, t = 2.890, p = 0.005) among patients, but not among HC (β = 0.874, t = 1.178, p = 0.241), suggesting that value-insensitive approach parameter has patient-specific predictive utility for suicidal symptoms.”

      Page 21:

      “Furthermore, linear regression showed that mood sensitivity to CR can predict the current suicidal ideation score (β = -28.706, t = -2.801, p = 0.006) among patients, but not among HC (β = -2.204, t = -0.528, p = 0.599), suggesting that mood sensitivity to CR has patient-specific predictive utility for suicidal symptoms.”

      (3) The FDR correction for multiple comparisons mentioned briefly in lines 536-538 was not clear. Which analyses were included in the FDR correction? In particular, did the correlations between gambling rate and BSI-C/BSI-W survive such correction? Were there other correlations tested here (e.g., with the TAI score or ERQ-R and ERQ-S) that should be corrected for? Did the mediation model survive FDR correction? Was there a correction for other mediation models (e.g., with BSI-W as a predictor), or was this specific model hypothesized and pre-registered, and therefore no other models were considered? Did the differences in beta_gain across groups survive FDR when including comparisons of all other parameters across groups? Because the results were replicated in the online dataset, it is ok if they did not survive FDR in the patient dataset, but it is important to be clear about this in presenting the findings in the patient dataset.

      Thank you for raising the important issue of multiple testing and for asking us to clarify exactly which tests were covered by the FDR procedure. In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values. Please see our clarification below:

      Supplementary Page 4:

      “Supplementary Note 8: Clarification for FDR correction.

      In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values.”
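
      Pooling all p-values into a single FDR correction can be sketched as below. The note does not state which FDR procedure was used, so this example assumes the Benjamini-Hochberg step-up procedure at q = 0.05; the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where a test survives FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    survives = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            survives[i] = True
    return survives


# Hypothetical p-values pooled across several families of tests.
pooled = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
for p, keep in zip(pooled, benjamini_hochberg(pooled)):
    print(f"p = {p:.3f}  survives FDR: {keep}")
```

      Because the threshold for each p-value depends on the total number of tests, pooling more than 150 tests makes the correction stricter for any individual test than correcting within a smaller family would.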

      (4) There is a lack of explicit mention when replication analyses differ from the analyses in the patient sample. For instance, the mediation model is different in the two samples: in the patient sample, it is only tested in S+ and S- groups, but not in healthy controls, and the model relates a dimensional measure of suicidal symptoms to gambling in the task, whereas in the online sample, the model includes all participants (including those who are presumably equivalent to healthy controls) and the predictor is a binary measure of S+ versus S- rather than the response to item 9 in the BDI. Indeed, some results did not replicate at all and this needs to be emphasized more as the lack of replication can be interpreted not only as "the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients" (lines 582-585) - it may also be that this link is not truly there, and without a replication it needs to be interpreted with caution.

      Thank you for these important comments. This study focused on cognitive and affective computational mechanisms underlying increased risky behavior in STB. Accordingly, we compared patients with STB (S<sup>+</sup>) with patients without STB (S<sup>-</sup>) and healthy controls (HC) to examine the effects of STB on risky behavior. Therefore, a group comparison, rather than a dimensional measure of suicidal symptoms from the Beck Scale for Suicidal Ideation, answers our research questions directly.

      To enhance consistency between the clinical and replication datasets, we included all participants in each dataset when performing the mediation analysis. Given that S<sup>-</sup> and HC did not differ in gambling behavior or the approach parameter in the clinical dataset, we merged these two groups. In the replication dataset, to mirror the S<sup>+</sup> vs. S<sup>-</sup> contrast used clinically, we categorized the general sample into S<sup>+</sup> and S<sup>-</sup> based on BDI item 9. The mediation results remained significant in both datasets (the clinical dataset: a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; the replication dataset: a×b = 0.143, 95% CI = [0.016, 0.288], p = 0.031), suggesting that STB is associated with increased risk behavior via stronger approach motivation.
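
      The indirect effect a×b and its confidence interval can be illustrated with the sketch below. It assumes a simple single-mediator model (group → approach parameter → gambling rate) estimated by ordinary least squares with a percentile bootstrap; the response does not state how the interval was obtained, so the bootstrap, the function names, and the simulated data are assumptions for illustration only.

```python
import random


def _cov(u, v):
    """Sample covariance (variance when u is v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a*b, with a from M ~ X and b the coefficient of M in Y ~ X + M."""
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect, resampling participants."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    return estimates[int(alpha / 2 * n_boot)], estimates[int((1 - alpha / 2) * n_boot) - 1]


# Simulated data: x is a 0/1 group code, m a mediator, y an outcome.
rng = random.Random(2)
x_sim = [i % 2 for i in range(300)]
m_sim = [1.0 * xi + rng.gauss(0, 1) for xi in x_sim]
y_sim = [0.5 * mi + rng.gauss(0, 1) for mi in m_sim]
print("a*b =", round(indirect_effect(x_sim, m_sim, y_sim), 3))
print("95% CI =", tuple(round(v, 3) for v in bootstrap_ci(x_sim, m_sim, y_sim)))
```

      An interval that excludes zero is the usual criterion for a significant indirect effect. With cross-sectional data, such a result is compatible with mediation but does not establish the causal ordering of the variables.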

      We also acknowledge the non-replication of the correlation between gambling behavior and mood sensitivity to certain rewards in the online sample. While this pattern might indicate that the link is specific to suicidal patients, it may also reflect sample-specific or unstable effects; thus, we now state this explicitly and interpret the finding with caution. Please see our revisions below:

      Page 15:

      “We next verified our results in an independent dataset, including the same task and BDI questionnaire in 747 general participants (500 females; age: 20.90±2.41) (46). One item in BDI involves the measurement of STB. In item 9 of BDI, participants chose one option that describes them best: Option 1, “I don't have any thoughts of killing myself.”; Option 2, “I have thoughts of killing myself, but I would not carry them out.”; Option 3, “I would like to kill myself.”; Option 4, “I would kill myself if I had the chance.”. In line with the current definition of S<sup>+</sup>/S<sup>-</sup> in the clinical dataset, we identified S<sup>+</sup> group as choosing Option 2, 3, or 4, while participants selecting Option 1 were categorized as S<sup>-</sup> group.”

      Page 19:

      “Given significant correlations between group, approach parameter, and gambling rate for gain trials (ps < 0.017), we further conducted a mediation analysis with the assumption of the mediating effect of approach motivation of suicidality on the risk behavior. Given that we aimed to test the effect of STB, with S<sup>-</sup> and HC as controls, and given that S<sup>-</sup> and HC did not differ in gambling behavior or in the approach parameter, we merged these two groups for the mediation analysis. Results supported our hypothesis (a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; Figure 2C), confirming that suicidal thoughts and behavior increase risk behavior through stronger approach motivation.”

      Page 26:

      “However, we did not observe any significant correlation between mood sensitivity to CR and gambling behavior (ps > 0.389), which suggests that the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients. Alternatively, this non-replicated result may also reflect sample-specific or unstable effects, which needs to be interpreted with caution.”

      (5) In interpreting their results, the authors use terms such as "motivation" (line 594) or "risk attitude" (line 606) that are not clear. In particular, how was risk attitude operationalized in this task? Is a bias for risky rewards not indicative of risk attitude? I ask because the claim is that "we did not observe a difference in risk attitude per se between STB and controls". However, it seems that participants with STB chose the risky option more often, so why is there no difference in risk attitude between the groups?

      Thank you for pointing out the ambiguity. In our manuscript, “motivation” and “risk attitude” are defined at the computational level. Following prior work with this task (Rutledge et al., 2015, 2016), we decompose observed gambling into (i) value-dependent valuation parameters that capture risk attitude (e.g., risk aversion and loss aversion, which scale the subjective value of outcomes), and (ii) value-insensitive, valence-dependent biases that capture approach/avoidance motivation. Accordingly, a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups, which is what we observe for S<sup>+</sup> vs. controls. We have clarified this point in the computational modeling section.

      Pages 12-13:

      “Please note that a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups. Risk attitude is indeed conceptualized in economics as the curvature of the utility function (i.e., the subjective value) of the objective outcomes, with concave curves associated with risk aversion, and convex curves associated with risk seeking (54,56). By contrast, the approach or avoidance bias apply to all the value. A possible interpretation of the approach bias is that participant approach the option with the highest possible gain (the lottery) in the gain frame; the avoidance bias would then reflect a tendency to systematically avoid the highest potential losses (the lottery) in the loss frame.”
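
      The distinction between risk attitude and a value-insensitive bias can be made concrete with a small sketch. The functional form below (power utility with loss aversion, a logistic choice rule, and a bias term that shifts the gamble probability regardless of option values) is an assumed form in the spirit of approach-avoidance choice models; it is not reproduced from the manuscript's equations, and all parameter names and values are hypothetical.

```python
import math


def utility(value, alpha, loss_aversion):
    """Power utility: alpha sets the curvature (risk attitude), loss_aversion scales losses."""
    if value >= 0:
        return value ** alpha
    return -loss_aversion * (-value) ** alpha


def p_gamble(certain, win, loss, alpha=0.8, loss_aversion=1.5, mu=0.5, bias=0.0):
    """Probability of choosing a 50/50 gamble over a certain amount.

    bias > 0 adds a value-insensitive tendency to gamble (approach);
    bias < 0 adds a value-insensitive tendency to refuse (avoidance).
    """
    u_gamble = 0.5 * utility(win, alpha, loss_aversion) + 0.5 * utility(loss, alpha, loss_aversion)
    u_certain = utility(certain, alpha, loss_aversion)
    p_value = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))  # value-dependent part
    if bias >= 0:
        return bias + (1.0 - bias) * p_value
    return (1.0 + bias) * p_value


# Same curvature (alpha) throughout: only the bias changes.
for bias in (0.0, 0.2, 0.4):
    print(f"bias = {bias:.1f}  P(gamble) = {p_gamble(certain=35, win=70, loss=0, bias=bias):.3f}")
```

      In this sketch the gamble probability rises with `bias` while `alpha` and `loss_aversion` stay fixed, which is how two groups can differ in gambling rate without differing in risk attitude.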

      Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioural effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level. This is before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      (1) Large sample size.

      (2) Replication of their own findings.

      (3) Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling.

      Weaknesses:

      I can't really see any major weakness, but I have a few questions:

      (1) I can see from the parameter recovery that the parameters are very well identified. Is it surprising that this is the case, given how many parameters there are for 90 trials? Could the authors show cross-correlations? I.e., make a correlation matrix with all real parameters and all fitted parameters to show that not only the diagonal values (i.e., the same data as the scatter plots in S3) are high, but that the off-diagonal values are low.

      Thank you for raising these thoughtful concerns. The current task consisted of 90 choices and 36 mood ratings. There were 5 choice parameters and 4 mood parameters. The apparently strong identifiability is not unexpected, as 90 choice trials and 36 mood ratings are comparable to those in prior computational modeling literature (Blain & Rutledge, 2022).

      As suggested, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery. Please see our clarifications below:

      Supplementary Pages 2-3:

      “Parameter recovery: Figure S3 shows good parameter recovery for both choice and mood winning model (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
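
      The cross-correlation check described in this note can be sketched as follows, assuming that generating and recovered parameters are stored as one list per parameter across simulated participants. The parameter names, noise level, and helper names are hypothetical, and the simulated "recovery" is just the true value plus noise rather than an actual model fit.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def cross_correlation_matrix(true_params, fitted_params):
    """Rows index generating parameters, columns index recovered parameters."""
    names = list(true_params)
    matrix = [[pearson(true_params[r], fitted_params[c]) for c in names] for r in names]
    return names, matrix


def summarize(matrix):
    """Smallest diagonal correlation and largest absolute off-diagonal correlation."""
    k = len(matrix)
    diag_min = min(matrix[i][i] for i in range(k))
    off_max = max(abs(matrix[i][j]) for i in range(k) for j in range(k) if i != j)
    return diag_min, off_max


# Simulated stand-in for a recovery exercise with three hypothetical parameters.
rng = random.Random(3)
true_sim = {name: [rng.gauss(0, 1) for _ in range(500)] for name in ("alpha", "mu", "bias")}
fitted_sim = {name: [v + rng.gauss(0, 0.3) for v in vals] for name, vals in true_sim.items()}
names, matrix = cross_correlation_matrix(true_sim, fitted_sim)
for name, row in zip(names, matrix):
    print(name, [round(r, 2) for r in row])
print("min diagonal, max |off-diagonal|:", tuple(round(v, 2) for v in summarize(matrix)))
```

      Reporting which parameter pair produces the largest off-diagonal value, and not only its size, would show whether a trade-off involves the parameters on which the main conclusions rest.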

      Page 10:

      “The numbers of choice trials and mood ratings were comparable to those in prior computational modeling studies (34,35).”

      (2) Could the authors clarify the result in Figure 2B of a correlation between gambling rate and suicidal ideation score: is that a different result than they had before with the group main effect? I.e., is your analysis like this: gambling rate ~ suicide ideation + group assignment (or a partial correlation)? I'm asking because BSI-C is also different between the groups. [same comment for later analyses, e.g. on approach parameter].

      Thank you for pointing out the lack of clarity. We performed the group-difference analysis and the correlation analysis with suicidal ideation separately. We first performed the group-difference analysis to test our hypothesis of STB effects. We then conducted the correlational analysis to further specify our findings.

      (3) The authors correlate the impact of certain rewards on mood with the % gambling variable. Could there not be a more direct analysis by including mood directly in the choice model?

      Thank you for this insightful suggestion. As suggested, we tried to integrate mood into choice models by adding mood bias component(s) in line with previous literature (Vinckier et al., 2018). The first model (mcM1) assumes that mood biases choice, building on cM3 (the winning choice model). cmM2 further separated the mood bias parameter into two components according to participants’ choices.

      However, model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the absence of a blocked design in our experiment, unlike in, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see our clarifications below:

      Supplementary Pages 3-4:

      “Supplementary Note 6: integration of mood into choice models

      Although we modeled choice and mood separately to examine cognitive and affective mechanisms underlying increased risk behavior in adolescent suicidal patients, one interesting question was whether mood responses influence subsequent gambling choices and how to model them. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate when mood was higher (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, assuming that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model).

      Based on our finding of the negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2).

      Model comparison using BIC supported cM3 (Table S6), that is, without consideration of mood in choice modeling. The mood bias parameters in neither cM2 nor cM3 reached significance (ps > 0.091), which may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015).”

      (4) In the large online sample, you split all participants into S+ and S-. I would have imagined that instead, you would do analyses that control for other clinical traits. Or, for example, you have in the S- group only participants who also have high depression scores, but low suicide items.

      Thank you for this insightful suggestion. Following prior suicide-related literature (Tsypes et al., 2024), we controlled for depression by including depression scores as covariates. Note that depression scores were derived from our established bifactor model (Wang et al., 2025), which separates depression from anxiety. The results remained largely significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend here, the corresponding effect with depression-related questionnaires as covariates was significant in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.

      Please see our clarifications below:

      Page 26:

      “After controlling for depression severity using our established bifactor model (see ref 60 for details), these results remained significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend here, the corresponding effect with depression-related questionnaires as covariates was significant in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.”

      Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      (1) The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      Thank you for this important comment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.

      As pointed out, absence of evidence is not evidence of absence. Although we median-split patients by their general symptom scores (e.g., depression- and anxiety-related questionnaires) and found no significant differences as a function of severity (Figure S11), we additionally computed Bayes factors for gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is the Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.
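
      To make the interpretation of BF<sub>01</sub> concrete, here is a minimal sketch of one common way to approximate it for a two-group comparison, using the BIC approximation BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> − BIC<sub>0</sub>) / 2). This is an assumption made for illustration only: the response does not say how the Bayes factors in Table S7 were computed, and the data below are invented.

```python
import math


def bic_gaussian(rss, n, k):
    """BIC of a Gaussian linear model with k mean parameters (up to a shared constant)."""
    return n * math.log(rss / n) + k * math.log(n)


def bf01_two_groups(group_a, group_b):
    """Approximate BF01 for 'no group difference' via exp((BIC1 - BIC0) / 2).

    M0: one common mean for all observations.
    M1: a separate mean for each group.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    grand_mean = sum(pooled) / n
    rss0 = sum((v - grand_mean) ** 2 for v in pooled)
    rss1 = sum(
        (v - sum(g) / len(g)) ** 2 for g in (group_a, group_b) for v in g
    )
    bic0 = bic_gaussian(rss0, n, k=1)
    bic1 = bic_gaussian(rss1, n, k=2)
    return math.exp((bic1 - bic0) / 2.0)


# Hypothetical gambling rates for a low- and a high-severity subgroup.
similar = bf01_two_groups(
    [0.40, 0.50, 0.60, 0.50, 0.45, 0.55],
    [0.50, 0.45, 0.55, 0.60, 0.40, 0.50],
)
different = bf01_two_groups(
    [0.10, 0.20, 0.15, 0.12, 0.18, 0.16],
    [0.80, 0.90, 0.85, 0.82, 0.88, 0.86],
)
```

      In this toy example `similar` is above 1 (the data favor the null model of no group difference) and `different` is far below 1 (the data favor a group difference).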

      Please see our revisions below:

      Page 17:

      “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) to extract the main components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. To further control for anxiety and depression, we ran a linear regression using these components as covariates, which revealed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”

      Pages 18-19:

      “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). Linear regression using the PCA components as covariates revealed that the group effect on the approach parameter remained significant (p = 0.027; Table S9).”

      Page 21:

      “Within patients, this group effect on β<sub>CR</sub> remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.032), as well as general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). Linear regression using the PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”

      (2) The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.

      Thank you for this important suggestion. We tested whether mood responses influence subsequent gambling choices, and how to model such an influence. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate when mood was higher (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, assuming that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model). Based on our finding of a negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we split the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2). Model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the absence of a blocked design in our experiment, unlike in, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see Supplementary Pages 3-4.

      (3) Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      We apologise for the unclear statement. The approach bias is implemented in choice as a continuous value-independent effect, ranging from -1 to 1.

      It is true that the mood responses always scale with the magnitude of outcomes, since mood ratings were requested after the outcomes. Therefore, the mood parameters and the approach bias were both continuous.

      We also attempted to integrate mood into choice modelling. See Response 2 for Reviewer 3 for details.

      (4) The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Thank you for this important suggestion. We have now explained motivation and mood in the Introduction section and the computational modeling section. Please see our clarifications below:

      Pages 3-4:

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured on the timescale of a laboratory session, represents the accumulated impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is demonstrated, e.g., through its association with depression symptoms(37). Mood differs from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear)(31-33,39).”

      We have corrected grammatical errors throughout the manuscript.

      (5) Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Thank you for this comment. We agree that we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, which is outside the scope of the study, and it is indeed possible that the parameter estimates are somewhat noisy. Therefore, we have toned down the clinical relevance of our results. Please see our revision below:

      Page 32:

      “Next, we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, and it is indeed possible that the parameter estimates are somewhat noisy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Title: I believe "aberrant mood dynamics" is both too general and overstating the results of this study, which did not measure mood dynamics longitudinally. "Aberrant" is also overly pathologizing. I would suggest sticking more directly to the results, for instance, "Insensitivity of momentary mood to non-risky rewards in adolescent suicidal patients".

      Thank you for this suggestion. We have now corrected it.

      (2) Abstract: in line 61, "Our study uncovers the cognitive and affective mechanisms" suggests that these are the only ones, and you uncovered them. Of course, there could be more mechanisms contributing to risk behavior in STB, so I would suggest removing the word "the" or adding "one of the".

      Thank you for this suggestion. We have now corrected it.

      (3) One major weakness of this study is that suicidal thoughts and behaviors were not assessed via a clinical instrument such as the Columbia Suicide Severity Rating Scale - this should be mentioned upfront.

      Thank you for this comment. Based on medical records and on information gathered from family and friends by the researcher and psychiatrists, patients with suicidal thoughts and behaviors were assigned to the suicidal group (S<sup>+</sup>), while patients without suicidal thoughts and behaviors were assigned to the control group (S<sup>-</sup>). Note that the medical records and information came from clinical interviews in which the psychiatrists were vigilant for signs of suicidal ideation and asked both the patients and their families about suicide-related thoughts and behaviors. Therefore, the current grouping procedure is possibly comparable to the Columbia Suicide Severity Rating Scale.

      (4) Table 1: female/male are sex, not gender (gender is man/woman/transgender/non-binary).

      Thank you for this suggestion. We have now corrected it.

      (5) Equation 1: It would be good to clarify what happens in gain-only or loss-only trials (the other value is then 0, but this can be clarified as it is not technically a loss or a gain).

      Thank you for this suggestion. We have now corrected it. Please see below for our revision:

      Page 12:

      “Please note that V<sub>loss</sub> is 0 in gain trials and V<sub>gain</sub> is 0 in loss trials.”

      (6) Figure 1E: The model prediction is not informative here. Given the linear regression model, there is no other option except that the mean prediction would overlap with the mean empirical measurement (unless the model was specified incorrectly). The same is true in Figure 2A.

      Thank you for this suggestion. We have now removed plots for model prediction.

      (7) Figure 1G: There was no analysis of the differences between groups in terms of earnings, given that the ANOVA was not significant. Still, if the claim is that risky behavior is sometimes suboptimal in this task, it would be good to show that there is a correlation between, say, symptoms of STB across groups and 1) risky behavior and 2) earnings.

      Thank you for this insightful comment. In the patient cohort, risky behavior (gambling rate)—but not earnings—predicted the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048; earnings, β = 0.001, t = 0.582, p = 0.562). The lack of association for earnings is consistent with the task design, in which there is no stable optimal policy and payouts are only a coarse proxy for decision quality. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB. We have clarified this point below:

      Page 32:

      “Second, although we assumed that increased risky behavior in STB was suboptimal, the current task was not suited to test this, given the task design of random feedback for gambling option. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB.”

      (8) Line 290: "beta_gain: -1-1" is unclear. I believe you meant beta_gain \in [-1,1].

      Thank you for this suggestion. We have now corrected it to make it clear.

      (9) The gain and loss biases are modeled as minimum and maximum probabilities for choosing the gamble. This is a legitimate choice for value-agnostic biases, but it is not the traditional choice (as far as I know). I wonder if the same results would hold with the more traditional formulation of the bias as an added constant to the utility of the gamble, i.e., p(gamble) = 1/(1+ exp(-mu(U_gamble + beta_gain - U_certain)). I believe in this case, you would also not have to specify different equations for positive or negative biases, or to limit the bias to the range of [-1,1] (indeed, the bias would be in reward-equivalent units).

      Thank you for this suggestion. The winning choice model we used here was consistent with previous literature (Rutledge et al., 2015 & 2016), which decomposed the decision process into risk-attitude-driven valuation (e.g., loss and risk aversion) and value-insensitive motivational components. These approach/avoidance parameters are a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference.

      As suggested, we also compared the traditional bias choice model. Model comparison did not support this. Please see our revision below:

      Supplementary Page 4:

      “We also considered the traditional bias parameter (cM4), rather than approach/avoidance parameters. We limited the bias to the range of [-100, 100], which was in reward-equivalent units.

      However, model comparison did not support cM4 (Table S6).”
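
      The difference between the two bias formulations discussed in this point can be sketched as follows. The additive form follows the equation written by the reviewer. The bounded form is an assumption based only on the reviewer's description of the biases as "minimum and maximum probabilities for choosing the gamble"; the manuscript's equations are not reproduced in this response, so the function names and the exact functional form are illustrative.

```python
import math


def softmax_gamble(u_gamble, u_certain, mu):
    """Logistic choice rule on the utility difference, with inverse temperature mu."""
    return 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))


def p_gamble_additive(u_gamble, u_certain, mu, bias):
    """Reviewer's formulation: a constant bias added to the gamble utility
    (bias is in reward-equivalent units)."""
    return softmax_gamble(u_gamble + bias, u_certain, mu)


def p_gamble_bounded(u_gamble, u_certain, mu, beta):
    """Assumed value-insensitive formulation with beta in [-1, 1].

    beta >= 0 sets a floor on P(gamble) (approach);
    beta < 0 sets a ceiling on P(gamble) (avoidance).
    """
    p = softmax_gamble(u_gamble, u_certain, mu)
    if beta >= 0:
        return beta + (1.0 - beta) * p
    return (1.0 + beta) * p


# With a very unattractive gamble, the additive bias is overwhelmed by the
# value difference, whereas the bounded bias keeps P(gamble) near beta.
p_add = p_gamble_additive(u_gamble=-50.0, u_certain=0.0, mu=1.0, bias=0.2)
p_bnd = p_gamble_bounded(u_gamble=-50.0, u_certain=0.0, mu=1.0, beta=0.2)
```

      The design difference is that the additive bias shifts the indifference point and can always be outweighed by a large enough value difference, while the bounded bias changes the probability of gambling regardless of the values at stake.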

      (10) Also, for equations 5-8, it seems that 5-6 are identical to 7-8 except for the use of beta_gain versus beta_loss. You might want to consider simplifying by putting beta in the equations and specifying in the text that, depending on the trial type (loss or gain), the relevant beta is used.

      Thank you for this suggestion. We have now simplified it. Please see response to Reviewer 2, point 3.

      (11) It is not clear what equations are applied to mixed trials in cM3.

      Sorry for the confusion. We have now clarified this point.

      Page 12:

      “Approach/avoidance parameters are not applied in mixed trials.”

      (12) Model comparison: the mood models are nested within each other (e.g., mM3 can be derived from mM1 by setting beta_EV = beta_RPE). In this case, model comparison can use the likelihood ratio test instead of BIC, which can be too conservative (and therefore does not support the extra beta parameter for RPE, different from previous results in the literature). I wonder if a likelihood ratio test would lead to results more in line with previous findings with this task?

      Thank you for this suggestion. We agree that mM1 (CR+EV+RPE) and mM3 (CR+GR) are nested. However, our model space also included non-nested models, such as mM5 (CR+GR<sub>better</sub>+GR<sub>worse</sub>). Likelihood ratio tests therefore could not be applied across our full model space.
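
      For the single nested pair mentioned above, the reviewer's point that BIC can be more conservative than a likelihood ratio test can be illustrated with a short sketch. The log-likelihoods, parameter counts, and number of observations below are invented for illustration and are not taken from the study; the test assumes one extra free parameter (df = 1), matching the single constraint beta_EV = beta_RPE.

```python
import math


def lrt_pvalue_df1(loglik_restricted, loglik_full):
    """p-value of a likelihood ratio test with one extra free parameter.

    Uses the chi-square (df = 1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical fits: the full model improves the log-likelihood by 2.5.
ll_restricted, ll_full = -100.0, -97.5
n_obs = 200
p_value = lrt_pvalue_df1(ll_restricted, ll_full)
bic_restricted = bic(ll_restricted, n_params=3, n_obs=n_obs)
bic_full = bic(ll_full, n_params=4, n_obs=n_obs)
```

      With these made-up numbers the likelihood ratio test favors the extra parameter (p < 0.05) while BIC still prefers the restricted model, which is the kind of disagreement the reviewer had in mind. The test is only valid for nested pairs, so it cannot rank non-nested models such as mM5 against the others.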

      (13) Line 346: The replication sample is described as "healthy participants," however, their health (or mental health) status was not assessed, and they may as well have mental health concerns. I would suggest calling this a general sample or an undifferentiated sample - but not a healthy sample.

      Sorry for the confusion. We have now corrected this phrase.

      (14) Line 363: "in addition to the replication of previous findings in the validation dataset" is unclear. Are those tests not two-tailed?

      Sorry for the unclear statement. In the replication analyses, we used one-tailed t-tests because the direction of the effect had already been established in the clinical dataset. Please see our clarification below:

      Page 15:

      “For the replication of previous findings in the validation dataset, we used one-tailed tests in line with our clinically motivated directional hypothesis.”

      (15) Line 372: "validating our group manipulation" - the presented work does not have a manipulation. Maybe you meant "validating our grouping of participants"?

      Thank you for this suggestion. We have now corrected it to make it clear.

      (16) Figure 2B: It is not clear how the data were binned for illustration purposes only, and why this binning is necessary (I have not seen it in other papers) - presenting the data from each subject and the correlation line with error margins (as is done here) should be sufficient.

      Thank you for flagging this. For illustration only, we binned the data proportional to group sizes: in the patient sample (S<sup>-</sup> n = 25; S<sup>+</sup> n = 58; ≈1:2), we displayed 3 bins for S<sup>-</sup> and 6 bins for S<sup>+</sup>. We agree that binning is not necessary; all statistics were computed on raw, unbinned data. The binned panel was included solely for visualization, consistent with our prior work (Blain et al., 2023).

      (17) Table 2: delta BIC should be presented per subject (that is, divided by the number of subjects in each group), as the groups are of different sizes, so as presented now, the columns are not comparable across groups.

      Thank you for the helpful suggestion. Our goal in Table 2 is not to compare ΔBIC magnitudes across groups, but to identify the winning model within each group. The ΔBICs are aggregated at the group level solely to rank models for that group. Dividing by the number of participants would rescale each group’s column by a constant and would therefore not affect the within-group ranking or the conclusion that cM3 is the best model in all groups. For this reason, we retain the current presentation and interpret each column within group rather than across groups.

      (18) Line 640 - the effect of expectations and prediction errors on mood was not only shown in healthy people, but also in people with depression (Rutledge et al., 2017, https://pubmed.ncbi.nlm.nih.gov/28678984/)

      Thank you for this comment. Indeed, Rutledge et al. (2017) showed evidence for the CR+EV+RPE mood model in adults with depression. However, our study recruited adolescents with depression or anxiety, given that adolescence might provide a developmental window for early intervention in suicidality. Therefore, it is also possible that the current winning model was specific to adolescents. Please see our clarifications below:

      Page 28:

      “It is also possible that the current winning model was specific to adolescents. Given that Rutledge et al. (2017) supported the “CR+EV+RPE model” in adults with depression, our study with adolescent populations may suggest a developmental change in mood sensitivities.”

      (19) Supplemental material: Is the R2 section about R-squared? Perhaps you can use superscript on the 2 to make that clearer? For Figure S2, how was model recovery determined? Should I interpret the confusion matrix as suggesting that the winning model for each and every simulated subject was the generating model, or was the winning model determined for the whole simulated population in each of the 100 simulations? Traditionally, confusion matrices use the former measure, but the results of 100% recoverability make me suspect the latter was used here. In Figure S3, should we not be looking at simulated parameters and recovered parameters? What are "real parameters" here?

      Thank you for these important comments. We now consistently denote the coefficient of determination as R<sup>2</sup> (with a superscript 2) throughout the manuscript and Supplementary Materials.

      For the model recovery analysis in Figure S2, we have clarified that the confusion matrix is computed at the population level. Specifically, for each of the 100 simulations we generated a full dataset under each candidate model, fit all models to that dataset, and selected the winning model based on group-level model evidence (BIC). Each cell in the confusion matrix therefore reflects the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. This operation was reasonable because the decision of the winning model is made on the population-level dataset rather than on individual subjects.

      In Figure S3, the term “real parameters” referred to the parameters used to generate the simulated data. To avoid confusion, we now relabel these as “simulated (generating) parameters” and explicitly describe the figure as showing the relationship between simulated (generating) parameters and recovered parameters. Please see our revisions below:

      Supplementary Pages 2-3:

      “Model recovery: We generated 100 simulated datasets for each model (3 choice models and 8 mood models) using the fitted parameters of each model as the ground truth. Each dataset contained 201 trials and included 3 (or 8) sets of simulated data corresponding to the respective models. For each simulated dataset, we then fit all models and determined the winning model at the population level based on group-level BIC, yielding a confusion matrix in which each entry represents the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. As shown in Figure S2, all models are highly identifiable, indicating excellent recovery performance for both the choice and mood models.”

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all simulated (“generating”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
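
      The population-level model recovery procedure described above can be sketched as follows. This is a minimal illustration with invented BIC values for two models, two simulations, and three simulated subjects; the nested-list layout and function names are assumptions, not the authors' implementation.

```python
def winning_model(bic_by_subject):
    """Index of the model with the lowest BIC summed over subjects.

    bic_by_subject[s][m] is the BIC of fitted model m for simulated subject s.
    """
    n_models = len(bic_by_subject[0])
    totals = [sum(row[m] for row in bic_by_subject) for m in range(n_models)]
    return min(range(n_models), key=totals.__getitem__)


def confusion_matrix(bic, n_models):
    """Population-level confusion matrix.

    bic[i][r] is the per-subject BIC table for simulation r generated by model i.
    Entry [i][j] is the proportion of simulations generated by model i
    in which model j had the lowest group-level BIC.
    """
    matrix = []
    for simulations in bic:
        counts = [0] * n_models
        for table in simulations:
            counts[winning_model(table)] += 1
        matrix.append([c / len(simulations) for c in counts])
    return matrix


# Hypothetical BIC values: 2 generating models x 2 simulations x 3 subjects x 2 fitted models.
toy_bic = [
    [[[10, 12], [11, 13], [14, 12]], [[9, 15], [10, 10], [12, 11]]],
    [[[20, 12], [15, 13], [12, 14]], [[13, 12], [14, 10], [9, 11]]],
]
result = confusion_matrix(toy_bic, n_models=2)
```

      Note that in the first toy simulation one subject is individually better fit by the wrong model, yet the summed BIC still selects the generating model. This is why a population-level matrix can show perfect recovery even when subject-level recovery would not.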

      Typos:

      (1) Line 90: original → originate

      (2) Line 596-598 - the same phrase is repeated twice.

      (3) Line 616: on the other word → hand.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      For people unfamiliar with interpersonal theory or motivational-volitional model, or three-step theory (lines 105-106), could you briefly explain the key idea of mood and suicide before going to the decision-making tasks? And from this, maybe motivate the predictions in your task? In particular, in the abstract and introduction, the phrasing could be a bit more concise and simpler. In the abstract, sentences were sometimes quite long. In the introduction, some paragraphs are somewhat repetitive. In the discussion, there were some typos.

      Thank you for these suggestions. We have now explained the key idea of mood and suicide before going to the decision-making tasks in the introduction, which can be seen below:

      Pages 4-5:

      “Contemporary theories of suicide converge on the idea that STB is initially caused by low mood experience. The interpersonal theory of suicide proposes that suicidal desire arises when people simultaneously feel socially disconnected (“thwarted belongingness”) and like a burden on others (“perceived burdensomeness”), experiences that are tightly linked to chronically low mood(25). The motivational–volitional model(26) and the three-step theory(27,28) similarly emphasize that when negative mood and feelings of defeat or entrapment are experienced as inescapable, they can give rise to suicidal ideation, and that the progression from ideation to suicide attempts depends on additional factors such as reduced fear of death, increased pain tolerance, and a tendency to act impulsively under intense affect. Some official organizations, e.g., National Institute of Mental Health, have also listed mood problems as warning signals(8). Interestingly, within the framework of decision making under uncertainty, gambling on lotteries with a revealed outcome has been found to induce high mood variance(29), providing an opportunity to assess the relationship between deficient mood and increased gambling decisions in STB.”

      We have also refined the wording and corrected typos throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Since many readers might only read the abstract, it is important that it is both informative and accurate. I have two suggestions in this respect. First, for the abstract to be more informative, it may be helpful to indicate already there that these are value-insensitive approach-avoidance parameters, in the sense that they favor/disfavor the gamble regardless of the potential outcomes' magnitude or probability. This issue is also present throughout the text, where the phrases "approach and avoidance motivation" are referred to as if they have established and precise computational definitions. In my view, these terms could just as easily be interpreted as parameters that multiply the value of potential gains or losses, which is not what the authors mean. It would be helpful to clarify this terminology.

      Thank you for these suggestions. In line with previous literature (Rutledge et al., 2015 & 2016), approach and avoidance motivation are indeed defined at the computational level, referring to a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. We have cited these papers in the manuscript. We have also clarified the approach and avoidance parameters in the abstract and introduction. Please see our revisions below:

      Page 2 (Abstract):

      “Using a prospect theory model enhanced with value-insensitive approach-avoidance parameters revealed that this rise in risky behavior resulted only from a heightened approach parameter in S<sup>+</sup>. Altogether, model-based choice data analysis indicated dysfunction in the approach system in S<sup>+</sup>, leading to greater propensity for gambling in the gain domain regardless of the lottery expected value.”

      Page 3 (Introduction):

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”
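
      To make the term "value-insensitive" concrete, the following is a minimal sketch of one common way such a bias can be added to a prospect-theory choice rule. The functional form, the function names, and all default parameter values are assumptions made for illustration and are not taken from the manuscript; separate bias values could be used for gain and loss trials.

```python
import math


def subjective_value(x, alpha_gain=0.8, alpha_loss=0.8, loss_aversion=1.5):
    """Prospect-theory utility (hypothetical defaults): curved for gains,
    curved and scaled by loss aversion for losses."""
    if x >= 0:
        return x ** alpha_gain
    return -loss_aversion * (-x) ** alpha_loss


def p_gamble(certain, win, lose, mu=1.0, beta=0.0, **pt_params):
    """Probability of choosing a 50/50 gamble over a certain amount.

    beta is a value-insensitive bias in [-1, 1] (an assumed parameterization):
    beta > 0 raises the floor of the choice probability (approach),
    beta < 0 lowers its ceiling (avoidance).
    """
    u_gamble = 0.5 * subjective_value(win, **pt_params) \
        + 0.5 * subjective_value(lose, **pt_params)
    u_certain = subjective_value(certain, **pt_params)
    # Value-sensitive part: softmax on the utility difference.
    softmax = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    # Value-insensitive part: shifts choice regardless of the utilities.
    if beta >= 0:
        return (1.0 - beta) * softmax + beta
    return (1.0 + beta) * softmax
```

      Under this form, a positive bias keeps the gamble probability at or above beta even when the gamble is far worse than the certain option, which is the sense in which the parameter does not multiply the value of potential gains or losses.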

      (2) The statement "our study uncovers the cognitive and affective mechanisms contributing to increased risk behavior in STB" is overstating the findings, as the study may have uncovered some contributing mechanisms, but likely not all of them. Removing the word "the" would fix this issue.

      Thank you for this suggestion. We have now corrected it.

      (3) Since mood is typically defined as lasting hours, it's inappropriate to refer to ratings that only reflect the last few trials as self-reports of mood. To be sure, I view the distinction between emotions and moods as quantitative, not qualitative, so I do not think there is a problem studying the former to understand the latter, but to avoid confusion, the terminology should follow common usage.

      Thank you for this suggestion. We follow previous work and operational definitions regarding mood (Rutledge et al., 2014, Eldar & Niv, 2015, Vinckier et al., 2018). Emotion is usually a very brief response to a specific stimulus (Emanuel & Eldar, 2023), e.g., leading to rapid changes like surprise then fear. In contrast, mood is defined as a diffuse state that is not specific to one stimulus. Here, we operationally and computationally define mood as an affective state reflecting the recent history of safe and gamble outcomes. We now clarify that point in the main text. Please see our revision below:

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale of the laboratory setting, represents the accumulation of the impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is demonstrated, e.g., through its association with depression symptoms(37). Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear)(31-33,39).”
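
      The idea that momentary mood accumulates the impact of recent events can be sketched as an exponentially discounted sum over past trials, in the spirit of the cited momentary-mood models. The weights, the decay rate, and the dictionary keys below are hypothetical placeholders, not fitted values or the authors' implementation.

```python
def momentary_mood(trials, w0=0.5, w_cr=0.1, w_ev=0.1, w_rpe=0.2, gamma=0.7):
    """Mood after the latest trial as a recency-weighted sum of past events.

    trials: list of dicts ordered oldest to newest, each with keys
      'cr'  (certain reward received, 0 if the gamble was chosen),
      'ev'  (expected value of a chosen gamble, 0 otherwise),
      'rpe' (reward prediction error of a chosen gamble, 0 otherwise).
    gamma in [0, 1] controls how quickly older trials are forgotten.
    """
    t = len(trials)
    mood = w0  # baseline mood
    for j, trial in enumerate(trials, start=1):
        decay = gamma ** (t - j)  # the most recent trial has decay 1
        mood += decay * (w_cr * trial["cr"]
                         + w_ev * trial["ev"]
                         + w_rpe * trial["rpe"])
    return mood
```

      In a model of this kind, a group difference in mood sensitivity to certain rewards would correspond to a difference in the weight on the certain-reward term.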

      (4) Line 78: The phrases "increase in risk attitude", "decrease in loss attitude", and "decrease in value-independent choice biases" are unclear to me in terms of their directionality. An attitude might be avoidant or embracing. If it is the former then increasing it would decrease risk-taking.

      Thank you for pointing out the ambiguity. We have now corrected them throughout the manuscript. Please see our revision below:

      Page 4:

      “We therefore hypothesized that heightened approach motivation, or weakened avoidance motivation, would account for increased risk behavior in STB.”

      (5) Line 125: I was not sure why one would expect the mood response to gamble-related quantities (EV and RPE) to be lower in STB and not higher.

      Sorry for the typo. We hypothesized that mood would respond more strongly to gambling-related quantities—expected value (EV) and reward prediction error (RPE)—in adolescents with STB than in controls, given prior evidence that STB is associated with greater risk-taking.

      (6) The text could use proofreading, as there are many typos. These are from the first 100 lines alone:

      a) Abstract: regardless the lotteries -> regardless of the lotteries'.

      b) Line 78: it remains whether.

      c) Line 80: can each -> each can.

      d) Line 90: may original from.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      (7) The rationale for focusing on the S+ group for mood model comparison is incorrect. The purpose is to identify parameters that vary as a function of suicidality, and for that, the S- group is just as important.

      Thank you for this comment. We agree that the S<sup>-</sup> group is as important as the S<sup>+</sup> group. A direct comparison was complicated because the winning mood models differed (S<sup>+</sup>: mM3; S<sup>-</sup>: mM5; Table 3). To ensure comparability, we checked results from both model specifications (mM3 and mM5). The conclusions were convergent: mood sensitivity to certain rewards (CR) was lower in S<sup>+</sup> than in S<sup>-</sup> (see Fig. 3 for mM3 and Fig. S8 for mM5).

      (8) There appears to be a contradiction between the inclusion criteria, which include having experienced suicidal thoughts and behaviors, and the definition of the S- group as not having suicidality.

      Thank you for pointing out this mistake. The corrected version of inclusion criteria can be seen on Page 7:

      “Patients were included if they met the following criteria: 1) both the researcher and psychiatrists agreed on their group classification; 2) they had a current diagnosis of major depressive disorder (MDD; unipolar depression), generalized anxiety disorder (GAD), or bipolar disorder with depressive episodes (BD), confirmed by two experienced psychiatrists using the Structured Clinical Interview for DSM-IV-TR-Patient Edition (SCID-P, 2/2001 revision; see Supplementary Note 1 for details); 3) they were between 10 and 19 years of age; 4) they had no organic brain disorders, intellectual disability, or head trauma; 5) they had no history of substance abuse; 6) they had no experience of electroconvulsive therapy.”

      (9) It would be helpful to specify whether mood modeling was based on objective or subjective values, and why.

      Thank you for this helpful suggestion. We have now clarified whether mood modeling was based on objective or subjective values, and why. Specifically, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000). Based on this result and for parsimony, we report and interpret the mood modeling results from the objective-value family in the main text. We have clarified this point below:

      Supplement Pages 4-5:

      “Supplementary Note 9: Mood model comparison using subjective values.

      To identify whether mood modeling was based on objective or subjective values, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox (Daunizeau et al., 2014) to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000).”
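
      For readers unfamiliar with family-level exceedance probabilities, the sketch below illustrates what the quantity means. It assumes that Dirichlet parameters over the individual models have already been estimated (for example, by a random-effects model comparison routine) and only approximates, by sampling, the probability that each family is the most frequent one. It does not reproduce the VBA toolbox computation, and the function name and inputs are hypothetical.

```python
import random


def family_exceedance_probability(alpha, families, n_samples=20000, seed=0):
    """Monte Carlo estimate of each family's exceedance probability.

    alpha: Dirichlet parameters over models (assumed already estimated).
    families: list of lists of model indices, one list per family.
    """
    rng = random.Random(seed)
    wins = [0] * len(families)
    for _ in range(n_samples):
        # A Dirichlet draw is a set of independent gamma draws, normalized.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        freqs = [d / total for d in draws]
        # A family's frequency is the sum of its member models' frequencies.
        family_freq = [sum(freqs[i] for i in fam) for fam in families]
        wins[family_freq.index(max(family_freq))] += 1
    return [w / n_samples for w in wins]
```

      With two families of 8 models each, `families` would be `[list(range(8)), list(range(8, 16))]`; an exceedance probability near 1 for one family indicates that almost all posterior mass favors it.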

    1. eLife Assessment

      Using single-cell transcriptomic data from mouse inner ear hair cells, the authors compare for the first time gene expression across the four recognized hair cell types in adults, generating information fundamental to understanding hair cell relationships between the ancient vestibular compartment and the more recent cochlea. Among observed differences, compelling evidence is provided for the expression in vestibular hair cells but not cochlear hair cells of certain ciliary motility-related genes, suggesting that the kinocilium of vestibular hair cells may function as an active force generator to increase sensitivity.

    2. Reviewer #1 (Public review):

      Summary

      From transcriptomic comparisons of adult mouse cochlear and vestibular hair cells, Xu et al. provide a broad and well-organized overview of differences across 4 established hair cell types (2 cochlear and 2 vestibular). They go on to demonstrate the power of such analyses to provide functional insights by focusing on the differentiated expression of ciliary genes, building to the hypothesis that kinociliary motility occurs in adult vestibular hair cells.

      Background

      Cilia are prominent in sensory receptors, including vertebrate photoreceptors, olfactory neurons and mechanosensitive hair cells of the inner ear and lateral line. Cilia can be motile or nonmotile depending on their axonemal structure: motile cilia require dynein and the inner 2 singlet microtubules of the 9+2 array. Primary cilia, present early in development, are considered to have sensory functions and to be nonmotile (Mill et al., Nature Rev Gen 2023).

      In hair cells, the kinocilium anchors and polarizes the mechanosensitive hair bundle of specialized microvilli. The kinocilium matures from the primary cilium of a newborn hair cell; behind it the bundle of mechanosensory microvilli rises in a descending staircase of rows. During maturation of the mammalian cochlea, all hair cells lose the kinocilium, though not the associated basal body. The consensus for many years has been that most vertebrate kinocilia, and especially mammalian kinocilia, are nonmotile, based largely on the lack of spontaneous motility in excised mammalian vestibular organs, but also on the impression that the rare examples of spontaneous beating motility even in non-mammalian hair cells are associated with deterioration of the preparation (Rüsch & Thurm 1990).

      Strengths

      In comparing RNA expression across the 4 major types of mouse hair cells - 2 cochlear and 2 vestibular - Xu et al. provide rich data sets for exploration of structure-function differences between these highly specialized cell types. The revised paper significantly improves the organization, interpretation and readability of the presentation of overall findings. smFISH and immuno-staining back up key RNA data, and comparisons are made with published data.

      The ciliary motility focus of the rest of the paper is creative and highly interesting. The authors curated the ciliary genes into types associated with different aspects of beating motility, and also investigated the expression of genes typical of primary cilia, which are considered to have sensory and cell signaling functions and to be nonmotile. Their data justify suggesting a role for kinociliary motility (or force generation) in adult mammalian vestibular hair cells, in opposition to a long-held assumption. The results should stimulate investigation of the implications for mechanosensitivity.

      Weaknesses

      Data

      Functional data on kinocilia motility: The technical difficulty in making such measurements in small mouse hair bundles led the authors to work with bullfrog crista bundles. Though not extensively studied here, the ciliary motility shown is convincing. Mouse hair bundle motions are also shown, but the evidence connecting the data to kinociliary motion is more suggestive than convincing. But the authors are not dogmatic about these data, and it is reasonable to show them.

      Interpretation

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rüsch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear what role kinociliary motility would play in mature hair bundles. The authors have added a discussion of this question in the revision.

      An underlying rationale for the hypothesis that ciliary motility manifests in mammalian vestibular hair cells seems to rest on the presence of the necessary mRNA and its contrasting absence in cochlear hair cells. Another way to look at this difference could be that evolution acted on cochlear hair cells to shed kinocilia as one of many changes to improve mechanosensitivity at much higher sound frequencies. In vestibular hair cells, kinociliary motion might be useful to enhance mechanostimulation in the developing vestibule (as suggested in this revision) and not so active in maturity. Nevertheless, with their scholarly analysis of the expression of ciliary genes, the authors make a significant argument for further investigation of when and why hair cell kinocilia show active motility.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and, later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      Comments on revision:

      I am satisfied with the revision; the authors made an effort to incorporate the changes requested.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      (a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, a transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration, with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia of guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that the dual identity of primary and motile cilia is not based on the TEM images alone. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we have moderated our interpretation accordingly in the revision.

      (b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we have marked such cytoplasmic or multifunctional genes with red asterisks in both Fig. 5G and Fig. 6D in the revised manuscript.

      Our comparison showed that key genes for the motile machinery are not detected in cochlear hair cells. For example, Dnah6 and Dnah5 are not expressed in P2 cochlear hair cells. Dnah6 and Dnah5 encode axonemal dyneins that are part of the inner and outer dynein arms. Importantly, we also did not detect expression of CCDC39 and CCDC40, the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome, in P2 cochlear hair cells. We have revised the text to emphasize these key differences.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rüsch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in the early zebrafish otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. According to Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such an autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We have revised the discussion to address this important point. Specifically, we emphasize that our observations of ciliary beating under ex vivo conditions may not reflect its properties in the mature in vivo context, but may rather be a byproduct of the motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approaches will be needed to reveal the unexplored role(s) of kinocilia in vertebrate vestibular hair cells.

      We note that spontaneous activity exists throughout the nervous system, where it helps maintain baseline activity and interpret signals. Retinal cells are spontaneously active even in the dark, and spiral ganglion neurons also fire spontaneously. Spontaneous hair bundle motion driven by a mechanotransduction-related mechanism has been observed in bullfrog saccular hair cells. It is therefore unlikely that spontaneous kinocilia beating would interfere with generating temporally faithful representations.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous neural activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We have revised the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations for the authors:

      (1) Figure 1 - Explain how hair cell types are recognized after dissociation. Figure 1 will not be clear in this regard for non-aficionados. Some of the dissociated cells shown appear quite distorted and even unhealthy - e.g., the bottom right crista type II hair cell; the second from left crista type I hair cell; can you address why this doesn't matter for the purposes of this study?

      HC types in Fig. 1C were identified based on their morphological features: type I HCs are flask-shaped with a narrow neck, while type II HCs are cylindrical and short. We have replaced those cells with new images. In our study, HCs were identified based on their marker genes. Although some distorted HCs, such as those shown in Fig. 3C, were impossible to avoid during preparation of single cells for the library (most studies do not examine cell morphology), the quality of mRNA and sequencing was high, better than that of datasets published in previous studies.

      (2) Line 98 - Explain accessory cells (as opposed to supporting cells).

      We changed “accessory cells” to “other cell types”.

      (3) Line 246 - The primary cilium is...

      Changed.

      (4) Figure 6D - The scale bar is missing. Please use arrows to point to the genes you call out in the text. Also, the genes called out in the text as differently expressed (line 342) are quite faint bands in both cell types. It would be a service to the reader to point them out in the panel.

      A scale bar has been added. We also marked those genes as suggested and edited the text accordingly.

      (5) Figure 7 - mixes frog crista and mouse middle ear images with waveforms and FFTs from frog crista, mouse middle ear, and mouse crista. Related to these still images are 2 videos of frog kinocilium beating (2 hair cells). The mouse images must be underwhelming, or we would have been shown those, yet they were considered adequate to analyze.

      Yes, the spontaneous kinocilia motion of mouse crista HCs is very small. The peak motion is about 40 nm, which is very close to the resolution of our camera. That is why we used a photodiode technique to detect the motion. The photodiode is more sensitive, and this technique allows us to observe the dynamic response waveform.

      (6) I recommend labeling each figure panel with the tissue of origin to avoid confusion.

      Labeled as suggested.

      (7) I suggest dropping the mouse middle ear data, as they are not directly adequate as a positive control (or no more so than the more beautiful frog data).

      We have kept the waveforms of middle ear cilia movement in Fig. 7. The main reason is that we would like to show the difference in magnitude between airway cilia and kinocilia. The kinocilia movement was at least an order of magnitude smaller than the movement of airway cilia. This prompted our effort to generate a model to predict the 96-nm modular repeat and explain why kinocilia movement in mice is much smaller than that of airway cilia and bullfrog kinocilia.

      (8) Focus on the hair bundle motions:

      (a) Show the waveforms for the frog crista hair cells and their FFTs.

      These images were captured many years ago using a camera. The kinocilia motion is between 5 and 10 Hz. We did not present waveforms of kinocilia motion because we no longer have access to bullfrogs. Nevertheless, the videos provide a powerful visualization of kinocilia beating in bullfrog saccular HCs.

      (b) Find some way to show us how you measured the mouse hair bundle beating.

      A photodiode technique was used to measure spontaneous kinocilia motion in mice. More details are now included in the text.
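
      As an illustration of how a beat frequency and amplitude can be read from a displacement waveform such as a photodiode trace, the sketch below applies a plain discrete Fourier transform and reports the largest non-DC component. The function name, the sampling rate, and the synthetic signal in the usage note are assumptions for illustration only and are not the authors' acquisition settings or data.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Return (frequency_hz, amplitude) of the largest non-DC component.

    Uses a direct DFT, which is slow but dependency-free; the amplitude is
    in the same units as the input (e.g. nm of displacement).
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    best_k, best_mag = 0, 0.0
    # Scan positive-frequency bins below the Nyquist bin.
    for k in range(1, (n + 1) // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n, 2.0 * best_mag / n
```

      For example, a synthetic 2 s trace sampled at 200 Hz, built as `[40 * math.sin(2 * math.pi * 8 * i / 200) for i in range(400)]`, can be passed to `dominant_frequency(trace, 200)` to recover its oscillation frequency and amplitude. Real traces are noisier, so averaging spectra across segments would usually be needed.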

      (c) Does EGTA break links between kinocilium and stereocilia? (Could that contribute to the higher beat frequency?) Just applying the same treatment and viewing from above could clarify whether kinocilia dissociate from stereocilia rows. This would likely be more straightforward with an otolith organ.

      All these links (tip links, side links) are sensitive to Ca<sup>2+</sup> concentration, and Ca<sup>2+</sup>-free medium is often used to break them, as shown in many previous studies. Breaking the kinocilial links reduces the load on the kinocilia, which may result in larger kinociliary motion. The frequency is inherent to the motile machinery and depends on temperature and intracellular ATP concentration. When facing upward, the hair bundles in otolith organs do not have good contrast against the HCs in the background. This makes measurement of their motion difficult, especially when the motion is small and random and cannot be averaged to improve the signal-to-noise ratio. Besides, unlike cochlear HCs, whose hair bundles are short and can easily be oriented parallel to the light path, the long hair bundles of vestibular HCs are more difficult to orient and image. For these reasons, we chose to use crista hair bundles for our measurements, since they can be oriented perpendicular to the light path without interference from background HCs. The lateral motion of the entire bundle is also relatively easy to measure in this preparation.

      (6) Is there no reason to cite McInturff et al. (2018), given that they compared type I and II VHC transcriptomes at P12 and P100? This database is also available on gEAR.

      Their studies are now cited. We also compared their datasets with ours.

      (7) Line 374 - Eatock et al., 1998 citation does not work for this purpose. Eatock & Songer (2011) would be better, or Li, Xue, Peterson (2008): mouse utricle anatomy; significant discussion of relative heights of kinocilia and tallest stereocilia.

      Changed and cited.

      (8) In Figure 3, 2 of the 18 panels in B are missing labels.

      The bar, applied to all panels, was there at the bottom of Fig. 3B. The bar is bigger and more visible in the revision.

      (9) Line 187: should "Sppl1" be Spp1?

      Corrected.

      (10) Define BBSome on line 244.

      Added.

      (11) Looking at Figure 5, it seems that all the motile genes are expressed in the vestibular hair cells and not the cochlear hair cells. It is surprising that there are any cilia-related genes expressed in these adult cochlear hair cells, given that they do not retain their cilia into adulthood. Could the authors make a comment on this finding in the discussion? Also, are there any ciliopathies that show a vestibular defect but normal hearing in mice or humans? Have you compared the cilia-related gene expression in neonatal/embryonic vestibular hair cells to your dataset?

      Many kinocilia-related genes are still expressed in adult cochlear HCs, and this is not surprising. Most of these genes are related to primary cilia structure, including the basal body and transporters in cilia. The basal body is still present in cochlear HCs. Many other primary cilia-related proteins are also expressed in the soma, especially those related to signal transduction, microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolic enzymes, protein folding, translation, nuclear transport, ubiquitination, RNA binding, mitochondrial proteins and transcription factors. Of course, some of them are vestigial. We added discussion of this in the text. A comparison between neonatal cochlear and vestibular HCs was presented in Fig. 6D. We compared those genes related to the axonemal repeat (96 nm repeat complex). Due to the quality of mRNA, the total genes and the genes related to kinocilia detected in previous developmental studies were far fewer than in our datasets. While we detected 112 out of 128 genes related to the axonemal repeat, only 90 genes were detected in previous studies (Burns et al., 2015; McInturff et al., 2018). Therefore, we only compared neonatal cochlear and vestibular HCs using their datasets. As far as we know, no ciliopathies with vestibular defects but normal hearing have been reported in mice or humans. However, we plan to use a Ccdc39 mutant mouse model to examine how loss of function of a key motile cilia signature gene would affect kinocilia motility and vestibular function.

      (12) How is "expression level" in the violin plots being calculated? Is this a measure of read count? The normalization is cursorily explained in the methods. Is this value comparable across genes? Did the authors switch to z-score by Figure 6?

      We dissected the auditory and vestibular sensory epithelia from the same groups of mice and prepared libraries and sequenced them at the same time. All parameters are the same. The violin plots are based on values presented in Supplementary Table 1. Each dot in the plot reflects an aggregated number of reads across all cells for each gene. They are all normalized across different HC types and biological repeats. The details for normalization are now provided.

      (13) The authors comment on the 16/128 motile cilia axonemal repeat genes that are not expressed in the vestibular hair cells. Listing these somewhere may be helpful to the readers.

      We thank the reviewer for this helpful suggestion. Most of the 128 motile cilia axonemal repeat genes were listed in Figs 8C and S5, along with known loss-of-function mutations and ciliopathy associations identified in human diseases or observed in animal models. To improve clarity, we have now included Table S2, which provides the complete list of all 128 motile cilia axonemal repeat genes, including those not expressed in vestibular HCs.

      (14) Figure 5D needs some refinement. While the authors used databases, including CiliaCarta, SYSCILIA gold standard, and CilioGenics, to identify the primary cilia-related genes, they have included many genes that are not highly specific to primary cilia function (e.g., HSP90, HSPA8, DNAJA4, GNAS...). Perhaps the authors would be able to do a better job of specifically querying primary cilia function by using genes that are common to these three databases.

      We presented a comparison and analysis based on three major cilia databases, which are generated from proteomics of cilia from different tissues/organisms. In addition, we have provided a more comprehensive list of primary cilia-related genes in Fig. S2. While the majority of cilia-related genes/proteins are highly conserved, some genes/proteins are tissue-/organism-specific. The majority of the genes presented in Fig. 5D of our manuscript are shared among all three databases. The cilium is a complex structure, composed of proteins for microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolic enzymes, signaling, and protein folding. It also contains proteins for translation, nuclear transport, ubiquitination, and RNA binding, as well as mitochondrial proteins and transcription factors (https://ciliogenics.com/?page=Home). Proteins such as HSP90 and HSPA8 are important for protein folding. HSPA8 also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. GNAS is part of a G protein complex that transmits signals. DNAJA4 is one of the high-confidence cilia proteins (mean score of 1.26, expression rank of 938). These proteins are detected in cilia according to CilioGenics (https://ciliogenics.com/?page=Home). These proteins are not highly specific to cilia and are expressed in the soma as well. Most of these signaling proteins, such as WNT (Supplementary Fig. 2), are detected in both cilia and soma.

      (15) The authors state, "Furthermore, we observed robust spontaneous kinocilia motility in bullfrog crista HCs and small spontaneous bundle motion in mouse crista HCs." This statement should be moderated by acknowledging that this motility was observed in only some cells. The authors favor the hypothesis that the lack of motility in some crista HCs is due to depolarization or damage to the sample. The authors should also acknowledge the possibility that there may be cell-to-cell variability in the motility of the kinocilia.

      We addressed these issues in the public review section. We modified the statement as suggested.

      (16) The first few pages of the Results section include many lists of genes. Readability may be improved if this is curtailed modestly.

      Changed as suggested. We removed comparison among different types of HCs and replotted Fig. 2B. This has reduced the number of genes mentioned in the text.

    1. eLife Assessment

      This important work delineates layered glucose-responsive neuropeptidergic mechanisms that regulate sugar intake. Using a combination of genetic, physiological, and behavioral experiments, the authors convincingly show that Hugin- and Allatostatin A-releasing neurons suppress sugar feeding by reducing the sensitivity of Gr5a-expressing gustatory neurons. They further demonstrate that Neuromedin U neurons share key physiological properties with fly Hugin neurons, highlighting conserved peptide functions across animal phyla.

    2. Reviewer #1 (Public review):

      This revised manuscript by Qin and colleagues delineates an important neural mechanism that suppresses the intake of sugar solution in response to internal glucose level (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons, primarily in response to elevated levels of circulating glucose. First, neurons that release the neuropeptide Hugin (an insect homolog of vertebrate Neuromedin U (NMU)) are activated by a high concentration of hemolymph glucose, which is directly sensed by Hugin-releasing neurons through a cell-autonomous mechanism. Next, Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses the sugar response in sweet-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of the sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by a high concentration of circulating glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity downstream of NMU-expressing neurons in the rNST. These data suggest that the function of Hugin neuropeptides in the fly is analogous to the function of NMU in the mouse.

      The authors have provided multiple lines of compelling evidence generated through rigorous and comprehensive experiments, which span genetic abrogation, neuronal manipulation, pharmacology, and functional imaging. The authors were also receptive to the critiques and reframed the central message, such that their conclusions are soundly supported by the presented data. Importantly, the parallel study in mice adds a unique comparative perspective that makes the paper of interest to a wide range of readers.

    3. Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism. The authors address this in their discussion.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this revised manuscript, Qin and colleagues aim to delineate a neural mechanism that is engaged specifically in sated flies to suppress the intake of sugar solution (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release the neuropeptide Hugin (an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation depends on the cell-autonomous function of Hugin-releasing neurons, which sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses the sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of the sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity downstream of NMU-expressing neurons in the rNST. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      The shift of the narrative, which focuses specifically on the hugin-AstA axis as the "brake" on the satiety signal and feeding behavior, clarified the central message of the presented work. The authors have provided multiple lines of compelling evidence generated through rigorous experiments. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers.

      While I deeply appreciate the authors' efforts to substantially restructure the manuscript, I have a few suggestions for further improvements. First, there remains room for discussion as to whether the "brake" function of the hugin-AstA axis is truly satiety state-dependent. The fact that neural activation (Fig. Supp. 8), peptide injection (Fig. 3A, 4A), receptor knockdown (Fig. 3C,G, 4E), and receptor mutants (Fig. Supp. 10, 12) all robustly modulate PER irrespective of the feeding status suggests that the hugin-AstA axis influences feeding behaviors in both sated and hungry flies. Additionally, their new data (Fig. Supp. 13B, C) now show that synaptic transmission from hugin-releasing neurons is necessary for completely suppressing feeding even in sated flies. If the hugin-AstA axis were engaged specifically in the sated (high glucose) state, disruption of this neuromodulatory system would be expected to have relatively little effect in starved flies (in which the "brake" is already disengaged).

      We thank the reviewer for pointing out this inconsistency. We have corrected this interpretation. Specifically:

      (1) We removed statements suggesting that the circuit is fully disengaged during starvation.

      (2) We now state that endogenous hugin activity is reduced during starvation, but the circuit retains modulatory capacity when experimentally perturbed.

      (3) The Discussion now emphasizes that the system operates as a state-modulated inhibitory tone rather than a strictly fed-state switch.

      We believe this revised framing resolves the discrepancy.

      In this context, it is intriguing that knockdown of the PK2-R2 hugin receptor modestly but consistently decreases the proboscis extension reflex specifically in starved flies (Fig. 3D, H). The manuscript does not discuss this interesting phenotype at all. Given the heterogeneity of hugin-releasing neurons (Fig. Supp. 7), there remains a possibility that a subset of hugin-releasing neurons and/or downstream neurons can provide a complementary (or even opposing) effect on the feeding behavior.

      We agree that this is an important observation. Although the effect size is modest, it is reproducible and suggests that hugin signaling may not operate as a strictly linear pathway.

      To address this:

      (1) We added a paragraph in the Results acknowledging the PK2-R2-dependent phenotype.

      (2) We included a discussion noting the potential functional heterogeneity of hugin neurons.

      (3) The schematic model (now Figure Supplementary 17, previously Figure Supplementary 16) includes a dashed line indicating a possible parallel PK2-R2-dependent branch.

      Given these intriguing yet unresolved issues, it is important to acknowledge that whether this system is "selectively engaged in fed states to dampen sweet sensation (in Discussion)" requires further functional investigations. Consistent effects of manipulation of the hugin-AstA system across multiple experimental approaches underscore the importance of this molecular circuitry axis for controlling feeding behaviors. Moderating the conclusions to accommodate alternative interpretations of the data will help the field determine the precise mechanism that controls feeding behaviors in future studies.

      We fully agree with the reviewer. Our original description of the circuit as a “satiety brake” implied exclusive engagement in fed states, which is not strictly supported by the behavioral data. Although endogenous hugin activity is elevated under fed conditions (as shown by CaMPARI), experimental manipulations demonstrate that the circuit retains functional capacity to modulate feeding behavior across feeding states.

      To address this concern, we have:

      (1) Removed the term “satiety-specific brake” throughout the manuscript.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module.

      (3) Revised the Discussion to explicitly state that the hugin–AstA pathway biases sweet sensitivity according to circulating glucose levels rather than functioning as an on/off switch.

      (4) Substantially revised Supplementary Figure 17 to reflect graded modulation across metabolic states rather than binary state engagement.

      These changes better align our conclusions with the experimental observations.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism.

      We agree that the modest PER phenotype suggests that Glut1-mediated glucose uptake represents one component of glucose sensing in hugin neurons. We have clarified this in the Discussion and now explicitly state that additional glucose-sensing mechanisms may contribute to hugin activation.

      Additionally, many of the manipulations testing the "brake" circuitry throughout the study show similar effects in both fed and starved flies. This suggests that the focus of the discussion and Supplemental Figure 16 on a satiety-specific "brake" mechanism may not be fully supported by the data.

      We fully agree that the previous framing overstated state specificity.

      As described above, we have:

      (1) Removed “satiety-specific brake” terminology.

      (2) Reframed the circuit as a glucose-responsive inhibitory module.

      (3) Revised the Discussion to explicitly acknowledge modulation across feeding states.

      (4) Updated the schematic model (Figure Supplementary 17, formerly Figure Supplementary 16) accordingly.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      Both the reviewers and I agree that the conclusion about a "satiety-dependent" brake needs to be modified to discuss the phenotypes that are also observed under starved conditions. Reviewer 1 would further like to emphasize that the authors are not required to follow through with the specific recommendations suggested by them. Modifying the conclusion and Supplementary Figure 16 should suffice.

      We sincerely thank the Reviewing Editor for the clear guidance. We fully agree that our previous framing of the hugin–AstA circuit as a strictly “satiety-dependent” brake may have overstated the state specificity of the system.

      In response to this recommendation, we have:

      (1) Revised the Abstract, Results, and Discussion to moderate the conclusion and explicitly acknowledge the phenotypes observed under starved conditions.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module, rather than a satiety-exclusive brake.

      (3) Supplementary Figure 17 (formerly Figure Supplementary 16) has been substantially revised to illustrate graded modulation across metabolic states rather than binary engagement.

      We appreciate the clarification that no additional experiments were required and are grateful for the opportunity to improve the conceptual framing of our work.

      Please include full statistical reporting in the main manuscript (e.g., figure legends or results).

      We have revised all figure legends to include full statistical reporting.

      Reviewer #1 (Recommendations for the authors):

      By re-framing their finding as the "brake" mechanism on satiety-induced suppression of feeding behavior and sensitivity to sweet taste, the authors substantially improved the clarity of their findings and their significance. The additional data (Fig. Supp. 13B, C) allows "apple-to-apple" comparisons of behavioral data. I support the publication of this manuscript with no further experiments, although I have several suggestions for the text.

      As I write in the public review, I have a reservation about the authors' argument that the hugin-AstA system is the "'satiety brake' - that is selectively engaged in fed states to dampen sweet sensation (lines 392-394)". Manipulations of both the hugin system (Fig. 2C, Fig. 3A, C, D, G, Fig. Supp. 8A, C, Fig. Supp. 10A-C, Fig. Supp. 13B, C) and the AstA system (Fig. 4A, E, Fig. Supp. 8C, D, Fig. Supp. 12A-C, Fig. Supp. 13D) all indicate that the hugin-AstA system suppresses feeding regardless of the satiety state. Specifically, Fig. Supp. 13B shows that synaptic blockade does further increase PER, contradicting the authors' statements ("silencing hugin+ neurons led to enhanced sweet-driven feeding behavior (line 299-300)" and "...further silencing has little additional effect (line 402)"). The CaMPARI data (Fig. 1J) provide the link between the activity levels of hugin-releasing neurons and satiety state. However, the fact that eliminating the hugin-AstA signal can promote further PER in starved flies suggests that this brake is not completely satiety-dependent. I ask the authors to at least discuss this perceived discrepancy between their data and conclusions.

      Also, the authors' finding that PK2-R2 reduction actually suppresses PER specifically among starved flies (Fig. 3D, H), albeit with a relatively small effect size, suggests that the hugin-AstA axis is not a singular, linear pathway as the authors suggest in Fig. Supp. 16. While delineating the PK2-R2-dependent pathway is beyond the scope of this study, at least a line of discussion would be helpful.

      Minor comments:

      (1) Fig. Supp. 8 (dTRPA1 activation of hugin and AstA neurons) and Fig. Supp. 13B-D (inhibition of hugin and AstA neurons) should be in the main figures given their relevance to the narrative of this manuscript.

      We agree with the reviewer regarding their importance. The key behavioral panels from these figures have now been moved to the main figures to strengthen the narrative flow.

      (2) Fig. Supp. 11 (PER and imaging using decapitated heads only), despite its creativity, leaves me wondering what PER of fly heads looks like. It is a highly artificial and invasive experiment. Supplementary movies would be helpful.

      We apologize for the lack of clarity in our description. In this experiment, flies were not decapitated. Instead, we surgically severed the connection between the brain and the ventral nerve cord (VNC), while keeping the body and proboscis musculature intact. Thus, the flies remained physically intact, and PER was measured using the same behavioral protocol as in intact animals.

      We have revised the figure legend to clarify this point and avoid confusion. Because the behavioral procedure was identical to standard PER assays and the flies retained normal proboscis motor function, we did not include supplementary videos.

      (3) Expression patterns of PK2-R1 and AstA-R2 in the proboscis are mentioned in the text but with no data (lines 229 and 279). I strongly encourage the authors to show images.

      We have now included the relevant expression images in the revised manuscript.

      (4) A citation for the "previous study (line 486)" describing PER method is required.

      The appropriate citation has been added.

    1. eLife Assessment

      This important study developed a new, sensitive and robust sensor for TDP-43 activity that should strongly impact the field's ability to monitor whether TDP-43 is functional or not. The evidence, though limited to cell culture, is compelling and is the first demonstration that a GFP on/off system can be used to assess genetic TDP-43 mutants as well as loss of soluble TDP-43.

    2. Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with those methods is their sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede current TDP-43 activity assays: it is cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple complementary assays (i.e., PCR and GFP fluorescence) to confirm the splicing changes adds further rigor. The final major strength I'll add is the very clever approach of tethering TDP-43 to the loss-of-function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      Admittedly, one needs to initially characterize the sensor, and the use of cell lines is an obvious advantage, but it raises the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed. The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate, homogeneous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented more clearly, possibly split into additional graphs, since there are too many outputs. The Sup Figure 2A image panels would benefit from being labeled; it is difficult to tell what antibodies or fluorophores were used. The same applies to Figure 4B.

      Figure 3 is an important addition to this manuscript and in general is convincing, showing that TDP-43 loss-of-function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it is unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified. Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the absence of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (i.e., neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having this type of sensor is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be detectable with current approaches.

      Comments on revisions:

      In the revised version, most of the reviewer's comments have been appropriately addressed with the exception of 1) the use of flow sorting to improve the data analysis and 2) testing this sensor in primary neurons. The latter is the focus of an ongoing separate study. Though flow sorting would significantly strengthen this study and help others in the field to use this sensor, it is still an impactful and innovative study without it.

    3. Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, TDP-43 mislocalizes to the cytoplasm in TDP-43 proteinopathies, where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function, characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss-of-function events depend on PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well-described, unique resource, validated in multiple cell types as a sensitive readout of TDP-43 loss of function, that would be of high interest and utility to a number of researchers. Future studies validating the utility of this biosensor in models of TDP-43 loss of function (e.g. disease iPSNs) that do not rely on TDP-43 knockdown will be of further interest.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      We thank the editor and reviewers for their thoughtful and constructive feedback, which has enabled us to greatly strengthen the manuscript. We apologize for the delay in resubmitting this, as we were dealing with a large turnover in the lab due to trainee graduations. We have carefully revised the text, figures, and supplementary materials in response to these comments. Below, we summarize the key revisions made, followed by a point-by-point response to the reviewers’ critiques.

      (1) Performed CUTS analyses in human neuronal systems: In the revised manuscript, we included new data demonstrating that the CUTS system can be applied to additional cellular models, specifically neuronal cells (Figure 5, Figure S4). To address whether CUTS functions effectively in neuronal contexts, we generated stable CUTS-expressing lines in differentiated BE(2)-C cells and ReN VM–derived neurons (Figure 5A-D, Figure S4 A-C). To restrict expression to neurons, we developed a new Tet-On3G construct in which the Tet-On3G transactivating protein is driven by the SYN1 promoter, giving neuron-specific inducible expression for these experiments.

      (2) Define the relationship between CUTS and endogenous/physiological cryptic exon inclusion: To evaluate how well the CUTS system reflects physiological cryptic exon regulation, we performed RT-PCR analysis of several cryptic exons previously reported by us and evaluated CUTS activation at the RNA level in parallel (Figure S2E). CUTS is sensitive to low-to-mild reductions in TDP-43 levels, whereas the tested endogenous cryptic exons exhibit variable responses to TDP-43 knockdown.

      (3) Defining stress-induced TDP-43 loss of function: We included new data demonstrating that the CUTS system can detect TDP-43 loss of function induced by acute sodium arsenite (NaAsO₂) treatment in HEK cells (Figure 3D–I). We have also tested additional stressors as part of a separate ongoing study where this work will be expanded upon (Xie et al., 2025). We selected this paradigm since TDP-43 loss of function in response to acute NaAsO₂ treatment is also supported by work from other labs (Huang et al., 2024).

      (4) Implications of using a TDP-43 Loss-of-Function sensor for therapeutic applications: In the revised manuscript, we clarify that CUTS-TDP43 is auto-regulated and we highlight two potential therapeutic applications: i) TDP-43 Knockdown-and-replacement: CUTS-TDP43 provides a strategy for simultaneous depletion of pathological TDP-43 species while enabling autoregulated re-expression of wild-type TDP-43. This design mitigates the risk of supraphysiologic overexpression, a known liability in conventional replacement approaches, by restoring TDP-43 within a self-limiting regulatory network that maintains homeostatic control. ii) Aggregation-independent correction: Because CUTS is autoregulatory, it can be repurposed to regulate alternative downstream effectors, including splicing modifiers or TDP-43 functional interactors, without expressing TDP-43 itself. This approach provides a potential aggregation-independent strategy to compensate for TDP-43 loss-of-function (LOF) by restoring downstream splicing. We are evaluating this work in a follow-up study (Xie et al., 2025). In these ongoing studies, we show that CUTS-regulated expression of splicing proteins in response to TDP-43 loss restored subsets of cryptic exon events (24/28 events evaluated). These findings suggest CUTS as a versatile tool for both autoregulated TDP-43 replacement and trans-regulatory therapeutic correction. We expanded on this concept in the discussion section of this revised manuscript. We also note that autoregulatory TDP-43 biosensor strategies have been proposed in related systems, including TDP-REG, underscoring broader interest in self-regulated TDP-43 systems (Wilkins et al., 2024).

      (5) Clarified mechanism of TDP-43 5FL causing strong loss of function: The TDP-43 5FL variant exhibits reduced RNA binding capacity, and we previously showed that the lack of RNA binding promotes aberrant homotypic phase separation of TDP-43 (Mann et al., 2019). Expression of RNA-binding-deficient TDP-43 variants forms nuclear “anisomes” (Yu et al., 2021), which evidence suggests sequester endogenous TDP-43 protein into insoluble structures. We expanded on this in our results section in this revised manuscript.

      (6) Improved figure clarity and data presentation: To enhance clarity and organization, we maintained the main structure of the manuscript while reorganizing figures and improving data visualization. Some examples include:

      Figure 1: We revised the schematic layout for greater clarity and simplicity. The figure now focuses more specifically on the CUTS data, with additional data on the UNC13A-TS and CFTR-TS moved to Figure S1. To improve readability, titles were added to all schematic panels. Visual consistency was also improved by refining the color labelling for each sensor in Figures 1C and 1D and adjusting the corresponding bar graphs accordingly.

      Figure 2: We reorganized the figure to clearly distinguish between protein and mRNA analyses for greater clarity. In the revised layout, western blot quantifications of TDP-43 and CUTS (GFP) signals are shown in Figures 2D and 2E, respectively, while the corresponding qPCR analyses are presented in Figures 2H and 2I. Minor edits include removing the percentage knockdown and fold-change annotations from the graphs and incorporating these values into a mini-table in Figure S2E.

      The original Figure 2D and 2G were reincorporated as reference panels in Figure S2A–B, while new graphs showing CUTS protein-level changes as a function of TDP-43 knockdown were added (Figure S2C–D). We also incorporated new data showing the behavior of endogenous cryptic exons under low siTDP-43 treatment (Figure S2E).

      Figure 3: We added new data demonstrating the application of the CUTS system in detecting TDP-43 loss of function induced by stress conditions. Specifically, we show that sodium arsenite (NaAsO₂) treatment leads to TDP-43 functional impairment detectable by CUTS and supported by endogenous cryptic exon detection via RT-PCR (Figure 3D-I).

      Figure 5 and Figure S4: We introduced a new figure that demonstrates the effective application of the CUTS system in differentiated neuronal systems, thereby extending its usability to disease-relevant cell types.

      Figures S2A and 4B were edited to include the corresponding labels on the sides of each image for clarity. Sup Figure 2A was moved to Sup Figure 3A, while Figure 4B remains in its original configuration.

      We thank the reviewers again for their insightful critiques and helpful suggestions, which have enabled us to substantially improve the manuscript. Please find our detailed response to each review below:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing: the sensor was tested in models of TDP-43 loss of function, knockdown, and TDP-43 mislocalization and aggregation. The sensor is sensitive to a minimal decrease of TDP-43 and can be used at the protein level, unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices.

      (1) Testing the sensor in other cell lines

      We thank the reviewer for raising this important point. In agreement with this suggestion, we generated ReN VM cell lines and used a neuroblastoma cell line model (BE(2)-C) expressing the TetOn3G CUTS system under a human synapsin I (hSYN1) promoter. In this construct the transactivator protein is under the control of a neuron-specific hSYN1 promoter, whereas the classical TetOn3G system uses a CMV-like promoter. Several studies have reported reduced activity or silencing of CMV and PGK-driven transgenes in neurons. Therefore, for our neuronal experiments, we removed this promoter to generate a new version of the doxycycline-inducible CUTS system in which the Tet-On 3G transactivator is now driven by the hSYN1 promoter and expresses CUTS in response to doxycycline treatment. In this improved construct, we also replaced mCherry with mScarlet to enhance the fluorescent signal.

      To test this neuronal-adapted system, we established stable CUTS expression in undifferentiated BE(2)-C cells, a subclone of the SK-N-BE(2) neuroblastoma line that has been used to study TDP-43–dependent splicing function (Brown et al., 2022). This model can be differentiated into neuron-like cells within 10 days, as shown in Supplementary Figure 4A. Using this model, we confirmed that TDP-43 knockdown leads to robust activation of the CUTS system (Figure 5B-E). We additionally tested this in stable polyclonal ReN VM cells following differentiation into cortical-like neurons (Figure 5D, Figure S4B-C).

      (2) Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP43.

      We agree with the reviewer that correlating the sensor’s readout with physiological TDP-43 splicing targets is essential to validate its biological relevance. To this end, we complemented our sensor expression profile with endogenous cryptic exons (CEs) sensitive to TDP-43 depletion. We tested a panel of five physiological cryptic exons regulated by TDP-43 (LRP8, EPB41L4A, ARHGAP32, HDGFL2, and ACBD3). To address the reviewer’s concern, we performed RT-PCR on samples from the low-dose siTDP-43 experiment shown in Figure S2E.

      The endogenous CEs used in the panel were selected based on our own and others’ preliminary observations. Among these, HDGFL2 showed a particularly robust increase in cryptic exon inclusion at very low siTDP-43 concentrations (38 pM), while untreated samples showed almost no CE inclusion. This finding strongly supports a direct mechanism linking mild TDP-43 reduction to loss of physiological splicing control.

      (3) Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation modes that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank the reviewer for this thoughtful point and agree that in the disease-relevant context where endogenous TDP-43 is intact but TDP-43 function is lost due to mislocalization and/or aggregation, a re-supply of TDP-43 risks sequestration and loss of activity. In our manuscript, the CUTS-TDP43 module was presented as a control circuit proof-of-concept rather than a stand-alone approach: it demonstrates that CUTS can (i) sense LOF with high dynamic range and proportionality, and (ii) drive a payload under negative feedback such that total TDP-43 remains near baseline while partially rescuing a splicing readout (CFTR minigene) under knockdown conditions.

      Importantly, we evaluated CUTS in aggregation/mislocalization-prone contexts: ΔNLS, 5FL, and ΔNLS+5FL variants trigger CUTS activation (ref), allowing us to quantify LOF arising from these aggregation modes. This confirms that CUTS can operate precisely in the very settings where sequestration is likely to occur.

      To directly address the reviewer’s suggestion, in the revision we (i) clarify in the Discussion that CUTS-TDP43 is a circuit demonstration and not our proposed monotherapy in aggregation-dominant disease; and (ii) expand our therapeutic framing into two approaches:

      Knockdown-and-replacement: concurrently deplete aggregation-prone/endogenous pathologic TDP-43 species (i.e., mutant TDP-43) while using CUTS to re-deliver wild-type TDP-43 under autoregulation.

      Aggregation-independent correction: use CUTS to deliver modifiers that bypass TDP-43 sequestration (e.g., downstream effectors or splicing correctors that restore LOF consequences without expressing TDP-43 itself).

      (4) I don't think the quantity of siRNA is directly proportional to the degree of TDP-43 knockdown/extent of TDP-43 loss. Therefore, to enhance the utility of the dose-response curves, I'd suggest using TDP-43 levels as the variable on the x-axis, rather than the amount of siRNA administered or even just adding a plot alongside the current plots would enable readers to quickly evaluate LOF response levels concerning the protein. While I understand that the sensitivity of Western blots for quantification might be why the authors have not created the graphs in this manner, having this information would be useful.

      We appreciate the reviewer’s insightful comment. As noted, in the original version of the graph, we incorporated the percentage of TDP-43 knockdown corresponding to each siTDP-43 concentration (indicated in red text). However, we agree that this format was not easy to interpret, given the amount of information presented. To address this, we generated two new plots in which the x-axis represents TDP-43 levels (percentage of remaining protein or mRNA), and the y-axis shows the fold change in CUTS signal measured by (i) TDP-43 protein pixel intensity and (ii) TDP-43 mRNA levels, respectively. These new plots are now included as Supplementary Figures 2C–D, which allow a clearer visualization of CUTS readout in relation to actual TDP-43 levels rather than siRNA dose. As the reviewer anticipated, the reason we did not originally present the data in this format was that at low siTDP-43 concentrations, the fold change is minimal and more difficult to quantify by Western blot. Nevertheless, we have now incorporated the revised plots to strengthen the interpretation of the dose–response relationship. Additionally, we experience batch effects across siRNA lots. We believe this revised format should enhance the clarity of the result.
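
      As a rough illustration of the re-indexing described in this response, the following Python sketch converts per-condition measurements into the two plotted quantities: percentage of TDP-43 remaining (x-axis) and fold change in CUTS signal (y-axis), each relative to a control condition. The function name, the argument names, and the choice of the first entry as the control are assumptions made for illustration; they are not taken from the manuscript or its analysis pipeline.

```python
from typing import List, Sequence, Tuple


def remaining_and_fold_change(
    tdp43_signal: Sequence[float],
    cuts_signal: Sequence[float],
    control_index: int = 0,
) -> List[Tuple[float, float]]:
    """Return (percent TDP-43 remaining, CUTS fold change) per condition.

    Each input holds one measurement per siRNA condition. The entry at
    `control_index` is treated as the control (an assumption of this sketch).
    """
    if len(tdp43_signal) != len(cuts_signal):
        raise ValueError("inputs must have one value per condition")

    tdp43_control = tdp43_signal[control_index]
    cuts_control = cuts_signal[control_index]
    if tdp43_control <= 0 or cuts_control <= 0:
        raise ValueError("control signals must be positive")

    # Normalize every condition to the control so that the x-axis reflects
    # the TDP-43 actually remaining rather than the siRNA dose applied.
    return [
        (100.0 * tdp43 / tdp43_control, cuts / cuts_control)
        for tdp43, cuts in zip(tdp43_signal, cuts_signal)
    ]
```

      Plotting the second element of each pair against the first gives a curve indexed by measured TDP-43 level, so lot-to-lot differences in siRNA potency shift points along the curve rather than changing its apparent shape.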

      (5) p3 line 74: one of the pitfalls cited for using endogenous cryptic exons is that they exhibit variable responses to TDP-43 loss and may be cell type-specific. Has the sensor been used in different cell lines?

      We tested the CUTS system in two differentiated neuronal cell types, BE(2)-C and ReN VM cells. The results are presented in Figure 5 and Figure S4 of the revised manuscript.

      (6) The order of the text describing 1A and 1B is confusing. The text starts describing the TS cassettes referring to 1A using the CUTS cassettes, which haven't been introduced yet, as an example. I'd suggest reorganising this section. The graph, also in 1A, showing readout proportional to GFP should be taken out, or the figure legend should note that it is theoretical.

      We agree with the reviewer’s point. In the original schematic (Figure 1A), we included the CUTS system as an example to introduce the TS cassette design, since it contains the three possible sensor configurations. However, we recognize that this could be confusing. Therefore, we have removed the CUTS cassette from Figure 1A, along with the theoretical graph showing GFP readout proportional to the degree of TDP-43 LOF. In agreement with this change, we also restructured Figure 1. As the focus is the CUTS system, we have moved the Western blot and quantification of UNC13A-TS and CFTR-TS to Supplementary Figure 1.

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up or down-regulated. The problem with that is the sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the use of current TDP-43 activity assays, as it is cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (ie PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      (1) Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed.

      We thank the reviewer for highlighting the importance of validating the sensor in neuronal models, given the central role of TDP-43 dysfunction in ALS/FTD and related neurodegenerative disorders. While initial characterization in established cell lines provides experimental control and scalability, we agree that demonstrating functionality in neuronal systems is essential. To address this, we adapted the CUTS platform for neuronal application by incorporating the human synapsin-1 (hSYN1) promoter into the Tet-On 3G system to enable inducible, neuron-specific expression. We validated this configuration in differentiated BE(2)-C cells (Figures 5A-C, S4A-C), where CUTS retained robust responsiveness to TDP-43 perturbation. In parallel, we generated stable CUTS-expressing ReN VM neural progenitor cells and differentiated them for three weeks prior to functional assessment (Figures 5A-C, S4A-C). In both neuronal models, CUTS was functional and responsive to TDP-43 siRNA. We are currently optimizing promoter selection and expression paradigms for fully differentiated iPSC-derived neuronal models; this work will be the subject of future studies.

      (2) The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      We thank the reviewer for this thoughtful suggestion. We agree that flow cytometry and sorting of GFP-positive populations would provide a higher-resolution, single-cell–level relationship between TDP-43 abundance and sensor output. Such an approach would reduce heterogeneity arising from incomplete siRNA penetrance and allow more precise quantification of how incremental changes in TDP-43 protein levels track with GFP fluorescence. In the present study, our goal was to establish proof-of-principle functionality of the CUTS circuit and to demonstrate that graded TDP-43 depletion produces a proportional sensor response at the population level. While GFP signal heterogeneity is visible in imaging panels, we hypothesize that this variability likely reflects known differences in siRNA uptake and transfection efficiency rather than instability of the circuit itself. Importantly, bulk measurements consistently demonstrated dose-dependent sensor regulation across independent experiments, supporting the robustness of the system despite cellular heterogeneity. Furthermore, we were able to quantify CUTS activation in HeLa TARDBP<sup>-/-</sup> cells. We also note that CUTS was developed as a practical tool for rapid assessment of TDP-43 LOF in standard laboratory settings. Although flow cytometry increases resolution, the ability to detect functional perturbation using bulk fluorescence measurements supports the utility of the system for routine and high-throughput applications.

      We agree that flow cytometry would provide a more refined analysis of the dynamic range and sensitivity of CUTS, particularly for defining thresholds such as the minimal TDP-43 knockdown required for measurable activation. Specifically, we have implemented FACS sorting of CUTS-expressing cells in a parallel study in which we are conducting a CRISPR knockout screen to identify modifiers of TDP-43 splicing function. For this, we incorporate TDP-43 knockdown followed by FACS to stratify cells based on CUTS activation. This strategy enables direct evaluation of the relationship between the extent of TDP-43 LOF and CUTS sensor activation. These analyses are ongoing; they provide a more quantitative analysis linking TDP-43 depletion to CUTS activation and address the reviewer’s concern regarding heterogeneity in bulk measurements. We plan to include this work in a future study.

      (3) Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a more clear manner, possibly split into additional graphs since there are too many outputs.

      We thank the reviewer for this suggestion. In response, we have split the graphs previously shown in Figures 2D and 2G to improve clarity, as we agree that these panels contained an extensive amount of data. Specifically, we split Figure 2D into two separate graphs showing TDP-43 and GFP pixel intensity from Western blots on the Y-axis, plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2D and Figure 2E of the new manuscript.

      Furthermore, we also split Figure 2G into graphs showing the fold change of mRNA for TDP-43 and the CUTS cryptic exon plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2H and Figure 2I of the new manuscript. We have maintained the previous graphs in Supplementary Figure 2 to preserve the full dataset for reference.

      (4) Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      We appreciate the reviewer’s careful observation. In both figures, we are showing mCherry and GFP signals. In the revised version, we have added the corresponding labels to the side of each image for clarity. Note that Sup Figure 2A has been moved and is now Sup Figure 3A, while Figure 4B remains in its original configuration.

      (5) Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified.

      The TDP-43 5FL variant exhibits reduced RNA-binding capacity, and we previously demonstrated that impaired RNA binding promotes aberrant homotypic phase separation of TDP-43. Consistent with this mechanism, expression of RNA-binding–deficient TDP-43 variants induces the formation of nuclear “anisomes” which have been shown to sequester endogenous TDP-43 into insoluble fractions via dominant-negative mechanisms (Cohen et al., 2015; Keating et al., 2023; Mann et al., 2019; Yu et al., 2021). These findings support a model in which disruption of RNA engagement alters TDP-43 biophysical behavior and promotes functional depletion through self-association. We have expanded this mechanistic explanation in the Results section of the revised manuscript to better contextualize the behavior of the 5FL construct and its impact on endogenous TDP-43.

      (6) Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      We appreciate this suggestion and agree with this important point. Due to the lack of methods to directly induce endogenous TDP-43 aggregation and loss of function, the use of stressors has become a partial solution to address this issue. In line with this, our group has tested several stressors in follow-up research, including sodium arsenite (NaAsO₂), puromycin, KCl, MG132, sorbitol, and tunicamycin, using HEK cells expressing the CUTS system (Xie et al., 2025). We were able to show a dose-response relationship in relative GFP intensity under these conditions, with sodium arsenite showing the strongest effect, consistent with previous reports (Huang et al., 2024). To provide additional relevant findings in the current manuscript, we expanded this analysis by testing sodium arsenite in the CUTS system while also including endogenous cryptic exons. We therefore added a new figure showing the effect of sodium arsenite on the CUTS system, including GFP intensity measurements, qPCR using CUTS cryptic exon primers, and three endogenous cryptic exon reporters (ATG4B, GPSM2, and KCNQ2).

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (ie neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      (7) Regarding the methods, they seem a bit sparse and would benefit from additional detail. For example, I do not see a section in the methods where microscopy images were quantified (%GFP positive cells for example). This information is important and is lacking in the current form.

      We thank the reviewer and have added the following information to the Methods section: For live imaging quantification, we measured the mean GFP signal intensity for each group. The values were averaged, and the fold change was calculated and plotted. For immunofluorescent imaging, we first created maximum intensity projection images. We then applied masks to the GFP, mCherry, and Hoechst signals. By overlapping the GFP and mCherry signals, we identified the number of GFP-positive cells. Similarly, by overlapping the mCherry signal with the Hoechst mask, we identified the CUTS-expressing cells. We then calculated the ratio of GFP-positive cells to CUTS-expressing cells and plotted it as a percentage of GFP-positive cells. All analyses were performed using the Nikon NIS software. This information is included in the methods of the revised manuscript.
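
      The mask-overlap logic described above can be sketched in a few lines of Python. This is only a minimal illustration and not the Nikon NIS workflow the authors used: it assumes that each Hoechst-segmented nucleus is already available as a set of pixel coordinates and that the mCherry and GFP masks are sets of above-threshold pixels. The function name and data layout are hypothetical.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) coordinate in a projected image


def percent_gfp_positive(
    nuclei: Iterable[Set[Pixel]],
    mcherry_mask: Set[Pixel],
    gfp_mask: Set[Pixel],
) -> float:
    """Percentage of CUTS-expressing cells that are also GFP-positive.

    A cell counts as CUTS-expressing when its nucleus overlaps the mCherry
    mask, and as GFP-positive when it additionally overlaps the GFP mask.
    """
    cuts_expressing = 0
    gfp_positive = 0
    for nucleus in nuclei:
        if nucleus & mcherry_mask:  # Hoechst object overlaps mCherry
            cuts_expressing += 1
            if nucleus & gfp_mask:  # the same object also overlaps GFP
                gfp_positive += 1

    if cuts_expressing == 0:
        raise ValueError("no CUTS-expressing cells found")
    return 100.0 * gfp_positive / cuts_expressing
```

      Cells with GFP signal but no mCherry overlap are excluded from both counts in this sketch, which mirrors the use of CUTS-expressing cells as the denominator.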

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      (1) While the rationale for selecting UNC13A CE as the reporting CE species is understood given the relevance to disease, could the authors please comment on whether other CE sequences would behave similarly or as robustly? This is particularly critical given the multitude of different splicing changes that can occur as a result of TDP-43 loss of function (ie cryptic exons of differing sensitivity, skiptic exons, premature polyadenylation).

      We thank the reviewer for this question regarding generalizability beyond the UNC13A CE. While UNC13A was selected due to its strong disease relevance and well-characterized sensitivity to TDP-43 loss-of-function (LOF), our platform is not intrinsically restricted to this sequence. In the manuscript, we directly compared three architectures: UNC13A-TS, CFTR-TS, and the combined CUTS sensor incorporating additional UG motif optimization. Under matched conditions in stable HEK293 lines, CUTS demonstrated superior specificity and sensitivity, exhibiting near-zero baseline activity and a proportional, log-linear response across low-dose siTDP-43 (38–1200 pM) (Figures 1–2). Importantly, this head-to-head comparison demonstrates that sensor performance can be engineered and optimized beyond a single CE species.

      TDP-43 LOF is known to induce a spectrum of RNA processing defects, including cryptic exons with differing sensitivities and cell-type dependence, premature polyadenylation events (e.g., STMN2), and, under conditions of excess nuclear TDP-43, exon skipping (“skiptic exons”). This diversity supports the concept that alternative CE elements, or other TDP-43 regulated RNAs, can be incorporated into the same sensor backbone and tuned for specific biological scenarios (cell type, specific stress responses, etc.). Consistent with this, the developers of the recently described TDP-REG system (Wilkins et al., 2024) designed AI-generated de novo CE sequences to express reporters or gene payloads and screened multiple candidates to identify the appropriate RNA elements required for this response. These findings demonstrate that CE sequences beyond UNC13A can serve as robust TDP-43 sensing elements when optimized. Our results complement this work by demonstrating that CUTS achieves tight baseline control and a steep dynamic range (>110,000-fold induction over baseline in HEK293 cells), while maintaining compatibility across both non-neuronal and neuronal model systems, as shown in the revised manuscript.

      In the revised manuscript, we show direct comparisons indicating that CUTS outperforms single-CE sensors such as UNC13A-TS and CFTR-TS under identical conditions. This is consistent with independent work from other groups showing that alternative CE sequences can be engineered into effective sensors, depending on the paradigm and model system. We have clarified this in the revised Discussion and now note that CUTS is adaptable to alternative CE inserts.

      (3) Could the authors provide evidence of the utility of their biosensor in disease relevant systems that do not rely on TDP-43 KD? For example, does this biosensor report on TDP-43 loss of function in C9orf72 iPSNs in a time-dependent manner? Alternatively, groups have modeled TDP-43 proteinopathy in wildtype iPSNs via MG132 treatment.

      We thank the reviewer for this important suggestion. We agree that demonstrating CUTS responsiveness in disease-relevant models independent of artificial TDP-43 knockdown would further strengthen its translational relevance. In the current study, our primary objective was to establish the sensitivity, dynamic range, and autoregulatory properties of the CUTS circuit under controlled perturbation of TDP-43 levels. siRNA-mediated depletion provides a reliable approach to establish the relationship between graded TDP-43 LOF and the CUTS sensor sensitivity/specificity. That said, CUTS is designed to detect functional TDP-43 loss irrespective of the upstream cause. As the reviewer notes, disease-relevant systems, such as C9orf72 iPSC-derived neurons and proteotoxic stress paradigms (e.g., MG132-induced impairment of TDP-43 nuclear function), are important for future studies. We are currently evaluating CUTS in iPSC-derived neuronal models of TDP-43 proteinopathy, and are still optimizing the induction system, promoters, and timing. It should be noted that C9orf72 iPSC neurons do not exhibit TDP-43 LOF using standard differentiation protocols. Regarding pharmacological stress, we have shown that acute sodium arsenite treatment can activate CUTS (Figure 3). In a concurrent study under revision, we show that MG132 similarly causes TDP-43 LOF and CUTS activation (Xie et al., 2025). Notably, none of these treatments induces complete nuclear loss of TDP-43; instead, they show nuclear TDP-43 retention or modest mislocalization. This suggests that TDP-43 LOF may also result from nuclear redistribution and dysfunction under these stress conditions, rather than from complete nuclear loss. We look forward to presenting these ongoing studies in the future.

      References

      Brown A-L, Wilkins OG, Keuss MJ, Kargbo-Hill SE, Zanovello M, Lee WC, Bampton A, Lee FCY, Masino L, Qi YA, Bryce-Smith S, Gatt A, Hallegger M, Fagegaltier D, Phatnani H, NYGC ALS Consortium, Newcombe J, Gustavsson EK, Seddighi S, Reyes JF, Coon SL, Ramos D, Schiavo G, Fisher EMC, Raj T, Secrier M, Lashley T, Ule J, Buratti E, Humphrey J, Ward ME, Fratta P. 2022. TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A. Nature 603:131–137. doi:10.1038/s41586-022-04436-3

      Cohen TJ, Hwang AW, Restrepo CR, Yuan C-X, Trojanowski JQ, Lee VMY. 2015. An acetylation switch controls TDP-43 function and aggregation propensity. Nat Commun 6:5845. doi:10.1038/ncomms6845

      Huang W-P, Ellis BCS, Hodgson RE, Sanchez Avila A, Kumar V, Rayment J, Moll T, Shelkovnikova TA. 2024. Stress-induced TDP-43 nuclear condensation causes splicing loss of function and STMN2 depletion. Cell Rep 43:114421. doi:10.1016/j.celrep.2024.114421

      Keating SS, Bademosi AT, San Gil R, Walker AK. 2023. Aggregation-prone TDP-43 sequesters and drives pathological transitions of free nuclear TDP-43. Cell Mol Life Sci 80:95. doi:10.1007/s00018-023-04739-2

      Mann JR, Gleixner AM, Mauna JC, Gomes E, DeChellis-Marks MR, Needham PG, Copley KE, Hurtle B, Portz B, Pyles NJ, Guo L, Calder CB, Wills ZP, Pandey UB, Kofler JK, Brodsky JL, Thathiah A, Shorter J, Donnelly CJ. 2019. RNA Binding Antagonizes Neurotoxic Phase Transitions of TDP-43. Neuron 102:321-338.e8. doi:10.1016/j.neuron.2019.01.048

      Wilkins OG, Chien MZYJ, Wlaschin JJ, Barattucci S, Harley P, Mattedi F, Mehta PR, Pisliakova M, Ryadnov E, Keuss MJ, Thompson D, Digby H, Knez L, Simkin RL, Diaz JA, Zanovello M, Brown A-L, Darbey A, Karda R, Fisher EMC, Cunningham TJ, Le Pichon CE, Ule J, Fratta P. 2024. Creation of de novo cryptic splicing for ALS and FTD precision medicine. Science 386:61–69. doi:10.1126/science.adk2539

      Xie L, Zhu Y, Hurtle BT, Wright M, Robinson JL, Mauna JC, Brown EE, Ngo M, Bergmann CA, Xu J, Merjane J, Gleixner AM, Grigorean G, Liu F, Rossoll W, Lee EB, Kiskinis E, Chikina M, Donnelly CJ. 2025. Context-dependent Interactors Regulate TDP-43 Dysfunction in ALS/FTLD. BioRxiv. doi:10.1101/2025.04.07.646890

      Yu H, Lu S, Gasior K, Singh D, Vazquez-Sanchez S, Tapia O, Toprani D, Beccari MS, Yates JR, Da Cruz S, Newby JM, Lafarga M, Gladfelter AS, Villa E, Cleveland DW. 2021. HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells. Science 371. doi:10.1126/science.abb4309

    1. eLife Assessment

      This valuable study addresses mechanisms of feedback inhibition between planar cell polarity protein complexes during convergent extension movements in Xenopus embryos. The authors propose a conceptually new model, in which non-canonical Wnt ligand stimulates transition of Dishevelled from its complex with Vangl to Frizzled, with essential roles of Prickle and Ror in this process. The main observations supporting molecular interactions rely on modest but significant changes in protein association in response to Wnt11. While the study is limited due to insufficient phenotypic analysis at the cellular level and the use of exogenously supplied proteins, this work is convincing and will be of broad interest to cell and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Planar cell polarity core proteins Frizzled (Fz)/Dishevelled (Dvl) and Van Gogh-like (Vangl)/Prickle (Pk) are localized on opposite sides of the cell and engage in reciprocal repression to modulate cellular polarity within the plane of static epithelium. In this interesting manuscript, the authors explore how the anterior core proteins (Vangl/Pk) inhibit the posterior core protein (Dvl). The authors propose that Pk assists Vangl2 in sequestering both Dvl2 and Ror2, while Ror2 is essential for Dvl to transition from Vangl to Fz in response to non-canonical Wnt signaling.

      Strengths:

      The strengths of the manuscript lie in the very interesting and new concept, along with supportive data for a model of how non-canonical Wnt induces Dvl to transition from Vangl to Fz, with an opposing role for Pk and Vangl2 in suppressing Dvl during convergent extension movements. Ror is a key player required for the transition and antagonizes Vangl.

      Weaknesses:

      In addition to general whole embryo morphology that is used as evidence for CE defects, two forms of data are presented: co-expression and IP, as well as IF of exogenously expressed proteins. The microscopy would benefit from super-resolution microscopy since in many cases the differences in protein localization are not very pronounced, and Western analysis data often show relatively subtle differences. Thus, future work will determine the strength of the interactions of the model.

      Major points.

      Overexpression conditions

      A possible concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Thus, one must be careful in interpreting data.

      Subtle effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are supportive of the proposed molecular model, but future experiments will be needed to robustly test the model.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      The weaknesses are in the clarity and resolution of the data that form the basis of the model. In addition to general whole embryo morphology that is used as evidence for CE defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) For the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo. 2) The microscopy would benefit from super-resolution microscopy, since in many cases the differences in protein localization are not very pronounced. 3) The IP and Western analysis data often show very subtle differences, and in some cases the differences are not apparent.

      Major points.

      (1) Assessment of CE movement

      The authors conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). The authors primarily used the length-to-width ratio (LWR) to evaluate CE movement as a basis for their model. However, LWR can be influenced by multiple factors and is not sufficient to directly and clearly represent CE defects. While the authors showed that Prickle knockdown suppresses animal cap elongation mediated by Activin treatment, they did not test their model using standard assays such as animal cap elongation or dorsal marginal zone (DMZ) Keller explants. Furthermore, although various imaging analyses were performed in Wnt11-overexpressing animal caps and DMZ explants, the Wnt11-overexpressing animal caps did not undergo CE movement. Given that this study focuses on the molecular mechanisms of Vangl2 and Ror2 regulation of Dvl2 during CE, the model should be validated in more appropriate tissues, such as DMZ explants.

      (2) Overexpression conditions

      Another concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Previous studies, such as those from the Wallingford lab, typically used 10-30 pg of RNA for PCP core proteins, whereas this study injected 100-500 pg, which is likely excessive and may have created artificial conditions that confound the imaging results.

      (3) Subtle and insufficient effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are not sufficient to strongly support the proposed molecular model. For example, most Dvl2 remained localized with Fz7 even under Vangl2 and Pk overexpression (Fig. 4). Similarly, Wnt11 overexpression only slightly reduced the association between Vangl2 and Dvl2 (Sup. Fig. 8), and the Ror2-related experiments also produced only subtle effects (Fig. 8, Sup. Fig. 15).

      We thank reviewer 1 for the careful reading of our revised manuscript and the additional constructive criticisms. Since the two reviewers had divergent opinions about our revised manuscript, we think that it might be more productive to request a Version of Record at this point, and have our proposed model debated and tested by others in the field. We will keep the reviewer’s suggestions in mind while designing ongoing studies. We would like to address the criticisms collectively below:

      (1) The primary goal of our current manuscript is to build a mechanistic model for non-canonical Wnt signaling by elucidating the functional relationships between Dvl, Vangl, Pk and Ror during CE. Each has been studied extensively in the prior literature using DMZ-injected embryos, and DMZ, Keller and animal cap explants, so there is little doubt that the reduced LWR following their over-expression or knockdown in the DMZ is due to disruption of CE. In the context of the current manuscript, we primarily performed co-injections in different combinations to differentiate synergistic vs. antagonistic relationships, and in the majority of cases we relied on epistasis to draw conclusions (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). Nevertheless, we did follow the reviewer’s suggestion and used animal cap elongation as an additional assay to confirm that Pk and Vangl2 did synergize to disrupt CE, and that their synergy could be blocked by Dvl2 co-overexpression; the new data are added to Fig. 1 (Fig. 1h, h’). Therefore, given the prior literature, our new animal cap explant data, and the specific scope of our current study, we feel that the LWR measurement is a reasonable assay to determine the CE phenotype in this manuscript. We fully agree with the reviewer that our model will need to be tested at the cellular level through live imaging of DMZ explants; this is indeed the direction of our future studies, but is beyond the scope of the current manuscript.

      (2) A salient feature of non-canonical Wnt signaling is that loss or over-expression of any component can often cause identical CE defects at the tissue/embryo level. We used many co-injection experiments to demonstrate that this is due, at least in part, to a counterbalance between Dvl/Ror and Vangl/Pk (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). It is in this context that we planned the imaging and biochemical experiments to determine the possible molecular mechanisms underlying their functional interaction, and we feel that the moderate over-expression used is reasonable in this case for us to build the first integrated model. We do plan to test our model using lower expression in the future. To acknowledge the limitation of our study, we also added the following sentences in the Discussion:

      “We acknowledge, however, that our model explains primarily the potential molecular actions underlying the regulation of CE at the tissue level. Whether and how our model may explain the cellular behavior during CE, such as polarized remodeling of cell junction or extension of cell protrusions, will require further study.”

      (3) The Wnt11-induced reduction of Dvl2-Vangl2 co-IP (Suppl. Fig. 8, 15) may be moderate, but it is statistically significant and reproducible, and we have reported similar findings in two other publications (DOI: 10.1093/hmg/ddx095; DOI: 10.1038/s41467-025-57658-0). Given the limitations of co-IP, we had to rely on high-level over-expression to make the experiments feasible. We are building proximity-based assays such as NanoBRET, and plan to verify the result with lower-level expression in the future.

      Reviewer #2 (Public review):

      We thank the reviewer for the encouraging comments, and for the suggestion to clarify the description related to Suppl. Fig. 15. We made revisions according to the reviewer’s suggestion, and added Suppl. Fig. 16 to further examine the effect of Ror2 knockdown on the steady-state interaction between Dvl2 and Vangl2 using an imaging approach.

    1. eLife Assessment

      In this important study, a new multi-scale imaging workflow promises to accelerate and democratize comparative connectomics, with projectome-level data informing synapse-level connectivity. While the pipeline and time savings are convincing, the evidence for the segmentation methodology as a reusable community resource is incomplete, with key metrics like error rates, annotation times, and proof-reading times not reported. Furthermore, the evidence on the utility of projectome-level information for analysing brains appears misleading. By clarifying the findings and ensuring that the complete software pipeline is available in online open source repositories alongside precise documentation, the authors would deliver on their vision to enable any laboratory to map and analyse brain connectomes.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents an end-to-end pipeline, intended to accelerate EM-based connectomics by combining low-resolution imaging for large volumes with synapse-level imaging only in selected regions of interest. In principle, this strategy can substantially reduce imaging time, computational demands, analysis time, and overall cost.
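
      The scale of the potential saving can be illustrated with simple voxel arithmetic. The sketch below is not taken from the manuscript: it assumes that acquisition cost scales with the number of voxels imaged (ignoring dwell time, cutting and stage overhead), and the function names, volume sizes and voxel sizes are hypothetical placeholders.

```python
def voxel_count(size_um, voxel_nm):
    """Number of voxels needed to cover a box (edge lengths in µm) at a voxel size (nm)."""
    count = 1.0
    for extent_um, step_nm in zip(size_um, voxel_nm):
        count *= extent_um * 1000.0 / step_nm
    return count


def multires_fraction(full_um, roi_um, low_nm, high_nm):
    """Voxels for (whole volume at low res + ROI at high res), relative to whole volume at high res."""
    whole_at_high = voxel_count(full_um, high_nm)
    combined = voxel_count(full_um, low_nm) + voxel_count(roi_um, high_nm)
    return combined / whole_at_high


# Hypothetical example: 100 µm cube imaged at 100 nm, with a 50 µm cube ROI at 10 nm.
fraction = multires_fraction(
    full_um=(100, 100, 100),
    roi_um=(50, 50, 50),
    low_nm=(100, 100, 100),
    high_nm=(10, 10, 10),
)
print(f"relative voxel count: {fraction:.3f}")
```

      Under these placeholder values most of the remaining cost comes from the high-resolution region of interest, which is why the choice of region size dominates any such estimate.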

      General note:

      Overall, I found the manuscript interesting and valuable, particularly as a description of how one laboratory has assembled and applied a practical workflow to reconstruct and analyze the central complex across multiple insect species. In that sense, the work is compelling as an account of a real, functioning strategy for comparative connectomics, and I appreciated reading it. My main reservation is not about the relevance of the biological problem or the utility of the pipeline in the authors' own hands, but about whether the manuscript, in its current form, fully meets the expectations of a paper that is focused on tools and resources. The expectation would be that this paper would be a venue for sharing new techniques, software tools, datasets, and other resources intended to be usable by the community. Here, because much of the pipeline appears to build on existing methods and software, the key value added should be a particularly clear demonstration of how these components were adapted, integrated, validated, and documented for this specific use case in a way that others could realistically reproduce and adopt. At present, that translational and reproducibility-oriented component does not yet seem sufficiently developed, despite the clear promise of the overall approach.

      Major comments:

      (1) The work is valuable as a practical integration and application of multiple existing tools into a coherent pipeline, together with a new multi-resolution imaging strategy. However, the manuscript at times reads as though it introduces an entirely novel workflow. I would encourage the authors to clarify the contribution more explicitly: which components are genuinely new (for example, the acquisition strategy and the end-to-end integration/validation), and which are adaptations of already established methods or software. This would make the scope and novelty of the paper easier to assess.

      (2) The most distinctive element is the multi-resolution acquisition strategy. However, as described, the selection of high-resolution regions seems to be decided a priori based on anatomy (guided by xCT localization of the CX), rather than being determined automatically from the data (i.e., ROI placement is anatomy-driven rather than data-driven). A more data-driven or machine learning-guided ROI strategy would strengthen the methodological contribution and the adaptability to new scenarios, along the lines of approaches such as SmartEM [1].

      (3) The manuscript emphasizes open-source availability and reduced barriers to entry, but the current software release, as referenced, does not yet appear to support straightforward external reuse. Since much of the pipeline builds on existing methods, the main added value lies in how these technologies were adapted, combined, and validated for the present problem. A clear and complete explanation of this adaptation is therefore essential, but is currently missing. I would suggest the following concrete improvements:
      a) Provide a single landing page or umbrella repository that links each pipeline step in the paper to the corresponding codebase, including version tags/commits and expected inputs/outputs for each step.
      b) Include step-by-step tutorials for each component.
      c) Provide an example dataset together with a full reproduction walkthrough in a controlled environment.
      d) Clearly explain the required parameters and configuration for each step, including how they should be adjusted for other datasets or scenarios.
      e) Follow packaging and distribution best practices (for example, PyPI/conda releases, Docker containers, and version pinning).

      (4) In my own attempt to set up and run parts of the released code, I encountered issues that currently limit reproducibility. For example, when creating an environment for EMalign (https://github.com/Heinze-lab/EMalign), the required Python version is not specified, and installation did not succeed under Python 3.12 due to dependency constraints. Additionally, synful_312 (https://github.com/Heinze-lab/synful_312) and SegToPCG (https://github.com/Heinze-lab/SegToPCG) appear to be empty despite being referenced in the manuscript. These are fixable issues, but addressing them is important if the paper is to deliver on its "low entry cost" claim.

      (5) Table 1 reports acquisition times, which is helpful. However, the multi-resolution approach adds essential processing steps that arise from the strategy followed (e.g., "XY alignment high-res" and "high-res to low-res alignment"). Please include registration/alignment (and other major post-processing) runtimes and resource requirements, such as storage, in a comparable table so readers can assess the true end-to-end cost.

      References:

      [1] Meirovitch, Y., et al. "SmartEM: machine learning-guided electron microscopy." Nature Methods (2025).

    3. Reviewer #2 (Public review):

      Summary:

      The paper proposes a workflow to accelerate EM connectomics by combining multi-scale imaging with image processing and analysis (image alignment, registration, neuron tracing, automated segmentation and synapse prediction, proof-reading) to derive a brain region connectome. The paper argues and (partially) demonstrates that this approach facilitates comparative connectomics.

      The data acquisition pipeline uses a well-established sample preparation protocol, uCT guided acquisition, and SBEM imaging at cellular and synaptic resolution.

      Data processing and analysis combine existing state-of-the-art components and focus on the alignment and complementary analysis of the two SBEM resolution levels. The paper applies the workflow to the central complex of six different insects and performs some preliminary analysis based on this (which is acceptable for a resource/tool).

      Disclaimer for the rest of the review: I am an expert in image analysis and segmentation, so I have mainly focused on these aspects as I am not qualified to analyze the details of image acquisition.

      Strengths:

      The paper addresses an important problem and promises an acceleration and democratization of comparative connectomics. The time savings of the imaging approach are well-motivated and derived. The methods used for image alignment, segmentation, synapse detection, and proofreading are state-of-the-art.

      Weaknesses:

      I see two major weaknesses in the paper:

      (1) The paper introduces the (approximate) equivalence of the projectome and connectome in the insect brain very prominently in the introduction and uses this as a central motivation for the multi-resolution image acquisition protocol. But, to me, it is unclear how this principle is really used in the analysis presented in the last Results section, and whether this assumption is evaluated at all. Specifically, Figure 4a shows the anatomical neuron reconstructions (from cellular resolution SBEM), while panels d-g show connectome-level analysis from the synaptic resolution data. The only link I can see between the two is that the neural processes in the synapse-resolution data can be mapped to the neurons from the cellular resolution data, thanks to the image alignment. This is certainly important, but it is only tangentially related to the projectome vs. connectome claim from the introduction. This claim implies that a tentative connectome is derived from projectome-level data (e.g. by assuming a uniform probability of synapse formation given surface or distance between projections) that is then validated by the "true" connectome data from synaptic resolution. Instead, what is actually solved, to my understanding, is mapping the local connectome to the projectome. While related, these are different things, and the current framing of the paper and the quite brief description of the section on comparative connectomics (also no corresponding Methods section) make this claim inadequately supported.

      (2) Reporting on segmentation and proofreading is purely qualitative. Given that this is claimed as a core contribution of the paper (e.g. statement in line 497 and following), I would expect substantially more reporting and evaluation of this claim:
      a) Report the actual time needed for proofreading the segmentations in CAVE. I could not find any numbers on this.
      b) Report the initial segmentation quality of the model: How many errors does it make? Note: There is a brief mention of VoI-based quantification in Methods (around line 1060), but the results are not reported.

      What should be done: Report the error rates (with an accurate measure such as skeleton VoI) independently for all 6 volumes. Given that the authors have the proofread versions, this is feasible. Only then can the claims made here be evaluated. Note that the F1-score of synapse prediction is quantified. This is a good starting point, but could also be extended to further species in order to assess the actual transferability. Furthermore, none of the data from the study seems to be available. The training data of the network has to be made available. If possible, high-resolution data should be proofread too.
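
      For readers unfamiliar with the measure, the following is a minimal sketch of how variation of information (VoI) separates split and merge errors. It is not the authors' evaluation code: it assumes that each skeleton node (or voxel) has already been assigned one predicted segment ID and one ground-truth neuron ID, and the function name and example labels are hypothetical.

```python
import math
from collections import Counter


def variation_of_information(seg, gt):
    """Return (split, merge) VoI terms in bits for two equal-length label sequences.

    split = H(seg | gt): one true neuron broken into several segments.
    merge = H(gt | seg): several true neurons fused into one segment.
    """
    if len(seg) != len(gt):
        raise ValueError("label sequences must have the same length")
    n = len(seg)
    joint = Counter(zip(seg, gt))   # co-occurrence counts of (segment, neuron)
    seg_counts = Counter(seg)
    gt_counts = Counter(gt)

    split = 0.0
    merge = 0.0
    for (s, g), c in joint.items():
        p_joint = c / n
        split -= p_joint * math.log2(c / gt_counts[g])
        merge -= p_joint * math.log2(c / seg_counts[s])
    return split, merge


# Hypothetical labels sampled at four skeleton nodes of one neuron.
predicted = [1, 1, 2, 2]   # the segmentation cut the neuron in half
truth = [7, 7, 7, 7]
print(variation_of_information(predicted, truth))
```

      Reporting the two terms separately, per volume, would show whether a model transferred to a new species tends to over-segment or to merge neurons, which has different consequences for proofreading effort.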

      Further points:

      (1) Why isn't reconstruction at the cellular level addressed with ML? This is surely possible and should be easier than the full connectome analysis. Similar to before, the actual times needed for tracing with CATMAID are not reported; the manuscript only states that this can be done in minutes for a neuron, but it's unclear if this is the best or average case. It would help to have quantitative numbers to assess whether automation would bring any benefits.

      (2) Finally, regarding the underlying software: I did not try this myself due to time constraints, but did check the repositories. They seem to be in an OK state, with some documentation in a README. However, given the central role of the software contribution, I would expect a centralized documentation page that explains how to use the different parts of the software, including a full example with sample data. Without this, application by other labs, a central claim, will be difficult.

    4. Author Response:

      Public Review:

      On behalf of all authors I would like to thank the reviewers for highly constructive and helpful comments, which, once addressed fully, will make the paper stronger and more useful as a tools and resources contribution.

      Besides addressing all minor issues that were pointed out by the reviewers, we see three main lines of changes we will need to pursue in order to address all major concerns. We plan to do all of these as fast as possible. Given that new alignments, segmentation and tracing are needed, this will take between one and three months.

      (1) Availability of code, software documentation and accessibility of pipeline. 

      Both reviewers and the editorial summary agreed that we need to improve the availability of our code, provide more instructions and examples of how to use the code, and make our methods more reusable to outsiders. To achieve this we will follow the suggestions made by the reviewers, in particular the list presented by reviewer 1 (point three of weaknesses in the public review).

      First, we would like to apologize for the faulty link to the SegToPCG repository (https://github.com/Heinzelab/SegToPCG); the correct name and link are LSDtoPCG and https://github.com/Heinze-lab/LSDtoPCG. We also apologize for the missing code in the https://github.com/Heinze-lab/synful_312 repository. These issues have already been fixed, and the corrections will be included in an updated bioRxiv version.

      Second, we will generate an overarching umbrella page that will serve as a go-to site for any user who would like to implement our pipeline. To enable implementation, we will expand the documentation, provide detailed instructions, and include an example dataset with these instructions.

      (2) Quantification of analysis steps, including segmentation, alignment and manual tracing, to validate our claims of increased efficiency and transferability across species.

      As with point 1, both reviewers as well as the editorial summary highlighted the need for more comprehensive quantification of the workflow, especially with respect to segmentation quality and the time invested in manual tracing and high-resolution alignments. In particular, these data should validate the transferability of the segmentation models across species, and support the claims made about the time savings resulting from using our multi-resolution workflow compared to a whole-sample synaptic-resolution approach.

      To this end, we will generate all analyses according to the reviewers' suggestions and incorporate the resulting data in new figures and tables. To make the data fully comparable across species, we will apply the latest version of our alignment and segmentation scripts to at least one high-resolution data stack of each species, quantify manual tracing of a comparable, defined set of neurons in each species, and perform VoI analyses of each species' segmentation against manually traced neurons in identically sized testing volumes in each dataset. Additionally, we will proofread identical branches of homologous neurons in each species and quantify the required number of edits from raw segmentation output to completion.

      As the segmentation pipeline has evolved over the last few years, a fair comparison between all datasets requires fresh analysis based on the latest version of our machine learning models (this cannot be done with existing data) and will therefore take a few weeks.

      (3) Clarification of aims for multi-resolution pipeline and how projectomes and connectomes inform each other

      Reviewer 2 highlighted that there is not sufficient clarity about the aims of combining projectome and connectome. Judging from the reviewer's comment, we might have inadvertently left the impression that we aimed at predicting a connectome from projectome data, by using spatial proximity of neurons as a proxy for connectivity. In fact, our data show that this is not possible, and that projection-level data cannot predict connectivity. For instance, in the head direction system, the projection data suggest identical circuits for bees and flies (except at the edges of the ring), but the connectivity data show that the components of the ring attractor circuit form circuits that are distinctly different between the species (despite the same neurons with the same projection patterns being involved).

      What we aim to do is slightly different. We define global patterns of information flow using the projectome, and then define circuits in a part of this global circuit at synaptic level. Then, we extrapolate the global connectivity by assuming that the circuits identified in one or two computational units (columns) are repeated in each column. This rests on the assumption that the same neurons form the same connections in each repeated module, as long as the cellular repertoire is identical (verified by the projectome), but does not use proximity data to predict connectivity. This method thus only applies to brain regions that consist of repeated computational modules, i.e. where we can assume that knowing the connectivity in one of them allows extrapolation to the entire brain region. While this is a simplification, the Drosophila CX has in principle confirmed this assumption.
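
      The extrapolation step described above can be sketched in a few lines. This is an illustrative assumption about how such tiling might be expressed, not the authors' implementation: the template format, the cell-type names "A" and "B", the synapse counts and the function name are all hypothetical, and connections that would leave the column range are simply dropped rather than wrapped around.

```python
def extrapolate_columns(template, n_columns):
    """Repeat a locally mapped circuit in every column.

    template: list of (pre_type, post_type, column_offset, weight) tuples
              measured in the synaptic-resolution region.
    Returns a list of ((pre_type, column), (post_type, column), weight) edges.
    """
    edges = []
    for col in range(n_columns):
        for pre, post, offset, weight in template:
            target = col + offset
            # Drop connections that fall outside the modelled columns (no wrap-around assumed).
            if 0 <= target < n_columns:
                edges.append(((pre, col), (post, target), weight))
    return edges


# Hypothetical template: A -> B within a column, B -> A in the neighbouring column.
local_circuit = [("A", "B", 0, 5), ("B", "A", 1, 2)]
for edge in extrapolate_columns(local_circuit, n_columns=3):
    print(edge)
```

      The sketch makes the underlying assumption explicit: the result is only as good as the premise that every column contains the same cell types forming the same connections, which is what the projectome is used to verify.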

      We will generate a new figure in which we illustrate the process of combining local connectomes and global projectomes using examples from our data, but illustrating this schematically also for other brain regions, e.g. the insect optic lobe or the cerebral cortex of mammals. We will also carefully rewrite the relevant text passages to avoid misunderstandings.

      Overall, we would like to thank the reviewers again for their thorough and detailed comments, which will help to make our connectomics workflow more accessible and reproducible.

    1. eLife Assessment

      This manuscript demonstrates the feasibility and potential value of using functional MRI in awake, behaving mice, enabling assessment of distributed brain activity during ongoing behavior in a manner analogous to human fMRI. The valuable findings suggest that the periaqueductal gray (PAG), a midbrain structure classically linked to threat processing and aversive learning, also contributes to reversal learning. If supported, this result would carry theoretical and practical implications for our subfield by expanding the computational roles attributed to the PAG and motivating cross-species circuit-level investigations. However, the strength of evidence is, at present, incomplete, and several key claims are only partially supported by the current analyses.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to determine the neural networks involved in updating behaviour by training mice on a 'go / no go' odour discrimination task, and measuring their brain activity using functional MRI.

      Strengths:

      The use of the translationally relevant 'go / no go' task is a major strength, as this is a task that can be used as readily in humans as in animals such as mice. The use of fMRI in awake, behaving mice is also a major strength, as this allows the activation of multiple brain regions to be measured while behaviour is ongoing, and also facilitates comparison to human studies. The computational modelling approaches further support these translational aims, again being as readily applied to human data as to animal data.

      Weaknesses:

      The major weakness of the paper - and one that is potentially addressable - is that the key analysis of the paper, showing that the periaqueductal gray (PAG) is recruited for reversal learning, is only partially supported by the data presented in the paper as it stands. The authors have used a sophisticated way of analysing the behavioural data using 'signal detection theory', in which they collected behavioural data showing correct 'go' responses ('hits'), correct 'no go' responses ('correct rejections'), missed 'go' responses ('misses') and go responses when mice should have withheld a response ('false alarms'). The data presented showing a double dissociation in the activation of the nucleus accumbens for 'hits' but not 'correct rejections' and the PAG for 'correct rejections' but not 'hits' is very interesting; however, it is confounded by the fact that the nucleus accumbens may activate when the animal makes a response, and the PAG when the animal withholds a response. If the authors also included the analysis of nucleus accumbens and PAG activation for 'misses' and 'false alarms', this would allow them to determine whether the activation of these regions reflects the behavioural response or the expectation of reinforcement from the response.
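      The four signal-detection outcomes described above can be laid out as a small sketch. This is an illustration only, not the authors' analysis code: the cue labels and function names are hypothetical. It shows why the additional outcomes matter: hits and false alarms share a lick, whereas misses and correct rejections share a withheld response, so grouping by response versus grouping by correctness separates the two interpretations.

```python
# Illustrative sketch only: cue labels and function names are hypothetical,
# not taken from the manuscript under review.

def classify_trial(cue: str, licked: bool) -> str:
    """Map a go/no-go trial onto its signal-detection outcome."""
    if cue == "go":
        return "hit" if licked else "miss"
    if cue == "no-go":
        return "false_alarm" if licked else "correct_rejection"
    raise ValueError(f"unknown cue: {cue!r}")


def group_by_response(outcome: str) -> str:
    """Hits and false alarms involve a lick; the other two outcomes do not."""
    return "response" if outcome in ("hit", "false_alarm") else "withhold"


def group_by_correctness(outcome: str) -> str:
    """Hits and correct rejections are correct; misses and false alarms are errors."""
    return "correct" if outcome in ("hit", "correct_rejection") else "error"
```

      If a region's activation tracked `group_by_response`, it would look the same for hits and false alarms; if it tracked the expected outcome of the trial, those two outcomes would be expected to differ.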

      Thus, the paper includes very interesting data and is impressive in its approach to analysing behaviour in a manner that is highly translatable between species. The additional analyses would markedly strengthen the paper and would add depth to the finding that the PAG appears to be involved in behavioural flexibility.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors test the hypothesis that whole-brain functional magnetic resonance imaging in behaving mice, coupled with reinforcement-learning modeling, can dissociate neural substrates of initial cue-reward acquisition versus contingency reversal, and potentially reveal underappreciated contributors to cognitive flexibility. Using a head-fixed go/no-go odor discrimination task with subsequent rule reversal in a subset of mice, they model trial-by-trial state-action values with a model-free Q-learning algorithm (hierarchical Bayesian fit) and use the model-derived decision variable as a parametric regressor in whole-brain analyses. They report that acquisition-related signals prominently involve ventral and dorsal striatal regions, whereas reversal learning additionally recruits the periaqueductal gray (negative correlation with the decision variable) and shows an apparent double dissociation between nucleus accumbens and periaqueductal gray responses for hit versus correct-rejection outcomes during reversal.
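      For readers unfamiliar with the modelling approach summarised above, the following is a minimal sketch of a model-free Q-learning update and a trial-by-trial decision variable. It rests on stated assumptions: the learning rate is fixed rather than fit by hierarchical Bayesian inference, the decision variable is assumed here to be the value of licking minus the value of withholding, and the state and action names are hypothetical. It is not the authors' model.

```python
from collections import defaultdict

# Minimal sketch under stated assumptions; not the manuscript's fitted model.


def q_update(q, state, action, reward, alpha=0.5):
    """One model-free update: move Q(state, action) toward the obtained reward."""
    prediction_error = reward - q[(state, action)]
    q[(state, action)] += alpha * prediction_error
    return prediction_error


def decision_variable(q, state):
    """Assumed definition: value of licking minus value of withholding."""
    return q[(state, "lick")] - q[(state, "withhold")]


def run_trials(trials, alpha=0.5):
    """Apply updates in order and record the decision variable after each trial."""
    q = defaultdict(float)  # all values start at 0.0
    trace = []
    for state, action, reward in trials:
        q_update(q, state, action, reward, alpha)
        trace.append(decision_variable(q, state))
    return q, trace


# Hypothetical trial sequence: three rewarded licks, then two unrewarded
# licks to the same odour after the contingency is reversed.
acquisition = [("odour_A", "lick", 1.0)] * 3
reversal = [("odour_A", "lick", 0.0)] * 2
q_values, dv_trace = run_trials(acquisition + reversal)
```

      In this toy sequence the decision variable rises during acquisition and falls once reward is omitted, which is the kind of trial-by-trial signal that would then serve as a parametric regressor.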

      Strengths:

      (1) The reversal manipulation is implemented without explicit punishment, targeting suppression of previously rewarded actions under reward omission - an underexplored regime for midbrain contributions beyond canonical threat/pain framing.

      (2) The manuscript provides a credible MR-compatible olfactory/licking platform with synchronized sniff/lick/valve/reward timing and high-field imaging, supporting feasibility and broader utility for mesoscale systems neuroscience in rodents.

      (3) Trial-by-trial value estimates from a Q-learning variant are fit via hierarchical Bayesian inference and explicitly integrated into subject-level general linear models with a mouse hemodynamic response function, which is appropriate for leveraging within-subject dynamics in small-N rodent fMRI.

      (4) The decision-variable maps during acquisition recover expected basal ganglia involvement (including nucleus accumbens and dorsal striatum), providing face validity; the reversal-stage map yields an interpretable set of cortical/striatal/pallidal regions plus periaqueductal gray/hippocampus.

      (5) The finite impulse response analysis stratified by behavioral outcomes (hit, false alarm, correct rejection, miss) adds interpretability beyond the model regressor alone, and the reported crossover interaction between nucleus accumbens and periaqueductal gray is potentially impactful if robust.

      Weaknesses:

      (1) The core claim regarding selective periaqueductal gray engagement rests on a subset of n = 6 mice for reversal. With permutation-based whole-brain inference and very small cluster sizes, the robustness of the periaqueductal gray effect to reasonable analytic perturbations is not yet convincing. I would suggest providing leave-one-animal-out analyses for the periaqueductal gray cluster/ROI effects and reporting how often the key findings survive.

      (2) The authors note that due to temporal resolution and hemodynamics, they cannot separate stimulus, choice, and feedback and therefore model "whole trials." This limitation creates ambiguity about whether periaqueductal gray signals reflect value updating, action inhibition (no-lick), reward omission, autonomic arousal, or motor preparation/withholding, especially given the strong hit versus correct-rejection opponency. I would suggest adding targeted analyses that disambiguate "withholding" from "reversal-related updating".

      (3) ROIs are defined from the whole-brain decision-variable maps and then interrogated by outcome types; the manuscript acknowledges non-independence. This can inflate apparent dissociations. It would be better if the authors define ROIs independently (anatomical periaqueductal gray/nucleus accumbens masks, or split-half ROI definition with held-out data) and repeat the key ROI conclusions.

      (4) The reversal group is a subset of the acquisition cohort and also experiences a different task phase structure and additional sessions; the paper attempts to address exposure differences descriptively. I would suggest that the authors formally test whether periaqueductal gray effects are explained by session count, time-in-scanner, or learning rate differences (e.g., include these as covariates, or match sessions more strictly).

      (5) The platform records sniffing and licking, but the imaging models described include motion, global, and ventricle regressors and do not clearly include trialwise lick/sniff covariates. Given the periaqueductal gray's known autonomic and defensive coordination roles, physiological state confounding is a major concern. Could the authors incorporate sniff and lick metrics (and their derivatives) as nuisance regressors and show whether the periaqueductal gray effects persist?
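      The leave-one-animal-out check suggested in weakness (1) can be sketched as follows. The per-animal effect sizes are invented purely for illustration, and the function names are hypothetical; the sketch only shows the logic of recomputing a group-level ROI effect with each animal removed in turn and counting how often its sign is preserved.

```python
from statistics import mean

# Illustrative sketch: effect sizes below are made up, not data from the study.


def leave_one_out_means(effects):
    """Return {left_out_animal: mean effect across the remaining animals}."""
    return {
        left_out: mean(v for animal, v in effects.items() if animal != left_out)
        for left_out in effects
    }


def sign_stability(effects):
    """Fraction of leave-one-out folds whose mean keeps the full-sample sign."""
    full = mean(effects.values())
    folds = leave_one_out_means(effects)
    same_sign = sum(1 for m in folds.values() if (m < 0) == (full < 0))
    return same_sign / len(folds)


# Hypothetical per-animal ROI effects for a six-animal reversal cohort.
roi_effects = {"m1": -0.4, "m2": -0.3, "m3": -0.5, "m4": -0.2, "m5": 0.1, "m6": -0.3}
stability = sign_stability(roi_effects)
```

      A full analysis would rerun the whole-brain inference in each fold rather than averaging a fixed ROI, but the reporting format (how many of the six folds preserve the effect) would be the same.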

    1. eLife Assessment

      This multi-omics study provides a comprehensive characterization of the context-dependent roles of the JAK-STAT pathway (JSP) across different cellular compartments within the breast cancer microenvironment. The authors present convincing evidence that high JSP activity paradoxically drives anti-tumor cytotoxicity in T cells but promotes malignancy and immunosuppression in tumor epithelial cells, leading to the fundamental discovery that broad JAK-STAT inhibition could be therapeutically counterproductive. Ultimately, the identification of the immune-related JSP score and the STAT4 axis as predictive biomarkers for anti-PD-1 immunotherapy response, particularly in triple-negative breast cancer, offers critical insights for precise patient stratification and targeted therapeutic interventions.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Zhou and colleagues present a detailed look at how the JSP functions differently in the various cells of a breast tumor. The authors have effectively shown that the JSP acts as a double-edged sword, as it helps T cells fight cancer but also allows tumor cells to grow and avoid ferroptosis. These findings are important because they identify a useful biomarker to predict how TNBC patients might respond to PD-1 inhibitors.

      Strengths:

      This work is important because it provides a clear explanation for the conflicting roles of the JSP in the tumor environment. The evidence is solid, as it combines data from thousands of patients with single-cell analysis and lab experiments to confirm the role of STAT4 in cancer progression and immunity.

      Weaknesses:

      However, there are areas for improvement in the scope of the review, the depth of analysis, and the potential for broader clinical implications. The authors are encouraged to address these issues to enhance the scientific and clinical impact of the study.

      Major Issues:

      (1) The authors demonstrate that STAT4 upregulates SLC47A1, but this is currently supported only by expression correlation and western blot data. To confirm a direct link, the authors are encouraged to perform ChIP-qPCR or luciferase reporter assays to show that STAT4 binds directly to the SLC47A1 promoter.

      (2) The conclusion that the MIF-CD74 axis drives immunosuppression is based on computational inference. To support this, the authors could consider mining publicly available breast cancer spatial transcriptomics data to show the co-localization of MIF and CD74. Alternatively, performing simple dual-color immunofluorescence staining on a few clinical sections would effectively demonstrate the physical proximity of these cells.

      (3) TNBC is highly heterogeneous and includes subtypes like mesenchymal and immunomodulatory groups. The authors should analyze whether the JSP score or STAT4 levels vary significantly between these subtypes, as this could further refine the selection of patients for JAK1 inhibitors.

      (4) While the JSP score works well in the current datasets, the authors should consider validating its predictive accuracy in additional independent immunotherapy cohorts, such as the TONIC trial, to ensure the biomarker is robust across different treatment settings.

      Minor Issue:

      The manuscript mentions a U-shaped trajectory of JSP activity during tumor transition. A more detailed biological explanation of why the pathway activity initially drops and then rises would add depth to the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The JAK-STAT pathway (JSP) exhibits cell-type-specific functional heterogeneity in breast cancer. This study investigates the JSP in breast cancer and its response to anti-PD‑1 immunotherapy. JSP displays distinct cell‑type heterogeneity: it promotes malignant phenotypes and immunosuppression in tumor cells, while enhancing cytotoxicity and reducing exhaustion in T cells. Elevated JSP expression correlates with improved immunotherapy responses, especially in triple‑negative breast cancer. These findings highlight the paradoxical roles of JSP, indicating that broad inhibition may compromise anti‑tumor immunity.

      Strengths:

      The major strengths of this study include the comprehensive characterization of JSP heterogeneity across epithelial, tumor, and T cells in breast cancer. The identification of JSP and STAT4 as predictive biomarkers for immunotherapy response, particularly in triple‑negative breast cancer, provides clinically relevant insights for patient stratification.

      Weaknesses:

      The findings rely heavily on public dataset analyses.

    4. Reviewer #3 (Public review):

      Summary:

      This multi-omics study by Zhou et al elucidates the context-dependent roles of the Janus kinase-signal transducer and activator of transcription (JAK-STAT) pathway (JSP) across different cellular compartments in the breast cancer tumor microenvironment. While bulk JSP activity is associated with a favorable prognosis, single-cell analysis reveals a paradoxical landscape: high JSP in T cells drives anti-tumor cytotoxicity and reduces exhaustion, whereas high activity in tumor epithelial cells promotes malignancy and immunosuppression via the MIF-CD74 signaling axis. The JSP score (immune-related) serves as a robust predictive biomarker for response to anti-PD-1 immunotherapy, particularly in triple-negative breast cancer (TNBC). Furthermore, the study identifies the STAT4/SLC47A1 axis as a critical mechanism through which tumor cells resist ferroptosis, facilitating disease progression. These findings suggest that broad JAK-STAT inhibition may be counterproductive in cancer therapeutics; instead, therapeutic success depends on precise modulation and carefully timed interventions to preserve its T-cell-associated functions. This study may inspire future studies to explore specific factors that selectively modulate JAK-STAT activity in immune cells to achieve favorable therapeutic outcomes.

      Strengths:

      Significant therapeutic implications.

      Weaknesses:

      Limited molecular mechanisms.

    1. eLife Assessment

      It remains unclear how human antibody-secreting cells (ASCs) differentiate. In this study, the authors discovered a CD30⁺ intermediate subset that appears during the transition from B cells to ASCs, providing a potential ontogeny for extra-germinal center B cell differentiation. This study is useful because it identifies novel intermediate markers that enable tracking of human ASC ontogeny, offering new insights into ASC development. However, the evidence is incomplete, and we see three major limitations: (1) the data are largely representative, requiring additional reproducibility; (2) the bioinformatics analysis is limited; and (3) step-wise phenotypic validation would require lineage-tracing experiments on sorted populations.

    2. Reviewer #1 (Public review):

      Summary:

      Fields et al. investigated the heterogeneity and kinetics of human antibody secreting cell (ASC) differentiation by analyzing ex vivo tonsil samples and using in vitro differentiation modeling. They discovered that a CD30+ intermediate subset emerges in transition from B cell to ASC in both contexts, but not from germinal centers, and they identified cytokines that promote this state. They also identified an isoform of CD44, CD44v9, that is expressed on some ASCs.

      Strengths:

      The strengths are the novelty of the findings and the identification of two new markers that may be useful for tracking ASC heterogeneity.

      Weaknesses:

      However, some of this work seems preliminary and would need to be further validated. Some of the data presented was only representative, with limited controls and biological repeats, limiting the interpretation. For example, the role of Mef2c for CD30 expression was not robustly demonstrated. It was not clear if Figure 1 scRNAseq/ATACseq was from multiple donors or just one. Future studies may extend these novel findings and determine the functional relevance of these factors, CD30, and CD44v9 for ASC differentiation and physiology.

    3. Reviewer #2 (Public review):

      Summary:

      Bhattacharya and colleagues here use cell culture, single-cell RNA and ATACseq sequencing of such in vitro cultures and of ex vivo isolated B-lineage cells to infer an ontogeny for extra-germinal centre B cell differentiation. The manuscript presents a useful potential ontogeny for plasma cells, wherein in vitro cultured naïve human B cells enter a CD30+ intermediate state before moving in subsequent days through a CD44v9+ state before ultimately obtaining a 'mature' antibody-secreting plasma cell phenotype. Ex vivo isolated germinal centre B cells obtain the plasma cell state without expressing CD30 in their development. Phenotype analysis of tonsillar B-lineage cells supports the same phenotype conversion in vivo, although the intermediate cell population was smaller in vivo. The link to CD44v9 expression on developing plasma cells is inferred to be for extra-GC (T-independent) responses, but the data presented leave this equivocal, and the functional importance of developing via a CD30+CD44v9+ intermediate is not investigated.

      Strengths:

      The article presents a solid potential ontogeny for PC development, wherein some differentiating B cells acquire a CD30+ state, transition through a CD44v9+CD30+ state, then downmodulate CD30 before obtaining canonical CD38+ 'PC' status. A strength is the integration of in vitro cultured B cell results with tonsillar B-lineage cell data sets, and careful flow cytometry of the in vitro cultures over several days to infer lineage. The data provide reasonable support for the concept. CD30+ cells are shown to develop readily from naïve B cells in culture, but uncommonly from GC B cell cultures. A nice piece of data is Figure 6B, which shows reasonably strong correlative changes in phenotype through the assumed ontogeny, and this fits with the expected trajectory of maturation.

      Weaknesses:

      The most important weakness throughout is the non-absolute nature of the relationship. An example is seen in that the sorted ex vivo GC B cells also give rise to the 'extra-GC' phenotype of plasma cell, suggesting that while the profile is enriched, it is not absolute. There is a further weakness, as while cultures are run for several days, division-associated shifts in PC phenotype are not mapped; such would greatly strengthen the weight of the argument, and show conditional shifts in phenotype associated with division, an uncontrolled parameter in the mix. For example, for the MEF2C A388 inhibition experiments, it would be strong evidence of the pathway/process contributing if a by-division peak increase in CD30+ population was demonstrated in the early days of culture.

      There are some basic sort experiments performed (e.g. 3C-3F), which show that the CD30+ cells do give rise to PC preferentially, but what is missing is the step-wise phenotype shifts in these sorted populations, which should support the trajectory shown in Figure 3B and (the in vitro equivalent of) 6B. It would emphatically support the trajectory to show the cellular phenotypes on the PC with sorting based on CD30, CD44v9, CD27, and CD20 expression, and following outcome phenotypes 24-48 hours later, if the inferred maturation trajectory is true.

      There are also specific weaknesses with the bioinformatics, in that, while the analyses are likely appropriate, unpresented data is necessarily used to shape the argument. For example, Figure 1C shows bubble plots for two plasma cell sets, yet, of archetypal PC-expressed genes, only IRF4 is demonstrated to confirm they are true PC, and the gene is not universally expressed in cells in the clusters. For this figure, it would help to expand the bubble plot to show J-CHAIN, XBP-1, CIITA and PRDM1 or other appropriate PC demarcating molecules. Similarly, in Fig 2B, more evidence of a bifurcation in state is needed than that CD44v9 distinguishes PC1 from PC2 clusters-this is the stated conclusion, but 2A depicts that 50% of PC1 relatively weakly express CD44, while <25% of PC2 express it. Demonstrating additional molecules or genes distinguishing the clusters would improve veracity. Figure 2F shows clonal lineages, but it would be helpful to see somatic hypermutation burdens and learn if they differ between the demarcated subsets. I also find the pseudotime analyses of limited value, as some of the branches follow trajectories that are unrealistic biologically, so less weight should be placed on the pathways to which they do or do not point (i.e., the notion that GC B cells do or do not give rise to particular PC subsets).

      Statistically, some of the experiments are single wells from single donors, so there is a low level of confidence and no reproducibility demonstrated for some aspects of the study, which is a weakness.

      Paradoxical to the argument that it is the TI response process being modelled, it is presented that CpG stimulation, plus proxy T cell help (CD40L), drives the CD30+ phenotype best with the addition of the GC-associated cytokine IL-21. This should be carefully considered and discussed.

      Overall, in addition to presenting more contextual information from the bioinformatics, the best way to solidify the data set, in my view, would be to revisit the hypothesis with two additional experimental approaches: (1) to incorporate division tracing into the ontogeny studies and (2) to perform lineage tracing on sort-purified populations at different stages of the maturation process.

    1. eLife Assessment

      This important study offers insights into the anatomical and physiological features of cold-selective lamina I spinal projection neurons. The evidence supporting the authors' claims is convincing, although including a larger sample size and more quantification would have strengthened the study, and the claims of monosynaptic connectivity would benefit from further experimental evidence. The work will interest those in the field of somatosensory biology, especially researchers studying spinal cord dorsal horn circuits and projection neuron cell types.

    2. Reviewer #1 (Public review):

      Summary:

      Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.

      Strengths:

      (1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.

      (2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.

      (3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.

      Weaknesses:

      (1) The sample size for the ex vivo electrophysiology conducted on the calb1+ lamina I projection neurons (Figure 5) is limited to a total of six recorded neurons. Given the difficulty and complexity of the preparation, this is understandable. Notably, since approximately 87% of lamina I projection neurons heavily innervated by Trpm8+ terminals are calb1+, these six recordings of such neurons in Figure 4E could also be calb1+.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium binding protein calbindin 1.

      In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.

      Strengths:

      The use of a relatively new (seldom used previously) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds specificity to the characterization. The transgenic line appears to capture the subpopulation of Trpm8-expressing neurons well.

      In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.

      The writing is clear and the presentation of findings follows a logical flow.

      Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.

      Weaknesses:

      In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors acknowledge that, technically, this is a very difficult preparation with very low yield as far as obtaining successful recordings. Moreover, the tissue needs to be maintained at room temperature which is obviously not ideal when characterizing cold thermoreceptors due to the unavoidable effects of low temperature on cold-activated receptors.

    4. Reviewer #3 (Public review):

      Summary:

      Razlan and colleagues provide a detailed anatomical characterization of lamina I projection neurons in the mouse spinal cord that are densely innervated by primary afferents activated by cooling of the skin. The authors validate a Trpm8-Flp mouse line, show synaptic contacts between Trpm8⁺ boutons and projection neurons at the ultrastructural level, and demonstrate at the physiological level that these neurons specifically respond to cooling stimuli. Next, by taking advantage of previous transcriptomic analysis of ALS neurons, the authors identify calbindin as a marker for cold-activated lamina I projection neurons and map their ascending projections to the rostral lateral parabrachial area, caudal periaqueductal gray, and ventral posterolateral thalamus, well-known thermosensory and thermoregulatory centers. Altogether, these findings provide strong anatomical and functional evidence for a direct line of transmission from Trpm8⁺ sensory afferents through Calb1⁺ lamina I neurons to key supraspinal centers controlling perception of cold and thermoregulatory responses.

      Strengths:

      The combination of mouse genetics, electron microscopy, ex vivo physiology, optogenetics and viral tracing provides convincing evidence for a direct cold pathway. The work validates the Trpm8-Flp line by extensive anatomical and molecular characterization. Integration with previous transcriptomic and anatomical data neatly links the cold-selective lamina I neurons to a molecularly defined cluster of ALS neurons, strengthening the bridge between molecular identity, anatomy, and physiological function.

      Weaknesses:

      The main limitation remains the relatively small number of neurons that could be recorded electrophysiologically. While understandable given the complexity of the preparation, this necessarily limits generalization.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.

      We appreciate that the sample size is small, but this was limited by the technical difficulty and relatively low yield with this preparation. From a total of 16 experiments, we were able to obtain successful recordings in 6 cases, and these provided the characterisation of the 11 cells reported in Figure 4. We believe that this is sufficient to “strongly suggest” that the cells with dense Trpm8 input correspond to cold-selective cells. We have toned down the statements in the abstract (line 23) and the Results section (line 246).

      (2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.

      As the Reviewer says, tdTomato labelling fills the entire cell. However, examination at high magnification reveals numerous varicosities along the labelled axons, presumably corresponding to synaptic boutons. We now illustrate this in Figure 6–figure supplement 2F.

      In addition, we have provided further evidence that these varicosities correspond to (glutamatergic) synaptic boutons by immunostaining sections through the LPB for the postsynaptic density protein Homer1, and showing Homer1 puncta apposed to varicosities (Figure 6–figure supplement 2 G,H). This new information now appears in the Results section (lines 374-380).

      (3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.

      We agree that branches to different brain nuclei may originate from specific subsets of ALS3 neurons and this is now stated in the figure legend. It is true that there are projections to other brain regions (including NTS). These are not included in the diagram, because their circuitry in relation to cold-sensing is less well understood. Although the projection to VPL from lumbar cord is sparse, this is likely to be explained by the very low proportion of lamina I projection neurons with axons that reach the thalamus. Our retrograde tracing data (e.g. Figure 6–figure supplement 4) had already revealed many cells in the C7 segment that were densely coated with Trpm8 afferents and retrogradely labelled from the lateral thalamus. We have carried out additional experiments in which AAV1.Cre<sup>ON</sup>.tdTomato was injected into the cervical enlargement of Calb1<sup>Cre</sup> mice. This resulted in much denser labelling in the VPL and PoT thalamic nuclei, supporting the suggestion that cold-selective lamina I neurons in the cervical enlargement project to these nuclei. This is now described in lines 381-387 and illustrated in Figure 6–figure supplement 3.

      Reviewer #2 (Public review):

      (1) In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      We fully accept that the sample size is small (please see response to Reviewer 1 above). We also accept that the thermal stimulation was not that well controlled. Unfortunately, commercially available probes for controlling skin temperature are too large to apply to the skin in this preparation. For this reason, we have used application of hot and cold saline, as in our previous studies with this preparation.

      (2) The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. How many preparations were needed, etc?

      We now state that 6 out of 16 experiments resulted in successful recordings for this part of the study (lines 858-861).

      Reviewer #3 (Public review):

      (1) While anatomical evidence for direct synaptic connectivity between Trpm8+ afferents and lamina I projection neurons is compelling, a physiological demonstration of strict monosynaptic transmission is not shown. The conclusion that these inputs are exclusively monosynaptic should be toned down. Similarly, the statement that "Lamina I ALS neurons that are surrounded by Trpm8 afferents are cold-selective" should also be toned down as only a few neurons have been tested and it cannot be excluded that other neurons with similar characteristics may be polymodal.

      We have now carried out optogenetic experiments by expressing channelrhodopsin in Trpm8 afferents and retrogradely labelling ALS neurons with tdTomato. This has allowed us to directly demonstrate monosynaptic input. This is described in the Results section (lines 180-202) and the Methods section has been updated. As noted above, we have toned down the statement about lamina I neurons surrounded by Trpm8 afferents being cold-selective (line 246).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The patch innervation of Trpm8+ sensory neurons in lamina I of the spinal cord dorsal horn is interesting. Do they occupy specific areas within lamina I along the mediolateral axis, or are their placements random? Quantifying the distribution of these terminals in lamina I might be worthwhile.

      Although we have not studied the mediolateral distribution systematically, it appears that the locations of the patches along the mediolateral axis are random, and they could be seen in medial, central and lateral parts of lamina I (as shown in Figure 2). We have added a comment to this effect in the Results section (lines 114-116). Quantifying Trpm8 terminals would be very labour-intensive, and we do not feel that this would be of great benefit.

      (2) Quantification for the percentage of Trpm8+ boutons contacting Phox2a+ neurons that are vGlut3+

      The main purpose of this part of the study was to provide a possible explanation for the finding by Li et al (2015) that some lamina I cells were associated with Vglut3-immunoreactive boutons. We found that the percentages of Trpm8+ boutons that contained Vglut3 varied considerably from cell to cell, and this is now stated in the text (lines 133-134). However, knowing exact proportions was not an important aspect of the study, and we have therefore not carried out a detailed analysis.

      (3) Quantification for the percentage of PBN projections neurons densely innervated by Trpm8+ axons that are calb1+.

      As requested, we have carried out immunohistochemistry to determine the proportion of lamina I ALS neurons with dense Trpm8 input that are calbindin-immunoreactive. We examined 31 neurons from 3 different mice and found that all but 4 (i.e. 87%) were immunoreactive. This is now described (lines 287-293) and illustrated (Figure 5–figure supplement 1). We have now put the electrophysiological characterisation that was in this figure into a separate supplement (Figure 5–figure supplement 2).

      (4) It might be helpful to confirm the brain projection targets of Calb1+ lamina I projection neurons using AAV1-CreON-Synaptophysin-GFP (or other fluorescent proteins) injections.

      Please see our response to Public review Reviewer 1 comment 2 above. We have provided further evidence that the brain regions that received input from the Calb1+ cells contain axonal boutons (lines 374-380 and Figure 6–figure supplement 2F-H).

      (5) Figure 6 - Figure Supplements 3 and 4 are duplicated

      We apologise for this duplication, which was made in error in the version originally submitted to eLife. This has now been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned, in the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low, some recorded in current clamp, a few in voltage clamp. This prevents any solid statistical evaluation of the findings

      Please see our response to the first point made by Reviewer 1 in the Public reviews. As stated above, we have toned down the statement about the relationship between cells with dense Trpm8 input and cold-selective cells (line 246).

      (2) In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the synaptic connection between afferents and ALS projection neurons.

      Please see our response to the Public review comment made by this Reviewer.

      (3) Line 35. In the description of the anterolateral system and the effects of lesions, the species should be specified since rodents and humans have a different anatomical distribution of spinal tracts.

      We now state that while ALS axons ascend in the anterolateral quadrant in humans, they are located in the dorsolateral white matter in rodents (lines 40-42)

      (4) To describe the semi-intact preparation used for recording and stimulation from the periphery, the authors cite a study by Julien Allard (reference 25). However, that study describes an in vivo preparation. I believe there is an error in the citation.

      We thank the Reviewer for pointing this out – it has now been corrected.

      (5) Line 726. Dorsal horn recordings were performed at 25 ºC. What is the temperature of the skin? How would this low temperature affect the excitability of cold afferents and their axons? Perhaps a comment about this issue would be appropriate.

      The skin temperature in this preparation is the same as that of the spinal cord (25 °C). At this temperature, Trpm8 afferents would be active, but are likely to have adapted during the course of the experiment. Since this temperature is below 37 °C, it is likely that the conduction velocity of these afferents will be slower than in the in vivo situation. We have added a comment to this effect (lines 818-821).

      (6) Line 401. The authors could not detect Trpv1-immunoreactivity in the central terminals of Trpm8Flp;RCE:FRT mice. Could they detect Trpv1 immunoreactivity in any central terminal? Do they have positive evidence that their immunostaining worked?

      Trpv1 was readily detected in central terminals with the Trpv1 antibody. An example showing lack of detectable Trpv1-immunoreactivity in GFP-labelled (Trpm8-expressing) afferents is now shown in Figure 2–figure supplement 1K-M.

      (7) Line 437. What is the expected anterograde transport time for YFP from the lumbar cord to the brainstem? Are 2-3 weeks not sufficient based on the literature? I noticed the authors are using longer survival times after intraspinal injections

      In preliminary experiments for a previous study (“Substance P-expressing excitatory interneurons in the mouse superficial dorsal horn provide a propriospinal input to the lateral spinal nucleus”, Brain Structure and Function), we had found that a 2 week survival time after injection of AAV1.Cre<sup>ON</sup>.GFP into the lumbar spinal cord of Tac1<sup>Cre</sup> mice was not sufficient to label axons in the brain, although at 4 weeks we saw brain labelling. We have also found that extending survival times from 4 to 6 weeks gives greatly improved labelling, especially in the thalamus.

      (8) Figure 5A. Many of the labelled cells appear to have the somas in the white matter, which makes little sense. It seems the reference section to plot the cells is not optimal

      The placement of cells is accurate. Many spinal projection neurons are present outside the main region of grey matter (i.e. laminae I-X). These cells are found in 2 main regions – the lateral spinal nucleus (LSN) and the lateral reticulated part of lamina V. These two regions are intermediate between grey and white matter – i.e. they contain scattered cell bodies amongst a dense collection of axons. For this reason they appear outside the grey/white border as it is conventionally shown on diagrams of this type. This has been reported in numerous studies, e.g. see Figure 2 in “The cells of origin of the spinothalamic tract of the rat: a quantitative reexamination”.

      (9) Recent transcriptomic studies suggest the presence of more than one subpopulation of Trpm8-expressing DRG or trigeminal neurons. It is unclear to what extent the Trpm8-Flp line is capturing this diversity.

      We are aware that there are at least 3 transcriptomic subsets of Trpm8-expressing primary sensory neurons. However, we are not aware of any suitable molecular markers that would allow us to discriminate between them, and therefore address this point.

      (10) Could the patchy distribution of Trpm8 afferents in lamina I reflect incomplete recombination; the empty spaces could be occupied by unmarked afferents?

      In theory it could, but this seems unlikely. The Trpm8<sup>Flp</sup> line (crossed with RCE:FRT) captures ~83% of Trpm8-positive cell bodies, and it seems very unlikely that the remaining 17% of Trpm8-expressing afferents would fill the spaces between GFP bundles that we see in lamina I. This is now stated in the Results section (lines 116-120).

      Reviewer #3 (Recommendations for the authors):

      (1) It would be a nice addition to the validation of the Trpm8-Flp line to specify what ages (if multiple) have been analysed and whether there are any differences. In addition, is labelling different at different levels of the spinal cord, and is there any labeling in supraspinal regions?

      The tissue used for this part of the study was obtained from mice aged 5-9 weeks and this is now stated (lines 78-79). We did not observe any differences with age, but we did not look at this in detail. Labelling was similar at different levels of the spinal cord, and this is stated (lines 108-109). We have added a brief account of the distribution of GFP labelling in the brain (lines 140-144).

      (2) Line 169. It is not clear how ALS neurons are labeled. It is explained in the material and methods (I believe it is AAV9.mCherry into the LPB or CVLM). Although I could not find a mention of a tdTomato AAV, maybe I missed it. In any case, it would be great to have the experimental strategy briefly explained in the text. For the same reason, I would recommend moving Figure 4 Supplement 1A and 1B schematics to the main figure, very helpful for understanding the experiment.

      We thank the Reviewer for this suggestion. We now explain in the Results section how the ALS neurons were labelled (lines 209-212), and as the Reviewer recommends we have put the schematic diagrams from Figure 4–figure supplement 1 into the main Figure. As noted in the text, the tdTomato labelling resulted from injection of an AAV coding for Cre into mice that contained the Ai9 allele. We have also updated the descriptions of brain injections in the Methods section to cover the new experiments (optogenetics, and calbindin immunohistochemistry).

      (3) Line 184. "Figure 4" would be good to specify the panels; I believe it should be 4A-C. Same for line 194, 4D-F?

      We apologise that this was omitted from the original version – we have now specified the panels.

      (4) Line 179. It would be great to specify in the text and figures the temperature used for hot and warm water. In addition, would the responses be different using different temperatures? Can you test ramps? These would go a great way to compare with responses shown in vivo by Ran and colleagues.

      We now specify the hot and cold saline temperatures used to stimulate the skin in the semi-intact preparation in the legend for Figure 4 and in the Results section (lines 222-223). As noted above, it is difficult to use more accurate thermal stimuli in this preparation. Please see response to Reviewer 2 public comment 1.

      (5) Figure 4-Figure supplement 1F. It looks like these are very slow responses (1 sec?) for monosynaptic connectivity.

      In this figure (now part 1D) the action potential frequency was determined from counts of APs in 1 sec bins, and this is now stated in the legend. This might have given the impression of slow responses.

      (6) Line 203. I would tone down the statement, as only 6 cells "that were clearly associated with numerous GFP-labelled afferents" have been tested. Thus, it cannot be excluded that other cells with similar anatomical characteristics may also respond to other stimuli

      As requested, we have toned down this statement (line 246).

      (7) Line 230. Here AAV11.CreON.tdTomato is used, whereas in previous retrograde experiments AAV9 was used (Figure 4). Why the switch to 11? Is the tropism the same? Is it possible that because you are using a different serotype, you are targeting different neurons?

      We have found that although AAV9 coding for fluorescent proteins is very good for retrograde labelling, AAV9 coding for Cre-dependent constructs (e.g. AAV.Cre<sup>ON</sup>.tdTomato) gives very poor recombination in spinal projection neurons, for reasons that we do not understand. We recently became aware of the AAV11 serotype, which was recommended as being suitable for retrograde transport (“AAV11 enables efficient retrograde targeting of projection neurons and enhances astrocyte-directed transduction”, Nature Communications). We have found that this works very well for labelling ALS cells throughout the spinal cord when using Cre-dependent constructs. We have added a reference to this paper at this point in the text. We are not able to say whether tropism is the same or different, but in each case many ALS neurons (including many of those in lamina I) are captured.

      (8) Line 234. Is there any positional organization for the "tdTomato-labelled cells densely innervated by Trpm8 afferents"? Do they preferentially cluster in some position of lamina I?

      These cells are found throughout the mediolateral extent of the dorsal horn, and this is now stated (lines 279-280).

      (9) Line 237. The actual number of cells/mm would be informative.

      This would be difficult to estimate, as the sections were cut in the horizontal plane, which means that lamina I can appear on a variable number of sections.

      (10) Line 249. From the figures, the action potentials of the Calb+ neurons seem to have a delayed onset (at the end of cold saline treatment, Figure 5, Supplement 1l) compared to lamina I ALS neurons recorded in Figure 4, Supplement 1f. If real, it is an interesting difference in the time-course of response that could indicate different coding properties e.g., response to cooling (general neurons) vs. response to absolute temperature (calb + neurons).

      As for Fig 4-figure supplement 4 (see response to point #5 above), action potential frequency was determined from APs counted in 1 sec bins, and this is now stated in the legend.

      (11) Figure 7. In the model, the disynaptic pathway should also be shown

      We have added a comment to the legend stating that there may also be indirect (“polysynaptic”) input from Trpm8 afferents to ALS3 neurons.

    1. eLife Assessment

      This study offers valuable insights into the anatomical and physiological features of cold-selective lamina I spinal projection neurons. The evidence supporting the authors' claims is convincing, although including a larger sample size and more quantification would have strengthened the study further, and the claims of monosynaptic connectivity would benefit from being stated more cautiously. The work will interest those in the field of somatosensory biology, especially researchers studying spinal cord dorsal horn circuits and projection neuron cell types.

    2. Reviewer #1 (Public review):

      Summary:

      Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.

      Strengths:

      (1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.

      (2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.

      (3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.

      Weaknesses:

      (1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.

      (2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.

      (3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold-sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium-binding protein calbindin 1.

      In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.

      Strengths:

      The use of a relatively new (seldom used previously) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds specificity to the characterization. The transgenic line appears to capture well the subpopulation of Trpm8-expressing neurons.

      In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.

      The writing is clear, and the presentation of findings follows a logical flow.

      Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.

      Weaknesses:

      In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. How many preparations were needed, etc?

    4. Reviewer #3 (Public review):

      Summary:

      Razlan and colleagues provide a detailed anatomical characterization of lamina I projection neurons in the mouse spinal cord that are densely innervated by primary afferents activated by cooling of the skin. The authors, building on their previous anatomical work, validate a Trpm8-Flp mouse line, show synaptic contacts between Trpm8⁺ boutons and projection neurons at the ultrastructural level, and demonstrate at the physiological level that these neurons specifically respond to cooling stimuli. Next, by taking advantage of their previous transcriptomic analysis of ALS neurons, they identify calbindin as a marker for cold-activated lamina I projection neurons and map their ascending projections to the rostral lateral parabrachial area, caudal periaqueductal gray, and ventral posterolateral thalamus, well-known thermosensory and thermoregulatory centers. Altogether, these findings provide strong anatomical and functional evidence for a direct line of transmission from Trpm8⁺ sensory afferents through Calb1⁺ lamina I neurons to key supraspinal centers controlling perception of cold and thermoregulatory responses.

      Strengths:

      The combination of mouse genetics, electron microscopy, ex vivo physiology, and viral tracing provides convincing evidence for a direct cold pathway. The work validates the Trpm8-Flp line by extensive anatomical and molecular characterization. Integration with previous transcriptomic and anatomical data neatly links the cold-selective lamina I neurons to a molecularly defined cluster of ALS neurons, strengthening the bridge between molecular identity, anatomy, and physiological function.

      Weaknesses:

      While anatomical evidence for direct synaptic connectivity between Trpm8+ afferents and lamina I projection neurons is compelling, a physiological demonstration of strict monosynaptic transmission is not shown. The conclusion that these inputs are exclusively monosynaptic should be toned down. Similarly, the statement that "Lamina I ALS neurons that are surrounded by Trpm8 afferents are cold-selective" should also be toned down as only a few neurons have been tested and it cannot be excluded that other neurons with similar characteristics may be polymodal.

    1. eLife Assessment

      This study presents data suggesting that excitatory cholecystokinin (CCK)-expressing neurons in hippocampal area CA3 influence hippocampal-dependent memory using multiple methods to manipulate excitatory CCK-expressing CA3 neurons. The study is valuable, particularly considering that most past studies of CCK-expressing neurons have focused on those neurons that co-express CCK and GABA. Currently, the strength of evidence is incomplete, but it would improve if evidence of specificity was provided and other concerns were addressed. If this is not possible, the conclusions, particularly those requiring evidence of specific targeting of excitatory neurons, should be modified accordingly.

    2. Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior).

      There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      (2) The methods and figure legends are still extremely sparse, still leading to many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data, and the lack of proper quantification is still prevalent throughout the paper. In many places, only % values are noted, or only images are presented, and the number of cells counted is almost never reported.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that the CA3-CCK CAMKIIergic neurons are involved in the spatial memory tasks; these are interesting findings, which suggest that these neurons are important targets for manipulating the memory-related diseases.

      (3) This manuscript also measured the endogenous CCK from the CA3-CCK CAMKIIergic neurons; this means that CCK can be released under certain conditions.

      Weaknesses:

      In summary, this work can be formally accepted after the revision. For the limitations of the revision, the distinct neural effects of cholecystokinin (CCK) receptors (CCK-1R, CCK-2R, and CCK-3R) on hippocampal function have not been fully elucidated. Recent studies indicate that CCK-2R can modulate hippocampal activity at CA3-Schaffer collateral synapses; however, the roles of CCK-1R and CCK-3R in hippocampal function remain poorly characterized, with limited experimental evidence supporting their involvement. Overall, this study provides an interesting and novel perspective on the role of excitatory CCK signaling in hippocampus-dependent navigation learning.

    4. Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunochemistry, bulk calcium recording, neural sensors, hippocampal-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of the excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (i) The authors provided the distribution profiles of excitatory cholecystokinin in the dorsal hippocampus via transgenic mice (Ai14::CCK Cre mice), immunochemistry, and retrograde AAV.

      (ii) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (iii) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (iv) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (i) The causal relationship between navigation learning and CCK secretion remains nebulous; answering this question will require a more sensitive CCK-BR sensor in future work.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods, including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior). There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus, this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Thank you for this constructive comment. Indeed, the current study lacks comprehensive strategies to unequivocally distinguish excitatory CCK neurons from heterogeneous CCK neuronal populations. Nevertheless, we provide multiple lines of evidence supporting the distribution of CaMKIIα/Vglut1-expressing CCK<sup>+</sup> neurons in the hippocampus (Figure 1F), using complementary approaches including transgenic mouse models as well as viral and antibody-based labeling (Figure 1A, Figure 1H-I). In addition, we demonstrate that 635 nm light reliably evokes field excitatory postsynaptic potentials (fEPSPs) at CA3-Schaffer collateral synapses expressing DIO-CaMKIIα-ChrimsonR in vitro (Figure 2A-F). Importantly, these light-evoked excitatory synaptic responses are abolished by AMPA and NMDA receptor antagonists (CNQX and APV), confirming the excitatory nature of the DIO-CaMKIIα-ChrimsonR-expressing synapses. To outline future work that could further support our findings and conclusions, we have added the following strategies to the Discussion section of the revision:

      “Due to technical limitations at the current stage, we were unable to perform whole-cell recordings or pharmacological manipulations using CCK receptor antagonists. In future studies, the application of these approaches to directly record and selectively block EPSPs from excitatory CCK neurons in the hippocampus will further strengthen and validate our conclusions.” (Line 265 - line 269 in the revision).

      (1b) For the experiments that use a virus with the CCK-IRES-Cre mouse, there is no information or characterization on how well the virus targets excitatory CCK-expressing neurons. (Additionally, it has been reported that virally delivered, CaMKIIa-driven protein expression can be seen in both pyramidal and inhibitory cells.)

      We thank the reviewer for this insightful comment regarding the specificity of viral targeting in CCK-IRES-Cre mice.

      To address this concern, we performed additional characterization of viral expression in CA3. We found that DIO-CaMKIIα-mCherry expression showed a high degree of colocalization with CaMKIIα immunoreactivity, indicating preferential targeting of excitatory neurons (sFigure 1A-B; sFigure 2A-B; sFigure 3A-B). We show an example confirming the high specificity of the AAV for infecting excitatory CCK neurons in the CA3 area.

      We also acknowledge prior reports showing that CaMKIIα-driven viral expression can, in some cases, be detected in a small subset of inhibitory neurons. However, because CA3-Schaffer collateral projections to CA1 arise exclusively from excitatory CA3 pyramidal neurons, any potential expression in inhibitory CCK<sup>+</sup> interneurons is unlikely to directly contribute to the recorded CA1 synaptic responses in our electrophysiological experiments. That said, we cannot fully exclude the possibility that a minor population of inhibitory CCK<sup>+</sup> neurons could indirectly modulate CA3 pyramidal neuron activity via local circuit mechanisms, particularly in experiments involving optogenetic manipulation or shRNA expression. We now explicitly acknowledge this limitation in the revised manuscript:

      “Importantly, to further improve cell-type specificity, we propose an intersectional genetic strategy using CCK-IRES-Cre × VGlut1-Flp mice combined with a Cre-On/Flp-On (Con/Fon) AAV, which would restrict expression exclusively to excitatory CCK-expressing neurons and eliminate potential contributions from inhibitory CCK<sup>+</sup> cells. This approach will be implemented in future studies to refine circuit specificity.” (Line 269 - line 273 in the revision).

      (2) The methods and figure legends are extremely sparse, leading to many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data. Additionally, further quantification would be useful, e.g., in some places, only % values are noted, or only images are presented.

      Thank you for these constructive comments. We have expanded the methodological descriptions in both the Methods section and the figure legends to provide sufficient detail for evaluating the experimental tools and data accuracy. In addition, we have added quantitative analyses where previously only representative images or percentage values were shown. Specifically, quantification has now been included for each AAV condition in the corresponding figures in the revised manuscript.

      (3) It is unclear whether the reduced CCK expression is correlated, or directly causing the impairments in hippocampal function. Does the CCK-shRNA have any additional detrimental effects besides affecting CCK-expression (e.g., is the CCK-shRNA also affecting some other essential (but not CCK-related) aspect of the neuron itself?)? Is there any histology comparison between the shRNA and the scrambled shRNA?

      Recent studies from our lab demonstrated that knocking out CCK gene expression significantly attenuates hippocampus-dependent spatial learning and CA3-CA1 LTP, indicating that CCK plays a critical role in modulating hippocampal function [1,2]. Additionally, neither CCK-shRNA nor CCK-scramble significantly affected excitatory synaptic transmission in the CA3-CA1 projections, suggesting that CCK-shRNA has no obvious adverse effect on other neural components.

      Finally, we have provided a histological comparison between the shRNA and the scrambled shRNA regarding the expression level of the CCK protein (Pro-CCK) in the revision. Our results show that CCK-shRNA (left panel) significantly reduced CCK expression in CA3<sup>CCK</sup>-positive neurons compared with the CCK-Scramble group (right panel).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      https://doi.org/10.7554/eLife.109001.1.sa2

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, the authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, the authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that the CA3-CCK CAMKIIergic neurons are involved in the spatial memory tasks; these are interesting findings, which suggest that these neurons are important targets for manipulating the memory-related diseases.

      (3) This manuscript also measured the endogenous CCK from the CA3-CCK CAMKIIergic neurons; this means that CCK can be released under certain conditions.

      Weaknesses:

      (1) The authors do not mention which receptors of the CCK modulate these processes.

      We appreciate the reviewer for raising this important question. Based on our recent work, CCK-B receptors are the primary neural components mediating CCK functions in the hippocampus at both the synaptic plasticity and behavioral levels (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). To clarify this mechanism, we have added the following content to the revised manuscript:

      “Based on our recent work, CCK signaling in the hippocampus is predominantly mediated by CCK-B receptors, which play a critical role in regulating synaptic plasticity and spatial memory-related behaviors.” (Line 105 - line 106 in the revision).

      (2) The authors do not test the CCK gene knockout mice or the CCK receptor knockout mice in these neural processes.

      Thank you for this insightful comment. We performed these experiments in an earlier study. Our results showed that high-frequency electrical stimulation failed to induce significant LTP in the CA3-CA1 pathway in both CCK gene knockout (CCK-KO) mice and CCK-B receptor knockout (CCK-BR-KO) mice in vitro (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). These findings indicate that CCK mediates its synaptic effects predominantly through CCK-B receptors in the CA3-CA1 pathway. Accordingly, we have added this description to the revised manuscript:

      “Additionally, high-frequency electrical stimulation fails to induce LTP in the CA3-CA1 pathway in both CCK-KO and CCK-BR-KO mice, indicating that CCK-dependent synaptic plasticity in this circuit is primarily mediated by CCK-B receptors.” (Line 170 - line 173 in the revision).

      (3) The authors do not test the source of CCK release during the behavioral tasks.

      We thank the reviewer for raising this important point. In our previous work, we directly monitored CCK release in the hippocampus during an object-exploration task using a GPCR-based CCK-BR sensor combined with fiber photometry (Su et al., 2023). During object exploration, we observed a rapid and robust increase in CCK-BR sensor fluorescence, indicating activity-dependent CCK release in the hippocampus. Based on these findings, we deduced that hippocampal CCK release plays a critical role in hippocampus-dependent behavioral tasks.

      We acknowledge that hippocampal neurons receive CCK-positive projections from multiple brain regions, making it technically challenging to isolate and monitor the precise source of CCK release in the CA1 area during behavioral tasks in vivo. One potential strategy to address this limitation is selective overexpression of CCK in CA3 neurons (e.g., AAV-CCK delivery), followed by assessment of CCK-BR sensor responses during hippocampal-dependent behaviors. We have added this discussion to the revised manuscript to clarify the source and functional relevance of CCK release during behavioral tasks.

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (3) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

      https://doi.org/10.7554/eLife.109001.1.sa1

      Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunochemistry, bulk calcium recording, neural sensors, hippocampal-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (1) The authors provided the distribution profiles of excitatory cholecystokinin in the dorsal hippocampus via transgenic mice (Ai14::CCK Cre mice), immunochemistry, and retrograde AAV.

      (2) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (3) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (4) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (1) The causal relationship between navigation learning and CCK secretion?

      Thank you for pointing out this important issue. Previous studies have shown that CCK can be rapidly secreted during exploratory behaviors, as detected by the CCK-BR sensor. In parallel, CCK-positive neurons have been demonstrated to play a critical role in the precise execution of hippocampus-dependent spatial learning. Together, these findings suggest that exploratory behavior induces CCK secretion, which in turn contributes to the accuracy of hippocampal-dependent learning and memory processes. Based on this evidence, we propose that CCK secretion serves as a functional link between behavioral exploration and spatial learning. We have added these explanations in the revised manuscript to better clarify the causal relationship between behavioral exploration and CCK secretion:

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision)

      (2) The effect of overexpression of the CCK gene on hippocampal functions?

      We thank the reviewer for this comment. In fact, an earlier study from our laboratory demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models. These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels (Zhang et al., 2024; Wang et al., 2025). Accordingly, although direct genetic overexpression of CCK in the hippocampus has not yet been extensively characterized, the observed benefits of exogenous CCK delivery support the notion that increased CCK availability positively modulates hippocampal function and spatial learning. We have cited this study in the revised manuscript to support this interpretation.

      “Interestingly, an earlier study demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models (Zhang et al., 2024). These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels.” (Line 291 - line 297 in the revision)

      (3) What are the functional differences between the excitatory and inhibitory CCK neurons in the hippocampus?

      In the hippocampus, CCK-expressing neurons consist of two major populations with distinct functions: excitatory (glutamatergic) and inhibitory (GABAergic) neurons. Excitatory CCK neurons are relatively sparse and intermingled with pyramidal cells. By releasing glutamate, they directly contribute to excitatory transmission and are thought to participate in synaptic plasticity and information processing related to learning and memory. In contrast, inhibitory CCK neurons are more abundant and include well-characterized interneuron subtypes such as CCK-positive basket cells. These neurons release GABA and primarily target the perisomatic region of pyramidal neurons, providing strong control over neuronal firing. Notably, inhibitory CCK interneurons are highly sensitive to neuromodulatory signals, particularly endocannabinoids via CB1 receptors, enabling dynamic regulation of inhibitory tone and network activity. Together, excitatory CCK neurons mainly support hippocampal excitation and plasticity, whereas inhibitory CCK neurons regulate network dynamics and spike timing. As the focus of the present study is on excitatory CCK neurons, a detailed comparison between these two populations was not included in the original manuscript.

      (4) Do CCK sources come from the local CA3 or entorhinal cortex (EC) during the high-frequency electrical stimulation?

      Thank you for this insightful comment. Our data indicate that the CCK detected during high-frequency stimulation originates from CA3 neurons rather than the entorhinal cortex (EC). As shown in Figure 2, we used an optogenetic approach combined with a GPCR-based CCK sensor to selectively examine CCK release from the CA3-CA1 pathway. ChrimsonR was specifically expressed in CA3 neurons projecting to CA1, restricting light stimulation to CA3 axon terminals. In parallel, the CCK sensor was locally expressed in CA1, allowing real-time detection of CCK release at CA3 presynaptic sites. High-frequency light stimulation robustly evoked CCK signals in CA1, demonstrating activity-dependent CCK release from CA3 terminals. Importantly, EC inputs were neither genetically targeted nor optically stimulated in this experiment, excluding the EC as a source of the detected CCK. Together, these results support the conclusion that CCK released during high-frequency stimulation is derived from local CA3 projections to CA1. Similarly, as the focus of the present study is on excitatory CCK neurons in the CA3 area, a detailed comparison between these two CCK sources was not included in the original manuscript.

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (3) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

    1. eLife Assessment

      Using isolated frog brainstem preparations, pharmacological manipulation of excitability, systematic extracellular unit mapping, and focal microinjections, this study provides important findings on whether the buccal rhythm generator is a discrete anatomical nucleus or a distributed, state-dependent network. The question is conceptually significant and of interest to researchers working within respiratory neurobiology and rhythmogenicity in general, and the preparation and experimental strategy are generally appropriate. However, the evidence for the strongest architectural claims is incomplete, with pseudoreplication in pooled unit-mapping analyses, inconsistent statistical reporting, and limited controls in necessity/sufficiency experiments. Overall, although data are largely convincing, substantial revision and more nuanced interpretation of the results are required before claims of state-dependent architectural reorganization can be considered well-supported.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test whether the frog buccal ventilatory rhythm generator behaves as a discrete, anatomically localized oscillator or as a distributed, state-dependent network. They combine reduced preparations (segment/subsegment work), systematic extracellular unit surveys over a defined grid, and local AMPA/GABA microinjections in a hemisected brainstem preparation. Based on these approaches, the authors conclude that mild global excitation (bath AMPA) broadens the distribution of rhythmically active units and renders a previously defined "buccal area" functionally non-identifiable as a unique necessary/sufficient locus.

      The central idea is plausible, and the overall experimental strategy is appropriate for the question being asked. However, in its current form, the manuscript overstates the strength of inference supporting the "expansion" and "loss of necessity/sufficiency" conclusions. This is primarily due to (a) statistical treatment of unit-mapping data that does not respect clustering by preparation/animal, (b) inconsistent statistical reporting across sections, and (c) limited interpretability of focal inhibitory perturbations under a globally excited state.

      Strengths:

      (1) The manuscript addresses a clear mechanistic question with broader relevance: whether rhythm generation is best conceptualized as a localized kernel or as an emergent distributed property that changes with excitatory state.

      (2) The authors use convergent approaches (reduced preparations, mapping, and necessity/sufficiency-style pharmacological perturbations), which is appropriate for circuit-level inference.

      (3) A strong element is the within-unit analysis supporting state-dependent changes in phase coupling for a subset of units ("lung" units adopting a buccal-like pattern). The authors' offline PCA-based spike sorting (with cluster-quality selection via silhouette score) provides some reassurance that the reported pre/post injection changes are not simply driven by unit misidentification.

      Weaknesses:

      (1) Pseudoreplication in unit-survey statistics undermines the main mapping inference. The Methods state that "Units were pooled from multiple preparations" and that chi-squared tests were used to compare proportions across conditions (baseline vs 60 nM AMPA). The Results similarly report proportion changes (e.g., 110 units pooled from three preparations vs 137 units pooled from three additional animals) analyzed with chi-squared tests. Because many units come from the same preparation/animal, independence is unlikely to hold; therefore, inference about state-dependent reorganization at the systems level should be made at the preparation/animal level or via hierarchical models that explicitly account for clustering.

      (2) Statistical methods are inconsistently described and need harmonization. In the segment dose-response "Analysis," values are described as compared to zero using a "One-sample t-test." Yet Table 1 is titled as using a "Wilcoxon One-sample Test." These discrepancies must be resolved throughout (Methods, Results, figure legends, and tables), including clear reporting of the unit of n and exact test statistics.

      (3) Unit classification and operational definitions raise interpretational concerns. The unit classification scheme defines "buccal units" as those firing during buccal bursts as well as lung bursts, and explicitly notes that "no units were found which fired only during buccal bursts." This is a consequential result, and it currently reads more like a limitation of detection/classification (or state-space sampled) than a robust biological conclusion. Without additional evidence, it weakens claims about a distinct buccal rhythmogenic module and complicates the interpretation of "buccal identity" changes under excitation.

      (4) Microinjection mapping: high exclusion rate and alternative explanations for 'loss of necessity' under excitation. The manuscript reports that 15 experiments were conducted, but 9 were excluded because the buccal area was not found or the preparation was "overdriven." This exclusion rate is too high to leave implicit; it raises concerns about selection bias and demands transparent accounting. Moreover, under baseline conditions, GABA (or AMPA-GABA) microinjections reliably reduce/abolish buccal bursts, but under bath 60 nM AMPA, the same injections produce no significant change in instantaneous frequency. This pattern can be interpreted as network redistribution, but it can also reflect state-dependent changes in gain, dynamic range, or local pharmacological impact (e.g., inhibition being comparatively underpowered in the globally excited state). Additional controls/analyses are required to distinguish these explanations.
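      The preparation-level analysis requested in point (1) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the per-preparation counts are hypothetical (invented so that the group totals match the 110 and 137 units quoted above), and the helper names `pooled_chi2_p` and `exact_permutation_p` are assumptions introduced here. It contrasts a chi-squared test on pooled units with an exact permutation test in which each preparation contributes a single proportion.

```python
from itertools import combinations
from math import erfc, sqrt
from statistics import mean

# HYPOTHETICAL data, invented for illustration only:
# (buccal_units, total_units) for each preparation.
baseline = [(10, 40), (8, 35), (9, 35)]       # totals: 27 of 110 units
ampa_60nm = [(25, 45), (22, 46), (24, 46)]    # totals: 71 of 137 units


def pooled_chi2_p(group_a, group_b):
    """Chi-squared test (1 df, no continuity correction) on pooled units.

    Treats every unit as independent, which is the assumption questioned
    in point (1).
    """
    a = sum(k for k, _ in group_a)
    b = sum(n - k for k, n in group_a)
    c = sum(k for k, _ in group_b)
    d = sum(n - k for k, n in group_b)
    total = a + b + c + d
    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def exact_permutation_p(props_a, props_b):
    """Two-sided exact permutation test on per-preparation proportions.

    The preparation is the unit of analysis; every relabelling of the
    preparations into two groups of the original sizes is enumerated.
    """
    pooled = props_a + props_b
    observed = abs(mean(props_a) - mean(props_b))
    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), len(props_a)):
        in_a = [pooled[i] for i in idx]
        in_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        n_splits += 1
        if abs(mean(in_a) - mean(in_b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits


props_baseline = [k / n for k, n in baseline]
props_ampa = [k / n for k, n in ampa_60nm]

p_pooled = pooled_chi2_p(baseline, ampa_60nm)
p_by_prep = exact_permutation_p(props_baseline, props_ampa)
print(f"pooled chi-squared p = {p_pooled:.2g}")
print(f"preparation-level permutation p = {p_by_prep:.2g}")
```

      With three preparations per condition there are only 20 possible relabellings, so the smallest two-sided p-value the permutation test can return is 2/20 = 0.1, even when the groups are completely separated as in these invented counts. The pooled test, by contrast, returns a much smaller p-value from the same data, which is the gap between unit-level and preparation-level inference that point (1) describes. A mixed-effects logistic model with preparation as a random effect would be the hierarchical alternative the reviewer mentions.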

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the response of the amphibian respiratory rhythm generator under varying excitability conditions. They use pharmacological agents to increase and/or decrease synaptic excitability and demonstrate the resilience of buccal rhythms under different conditions. They employ these results to formulate their primary thesis, that there is no obligatory locus of the buccal respiratory rhythm in the frog, and that their respiratory rhythmogenic mechanisms should be considered diffuse and anatomically distributed across a larger brainstem region.

      Strengths:

      This manuscript is well written, with a sufficiently large number of experiments, for which the authors should be congratulated.

      Weaknesses:

      The presented results don't support the authors' main conclusions, and the interpretation of the data is heavily biased toward their hypothesis. This introduces an unsubstantiated narrative into the Abstract, Introduction, and Discussion of this manuscript, which must be reexamined with the following points in mind:

      (1) The authors seem to confuse degeneracy with redundancy. For instance, at line 54, they state, "These findings support the broader hypothesis that respiratory rhythm-generating circuits can switch to being diffuse and redundant, with discrete oscillators quickly drowning in a sea of excitations."

      Redundancy means having the same component repeated multiple times to buffer the failure of any single component, whereas degeneracy means different functional components that compensate for one another under perturbations (Goaillard and Marder, ARN 2021).

      Since the premotor-lung units get converted to buccal units under high excitability, this suggests a degenerate mechanism for respiratory rhythm generation rather than a redundant mechanism, where there should be multiple buccal units that get recruited under different excitability conditions.

      (2) Line 83, "but the essential requirement for a discrete, rudimentary buccal oscillator is also lost".

      This statement is not supported by the data presented in this study. How does the expansion of the buccal unit imply that the essential requirement for discreteness is lost? Under increased excitability, does the burst/rhythm initiation zone also expand? Or does it still remain centered around the location of buccal units under physiological conditions? Increased excitability can lead to recruitment of a larger area, without a change in the location of the rhythmogenic kernel.

      (3) Line 86, "... oscillators should be viewed as promiscuous flexible functional entities that expand or contract...".

      Oscillators can be regarded as promiscuous only if, under physiological conditions, they switch positions. Under high excitability, only the flexibility argument holds, which has been established in mammals before (e.g., CA Del Negro, K Kam, JA Hayes, JL Feldman, The Journal of Physiology 587 (6), 1217-1231; CA Del Negro, C Morgado-Valle, JL Feldman, Neuron 34 (5), 821-830; NA Baertsch, LJ Severs, TM Anderson, JM Ramirez, Proceedings of the National Academy of Sciences 116 (15), 7493-7502; NA Baertsch, HC Baertsch, JM Ramirez, Nature Communications 9 (1), 843).

      Results:

      (4) Interpretation of data in Figure 6.

      How do the buccal activity and L2 power stroke change with 60 nM AMPA (in CN5)? Are the increase in buccal neurons and the decrease in power-stroke neurons also reflected in the CN5 activity? Also see comments on Figure 9 data below.

      (5) Interpretation of data in Figure 7.

      Here, classifying buccal neurons solely by spiking may obscure the fact that the 'silent' neurons under baseline conditions were part of the rhythmic network but could not spike due to subthreshold inputs. 60 nM AMPA increased their firing in response to previously subthreshold synchronous inputs during the buccal burst. Intracellular recordings are required to negate this possibility and establish that the neuronal classification is robust.

      (6) Interpretation of data in Figure 8.

      "Lung units can transform into buccal units under excitation".

      CN5 buccal and lung bursts need to be compared before and after AMPA injection. From Figure 8 A-D, it is apparent that the activity of the example Unit2 increases during the buccal bursts after AMPA injection. However, its spikes are also present during buccal bursts pre-AMPA, albeit less frequently.

      It is striking that the pre-AMPA epoch (panel A) is less than half the length of the post-AMPA epoch. This would, in itself, lead to a biased estimate of lung units that are active under the baseline condition during the buccal bursts.

      For Figure 8G, a meta-analysis of lung units spiking during the baseline buccal bursts is warranted to interpret the main claim of this figure. Similarly, analysis of spiking per lung burst for the post-AMPA condition is essential for comparing the lung unit's contribution under high excitability.

      (7) Interpretation of data in Figure 9.

      "Buccal area loses importance under increased excitation."

      This interpretation is not fully supported by the data presented in this manuscript. Under 60 nM AMPA, does the ratio of lung bursts to buccal bursts change in CN5? This analysis is crucial for determining whether the lung units are indeed converted into buccal units at the expense of lung activity or whether their appearance during buccal bursts is incidental due to increased excitability. In the baseline, there are 4-5 buccal bursts per lung burst, whereas under high excitability, there are 2-3 buccal bursts per lung burst (Figure 9 A-B). This seems inconsistent with the conclusion that increased excitability converts lung units into buccal units (Figures 6 & 7).

      Could the authors comment on the connectivity between the lung and the buccal units? Results in Figure 9A-B indicate that lung units may receive an efference copy of buccal units, and under high excitability, their spikes may generate negative feedback onto the buccal units, terminating their bursts. This could explain the decrease in the buccal-to-lung burst ratio in high-AMPA conditions. This type of circuit interaction resembles the mammalian breathing CPG, in which the parafacial/RTN (which controls the abdominal muscles) and preBötC (which controls the diaphragm) interact and cross-inhibit each other.

      (8) Line 382.

      "Buccal-like bursting produced from two independent slices".

      The two "independent" slices have portions of the same anatomical kernel, the buccal rhythm generator. This experiment is like the sandwich slice preparation of preBötC (Del Negro Lab), in which two thinner slices exhibit rhythmic activity. Thus, the two slices are not independent; they are anatomically adjacent and functionally overlapping.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses isolated frog brainstem preparations to test whether inspiratory rhythm generation is confined to a narrowly defined neural center or instead reflects the activity of a distributed and adaptable network. Building on prior rodent work, the authors examine structural and functional parallels between the frog Buccal Area and the mammalian preBötzinger complex. By increasing excitatory drive, they assess whether a localized rhythmogenic region can expand into a broader network that participates in buccal rhythm generation, providing insight into how respiratory circuits are dynamically reconfigured across physiological states.

      Strengths:

      The work presents compelling evidence that ventilatory rhythm generation is supported by a flexible, state-dependent network rather than a fixed anatomical locus. The experimental preparation is well-suited to address these questions, and the data are generally of high quality. The demonstration that increased excitation recruits a more distributed network parallels observations in mammalian systems and strengthens the translational relevance of the findings. Overall, the analyses are thoughtful, and the interpretations are largely well supported by the results.

      Weaknesses:

      Some issues limit the strength of the conclusions. First, the study does not address the transition from eupnea to gasping in mammals, which could provide important physiological context for the observed AMPA-induced network reorganization. Second, the reported transformation of lung-active neurons into buccal-active neurons would benefit from additional analyses to clarify whether neurons switch identities or acquire dual activity. Finally, the necessity and sufficiency experiments in Figure 9 require further support, particularly through AMPA dose-response analyses and more comprehensive GABA manipulations, to confirm that network expansion does not obscure the continued functional importance of the core buccal region.

    5. Author response:

      Reviewer #1 (Public review):

      Hierarchical Inference (Unit Survey)

      We agree that pooling units across preparations can overstate the strength of inference if preparation-level clustering is ignored. We will therefore reanalyze the unit-survey dataset using a hierarchical approach in which the preparation/animal is treated as the unit of inference. Our pooled dataset was derived from three chunk preparations exposed to AMPA and three baseline preparations, allowing us to report per-preparation proportions and variability as requested.

      A preliminary reanalysis of the buccal segment preparations is summarized below. In this analysis, the unit of inference is shifted from individual recorded units to the preparation level (n = 3 baseline; n = 3 at 60 nM AMPA), thereby accounting for potential within-preparation dependence.

      Author response table 1.

      The distribution of units for each of the three preparations per condition is as follows:

      Using the proportion of buccal units per preparation as the dependent variable:

      Baseline (n = 3): mean proportion of buccal units = 6.5% (SD 5.7%).

      60 nM AMPA (n = 3): mean proportion of buccal units = 53.2% (SD 6.0%).

      Absolute difference in proportions = 46.7% (95% CI 33.4% to 59.8%).

      Independent-samples t-test on per-preparation proportions: t(4) = 9.77, p = 0.0006.

      Thus, this preliminary hierarchical reanalysis indicates that the observed recruitment is consistent across preparations and is not driven by outlier data from a single animal. These results support substantial expansion of the buccal oscillator with excitation.
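
      The per-preparation comparison above can be checked from the reported summary statistics alone. The following is a minimal sketch, assuming an equal-variance (Student) two-sample t-test, which is what the reported t(4) implies; the function name and the hard-coded critical value (the standard two-sided 95% table value for 4 degrees of freedom) are illustrative choices, not the authors' analysis code. Because the inputs are rounded means and SDs, the confidence bounds can differ slightly from those reported.

```python
import math


def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b, t_crit):
    """Equal-variance two-sample t-test computed from summary statistics.

    Returns degrees of freedom, mean difference (b - a), t statistic,
    and the confidence interval implied by the supplied critical value.
    """
    df = n_a + n_b - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean_b - mean_a
    t_stat = diff / se
    return df, diff, t_stat, (diff - t_crit * se, diff + t_crit * se)


# Two-sided 95% critical value of Student's t for df = 4 (standard table value).
T_CRIT_DF4 = 2.776

# Reported per-preparation proportions of buccal units, in percent.
df, diff, t_stat, (ci_low, ci_high) = pooled_t_from_summary(
    mean_a=6.5, sd_a=5.7, n_a=3,    # baseline
    mean_b=53.2, sd_b=6.0, n_b=3,   # 60 nM AMPA
    t_crit=T_CRIT_DF4,
)

print(f"df = {df}, difference = {diff:.1f} points, t = {t_stat:.2f}")
print(f"95% CI: {ci_low:.1f} to {ci_high:.1f}")
```

      With only three preparations per condition, the test has few degrees of freedom, so the interval is wide even though the effect is large.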

      Statistical Standardization: In the revision, we will better justify our use of parametric and non-parametric versions of the one-sample tests and review usage in the Methods, Table 1, and figure legends for consistency.

      Exclusion criteria for microinjection experiments: We will extend the description of these experiments by including a flow diagram summarizing the 15 attempted microinjection experiments and documenting the technical reasons for the 9 exclusions. These exclusions reflected the technical requirements of the preparation: (a) the buccal area had to be localized before AMPA excitation so that the effects of buccal-area manipulation during excitation could be interpreted reliably, which was not always possible; and (b) preparations had to exhibit sufficiently sustained periods of consecutive buccal bursting to permit quantification of buccal burst frequency, whereas some preparations expressed motor patterns dominated by lung bursts.

      Pharmacological Potency and Necessity: We will revise the wording of this section to make the causal interpretation more precise. Our data already show that local GABA microinjections can reverse the excitatory effects of local AMPA microinjections, providing an internal control for local pharmacological efficacy of GABA when the local network is excited. Notably, the local AMPA concentration used in these experiments (5 µM) is nearly two orders of magnitude greater than the 60 nM concentration used in bath application. We therefore interpret the failure of focal GABA inhibition to abolish rhythm during global excitation as being consistent with expansion of rhythmogenic capacity beyond the spatial reach of the local injection, rather than with failure of the GABA manipulation itself.

      Finding an inhibitory site that remains sensitive in bath-applied AMPA is an interesting experiment, but it would require identifying the anatomical substrate of a non-ventilatory brainstem circuit in Rana that is guaranteed not to undergo reconfiguration with AMPA. This is beyond the scope of the current manuscript; based on our work to identify the neuronal substrate for ventilation in Rana, this would take at least five years to complete. In addition, even after identifying such a circuit, there would be no guarantee that AMPA would not cause reconfiguration in this case too. With regard to transection boundaries and location of injections, we agree these would be useful refinements. We used the location of nerves as reliable landmarks to guide transections and located the buccal area using stereotactic coordinates to guide micropipette insertion and functional criteria (AMPA and GABA sufficiency and necessity tests) to locate the exact position based on our previous work.

      Unit Classification: We will review the nomenclature we use to define units to ensure it does not cause confusion and provide more explicit criteria for unit classes. This will include clarification of the absence of “buccal-only” units as currently defined. Specifically, when both buccal and lung rhythms are present, units active during buccal bursts are also active during lung bursts in our preparation. This does not conflict with the multiple interacting oscillator model we have proposed previously. Rather, recruitment of buccal-area neurons during lung bursts is consistent with a model in which the lung oscillator excites the buccal oscillator. It is also consistent with prior evidence that lung bursts persist after buccal-area ablation. In addition, burst frequency during lung episodes exceeds buccal burst frequency during intervening buccal periods. We will revise the text to make this logic clearer.

      Reviewer #2 (Public review):

      (1) Degeneracy vs. Redundancy

      We agree that degeneracy is the more precise term for the phenomenon our data demonstrate, in which structurally and functionally distinct neurons (lung units) acquire the capacity to participate in buccal rhythm generation under excitation. The Discussion already uses this language (e.g., "necessity and sufficiency may not work in a large degenerate network where rhythm generation is distributed across many elements"), but we used the word "redundant" in the Key Points Summary and Abstract in the broader sense of distributed robustness that a wider readership could grasp. Nonetheless, we recognize the distinction drawn by Goaillard and Marder (2021) and, considering the reviewer's concerns, we will revise the Abstract and Key Points to adopt the degeneracy framework consistently.

      (2) Loss of Essential Requirement for a Discrete Oscillator

      The reviewer asks whether expansion of the rhythmically active region necessarily implies loss of the rhythmogenic kernel. We believe our necessity and sufficiency experiments (Figure 9) directly address this. Under baseline conditions, GABA microinjection into the buccal area reliably abolishes buccal bursting; under 60 nM bath AMPA, the same injection at the same location and volume has no significant effect on buccal frequency. If the kernel remained essential and the surrounding recruitment were merely supplementary, local inhibition of the kernel should still slow or abolish the rhythm. It does not. We interpret this as evidence that the essential requirement for the discrete buccal area is lost under excitation, not merely that a larger area has been recruited around a still-critical core. We acknowledge, however, that the word "lost" could be read as implying permanent elimination rather than state-dependent suspension, and we will temper this language in the revision.

      (3) Novelty Relative to Mammalian Studies

      We appreciate the reviewer drawing attention to the cited mammalian literature (Del Negro et al., 2002, 2009; Baertsch et al., 2018, 2019), which we discuss in detail in the manuscript. However, we respectfully note that our findings extend this literature in several ways that the public review does not acknowledge. First, Baertsch et al. demonstrated recruitment of tonic or silent neurons to become phasically active during inspiration; we show that neurons already assigned to one oscillator phase (lung) can be dynamically reassigned to another (buccal), which represents a qualitatively different form of reconfiguration. Second, we developed a novel approach to functionally ablate motor neuron pools using high-frequency nerve stimulation, enabling the unit survey to be interpreted at the premotor level, which was not achieved in the mammalian studies cited. Third, our data provide the first demonstration of state-dependent oscillator expansion in a non-mammalian tetrapod, offering evolutionary context that strengthens the generality of the principle. We will revise the term "promiscuous" if it overstates the claim, but we maintain that our data support the conclusion that oscillator boundaries are flexible, which goes beyond what has been shown in mammals.

      (4) Figure 6, CN5 Output Under AMPA

      The reviewer asks whether the shift in premotor unit composition is reflected in CN5 motor output. This is a reasonable question. As noted in the manuscript, 60 nM AMPA produces only minor changes in the overt motor pattern as recorded from CN5, which is precisely why we interpret the premotor changes as a reorganization of the network's internal architecture that is not readily apparent from motor output alone. This is in sharp contrast to observations of substantive network reconfiguration in mammals in which eupnea is replaced by the pathological condition of gasping. We will add quantification of CN5 burst parameters (amplitude, duration, frequency) under baseline and 60 nM AMPA to make this point explicit.

      (5) Subthreshold Recruitment vs. Network Expansion

      The reviewer suggests that neurons classified as newly rhythmic under AMPA may have been part of the rhythmic network all along, receiving subthreshold inputs at baseline. We are grateful to the reviewer for highlighting this and hope they would agree that the literature clearly demonstrates that all respiratory neurons receive subthreshold phasic inputs of one kind or another, perhaps providing a clue that reconfiguration is a common feature of respiratory networks generally. Regardless of the implications for other animals, we agree this is likely the mechanism at work in the frog, and indeed our manuscript states that "this increase in the number and proportion of premotor buccal units is due in part to recruitment of sub-threshold buccal neurons that, under low excitability, only fire during lung bursts," citing intracellular evidence from Kogo and Remmers (1994) that lung neurons in this region receive subthreshold buccal-timed input. We note that this observation does not diminish our conclusion and likely explains the mechanism by which network expansion occurs. Whether one calls these neurons "newly recruited" or "pushed above threshold," the functional consequence is the same: a larger population of neurons is now rhythmically active during buccal bursts, and the necessity of the original buccal area is lost. We will clarify this reasoning in the revision and acknowledge the limitation that additional intracellular recordings from our preparation would be needed to fully characterize the subthreshold dynamics.

      (6) Figure 8, Epoch Length and Meta-analysis

      The reviewer notes that the pre-AMPA epoch appears shorter than the post-AMPA epoch in Figure 8A, which could bias unit classification. We will address this in the revision by reporting epoch durations explicitly and addressing their implications for spike counts where appropriate. Regarding the request for meta-analysis of lung unit spiking during baseline buccal bursts: this analysis is part of the rationale for the phase-recruitment panels, and we will expand Figure 8 to include the requested cross-condition comparisons (lung unit activity during baseline buccal bursts, and during post-AMPA lung bursts) as also suggested by Reviewer 3.
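
      One way to make spike counts comparable across epochs of unequal length is to normalize them per burst and per second. The sketch below illustrates that idea only; the function name and every number in it are hypothetical and are not taken from the manuscript or its figures.

```python
def normalize_unit_activity(spike_count, burst_count, epoch_s):
    """Express a unit's spiking during buccal bursts per burst and per second.

    All inputs are hypothetical illustration values, not measured data.
    """
    if burst_count <= 0 or epoch_s <= 0:
        raise ValueError("burst_count and epoch_s must be positive")
    return {
        "spikes_per_burst": spike_count / burst_count,
        "spikes_per_s": spike_count / epoch_s,
    }


# Hypothetical unit: a short baseline epoch and a longer post-AMPA epoch.
pre = normalize_unit_activity(spike_count=6, burst_count=20, epoch_s=60.0)
post = normalize_unit_activity(spike_count=75, burst_count=50, epoch_s=150.0)

# Raw counts (6 vs. 75) exaggerate the change relative to per-burst values.
print(pre["spikes_per_burst"], post["spikes_per_burst"])
```

      In this hypothetical case the raw count rises more than twelvefold, whereas spikes per burst rise fivefold, which is the kind of discrepancy that reporting epoch durations would let readers assess.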

      (7) Figure 9, Buccal-to-Lung Burst Ratio

      The reviewer observes that the ratio of buccal to lung bursts decreases from approximately 4-5:1 under baseline to 2-3:1 under 60 nM AMPA, and suggests this is inconsistent with conversion of lung units into buccal units. We do not believe this is inconsistent. The buccal-to-lung burst ratio reflects the overt motor pattern, which is determined by the interaction of multiple oscillators and is influenced by AMPA at both buccal and lung levels. A change in this ratio does not speak to whether individual premotor units have acquired buccal-timed activity; the unit survey and the single-unit transformation data (Figure 8) address that question directly. Regarding the alternative model involving efference copy and cross-inhibition: this is an interesting hypothesis, but it is speculative and not tested by the current dataset. We are happy to discuss lung-buccal interactions more fully in the revision, including the parallels to parafacial/preBötC interactions in mammals, but we note that our data on unit transformation are better explained by network reconfiguration than by a feedback model that remains to be tested.

      (8) "Independent" Slices

      The reviewer compares our Level 2 transection to the preBötC sandwich slice preparation and argues the two resulting slices are not independent. We take the reviewer's point that "independent" may be read as implying no shared developmental or functional origin, which is not our intent. By "independent" we mean that the two physically separated slices can each generate rhythmic output without being synaptically connected to each other. This is, in fact, our central point: rhythmogenic capacity is distributed across a region broad enough to endow two separated slices with independent rhythm-generating capability when excited. We note that the analogy to the sandwich slice is imperfect because in our Level 1 cuts, only the rostral slice containing the buccal area generates rhythm -- the caudal slice does not -- whereas Level 2 cuts that bisect the buccal area produce rhythmicity in both halves, consistent with distributed capacity specifically within the buccal region. We will revise the wording to clarify what we mean by "independent" in this context.

      Reviewer #3 (Public review):

      Physiological Parallels: We will expand the Discussion to place these findings in a broader comparative context, including the eupnea-to-gasping transition in mammals as an example of state-dependent reconfiguration of respiratory networks. This will also allow us to clarify two advances that may otherwise be missed when comparing our work to that in mammals: (a) we developed a novel approach to functionally eliminate motor neurons, allowing mapped units to be interpreted as premotor; and (b) the state-dependent reconfiguration of the buccal oscillator occurred without qualitative changes in the overt lung-buccal motor pattern.

      Unit Transformation Analysis: We will revise Figure 8 to improve clarity around the observed lung-to-buccal transformation by expanding the phase-recruitment panels as suggested and will revisit the operational definitions of lung and buccal unit identity to reduce ambiguity. The central observation is that some units active only during lung bursts under one condition become active during buccal bursts when network excitation is increased.

      Saturation vs. Network Expansion: We will directly address the possibility that 60 nM bath-applied AMPA simply pushes the network toward a frequency ceiling. Two observations strongly argue against this interpretation: (a) 60 nM global AMPA produced only mild changes in buccal frequency, whereas local AMPA injection at much higher concentrations produced larger effects; and (b) local GABA was sufficient to reverse the effects of high-concentration local AMPA microinjections but insufficient to abolish rhythm during low-concentration global AMPA application. Together, these findings are more consistent with global AMPA endowing the network with distributed rhythm-generating capacity than with simple saturation of a discrete local oscillator. Notwithstanding these arguments, we will attempt to extend the AMPA/GABA dose-response experiments as suggested or add the lack of such experiments as a caveat to our interpretation.

      Figure 9C Correction: We will correct the statistical markings in Figure 9C to align with the text in the Results regarding the significance of frequency changes under 60 nM AMPA.

      In total, we believe these revisions will improve the rigor and clarity of the manuscript while preserving the central conclusion supported by the data: that the organization of the frog respiratory rhythmogenic network is state dependent and becomes more distributed under excitation.

    1. eLife Assessment

      This valuable study addresses a timely question regarding the contribution of transposable elements to splice isoform diversity in the Drosophila brain, directly engaging with recent conflicting findings in the field. The work provides convincing evidence that TE-gene chimeric transcripts are detectable and that prior discrepancies largely arise from methodological differences in computational pipelines and experimental design. The combination of reanalysis, methodological clarification, and targeted validation represents a technical contribution that will be of interest to researchers studying transcriptome complexity and transposable elements. However, the strength of evidence would be further enhanced by increased methodological transparency, more rigorous experimental controls, and a more cautious interpretation of functional implications.

    2. Reviewer #1 (Public review):

      Summary:

      Choucri and Treiber have reassessed their previous study on TE-gene chimeric transcripts in neural genes in response to Azad et al. (2024). Azad and colleagues argued that, contrary to Choucri and Treiber's findings, chimeric TE-mRNAs are relatively infrequent, and they cautioned that further optimization of bioinformatics pipelines is needed to detect TE insertions from RNAseq accurately. In this short response, Choucri and Treiber clearly demonstrate that differences in the tools used between their study and that of Azad et al. likely account for the contrasting results, along with a failure to design RT-PCR primers that would match the chimeric transcripts, and the use of different Drosophila lines. The authors emphasize the need for uniform, standardized criteria in such analyses, which would ultimately strengthen and advance the field.

      Strengths:

      The addition of a ratio comparing the number of splice reads specific to the chimeric transcript with the number of exon-exon splice reads is really interesting because it opens the door to finally quantifying the contribution of chimeric TEs to overall gene expression, although this is beyond the scope of the present article. The clear dissection of chimeric transcripts, along with the results from Azad et al., allows us to understand the differences between the two studies confidently. Finally, the discussion on Drosophila lines is indeed essential, given that the lines and even individuals have high TE polymorphism.

      Weaknesses:

      I think it is necessary to add more detail to this article; for instance, the differences between TEchim and TIDAL could be laid out more precisely. Regarding the roo example, one of the caveats of this family, along with others, is the presence of simple repeats. It would be important to show that the simple repeats are not interfering with the read mapping. Regarding the experiments, if we are looking for a standardized protocol, then we should have a detailed materials and methods section, with every experiment, replicate, and PCR temperature clearly defined. Finally, and in my opinion most importantly, the use of RT-negative controls on the RT-PCRs, along with DNA PCRs to show insertion presence, is mandatory for testing the presence of chimeric genes. Of course, water-negative PCR controls are also needed and are, unfortunately, absent from Figure 3.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Choucri and Treiber aims to directly address a recent critique regarding the role of transposable elements (TEs) in diversifying the neural transcriptome of Drosophila. The authors seek to demonstrate that TEs are not merely genomic "noise" but are frequently and reliably "exonized" into brain-specific mRNA. By introducing an upgraded computational pipeline, TEChim, and conducting precise experimental validations, the authors set out to show that TE-mediated splicing represents a genuine biological phenomenon that expands the molecular repertoire of the nervous system.

      Strengths:

      The study's primary strength lies in its rigorous technical "forensic" analysis of previous failed replication attempts. The authors convincingly demonstrate that the lack of signal in the opposing study stemmed from a fundamental methodological mismatch: the software used by the critics (TIDAL) is logically incapable of detecting splice sites located within TE sequences. Importantly, the authors complement this computational clarification with definitive experimental evidence through an effective "experimental rescue." By employing correctly designed primers and matching the genetic backgrounds of the fly strains, thereby accounting for genomic polymorphisms, they successfully validated all seven loci that were previously reported as undetectable. This dual-pronged strategy, addressing both algorithmic bias and experimental design, establishes a more robust technical benchmark for the detection and validation of TE-derived exons in neural tissues.

      Weaknesses:

      While the technical rebuttal is highly convincing, the scope of the study remains primarily defensive. As a response to a prior critique, the work focuses on establishing the existence and detectability of chimeric TE-derived transcripts rather than exploring their broader functional consequences. As a result, there is limited new insight into how these TE-modified isoforms influence neural circuit function or organismal behavior. In addition, the detection and validation of these events remain technically demanding, requiring deep sequencing and specialized bioinformatic expertise, which may limit broader adoption by laboratories without dedicated computational resources.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Choucri and Treiber responds to a recent paper by Azad et al., which responds to a paper by Treiber and Waddell (Genome Research, 2020). The controversy relates to the detection of transcripts with transposable elements (TEs) spliced into them in the Drosophila brain.

      Strengths:

      The authors now argue convincingly that these transcripts exist using an improved, updated version of their pipeline. They also validate some of their findings using RT-PCR and explain why Azad et al. failed to detect these transcripts due to methodological errors. Overall, I am convinced that these transcripts exist and that the TE-derived transcripts described by Choucri and Treiber are real.

      Weaknesses:

      The authors should mention that combining PCR-amplified cDNA generation with short-read sequencing is suboptimal for detecting TE-fusion transcripts. Recently, direct long-read ONT RNA sequencing, which does not require amplification and spans the entire transcript, has been used to detect similar transcripts in human stem cells and the human brain (PMID: 40848716 & Garza et al., bioRxiv). Had the authors used this technology to validate their findings, there would be no question about these transcripts. If they do not perform such experiments, they should at least discuss the possibility and the advantages of the approach.

    1. eLife Assessment

      This study presents an important methodological advance (Liver-CUBIC combined with multicolor metallic nanoparticle perfusion) that enables high-resolution 3D visualization of the liver's complex multi-ductal architecture. The identification of the Periportal Lamellar Complex (PLC) as a novel perivascular structure with distinct cellular composition and low-permeability characteristics is convincing, supported by rigorous imaging data. The observed scaffolding role during fibrosis offers intriguing biological insights, though the functional claims would benefit from direct experimental validation.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the minor comments raised in the previous round of review.]

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns and comments.

    3. Reviewer #3 (Public review):

      Xu, Cao and colleagues aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the adult liver portal veins.

      Using available scRNAseq data, the authors assessed the CD34⁺Sca-1⁺ cells' expression profile, highlighting mRNA presence of genes linked to neurodevelopment, bile acid transport, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying that the PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists. The importance of the CD34+/Sca1+ endothelial cell population and claims based on transcriptomic re-analysis require future assessment by functional experimental approaches to decipher the functional molecules involved in PLC formation, maintenance, and the involvement in injury response before establishing their role in biliary, arterial, and neural liver systems.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture at unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - PLCs.

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell sub-population for PLC formation and function was not tested and warrants further validation.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns and comments.

      We thank the reviewer for the positive evaluation and helpful feedback.

      Reviewer #3 (Public review):

      Xu, Cao and colleagues aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the adult liver portal veins.

      Using available scRNAseq data, the authors assessed the CD34<sup>+</sup>/Sca-1<sup>+</sup> cells' expression profile, highlighting mRNA presence of genes linked to neurodevelopment, bile acid transport, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying that the PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists. The importance of the CD34+/Sca1+ endothelial cell population and claims based on transcriptomic re-analysis require future assessment by functional experimental approaches to decipher the functional molecules involved in PLC formation, maintenance, and the involvement in injury response before establishing their role in biliary, arterial, and neural liver systems.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture at unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - PLCs.

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell sub-population for PLC formation and function was not tested and warrants further validation.

      We thank the reviewer for the valuable comment regarding the potential role of the CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial cell sub-population in PLC function.

      We agree that direct functional validation would be a crucial next step to confirm the contribution of this specific sub-population to PLC formation and function. The focus of the present study remains on the spatial localization and reproducible characterization of PLC structures based on 3D imaging, as well as the relevant transcriptional features revealed by single-cell analysis.

      To avoid overinterpretation, we have revised the Discussion section accordingly, providing a more focused and cautious description of the related findings.

      Comments on revisions:

      I appreciate the authors' effort to revise the text so it more rigorously adheres to the presented evidence. Following a thorough read of the revised text, a few remaining minor issues were identified in the Discussion.

      (1) Where does the hard evidence for the PLC being a stem cell niche come from in the following two statements?

      This suggests that the PLC may not only provide structural support but also serve as a perivascular stem cell niche specific to the portal region, potentially involved in hematopoiesis and tissue regeneration.

      The PLC serves as a directional scaffold for ductal growth, a specialized stem cell niche, and a potential site of neurovascular coupling.

      We thank the reviewer for this important comment. We agree that the term “stem cell niche” may imply functional evidence for direct stem cell regulation, which was not demonstrated in this study. Our conclusions were based on the spatial enrichment and transcriptional features of CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial populations expressing hematopoiesis-related genes in the portal region.

      To avoid overinterpretation, we have revised the sentence to remove the term “stem cell niche” and instead describe the PLC as being enriched in perivascular endothelial cell populations with hematopoiesis-related gene expression features. The revised text now reads:

      “These results suggest that, beyond structural support, the PLC in the portal region is enriched with perivascular endothelial cell populations exhibiting hematopoiesis-related gene expression features.”

      We have also modified the corresponding statement later in the Discussion. It now reads:

      “The PLC serves as a directional scaffold for ductal growth, displays distinct perivascular endothelial transcriptional features in the portal region, and may represent a potential site of neurovascular coupling.”

      We believe this wording more accurately reflects the descriptive and transcriptomic nature of our data without implying functional niche activity.

      (2) In the following paragraph, I lack references to the previously published evidence of liver innervation guidance mechanisms, such as the mesenchyme-mediated guidance (CD31- population) Gannoun et al., 2023 https://doi.org/10.1242/dev.201642, an important context for your finding.

      Further analysis showed significant upregulation of genes involved in neurodevelopment and axonal guidance in the CD34<sup>+</sup>/Sca-1<sup>+</sup> cluster, along with activation of neuronal signaling pathways. Immunostaining confirmed the presence of TH<sup>+</sup> sympathetic nerve fibers wrapping around the PLC in a "beads-on-a-string" pattern (Fig. 6), consistent with a classic neurovascular unit (Adori et al., 2021). Previous studies have shown that sympathetic nerves enter the liver along collagen fibers of Glisson's capsule and interact with hepatic arteries, portal veins, and bile duct epithelium, supporting the PLC as a scaffold for intrahepatic neurovascular integration.

      We thank the reviewer for highlighting the importance of previously published evidence regarding liver innervation guidance mechanisms. We agree that these studies provide important context for interpreting the neurodevelopmental and axon guidance–related transcriptional signatures observed in our dataset. Accordingly, we have revised the Discussion section to incorporate reference to mesenchyme-mediated axon guidance mechanisms in the portal region during liver development (Gannoun et al., 2023). This addition better situates our findings within the existing literature.

      (3) Several sentences have issues with a lack of space between words.

      We have carefully re-examined the entire manuscript for spacing and formatting inconsistencies and corrected minor typographical issues to ensure uniform formatting throughout the text.

    1. eLife Assessment

      This manuscript presents a valuable study of the activity and functional relevance of different circuits in the dentate gyrus of mice performing a pattern separation task. Solid evidence is presented to support the paper's central conclusions. The study is likely to be of interest to those studying the subregional organization and cell type-specific functions of the dentate gyrus.

    2. Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      Comments on revisions:

      I appreciate the authors' careful and thorough revisions. They have addressed all of my previous concerns satisfactorily, and the manuscript is now significantly strengthened. I have no further concerns.

    3. Reviewer #2 (Public review):

      In this study, the authors investigate how increasing cognitive demand shapes activity patterns in the dorsal dentate gyrus (DG). Using a touchscreen-based TUNL task combined with TRAP/c-Fos tagging, birth-dating of adult-born granule cells (abDGCs), and chemogenetic inhibition, they show that higher task demand increases mature granule cell (mGC) recruitment and enhances suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Functionally, mGC inhibition reduces overall activity and impairs performance without disrupting blade bias, whereas inhibition of ≤7-week-old abDGCs increases mGC activity, abolishes blade bias, and impairs discrimination under high-demand conditions. These findings suggest that effective pattern separation depends not only on overall DG activity levels but also on the spatial organization of recruited ensembles.

      The integration of touchscreen TUNL with temporally controlled activity tagging and birth-dated cohorts is technically strong. Quantification of SB-IB bias and radial/apical distributions adds anatomical precision beyond bulk activity measures. The comparison between mGC and abDGC inhibition is conceptually compelling and supports dissociable functional roles. Overall, the data convincingly demonstrate that increasing cognitive demand amplifies blade-biased DG recruitment and that mGCs and abDGCs differentially contribute to both behavioral performance and network organization.

      However, how abDGCs are integrated into the mGC network under high cognitive demand remains unresolved. Additional experiments are needed to clarify how abDGCs shape spatial recruitment patterns and whether they directly inhibit or indirectly regulate mGC activity to maintain high performance.

      Furthermore, the authors frame "high cognitive demand" as a multidimensional construct encompassing broad behavioral challenge. It would strengthen the work to delineate how local abDGC-mGC circuit interactions regulate specific task components in real time. This will require higher temporal resolution approaches, as TRAP and c-Fos labeling integrate activity over prolonged windows and primarily reflect sustained engagement rather than moment-to-moment computations.

      The central conclusion that dentate function depends on coordinated spatial recruitment rather than total activity magnitude is supported by the data, although mechanistic interpretations should be tempered given methodological limitations.

      Overall, this work advances models of adult neurogenesis by emphasizing a critical-period modulatory role of abDGCs in organizing DG network activity during high-demand discrimination. The combined behavioral and circuit-level framework is likely to be influential in the field.

    4. Reviewer #3 (Public review):

      This study examines the role of dentate gyrus neuronal populations, reflecting neurogenesis and anatomical location (suprapyramidal vs infrapyramidal blade), in a mnemonic discrimination task that taxes the pattern separation functions of the dentate. The authors measure dentate gyrus activity resulting from cognitive training and test whether adult neurogenesis is required for both the anatomical patterns of activity and performance in the cognitive task. The authors find that more cognitively challenging variants of the task evoked more dentate activity, but also distinct patterns of activity (more activity in the suprapyramidal blade, less in the infrapyramidal blade). Using chemogenetic approaches, they silence mature vs immature dentate gyrus neurons and find that only mature neurons (either the general population or specifically mature adult-born neurons), and not immature adult-born neurons, are required for the difficult version of the task. Inhibition of mature adult-born neurons furthermore increased overall activity in the dentate and reduced the biased pattern of activity across the blades, consistent with evidence that adult-born neurons broadly regulate dentate gyrus activity.

      Comments on revisions:

      I appreciate the efforts the authors have taken to revise this manuscript. I have only minor concerns with this revised version of the manuscript:

      Methods state that significance is defined as P<0.05 but some results are interpreted as significant when P=0.05. Either the alpha value needs to change or the interpretation needs to change.

      I believe the statistical results for group and blade effects for the ANOVAs, in Figs 2,3 & 4, appear to be switched (blade should be significant, not group).

      I appreciate that sometimes there is not a perfect overlap between immunohistochemical signals, but I continue to believe that the spatially non-overlapping TRAP and EdU signals in Fig 3 are caused by these 2 markers being in different cells. A Z-stack or orthogonal projection could verify/disprove this concern.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figure 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which are very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration results in labeling a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of this approach and we have included discussion of this in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine is not a marker that allows localization of hM4Di, as it is well known that mCitrine can be expressed in a Cre-independent manner in this mouse. As suggested, we have removed the figure that showed the mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which most cells have progressed beyond the DCX⁺ stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to localize this more explicitly as the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP⁺ cells per mouse is shown in Figure 1F, which reports TRAP⁺ cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      As mentioned, we previously reported no correlation between task success and TRAP+ density. We have now performed additional analyses examining correlations with learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB), and found no significant relationships. Because we did not find any positive correlations, the original interpretation that DG activity primarily reflects task engagement rather than performance level seems the most parsimonious.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      We agree that reducing spatial separation between stimuli likely engages multiple behavioral and cognitive processes beyond a single, strictly defined operation. We have now clarified this point in the manuscript and explicitly state that our use of the term “cognitive demand” reflects a multidimensional behavioral challenge rather than a singular cognitive process (see Discussion).

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      While the primary goal of this study was to examine activity and functional recruitment of adult-born granule cells, we also quantified the survival of birth-dated neurons at the end of behavioral training. Density measurements of BrdU⁺ and EdU⁺ cells revealed no differences across experimental groups, indicating that engagement in the pattern separation task, across low to high cognitive demand conditions, did not significantly alter survival of adult-born neurons. In addition, we examined the spatial distribution of BrdU⁺ and EdU⁺ neurons between the suprapyramidal and infrapyramidal blades of the dentate gyrus. The proportion of newborn neurons was consistent across all groups, with approximately 60% located in the suprapyramidal blade and 40% in the infrapyramidal blade. These findings indicate that behavioral training did not alter the baseline distribution of adult-born neurons. We have now clarified these points in the manuscript (see Results).

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      We agree and have made changes to the manuscript to discuss these points (see Discussion and Limitations).

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      We agree that differences in learning stage could in principle confound the interpretation of DG activation patterns. However, although Cre+ DCZ-treated mice exhibited delayed learning, they ultimately reached the same performance criterion as control animals. Thus, adult-born DGC inhibition did not prevent learning but increased the time required to reach criterion, indicating that these neurons are beneficial for learning efficiency rather than strictly necessary for task acquisition. Importantly, all animals were sacrificed only after reaching the predefined success criterion. Therefore, the immunohistochemical analyses were performed at the same behavioral endpoint for Cre+ DCZ and control groups, even though the number of training days differed. Consequently, the observed differences in DG activation reflect circuit recruitment at equivalent task mastery rather than differences in learning stage.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.

      We agree that our findings highlight that successful performance is not determined solely by the overall level of dentate gyrus activity, but rather by the composition and spatial organization of the active neuronal ensemble. In our study, inhibition of abDGCs increased overall mGC activity while disrupting the spatially organized, blade-biased activation pattern and impaired performance. In contrast, direct inhibition of mGCs reduced global excitability but preserved the relative spatial organization of active neurons in animals that continued to perform the task. These findings suggest that different perturbations alter task performance by shifting the identity and coordination of the active neuronal ensemble, rather than simply increasing or decreasing total activity levels. We have now expanded the Discussion to more explicitly address how dentate gyrus computations may depend on the structured recruitment of granule cell ensembles and how distinct manipulations differentially disrupt this organization.

      Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which appear to potentially underlie null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to be targeting these neurons, as judged by their location in the DG.

      Please refer to the updated statistical analyses in response to the recommendations below.

      Recommendations for the authors:

      Reviewing Editor Comments

      Please note that reviewers agreed that appropriate revisions are needed to increase the strength of evidence for the paper's claims. Concerns were raised about a lack of statistical rigor in the statistical analyses used. Results of statistical tests were not consistently provided (i.e., statistic applied, value of statistic, degrees of freedom, p-value), and seemingly inappropriate statistical tests were used in some instances. Also, some comparisons had lower statistical power than others. When clarifying the statistical approaches used in the manuscript, we also encourage you to consider reading this article that outlines common statistical mistakes (Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019 Oct 9;8:e48175. doi: 10.7554/eLife.48175.), such as the importance of not basing conclusions on a significant p-value for one pair-wise comparison vs a non-significant p-value for another pairwise comparison (i.e., groups that are being compared should be included in the same statistical analysis, and interaction effects should be reported when appropriate). We hope that you find this information to be helpful should you decide to submit a revised manuscript to eLife.

      Reviewer #1 (Recommendations for the authors):

      (1) Standardize TRAP+ quantification across Figure 1.

      Please report TRAP+ cell numbers using consistent metrics (e.g., density or percentage) to enable comparison across cell types. In addition, extend the TRAP+ reactivation analysis in Figure 1H to include abDGCs so that reactivation dynamics can be compared directly between mGCs and abDGCs.

      Reply in Public Review

      (2) Clarify whether dorsal or ventral DG was analyzed in Figure 2.

      The differing anatomical distributions of TRAP+ cells under low- and high-demand conditions raise important questions about DG axis specificity. Please indicate whether analyses were performed in dorsal DG, ventral DG, or both, and provide data or justification accordingly.

      Reply in Public Review

      (3) Acknowledge limitations of the tamoxifen-chow labeling strategy in AsclCreER; hM4 experiments.

      Since tamoxifen chow administered over 4-7 weeks labels a heterogeneous abDGC population spanning a broad age range, this approach does not generate birth-dated cohorts. This limitation should be clearly addressed in the text and interpretations, particularly related to cell age-dependent effects, should be tempered.

      Reply in Public Review

      (4) Revise DREADD quantification using HA rather than mCitrine.

      The hM4 mouse line requires HA immunostaining to accurately identify Ascl-lineage cells expressing the DREADD receptor. Because mCitrine is not specific to adult-born neurons and does not reliably reflect hM4 expression, quantification based on mCitrine should be revised.

      Reply in Public Review

      (5) Include markers to assess abDGC maturation state.

      Adding quantification of DCX and NeuN would help define the developmental stage of abDGCs in key experiments and improve the interpretation of cell-age-dependent effects.

      Reply in Public Review

      (6) Clarify DG layer boundaries and terminology in Figure 2.

      If the metric labeled "Distance from the hilus" corresponds to the subgranular zone (SGZ), using SGZ terminology would prevent confusion. Additionally, please provide clearer delineation of DG and hilus borders in sample images.

      Reply in Public Review

      (7) Provide missing cell number data for Figures 2B and 2C.

      Reply in Public Review

      (8) Clarify the tamoxifen administration protocol in Figure 6.

      Please describe how the protocol selectively targets 6-7-week-old abDGCs and how it differs from the chow-based approach. This will help readers understand the intended specificity of the manipulation.

      Reply in Public Review

      Reviewer #2 (Recommendations for the authors):

      (1) EdU birth-dating timeline

      The manuscript would benefit from a clearer description of the EdU birth-dating timeline, ideally with a schematic similar to that provided for BrdU in Supplementary Figure 1.

      We appreciate the suggestion. However, we did not include a separate schematic for EdU because its use and birth-dating logic are identical to BrdU (both are thymidine analogs administered systemically and incorporated during S-phase). Therefore, the timeline shown in Supplementary Figure 1 applies equally to both markers. We have clarified this point in the Methods section to avoid confusion.

      (2) Clarity of TUNL task description.

      The description of the TUNL task, particularly for readers unfamiliar with touchscreen-based paradigms, is difficult to follow without consulting prior literature. A simplified schematic or a clearer step-by-step explanation in the main text or supplementary material would improve accessibility.

      We note that the main steps of the TUNL protocol are illustrated in Figure 1A and Supplementary Figures 2A and 2B. Nevertheless, we agree that the description in the text can be made clearer for readers less familiar with touchscreen-based tasks. Thus, we have now revised the Methods section to provide a clearer step-by-step description of the TUNL task.

      (3) Influence of outliers in Figure 1G.

      In Figure 1G, the reported trend that ~1% of 25-39-day-old abDGCs are TRAP+ during LS trials appears to be driven by a small number of outliers. This should be acknowledged, and the wording of the conclusion moderated to reflect the variability in the data.

      We agree with the reviewer that the apparent outliers reflect the inherent sparsity of TRAP labeling in this population. In absolute terms, this corresponds to between 0 and 2 TRAP⁺ 25–39-day-old abDGCs per mouse, such that the presence or absence of a small number of labeled cells can appear as outliers when expressed as a percentage. We have revised the text to acknowledge this (see Results).

      (4) Presentation of learning curves.

      Rather than focusing primarily on "days before criterion" (DBC), it would be helpful to show full learning curves across the entire training period. This would provide a clearer picture of acquisition dynamics and inter-animal variability.

      We agree that learning curves can be informative in many behavioral paradigms. However, in our protocol, mice do not undergo the same number of training days because training stops individually once each animal reaches criterion. As a result, plotting full learning curves would produce trajectories of different lengths, making group comparisons difficult and visually cluttered. For this reason, we aligned animals based on days before criterion (DBC), which allows direct comparison of learning dynamics relative to task acquisition. We also consider the cumulative probability representation, which is included in the figures, to be the most appropriate way to summarize learning progression across animals in this context.
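
The alignment described in this response can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the convention that 0 marks the criterion day with earlier days negative, and the example scores are all assumptions made for illustration.

```python
from statistics import mean


def align_to_criterion(daily_scores, window):
    """Index one mouse's daily scores by days before criterion (0 = criterion day)."""
    last = len(daily_scores) - 1
    return {
        day - last: score
        for day, score in enumerate(daily_scores)
        if last - day < window
    }


def mean_by_dbc(curves, window):
    """Average across mice at each aligned day, using only mice trained that long."""
    aligned = [align_to_criterion(scores, window) for scores in curves.values()]
    averaged = {}
    for dbc in range(-window + 1, 1):
        values = [a[dbc] for a in aligned if dbc in a]
        if values:
            averaged[dbc] = mean(values)
    return averaged


# Hypothetical percent-correct scores; the two mice trained for different numbers of days.
curves = {"mouse_a": [50, 60, 75], "mouse_b": [40, 45, 55, 65, 80]}
print(mean_by_dbc(curves, window=4))
```

Because each curve is indexed from its final day, the two mice contribute to the same aligned days even though one trained for three days and the other for five.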

      (5) Clarification of Figure 3B labeling

      In Figure 3B, the identity of the orange-labeled group above the LS condition is unclear. Clarification in the figure legend would improve interpretability.

      Figure 3B includes two experimental groups. One group performed both the large- and small-separation conditions; this group is shown in orange and labeled LS. Within this group, the upper orange trace corresponds to performance in the large-separation condition, while the lower orange trace corresponds to performance in the small-separation condition. The second group is a control group that performed only the large-separation configuration, and therefore only a single green trace is shown. We agree that this distinction was not sufficiently clear and have revised the figure legend and text to clarify the identity of each trace.

      Reviewer #3 (Recommendations for the authors):

      (1) Please label figures and, even better, put the legends on the same page.

      (2) Just to confirm, in establishing the task, mice performed above 70% for the small separation trials in one of the sessions on 2 consecutive days, for each criterion? Performance seems to be below 70%.

      Yes. To meet the criterion, each mouse had to reach ≥70% correct performance in at least one of the two daily sessions on two consecutive days. We then averaged the performance across both sessions for each of those days. As a result, if one session was ≥70% but the other was lower, the daily average could fall below 70%. The values shown in the figure correspond to these daily averages, further averaged across mice.
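
A small worked example may make this arithmetic concrete. The sketch below follows the rule as stated in the response (at least one session at or above 70% on each of two consecutive days, with plotted values being daily averages); the function names and the scores are hypothetical.

```python
from statistics import mean

CRITERION = 70.0  # percent correct, as stated in the response


def day_passes(sessions):
    """A day counts if at least one of its sessions reaches the criterion."""
    return any(score >= CRITERION for score in sessions)


def reached_criterion(days):
    """True once two consecutive days each contain a passing session."""
    return any(day_passes(a) and day_passes(b) for a, b in zip(days, days[1:]))


def daily_averages(days):
    """Average the sessions within each day, as plotted in the figure."""
    return [mean(sessions) for sessions in days]


# Hypothetical scores (percent correct) for one mouse: two sessions per day.
example_days = [(55.0, 60.0), (72.0, 62.0), (71.0, 65.0)]

print(reached_criterion(example_days))   # True
print(daily_averages(example_days)[1:])  # [67.0, 68.0]
```

In this made-up case the mouse meets the criterion on days 2 and 3, yet both plotted daily averages sit below 70%.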

      (3) mGC needs to be explicitly defined. Am I assuming any non-birthdated GC is an mGC according to the authors? (which means it is unknown whether they are in fact mature, though likely most of them are).

      In this study, “mature granule cells” (mGCs) refer operationally to granule cells that are not birth-dated with BrdU or EdU and therefore are not classified as adult-born neurons within the defined labeling window. We agree that this population is not directly age-defined, and that while the majority are expected to be mature based on their birth timing relative to the labeling period, we cannot exclude the possibility that a small fraction may include younger, unlabeled neurons. We have now explicitly defined this usage of mGCs in the Methods and clarified this point in the text to avoid ambiguity.

      (4) Methods state that Kruskal-Wallis tests were used when more than 3 groups were compared, but I don't see these stats presented (e.g., for trap data in Figure 1, blade x task TRAP expt in Figure 3 (should be 2-way RM anova here and elsewhere), etc) or any corrections for multiple comparisons. I appreciate that the mean rates of TRAPed abGCs are higher in the S and LS groups than in the shaping group, but most mice do not have any BrdU+ cells that are also TRAPed, and there are no statistics here to support the claim. I don't think there is enough sampling to accurately quantify activation of abGCs. Also, no stats to support the claim that TRAPing increases at the "tip of the SB after the more demanding LS task".

      We agree with this comment. We have now systematically tested all datasets for normality (by group) and applied parametric tests when the data met normality assumptions, and non-parametric tests otherwise. The statistical analyses have been revised accordingly. We added the appropriate tests (including two-way ANOVA where relevant, such as for blade × group comparisons) and now report full statistics in the figure legends and results sections. For the TRAP analyses in adult-born DGCs, we explicitly acknowledge the very low number of BrdU⁺/TRAP⁺ cells, which limits statistical power and, in some cases, precludes robust statistical testing. These limitations are now clearly stated in the Results and Discussion, and the corresponding interpretations have been tempered. For all Kruskal–Wallis tests, post hoc pairwise comparisons were performed using Dunn’s test, with Bonferroni correction for multiple comparisons, as now specified in the Methods section. We also expanded the Methods to describe the statistical workflow in detail. In addition, we have added the previously missing statistical analysis for Figure 2C. Comparisons were performed between the 0–50% and 50–100% portions of the blade, where 0% corresponds to the apex and 100% corresponds to the distal tip of the blade.
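
The post hoc step named in this response (Dunn's test with Bonferroni correction) can be sketched in plain Python. This is an illustrative implementation of the standard rank-based formula with a tie correction, not the authors' code or software; the normality check and the Kruskal–Wallis omnibus test are omitted, and the group names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank pooled values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in this tie run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def dunn_bonferroni(groups):
    """Pairwise Dunn z-tests on pooled ranks with Bonferroni-adjusted p-values."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    mean_rank, size, offset = {}, {}, 0
    for name in names:
        size[name] = len(groups[name])
        mean_rank[name] = sum(ranks[offset:offset + size[name]]) / size[name]
        offset += size[name]

    tie_term = sum(t**3 - t for t in tie_sizes) / (12 * (n_total - 1))
    variance = n_total * (n_total + 1) / 12 - tie_term
    n_pairs = len(names) * (len(names) - 1) // 2

    results = {}
    for a, b in combinations(names, 2):
        se = sqrt(variance * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = 2 * (1 - NormalDist().cdf(abs(z)))
        results[(a, b)] = (z, min(1.0, p_raw * n_pairs))  # Bonferroni adjustment
    return results


# Hypothetical values for three task stages (arbitrary units, not the authors' data).
example = {"shaping": [1, 2, 3, 4], "S": [5, 6, 7, 8], "LS": [9, 10, 11, 12]}
for pair, (z, p_adj) in dunn_bonferroni(example).items():
    print(pair, round(z, 2), round(p_adj, 3))
```

Multiplying each raw p-value by the number of pairwise comparisons is the Bonferroni step; with three groups that factor is 3.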

      (5) Figure 3I: I can't figure out which effect is statistically significant here (what does the asterisk signify?). Why no individual data points in this graph?

      We agree that the absence of individual data points reduced interpretability, and we have now updated the figure to include individual data points to better illustrate data distribution and variability.

      (6) The gradient of activity (shap < S < LS) could be due to how long they've been trained on a given stage (e.g. less activity during shaping because they have habituated, and neurons encoding that task phase have already been selected)

      We agree that task duration and habituation could, in principle, influence activity levels. Under this interpretation, higher activity would primarily reflect task novelty rather than cognitive demand. However, our data do not support this explanation. Specifically, we found no correlation between the number of training days required to reach criterion and c-Fos–positive or TRAP-positive cell density within a given stage. Thus, animals that reached criterion rapidly did not show higher activity levels than animals that required more days of training and were presumably more habituated to the task demands. This suggests that the observed activity gradient (shaping < S < LS) is not driven by exposure duration or habituation, but rather reflects differences in cognitive demand across task stages.

      (7) The TRAP+ EDU+ cell in Figure 3 looks odd because the BrdU signal is (a lot) larger than the TRAP signal, but BrdU is in the nucleus and should be smaller.

      We agree that the example in Figure 3 is not optimal. In dividing cells, BrdU/EdU signals can sometimes appear broader or closely apposed, which may affect their apparent size.

      (8) For the Ascl-HM4Di experiment, HM4Di appears to be expressed in all of the areas of the granule cell layer where abGCs are NOT located (i.e. no expression in the deep cell layer, near the sgz). This is problematic because it suggests perhaps abGCs are not inhibited as expected.

      As noted in our response to Reviewer #1, we did not use mCitrine to localize the DREADD receptor, as it has been demonstrated that mCitrine is expressed in a Cre-independent manner and is not correlated with hM4Di expression. In the revised manuscript, we include a representative image in which we performed immunostaining using an HA antibody to directly visualize hM4Di and confirm its expression in adult-born granule cells (Figure 5).

      (9) Line 267: "6-7 week old neurons by themselves do not influence either the performance of mice in the task". I don't think this is fair because this experiment wasn't designed with as much power to detect an effect. The group trends are in the same direction, but there are many fewer mice in this experiment (n=6/group) than in the =<7w experiment (n=11/group), where the effect just reached statistical significance.

      We apologize for this confusion, which arose from an incorrect version. The experiment shown in Figure 6 does not target 6–7-week-old neurons specifically. It uses the same tamoxifen chow–based protocol as Figure 5, but with a shorter exposure (4 weeks vs. 7 weeks), thereby labeling a younger and more restricted cohort of adult-born DGCs. By contrast, Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells (Dock10+).

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

    1. eLife Assessment

      This paper describes Unbend, a new method for measuring and correcting motions in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamellae and whole cells. The method, which fits a B-spline model using cross-correlation-based local patch alignment of micrograph frames, represents an important tool for the cryo-EM community. The authors elegantly use 2D template matching to provide convincing evidence that Unbend outperforms the previously reported method of Unblur by the same authors. Comparison to alternative programs for motion correction shows smaller gains, but with interesting differences between data sets.

    2. Reviewer #1 (Public review):

      Kong et al.'s work describes a new approach that does exactly what the title states, "Correction of local beam-induced sample motion in cryo-EM images using a 3D spline model." It is, therefore, a more elaborate approach than current methods in the field for the "movie alignment" stage. Additionally, the work uses 2DTM (2D Template Matching)-related measurements to quantify the improvement of the new method compared to other methods in the field. I find both parts very compelling (the new method and the testing approach).

      On a "focused" view, the strengths of the work rest on presenting a better approach for motion correction and on measuring its performance very well at the 2D level in a compelling manner.

      On a more "general" view, the authors introduce the important notion that even one of the most worked-out steps in the processing workflow can still be done better in a measurable way, and that this could lead to better results beyond the 2DTM metrics used for testing, reflecting in better results along the processing pipeline (although the manuscript does not explore this notion further).

      On the "usability" side, the method is still CPU-based and is slower than standards in the field. This may pose significant limitations in practical work, although the authors are aware of this issue and are working on it.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a new method, Unbend, for measuring motion in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamella and whole cells (that can be more prone to overall motion and/or variability in motion across a field of view). Building on their previous approach of full-frame alignment (Unblur), they now perform full-frame alignment followed by patch alignment, and then use these outputs to generate a 3D model of the motion. This model allows them to estimate a continuous, per-pixel shift field for each movie frame that aims to better describe complex motions and so ultimately generate improved motion-corrected micrographs. Performance of Unbend is evaluated using the 2D template matching (2DTM) method developed previously by the lab, and results are compared to using full-frame correction alone and to the leading local motion correction methods. Several different in situ samples are used for evaluation covering a broad range that will be of interest to the rapidly growing in situ cryo-EM community.

      Strengths:

      The method appears to be an elegant way of describing complex motions in cryo-EM samples, and the authors present sound data that Unbend generally improves SNR of aligned micrographs as well as increases detection of particles matching the 60S ribosome template when compared to using full-frame correction alone and, since review, to the leading local motion correction methods. The authors also give interesting insights into how different areas of a lamella behave with respect to motion by using Unbend on a montage dataset collected previously by the group. There is growing interest in imaging larger areas of in situ samples at high resolution, and these insights contribute valuable knowledge. Additionally, the availability of data collected in this study through the EMPIAR repository will be much appreciated by the field.

      Weaknesses:

      A major weakness was comparing this method to full-frame approaches only, but this has since been addressed by the authors during review, and Unbend is now compared to MotionCor2, MotionCor3, CryoSPARC, and Warp. The improvements here are smaller: Unbend generally seems to perform on par with these methods, but there are significant gains for certain samples (e.g. the M. pneumoniae sample). A comment from this reviewer about using an adaptive approach to decide if/when to proceed to the full Unbend pipeline, over full-frame alone, has been addressed by the authors.

    4. Reviewer #3 (Public review):

      Summary

      Kong and coauthors describe and implement a method to correct local deformations due to beam induced motion in cryo-EM movie frames. This is done by fitting a 3D spline model to a stack of micrograph frames using cross-correlation-based local patch alignment to describe the deformations across the micrograph in each frame, and then computing the value of the deformed micrograph at each pixel by interpolating the undeformed micrograph at the displacement positions given by the spline model. A graphical interface in cisTEM allows the user to visualise the deformations in the sample, and the method is proved to be successful by showing improvements in 2D template matching (2DTM) results on the corrected micrographs using five in situ samples.
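
To give a feel for how a spline turns a few coarse shift estimates into a continuous shift, here is a deliberately simplified one-dimensional sketch. It is not the Unbend implementation: the method described above is a 3D model over the image plane and the frame stack, whereas this example evaluates a uniform cubic B-spline along a single axis, and the function names and control values are hypothetical.

```python
def cubic_bspline_weights(t):
    """Uniform cubic B-spline basis weights for a local parameter t in [0, 1]."""
    return (
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    )


def spline_shift(control_shifts, x):
    """Evaluate a smooth shift at position x, with 0 <= x <= len(control_shifts) - 3.

    x is measured in knot spacings; each unit interval blends four
    neighbouring control values, so the result varies continuously with x.
    """
    segment = min(int(x), len(control_shifts) - 4)
    t = x - segment
    weights = cubic_bspline_weights(t)
    return sum(w * c for w, c in zip(weights, control_shifts[segment:segment + 4]))


# Hypothetical control shifts (in pixels) along one axis, e.g. across frames.
control = [0.0, 1.5, 2.0, 2.2, 2.3, 2.3]
for x in (0.0, 0.5, 1.0, 2.5, 3.0):
    print(x, round(spline_shift(control, x), 3))
```

A B-spline of this kind smooths the control values rather than passing exactly through them, which is one reason such models can absorb noise in individual patch alignments.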

      Impact

      This method has great potential to further streamline the cryo-EM single particle analysis pipeline by shortening the required processing time as a result of obtaining higher quality particles early in the pipeline, and is applicable to both old and new datasets, therefore being relevant to all cryo-EM users.

      Strengths

      (1) The key idea of the paper is that local beam induced motion affects frames continuously in space (in the image plane) as well as in time (along the frame stack), so one can obtain improvements in the image quality by correcting such deformations in a continuous way (deformations vary continuously from pixel to pixel and from frame to frame) rather than based on local discrete patches only. 3D splines are used to model the deformations: they are initialised using local patch alignments and further refined using cross-correlation between individual patch frames and the average of the other frames in the same patch stack.

      (2) Another strength of the paper is using 2DTM to show that correcting such deformations continuously using the proposed method does indeed lead to improvements, as evidenced by the number of particles found and the quality of the detections (measured using 2DTM SNR). This is shown using five in situ datasets, where local motion is quantified using statistics based on the estimated motions of ribosomes. The same analysis is performed using other deformation correction tools, with Unbend showing superior performance in terms of particles detected or 2DTM SNR of the detections.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments. A central concern raised is the comparison of performance with existing motion-correction methods. In response, we performed motion correction using several widely used approaches and compared results using the number of particles detected by 2DTM and their associated SNR. To minimize potential bias, we selected parameters to give each method a comparable level of model flexibility so that the results are as directly comparable as possible. Overall, Unbend performs the best. We note that extensive, method-specific parameter optimization could further affect absolute performance, and a comprehensive benchmarking study is therefore beyond the scope of this work.

      Public Reviews:

      Reviewer #1 (Public review):

      Kong et al.'s work describes a new approach that does exactly what the title states: "Correction of local beam-induced sample motion in cryo-EM images using a 3D spline model." I find the method appropriate, logical, and well-explained. Additionally, the work suggests using 2DTM-related measurements to quantify the improvement of the new method compared to the old one in cisTEM, Unblur. I find this part engaging; it is straightforward, accurate, and, of course, the group has a strong command of 2DTM, presenting a thorough study.

      However, everything in the paper (except some correct general references) refers to comparisons with the full-frame approach, Unblur. Still, we have known for more than a decade that local correction approaches perform better than global ones, so I do not find anything truly novel in their proposal of using local methods (the method itself, Unbend, is new, but many others have been described previously). In fact, the use of 2DTM is perhaps a more interesting novelty of the work, and here, a more systematic study comparing different methods with these proposed well-defined metrics would be very valuable. As currently presented, there is no doubt that it is better than an older, well-established approach, and the way to measure "better" is very interesting, but there is no indication of how the situation stands regarding newer methods.

      Regarding practical aspects, it seems that the current implementation of the method is significantly slower than other patch-based approaches. If its results are shown to exceed those of existing local methods, then exploring the use of Unbend, possibly optimizing its code first, could be a valuable task. However, without more recent comparisons, the impact of Unbend remains unclear.

      We thank the reviewer for this important point. We agree that comparing against modern local motion-correction approaches is a valuable task. To address this, we added a new benchmarking section (pp. 17–18, lines 444–492, Fig. 8, Fig. 8—figure supplement 1) that compares Unbend against widely used patch-based local correction methods, including MotionCor2, MotionCor3, Warp, and CryoSPARC. Using the same 2DTM-based metrics described in the manuscript (detections per micrograph and SNR distributions for commonly detected particles), we find that Unbend provides the most stable performance across the tested datasets and, in most cases, yields higher detection counts and higher SNR than the alternative methods.

      Regarding runtime, the current implementation is CPU-based and is therefore slower than some optimized GPU-accelerated packages. We now clarify this limitation in the manuscript (lines 498–499). Our primary goal in this study is to improve motion-correction accuracy and quantify its impact using 2DTM-based measures. Importantly, higher-quality motion-corrected micrographs can reduce downstream processing cost (e.g., by increasing particle detection efficiency and reducing ambiguous candidates), so modest additional compute times at the motion-correction stage can be offset later in the workflow. We also note that GPU acceleration and additional code-level optimizations are planned for future releases (lines 501–503); however, they are not required to evaluate the methodological contribution and the benchmarking results presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors present a new method, Unbend, for measuring motion in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamellae and whole cells (that can be more prone to overall motion and/or variability in motion across a field of view). Building on their previous approach of full-frame alignment (Unblur), they now perform full-frame alignment followed by patch alignment, and then use these outputs to generate a 3D cubic spline model of the motion. This model allows them to estimate a continuous, per-pixel shift field for each movie frame that aims to better describe complex motions and so ultimately generate improved motion-corrected micrographs. Performance of Unbend is evaluated using the 2D template matching (2DTM) method developed previously by the lab, and results are compared to using full-frame correction alone. Several different in situ samples are used for evaluation, covering a broad range that will be of interest to the rapidly growing in situ cryo-EM community.

      Strengths:

      The method appears to be an elegant way of describing complex motions in cryo-EM samples, and the authors present convincing data that Unbend generally improves SNR of aligned micrographs as well as increases detection of particles matching the 60S ribosome template when compared to using full-frame correction alone. The authors also give interesting insights into how different areas of a lamella behave with respect to motion by using Unbend on a montage dataset collected previously by the group. There is growing interest in imaging larger areas of in situ samples at high resolution, and these insights contribute valuable knowledge. Additionally, the availability of data collected in this study through the EMPIAR repository will be much appreciated by the field.

      Thank you for this positive assessment.

      Weaknesses:

      While the improvements with Unbend vs. Unblur appear clear, it is less obvious whether Unbend provides substantial gains over patch motion correction alone (the current norm in the field). It might be helpful for readers if this comparison were investigated for the in situ datasets. Additionally, the authors are open that in cases where full motion correction already does a good job, the extra degrees of freedom in Unbend can perhaps overfit the motions, making the corrections ultimately worse. I wonder if an adaptive approach could be explored, for example, using the readout from full-frame or patch correction to decide whether a movie should proceed to the full Unbend pipeline, or whether correction should stop at the patch estimation stage.

      We thank the reviewer for suggesting an adaptive criterion to decide whether to proceed with patch alignment or not. We agree that such an approach could be valuable for efficiency and for avoiding unnecessary model flexibility. However, our results indicate that a simple criterion based on the magnitude of estimated local patch motion is unlikely to be sufficient. For example, in the BS-C-1 cell lysate dataset (see lines 412–417 on page 16), we observe minimal local motion (Figure 4b), with mean patch shifts of only 0.7 Å, and full-frame alignment already yields comparable detection counts. Yet local correction still produces a measurable SNR gain (from 13.84 ± 0.04 to 14.25 ± 0.04, a 3% increase) and improves SNR for ~70% of the commonly detected targets (Figure 6c). This suggests that residual local distortion can remain even when overall local motion appears small. Establishing a robust, dataset-agnostic stopping rule would therefore require a dedicated, systematic benchmarking study across many samples and acquisition conditions.

      Reviewer #3 (Public review):

      Summary

      Kong and coauthors describe and implement a method to correct local deformations due to beam-induced motion in cryo-EM movie frames. This is done by fitting a 3D spline model to a stack of micrograph frames using cross-correlation-based local patch alignment to describe the deformations across the micrograph in each frame, and then computing the value of the deformed micrograph at each pixel by interpolating the undeformed micrograph at the displacement positions given by the spline model. A graphical interface in cisTEM allows the user to visualise the deformations in the sample, and the method has been proven to be successful by showing improvements in 2D template matching (2DTM) results on the corrected micrographs using five in situ samples.

      Impact

      This method has great potential to further streamline the cryo-EM single particle analysis pipeline by shortening the required processing time as a result of obtaining higher quality particles early in the pipeline, and is applicable to both old and new datasets, therefore being relevant to all cryo-EM users.

      Strengths

      (1) One key idea of the paper is that local beam-induced motion affects frames continuously in space (in the image plane) as well as in time (along the frame stack), so one can obtain improvements in the image quality by correcting such deformations in a continuous way (deformations vary continuously from pixel to pixel and from frame to frame) rather than based on local discrete patches only. 3D splines are used to model the deformations: they are initialised using local patch alignments and further refined using cross-correlation between individual patch frames and the average of the other frames in the same patch stack.

      (2) Another strength of the paper is using 2DTM to show that correcting such deformations continuously using the proposed method does indeed lead to improvements. This is shown using five in situ datasets, where local motion is quantified using statistics based on the estimated motions of ribosomes.

      Thank you for this positive assessment.

      Weaknesses

      (1) While very interesting, it is not clear how the proposed method using 3D splines for estimating local deformations compares with other existing methods that also aim to correct local beam-induced motion by approximating the deformations throughout the frames using other types of approximation, such as polynomials, as done, for example, in MotionCor2.

      We thank the reviewer for this suggestion. We agree that positioning Unbend relative to existing local motion-correction methods is important. In the revised manuscript, we added a dedicated benchmarking section comparing Unbend with widely used local correction approaches, including MotionCor2, MotionCor3, Warp, and CryoSPARC, using the same 2DTM-based metrics (Fig. 8, Fig. 8—figure supplement 1). This section is included on pp. 17–18, lines 444–492. To make the comparison as fair as possible, we matched nominal model flexibility across methods and otherwise used default parameters to reduce method-specific tuning. This expanded comparison provides a direct baseline against current patch-/spline-based approaches and shows that Unbend performs consistently across the in situ datasets evaluated here, with improvements in detection counts and/or SNR in multiple cases.

      (2) The use of 2DTM is appropriate, and the results of the analysis are enlightening, but one shortcoming is that some relevant technical details are missing. For example, the 2DTM SNR is not defined in the article, and it is not clear how the authors ensured that no false positives were included in the particles counted before and after deformation correction. The Jupyter notebooks where this analysis was performed have not been made publicly available.

      We agree that these technical details improve clarity and reproducibility. We have therefore made three changes.

      (1) Definition of 2DTM SNR. We added an explicit definition of the 2DTM SNR in the section “2DTM provides a one-step verification for motion correction” (p. 11, lines 277–287). Briefly, at each image location we compute cross-correlation values over the searched orientation space and define the 2DTM SNR as the maximum per-location z-score across orientations.

      (2) False-positive control / detection threshold. We clarified how detection thresholds were set to control false positives (p. 11, lines 285–287). Specifically, we used the standard 2DTM statistical framework in which the threshold is chosen using the one-false-positive (1-FP) criterion (or equivalently, a specified expected false-positive rate). We applied the same thresholding procedure consistently across all motion-corrected micrographs. This ensures that particle counts before/after correction reflect changes in signal recovery.

      (3) Reproducibility of the analysis. We have made the script used for the benchmarking and figure generation publicly available (p. 24, lines 622–623), and we provide a link in the Data Availability statement (p. 25, line 650). The repository includes sample .star files and a Python package that computes detections per micrograph, commonly detected particles, and SNR comparisons.
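
      As a rough illustration of the definition summarized in point (1), the sketch below computes a per-location SNR as the maximum z-score over orientations and then applies a fixed threshold. It is not the cisTEM implementation: the function names, the plain nested-list layout, the normalization by the mean and standard deviation over orientations at each location, and the threshold value passed in by the caller are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def tm_snr_map(cc_by_orientation):
    """Return one SNR value per image location.

    cc_by_orientation[k][i] is the cross-correlation value for searched
    orientation k at image location i (hypothetical layout).
    Assumption: the z-score at a location uses the mean and standard
    deviation of the values over all orientations at that location.
    """
    n_locations = len(cc_by_orientation[0])
    snr = []
    for i in range(n_locations):
        values = [cc[i] for cc in cc_by_orientation]
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # No spread across orientations: no evidence of a match here.
            snr.append(0.0)
        else:
            # The maximum z-score is attained by the maximum correlation value.
            snr.append((max(values) - mu) / sigma)
    return snr


def detect(snr_map, threshold):
    """Indices of locations whose SNR exceeds the detection threshold."""
    return [i for i, s in enumerate(snr_map) if s > threshold]


if __name__ == "__main__":
    # Toy input: 4 orientations, 2 locations (values are made up).
    cc = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [4.0, 1.0]]
    snr = tm_snr_map(cc)
    print(snr, detect(snr, threshold=1.5))
```

      In this sketch the threshold is simply supplied by the caller; in the analysis described in point (2) it would come from the 1-FP criterion, which is not reproduced here.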

      (3) It is also not clear how the proposed deformation correction method is affected by CTF defocus in the different samples (are the defocus values used in the different datasets similar or significantly different?) or if there is any effect at all.

      We thank the reviewer for raising this point. In the revised manuscript, we now report the defocus ranges used for each dataset (Table 1) and clarify that all motion-correction comparisons were performed within each dataset using the same CTF estimation and 2DTM settings (p. 23, lines 615–618). Across the five datasets, four were collected at similar defocus ranges (1.0 µm to 1.5 µm), whereas one dataset includes near-focus (0.4 µm) micrographs (Table 1). Because Unbend operates on frame alignment/warping rather than CTF modeling, we do not expect a defocus-specific effect beyond indirect influences through image SNR and reliability of cross-correlation-based alignment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The obvious recommendation would be to use their 2DTM approach for a comparison of their new method with other currently used ones.

      We agree and added a new comparison section (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #1 Public Review.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 29, typo. 3 ~ 8% > 3 - 8%.

      Corrected.

      (2) Lines 220 and 226. Should this be e-/Angstrom squared for the exposure?

      Corrected to e⁻/Å² (now p. 9, lines 230 and 236).

      (3) Figure 2 c-d. These are good for instinctively seeing the movement, but I found the legend confusing, as a 10 x 10 pixel array is mentioned, yet the schematics show a higher sampling (30 x 30 pixels? in c-e).

      Thank you for pointing this out. The “10×10” annotation refers to the physical scale, whereas the grid represents pixel sampling. We removed the “10×10” label and now show only the pixel grid to avoid confusion. The caption has been updated to state that the grid corresponds to a 30×30 pixel sampling (Fig. 2c, d; p. 31, line 766).

      (4) Figure 4. It would be good if the n of movies analyzed was given in the figure legend.

      Thank you for noticing this. We report the number of movies per dataset in the corresponding summary table (Table 1).

      (5) Figure 5. X/Y axes labels missing (assume pixels). Also, suggest changing the strain scale to % to match the main text description of this figure.

      We added X/Y axis labels, changed the strain scale to % (Figure 5), and specified that the strains are per pixel (p. 14, line 367). The X/Y labels and strain scale in the strain plots in Figure 4—figure supplements 1 to 5 were changed accordingly.

      (6) Unify labelling of Figure 4 and 6 (i.e., Bacteria vs. M. pneumoniae, etc.).

      Corrected. Sample labels are now consistent across figures. (Figures 4 and 6)

      Reviewer #3 (Recommendations for the authors):

      Some recommendations related to the points mentioned in the 'Weaknesses' section in the public review:

      (1) If feasible, it would be useful to see a comparison with other existing methods that estimate local deformations (e.g., MotionCor2), at least on some of the datasets. For example, does the proposed method lead to better 2DTM SNR in the detected particles compared to other methods, or higher detection numbers? Alternatively, if such a comparison would require too much additional work and the authors have good reasons to believe that the results are evident, it would be helpful to include a discussion about why the proposed method is expected to perform better, both in terms of the general approach and specific implementation details.

      We agree that this comparison is important and have added it (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #3 Public Review (1).

      (2) It would be useful to define the 2DTM SNR in the main text of the paper, as well as to address the point about false positives in the picked particles.

      We added an explicit definition of 2DTM SNR and clarified the detection thresholding/false-positive control used in our analysis (pp. 11, lines 277–287). Addressed above in Response to Reviewer #3 Public Review (2.1 and 2.2).

      (3) Regarding the results shown in Figures 4 and 6: do the authors have any insight about how the CTF defocus affects the deformation estimation and correction across the different sample types?

      We now report the defocus ranges used for each dataset (Table 1). This point is addressed in our Response to Reviewer #3 Public Review (3).

      (4) Will the Jupyter notebooks used for the 2DTM analysis be made publicly available?

      Yes. We have deposited a Python script used for the 2DTM benchmarking and figure generation in a public repository and added the link in the Data Availability statement (p. 23, line 622; p. 25, line 650). Addressed above in Response to Reviewer #3 Public Review (2.3).

      (5) I would also appreciate a few words about the implementation details of the 3D spline model (e.g., what libraries have been used, if any, or if the authors have implemented their own code for this).

      The 3D spline model and warping code were implemented by us (no external spline library was used) and the relevant implementation details are described in the “Sample distortion modeling and correction” section (pp. 7–10, lines 174–246). For optimization, we used the L-BFGS implementation provided by the dlib library, which is now explicitly cited (pp. 10, line 264).

      Some comments regarding the presentation of the work:

      (1) I found the mathematical background on splines on pages 7-9 a little distracting from the main ideas of the paper, and I believe it could be moved to the methods section. A short description of this in the main text of the paper would suffice, and it would be useful to state clearly when this is background material and when it is the authors' contribution.

      We appreciate the suggestion. Because Unbend includes an in-house spline implementation (no external spline library) that is the central part of this work, we retained the spline description to support reproducibility (pp. 7–10, lines 174–246).

      (2) More generally, I found the whole method very interesting, but understanding exactly what all the steps involved were was a bit cumbersome, as they are spread across different sections of the main text. I think it would be useful to have a dedicated section giving the exact steps taken in the algorithm, possibly pointing to the relevant section in the text for more details about each step. This could be, for example, in the form of an 'Algorithm' box or a flowchart.

      We added an Algorithm box as Figure 2 supplement summarizing the end-to-end workflow and pointing to the relevant sections for details (Figure 2—figure supplement 1 Algorithm, pp. 4, line 96–103, pp. 32 line 799). This is intended to make the sequence of steps easier to follow.

      (3) In Figure 3, panels (b) and (c), the difference between the two micrographs, before and after correction, is not very noticeable, particularly the Thon rings in the spectra. I don't know if this is due to the image quality in the paper or if a better example could be shown. For example, the differences are clear in some of the supplementary figures.

      Thank you for the suggestion. We revised the figure by adding annotations to show the recovered Thon rings. This figure shows a vertex motion and is intended not only to show improvement but also to illustrate complex, spatially varying deformation patterns that motivate the 3D spline model (p. 12, lines 304–308). The supplementary figures show the examples with the largest motion in each sample type, so the recovery of Thon rings at higher spatial frequencies in the motion-corrected micrographs is more obvious there. We also refer readers to the supplementary examples where the differences are more pronounced (p. 12, lines 310–312).

    1. eLife Assessment

      This is a valuable study that integrates behavioral and molecular approaches to identify neuromodulators influencing blood-feeding behavior in the disease vector Anopheles stephensi. Through gene expression analyses across blood-seeking life stages and RNA interference experiments, the authors present solid evidence that co-knockdown of the neuromodulators short Neuropeptide F and RYamide affects blood-seeking states in A. stephensi. However, evidence demonstrating that these neuropeptides are sufficient to promote host-seeking is lacking.

    2. Reviewer #2 (Public review):

      Summary:

      In this study, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour after imbibing an initial bloodmeal. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides (particularly in the brain, but not in the abdomen since knockdown outside the brain did not affect feeding behaviour) appear to promote blood-feeding while having no impact on sugar feeding. Interestingly, when either of these two neuropeptide gene transcripts were reduced independently by RNAi, the proportion of females acquiring a blood meal was not affected, whereas simultaneous knockdown of both sNPF and RYa led to a reduction in blood feeding behaviour but did not impact sugar feeding.

      Given that the expression of both neuropeptide genes was found mostly in non-overlapping brain neurons, this suggests that these two neuropeptides may elicit at least partially complementary actions promoting blood feeding in A. stephensi. Indeed, their putative receptors appear to be colocalized within several neurons within the brain, which could explain why knockdown of both sNPF and RYa transcripts was required to affect blood feeding behaviour (although the authors could not confirm whether either of these neuropeptides acts independently, as only partial knockdown was achieved in the brain). Finally, while sNPF was mapped to brain neurons and midgut enteroendocrine cells, the authors mapped RYa only in the brain while reporting expression in the abdomen by qPCR, but that was not localized to the midgut EECs (like sNPF). Therefore, the source of RYamide in the abdomen remains unknown in this mosquito species, but could involve the abdominal ganglia where this neuropeptide has been localized in Ae. aegypti.

      Strengths and/or weaknesses:

      Overall, the manuscript was effectively communicated. Previous concerns and requested clarifications have been addressed in the revised manuscript. While advanced cell-specific tools are lacking in this mosquito species, one weakness here is that peptides could have been applied ectopically in attempts to rescue the deficit in blood feeding behaviour following knockdown by RNAi. Further insight in this regard may be provided in future studies by this and other research groups.

      Reviewing editor comment:

      Inclusion of a schematic in Supplementary Figure S9B addresses the point raised by reviewer 1 in the previous round.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study not only characterizes the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that to some degree, changes in blood feeding status depend on behavioral modulation to host-cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host-cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host-cues while navigating in flight, but something much more exciting happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood feeding stages of the mosquito's life cycle to identify a list of 9 candidates which have a role in regulating the host-seeking status of A. stephensi. Then through investigations of gene knockdown of candidates they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up rich lines of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article I continued to think how many crucial details I may have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first-principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

      We thank the reviewer for the suggestion and have now incorporated a schematic in the supplementary figure S9B, explaining our methodology for achieving tissue-specific knockdowns.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the authors' labelling of the figure stats in panel 4D.

      We agree with the reviewer and thank them for pointing it out. We have now revised the figure legend and the text to reflect these results (see lines 351-354).

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!

      We are truly thankful to this reviewer for insisting on this point. It has made us revisit what we thought we understood and now realise we were doing wrong (though many in the literature do it this way!). We were, incorrectly, setting each control to 1 and calculating relative fold changes for each replicate independently. While this is often seen in the literature, we now realise that it is incorrect. We have revisited all our analyses and normalized all samples to the mean ΔCt of the control group, which captures biological variation in both control and experimental groups. All data are now re-plotted to show individual data points for both control and experimental groups, and the error bars on controls represent the biological variation across replicates (Figure 4D, 4F, 4G, S8, S9). Statistical analyses were also revised accordingly, and, importantly, they do not change any conclusions. Please note that the abdominal expression of sNPF and RYa is so low that the controls show very variable baseline expression values.
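
      The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names and example Ct values are hypothetical, and the base of 2 assumes roughly 100% amplification efficiency. The point it shows is that subtracting the mean control ΔCt, rather than each replicate's own control, leaves the control replicates scattered around 1 instead of fixing every one of them at exactly 1.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """ΔCt for one sample: target-gene Ct minus reference-gene Ct."""
    return ct_target - ct_reference


def relative_expression(sample_dcts, control_dcts):
    """Relative expression of each sample versus the control group.

    ΔΔCt = sample ΔCt - mean(control ΔCt); expression = 2 ** (-ΔΔCt).
    Assumption: amplification efficiency is close to 100%.
    """
    baseline = mean(control_dcts)
    return [2 ** -(dct - baseline) for dct in sample_dcts]


if __name__ == "__main__":
    # Made-up ΔCt values for three control and two knockdown replicates.
    control = [5.0, 5.5, 4.5]
    knockdown = [6.0, 7.0]
    # Controls are passed through the same function, so their spread is kept.
    print(relative_expression(control, control))
    print(relative_expression(knockdown, control))
```

      Passing the control replicates through the same function is what gives the control group non-zero error bars in the re-plotted figures.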

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (2) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (3) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (4) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

      These comments were addressed in the previous round.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Awesome paper everyone. A delight to read and review.

      Thank you very much! We appreciated your comments too!

    1. eLife Assessment

      This study presents valuable findings and employs modern analytical approaches on how transient absence of visual input (darkness) affects tactile encoding in the rat somatosensory cortex (S1). The evidence supporting the authors' claims is solid, as population-level neural activity recorded in S1 and decoded by a CNN carries more discriminable texture information in darkness. The underlying basis of this effect remains only partly resolved, however, because it is still unclear which neural features from the CNN drive the decoding and if visual interference is appropriately accounted for, which might confound true neural representational change.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate how short-term visual deprivation influences tactile processing in the primary somatosensory cortex (S1) of sighted rats. They justify the study based on previous studies that have shown that long-term blindness can enhance tactile perception, and aim to investigate the change in neural representations underlying rapid, short-term cross-modal effects. The authors recorded local field potentials from S1 as rats encountered different tactile textures (smooth and rough sandpaper) under light and dark conditions. They used deep learning techniques to decode the neural signals and assess how tactile representations changed across the four different conditions. Their goal was to uncover whether the absence of visual cues leads to a rapid reorganization of tactile encoding in the brain.

      Strengths:

      The study effectively integrates high-density local field potential (LFP) recordings with convolutional neural network (CNN) analysis. This combination allows for decoding high-dimensional population-level signals, revealing changes in neural representations that traditional analyses (e.g., amplitude measures) failed to detect. The custom treadmill paradigm permits independent manipulation of visual and tactile inputs under stable locomotion conditions. Gait analysis confirms that motor behavior was consistent across conditions, strengthening the conclusion that neural changes are due to sensory input rather than movement artifacts.

      Weaknesses:

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization). The authors have noted this as a limitation and have clarified that the observed changes reflect functional reorganization rather than structural plasticity.

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might play a role in the observed neural differences. The authors have controlled for various factors in relation to locomotion, but future studies would benefit from more direct behavioural readouts of arousal states (e.g., via pupillometry or cortical state indicators).

      (3) It should be noted that the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized-only that population-level signals become more discriminable. The authors have adequately discussed this as an avenue for more mechanistic future research.

      (4) The authors have adequately discussed that, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      (5) The authors have also discussed that, while the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Future studies that include a behavioral readout (e.g., discrimination accuracy) would be insightful.

      (6) The authors' discussion about the implications for sensory rehabilitation, including Braille training and haptic feedback enhancement was a bit premature, but they have amended this, and it remains an interesting translational potential to be explored in future studies.

      (7) While the CNN showed good performance, more transparent models (e.g., linear classifiers or dimensionality reduction) appear to not exceed chance level. The implications of this are that there is an underlying complex structure in the LFPs that has yet to be fully uncovered, on the mechanistic level. This would be important to push the findings forward in future studies.

      Therefore, while the authors raise interesting hypotheses around rapid plasticity, somatotopic dynamics, and rehabilitation, the evidence for each is indirect. Stronger claims will require future causal experiments, behavioral readouts, and mechanistic specificity beyond what the current data provides. However, the work represents an interesting starting point to a more mechanistic understanding in the future.

    3. Reviewer #2 (Public review):

      Summary:

      Yamashiro et al. investigated how transient absence of visual input (i.e. darkness) impacts tactile neural encoding in the rat primary somatosensory cortex (S1). They recorded local field potentials (LFPs) using a 32-channel array implanted in forelimb and hindlimb primary somatosensory cortex while rats walked on smooth or rough textures under illuminated and dark conditions. Employing a convolutional neural network (CNN), they successfully decoded both texture and lighting conditions from the LFPs. The authors conclude that the subtle differences in LFP patterns underlie the tactile representation of surface roughness and become more distinct in darkness, suggesting a rapid cross-modal reorganization of the neural code for this sensory feature.

      Strengths:

      • The manuscript addresses a valuable question regarding how sensory cortices dynamically adapt to changes in sensory context.
      • The use of machine learning (CNNs) enables the analysis to go beyond conventional amplitude-based metrics, potentially uncovering subtle but meaningful effects.
      • The authors have substantially improved the manuscript with clearer figures, additional statistical analyses (including permutation tests and cross-validation), and greater methodological transparency.

      Weaknesses:

      • The new analyses (grand-average LFPs, correlation maps, wavelet decompositions, attribution-score correlations) improve transparency but do not yet clarify which specific neural features the CNN exploits, leaving the central interpretability question unresolved.
      • A plausible alternative explanation for the increased discriminability in darkness remains insufficiently ruled out: visually driven activity in the light condition (e.g., ambient illumination changes or self-motion-induced visual input) could contaminate S1 LFPs and account for the effect without reflecting a true neural representational change.
      • Behavioural and order controls have been improved but remain somewhat limited in sample size.

      Overall assessment:

      The revised manuscript is clearer, more transparent, and technically strengthened. However, the true nature of the signal changes underlying the observed differences in discriminability remains unclear, limiting the scientific strength of the conclusions. The possibility that visual interference contributes to the observed effects remains a plausible and untested alternative interpretation. Additional experiments or analyses quantifying visually evoked activity in S1 would be required to confirm the claim of genuine reorganization of neural representation depending on the illumination condition.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization).

      Thank you for this insightful comment. We acknowledge that our claim of “rapid cross-modal plasticity” is based on correlational evidence and does not directly address synaptic or circuit-level reorganization, which would require more invasive methods. Our study instead focuses on changes in the representational structure of tactile stimuli when visual input is temporarily removed, highlighting the adaptability of sensory coding to environmental context. We agree that this distinction is important and have revised the manuscript to clarify that the observed changes reflect functional reorganization rather than structural plasticity, as indicated by the enhanced separability of texture representations in S1 during darkness.

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might contribute to the observed neural differences. These factors are acknowledged but not directly measured (e.g., via pupillometry or cortical state indicators).

      Thank you for your insightful comment. We agree that arousal and exploratory behavior could influence neural differences and have considered these factors in our study. While gait was controlled, we did not directly measure arousal (e.g., via pupillometry or cortical indicators).

      To partially address this, we reviewed locomotor-speed traces (Supplementary Figure 1), which showed no significant differences between light and dark conditions, suggesting movement speed did not drive the neural differences. We also reversed the order of light and dark conditions, and although the separability of textures was not significantly different, it further supports that motivation did not confound our results.

      However, we acknowledge that arousal may still affect cortical dynamics, especially in the dark condition, where the lack of visual input might alter exploratory behavior. Due to technical limitations, we could not directly measure arousal states, and this is now discussed in the revised manuscript. While we cannot rule out the influence of arousal, the enhanced separability of texture representations suggests that sensory reorganization due to visual deprivation likely played a substantial role.

      (3) Moreover, the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized - only that population-level signals become more discriminable. As such, the term "plasticity" may overstate the conclusions and should be interpreted with caution unless validated by additional causal or longitudinal data.

      Thank you for your important comment. We agree that the term "plasticity" may overstate our conclusions, as our study focuses on population-level signal changes rather than direct evidence of circuit-level reorganization.

      To address this, we have revised the manuscript to clarify that while the observed changes in neural separability suggest functional reorganization of sensory representations, they do not confirm structural plasticity. We have updated the wording throughout the manuscript to emphasize that these findings reflect functional reorganization in response to short-term visual input loss, rather than structural or long-term plasticity.

      We also updated the discussion to highlight the need for future research with more invasive approaches to validate the causal mechanisms behind these rapid changes in neural dynamics.

      (4) The study highlights the forelimb region of S1 and a post-contact temporal window as particularly important for decoding texture, based on occlusion and integrated gradient analyses. However, this finding may be somewhat circular: The LFPs were aligned to forelimb contact, and the floor textures were sensed primarily via the forelimbs, making it unsurprising that forelimb electrodes were most informative. The observed temporal window corresponds directly to the event-aligned epoch, and while it may shift slightly in duration in the dark, this could reflect general differences in sensory gain or arousal, rather than changes in stimulus-specific encoding. Thus, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      Thank you for your insightful comment. We understand your concern that the finding of forelimb electrodes being most informative might seem circular, given that the LFPs were aligned to forelimb contact, and the floor textures were primarily sensed by the forelimbs. This design choice was intentional, as the task focused on texture perception through the forelimb, and the forelimb subregion of S1 is naturally expected to play a dominant role in this process. While this somatotopic specificity may make the results predictable, our aim was to emphasize the changes in temporal dynamics of neural processing under visual deprivation.

      We observed a shift in the temporal window's duration in the dark condition, which we interpret as a change in how texture information is processed without visual input. While this could reflect sensory gain or arousal differences, the lack of significant differences in locomotor speed or other behavioral measures (Supplementary Figure 1) suggests that these changes are more likely due to functional reorganization of sensory processing.

      We have clarified in the discussion that the shift in the temporal window is consistent with previous research on sensory reorganization involving both spatial and temporal cortical adjustments. While we do not claim novel spatial or temporal organization, we emphasize that the shift in temporal dynamics suggests adaptation in encoding strategy for texture perception in the absence of visual input. Future studies measuring arousal states (e.g., pupil diameter or cortical state markers) would help distinguish the contributions of arousal versus sensory reorganization to these dynamics.

      (5) While the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Without a behavioral readout (e.g., discrimination accuracy), claims about perceptual enhancement remain speculative.

      Thank you for raising this important point. We agree that while the neural data suggest enhanced separability of tactile representations in the dark condition, we do not directly assess whether these changes translate into improved tactile perception behaviorally.

      However, the primary aim of our study is not to claim perceptual enhancement, but to demonstrate that neural representations in the somatosensory cortex can rapidly reorganize in response to visual deprivation. To clarify this distinction, we have revised the manuscript to emphasize that the observed neural changes in S1 are consistent with functional reorganization of tactile representations, rather than a direct indication of perceptual improvement.

      Future studies will be crucial to directly test whether the enhanced separability of tactile representations in S1 correlates with improved tactile perception in a behavioral task. We have highlighted this as an avenue for future research to better understand the link between neural changes and perceptual outcomes.

      (6) In addition to point 4, the authors discuss implications for sensory rehabilitation, including Braille training and haptic feedback enhancement. However, the lack of actual chronic or even more acute pathological sensory deprivation, behavioral data, or subsequent intervention in this study limits the ability to draw translational conclusions. It remains unknown whether the more distinct neural representations observed actually translate into better tactile performance, discriminability, or perception. Additionally, extrapolating from rats walking on sandpaper in the dark to human rehabilitative contexts is speculative without a clearer behavioral or mechanistic bridge. The potential is certainly there, but the claim is currently aspirational rather than empirically grounded.

      Thank you for raising this important point. Upon careful consideration, we have decided to remove the discussion of sensory rehabilitation implications from the revised manuscript. We have refocused the manuscript to concentrate solely on the neural findings related to tactile encoding reorganization in response to short-term sensory deprivation, avoiding speculative extrapolation to human rehabilitative contexts. This revised approach ensures that the manuscript emphasizes the empirical findings without overstating the translational potential.

      (7) While the CNN showed good performance, details on generalization robustness and validation (e.g., cross-validation folds, variance across animals) are not deeply discussed. Also, while explainability tools were used, interpretability of CNNs remains limited, and more transparent models (e.g., linear classifiers or dimensionality reduction) could offer complementary insights.

      We appreciate the reviewer’s valuable feedback. In response to the concern about generalization robustness and validation, we have now conducted 5-fold cross-validation to assess the model's performance within animals (Figure 6C). We also have added supplementary information on the average silhouette scores across the different folds and animals (Supplementary Table 1, 2). These details are provided in the methods section and discussed in the results to offer a clearer picture of the model's robustness and consistency across rats.

      Regarding the interpretability of CNNs, we acknowledge that deep learning models can lack transparency. We also attempted classification using more transparent models such as PCA and SVM, but their performance did not exceed chance level (Supplementary Figure 2). This indicates that while these simpler models are more interpretable, they cannot capture the complex representations in the LFPs, making deep learning models like CNNs necessary for extracting these insights.
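      For readers unfamiliar with the validation scheme mentioned above, the following is a minimal sketch of a within-animal 5-fold split, assuming trials are simply indexed 0..n-1 and shuffled before splitting. The function names `k_fold_indices` and `summarize`, the seed, and the way scores are summarized are illustrative assumptions, not the authors' actual pipeline, which the response does not describe in detail.

```python
import random
import statistics


def k_fold_indices(n_trials, k=5, seed=0):
    """Yield (train, test) index lists so that each trial is held out exactly once.

    Hypothetical helper: the shuffling and fold assignment are assumptions,
    not taken from the manuscript.
    """
    idx = list(range(n_trials))
    random.Random(seed).shuffle(idx)           # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]      # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


def summarize(fold_scores):
    """Return mean and sample standard deviation of per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


# Small demonstration with 12 hypothetical trials from one animal.
for train, test in k_fold_indices(12, k=5, seed=0):
    print(len(train), "training trials,", len(test), "held-out trials")
```

      In such a scheme, a decoder would be fitted on the training trials of each fold and a score (for example, a silhouette score on the held-out embeddings) computed per fold, then passed to `summarize` to report variability across folds within each animal.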

      Reviewer #2 (Public review):

      (1) Despite applying explainability techniques to the CNN-based decoder, the study does not clearly demonstrate the precise "subtle, high-dimensional patterns" exploited by the CNN for surface roughness decoding, limiting the physiological interpretability of the results. Additional analyses (e.g., detailed waveform morphology analysis on grand averages, time-frequency decompositions, or further use of explainability methods) are necessary to clarify the exact nature of the discriminative activity features enabling the CNN to decode surface roughness and how these change with the sensory context (i.e., in light or darkness).

      Thank you for your insightful comment. We recognize the importance of clarifying the exact nature of the high-dimensional neural patterns that the CNN exploits for surface roughness decoding. In response, we have performed additional analyses to provide a more detailed explanation of the CNN's decision-making process and the discriminative features it learned:

      Grand-Average LFP Waveforms Analysis: We calculated the grand-average LFP waveforms for each texture × lighting condition (Figure 4A). While visual inspection did not reveal distinct features in the averaged waveforms, we explored the channel-wise correlations between textures under both light and dark conditions (Figure 4B). We found that the correlation between textures was lower in the dark condition, suggesting that LFPs become more distinct between textures when visual input is absent, which aligns with the CNN’s output.

      Time-Frequency Decomposition (Wavelet Analysis): We also performed time-frequency decomposition of the LFPs using wavelet transforms (Figure 4D). No prominent differences emerged across texture × lighting conditions in the spectral domain. However, upon computing differences in wavelet features between light and dark conditions and analyzing the relationship with the CNN's attribution scores (Supplementary Figures 5A-C), we observed a negative correlation in the 50-60 Hz range and a positive correlation in the 80-90 Hz range. This suggests frequency-specific modulation in LFP activity that may contribute to texture representations, providing further support for the CNN’s learned features.
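      The channel-wise comparison of grand-average waveforms described above can be illustrated with a minimal sketch. It assumes trials are stored as nested lists indexed `[trial][channel][sample]` and uses a plain Pearson correlation; the function names and data layout are hypothetical and are not taken from the manuscript.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def grand_average(trials):
    """Average over trials; input [trial][channel][sample], output [channel][sample]."""
    n = len(trials)
    n_ch, n_t = len(trials[0]), len(trials[0][0])
    return [[sum(tr[c][t] for tr in trials) / n for t in range(n_t)]
            for c in range(n_ch)]


def texture_correlation(smooth_trials, rough_trials):
    """Per-channel correlation between smooth and rough grand-average waveforms."""
    smooth, rough = grand_average(smooth_trials), grand_average(rough_trials)
    return [pearson(s, r) for s, r in zip(smooth, rough)]


def light_minus_dark(r_light, r_dark):
    """Positive values mean the textures are less correlated in the dark."""
    return [l - d for l, d in zip(r_light, r_dark)]


# Toy single-channel example (made-up numbers, for illustration only).
light = texture_correlation([[[0, 1, 2, 3]]], [[[0, 2, 4, 6]]])
dark = texture_correlation([[[0, 1, 2, 3]]], [[[3, 1, 2, 0]]])
print(light_minus_dark(light, dark))
```

      A positive light-minus-dark difference on a channel corresponds to the pattern reported in the response, namely a lower between-texture correlation in darkness.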

      (2) The claim regarding cross-modal representation reorganization heavily relies on a silhouette analysis (Figure 5C), which shows a modest effect size and borderline statistical significance (p≈0.05 with n=9+2). More rigorous statistical quantification, such as permutation tests and reporting underlying cluster distances for all animals, would strengthen confidence in this finding.

      Thank you for your thoughtful comment. We appreciate your suggestion to strengthen the statistical rigor of our analysis regarding the cross-modal representation reorganization. In response, we have implemented several additional analyses to more rigorously quantify the separability of neural representations between light and dark conditions:

      (1) Permutation Test for Cluster Separability: We performed a permutation test to assess whether the observed differences in cluster separability between light and dark conditions were statistically significant or could have arisen by chance. The results showed that the silhouette scores for the dark condition consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 4). This permutation test strengthens the validity of our findings, indicating that the enhanced separability in darkness is a systematic reorganization of neural representations, not due to random fluctuations.

      (2) Reporting Cluster Distances: To address concerns about the modest effect size and borderline significance, we have explicitly reported the underlying cluster distances in the form of silhouette scores for each individual animal (Supplementary Table 1, 2). These values reflect the Euclidean distance between clusters within each rat, providing a clearer understanding of the separability observed.

      (3) Additional Statistical Analysis on Silhouette Scores: To further enhance the rigor of our statistical analysis, we recalculated the silhouette scores using 5-fold cross-validation within each animal, ensuring that our results are robust across multiple data splits (Figure 6C).

      By incorporating these additional analyses and reporting detailed cluster distances, we believe we have significantly strengthened the confidence in our claim of cross-modal reorganization.
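      The logic of a label-permutation test on silhouette scores can be sketched as follows. This is a minimal illustration, assuming Euclidean distances between embedding vectors, a one-sided test, and texture labels shuffled across trials; the function names, the number of permutations, and the toy data are assumptions and do not reproduce the authors' implementation.

```python
import math
import random


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[labels[i]]               # cohesion: own-cluster mean distance
        if a is None:                          # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        b = min(v for c, v in mean_dist.items()
                if c != labels[i] and v is not None)   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def permutation_test(points, labels, n_perm=1000, seed=0):
    """Return observed score, null scores, and a one-sided p-value."""
    rng = random.Random(seed)
    observed = silhouette(points, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # break the label-embedding link
        null.append(silhouette(points, shuffled))
    # The +1 correction keeps the p-value strictly above zero.
    p = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, null, p


def percentile_95(values):
    """Nearest-rank 95th percentile of the null distribution."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


# Toy embeddings: two well-separated groups standing in for two textures.
group = [(x, y) for x in (0, 1, 2, 3) for y in (0, 1)]
points = group + [(x + 20, y + 20) for x, y in group]
labels = ["smooth"] * 8 + ["rough"] * 8
observed, null, p = permutation_test(points, labels, n_perm=200, seed=0)
print(observed > percentile_95(null), p)
```

      Comparing the observed score with the 95th percentile of the shuffled-label scores mirrors the criterion stated in point (1) above.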

      (3) While the authors recorded in the somatosensory cortex, primarily known for its tactile responsivity, I would be cautious not to rule out a priori the presence of cross-modal (visual) responses in the area. In this case, the stronger texture separation in darkness might be explained by the absence of some visually-evoked potentials (VEPs) rather than genuine cross-modal reorganization. Clarification is needed to rule out visual interference and this would strengthen the claim.

      Thank you for raising this important point. In response to your concern, we carefully examined whether visually-evoked potentials (VEPs) could be present in the S1 recordings, particularly under the light condition. However, we observed that this experiment did not involve any cue-guided visual stimulation, such as flashing lights or visual cues aligned with the LFP recordings. Without such external visual stimuli, it is unlikely that VEPs would be reliably evoked in the S1. Therefore, we believe the stronger texture separation observed in the dark condition is not due to visual interference, but rather reflects a genuine sensory reorganization in response to the absence of visual input.

      (4) Behavioural controls are limited to gross gait parameters; more detailed analyses of locomotor behavior and additional metrics (e.g., pupil size or locomotor variance) would robustly rule out potential arousal or motor confounds.

      Thank you for your insightful comment regarding behavioral controls. In response, we have added locomotor speed traces aligned with corresponding LFPs (Supplementary Figure 1) to demonstrate that locomotion remained consistent across trials, irrespective of environmental condition (light vs. dark). Additionally, we report locomotor speed variance over 10-minute blocks to confirm no significant motor changes affecting neural recordings. These analyses indicate that LFP differences are unlikely due to locomotor confounds.

      While measuring pupil size could be useful for assessing arousal, the camera resolution in our study was insufficient for reliable measurements. We have noted this limitation in the Discussion and recommend that future studies with high-resolution eye-tracking explore arousal's role in sensory processing in S1.

      (5) The consistent ordering of trials (10 minutes of light then 10 minutes of dark) could introduce confounds such as fatigue or satiation (and also related arousal state), which should be controlled by analyzing sessions with reversed condition ordering.

      Thank you for highlighting the potential confounds due to trial ordering. To address this, we reversed the condition order (dark before light) in a subset of sessions from six rats and reanalyzed the data (Supplementary Figure 3). The results showed a non-significant increase in separability in the dark condition, suggesting that the enhanced separability in the dark condition is not due to trial order effects such as fatigue or satiation. While order effects may contribute to trial-to-trial variability, the consistent pattern of enhanced separability in the dark further supports the interpretation that visual deprivation directly influences the reorganization of tactile representations in S1.

      (6) The focus on forelimb-aligned LFP analyses raises the possibility that hindlimb-aligned data might yield different conclusions, suggesting alignment effects might bias the results.

      Thank you for your insightful comment on the potential bias of forelimb-aligned LFP analyses. We acknowledge that the choice of alignment event can influence the results and appreciate the suggestion to consider hindlimb-aligned data. However, our experimental design specifically focused on forelimb S1. The forelimb region of S1 was oversampled in our array, and as expected, we observed larger responses there, consistent with the known somatotopic organization of S1.

      While hindlimb-aligned data could provide additional insights, it is not directly relevant to the primary question of how forelimb S1 codes tactile information under visual deprivation. We do not believe the forelimb alignment introduces a bias, as it aligns with the sensory task being investigated. However, we recognize the value of exploring alternative alignments and have now included a discussion in the Methods section regarding the rationale for our design choices.

      (7) The authors' dismissal of amplitude-based metrics as ineffective is inadequately substantiated. A clearer demonstration (e.g., event-related waveforms averaged by conditions, presented both spatially and temporally) would support this claim.

      Thank you for your constructive comment. In response, we have added a more detailed analysis of event-related waveforms, averaged across conditions (light vs. dark, smooth vs. rough textures), and presented them spatially and temporally aligned to forelimb contact (Figure 4A). These waveforms did not show clear, distinct features that could differentiate conditions, which highlights the limitations of traditional amplitude-based metrics in detecting subtle neural activity changes related to visual deprivation.

      We further performed channel-wise correlation analyses (Figure 4B), revealing stronger texture correlations in the light condition, indicating that averaged waveforms do not capture the nuanced differences in neural dynamics. Additionally, time-frequency spectrograms and channel–channel correlation matrices (Figures 4C and 4D) did not show distinct condition differences, reinforcing the limitations of amplitude-based metrics.

      These findings, along with the superior performance of machine learning-based decoding methods (e.g., CNN), support our claim that amplitude-based approaches are insufficient for fully capturing the complexity of the neural data.

      (8) Wording ambiguity regarding "attribution score" versus "activation amplitude" (Figure 5) complicates the interpretation of key findings. This distinction must be clarified for proper assessment of the results.

      Thank you for pointing out the ambiguity between "attribution score" and "activation amplitude." To address this, we have revised the manuscript to use "attribution score" only.

      (9) Generalization across animals remains unaddressed. The current within-subject decoding setup limits conclusions regarding shared neural representations across individuals. Adopting cross-validation strategies and exploring between-animal analyses would add significant value to the manuscript.

      Thank you for highlighting the importance of generalization across animals. While our study focused on within-subject decoding, we acknowledge that this limits conclusions about shared neural representations across individuals. We expect that inter-animal generalization would be challenging, as models trained on data from a single rat may not perform well on data from others due to differences in electrode placement, brain anatomy, and neural representations. We recognize the value of cross-validation strategies and between-animal analyses and will consider them in future work to address this limitation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would strongly recommend that the authors refine their introduction to be more concise. Many concepts and study aims are repeated many times and, therefore, present as highly redundant text. The introduction may be half the length and still contain the important concepts to set up the justification for the study. I would also suggest refining to be less about sensory deprivation (e.g., with blindness) and more in relation to context, as the acute nature of the study allows one to conclude more about the latter than the former.

      Thank you for your feedback on the introduction. We have revised the section to reduce redundancy and present the key concepts more concisely. We also streamlined the study aims and focused more on the context of the acute nature of the study, as you suggested, rather than emphasizing sensory deprivation. This revision better aligns with the main focus of the research and improves clarity. We believe the updated introduction provides a more direct justification for the study.

      (2) I am not sure if Figures 1-3 are meant to be in grey-scale for some reason (perhaps to represent light and dark), but I would encourage the authors to examine if this is necessary, as the use of color generally helps one more easily follow Figures.

      Thank you for this suggestion. Upon review, we agree that the use of color would enhance the clarity and readability of our figures. We have revised the figures including the newly added supplementary figures to incorporate color.

      (3) Figure 5, Figure legend title - check wording.

      Thank you for pointing this out. The title has been adjusted for consistency with the other figure legends.

      Reviewer #2 (Recommendations for the authors):

      (1) Analyses that would strengthen the main claims (major):

      (a) Identify the features exploited by the CNN.

      (i) Provide grand-average LFP waveforms for each texture × lighting condition (fore- and hind-limb channels shown separately, spatially arranged as in Figure 3C) and try to relate them to the decoding strategy learned by the CNN.

      Thank you for your helpful suggestion. We have calculated the grand-average LFP waveforms for each texture × lighting condition and included them in Figure 4A, with fore- and hind-limb channels shown separately and spatially arranged as in Figure 3C. Upon visual inspection, the mean waveforms did not reveal clear, distinct features. To further investigate, we computed the channel-wise correlation between different textures under both dark and light conditions. By subtracting the correlation coefficients for the dark environment from those in the light, we observed that the correlation between textures was lower in the dark environment (Figure 4B). This suggests that LFPs are more distinct between textures in the dark, supporting the CNN model's output. However, this also indicates that the CNN has captured more complex, nuanced information, as it is able to discriminate between LFPs on a single-trial basis, rather than relying on mean traces.

      To assess how the correlation between average LFP waveforms varied across channels, we also calculated the channel-channel correlation matrix for all 32 channels in each condition. While we found stronger correlations within each S1 subregion, we did not observe clear differences in the correlation matrices between light and dark conditions or between textures (Figure 4C).
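
      The light-minus-dark correlation comparison described above can be sketched as follows. This is only an illustration on synthetic single-channel waveforms, not the authors' analysis code; the function names (`pearson`, `mean_waveform`, `texture_correlation`, `toy_trials`) and the toy signals are assumptions introduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_waveform(trials):
    """Average a list of single-trial waveforms sample by sample."""
    return [sum(samples) / len(trials) for samples in zip(*trials)]

def texture_correlation(trials_a, trials_b):
    """Correlation between the mean waveforms of two textures on one channel."""
    return pearson(mean_waveform(trials_a), mean_waveform(trials_b))

# Synthetic data (hypothetical): three full cycles sampled at 50 points.
n_samples = 50
sine = [math.sin(2 * math.pi * 3 * i / n_samples) for i in range(n_samples)]
cosine = [math.cos(2 * math.pi * 3 * i / n_samples) for i in range(n_samples)]

def toy_trials(mix):
    """Two identical toy trials: the sine plus `mix` times the cosine."""
    trial = [s + mix * c for s, c in zip(sine, cosine)]
    return [trial, list(trial)]

texture_a = toy_trials(0.0)
# "Light": the second texture barely differs from the first.
r_light = texture_correlation(texture_a, toy_trials(0.1))
# "Dark": the second texture differs strongly from the first.
r_dark = texture_correlation(texture_a, toy_trials(1.0))

# A positive light-minus-dark difference means textures are less
# correlated (more distinct) in the dark condition.
difference = r_light - r_dark
print(r_light, r_dark, difference)
```

      In a real recording, the same comparison would be repeated for each channel and each texture pair, yielding a map of differences like the one the response attributes to Figure 4B.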

      (ii) Add channel-wise and time-frequency maps (e.g., wavelet or spectrograms) for each texture × lighting condition and try to relate them to the decoding strategy learned by the CNN.

      Thank you for the valuable suggestion. We calculated wavelet features for each LFP segment and averaged them across trials to assess differences in LFP between light and dark conditions, as well as across textures (Figure 4D). However, no distinct differences were observed in the spectral map. To investigate further, we computed the differences in spectral maps for LFPs in light and dark trials. We then calculated the difference in attribution scores derived from the integrated gradient map (Supplementary Figure 4A). Subsequently, we calculated the correlation coefficients between the differences in integrated gradients and the differences in power across each frequency band in the spectral map (Supplementary Figures 4B and 4C). A negative correlation was found in the 50-60 Hz range, while a positive correlation was observed in the 80-90 Hz range. These findings suggest that frequency-specific patterns of LFP activity in different conditions may be linked to the texture representations captured by the CNN model. We have included a discussion of these findings in [lines 463-468].

      (b) Quantify the "enhanced separability in darkness" more rigorously.

      (i) Report cluster-distances (e.g. Euclidean) for each individual animal.

      We thank the reviewer for this helpful comment. When calculating the silhouette score, we used Euclidean distance as the distance metric. The silhouette score is defined for each data point as the average distance to points in the nearest other cluster minus the average distance to points within its assigned cluster, normalized by the larger of the two values. Thus, the silhouette score inherently reflects the relative cluster distances both within and across conditions for each individual animal. Because we report and statistically analyze silhouette scores (Figure 6C), these values already quantify and compare the Euclidean cluster distances across conditions at the animal level. For clarity, we have now added a definition of the silhouette score in the Methods section of the main text [lines 269-278]. We also included the calculated silhouette scores in Supplementary Table 1.
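
      To make the definition concrete, here is a minimal sketch of the standard silhouette score with Euclidean distance. It is not the authors' implementation; the function names and the toy points are assumptions chosen for illustration. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the mean distance to the nearest other cluster, and the score is `(b - a) / max(a, b)`.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy embeddings (hypothetical): two well-separated and two interleaved clusters.
tight = silhouette_score([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
loose = silhouette_score([(0, 0), (1, 0), (0.5, 0), (1.5, 0)], [0, 0, 1, 1])
print(tight, loose)
```

      Scores near 1 indicate compact, well-separated clusters; scores near 0 or below indicate overlapping clusters.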

      (ii) Run a permutation or bootstrap test (shuffling darkness/light labels within animals) to obtain an empirical null distribution for cluster separability in the network embedding space.

      We thank the reviewer for this important suggestion. In response, we implemented a permutation test to assess the robustness of our cluster separability results. Specifically, we shuffled the darkness/light labels within each animal and recalculated silhouette scores across 1000 resamples to generate an empirical null distribution. The observed separability between light and dark conditions consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 3). This confirms that the enhanced cluster separability in darkness was not attributable to random fluctuations in labeling but instead reflected a systematic reorganization of neural representations.
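
      A label-shuffling test of this kind can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: the test statistic (silhouette of texture clusters in dark minus that in light), the choice to shuffle condition labels within each texture, the synthetic two-dimensional embeddings, and all function names are assumptions made here.

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette(points, labels):
    """Mean silhouette (Euclidean); returns 0.0 with fewer than two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster contributes a score of 0
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def separability_gap(embeddings, textures, conditions):
    """Silhouette of texture clusters in 'dark' minus that in 'light'."""
    def score(cond):
        idx = [i for i, c in enumerate(conditions) if c == cond]
        return silhouette([embeddings[i] for i in idx], [textures[i] for i in idx])
    return score("dark") - score("light")

def permutation_test(embeddings, textures, conditions, n_resamples=1000, seed=0):
    """Shuffle dark/light labels within each texture to build a null distribution."""
    rng = random.Random(seed)
    observed = separability_gap(embeddings, textures, conditions)
    null = []
    for _ in range(n_resamples):
        shuffled = list(conditions)
        for tex in sorted(set(textures)):
            idx = [i for i, t in enumerate(textures) if t == tex]
            perm = [shuffled[i] for i in idx]
            rng.shuffle(perm)
            for i, cond in zip(idx, perm):
                shuffled[i] = cond
        null.append(separability_gap(embeddings, textures, shuffled))
    null.sort()
    # Approximate 95th percentile of the empirical null distribution.
    threshold = null[min(n_resamples - 1, (95 * n_resamples) // 100)]
    p_value = (1 + sum(v >= observed for v in null)) / (n_resamples + 1)
    return observed, threshold, p_value

def make_trials(seed=1):
    """Synthetic embeddings: textures far apart in dark, overlapping in light."""
    rng = random.Random(seed)
    emb, tex, cond = [], [], []
    for condition, offset, noise in (("dark", 10.0, 0.5), ("light", 1.0, 1.0)):
        for texture, centre in (("A", 0.0), ("B", offset)):
            for _ in range(10):
                emb.append((rng.gauss(centre, noise), rng.gauss(0.0, noise)))
                tex.append(texture)
                cond.append(condition)
    return emb, tex, cond

observed, threshold, p_value = permutation_test(*make_trials())
print(observed, threshold, p_value)
```

      With real data, the test would be run separately for each animal, as the response describes, and the observed gap compared with that animal's own null distribution.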

      (c) Control for possible visually-evoked potentials (VEPs).

      (i) Search the LFPs recorded in light for stereotyped VEP components and/or comment on this possible confound (i.e., VEPs in S1?).

      Thank you for raising this point. Although it would be interesting to observe if a VEP is present in the S1 of rats, this experiment did not involve cue-guided visual stimulation. Additionally, there was no environmental visual cue that could serve as an external trigger to align the LFPs for VEP analysis in S1. Furthermore, since even the somatosensory evoked potential was not clearly visible in the S1 LFP without averaging the aligned LFPs, it is unlikely that we would be able to observe VEPs in single trials.

      (d) Address behavioral and arousal confounds.

      (i) Provide example locomotor-speed traces (aligned with corresponding LFPs) and report locomotor-speed variance across the 10-min blocks.

      Thank you for your comment. A speedometer was installed for the recordings of the last two rats. We have now provided example speed traces and the speed variance across blocks in Supplementary Figure 1. The traces show that locomotor speed was stable in each trial.

      (ii) If available from the camera recordings, include pupil diameter as a proxy for arousal; otherwise, discuss explicitly how arousal changes might affect S1 LFPs.

      Thank you for this suggestion. We strongly agree that measuring pupil diameters should be incorporated into future studies. However, because our camera did not have sufficient resolution to capture pupil diameters, we have addressed this limitation in the discussion section [lines 525-537].

      (e) Address order effects (and motivation/satiety confounds)

      (i) Present at least a subset of sessions in which the dark block precedes the light block; re-analyze the silhouette score/discriminability with block order as a factor.

      Thank you for this helpful suggestion. We conducted additional analyses using sessions from 6 rats in which the dark block preceded the light block (Supplementary Figure 5A). Using the same model architecture, we calculated the silhouette score for each rat (Supplementary Figure 5B). When the order was reversed (dark preceding light), the discriminability effect disappeared: although we observed a trend toward higher scores in the dark condition, the difference in texture discriminability between conditions was not statistically significant.

      If trial order alone accounted for the increase in discriminability, reversing the order would be expected to yield higher silhouette scores in the light condition. Our findings suggest that factors related to order (e.g., thirst or motivation, as you proposed) are not the sole contributors. Furthermore, previous studies in human participants have shown that brief blindfolding can produce lingering increases in tactile sensitivity, indicating a lasting effect of visual deprivation. Thus, the absence of significant differences in texture representation when the dark condition preceded the light condition may reflect such lasting effects. We have included a discussion in [lines 441-452].

      (ii) Discuss explicitly the potential confounding effect of motivational state/thirst.

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we now explicitly address the potential confounding role of motivational state and thirst in shaping our results. Because animals were water-restricted to maintain task engagement, it is possible that increasing thirst or fluctuating motivation over the course of a session could alter arousal or attentional state, thereby influencing neural separability. However, when the trial order was reversed (dark condition preceding light), silhouette scores did not show a significant increase in the second (light) trial. Thus, while we acknowledge that motivational state may contribute to trial-to-trial variability, the systematic increase in separability during darkness cannot be fully explained by thirst or motivational confounds. This addition has been incorporated into the discussion section [lines 441-452].

      (f) Alignment control and the role of forelimb S1.

      (i) Repeat the decoding analysis with LFPs aligned to hind-limb strike; report whether the fore-limb dominance persists.

      Thank you for your thoughtful suggestion. We appreciate the opportunity to clarify. Our study was designed to ask a different question: how the absence of visual input reorganizes tactile encoding for the body part that actually initiates texture contact in our paradigm (the forepaw). Accordingly, all analyses were aligned to forelimb strike and our array intentionally oversampled S1-forelimb relative to S1-hindlimb (18 vs. 14 electrodes; Fig. 1F–G), yielding clear topographic forelimb-locked event-related responses (Fig. 3B–D) and forelimb-channel dominance in the decoding explainability analyses (Fig. 5D–E). Repeating the full decoding locked to hind-limb strike would test a different hypothesis and would be difficult to interpret for three reasons:

      Design/measurement alignment. Our kinematic detection was built to identify forelimb foot strikes. Extending the detector to hindlimb would require new model training/validation and introduces uncertainty in the exact contact timing relative to the LFP segments we analyze.

      Sampling asymmetry. The array and cortical magnification are not balanced across subregions (18 forelimb vs. 14 hindlimb electrodes; Fig. 1G), so a hind-limb–aligned comparison would be confounded by unequal coverage and signal-to-noise across S1 subdivisions rather than reflecting true “dominance.”

      Scope of the claim. We do not claim that the forelimb is globally more informative about texture; we show the intuitive and topographically specific result that “forelimb S1 codes textures touching the forelimb,” and that these representations become more separable in darkness (silhouette increase; Fig. 5C). A hind-limb–locked re-analysis would likely reveal hindlimb contributions when the hindpaw is the alignment event — but that would not change the central conclusion about darkness enhancing tactile representational separability.

      To address the underlying concern about generality without introducing the above confounds, we have clarified these design choices and limitations in the revised Methods [lines 194-197].

      (g) Amplitude-based baseline.

      (i) Show that a simple linear discriminant or logistic-regression model on peak amplitudes (and/or other simple features like trough width/slope) cannot reach the CNN's accuracy. This kind of "baseline" analysis could also be useful to pinpoint the discriminative features learned by the CNN.

      Thank you for your insightful suggestion. We agree that performing a baseline comparison with a simpler model could help highlight the advantage of using a CNN. However, in our dataset, individual LFP traces do not exhibit clear peaks or well-defined features such as peak amplitude, width, or energy, which makes feature extraction using traditional methods like linear discriminants or logistic regression challenging.

      To address this, we performed principal component analysis (PCA) on the raw LFP traces to reduce the dimensionality and applied a support vector machine (SVM) classifier to the reduced features, in line with the approach used for the CNN models (Supplementary Figure 2A). The results of this analysis demonstrate that the SVM model struggles to discriminate between conditions, further reinforcing the necessity of the CNN model. The CNN’s ability to automatically learn complex features from the raw LFP data appears to be a crucial factor in achieving superior classification performance (Supplementary Figure 2B).

      (h) Cross-validation and inter-animal generalization.

      (i) Consider replacing the single 80/20 split with k-fold cross-validation within animals.

      Thank you for this suggestion. Instead of using an 80/20 split, we performed 5-fold cross-validation on all rats. The silhouette scores were averaged within each animal across the five folds, and Figure 6C was updated accordingly. After performing a paired t-test, we still observed a significant difference in silhouette scores between the light and dark conditions.
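
      The within-animal procedure (five folds, scores averaged across folds) can be sketched as follows. This is a generic illustration rather than the authors' pipeline: the contiguous, unshuffled fold assignment, the function names, and the placeholder `score_fn` are assumptions; in practice trials would typically be shuffled or stratified by texture before splitting.

```python
def k_fold_indices(n_trials, k=5):
    """Split range(n_trials) into k contiguous test folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs.
    """
    sizes = [n_trials // k + (1 if i < n_trials % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_trials) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds

def cross_validated_mean(n_trials, score_fn, k=5):
    """Average a per-fold score (e.g. a silhouette score) across the k folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_trials, k)]
    return sum(scores) / len(scores)

# Example with 23 hypothetical trials from one animal.
folds = k_fold_indices(23, k=5)
for train, test in folds:
    print(len(train), len(test))

# Placeholder score function: here it simply returns the test-fold size.
mean_test_size = cross_validated_mean(10, lambda train, test: len(test), k=5)
print(mean_test_size)
```

      Each animal then contributes one fold-averaged score per condition, which is the quantity a paired test across animals would compare.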

      (ii) Comment on inter-animal generalization.

      Thank you for this valuable feedback. Although we did not explicitly test inter-animal generalization, it is unlikely that a model trained on data from one rat would perform equally well when classifying data recorded from another animal. This limitation arises from two main factors. First, despite careful efforts to implant electrodes in the same brain region and cortical layer across experiments, it is impossible to align all 32 electrodes to identical coordinates. Consequently, the recorded LFPs are obtained from slightly different locations, which may reflect distinct neural processing. Second, even within the same species, individual animals differ in brain size and neural circuit organization. Thus, even if electrodes could be placed at identical anatomical locations, inter-individual variability in brain structure would still lead to differences in the recorded signals. Because deep learning models are often sensitive to small perturbations in their input data, we believe that robust inter-animal generalization is unlikely without fine-tuning the model using data from the target animal. This comment has been inserted in the Discussion [lines 494-507].

      (2) Writing, figure and terminology improvements (minor):

      (a) Figure 5F-G axis label. Decide on either "attribution score" or "activation amplitude" and use that term consistently in panels, legend, and text (currently, I believe it could be confused with raw signal amplitude).

      We have unified the terminology to "attribution score" and applied this consistently across the panels, legend, and text.

      (b) Throughout the manuscript, use "population-level activity" or "average population dynamics" when discussing LFPs (I believe it is more correct to reserve "population code" for multiple single-unit datasets).

      We agree with the reviewer’s point and have adapted the term "population dynamics" to describe LFP information consistently throughout the manuscript.

      (c) Lines 219-221, state down-sampling to 2 kHz, whereas line 289 mentions 10 kHz. Reconcile these numbers.

      We apologize for the confusion and thank the reviewer for thoroughly reading the manuscript. Our original sampling rate was 30 kHz, and all analyses were performed on data resampled to 10 kHz. The reference to 2 kHz was an error, and we have corrected it.

      (d) Specify the tail of each statistical test mentioned in the manuscript and any multiple-comparison correction used.

      We have specified the tail of each statistical test and any multiple-comparison corrections used in the "Data Analysis" section of the Methods.

      (e) Line 244: "variables (He et al., 2015)" → "variables (He et al., 2015)".

      We have corrected this formatting issue and revised it to "variables (He et al., 2015)".

      (f) Line 253: "one-dimentional" → "one-dimensional".

      We have corrected the spelling error and revised it to "one-dimensional".

      (3) Data and code sharing:

      (a) Consider depositing data and code for the analysis in public open repositories.

      Thank you for your suggestion. We have set up a public GitHub repository to share the code. Since the full dataset is quite large (~400GB), we have uploaded a smaller example dataset for the analysis.

    1. eLife Assessment

      The authors test the hypothesis that gonadal steroid signaling influences the transcriptional development of specific neurons in the mPOA during adolescence, and that such adolescent development of the mPOA is necessary for mating behaviors. The valuable findings are supported by convincing evidence. This work contributes new insight into hormone-sensitive transcriptional profiles within genetically defined neuron clusters in the mPOA during adolescence and will be of interest to systems and molecular neuroscientists and those interested in development, sex differences, and/or hormonal regulation.

    2. Reviewer #2 (Public review):

      Summary:

      An abundant literature documents molecular changes in the rodent hypothalamus that occur during the transition from prepubertal to mature reproductive physiology. Equally well documented is the role of sex steroids and their receptors during this important period of reproductive development, as well as the importance of GABAergic and glutamatergic neurons. The medial preoptic area (MPOA) is known to play a central role in expression of sexually dimorphic reproductive function and previously reported sexually dimorphic patterns of gene expression are consistent with this role. The present manuscript extends this knowledge base and reports the results of a detailed evaluation of transcriptional dynamics in the MPOA during the adolescent transition to maturity with a particular focus on the role of the estrogen receptor gene (Esr1). Both single cell RNA sequencing (scRNAseq) and multiplex in situ hybridization methods were employed and the results subjected to detailed computational analyses to demonstrate that the transcriptomic structure of MPOA neurons displays both sex and cell type specific expression profiles. In addition, both hormonal and genetic manipulations of Esr1 signaling during puberty altered the transcriptional profiles of MPOA neurons, and these changes aligned with maturation of hormone-dependent reproductive function. The authors provide this evidence to illustrate Esr1-dependent control of gene regulatory networks required for normal expression of reproductive behaviors expressed during the transition from adolescence to adulthood. The results presented in this manuscript are extensive and represent the most comprehensive evaluation of transcriptomic changes during reproductive maturation to date. The methods appear strong and the results provide a rich data set that will support a good deal of future analysis.

      Strengths:

      (1) The major strength of this manuscript is the extensive set of images and graphs that illustrate molecular changes that occur in MPOA neurons during adolescence, although additional spatial detail as to locations of the source neurons would be welcome in order to place the changes in the proper circuitry context.

      (2) Targeting Esr1 deletion to MPOA GABA neurons is a good choice, given how these cells have been implicated in sexual differentiation of reproductive behavior previously, and the lack of comparable responses in glutamatergic neurons is convincing. The AAV-frtFlex-Cre virus created by the investigators is a most useful tool for such studies. Profiling distinct transcriptomic trajectories in GABA and glutamatergic neurons during reproductive maturation is impressive and leads to some of the best supported conclusions in this paper.

      (3) Cellular and molecular resolution of the transcriptomics data appears excellent; however, because the source tissue for the scRNAseq analysis was obtained by bulk dissection of the MPOA, anatomical resolution is limited. This problem is addressed to some extent by careful comparison of scRNAseq results with previously published spatial transcriptomics data. The HM-HCR-FISH analysis clearly documents spatially restricted changes in gene expression, but it is hard to discern where these changes occur based on the images presented or the descriptions included in the Results. The anatomical schematic included in Figure 4 suggests that investigators are not familiar with components of the MPOA (see Allen Mouse Brain Atlas).

      Weaknesses:

      (1) A major conceptual flaw is that the authors do not distinguish between genetically determined sex differences in patterns of gene expression and differences caused by the fact that MPOA neurons are exposed to different endocrine environments in adolescent males and females, which can cause different transcriptional trajectories independent of genetic sex. This issue does not render their results invalid, but their terminology should address the issue in the discussion and "limitations" section. At the very least the endocrine status of "intact females" should be included.

      (2) A major technical flaw is that the MPOA is treated as a functionally distinct brain region (block dissections) with uniform distribution of cell types (FISH data are not illustrated or reported with sufficient spatial detail). Thus, an enormous amount of molecular data is provided that cannot be mapped to distinct neural circuits, thereby limiting the neurobiological impact. This is also a weakness of the FISH data, which is presented with only small regions illustrated without anatomical detail. In fact, some images are compared that appear to illustrate different MPOA structures, although it is impossible to be certain of this due to the lack of morphological landmarks. The analysis of how Esr1 orchestrates regulatory gene networks is impressive and interesting, but the fact that many of the observed transcriptional events occur in neural circuits that do not overlap confounds interpretation.

      (3) The locations of the AAV injections should be characterized because deleting Esr1 in multiple distinct parts of the MPOA will likely confound interpretation. This is especially problematic given the limited number of mice used for parts of the RNAscope analysis.

      (4) Although the focus of these experiments on adolescence is welcome, neither the Introduction nor the Discussion do a good job of placing these studies in the context of what is already known about brain maturation during puberty. It is true that this is very much a results-focused manuscript, but the scholarship can be improved. Simply stating that your results are consistent with previous reports places an undue burden on the reader to go figure out what is new.

      (5) Throughout the manuscript, the authors utilize obscure abbreviations, which often makes reading their text overly cumbersome. This is certainly justified in certain instances where complex names of analytical methods are used repeatedly, but the authors are encouraged to try and simplify their use of non-standard abbreviations.

      Comments on revisions:

      The authors have considered issues raised during the initial review. Although there do not appear to be significant changes to analyses, figures or conclusions, the authors have added important revisions listing limitations in study design and methodology that impact interpretation.

    3. Reviewer #3 (Public review):

      The paper identifies effects of gonadal hormones within hormone-responsive GABAergic neurons in the MPOA. Although it is not surprising that hormones have effects on neurons that express hormone receptors, the current paper adds insights with higher cellular and spatial resolution than previous work and focuses on the adolescent period. The paper also identifies a major role for Esr1-dependent mechanisms in behavior using an intersectional genetic strategy to ablate Esr1 in GABAergic or glutamatergic neurons in the MPOA.

      The authors have thoughtfully addressed the reviews, in particular by focusing quantitative analyses on Vgat+Esr1+ clusters and adding important technical and conceptual considerations in the limitations section.

      I have one remaining minor concern. I appreciate that the text now defines "transcriptional maturation". However, the term seems inappropriate when describing the "minimal transcriptional changes" in Vgat+hormone RLow clusters, which implies that they are transcriptionally immature. Do the authors mean to imply that transcriptional maturation is observed in Vgat+Esr1+ clusters but not Vgat+hormone RLow clusters? The authors also use the term "hormone-dependent transcriptional dynamics", which I think is more appropriate. For example, hormone-dependent transcriptional dynamics are observed in Vgat+Esr1+ clusters but not Vgat+hormone RLow clusters.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Weaknesses:

      Two minor comments

      (1) Fig 4 (hormone treatment): In this experiment, testosterone is given to males, yet in Sup Fig 6 it is argued that Esr1 is more influential in driving transcriptional changes compared to AR. Does DHT treatment have the same outcome as testosterone? Or, does estrogen treatment in males have the same outcome as testosterone?

      We agree that our inability to distinguish AR activation by testosterone from Esr1 activation by converted estrogen is a limitation of our study. We added a discussion to the “limitation of the study” section.

      Although HM-HCR experiments showed bidirectional control of transcriptional progression during adolescence, it is unclear whether the facilitation in males by testosterone supplementation occurs via activation of AR, Esr1, or both, because testosterone will likely be converted to estrogen in the brain. Future studies administering dihydrotestosterone (DHT) and estrogen to males may address this issue.

      (2) Fig 3i: There appears to be an age-dependent transcriptional change in male Vgat HR-low cells. Can the authors comment on age-dependent (hormone-independent) transcriptional changes in males versus females.

      We agree that it is important to distinguish hormone-dependent changes from age-dependent changes. We added pair-wise DE results for the Vgat HR-low population to the main text. Consistent with the trajectory analysis, there were fewer age-dependent genes than hormonally associated genes.

      “Pair-wise DEG analysis consistently showed a larger number of DEGs between P35 and P23 in Vgat+Esr1+ (male: 146 genes; female: 162 genes) than in Vgat+ hormone R<sup>Low</sup> (male: 26 genes; female: 1 gene).”

      Reviewer #2 (Public review):

      Weaknesses:

      (1) A major conceptual flaw is that the authors do not distinguish between genetically determined sex differences in patterns of gene expression and differences caused by the fact that MPOA neurons are exposed to different endocrine environments in adolescent males and females, which can cause different transcriptional trajectories independent of genetic sex. This issue does not render their results invalid, but their terminology should address the issue in the discussion and "limitations" section. At the very least the endocrine status of "intact females" should be included.

      We agree that it would be ideal to analyze perinatal and pubertal dynamics within the same study in order to distinguish these two processes. We added a discussion to the “limitation section”.

      “2. Although we have identified hormone/Esr1-dependent transcriptional trajectories during adolescence, their relationship and interplay with genetically determined perinatal events, which occur earlier and are robust, are unclear. Some sex differences during adolescence might be an extension of perinatally established sex differences, while others might be unique adolescent changes.”

      (2) A major technical flaw is that the MPOA is treated as a functionally distinct brain region (block dissections) with uniform distribution of cell types (FISH data are not illustrated or reported with sufficient spatial detail). Thus, an enormous amount of molecular data is provided that cannot be mapped to distinct neural circuits, thereby limiting the neurobiological impact. This is also a weakness of the FISH data, which is presented with only small regions illustrated without anatomical detail. In fact, some images are compared that appear to illustrate different MPOA structures, although it is impossible to be certain of this due to the lack of morphological landmarks. The analysis of how Esr1 orchestrates regulatory gene networks is impressive and interesting, but the fact that many of the observed transcriptional events occur in neural circuits that do not overlap confounds interpretation.

      We agree that while the MPOA is defined consistently across samples based on a brain atlas, its boundary is somewhat less obvious than that of other nuclei (e.g., hippocampus, VHM, etc.). To minimize contamination from adjacent areas, we restricted quantitative analysis mostly to the Vgat+ Esr1+ population, which is densely located within the MPOA but not in immediately adjacent areas, except the posterior BNST, which is readily distinguishable. We added a clarification to the Methods and a technical limitation to the Discussion, as shown below.

      Method

      “To disambiguate the MPOA from adjacent brain regions, quantitative analysis is restricted to Vgat+ Esr1+ neurons and excludes the posterior BNST.”

      Discussion

      “3. While we observed a robust effect of Esr1-KO in the scRNAseq experiment, which was further validated with the FISH experiment, it is possible that there are further heterogeneous Vgat-Esr1 populations in the MPOA that might be differentially targeted in each virally injected sample. To mitigate this, 3-4 mice were pooled for each sample in the scRNAseq experiment; in the HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included samples with robust Esr1 deletion in the MPOA. Interestingly, due to technical challenges, Esr1 deletion tends to be more robust than the weakly detected recombinase RNA expression (data not shown).”

      (3) The locations of the AAV injections should be characterized because deleting Esr1 in multiple distinct parts of the MPOA will likely confound interpretation. This is especially problematic given the limited number of mice used for parts of the RNAscope analysis.

      We agree that, like point #2, this is an important matter. For the HCR experiment, we only included animals with recombinase RNA (Cre or Flp) expression within the MPOA. Although the recombinase expression was sufficient to qualitatively determine a hit or miss, the detection was weak and it was challenging to determine the extent of viral spread. Thus, we also used successful Esr1 deletion as an additional inclusion criterion for the AAV-Cre-YFP group. We have added the inclusion criteria to the Methods and a technical consideration to the Discussion.

      Method

      “For HCR2, AAV was injected unilaterally so that successful targeting of the MPOA with AAV-Cre-YFP (detection of recombinase RNA within the MPOA) and the deletion of Esr1 could be confirmed for inclusion of samples.”

      Discussion

      “3. While we observed a robust effect of Esr1-KO in the scRNAseq experiment, which was further validated with the FISH experiment, it is possible that there are further heterogeneous Vgat-Esr1 populations in the MPOA that might be differentially targeted in each virally injected sample. To mitigate this, 3-4 mice were pooled for each sample in the scRNAseq experiment; in the HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included samples with robust Esr1 deletion in the MPOA. Interestingly, due to technical challenges, Esr1 deletion tends to be more robust than the weakly detected recombinase RNA expression (data not shown).”

      (4) Although the focus of these experiments on adolescence is welcome, neither the Introduction nor the Discussion do a good job of placing these studies in the context of what is already known about brain maturation during puberty. It is true that this is very much a results focused manuscript, but the scholarship can be improved. Simply stating that your results are consistent with previous reports places an undue burden on the reader to go figure out what is new.

      We agree that placing our study in the context of the existing literature will clarify its novelty and its impact for the community. We have updated the Introduction, adding a review that highlights puberty-associated genomic studies in the brain, all of which are bulk (brain-region level), as well as the first puberty scRNAseq study in the human testis.

      “Despite the well-established role of these hormones in shaping behavior, the molecular mechanisms underlying their influence on brain development during adolescence are still limited to brain-region level (bulk)[8] in humans and model organisms, and adolescent transcriptional dynamics at single-cell resolution in the brain remain poorly understood (but see a pioneering study in the human testis[9]).”

      (5) Throughout the manuscript the authors utilize obscure abbreviations, which often makes reading their text overly cumbersome. This is certainly justified in certain instances where complex names of analytical methods are used repeatedly, but the authors are encouraged to try and simplify their use of non-standard abbreviations.

      We agree that it is helpful for readers to have a reference list of abbreviations in a single location. We have added an “abbreviation” section for this purpose.

      Medial preoptic area (MPOA)

      Single-cell RNA sequencing (scRNAseq)

      Estrogen receptor 1 (Esr1)

      GABAergic neurons (Vgat+)

      Glutamatergic neurons (Vglut2+)

      Hybridization chain reaction fluorescent in situ hybridization (HCR-FISH)

      Gonadectomized (GDX)

      Partition-based graph abstraction (PAGA)

      Hormone-associated differentially expressed genes (HA-DEGs)

      Multiplexed error-robust fluorescence in situ hybridization (MERFISH)

      Differential gene expression (DE)

      Differentially expressed genes (DEGs)

      Support vector machine (SVM)

      Manifold Enhancement Latent Dimension (MELD)

      Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE)

      Androgen receptor (AR)

      Single-cell regulatory network inference (SCENIC)

      Reviewer #3 (Public review):

      We thank the reviewer for the constructive comments, which helped us improve our manuscript.

      Weaknesses:

      We already know that Esr1 is important within GABAergic but not glutamatergic neurons for mating behavior. However, there is not enough data to support the claim that disrupting Esr1 in glutamatergic MPOA neurons "had no observable effect." The MPOA is involved in many behaviors and physiologies that were not investigated. More assays would be required to report "no observable effect."

      The small number of cells included in the transcriptional studies is a general concern, as noted by the authors. This is a particular concern for conclusions related to the role of adolescence in glutamatergic MPOA neurons. The paper reports 24,627 neurons across all treatment groups, which include 3 time points, 2 sexes, and GDX conditions. It seems likely that not much was detected in the glutamatergic neurons because of insufficient power.

      Esr1 knockout is initiated in adolescence, not restricted to adolescence. Do we know that the effects on mating behavior are due to what is happening in adolescence vs. the function of Esr1 in adults? Are the effects different if Esr1 is knocked out in mature adults? This comparison would be important to demonstrate that adolescence is a critical time window for Esr1 function.

      We agree that (1) the relatively mild effects observed in glutamatergic neurons may be partially due to the scale of the study, and (2) Esr1 deletion is permanent once induced, so it is challenging to distinguish adolescent from adult transcriptional dynamics using existing viral strategies.

      We have added discussion of both points to the “limitation” section.

      “4. While we have observed robust transcriptional progression in Vgat<sup>+</sup> Esr1<sup>+</sup> neurons during adolescence, we observed milder alterations in VgluT2<sup>+</sup> neurons. Although the scale of our study is comparable to or exceeds that of prior scRNAseq studies in the MPOA[22,29], larger future studies may have more sensitivity to detect adolescent transcriptional dynamics in VgluT2<sup>+</sup> neurons.”

      “5. Although we demonstrated that adolescent transcriptional changes were observed as early as P35, and that either hormonal deprivation or Esr1 KO prior to adolescence prevented the transcriptional progression (an arrested transcriptional state even in adulthood), given the viral incubation time and the permanent deletion of Esr1 after viral injection, it is challenging to disambiguate the roles of Esr1 during adolescence and adulthood. Future studies injecting the virus in adulthood may provide additional insights into the similarities and differences between transcriptional changes during puberty and maintained transcriptional states in adulthood.”

    1. eLife Assessment

      Using the clownfish model, this study examines how growth, feeding, and agonistic behavior result in socially dominant or subordinate states in size- and age-matched individuals of the clownfish, Amphiprion percula. The authors complement this work with whole-body transcriptomics and find significant variation in genes and gene co-expression modules related to growth and satiety-related pathways, as well as ossification-related genes. They provide solid evidence that emerging dominants grow more, eat more, and behave more aggressively than subordinate or solitary individuals; these phenotypic differences are accompanied by distinct gene expression profiles, including variation in growth- and satiety-related pathways. The work is valuable in advancing our understanding of how the social environment regulates phenotypic change; however, claims regarding the mechanistic role of gene expression are only partially supported by the current analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting and well-written manuscript on a fascinating question in a "charismatic" model system.

      Strengths:

      (1) The Introduction is concise, though it might be helpful to the non-specialist reader to learn a bit more about what is known about the social control of somatic growth across diverse species (including humans), which would help to make this work more generally interesting.

      (2) The experiment is well-designed.

      (3) The data collected are comprehensive.

      (4) The complementary analysis of both feeding and aggression/submission data with and without known social roles is a neat idea and compelling!

      Weaknesses:

      (1) I was surprised that the HPA/stress axis was not considered here at all. Wouldn't we expect that subordinates have increased stress axis activation, which in turn could inhibit their growth and aggressive behavior?

      (2) To what extent are growth, food intake, agonistic behavior, and/or gene expression patterns coordinated across P1 vs P2 pairs? The lack of such an analysis seems like a missed opportunity.

      (3) What was the rationale for using whole bodies for the transcriptome analysis? Given the hypotheses, the forebrain or hypothalamus and certain other organ systems (e.g., liver, gonads, skin, etc.) would have been obvious candidate tissues here. I realize that cost is always a consideration, but maybe a focus on the fore-/midbrain could have been prioritized.

      (4) Given the preceding point, why was a fold-change threshold used for assessing DEGs (supplementary Figure 3)? There is no biological justification to ever use a fold-change threshold, especially in bulk RNA-seq analysis. This is particularly true here, where whole bodies were used for RNA-seq analysis, which is a bit unusual. Relatively small cell populations (such as hypothalamic neurons that regulate growth or food intake) may show substantial gene expression variation across social types, yet will be masked by the masses of other cells in the whole body sample. However, gene expression may still vary significantly, albeit the fold-difference may be small. I therefore suggest a reanalysis that omits any fold-change threshold.

      (5) Why is the analysis of color (hue, saturation) buried in the supplementary materials? Based on the hypotheses that motivated the study, color seems just as relevant as food intake, growth, and agonistic behavior, so even if the results are negative, they should be presented in the main paper.

      (6) The Discussion is sometimes difficult to follow. The authors may want to consider including a conceptual graphic that integrates the different aspects of growth and satiety regulation, etc., into a work-in-progress model of sorts, which would also facilitate clearer hypotheses for future research.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors test growth, behavior, and gene expression in pairs of clownfish as they establish social dominance hierarchies, examining patterns of gene expression in these pairs after dominance has been established. The authors show solid evidence that emerging dominant clownfish show increased growth, aggression, and food consumption compared to their submissive or solitary counterparts, eventually adopting distinct gene expression profiles.

      Major Comments:

      (1) The Introduction is comprehensive, but it could be condensed. Likewise, the discussion could be condensed. There is considerable redundancy between the methods, the results, and the legend in Figure 1. The authors should consolidate and remove the redundancy.

      (2) For Figure 3, the authors are showing PC2 and PC3; why is PC1 not shown? There is so much overlap between the three groups in PC2 vs PC3; it seems unlikely that researchers could conclusively identify any individual as belonging to a group based on the expression profile. The ovals shown do not capture all the points within each of the groups, and particularly the grey S oval seems misaligned with the datapoints shown.

      (3) The authors indicate that the 15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling. Does this mean that each of the P1 and P2 were pairs with each other? Have the authors tried examining the gene expression patterns in a paired manner? E.g., for the pairs that showed the greatest size differences, do they also show the greatest differences in gene expression? Do the P1s show the most extreme differences from P2s that also show the most extreme P2 differences? Perhaps lines on Figure 3A connecting datapoints from the P1 and P2 pairs would be informative.

      (4) For the specific target pathways that are up- and downregulated in the different backgrounds, I recommend that the authors include boxplots (or heatmaps) showing the actual expression values for these targets. Figure 6 shows a heatmap for appetite-related genes, and it would be great to see a similar graph for the metabolism and glycolysis genes; it would also be informative to see similar graphs for hormonal and sexual maturation pathways as well.

      (5) Particularly given that there is a relatively small number of genes enriched in the different rank conditions, I did not understand the need to do the WGCNA module analysis. I thought that an analysis of GO terms across the dataset would have been more meaningful than the GO term analysis shown in Figure 4, which considers only genes assigned to the "brown WGCNA module". This should be simplified or clarified.

      (6) The authors say that they have identified coordinated changes in behaviors and the "underlying gene expression, leading to the emergence" of social roles. This is a little bit misleading, since the gene expression analysis occurred well after the behavioral and phenotypic differences emerged. Presumably, the hormonal and genetic shifts that actually caused the behavioral and phenotypic difference occurred during the weeks during which the experiment was underway, and earlier capture of the transcriptome would presumably reveal different patterns, and ones that would be considered more causative. The authors acknowledge this in 434-435, but it could be emphasized further.

      (7) The authors have measured a number of differences between the different dominance classes of fish. All these differences were measured relative to the other classes, but in my view, the Solitary group was the closest to a baseline control. So I'm not sure that it is fair to say that "P2 and S individuals showed consistent downregulation of these genes and pathways" (line 401). I encourage the authors to emphasize the differences in gene expression from the "perspective" of the P1 individuals compared to the baseline of P2 and S individuals. Line 474 says that "P2 fish showed significant upregulation" of a number of pathways. It should be very clear what that is compared to (compared to P1, presumably?)

      (8) Along the same lines, the authors say in line 514 that subordinates and solitaries strategically downregulate their growth. I'm not convinced that this is the case: I would consider this growth trajectory to be the default and the baseline. I would interpret that under certain social conditions, a P1 dominant pattern of growth, behavior, and gene expression is allowed to emerge.

    4. Reviewer #3 (Public review):

      Summary:

      The authors tested the hypothesis that interactions among size- and age-matched rivals will lead to the emergence of social roles, accompanied by divergence in four aspects of individual phenotypes: growth, feeding behavior, fighting behaviors, and gene expression in clownfish.

      Strengths:

      The data on growth, feeding rate, and fighting behaviors support the authors' claims.

      Weaknesses:

      Gene analysis conducted in this study is not sufficient to clarify how the relevant genes actually regulate growth and behavior.

      The information obtained from whole-body gene expression analysis is very limited. Various gene expression is associated with the regulation of fighting behaviors, food intake, growth, and metabolism, and these genes are regulated differently across tissues, even within a single individual. Gene expression analysis should be performed separately for each tissue.

      Clownfish undergo sex change depending on social status and body size, as the authors mention in the manuscript. Numerous gene expressions are affected by sex change. It is unclear how this issue was addressed.

    5. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting and well-written manuscript on a fascinating question in a "charismatic" model system.

      Strengths:

      (1) The Introduction is concise, though it might be helpful to the non-specialist reader to learn a bit more about what is known about the social control of somatic growth across diverse species (including humans), which would help to make this work more generally interesting.

      (2) The experiment is well-designed.

      (3) The data collected are comprehensive.

      (4) The complementary analysis of both feeding and aggression/submission data with and without known social roles is a neat idea and compelling!

      Thank you for the positive feedback!

      Here, we investigate phenotypic plasticity associated with the adoption of social roles in the clown anemonefish, with strategic growth being just one aspect of that plasticity. Strategic growth, also known as social control of growth, is a fascinating form of adaptive phenotypic plasticity, whereby individuals modify their growth and size in response to fine-scale changes in social conditions (Buston & Clutton-Brock, 2022). In cooperative breeding systems with high reproductive skew, particularly fishes and mammals (possibly including humans), individuals have been shown to i) increase growth/size on the acquisition of dominant status (Dengler-Crish & Catania, 2007; Johnston et al., 2021; Thorley et al., 2018; Van Schaik & Van Hooff, 1996; Walker & McCormick, 2009), ii) increase growth/size when paired with size matched reproductive rivals (Huchard et al., 2016; Reed et al., 2019; this study), and iii) decrease growth/size to avoid conflict (Buston, 2003; Heg et al., 2004; Wong et al., 2007). While strategic growth is fascinating and clearly occurring in this study, we show coordinated changes of multiple aspects of the phenotype as fish adopt social roles. Therefore, we deliberately framed the Introduction broadly to avoid biasing the reader toward viewing growth as the sole or main driver.

      Weaknesses:

      (1) I was surprised that the HPA/stress axis was not considered here at all. Wouldn't we expect that subordinates have increased stress axis activation, which in turn could inhibit their growth and aggressive behavior?

      We also expected to see the HPA/stress axis activated in subordinates, which is why we carried out a targeted exploration of genes known to play a role in this axis. We did not find any genes that were significantly differentially expressed. We believe that there could be two explanations for this. First, from a methodological perspective, it could be due to our use of whole-body RNA-seq, which may have masked this signal. Alternatively, the stress axis might play a more complex role than just acting as a simple on/off switch for reduced growth. Its activation may peak when competition over size is at its highest (during week one) or, conversely, it may peak later and help maintain reduced growth once hierarchies are firmly established (particularly after the dominant individual reaches its maximum size). To understand the role of the stress axis, future studies should observe how its activation varies over time. We acknowledge that the absence of a stress-axis signal and its potential explanations were not clearly discussed in the original manuscript; we will address this in the revised version.

      (2) To what extent are growth, food intake, agonistic behavior, and/or gene expression patterns coordinated across P1 vs P2 pairs? The lack of such an analysis seems like a missed opportunity.

      We had a similar thought. Specifically, we were interested in testing the hypothesis that the final size ratio of pairs, which is indicative of the amount of conflict remaining, would predict gene expression. We examined gene expression within pairs to test for coordinated changes and repeated the analysis, accounting for the pair size ratio. In both cases, we found no clear or consistent pattern within pairs. We will consider including these figures in the Supplementary Materials document.

      (3) What was the rationale for using whole bodies for the transcriptome analysis? Given the hypotheses, the forebrain or hypothalamus and certain other organ systems (e.g., liver, gonads, skin, etc.) would have been obvious candidate tissues here. I realize that cost is always a consideration, but maybe a focus on the fore-/midbrain could have been prioritized.

      We decided to use whole-body samples for this initial transcriptomic analysis to capture a broad view of gene-expression differences while keeping sequencing costs and sample requirements manageable. We agree with the reviewer that future work should explore specific tissues sampled from individuals at multiple time points to disentangle transcriptomic differences across tissue types.

      (4) Given the preceding point, why was a fold-change threshold used for assessing DEGs (supplementary Figure 3)? There is no biological justification to ever use a fold-change threshold, especially in bulk RNA-seq analysis. This is particularly true here, where whole bodies were used for RNA-seq analysis, which is a bit unusual. Relatively small cell populations (such as hypothalamic neurons that regulate growth or food intake) may show substantial gene expression variation across social types, yet will be masked by the masses of other cells in the whole body sample. However, gene expression may still vary significantly, albeit the fold-difference may be small. I therefore suggest a reanalysis that omits any fold-change threshold.

      We thank the reviewer for this important point, and agree that an arbitrary fold-change cutoff is unnecessary here. It should be noted that this fold-change cutoff was only used in this single figure, and all other analyses used p-values from the entire dataset. We will remove the fold-change cutoff and correct Supplementary Figure 3 and any corresponding text.
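
      To make the reviewer's concern concrete, the following is a minimal sketch of how a fold-change cutoff can hide genes with small but statistically well-supported effects. The gene names, log2 fold changes, and p-values are invented for illustration, and the Benjamini-Hochberg helper is a generic stand-in, not the pipeline used in the manuscript.

```python
# Hypothetical per-gene results: (gene, log2 fold change, raw p-value).
# All names and numbers are invented for illustration.
results = [
    ("geneA", 2.10, 0.0001),
    ("geneB", 0.35, 0.0004),   # small effect, strong statistical support
    ("geneC", -0.20, 0.0010),  # small effect, strong statistical support
    ("geneD", 1.50, 0.2000),
    ("geneE", -0.05, 0.8000),
]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(rows, alpha=0.05, min_abs_lfc=0.0):
    """Return genes passing the adjusted p-value and |log2FC| filters."""
    padj = benjamini_hochberg([p for _, _, p in rows])
    return [gene for (gene, lfc, _), q in zip(rows, padj)
            if q < alpha and abs(lfc) >= min_abs_lfc]


with_cutoff = call_degs(results, min_abs_lfc=1.0)  # adjusted p and |log2FC| >= 1
without_cutoff = call_degs(results)                # adjusted p only
print(with_cutoff)
print(without_cutoff)
```

      In this toy example the two small-effect genes are retained only when the fold-change filter is dropped, which is the scenario the reviewer describes for small cell populations diluted in whole-body samples.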

      (5) Why is the analysis of color (hue, saturation) buried in the supplementary materials? Based on the hypotheses that motivated the study, color seems just as relevant as food intake, growth, and agonistic behavior, so even if the results are negative, they should be presented in the main paper.

      We agree that color can be an important social signal, so we included color measurements in our experimental design. However, after careful consideration of the color results, we decided that our experimental timing and husbandry changes introduced multiple confounding factors, preventing us from drawing confident conclusions. Specifically, our fish were ≈1 month old at the transfer from larval to experimental tanks and had already begun to deepen their orange hue, before our experiment. (In the wild, they would settle at two weeks of age, prior to the deepening of the orange hue). Once individuals attain a certain hue, it seems that color development can be halted, but not reversed. The transfer also involved changes in lighting, tank background, and diet, factors known to strongly affect coloration. Our results show a uniform shift in orange hue and saturation across social groups, suggesting that these confounding factors might have dominated changes in hue.

      For transparency, we report the color data in the Supplementary Materials, but we caution against drawing any strong conclusions. In the revised manuscript, we will recommend that future work include a targeted experiment to robustly test for the effect of the adoption of social roles on coloration or the effect of coloration on the adoption of social roles.

      (6) The Discussion is sometimes difficult to follow. The authors may want to consider including a conceptual graphic that integrates the different aspects of growth and satiety regulation, etc., into a work-in-progress model of sorts, which would also facilitate clearer hypotheses for future research.

      Thank you for flagging that parts of the Discussion are a bit difficult to follow. In the revised manuscript, we will work to improve readability of the Discussion. We also appreciate the suggestion of including a conceptual schematic. We will consider whether adding such a graphic will add value to this manuscript or future manuscripts.

      Reviewer #2 (Public review):

      In this manuscript, the authors test growth, behavior, and gene expression in pairs of clownfish as they establish social dominance hierarchies, examining patterns of gene expression in these pairs after dominance has been established. The authors show solid evidence that emerging dominant clownfish show increased growth, aggression, and food consumption compared to their submissive or solitary counterparts, eventually adopting distinct gene expression profiles.

      Major Comments:

      (1) The Introduction is comprehensive, but it could be condensed. Likewise, the discussion could be condensed. There is considerable redundancy between the methods, the results, and the legend in Figure 1. The authors should consolidate and remove the redundancy.

      Thank you for flagging that parts of the manuscript could be condensed. We will work on this as we revise the manuscript.

      (2) For Figure 3, the authors are showing PC2 and PC3; why is PC1 not shown? There is so much overlap between the three groups in PC2 vs PC3; it seems unlikely that researchers could conclusively identify any individual as belonging to a group based on the expression profile. The ovals shown do not capture all the points within each of the groups, and particularly the grey S oval seems misaligned with the datapoints shown.

      We understand the concern raised by the reviewer about the overlap among points in the PCA. We have explored PC1-PC3 and found that PC2 and PC3 showed the clearest, statistically significant clustering by social position, while PC1 did not capture any variation due to social position. We have also explored whether other factors, such as genetic relatedness, tank effects, and total read count per sample, might be masking differences, and found that none of these factors explained sample clustering. The ellipses shown around the points were not intended to capture all points; rather, they show the estimated 95% multivariate t-distribution for that given social group. We will make sure this is clearly explained in the figure legend and Methods section. In addition, in the revised version, we will show PC1 and PC2, and PC1 and PC3, in the Supplements for transparency.
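
      As a rough illustration of why a 95% ellipse is not expected to enclose every point, the sketch below draws simulated scores for one hypothetical group and counts how many fall inside a 95% ellipse. It is a simplification under stated assumptions: it uses a normal-theory ellipse (a chi-square threshold on squared Mahalanobis distance) rather than the multivariate t-distribution described above, and the data are simulated, not taken from the study.

```python
import math
import random

random.seed(1)

# Simulated scores on two principal components for one hypothetical group.
# Real PCA scores would come from the expression matrix; these are invented.
n = 2000
points = []
for _ in range(n):
    x = random.gauss(0.0, 2.0)
    y = 0.5 * x + random.gauss(0.0, 1.0)
    points.append((x, y))

# Sample mean and 2x2 sample covariance matrix.
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

# Inverse of the 2x2 covariance matrix.
det = sxx * syy - sxy ** 2
inv_xx, inv_yy, inv_xy = syy / det, sxx / det, -sxy / det

# For a bivariate normal, squared Mahalanobis distance follows a chi-square
# distribution with 2 degrees of freedom, whose quantile is -2 * ln(1 - level).
level = 0.95
threshold = -2.0 * math.log(1.0 - level)


def inside(p):
    """True if the point lies within the 95% normal-theory ellipse."""
    dx, dy = p[0] - mean_x, p[1] - mean_y
    d2 = inv_xx * dx * dx + 2.0 * inv_xy * dx * dy + inv_yy * dy * dy
    return d2 <= threshold


fraction_inside = sum(inside(p) for p in points) / n
print(round(threshold, 3), fraction_inside)
```

      By construction, roughly one point in twenty is expected to fall outside such an ellipse, so points lying beyond the ovals are not by themselves evidence of a plotting error.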

      (3) The authors indicate that the 15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling. Does this mean that each of the P1 and P2 were pairs with each other? Have the authors tried examining the gene expression patterns in a paired manner? E.g., for the pairs that showed the greatest size differences, do they also show the greatest differences in gene expression? Do the P1s show the most extreme differences from P2s that also show the most extreme P2 differences? Perhaps lines on Figure 3A connecting datapoints from the P1 and P2 pairs would be informative.

      Yes, “15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling” refers to pairs of P1 and P2; we will make sure this is clearly stated in the revised Methods. We have also explored the gene expression data considering the size difference between pairs, and found no clear differences in gene expression patterns (see earlier response to Reviewer #1). We will consider including these figures in the Supplementary Materials document, as well as adding a version of Figure 3A that clearly shows information on pairs, as suggested by the reviewer.

      (4) For the specific target pathways that are up- and downregulated in the different backgrounds, I recommend that the authors include boxplots (or heatmaps) showing the actual expression values for these targets. Figure 6 shows a heatmap for appetite-related genes, and it would be great to see a similar graph for the metabolism and glycolytic genes; it would also be informative to see similar graphs for hormonal and sexual maturation pathways as well.

      We have explored genes across a broad set of metabolic pathways (glycolysis, TCA cycle, lactic fermentation, PDH complex, cholesterol biosynthesis, fatty-acid synthesis, and beta-oxidation) and show all metabolic genes that showed significant differential expression between P1, P2, and S in Figure 6. Overall, very few metabolism-associated genes were significantly differentially expressed, which is why we decided to combine appetite-regulation and metabolism-associated genes into a single figure (Figure 6). In the revised version, we will ensure that Figure 6 clearly shows the gene sets associated with appetite and metabolism.

      We also examined hormonal pathways (glucocorticoid and thyroid signaling), but did not find genes in these pathways that were significantly differentially expressed. Finally, we would like to clarify that our samples consist of two-month-old juvenile individuals that are sexually immature, which is why we did not observe distinct molecular signatures of sexual maturation (under ideal conditions, clown anemonefish can mature in one to two years, but they can also remain sexually immature for a decade or more; Buston & García, 2007). We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that these points are clearly stated in the revised version of the manuscript.

      (5) Particularly given that there is a relatively small number of genes enriched in the different rank conditions, I did not understand the need to do the WGCNA module analysis. I thought that an analysis of GO terms across the dataset would have been more meaningful than the GO term analysis shown in Figure 4, which considers only genes assigned to the "brown WGCNA module". This should be simplified or clarified.

      To clarify, GO enrichment analysis does not establish correlations with traits; it only describes which functions or pathways are over-represented in a given gene set. That is why we began by using WGCNA to define gene sets (modules) that are correlated with phenotypes. Our primary rationale for WGCNA was to identify modules of co-expressed genes that show significant statistical correlation with the phenotypes of interest (social role: P1, P2, S; growth; and food intake). Pairwise differential expression analysis (Figure 3B) identified a few hundred significantly differentially expressed genes, but those tests treat genes independently and cannot help us link coordinated changes of co-expressed genes to phenotypes of interest. Because WGCNA is blind to traits, it first identifies groups of co-expressed genes, which can help resolve gene expression patterns.

      We therefore ran WGCNA on the rlog-transformed dataset to identify modules of co-expressed genes that show significant correlation with phenotypes of interest. For every module that showed such a correlation, we performed GO enrichment and carefully evaluated the resulting GO enrichment trees (see Supplementary Figs. 4–5). The brown module was highlighted in the main text because it was one of the modules with a significant correlation to growth, and its associated GO enrichment showed clear growth-related signals that were not identified in the pairwise differential expression analysis results.
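
      The module-to-trait step described above can be sketched as follows. This is a simplified illustration, not the WGCNA implementation: the expression values and growth measurements are invented, and the module is summarized by the per-sample mean of z-scored genes, whereas WGCNA uses the module eigengene (the first principal component of the module's expression).

```python
import math

# Hypothetical expression values (rows: genes in one module, columns: samples).
module_expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gene2": [2.0, 2.5, 3.5, 4.5, 5.0, 6.5],
    "gene3": [0.5, 1.5, 2.0, 3.5, 4.0, 5.5],
}
# Hypothetical trait measured on the same six samples (e.g. growth in mm).
growth = [0.8, 1.1, 1.9, 2.4, 3.1, 3.3]


def mean(values):
    return sum(values) / len(values)


def zscore(values):
    """Center and scale one gene's expression across samples."""
    mu = mean(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Simplified module summary: the per-sample mean of z-scored genes.
# WGCNA itself uses the module eigengene (first principal component).
scaled = [zscore(v) for v in module_expression.values()]
module_summary = [mean(col) for col in zip(*scaled)]

# Correlate the module summary with the trait of interest.
r = pearson(module_summary, growth)
print(round(r, 3))
```

      Only modules whose summary correlates with a trait would then be passed to GO enrichment, which is why the enrichment in Figure 4 is restricted to a single module rather than run across the whole dataset.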

      (6) The authors say that they have identified coordinated changes in behaviors and the "underlying gene expression, leading to the emergence" of social roles. This is a little bit misleading, since the gene expression analysis occurred well after the behavioral and phenotypic differences emerged. Presumably, the hormonal and genetic shifts that actually caused the behavioral and phenotypic difference occurred during the weeks during which the experiment was underway, and earlier capture of the transcriptome would presumably reveal different patterns, and ones that would be considered more causative. The authors acknowledge this in 434-435, but it could be emphasized further.

      We appreciate the reviewer raising this point. In the updated version of the manuscript, we will revise wording to convey that food intake, agonistic behavior, size and growth, and gene expression are all changing continuously, in response to each other and in response to social feedback. An underappreciated aspect of this system (and likely many other systems) is that phenotype (including transcriptome) influences the outcome of social interactions, and the outcome of social interactions influences the phenotype (including the transcriptome). Earlier capture of the transcriptome would reveal different levels of gene expression, reflecting the state of the system at that moment in time.

      (7) The authors have measured a number of differences between the different dominance classes of fish. All these differences were measured relative to the other classes, but in my view, the Solitary group was the closest to a baseline control. So I'm not sure that it is fair to say that "P2 and S individuals showed consistent downregulation of these genes and pathways" (line 401). I encourage the authors to emphasize the differences in gene expression from the "perspective" of the P1 individuals compared to the baseline of P2 and S individuals. Line 474 says that "P2 fish showed significant upregulation" of a number of pathways. It should be very clear what that is compared to (compared to P1, presumably?)

      We agree with the reviewer that solitary individuals are the most intuitive baseline. Indeed, the experimental design included solitary fish because we expected they would serve as a useful control. Without social restraint, we anticipated they would show unrestricted growth, feeding, behavior, and associated gene‑expression patterns, similar to dominants.

      We initially ran analyses using solitaries as the baseline, but after examining the results, which showed subordinate‑like characteristics for the solitary individuals, we concluded that solitary individuals are not an ecologically appropriate control for this context. Removing juveniles from a social context and housing them in isolation may be stressful and can affect physiology and behavior in ways that do not reflect a natural baseline. From a life‑history standpoint, solitary living is not the typical state for A. percula.

      For these reasons, we reanalysed the dataset using the dominant (P1) as the reference to enable more ecologically meaningful comparisons (this choice was somewhat arbitrary; subordinates could also have been used as the reference). Given that gene expression is relative, we interpret results from both the dominant (P1) and subordinate (P2) perspectives in the Discussion to provide a complete view. We will clarify wording throughout the manuscript to make it clear that everything is relative (e.g., revising Line 474).

      (8) Along the same lines, the authors say in line 514 that subordinates and solitaries strategically downregulate their growth. I'm not convinced that this is the case: I would consider this growth trajectory to be the default and the baseline. I would interpret that under certain social conditions, a P1 dominant pattern of growth, behavior, and gene expression is allowed to emerge.

      We respectfully disagree with the idea that a single baseline/reference growth trajectory exists for any individual of this species. Growth of individuals is entirely social context-dependent: neither fast nor slow growth represents an inherent baseline. When two size‑matched juveniles meet and compete to establish dominance, accelerated growth is the expected trajectory. By contrast, juveniles joining an existing hierarchy are expected to exhibit reduced growth, which minimizes conflict and facilitates their social integration. Unlike species whose growth trajectories are not socially mediated, clown anemonefish do not have a context‑independent growth rate; rather, individuals constantly readjust their growth according to their immediate social environment.

      Therefore, growth trajectories must be considered from the perspective of all group members, because they emerge from interactions among individuals rather than reflecting an intrinsic baseline. In this study, we were interested in the establishment of dominance hierarchy and how individuals adjust their phenotypes during this process. By experimentally pairing size‑matched rivals, both individuals are initially expected to pursue the dominant trajectory, and thus neither individual represents a default state. Instead, the outcome reflects a social decision, after which both individuals reinforce their emerging social roles through coordinated changes.

      Reviewer #3 (Public review):

      Summary:

      The authors tested the hypothesis that interactions among size- and age-matched rivals will lead to the emergence of social roles, accompanied by divergence in four aspects of individual phenotypes: growth, feeding behavior, fighting behaviors, and gene expression in clownfish.

      Strengths:

      The data on growth, feeding rate, and fighting behaviors support the authors' claims.

      Thank you for the positive feedback!

      Weaknesses:

      Gene analysis conducted in this study is not sufficient to clarify how the relevant genes actually regulate growth and behavior.

      The information obtained from whole-body gene expression analysis is very limited. Various gene expression is associated with the regulation of fighting behaviors, food intake, growth, and metabolism, and these genes are regulated differently across tissues, even within a single individual. Gene expression analysis should be performed separately for each tissue.

      We understand the reviewer’s concern about whole‑body transcriptomes and agree that tissue‑specific sampling would provide greater resolution of the mechanisms linking gene expression to growth, agonistic behaviors, and food intake. For this initial study, however, we deliberately chose whole‑body samples to capture a broad, unbiased view of gene expression differences while keeping sequencing costs and sample requirements manageable. We explicitly acknowledge the resulting interpretational limits in the Discussion (lines 464 and 529–533), and suggest in the last paragraph that the patterns reported here should serve as a foundation for future studies exploring targeted, tissue‑specific hypotheses.

      Clownfish undergo sex change depending on social status and body size, as the authors mention in the manuscript. Numerous gene expressions are affected by sex change. It is unclear how this issue was addressed.

      We thank the reviewer for raising this point. Sex change and sexual maturation can indeed drive major transcriptional shifts in clown anemonefish, but our experiment did not encompass such a life‑history transition. All individuals in this experiment were juveniles (≈1 month old at the start, ≈2 months old at the end) and were sexually immature at these ages. Clown anemonefish reach sexual maturation around one to two years under ideal conditions, can delay sexual maturation for years under normal conditions (Buston & García, 2007), and sex change in the genus Amphiprion is known to take over ~5 months (Moyer & Nakazono, 1978). Accordingly, individuals in this study were not sexually mature, and sex change was not biologically plausible over the five-week experimental period of our study. We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that it is clearly stated that the fish in this study were sexually immature in the revised version.

      References:

      Buston, P. (2003). Forcible eviction and prevention of recruitment in the clown anemonefish. Behavioral Ecology, 14(4), 576–582. https://doi.org/10.1093/beheco/arg036

      Buston, P. M., & García, M. B. (2007). An extraordinary life span estimate for the clown anemonefish Amphiprion percula. Journal of Fish Biology, 70(6), 1710–1719. https://doi.org/10.1111/j.1095-8649.2007.01445.x

      Buston, P., & Clutton-Brock, T. (2022). Strategic growth in social vertebrates. Trends in Ecology & Evolution, 37(8), 694–705. https://doi.org/10.1016/j.tree.2022.03.010

      Dengler-Crish, C. M., & Catania, K. C. (2007). Phenotypic plasticity in female naked mole-rats after removal from reproductive suppression. The Journal of Experimental Biology.

      Heg, D., Bender, N., & Hamilton, I. (2004). Strategic growth decisions in helper cichlids. Proceedings of the Royal Society of London. Series B: Biological Sciences, 271(suppl_6). https://doi.org/10.1098/rsbl.2004.0232

      Huchard, E., English, S., Bell, M. B. V., Thavarajah, N., & Clutton-Brock, T. (2016). Competitive growth in a cooperative mammal. Nature, 533(7604), 532–534. https://doi.org/10.1038/nature17986

      Johnston, R. A., Vullioud, P., Thorley, J., Kirveslahti, H., Shen, L., Mukherjee, S., Karner, C. M., Clutton-Brock, T., & Tung, J. (2021). Morphological and genomic shifts in mole-rat ‘queens’ increase fecundity but reduce skeletal integrity. eLife, 10, e65760. https://doi.org/10.7554/eLife.65760

      Moyer, J. T., & Nakazono, A. (1978). Protandrous Hermaphroditism in Six Species of the Anemonefish Genus Amphiprion in Japan (No. 2). The Ichthyological Society of Japan. https://doi.org/10.11369/jji1950.25.101

      Reed, C., Branconi, R., Majoris, J., Johnson, C., & Buston, P. (2019). Competitive growth in a social fish. Biology Letters, 15(2), 20180737. https://doi.org/10.1098/rsbl.2018.0737

      Thorley, J., Katlein, N., Goddard, K., Zöttl, M., & Clutton-Brock, T. (2018). Reproduction triggers adaptive increases in body size in female mole-rats. Proceedings of the Royal Society B: Biological Sciences, 285(1880), 20180897. https://doi.org/10.1098/rspb.2018.0897

      Van Schaik, C. P., & Van Hooff, J. A. R. A. M. (1996). Toward an understanding of the orangutan’s social system. In L. F. Marchant, T. Nishida, & W. C. McGrew (Eds.), Great Ape Societies (pp. 3–15). Cambridge University Press. https://doi.org/10.1017/CBO9780511752414.003

      Walker, S. P. W., & McCormick, M. I. (2009). Sexual selection explains sex-specific growth plasticity and positive allometry for sexual size dimorphism in a reef fish. Proceedings of the Royal Society B: Biological Sciences, 276(1671), 3335–3343. https://doi.org/10.1098/rspb.2009.0767

      Wong, M. Y. L., Buston, P. M., Munday, P. L., & Jones, G. P. (2007). The threat of punishment enforces peaceful cooperation and stabilizes queues in a coral-reef fish. Proceedings of the Royal Society B: Biological Sciences, 274(1613), 1093–1099. https://doi.org/10.1098/rspb.2006.0284

    1. eLife Assessment

      This important study highlights the role of MIRO1 in regulating mitochondrial oxidative phosphorylation in smooth muscle cells, a process that appears necessary to sustain their proliferation. Overall, the work provides solid evidence that mitochondrial positioning and function influence vascular disease, although several bioenergetic and mechanistic aspects would benefit from deeper investigation.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting supercomplex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Comments on revisions:

      The authors have adequately addressed all the concerns raised by the reviewers, and the manuscript has been substantially improved.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, some of the metabolic analyses are suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and, importantly, in human cells. The study provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small-molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Comments on revisions:

      The authors have addressed the concerns I previously raised.

    4. Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      - This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.
      - This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.
      - The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      - Some key bioenergetic aspects may require further investigation.

      Comments on revisions:

      The authors have adequately addressed most of the concerns I raised. I would suggest adding some of the justifications provided to the reviewers to the manuscript to further clarify and aid interpretation of the data, especially for the bioenergetic part (e.g., the proposed interaction with CI components, which might otherwise appear implausible to readers).

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting supercomplex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex.

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been evaluated to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, some of the metabolic analyses are suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and, importantly, in human cells. The study provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small-molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays are marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

      We thank the reviewers for their thoughtful and constructive feedback. We appreciate their recognition of our work’s value and the improvements made in this revised version.

      We are particularly grateful to Reviewer 3 for their detailed and insightful comments, which identified errors we (and other reviewers) had unfortunately overlooked. To address these concerns and ensure the manuscript meets the high standards of clarity and rigor we aim for, we have made additional corrections and refinements.

      As part of this process, we conducted a thorough review of the original source files. This was especially important given that the project spanned from 2018 to 2025, and many co-authors have since left their previous positions.

      We appreciate the opportunity to resubmit this manuscript and are confident that these updates fully address the concerns raised by the reviewer and the editorial team.

      Reviewer #3 (Recommendations for the authors):

      (1) I still do not see the data in WB 2G reflecting the quantification in 2H and 2I. Moreover, the authors state they performed 1 additional experiment, but it appears not to have been included in the analysis of 2H and 2I since the graphs remained the same from the last version of the manuscript.

      We apologize for this oversight. The additional experiment has now been incorporated into the analysis for Figures 2H and 2I, and the graphs have been updated accordingly. While we had uploaded the new blot, we inadvertently forgot to update the analysis graphs. Thank you for bringing this to our attention.

      (2) The authors talk several times about "supercomplexes 1 and 2" without testing their precise composition (there is a ton of literature about SC species in several mouse cell types, and separate BN-PAGE immunoblotting of individual MRC complexes would precisely define them in this context)

      We agree with the reviewer that this is an important point. However, structural differences between supercomplexes were outside the scope of this paper, and we did not perform such analyses. That said, examining the precise composition of supercomplexes could be a valuable direction for future work.

      (3) Steady-state levels of MRC subunits do not match the observations from BN-PAGE results. That might be potentially interpreted and explained by the possible accumulation of intermediates but this is not explored.

      We appreciate the reviewer’s observation. There is indeed a strong possibility that differences in the expression of structural components of mitochondrial complexes exist between WT and Miro1 -/- cells. However, in this study, we chose to focus on assessing potential differences in the enzymatic activities of the complexes rather than examining their structural composition. Exploring the accumulation of intermediates and structural differences could be an interesting avenue for future investigations.

      (4) Citrate synthase normalization of kinetic enzyme activities is claimed, yet it is not shown in any graph and no description of the method is provided.

      We sincerely thank the reviewer for pointing out this discrepancy. Upon careful review, we realized that our statement regarding citrate synthase normalization of kinetic enzyme activities in the last revised version was made in error. This was a miscommunication between co-authors, and we did not perform citrate synthase normalization. Instead, the normalization was performed against protein concentration, determined by the BCA assay as described in the manuscript. We regret this oversight and appreciate the opportunity to clarify this.
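To make the protein-based normalization concrete, the following is a minimal sketch of how a kinetic rate can be converted into a specific activity per milligram of protein. It assumes a spectrophotometric readout of NADH oxidation at 340 nm (extinction coefficient 6.22 mM^-1 cm^-1) and a protein concentration obtained from a BCA assay; the assay format, the function name, and all numbers in the usage line are illustrative assumptions, not values from the manuscript.

```python
NADH_EXT_COEFF_MM = 6.22  # mM^-1 cm^-1 at 340 nm


def specific_activity(delta_a340_per_min, protein_mg_per_ml,
                      sample_volume_ml, assay_volume_ml, path_length_cm=1.0):
    """Return nmol NADH oxidized per minute per mg protein."""
    # Beer-Lambert law: rate of concentration change in mM per minute.
    rate_mm_per_min = delta_a340_per_min / (NADH_EXT_COEFF_MM * path_length_cm)
    # mM x mL gives micromoles; multiply by 1000 to obtain nanomoles.
    nmol_per_min = rate_mm_per_min * assay_volume_ml * 1000.0
    # Protein loaded into the assay, from the BCA-derived concentration.
    protein_mg = protein_mg_per_ml * sample_volume_ml
    return nmol_per_min / protein_mg


# Hypothetical numbers: 10 µL of a 2 mg/mL sample in a 1 mL cuvette.
activity = specific_activity(0.0622, 2.0, 0.01, 1.0)
```

Unlike citrate synthase normalization, this expresses activity per unit of total protein and therefore does not correct for differences in mitochondrial content between samples.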

      (5) Complex I activity is still wrongfully described as NADPH oxidation in the methods

      We corrected this error.

      (6) The authors state 'Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV'. I do not understand this; I find this justification insufficient and not substantiated by any experimental evidence. What buffer has been used for isolation? There are hundreds of protocols for isolation of intact mitochondria and MRC complexes. Also, DDM and digitonin are the gold-standard detergents for MRC complex isolation and separation via BN-PAGE.

      We thank the reviewer for raising this important point. We have revised the response to clarify the exact experimental conditions and to provide supporting data.

      For BN-PAGE, mitochondrial fractions purified from cultured VSMCs or aortic tissue were prepared using a standard protocol (now explicitly detailed in the Methods). Briefly, mitochondria were resuspended in 6-aminocaproic acid (ACA) buffer containing 750 mM ACA, 50 mM Bis-Tris (pH 7.0), and protease inhibitors. Forty micrograms of mitochondrial protein were solubilized with 1.5% digitonin, using a final detergent-to-protein ratio of 8:1, and incubated on ice for 20 minutes prior to clarification by centrifugation at 16,000 g for 30 minutes at 4°C. Thus, consistent with established standards, digitonin—one of the gold-standard detergents for MRC complex solubilization and BN-PAGE—was used throughout.
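The solubilization arithmetic in this paragraph can be sketched as below. This is a hedged illustration that assumes the 8:1 detergent-to-protein ratio is weight-to-weight and that 1.5% refers to the weight/volume concentration of the digitonin solution being added; the source does not state either explicitly, and the function name is hypothetical.

```python
def digitonin_volume_ul(protein_ug, detergent_to_protein=8.0,
                        digitonin_percent_wv=1.5):
    """Volume (µL) of digitonin solution needed for a given protein mass."""
    # 1% (w/v) corresponds to 10 mg/mL, which equals 10 µg/µL.
    solution_ug_per_ul = digitonin_percent_wv * 10.0
    # Mass of detergent required by the detergent-to-protein ratio (w/w).
    detergent_ug = protein_ug * detergent_to_protein
    return detergent_ug / solution_ug_per_ul


# 40 µg mitochondrial protein at 8:1 requires 320 µg digitonin.
volume = digitonin_volume_ul(40.0)
```

Under these assumptions, 40 µg of protein calls for 320 µg of digitonin, or roughly 21 µL of a 1.5% solution.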

      Despite using these widely accepted conditions, we found that detection of fully assembled Complex IV by BN-PAGE was inconsistent, a limitation that has been reported by others and is known to be sensitive to mitochondrial source, tissue type, and solubilization efficiency. To address this directly and avoid over-interpretation, we assessed Complex IV integrity by examining core subunits. As shown in Figure 6—figure supplement 1 (panels B and C), expression levels of MTCO1 and MTCO2, both essential core components of Complex IV, do not differ significantly between WT and Miro1-/- cells, supporting the conclusion that Complex IV abundance is not altered.

      We have revised the manuscript to clarify these methodological details and to explicitly state that conclusions regarding Complex IV are based on subunit analysis rather than BN-PAGE visualization alone.

      (7) Complex V IGA also does not seem to reflect its quantification.

      Thank you for highlighting this concern. To address it, we will include the numerical data alongside the figures to ensure clarity and alignment with our findings. We hope this will provide a more comprehensive understanding and resolve any ambiguity.

      (8) Figure 6 supplement 1, the authors state 'we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants'. I do not understand: what background is being used? What mutants are being expressed? All the figures refer to Miro1 -/- which is, according to standard genetic nomenclature, a loss-of-function allele (KO).

      Thank you for your comment. To clarify, we first infected MIRO1fl/fl VSMCs with an adenovirus expressing the DNA recombinase Cre or a control adenovirus. Cells infected with the adenovirus expressing Cre are labeled as MIRO1-/- cells. In these MIRO1-/- cells, we then introduced MIRO1 wild type (WT) and MIRO1 mutants via adenoviral expression.

      The mutants include one lacking the transmembrane domain (MIRO1-ΔTM), and another in which the two EF hands of MIRO1 were point-mutated (MIRO1-KK). MIRO1-WT is denoted as Ad WT, the mutant MIRO1-KK as Ad KK, and MIRO1-ΔTM as Ad ΔTM in the figures. We hope this explanation clarifies the experimental background and nomenclature used.

      (9) Figure 6 supplement 1B, no normalization is provided (e.g. VDAC, TOM20 etc.). Interestingly, VDAC is then used to normalize the data in C-D-E-F-G. Also, why is MIRO1 detected in lane 4? Is the mutant stable or not? There is zero signal in A.

      Thank you very much for pointing out that the immunoblot for VDAC1 was missing in Figure 6—Supplement 1B. This figure has been reviewed several times, and unfortunately, this error was not detected. We sincerely apologize for this oversight. We have now revised the figure to include the immunoblot for VDAC1 to address this issue.

      Regarding the detection of MIRO1 in lane 4, we confirm that the "mutant" is not stable. To generate MIRO1 knockout cells, aortic smooth muscle cells from MIRO1fl/fl mice were isolated and cultured, followed by infection with an adenovirus expressing Cre. As these are primary cells and the deletion was induced by Cre expression, the recombination efficiency can vary, which is reflected in the variability observed in lanes 2 and 4 of the immunoblot.

      (10) Why are COX4 levels so low in the 2nd replicate in 7A? The authors state 'We also performed anti-VDAC immunoblots on the same membranes as alternative loading control (see image below)'. I could not find the image.

      Thank you for your comment. The second pair of samples in Figure 7A is from a different preparation of mitochondria. In our experimental design, a control sample and a MIRO1 knockdown sample were processed side by side and run next to each other on the immunoblot.

      Regarding the anti-VDAC immunoblot, the image was included in our response to reviewers during the previous revision, as we did not believe it altered the message conveyed by the COX4 blot. However, to ensure clarity and address your concern, we have now included the anti-VDAC immunoblot directly in the figure. We hope this addition resolves any ambiguity and provides further confidence in the data presented.

      (11) The proposed interaction between MIRO1 and NDUFA9 is very difficult to reconcile, as the two proteins reside in distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane (OMM), with its functional domains facing the cytosol, whereas NDUFA9 is a matrix-facing accessory subunit of mitochondrial Complex I, positioned at the interface between the N- and Q-modules.

      We appreciate the reviewer’s comment and agree that MIRO1 and NDUFA9 occupy distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane with cytosol-facing domains, whereas NDUFA9 is a matrix-facing accessory subunit of Complex I at the N/Q-module interface.

      Our data do not suggest a stable, constitutive interaction within intact mitochondria. Rather, the observed association likely reflects an indirect, transient, or context-dependent interaction, potentially occurring during mitochondrial stress, remodeling, or turnover. Such associations may be mediated by multi-protein complexes spanning mitochondrial membranes, dynamic contact sites, or post-lysis interactions detected under experimental conditions. Increasing evidence supports functional coupling between outer mitochondrial membrane proteins and inner membrane or matrix pathways without direct physical binding.

      Additional comments:

      (12) All the raw data should be provided to the readers (uncropped and annotated WB, IHC images, numerical data with statistics applied).

      We agree with the reviewer and appreciate the emphasis on transparency. In accordance with eLife submission requirements, we have provided all raw data. The Source Data files associated with each figure now include uncropped and annotated immunoblots, as well as the numerical source data for all quantified analyses.

      During the compilation of these materials, we were unable to locate the original source files for Figure 2A. The control experiment depicted in the previous version, which demonstrates in vitro recombination, was performed in 2018. However, this experiment was repeated several times throughout the project. Therefore, to ensure the manuscript remains complete, we have replaced this panel with a representative immunoblot from a similar experiment. Additionally, during our review, we discovered a labeling error in Figure 3D and G. We have corrected these figures to ensure accuracy.

      All source files have been provided and carefully labeled to facilitate independent evaluation.

    1. eLife Assessment

      In this important study, Bready et al. investigate how a highly conserved long-range enhancer mediates neural-specific SOX2 regulation during neural differentiation using human neural stem cells. This study has broad appeal to developmental neuroscience; however, the data remain incomplete given the need for homozygous enhancer knockouts and biological replicates in the scRNAseq assays.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine how a developmentally regulated cis-regulatory element controls SOX2 expression during neural differentiation of human stem cells. The results suggest that this highly conserved long-range enhancer mediates neural-specific SOX2 regulation and offer insight into the role of promoter-enhancer contacts in this process. Although the findings are interesting, several limitations need to be addressed.

      Strengths:

      A central question in developmental biology is how genes are regulated in a context-dependent manner. SOX2, a major pluripotency factor, is expressed in diverse tissues during development, and therefore understanding the mechanisms that control its spatiotemporal expression is critical. This study addresses this important question by examining the functional relevance of a neural-specific, developmentally regulated SOX2 enhancer and its associated promoter-enhancer contacts in driving gene expression during human neural development. Using multiple model systems and techniques, the authors test the requirement of this enhancer by analyzing SOX2 expression in mutant lines, providing evidence for its role in this process.

      Weaknesses:

      A key limitation of the study is the absence of data from homozygous SOX2 enhancer deletion, which leaves the analysis incomplete and tempers the conclusions that can be drawn. Furthermore, the suitability of teratomas as a model system is questionable, given their limited capacity to recapitulate the spatial patterning, regional specification, and organized developmental processes characteristic of the human forebrain. Finally, the manuscript remains largely descriptive with little mechanistic insight.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a combination of genomics, genome conformation assays, and CRISPR-mediated deletion to study the transcriptional regulation of the SOX2 gene in human neural stem cells (hNSCs).

      Strengths:

      The authors show that two distal elements, located ~550kb downstream of the SOX2 gene, are important for SOX2 transcription in hNSC. They investigate both the deletion of these elements in established hNSCs and in hNSCs generated by differentiation of human pluripotent stem cells, suggesting these elements are important in both the establishment and maintenance of SOX2 expression in hNSCs.

      Weaknesses:

      Homologous elements have been studied in the mouse genome and have conserved function in mouse NSCs, yet these findings are not mentioned. Inclusion of biological replicates for the scRNA-seq and replicate CRISPR-deleted clones would strengthen the study.

    4. Author Response:

      eLife Assessment

      In this important study, Bready et al. investigate how a highly conserved long-range enhancer mediates neural-specific SOX2 regulation during neural differentiation using human neural stem cells. This study has broad appeal to developmental neuroscience; however, the data remain incomplete given the need for homozygous enhancer knockouts and biological replicates in the scRNAseq assays.

      We thank the expert reviewers and eLife editors Drs. Eade and White for complimenting our work and deeming it an “important study” of “broad appeal to developmental neuroscience”. We also acknowledge some of the limitations of our work, including the lack of homozygous deletion of the enhancer element. As we detail below, we tried tirelessly to identify human embryonic stem cell (hESC) clones with homozygous deletions but were unable to do so. As we speculate in the discussion, this failure may represent a biological property of the enhancer element (possibly an essentiality manifested even in hESCs), or a technical limitation related to the large size (2.7 kb) of the genomic element targeted for deletion. We also clarify that every scRNAseq assay included cells from multiple teratomas.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine how a developmentally regulated cis-regulatory element controls SOX2 expression during neural differentiation of human stem cells. The results suggest that this highly conserved long-range enhancer mediates neural-specific SOX2 regulation and offer insight into the role of promoter-enhancer contacts in this process. Although the findings are interesting, several limitations need to be addressed.

      Strengths:

      A central question in developmental biology is how genes are regulated in a context-dependent manner. SOX2, a major pluripotency factor, is expressed in diverse tissues during development, and therefore understanding the mechanisms that control its spatiotemporal expression is critical. This study addresses this important question by examining the functional relevance of a neural-specific, developmentally regulated SOX2 enhancer and its associated promoter-enhancer contacts in driving gene expression during human neural development. Using multiple model systems and techniques, the authors test the requirement of this enhancer by analyzing SOX2 expression in mutant lines, providing evidence for its role in this process.

      We thank the reviewer for highlighting the significance of our work in the field of developmental biology.

      Weaknesses:

      A key limitation of the study is the absence of data from homozygous SOX2 enhancer deletion, which leaves the analysis incomplete and tempers the conclusions that can be drawn. Furthermore, the suitability of teratomas as a model system is questionable, given their limited capacity to recapitulate the spatial patterning, regional specification, and organized developmental processes characteristic of the human forebrain. Finally, the manuscript remains largely descriptive with little mechanistic insight.

      We appreciate the reviewer’s disappointment with the lack of data from a homozygous SOX2 enhancer deletion. We too felt disappointed when we started genotyping our hESC clones. In fact, we spent a year screening multiple hESC clones for a homozygous deletion but were unable to find one. We performed several assays to better characterize the heterozygous clones, including Sanger sequencing, whole-genome sequencing (WGS) and fluorescent in situ hybridization (FISH). All assays pointed in the direction of hemizygous deletion. We do not understand the reasons for the absence of homozygous deletion clones. One possibility is that homozygous deletion of the enhancer is selected against in hESCs, thus preventing growth of colonies. Another possibility is the technical challenge of achieving a large deletion (2.7 kb) in hESCs. We also entertained the possibility that the enhancer was excised from the genome but retained as extrachromosomal (ec) DNA, thus producing the hemizygous genotype. However, several assays, such as FISH and PCR diagnostics, argued against this possibility.

      The teratoma assay was chosen as an in vivo metric of spontaneous differentiation of hESCs into the three germ layers, because our overarching hypothesis was that perturbing the enhancer element and 3D chromatin loop regulating SOX2 transcription would impair specification of neuroectodermal precursors. We believe that teratomas offer an opportunity to allow pluripotent cells to declare any predilections toward germ layers in an unbiased fashion. Importantly, we did not rely solely on teratomas to assess effects of our genomic perturbations on specification of neuroectoderm, but also pursued cerebral organoids as an orthogonal approach focused on the tissue of interest, the central nervous system.

      Our work not only describes an important mechanism for regulation of SOX2 transcription in the transition from pluripotency to neuroectodermal specification, but also provides mechanistic insight into the question of whether the developmentally co-regulated activation of the enhancer and formation of the 3D chromatin loop are dependent on each other. Our findings indicate that the two processes occur independently of each other, as evidenced by the fact that the enhancer is uncoupled from chromatin folding, as occurs when the adjacent CTCF motif is deleted. This finding raises the possibility that enhancer activation occurs through yet-to-be-determined transcriptional events, and that establishment of the local 3D chromatin architecture helps fine-tune its influences in the Topologically Associating Domain (TAD) of interest.

      We are further pursuing mechanisms that regulate activation of the enhancer within neuroectodermal lineages and may explain its actions on genomic elements other than the SOX2 locus within the relevant TAD. We are also investigating why hemizygous enhancer deletion produces stronger phenotypes than deletion of the CTCF motif that helps stabilize the 3D chromatin loop.

      Reviewer #2 (Public review):

      Summary:

      The authors use a combination of genomics, genome conformation assays, and CRISPR-mediated deletion to study the transcriptional regulation of the SOX2 gene in human neural stem cells (hNSCs).

      Strengths:

      The authors show that two distal elements, located ~550kb downstream of the SOX2 gene, are important for SOX2 transcription in hNSC. They investigate both the deletion of these elements in established hNSCs and in hNSCs generated by differentiation of human pluripotent stem cells, suggesting these elements are important in both the establishment and maintenance of SOX2 expression in hNSCs.

      We thank the reviewer for appreciating the importance of this regulatory mechanism in the establishment and maintenance of SOX2 expression in the human neural lineage.

      Weaknesses:

      Homologous elements have been studied in the mouse genome and have conserved function in mouse NSCs, yet these findings are not mentioned. Inclusion of biological replicates for the scRNA-seq and replicate CRISPR-deleted clones would strengthen the study.

      We appreciate the recommendation of the reviewer to better acknowledge prior work in mouse neural development. We will ensure full acknowledgment of these studies in the revised manuscript.

      We also appreciate the suggestion for biological replicates in our scRNA-seq assays. We clarify that each scRNA-seq arose from combining multiple teratomas from each experimental group, thus ensuring that findings reflect reproducible biology rather than isolated findings from single teratomas. This clarification will be emphasized in the revised manuscript.

      Finally, we absolutely agree with the reviewer that more CRISPR-deleted clones would have strengthened the study. Unfortunately, we realized that characterization of each clone takes multiple years and addition of more clones would have made the study too lengthy.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of short-term plasticity mechanisms by providing evidence for release-independent low-frequency synaptic depression that reflects a redistribution of vesicles within the readily releasable pool, via a reduction in docking site occupancy due to vesicle undocking. The evidence supporting this model is convincing, with rigorous electrophysiological and computational analysis. The work will be of broad interest to cellular neuroscientists and synaptic physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the mechanisms of low-frequency synaptic depression at cerebellar parallel fiber to interneuron synapses using unitary recordings that allow direct quantification of synaptic vesicle release. They show that sparse stimulation can induce robust synaptic depression even in the absence of substantial vesicle consumption, and that this depressed state is rapidly reversed when stimulation frequency is increased. To account for these observations, the authors propose a model in which low-frequency depression reflects a redistribution of vesicles within the readily releasable pool, in particular, a reduction in docking site occupancy due to vesicle undocking.

      Strengths:

      I found the experimental work to be of high quality throughout. The use of simple synapse recordings to count individual vesicle release events is particularly powerful in this context and allows questions to be addressed that are difficult to approach with more conventional approaches. The demonstration that low-frequency depression can occur independently of prior vesicle release, together with the rapid recovery observed during high-frequency stimulation, places strong constraints on possible underlying mechanisms and represents a clear strength of the study.

      The modeling framework is clearly laid out and helps organize a broad set of observations across stimulation frequencies. Several of the experimental tests appear well-motivated by the model, including the recovery train experiments, the analysis of failures, and the use of doublet stimulation. Taken together, the data provide a coherent phenomenological description of low-frequency depression and its relationship to vesicle availability within the readily releasable pool.

      Weaknesses:

      While the experimental results are strong, the manuscript would benefit from rebalancing the strength of the mechanistic conclusions drawn from the modeling in light of its limitations. The framework is clearly useful and provides a coherent interpretation of the data, but it is not uniquely constrained by the experimental observations, and alternative models or interpretations could plausibly account for the findings. The use of different model regimes concatenated across time, with substantially different parameter values, highlights the abstract nature of the approach. For these reasons, the model seems best presented as one plausible explanatory framework rather than a definitive biological mechanism. Clarifying the distinction between data-driven observations and model-based inferences would help readers assess which conclusions are strongly supported and which remain more speculative.

      The interpretation of the Ca2+-related experiments would benefit from more cautious wording. The absence of detectable changes in presynaptic Ca2+ signals does not exclude more localized or subtle Ca2+-dependent mechanisms, and conclusions regarding Ca2+ independence should therefore be framed accordingly. In addition, while low-frequency depression is still observed at reduced extracellular Ca2+, these experiments appear less diagnostic of the specific model-derived mechanism emphasized elsewhere in the manuscript - namely, a selective reduction in docking-site occupancy - and should be discussed with appropriate qualification in the text.

      Major points:

      (1) Clarify and qualify mechanistic claims derived from the model.

      Throughout the manuscript, changes in model parameters are at times described as if they directly reflected underlying physiological mechanisms. As a result, the conceptual distinction between experimentally observed phenomena, model-derived variables, and biological interpretation is not always clear. Several conclusions in the Results and Discussion are phrased as mechanistic statements, although they rest on assumptions intrinsic to the modeling framework. The authors should systematically review the text and explicitly distinguish between (i) experimentally observed changes in synaptic responses and (ii) inferences about vesicle docking states or transitions within the model.

      In particular, statements implying that vesicle undocking is the mechanism underlying low-frequency depression should be rephrased to reflect that this is an interpretation within the proposed framework rather than a uniquely demonstrated biological process. For example, statements such as "Low-frequency depression is caused by synaptic vesicle undocking" should be replaced with formulations such as "Within the framework of our model, low-frequency depression is accounted for by a redistribution of synaptic vesicles away from docking sites" or "Our results are consistent with a model in which changes in vesicle docking-state occupancy contribute to low-frequency depression."

      A particularly problematic example is the statement that "these experiments further confirm that LFD only involves a decrease in δ, without accompanying changes in ρ or IP size." Here, an experimentally defined phenomenon (LFD) is directly equated with changes in model-derived variables. Such statements should be revised to make clear that δ, ρ, and IP size are inferred quantities within the model, and that the experimental data are interpreted through this framework rather than directly confirming changes in these parameters. Similarly, over-generalizing statements such as "Undocking therefore represents the key mechanism controlling short-term depression across stimulation frequencies" should be softened to reflect that this conclusion emerges from the model rather than from direct experimental evidence.

      (2) Address the biological interpretation of time-dependent model regimes.

      The model relies on distinct parameter regimes applied at different time points, with some transitions effectively suppressed in certain regimes. While this approach captures the data well, its biological interpretation remains unclear. The authors should either (i) expand the discussion to outline plausible biological processes that could give rise to such regime changes (for example, calcium-dependent modulation of transition rates or activity-dependent changes in vesicle state stability), or (ii) more explicitly frame this aspect of the model as a descriptive abstraction rather than a mechanistic proposal. This further underscores the need to clearly separate the descriptive role of the model from claims about underlying biological mechanisms.

      (3) Reframe conclusions drawn from calcium-related experiments.

      The calcium imaging data demonstrate no detectable changes in the measured presynaptic calcium signals under the tested conditions, but they do not rule out that calcium signals contribute in ways undetectable by the assay. Conclusions should therefore be revised to reflect this limitation, avoiding statements that exclude a role for calcium-dependent mechanisms. Wording such as "we did not detect evidence for..." would be more appropriate than conclusions implying the absence of an effect.

      Similarly, while low-frequency depression is still observed at reduced extracellular calcium (1.5 mM Ca²⁺), the specific mechanistic signature emphasized elsewhere in the manuscript - namely a selectively reduced first response during a high-frequency recovery train - is no longer apparent. These experiments should therefore be discussed as consistent with the proposed framework, but not as providing independent support for a selective reduction in docking-site occupancy. Explicitly acknowledging this limitation would improve clarity and avoid over-interpreting these data.

      (4) Soften interpretations based on non-significant comparisons.

      In several places, comparisons that do not reach statistical significance are used to argue for equivalence between conditions (for example, comparisons involving failure versus non-failure trials or different LFD conditions). These conclusions should be revised to emphasize the limits of statistical power and framed as a lack of evidence for a difference rather than evidence of independence.

    3. Reviewer #2 (Public review):

      Summary:

      Silva and co-workers exploit their previously established methods of analyzing release events at single parallel fiber to molecular layer interneuron synapses. They observed synaptic depression at low transmission frequencies (< 5 Hz), which rapidly recovers during high-frequency transmission. Analysis of the time course of low-frequency depression revealed an initial rapid and a slow linearly increasing time course. Strikingly, the initial depression occurred even in the absence of preceding release, arguing against vesicle depletion as the underlying mechanism.

      Strengths:

      The main strength of the study is the careful demonstration of an interesting synaptic phenomenon challenging the classical vesicle-centered interpretation of synaptic depression.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      The finding of release-independent synaptic depression is important and would have widespread implications. Therefore, some more analyses to increase the confidence in these findings could be performed.

      My concern is whether rundown could explain the findings. If the rate of failures in s1 increases and at the same time the amplitude decreases during the experiments, an apparent depression in s2 could arise. The Supplementary Figure 5A addresses run-down, but the figure is not easy to understand, and, as far as I understood, it does not address the question of whether the release-independent depression could be caused by a rundown. To address this, the analysis of Figure 5 could be repeated by investigating the failure rate and amplitude separately or by analyzing the 1st and 2nd half of the recordings separately.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript builds on the observation that, at some synapses, low-frequency stimulation causes synaptic depression, which can be reversed by subsequent high-frequency stimulation. Such low-frequency depression (LFD) cannot be easily explained by the depletion of a single vesicle pool. Here, Silva and colleagues propose a model of activity-dependent vesicle trafficking to explain LFD at synapses between cerebellar granule cells and molecular layer interneurons.

      Strengths:

      Overall, LFD is interesting and worthy of examination, and the authors provide new experimental results that are of the high quality expected from this group.

      Weaknesses:

      The study proposes a novel model of vesicle trafficking that is not explained by known biological mechanisms, and the manuscript does not adequately compare or discuss alternative models.

      I have several concerns about how the authors interpret the data. First, the manuscript's primary conceptual advance is the idea that LFD involves vesicle undocking, rather than depletion. However, most experiments were performed under conditions that promote vesicle depletion (3 mM extracellular Ca2+). When experiments were repeated in physiological Ca2+, there appeared to be little or no LFD (stats are not provided). Second, the RS/DS/DU/undocking model, though not outside the realm of possibility, is not readily explained by known mechanisms and is only loosely supported by experimental findings. Third, when simulating LFD, the authors do not compare alternative models and use inappropriate language to imply that a model fit represents the truth (e.g., "the finding of identical experimental and simulated values confirms that the undocking mechanism accounts for LFD"). Finally, the model is presented in an overly complicated manner. The sheer amount of terms and nomenclature makes the manuscript confusing and difficult to read. Overall, the manuscript would benefit from added experiments and more statistics, a better justification and evaluation of the model, and more nuanced language.

      Major concerns:

      (1) Most experiments were performed under conditions that exacerbate depletion

      In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal. As mentioned above, the authors only report significant LFD in elevated extracellular Ca2+. In a small number of experiments performed in more physiological Ca2+ (1.5 mM), there is no depression after a single stimulus, and it is not clear that there was statistically significant depression during a low-frequency train. Several studies cited in support of LFD share this problem:

      • Abrahamsson et al., (2007) recorded from Schaffer collaterals in 4 mM Ca, 3-4X physiological Ca2+.

      • Doussau et al., (2010) recorded from Aplysia synapses in 3X Ca compared to seawater.

      • Rudolph et al., (2011) is cited as an example of LFD. However, this study performed experiments at high release probability cerebellar climbing fibers, and reported depression that increased monotonically with stimulation frequency, so it does not resemble the phenomenon studied in this paper. Lin et al., (2022) also largely describe monotonic depression at the calyx.

      The authors note that their results differ from those of Atluri and Regehr, but do not mention that a possible reason for the difference is the increased release probability in their experiments.

      The authors should provide statistics for the data obtained in 1.5 mM Ca, and discuss why LFD is increased in conditions that also elevate vesicle release probability.

      (2) Lack of biological mechanisms supporting the model

      The model is presented without compelling biological support. The evidence in support of vesicle undocking comes from experiments by the Watanabe lab, which showed fewer-than-expected docked vesicles under EM when cultured synapses were stimulated immediately prior to high-pressure freezing. Kusick et al. were careful to note that these vesicles may have been lost to fusion.

      The putative undocking Kusick describes is immediate (< 5 ms after stimulation), and was not shown to be Ca2+ sensitive. This manuscript describes "calcium-dependent undocking" that proceeds from 10 ms - 200 ms. Multiple studies from the Watanabe lab show that a single stimulus lowers the number of docked vesicles, and subsequently, there is a transient redocking of vesicles that can be blocked by EGTA or Syt7 knockout.

      I also question the rationale for the authors' model that 2 vesicles are coupled in series to a single release site. Previous papers from this lab cited EM studies from frog and neuromuscular that showed filamentous connections between vesicles (do these synapses show LFD?). Here, the authors primarily cite their previous models to support their arguments. I encourage them to continue searching for ultrastructural evidence for 2-vesicle-docking-units and to cite such studies.

      (3) Comparison to other vesicle models

      The authors use overly assertive language to suggest that the model proves a mechanism. "Altogether, these results indicate that the slow phase of LFD ... reflects a δ decrease without significant changes in pr, in ρ or in IP size". Simulating data does not conclusively "indicate" the underlying mechanism, but the authors could state their data can be "explained by a model where...".

      However, LFD does not require activity-dependent undocking. Instead, the phenomenon has been explained by high-release probability, paired with an activity-dependent increase in either docking or release probability (Chiu and Carter, 2024; Doussau et al., 2017). Does the new model do a better job of replicating some facet of the data? If multiple models can explain the same data, how can we determine which model is correct? The "Alternative Presynaptic Depression Mechanisms" should be expanded to discuss these issues.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the mechanisms of low-frequency synaptic depression at cerebellar parallel fiber to interneuron synapses using unitary recordings that allow direct quantification of synaptic vesicle release. They show that sparse stimulation can induce robust synaptic depression even in the absence of substantial vesicle consumption, and that this depressed state is rapidly reversed when stimulation frequency is increased. To account for these observations, the authors propose a model in which low-frequency depression reflects a redistribution of vesicles within the readily releasable pool, in particular, a reduction in docking site occupancy due to vesicle undocking.

      Strengths:

      I found the experimental work to be of high quality throughout. The use of simple synapse recordings to count individual vesicle release events is particularly powerful in this context and allows questions to be addressed that are difficult to approach with more conventional approaches. The demonstration that low-frequency depression can occur independently of prior vesicle release, together with the rapid recovery observed during high-frequency stimulation, places strong constraints on possible underlying mechanisms and represents a clear strength of the study.

      The modelling framework is clearly laid out and helps organize a broad set of observations across stimulation frequencies. Several of the experimental tests appear well-motivated by the model, including the recovery train experiments, the analysis of failures, and the use of doublet stimulation. Taken together, the data provide a coherent phenomenological description of low-frequency depression and its relationship to vesicle availability within the readily releasable pool.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      While the experimental results are strong, the manuscript would benefit from rebalancing the strength of the mechanistic conclusions drawn from the modelling in light of its limitations. The framework is clearly useful and provides a coherent interpretation of the data, but it is not uniquely constrained by the experimental observations, and alternative models or interpretations could plausibly account for the findings. The use of different model regimes concatenated across time, with substantially different parameter values, highlights the abstract nature of the approach. For these reasons, the model seems best presented as one plausible explanatory framework rather than a definitive biological mechanism. Clarifying the distinction between data-driven observations and model-based inferences would help readers assess which conclusions are strongly supported and which remain more speculative.

      The interpretation of the Ca<sup>2+</sup>-related experiments would benefit from more cautious wording. The absence of detectable changes in presynaptic Ca<sup>2+</sup> signals does not exclude more localized or subtle Ca<sup>2+</sup>-dependent mechanisms, and conclusions regarding Ca<sup>2+</sup> independence should therefore be framed accordingly. In addition, while low-frequency depression is still observed at reduced extracellular Ca<sup>2+</sup>, these experiments appear less diagnostic of the specific model-derived mechanism emphasized elsewhere in the manuscript - namely, a selective reduction in docking-site occupancy - and should be discussed with appropriate qualification in the text.

      Concerning Ca<sup>2+</sup> signals, the Reviewer is right. While we found no change in Ca<sup>2+</sup> signalling apart from a slow Ca<sup>2+</sup> accumulation during long trains at 1 Hz, the possibility of an undetected change cannot be excluded. We have added a word of caution in this direction on p. 11. Concerning the 1.5 mM Ca<sup>2+</sup> experiments, the Reviewer presumably alludes to the first recovery train (yellow) point in Supplementary Fig. 2C. This is also the last point (s11) of the slow train at 0.5 Hz because no delay at all was interposed between the slow train and the recovery train. We have now included one more experiment (bringing the total to n = 6), and we have corrected Fig. S2C accordingly. In the new version, the depression measured for s4-s10 vs s1 during the 0.5 Hz trains is 0.69 ± 0.05 (p = 0.00058, paired one-tailed t-test). The ratio of the recovery-train s1 value to control s1 is 0.83 ± 0.08 (p = 0.028, paired one-tailed t-test).
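
      A minimal sketch of how a depression ratio of this kind could be computed, assuming it is defined per experiment as the mean of s4-s10 divided by s1 and then summarized as mean ± SEM across experiments. The function names and the release counts are hypothetical placeholders, not data from the study, and the normalization is an assumption rather than the procedure stated in the manuscript.

```python
from math import sqrt
from statistics import mean, stdev

def depression_ratio(train):
    """Mean of s4..s10 divided by s1 for one experiment (train[0] is s1)."""
    if train[0] == 0:
        raise ValueError("s1 must be non-zero to form a ratio")
    return mean(train[3:10]) / train[0]

def mean_and_sem(values):
    """Mean and standard error of the mean across experiments."""
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical mean release counts per stimulus (s1..s11), one list per experiment.
trains = [
    [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7],
    [2.0, 1.6, 1.4, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2],
    [1.5, 1.2, 1.1, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2],
]
ratios = [depression_ratio(t) for t in trains]
avg, sem = mean_and_sem(ratios)
print(f"depression = {avg:.2f} ± {sem:.2f} (n = {len(ratios)})")
```

      A ratio below 1 indicates depression relative to s1; whether it differs significantly from 1 would still require a paired test such as the one quoted above.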

      Major points:

      (1) Clarify and qualify mechanistic claims derived from the model.

      Throughout the manuscript, changes in model parameters are at times described as if they directly reflected underlying physiological mechanisms. As a result, the conceptual distinction between experimentally observed phenomena, model-derived variables, and biological interpretation is not always clear. Several conclusions in the Results and Discussion are phrased as mechanistic statements, although they rest on assumptions intrinsic to the modelling framework. The authors should systematically review the text and explicitly distinguish between (i) experimentally observed changes in synaptic responses and (ii) inferences about vesicle docking states or transitions within the model.

      In particular, statements implying that vesicle undocking is the mechanism underlying low-frequency depression should be rephrased to reflect that this is an interpretation within the proposed framework rather than a uniquely demonstrated biological process. For example, statements such as "Low-frequency depression is caused by synaptic vesicle undocking" should be replaced with formulations such as "Within the framework of our model, low-frequency depression is accounted for by a redistribution of synaptic vesicles away from docking sites" or "Our results are consistent with a model in which changes in vesicle docking-state occupancy contribute to low-frequency depression."

      A particularly problematic example is the statement that "these experiments further confirm that LFD only involves a decrease in δ, without accompanying changes in ρ or IP size." Here, an experimentally defined phenomenon (LFD) is directly equated with changes in model-derived variables. Such statements should be revised to make clear that δ, ρ, and IP size are inferred quantities within the model, and that the experimental data are interpreted through this framework rather than directly confirming changes in these parameters. Similarly, overgeneralizing statements such as "Undocking therefore represents the key mechanism controlling short-term depression across stimulation frequencies" should be softened to reflect that this conclusion emerges from the model rather than from direct experimental evidence.

      As suggested, we clarify the distinction in the revised version between experimental data and modelling, and we refrain from making definitive statements on underlying cellular mechanisms.

      (2) Address the biological interpretation of time-dependent model regimes.

      The model relies on distinct parameter regimes applied at different time points, with some transitions effectively suppressed in certain regimes. While this approach captures the data well, its biological interpretation remains unclear. The authors should either (i) expand the discussion to outline plausible biological processes that could give rise to such regime changes (for example, calcium-dependent modulation of transition rates or activity-dependent changes in vesicle state stability), or (ii) more explicitly frame this aspect of the model as a descriptive abstraction rather than a mechanistic proposal. This further underscores the need to clearly separate the descriptive role of the model from claims about underlying biological mechanisms.

      We thank the Reviewer for drawing our attention to this important point. Below 10 ms, rate constants are largely determined by the large amplitude, fast decaying Ca<sup>2+</sup> signal occurring near voltage-dependent Ca<sup>2+</sup> channels (‘Ca<sup>2+</sup> nanodomain’). After 10 ms, the rate constants depend on the low amplitude, slowly decaying Ca<sup>2+</sup> signals averaged over the entire varicosity (‘volume-averaged Ca<sup>2+</sup>’). We explain this better in the revised version (Materials and Methods, p. 21).

      (3) Reframe conclusions drawn from calcium-related experiments.

      The calcium imaging data demonstrate no detectable changes in the measured presynaptic calcium signals under the tested conditions, but they do not rule out that calcium signals contribute in ways undetectable by the assay. Conclusions should therefore be revised to reflect this limitation, avoiding statements that exclude a role for calcium-dependent mechanisms. Wording such as "we did not detect evidence for..." would be more appropriate than conclusions implying the absence of an effect.

      Similarly, while low-frequency depression is still observed at reduced extracellular calcium (1.5 mM Ca<sup>2+</sup>), the specific mechanistic signature emphasized elsewhere in the manuscript - namely a selectively reduced first response during a high-frequency recovery train - is no longer apparent. These experiments should therefore be discussed as consistent with the proposed framework, but not as providing independent support for a selective reduction in docking-site occupancy. Explicitly acknowledging this limitation would improve clarity and avoid overinterpreting these data.

      This has been discussed above (‘weaknesses’).

      (4) Soften interpretations based on non-significant comparisons.

      In several places, comparisons that do not reach statistical significance are used to argue for equivalence between conditions (for example, comparisons involving failure versus non-failure trials or different LFD conditions). These conclusions should be revised to emphasize the limits of statistical power and framed as a lack of evidence for a difference rather than evidence of independence.

      We have attended to this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Silva and co-workers exploit their previously established methods of analyzing release events at single parallel fiber to molecular layer interneuron synapses. They observed synaptic depression at low transmission frequencies (< 5 Hz), which rapidly recovers during high-frequency transmission. Analysis of the time course of low-frequency depression revealed an initial rapid and a slow linearly increasing time course. Strikingly, the initial depression occurred even in the absence of preceding release, arguing against vesicle depletion as the underlying mechanism.

      Strengths:

      The main strength of the study is the careful demonstration of an interesting synaptic phenomenon challenging the classical vesicle-centered interpretation of synaptic depression.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      The finding of release-independent synaptic depression is important and would have widespread implications. Therefore, some more analyses to increase the confidence in these findings could be performed.

      My concern is whether rundown could explain the findings. If the rate of failures in s1 increases and at the same time the amplitude decreases during the experiments, an apparent depression in s2 could arise. The Supplementary Figure 5A addresses run-down, but the figure is not easy to understand, and, as far as I understood, it does not address the question of whether the release-independent depression could be caused by a rundown. To address this, the analysis of Figure 5 could be repeated by investigating the failure rate and amplitude separately or by analyzing the 1st and 2nd half of the recordings separately.

      The Reviewer makes a very important point that had escaped our attention. If the responses were declining over the course of an experiment, then near the end of the recordings a high proportion of failures would be associated with a weak response to the second AP. This could distort the relation between initial failures and the amount of LFD, perhaps to the point of indicating LFD after failures when there was none. As suggested by the Reviewer, we tested this possibility by examining the stability of the synaptic responses during experiments. We found a mean s<sub>1</sub> value of 0.87 ± 0.13 for the first half of the experiments used in Fig. 5, and of 1.10 ± 0.17 for the second half (p > 0.05, n = 10). This analysis shows that there was no rundown during these experiments. We show in Author response image 1 a plot of s1 as a function of train number. These plots do not suggest any artefactual correlation between failures, mean s1, and rundown.

      Author response image 1.

      Plot of s1 as a function of train number for the experiments of Fig. 5. In response to a request from Reviewer 2, this figure illustrates the evolution of s1 values as a function of train number for the experiments used to produce Figure 5. In each experiment, about 20 s1 values were obtained at two ISIs (either 10 ms and 500 ms, or 800 ms and 1600 ms). The figure shows two examples of s1 values as a function of train number (these values fluctuate widely between 0 and 3), and the average across cells and ISI values. There is no indication of a rundown of s1 values as a function of train number.
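
      The split-half stability check described in the response above can be sketched as follows. This is a minimal illustration under stated assumptions: the s1 series are invented, the function names are hypothetical, and, because the response does not state which test produced p > 0.05, an exact sign-flip permutation test on the paired half-means is used here as a stand-in.

```python
from itertools import product
from statistics import mean

def split_half_means(s1_series):
    """Mean s1 over the first and second half of one experiment's trains."""
    mid = len(s1_series) // 2
    return mean(s1_series[:mid]), mean(s1_series[mid:])

def sign_flip_p_value(diffs):
    """Exact two-sided p-value: fraction of sign assignments whose absolute
    mean difference is at least as large as the observed one."""
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical s1 values per train, one list per experiment (not data from the study).
experiments = [
    [1, 0, 2, 1, 0, 1, 1, 2],
    [0, 1, 1, 0, 2, 0, 1, 1],
    [2, 1, 0, 1, 1, 1, 0, 2],
    [1, 1, 0, 2, 0, 1, 2, 0],
]
halves = [split_half_means(e) for e in experiments]
diffs = [second - first for first, second in halves]
p = sign_flip_p_value(diffs)
first_avg = mean([h[0] for h in halves])
second_avg = mean([h[1] for h in halves])
print(f"first half = {first_avg:.2f}, second half = {second_avg:.2f}, p = {p:.3f}")
```

      With so few hypothetical experiments the test has little power, so a non-significant result here illustrates the procedure and should be read as a lack of evidence for rundown, not as proof of stability.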

      Reviewer #3 (Public review):

      Summary:

      The manuscript builds on the observation that, at some synapses, low-frequency stimulation causes synaptic depression, which can be reversed by subsequent high-frequency stimulation. Such low-frequency depression (LFD) cannot be easily explained by the depletion of a single vesicle pool. Here, Silva and colleagues propose a model of activity-dependent vesicle trafficking to explain LFD at synapses between cerebellar granule cells and molecular layer interneurons.

      Strengths:

      Overall, LFD is interesting and worthy of examination, and the authors provide new experimental results that are of the high quality expected from this group.

      Weaknesses:

      The study proposes a novel model of vesicle trafficking that is not explained by known biological mechanisms, and the manuscript does not adequately compare or discuss alternative models.

      I have several concerns about how the authors interpret the data. First, the manuscript's primary conceptual advance is the idea that LFD involves vesicle undocking, rather than depletion. However, most experiments were performed under conditions that promote vesicle depletion (3 mM extracellular Ca<sup>2+</sup>). When experiments were repeated in physiological Ca<sup>2+</sup>, there appeared to be little or no LFD (stats are not provided). Second, the RS/DS/DU/undocking model, though not outside the realm of possibility, is not readily explained by known mechanisms and is only loosely supported by experimental findings. Third, when simulating LFD, the authors do not compare alternative models and use inappropriate language to imply that a model fit represents the truth (e.g., "the finding of identical experimental and simulated values confirms that the undocking mechanism accounts for LFD"). Finally, the model is presented in an overly complicated manner. The sheer amount of terms and nomenclature makes the manuscript confusing and difficult to read. Overall, the manuscript would benefit from added experiments and more statistics, a better justification and evaluation of the model, and more nuanced language.

      We respectfully disagree with these sweeping criticisms, as described in more detail below.

      Major concerns:

      (1) Most experiments were performed under conditions that exacerbate depletion

      In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal. As mentioned above, the authors only report significant LFD in elevated extracellular Ca<sup>2+</sup>. In a small number of experiments performed in more physiological Ca<sup>2+</sup> (1.5 mM), there is no depression after a single stimulus, and it is not clear that there was statistically significant depression during a low-frequency train. Several studies cited in support of LFD share this problem:

      - Abrahamsson et al., (2007) recorded from Schaffer collaterals in 4 mM Ca, 3-4X physiological Ca<sup>2+</sup>.

      - Doussau et al., (2010) recorded from Aplysia synapses in 3X Ca compared to seawater.

      - Rudolph et al., (2011) is cited as an example of LFD. However, this study performed experiments at high release probability cerebellar climbing fibers, and reported depression that increased monotonically with stimulation frequency, so it does not resemble the phenomenon studied in this paper. Lin et al., (2022) also largely describe monotonic depression at the calyx.

      The Reviewer suggests that LFD may only occur under non-physiological conditions, if the release probability has been increased by artificially elevating the extracellular Ca<sup>2+</sup>. The implication is that LFD is at best a curiosity with little or no significance for brain signalling. We disagree with this point of view for several reasons.

      Concerning the statement ‘In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal’: This is the purpose of the analysis shown in Fig. 5.

      The statement ‘the authors only report significant LFD in elevated extracellular Ca<sup>2+</sup>’ is inaccurate. Fig. S2C shows a clear LFD in 1.5 mM Ca<sup>2+</sup>, as acknowledged by Reviewer 1 (‘low-frequency depression is still observed at reduced extracellular Ca<sup>2+</sup>’). However, we failed to provide a p-value for the depression in the initial version of the paper (p = 0.004, n = 5, with this data set; paired t-test, one-tailed). In the revised version, we document the 1.5 mM results more extensively, including the incorporation of the results of an additional experiment, and an explicit statistical analysis of the data (p = 0.00058, n = 6; paired t-test, one-tailed).

      Concerning the statement ‘there is no depression after a single stimulus’: We find that the onset kinetics of LFD are slower in 1.5 mM Ca<sup>2+</sup> than in 3 mM Ca<sup>2+</sup> (respectively 1.8 ISI and 0.51 ISI, Fig. 2C and Fig. S2C). This explains why the PPR is not significantly <1 in 1.5 mM Ca<sup>2+</sup>, without implying any weakening of the extent of LFD at steady state.

      As explained in the manuscript (p. 5), in a previous work, we developed a method to associate changes in SV pools, within the RS/DS model, with specific modifications of s1, s2 and s5-s8 during test 100 Hz trains (Tran et al., 2022). This method was developed in 3 mM Ca<sup>2+</sup> conditions, and for this reason, we performed most experiments for the present work in 3 mM Ca<sup>2+</sup>.

      Chiu and Carter (2024) demonstrated LFD in neocortical synapses; they performed their study in 1.2 mM Ca<sup>2+</sup>, not in elevated Ca<sup>2+</sup>.

      Rudolph et al. (2011) showed low frequency depression not only in elevated external Ca<sup>2+</sup>, but also in 0.5 mM Ca<sup>2+</sup>. While Rudolph et al. (2011) did not make an explicit link between their observations and LFD, there is no reason to doubt that these observations are an example of LFD. They showed a biphasic depression when switching the stimulation frequency from 0.05 Hz to 2 Hz. In one of the founding papers of LFD, Doussau et al. (2010) describe a biphasic depression when switching the stimulation frequency from 0.025 Hz to 1 Hz; the Fig. 1 of the two papers (Rudolph 2011 and Doussau 2010) are strikingly similar.

      Lin et al. (2022) would probably not agree with the statement that the depression at the calyx is ‘largely monotonic’, as they stress the finding of quasi-constant depression between 5 and 50 Hz.

      The authors note that their results differ from those of Atluri and Regehr, but do not mention that a possible reason for the difference is the increased release probability in their experiments.

      In fact, we clearly listed the difference in external Ca<sup>2+</sup> as a likely source of the discrepancy by saying ‘This discrepancy presumably stems from differences in experimental conditions (room temperature, stimulation of multiple presynaptic PFs and 2 mM external Ca<sup>2+</sup> concentration in the previous work, vs. near-physiological temperature, single presynaptic stimulation and 3 mM external Ca<sup>2+</sup> here)’.

      The authors should provide statistics for the data obtained in 1.5 mM Ca, and discuss why LFD is increased in conditions that also elevate vesicle release probability.

      See our comments above: the revised version includes the requested statistics. On p. 6 of the manuscript, we do provide an explanation for the apparent lack of LFD at 1.5 mM Ca<sup>2+</sup> and 2 Hz, namely a superimposition of LFD with facilitation. At 1.5 mM Ca<sup>2+</sup> and 0.5 Hz, our LFD numbers are not weaker than at 3 mM Ca<sup>2+</sup> and 0.5 Hz or 1 Hz.

      Altogether, it is correct that many LFD experiments have been carried out in high release probability synapses, and/or under conditions of elevated Ca<sup>2+</sup>. However, the reasons underlying these choices are diverse (in our case, to build on the previous SV pool analysis developed in Tran et al. 2022 in 3 mM Ca<sup>2+</sup> conditions) and do not imply a limitation to the phenomenon. LFD is present in physiological conditions for low-to-moderate release probability synapses (as shown in our work), and there is no reason to dismiss LFD as nonphysiological.

      (2) Lack of biological mechanisms supporting the model

      The model is presented without compelling biological support. The evidence in support of vesicle undocking comes from experiments by the Watanabe lab, which showed fewer-than-expected docked vesicles under EM when cultured synapses were stimulated immediately prior to high-pressure freezing. Kusick et al. were careful to note that these vesicles may have been lost to fusion.

      The Watanabe lab showed an SV deficit at docking sites at times ranging from about 100 ms to several seconds (Kusick et al., 2020, their Fig. 5E). This corresponds to the ISI values where we see paired-pulse depression. In their Summary, Kusick et al. raise the possibility of SV fusion as an alternative to undocking at the 100 ms time point. But, the same issue had previously been considered in Miki et al., 2018 with other techniques (their Fig. 2d), where it was shown that the SV deficit seen in paired-pulse experiments could not be explained by fusion. This leaves undocking as the most likely explanation, at least in our preparation. We have added a new paragraph on p. 14 to clarify this point.

      The putative undocking Kusick describes is immediate (< 5 ms after stimulation), and it was not shown to be Ca<sup>2+</sup> sensitive. This manuscript describes "calcium-dependent undocking" that proceeds from 10 ms - 200 ms. Multiple studies from the Watanabe lab show that a single stimulus lowers the number of docked vesicles, and subsequently, there is a transient redocking of vesicles that can be blocked by EGTA or Syt7 knockout.

      This is not an accurate description of the Kusick results or of our results. In the Kusick paper, the SV deficit seen at <5 ms after stimulation is attributed to exocytosis, not to undocking. Clearly, it is Ca<sup>2+</sup> dependent. Our manuscript describes potential calcium-dependent undocking not during the 10-150 ms interval, during which our undocking rate is assumed to be calcium-independent, but starting at 150 ms and lasting a few hundred ms thereafter.

      I also question the rationale for the authors' model that 2 vesicles are coupled in series to a single release site. Previous papers from this lab cited EM studies from frog and mouse neuromuscular junctions that showed filamentous connections between vesicles (do these synapses show LFD?). Here, the authors primarily cite their previous models to support their arguments. I encourage them to continue searching for ultrastructural evidence for 2-vesicle-docking-units and to cite such studies.

      It is important to remember that our sequential two-step model was not based on EM data, but on a series of functional data including variance-mean analysis of summed SV release numbers; covariance analysis among subsequent SV release numbers; analysis of release latencies as a function of stimulus number during an AP train; and analysis of SV release numbers under conditions of very high release probability. We note that the phenomenon of Ca<sup>2+</sup>-dependent docking that we proposed based on these observations has been consistent with flash-and-freeze or zap-and-freeze results from several laboratories. Concerning potential filamentous connections between SVs and the AZ plasma membrane at a distance of several tens of nm, these have been seen not only in frog or mouse neuromuscular junctions, but also at brain synapses (e.g., Siksou et al., Journal of Neuroscience 2007; Cole et al., Journal of Neuroscience 2016; Fernandez-Busnadiego, Journal of Cell Biology 2010; 2013).

      (3) Comparison to other vesicle models

      The authors use overly assertive language to suggest that the model proves a mechanism. "Altogether, these results indicate that the slow phase of LFD ... reflects a δ decrease without significant changes in pr, in ρ or in IP size". Simulating data does not conclusively "indicate" the underlying mechanism, but the authors could state their data can be "explained by a model where..".

      Please see our response above to a similar point by Reviewer 1.

      However, LFD does not require activity-dependent undocking. Instead, the phenomenon has been explained by high-release probability, paired with an activity-dependent increase in either docking or release probability (Chiu and Carter, 2024; Doussau et al., 2017). Does the new model do a better job of replicating some facet of the data? If multiple models can explain the same data, how can we determine which model is correct? The "Alternative Presynaptic Depression Mechanisms" should be expanded to discuss these issues.

      We could not find statements in the Chiu and Carter paper or in the Doussau et al. paper explaining LFD ‘by high-release probability, paired with an activity-dependent increase in either docking or release probability’. As far as we can see, Chiu and Carter do not propose any specific mechanism for LFD, beyond saying that depression and facilitation must be separate. Doussau et al. (their Fig. 6) clearly frame their interpretation in a sequential two-step model. As in the preceding Miki et al. paper (which they cite extensively), they assume a rapid (a few ms), Ca-dependent transition between their ‘reluctant pool’ and their ‘fully-releasable pool’, respectively homologous to RS and DS. Thus, the Doussau et al. interpretation is close to that presented in our present work, even though significant differences exist. An important difference is that Doussau et al. did not use simple synapses, so that they did not have access to key synaptic parameters such as the number of docking sites or the release probability per docking site. Consequently, the model in Doussau et al. does not have the same level of detail as ours. The revised version better explains the differences and similarities between the model of Doussau et al. and the one presented in our work (new paragraph on p. 14).

    1. eLife Assessment

      Mechanical transduction channels of sensory hair cells possess lipid scramblase activity. Membrane lipid disruption resulting from mechanical transduction is thought to be restored by flippase activities. This fundamental study provides compelling evidence that ATP8B1, a P4-ATPase flippase, and its subunit TMEM30B are key in mediating this restorative function in outer hair cells of the mammalian cochlea.

    2. Reviewer #1 (Public review):

      Sensory hair cells of the inner ear convert mechanical sound vibrations into electrical signals through mechano-electrical transduction (MET), a process critically dependent on the specialized organization and lipid composition of their plasma membrane. Although the protein components of the MET complex are relatively well characterized, the role of the lipid environment remains poorly understood and often overlooked. Recent discoveries that core MET proteins TMC1 and TMC2 function as lipid scramblases, disrupting membrane lipid asymmetry, expose a significant gap in our understanding of how lipid homeostasis is regulated in hair cells and how membrane dynamics influence MET function.

      In this study, the authors address this gap by identifying the P4-ATPase ATP8B1 and its chaperone TMEM30B as essential regulators of membrane lipid asymmetry in outer hair cells. They also generated HA-tagged knock-in mice to precisely localize the P4-ATPase ATP8B1 and its chaperone TMEM30B within outer hair cells, demonstrating their enrichment in stereocilia, and convincingly demonstrate that loss of these proteins causes phosphatidylserine externalization, hair cell degeneration, and hearing loss in mouse models, phenocopying defects observed in TMC1 mutant mice with constitutive scrambling activity. While these findings establish lipid flippase pathways as critical for hair cell survival and auditory function, they also raise important questions about the precise mechanisms linking lipid asymmetry disruption to MET dysfunction and hair cell pathology.

      Overall, the data convincingly support the conclusion that ATP8B1-TMEM30B flippase activity is required to maintain stereocilia lipid asymmetry and auditory function. The study substantially advances understanding of how lipid homeostasis intersects with MET. However, several points require clarification to ensure that localization claims and mechanistic interpretations are fully supported by the presented data.

      Revisions considered essential by this reviewer are:

      (1) Figure 1D.<br /> The authors should clarify how the qPCR data were normalized and specify the reference (housekeeping) genes used. This information is necessary to evaluate the robustness and comparability of the gene expression data.

      (2) Figure 1F.<br /> The lack of F-actin staining at the hair cell base raises the possibility that the permeabilization conditions may have limited antibody access to certain membrane regions. This is especially important given that the authors used a gentle permeabilization agent such as saponin to preserve membrane integrity. Because the authors conclude that ATP8B1 and TMEM30B are localized "almost exclusively to OHC bundles and the apical membrane, with minimal staining in the remaining plasma membrane" (line 128), including co-labeling with a plasma membrane marker or more comprehensive F-actin visualization of lateral and basal regions would help ensure that the restricted localization is biological rather than technical. In the absence of such controls, the localization claim may be somewhat overstated and should be tempered accordingly.

      (3) Figure 7B.<br /> Although quantification of ATP8B1-HA intensity at the bundle appears similar between WT and Cib2 KO samples, the representative image suggests that some bundles lack detectable labeling. To better capture phenotype variability, it would be helpful to include an additional quantification showing the fraction or number of bundles with detectable ATP8B1-HA signal in Cib2 KO mice.

      (4) Lines 346-349.<br /> The manuscript suggests that IHCs lack stereocilia-enriched P4-ATPases. However, this conclusion is not directly supported by the presented data. The authors should either provide supporting localization or expression data for other P4-ATPases or soften the statement to indicate that no stereocilia-enriched P4-ATPases were detected under the conditions examined.

      Recommendations:

      (5) The authors convincingly demonstrate that TMEM30B loss results in ATP8B1 mislocalization. While not essential to the central conclusions, examining TMEM30B localization in ATP8B1 KO hair cells would clarify whether this interdependence is reciprocal, as described for other P4-ATPase-CDC50 complexes.

      (6) Lines 359-374.<br /> The discussion of Annexin V labeling is careful and balanced. This paragraph would benefit from referencing other studies that showed minimal Annexin V labeling in healthy P6 organ of Corti, reinforcing that robust PS externalization in the present study is pathological rather than developmental.

      (7) Lines 392-399.<br /> The proposed feedback model linking MET activity and ATP8B1-TMEM30B localization is compelling. The discussion could be strengthened by noting that in TMC1/2 double knockout hair cells, PS externalization is not observed, consistent with the idea that flippase activity becomes critical specifically when scrambling occurs. The mislocalization observed in Cib2 KO hair cells further supports the coupling between TMC-mediated scrambling and flippase-mediated membrane restoration.

    3. Reviewer #2 (Public review):

      Summary:

      Prior work identified TMEM30B (knockout mice) as well as ATP8B1 (human genetics and mouse model), ATP8A2 (knockout mice), and ATP11A (human genetics) as relevant for hearing. The authors also reasoned that, given the recent discovery of TMC1 and TMC2's dual function as mechanotransduction channels of the inner ear and as lipid scramblases, a counterpart flippase should be in the sensory hair-cell stereocilia bundle where mechanotransduction happens. They use CRISPR/Cas to modify the endogenous mouse genes and add an HA tag at the N-terminus of the ATP8B1, ATP8A1, ATP8A2, and ATP11A proteins. Their experiments with these mice unambiguously localized ATP8B1 at the base of outer hair cell stereocilia bundles. Knockout of ATP8B1 results in loss of outer hair cells, deficient auditory function (ABR), and degeneration of outer hair cell stereocilia bundles. Similarly, hair cells from genetically modified mice with endogenous HA-tagged TMEM30B proteins show localization of this protein to outer hair cell stereocilia bundles. TMEM30B knock-out mice phenocopy the ATP8B1 knock-out model. Interestingly, the authors show that annexin V staining precedes hair cell loss in ATP8B1 and TMEM30B knockout mice and that proper localization of these proteins is lost in mice that lack CIB2, a protein essential for hair cell mechanotransduction.

      Strengths:

      (1) Use of knock-in HA-tagged proteins, rather than antibody staining, to unambiguously localize ATP8B1 and TMEM30B.

      (2) Systematic characterization of auditory function (ABR), hair cell loss, and hair-cell stereocilia bundle morphology.

      (3) Advances our understanding of the role played by lipid homeostasis in auditory function.

      (4) Reports on mouse models that will be helpful to further understand the mechanistic role played by ATP8B1 and TMEM30B in normal hearing and hereditary deafness.

      Weaknesses:

      (1) Are the HA tags causing any functional issues? Function and localization of tagged proteins can sometimes be compromised. It would be good to know, for each knock-in model (TMEM30B, ATP8B1, ATP8A1, ATP8A2, and ATP11A), whether the HA-tagged protein is causing any issues with the mice and particularly with hearing (ABRs). Are these mice normal? Can they hear? These data are missing.

      (2) Following on the point above, is it possible that ATP8B1-HA is well localized, but localization for the other three flippases (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA) is compromised by the tag? Is this potential mislocalization causing any functional phenotypes? (ABRs of point 1). I find it surprising that there are flippases only in outer hair cells, and only formed by ATP8B1. A possible explanation is that the tag is interfering with trafficking. If so, there should be a phenotype (ABRs), although this might be masked by redundancy among these flippases or caused by systemic issues (admittedly difficult to sort out). Given that this manuscript will likely become foundational, and that there is evidence that at least two of the other flippases are involved in hearing loss, it would be good to provide more information about the mice and HA-tagged proteins in the other knock-ins (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA). Depending on the data available for the knock-ins, the authors may want to discuss these scenarios and soften the statement indicating that inner-hair cells may lack flippase activity altogether.

      (3) Expression of ATP8B1 at P0 (Figure 1D), when there should not be protein in outer hair cells yet, seems high. Does this mean that other cells in the cochlea also express ATP8B1? Is this a concern?

      (4) Fluorescence scales in Figure 6 B and D and Figure 7 B and D are very different. So are the values for WT. One would expect that the WT would be similar in all cases (at least within the same compartments), given that the methods section indicates that "All images were collected using identical acquisition parameters, including zoom and laser power, across genotypes". If WT shows such variability, how can we compare?

    4. Author Response:

      Summary of Planned Revisions:

      We will clarify the qPCR methodology and interpretation to address potential misunderstandings.

      We will assess hearing in the generated HA-tagged mouse lines and, where appropriate, include a properly powered ABR analysis in the revised manuscript.

      We will address concerns regarding the z-stack in Figure 1f.

      We will include additional quantification for Figure 7B to strengthen the analysis.

      We will revise the relevant statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      While we appreciate the suggestion to examine TMEM30B localization on the ATP8B1 KO background, this is not feasible within a reasonable timeframe; we will clarify this limitation in the manuscript.

      We will incorporate relevant prior work (e.g., George and Ricci, 2026) demonstrating minimal Annexin V labeling prior to P6 and lack of PS externalization in TMC1/2 double knockout models.

      We will clarify that hearing thresholds for TMEM30B-HA and ATP8B1-HA lines will be addressed in this study, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      We will soften statements regarding HA-tag insertion and clarify that, to our knowledge, localization and function are not disrupted, while acknowledging this as a potential limitation.

      We will revise the Methods section to clarify differences in fluorescence measurements across experiments.

      In addition to the experiments in response to the reviewers’ suggestions, we will add the following data that we generated while the paper was in review:

      Distortion product otoacoustic emissions (DPOAEs) of the Atp8b1 KO and Tmem30b KO mice. Consistent with OHC function, their DPOAE thresholds were elevated.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1D.

      The authors should clarify how the qPCR data were normalized and specify the reference (housekeeping) genes used. This information is necessary to evaluate the robustness and comparability of the gene expression data.

      We thank the reviewer for this comment. qPCR data were normalized to GAPDH as the reference (housekeeping) gene. We will clarify this in the Methods section to ensure transparency and reproducibility.

      (2) Figure 1F.

      The lack of F-actin staining at the hair cell base raises the possibility that the permeabilization conditions may have limited antibody access to certain membrane regions. This is especially important given that the authors used a gentle permeabilization agent such as saponin to preserve membrane integrity. Because the authors conclude that ATP8B1 and TMEM30B are localized "almost exclusively to OHC bundles and the apical membrane, with minimal staining in the remaining plasma membrane" (line 128), including co-labeling with a plasma membrane marker or more comprehensive F-actin visualization of lateral and basal regions would help ensure that the restricted localization is biological rather than technical. In the absence of such controls, the localization claim may be somewhat overstated and should be tempered accordingly.

      We appreciate this important point. The image shown represents a single z-slice from a larger stack, and the hair cell body lies outside the plane of this section. To clarify this, we will revise the figure presentation. Specifically, we can provide the full z-stack (already available via OSF) and/or replace the image with a resliced whole-mount view to better visualize the full cellular context.

      In terms of the possibility that the lack of staining in the hair cell’s plasma membrane might be due to insufficient antibody penetration, we routinely perform Prestin (located in the OHC plasma membrane) staining after saponin-mediated permeabilization and have never experienced antibody accessibility issues. Nevertheless, we will perform co-labeling for Prestin and include it in the new submission.

      (3) Figure 7B.

      Although quantification of ATP8B1-HA intensity at the bundle appears similar between WT and Cib2 KO samples, the representative image suggests that some bundles lack detectable labeling. To better capture phenotype variability, it would be helpful to include an additional quantification showing the fraction or number of bundles with detectable ATP8B1-HA signal in Cib2 KO mice.

      We thank the reviewer for this suggestion. To better capture variability, we will include an additional quantification measuring the fraction of hair cell bundles with detectable ATP8B1-HA and TMEM30B-HA signal per field of view. This analysis will complement the existing intensity-based quantification.

      (4) Lines 346-349

      The manuscript suggests that IHCs lack stereocilia-enriched P4-ATPases. However, this conclusion is not directly supported by the presented data. The authors should either provide supporting localization or expression data for other P4-ATPases or soften the statement to indicate that no stereocilia-enriched P4-ATPases were detected under the conditions examined.

      We agree with the reviewer and will revise this statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      Recommendations:

      (5) The authors convincingly demonstrate that TMEM30B loss results in ATP8B1 mislocalization. While not essential to the central conclusions, examining TMEM30B localization in ATP8B1 KO hair cells would clarify whether this interdependence is reciprocal, as described for other P4-ATPase-CDC50 complexes.

      We appreciate this insightful suggestion. However, performing this experiment would require generating a compound mouse line (crossing TMEM30B-HA into the ATP8B1 knockout background), which is not feasible within the revision timeframe. Additionally, the lack of a robust commercial antibody for TMEM30B further complicates this approach. We will note this as a future direction in the revised manuscript.

      (6) Lines 359-374.

      The discussion of Annexin V labeling is careful and balanced. This paragraph would benefit from referencing other studies that showed minimal Annexin V labeling in healthy P6 organ of Corti, reinforcing that robust PS externalization in the present study is pathological rather than developmental.

      We thank the reviewer for this suggestion and will incorporate relevant prior work, including George and Ricci (2026), which demonstrates minimal Annexin V labeling prior to P6, and further supports our interpretation.

      (7) Lines 392-399.

      The proposed feedback model linking MET activity and ATP8B1-TMEM30B localization is compelling. The discussion could be strengthened by noting that in TMC1/2 double knockout hair cells, PS externalization is not observed, consistent with the idea that flippase activity becomes critical specifically when scrambling occurs. The mislocalization observed in Cib2 KO hair cells further supports the coupling between TMC-mediated scrambling and flippase-mediated membrane restoration.

      We agree and will expand the discussion to include that TMC1/2 double knockout hair cells do not exhibit phosphatidylserine externalization, supporting the idea that flippase activity becomes critical in the context of scrambling.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Are the HA tags causing any functional issues? Function and localization of tagged proteins can sometimes be compromised. It would be good to know, for each knock-in model (TMEM30B, ATP8B1, ATP8A1, ATP8A2, and ATP11A), whether the HA-tagged protein is causing any issues with the mice and particularly with hearing (ABRs). Are these mice normal? Can they hear? These data are missing.

      We thank the reviewer for raising this important point. In this study, we will focus on TMEM30B-HA and ATP8B1-HA mouse lines, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      Both TMEM30B-HA and ATP8B1-HA mice are viable and exhibit normal breeding and aging. Preliminary (pilot) ABR measurements indicate wild-type–like hearing thresholds. We agree that this is important and will attempt to raise sufficient mouse numbers (in the time given) for a properly powered ABR analysis in the revised manuscript.

      (2) Following on the point above, is it possible that ATP8B1-HA is well localized, but localization for the other three flippases (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA) is compromised by the tag? Is this potential mislocalization causing any functional phenotypes? (ABRs of point 1). I find it surprising that there are flippases only in outer hair cells and only formed by ATP8B1. A possible explanation is that the tag is interfering with trafficking. If so, there should be a phenotype (ABRs), although this might be masked by redundancy among these flippases or caused by systemic issues (admittedly difficult to sort out). Given that this manuscript will likely become foundational, and that there is evidence that at least two of the other flippases are involved in hearing loss, it would be good to provide more information about the mice and HA-tagged proteins in the other knock-ins (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA). Depending on the data available for the knock-ins, the authors may want to discuss these scenarios and soften the statement indicating that inner-hair cells may lack flippase activity altogether.

      We appreciate this concern. To our knowledge, the HA tag does not appear to disrupt localization or function of the tagged proteins. However, we agree that this cannot be fully excluded. We will therefore soften our conclusions about IHC flippases and clarify that additional flippases (ATP8A1, ATP8A2, ATP11A) are under investigation and will be described in a separate study.

      (3) Expression of ATP8B1 at P0 (Figure 1D), when there should not be protein in outer hair cells yet, seems high. Does this mean that other cells in the cochlea also express ATP8B1? Is this a concern?

      We thank the reviewer for this observation. We interpret the elevated signal at P0 as reflecting transcription preceding detectable protein expression. While expression in other cochlear cell types is possible, we have not observed detectable ATP8B1 localization outside hair cells using the HA-tagged model. We will clarify this point in the manuscript.

      (4) Fluorescence scales in Figure 6 B and D and Figure 7 B and D are very different. So are the values for WT. One would expect that the WT would be similar in all cases (at least within the same compartments), given that the methods section indicates that "All images were collected using identical acquisition parameters, including zoom and laser power, across genotypes". If WT shows such variability, how can we compare?

      We appreciate the need for clarification. Identical acquisition parameters were maintained within each experiment used for direct comparison (e.g., within a given panel). However, different panels (e.g., Figures 6B vs. 6D) were acquired on different days using different imaging settings.

      We will revise the Methods section to explicitly state this and clarify that comparisons are intended only within panels, not across experiments.

  2. Apr 2026
    1. eLife Assessment

      This important study examines the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. It provides convincing evidence for a stable mapping of the visual field in V1, alongside changes of the readout from V1 into V3, which shows revised receptive field location and size. This paper would be of interest to scientists studying the visual system, brain plasticity, and development.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher retinotopic areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study of Molz et al., but I believe, given anatomical variability, the larger n in this study, and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work.

      *Effects of eye-movements

      The authors have carried out the eye-movement analyses I asked of them. Unfortunately, in 4 individuals they couldn't calibrate the eyetracker (it's impressive they managed in 10). I think this means that 4 of 13 (since a different participant was excluded from head motion) individuals weren't included in correlation analyses. Limiting the correlation analysis to individuals with better fixation has obvious issues. I'd recommend redoing (or additionally including) stats using non-parametric measures while classifying these 4 as having fixation instability of 3 (i.e. greater instability than the participant with the worst fixation who was successfully calibrated).
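      The recommendation above can be sketched in a few lines. This is a minimal illustration only, not the authors' analysis: the function names and the treatment of missing values as `None` are assumptions, and the ceiling of 3 is taken from the reviewer's suggestion. Because Spearman's correlation depends only on ranks, assigning all uncalibrated participants one shared value above the worst calibrated participant places them in a tied top rank.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def impute_worst(instability, ceiling=3.0):
    """Replace missing (None) instability values with a shared ceiling.

    The ceiling must exceed every calibrated value so that the imputed
    participants rank as the least stable (hypothetical convention).
    """
    observed = [v for v in instability if v is not None]
    if observed and ceiling <= max(observed):
        raise ValueError("ceiling must exceed the worst calibrated value")
    return [ceiling if v is None else v for v in instability]
```

      With this in place, `spearman_rho(impute_worst(instability), prf_size)` would include all participants; the tied top ranks reduce sensitivity somewhat but avoid restricting the sample to individuals with better fixation.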

      *Interpreting pRFs

      The paper would be strengthened by a little more explicit clarity about what pRFs represent and how that affects their interpretation of their findings as plasticity vs. non-plasticity (I know the authors are aware of this, but I think it would be helpful for readers who are less experienced in pRFs). In the introduction it would be helpful to point out that pRFs represent the collective response of a large population of neurons, and as a result pRF estimates can vary depending on which population of neurons that stimulus drives.

      For example, imagine for the sake of argument that rods only project to V1 neurons with larger receptive fields. If one measured pRFs in a control observer under photopic vs. scotopic conditions one would see smaller pRFs in the photopic conditions. This wouldn't represent 'plasticity' - it would represent the fact that the firing neurons contributing to the pRF signal are a slightly different population because of a change in the stimulus content. This is of course exactly what you see in 2C. And indeed, the authors make this identical point: "In the non-selective condition, the smaller pRFs in controls are in line with the higher spatial resolution of the cone system, which is not active in the achromat group." But this point would be clearer if more of the conceptual underpinnings were made explicit in the introduction (or at this point in the paper).

      Shifts in which population of neurons drive your pRFs can explain many of the more puzzling results in the paper without detracting from your main conclusions. For example, in 2D, I don't think it's differences in S/N driving your results (pRFs are at least theoretically meant to be robust to S/N). If smaller RFs 'drop out' under low luminance and these smaller RFs also tend to be more central, then one would expect the control results of 1D. And I think a similar argument might even be made for the smaller difference in the rod monochromats.

      It would be possible to make the point of Figure 4B more simply if Figure 4B was replaced by additional Panels in Figure 2 simply showing V3 pRF sizes/eccentricity distributions. That would make the point that you don't see the same expansion in pRF sizes in V3 in a way that is just as clear, and is closer to the data.

      *Interpreting cRFs

      Similarly, I think the paper would be improved with more clarity about the underlying signal in CF modeling. Once again, I appreciate that the authors are familiar with this, but it will help the reader in interpretation. (And I do believe thinking carefully about this may alter your interpretations). CF receptive fields 'find' the region in V1 that best predicts the V3 signal in a given voxel. In resting state this likely represents a combination of:

      (1) visually driven signal - correlations that may or may not reflect connectivity but represent the fact that regions that represent the same region of visual space will be active at the same time.

      (2) global bilaterally symmetrical signal consisting of enhanced correlations between iso-eccentric regions (Raemaekers et al., 2014), which may arise from vasculature that symmetrically stems from the posterior cerebral artery (Tong et al., 2013; Tong and Frederick, 2014).

      (3) intrinsic neural fluctuations that are more strongly correlated between connected neurons. These are likely quite weak compared to the other contributions.

      I think if you ignore 2, (which is not likely to differ between rod mono and controls) and model 1 and 3, you might well see shifts in CFs towards the boundary of the scotoma - essentially the CF's location will be biased towards the region of V1 that has stronger correlations - which = the region which has a visual signal.

      I do find convincing the argument that you don't see the same shift in controls in the rod-selective condition. So I think the results of 4A are fine. But a little more clarity about 'what's under the hood' in CF modeling would be nice.

      *Interpreting the relationship between pRFs and cRFs

      So there's something here that confuses me. We are all agreed that V3 pRF sizes are similar across RM and control. V1 pRFs are larger in RM. It feels intuitive that smaller CFs would compensate but I can't make it make sense to myself when I think it through. Each pRF represents a combination of receptive field location scatter and bandwidth. You want to argue that eccentricity mapping looks pretty normal, so there's no reason to think increased rf scatter, and I can believe that (though I do think this assumption should be discussed explicitly).

      So far I think we agree.

      But let's think about what drives a CF during visual stimulation ... Specifically let's think about 'the pRF of the CF' (the region of visual space represented by the cluster of voxels in the CF). If pRFs for individual voxels in V1 are big, then the pRF for the CF is also going to be large. But we know that pRFs for V3 are normal size. So, the V3 CF will 'find' a smaller number of voxels in V1, in order to try to find the 'correct sized' CF pRF. Note that this explanation is very similar to yours. But doesn't require ANY 'intrinsic' connectivity. It's really just assuming the whole thing is driven by the visual signal and the CF size is determined by the ratio of the pRF sizes in V3 vs. V1.

      One possible solution would be to regress out the visual stimulus and redo this analysis based on the residuals.
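      As a rough illustration of what "regressing out the visual stimulus" could involve, the sketch below removes the least-squares fit of a single stimulus-driven predictor from a voxel time series. It is an assumption-laden toy, not a description of the authors' pipeline: `predictor` stands in for whatever stimulus-driven prediction one chooses (for instance, a fitted pRF model time course), and a real analysis would typically use a fuller design matrix with nuisance regressors.

```python
from statistics import mean

def residualize(signal, predictor):
    """Remove the least-squares fit of `predictor` (plus an intercept)
    from `signal` and return the residual time series.

    Both arguments are equal-length sequences of floats; `predictor` is a
    hypothetical stimulus-driven prediction for the same voxel.
    """
    ms, mp = mean(signal), mean(predictor)
    sxx = sum((p - mp) ** 2 for p in predictor)
    if sxx == 0:
        # A constant predictor explains nothing beyond the mean.
        return [s - ms for s in signal]
    beta = sum((p - mp) * (s - ms) for p, s in zip(predictor, signal)) / sxx
    return [(s - ms) - beta * (p - mp) for s, p in zip(signal, predictor)]
```

      Connective field models would then be fitted to the residuals of the V1 and V3 voxels instead of the raw time series, so that any remaining V3-V1 relationship is less likely to reflect shared stimulus drive alone.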

    3. Reviewer #3 (Public review):

      Summary:

      This study addresses a long-standing question in visual neuroscience concerning how the human visual system balances stability and plasticity when sensory input is altered from early in life. Using achromatopsia as a model of lifelong cone deprivation, the authors examine whether early visual cortex undergoes retinotopic reorganization to compensate for the absence of foveal cone input, or whether canonical retinotopic organization is largely preserved. By combining fMRI-based population receptive field (pRF) mapping with connective field (CF) modelling, the authors characterize changes across multiple hierarchical stages of visual processing.

      The main findings indicate that primary visual cortex (V1) shows no systematic remapping of the foveal projection zone, whereas extrastriate cortex, particularly V3, exhibits altered patterns of sampling from V1. The authors interpret these results as evidence for hierarchical adaptation, whereby downstream readout mechanisms adjust to make more efficient use of degraded rod-mediated input while preserving early-stage retinotopic organization.

      Strengths:

      A major strength of this work is the use of silent substitution to generate rod-selective stimuli. This approach enables a principled comparison between achromats and typically sighted controls by isolating rod-driven responses in both groups. In doing so, the study overcomes a key limitation of prior work, where differences in cortical organization could often be confounded by differences in photoreceptor class rather than reflecting neural reorganization per se. The inclusion of a rod-driven baseline in controls provides an important reference for distinguishing long-term adaptation from transient or stimulus-driven effects.

      Another notable strength is the integration of CF modelling alongside conventional pRF mapping. While pRF analyses alone suggest enlarged receptive fields in V1, consistent with reduced spatial resolution, the CF analysis offers a more mechanistic account by revealing changes in how V3 samples information from the V1 surface. This multi-level modelling approach moves beyond descriptive accounts of cortical map structure and provides a framework for interpreting how downstream areas may adjust their integration strategies under conditions of altered input.

      Weaknesses:

      Although the study is methodologically strong, the central claims regarding stability and compensatory plasticity require clearer conceptual framing and stronger empirical support. Stability is primarily defined as the absence of large-scale retinotopic remapping in V1, yet the presence of significantly enlarged V1 pRFs indicates substantial tuning-level plasticity at the input stage; distinguishing topographic stability from functional reorganization would therefore strengthen the interpretation. Moreover, the proposed compensatory mechanism raises a signal-processing concern, as reduced downstream sampling (smaller CFs in V3) cannot restore spatial information lost due to coarse upstream representations, and may instead limit integration. The mechanistic link between altered CF properties and normalization of extrastriate pRFs is not directly tested, as group differences are not shown to covary across individuals or visual field locations. Finally, the interpretation of these changes as compensatory implies functional benefit, yet no behavioral or performance measures are provided to establish that the observed reorganization preserves or enhances visual function, leaving open whether these effects reflect adaptive optimization or passive downstream consequences of altered input.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates the work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher visual areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study by Molz et al., but I believe, given anatomical variability (and the very large n in this study) and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      Strengths:

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work. I have a number of methodological comments but I hope they will be considered as constructive engagement - this work is highly technical with a large number of factors to consider.

      Weaknesses:

      (1) Effects of eye-movements

      I have some concerns with how the effects of eye-movements are being examined. There are two main reasons the authors give for excluding eye-movements as a factor in their results. Both explanations have limitations.

      (a) The first is that R2 values are similar across groups in the foveal confluence. This is fine as far as it goes, but R2 values are going to be low in that region. So this shows that eye-movements don't affect coverage (the number of voxels that generate a reliable pRF), but doesn't show that eye-movements aren't impacting their other measures.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: 1) as expected, gaze was less stable in achromats than controls; 2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masks signals in this region; 3) gaze instability was not correlated with pRF size and eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex - patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). This data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space. 

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data was filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
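      The calibration and instability steps quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, blink and velocity filtering are assumed to have been applied already, the affine map is fitted by ordinary least squares, and "standard deviation across 1-second windows" is read here as the population SD within each non-overlapping 1 s window, averaged over windows.

```python
from statistics import mean, median, pstdev

def median_fixation(samples):
    """Median (x, y) of the cleaned gaze samples for one fixation target."""
    xs, ys = zip(*samples)
    return (median(xs), median(ys))

def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_affine(recorded, targets):
    """Least-squares affine map from recorded gaze (x, y) to screen (x, y)."""
    design = [(x, y, 1.0) for x, y in recorded]
    # Normal equations (D^T D) w = D^T t, solved once per output coordinate.
    dtd = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):
        dtt = [sum(d[i] * t[k] for d, t in zip(design, targets)) for i in range(3)]
        coeffs.append(solve3(dtd, dtt))
    return coeffs  # [[a, b, c], [d, e, f]]: x' = a*x + b*y + c, y' = d*x + e*y + f

def apply_affine(coeffs, point):
    """Map one recorded gaze point into screen space."""
    x, y = point
    return tuple(c[0] * x + c[1] * y + c[2] for c in coeffs)

def fixation_instability(x_positions, sample_rate_hz):
    """Mean per-window SD of gaze x-position over non-overlapping 1 s windows."""
    n = int(sample_rate_hz)
    windows = [x_positions[i:i + n] for i in range(0, len(x_positions) - n + 1, n)]
    return mean(pstdev(w) for w in windows)
```

      In use, the five per-target medians from `median_fixation` would be passed as `recorded` to `fit_affine` together with the five known target locations, and every gaze sample of a run would be mapped with `apply_affine` before computing `fixation_instability`. Five points over-determine the six affine parameters, which is why a least-squares fit is used here.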

      We report the resulting new fixation data results as follows:

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees-of-freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub>=0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
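
The non-integer degrees of freedom quoted above are what an unequal-variance (Welch) t-test produces, and the within-group correlations are rank-based. A minimal sketch of both statistics is given below; the group sizes and values are synthetic placeholders rather than the study data, the helper names are illustrative, and the Spearman helper assumes no tied values.

```python
import numpy as np

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return float(t), float(df)

def spearman_rho(x, y):
    """Spearman correlation for data without ties: Pearson r of the ranks."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Synthetic fixation-instability values (degrees); NOT the reported data.
rng = np.random.default_rng(0)
controls = rng.normal(0.3, 0.05, size=20)
achromats = rng.normal(1.0, 0.5, size=10)
t_stat, df = welch_t(controls, achromats)

# A perfectly monotonic relationship gives rho = 1 regardless of spacing.
rho = spearman_rho([1, 2, 3, 4], [10, 20, 25, 100])
```

When one group is much more variable than the other, the Welch degrees of freedom fall towards the size of the more variable group minus one, which is why they can be far smaller than the pooled value.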

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movement. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective spearman-r<sub>(8)</sub>=0.37,p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective spearman-r<sub>(8)</sub>=0.09,p=0.8, Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      (b) The authors don't see a clear relationship between coverage and fixation stability. This seems to rest on a few ad hoc examples. (What happens if one plots mean fixation deviation vs. coverage, and sets the individuals who could not be calibrated as the highest value of calibrated fixation deviation? Does a relationship then emerge?)

      In any case, I wouldn't expect coverage to be particularly susceptible to eye-movements. If a voxel in the cortex entirely projects to the scotoma then it should be robustly silent. The effects of eye-movements will be to distort the size and eccentricity estimates of voxels that are not entirely silent.

      There are many places in the paper where eye-movements might be playing an important role. 

      Examples include the larger pRF sizes observed in achromats. Are those related to fixation instability?

      We thank the reviewer for their comment. As detailed in our previous response, we have now extracted fixation instability data from additional patients and have expanded our discussion of its potential effects throughout the manuscript.

      Given that fixation instability is expected to increase pRF size by a fixed amount, that would explain why ratios are close to 1 in V3 (Figure 4).

      We agree with the reviewer’s point that the ratio change on its own is not strong evidence of compensation; this analysis was meant to complement the CF result. The plot in Figure 4 is intended to reconcile the connective field (CF) and pRF results. Its purpose is to illustrate that even though larger pRFs in achromats might seem counterintuitive alongside their smaller V3 CF sizes, the pRF data do not contradict the CF findings; the two are in fact consistent with one another. We also agree that there are alternative explanations for the differences in pRF size, such as fixation stability, and we have now added this point to the text.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”
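
The alternative explanation mentioned in this paragraph is simple arithmetic: a fixed additive offset inflates small pRFs proportionally more than large ones, so the achromat-to-control ratio would move towards 1 higher in the hierarchy even without any compensation. The sketch below uses made-up pRF sizes and a made-up offset purely to illustrate this; the additive form follows the wording of the response and is a simplifying assumption.

```python
def size_ratio_with_offset(control_size_deg, offset_deg):
    """Ratio of (control size + fixed offset) to control size."""
    return (control_size_deg + offset_deg) / control_size_deg

# Hypothetical values: control pRF sizes of 1.0 deg (V1) and 2.0 deg (V3),
# with an assumed eye-movement-related offset of 0.5 deg in patients.
ratio_v1 = size_ratio_with_offset(1.0, 0.5)  # 1.5
ratio_v3 = size_ratio_with_offset(2.0, 0.5)  # 1.25
```

Under this assumption the ratio is closer to 1 in V3 than in V1, which is why the pRF ratio alone cannot separate compensation from fixation instability and the CF estimates carry the weight of the argument.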

      (2) Topography

      The claim of no change in topography is a little confusing given that you do see a change in eccentricity mapping in achromats. 

      Either this result is real, in which case there *is* a change in topography, albeit subtle, or it's an artifact. 

      Perhaps these results need a little bit of additional scrutiny. 

      One reason for concern is that you see different functions relating eccentricity to V1 segments depending on the stimulus. That almost certainly reflects biases in the modelling, not reorganization - the curves of Figure 2D are exactly what Binda et al. predict. 

      Another reason for concern is that I'm very surprised that you see so little effect of including/not including the scotoma - the differences seem more like what I'd expect from simply repeating the same code twice. (The quickest sanity check is just to increase the size of the estimated scotoma to be even bigger?).

      We thank the reviewer for their comment. We have double-checked our scotoma modelling, confirming its correct implementation. The results of the scotoma modelling are not identical to the full one, just similar (see below).

      Previous studies on “artificial scotomas” (such as the one reported by Binda et al.) have shown mixed results. While Binda and colleagues found that modelling artificial scotomas normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas. Moreover, it is unclear whether scotoma modelling is beneficial in clinical populations as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors. A recent achromatopsia study (Anderson et al. 2024) also found no change in pRF estimates with scotoma modelling.

      In our scotoma analyses, we found meaningful differences only in the non-selective condition in controls where cones in the rod-free zone are stimulated - which would be the main expected effect of this modelling exercise (see below). In all other conditions (rod-selective in controls, both conditions in achromats), only rods are stimulated, and we found no difference in coverage, eccentricity or pRF size when modelling the scotoma, likely because the foveal signal is weak or absent and did not contribute much to pRF estimates in the unmasked analyses.

      This means we cannot account for the eccentricity shift as an edge effect with this scotoma model – but we remain cautious about interpreting it as real. This is because first, as we mention in the paper, in the non-selective condition, which has a higher signal-to-noise ratio, the eccentricity estimates in achromats match those of the control group's rod system. Second, it is still possible that the observed shift is an artefact of modelling that was not accounted for by the approach of scotoma modelling.

      Our claim of "no change in topography" specifically referred to the absence of "filling-in" as measured by cortical coverage - the percentage of activated tissue regardless of fitted parameters. However, to avoid confusion given the eccentricity and pRF size results, we have now rephrased our claim.

      Abstract:

      “Cortical input stages (V1) exhibited high stability, with input-deprived cortex showing no retinotopic remapping and exhibiting structural hallmarks of deprivation.”

      Results (pRF eccentricity):

      “It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective spearman-r<sub>(8)</sub>=0.09,p=0.8, Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      To better illustrate the effect of scotoma modelling text has been added to Supplement 3:

      “Studies on artificial scotomas, where part of the visual field is masked, suggest that pRF estimates of eccentricity and size can be biased by fitting scotoma-edge artefacts, and that these can be mitigated by modelling the scotoma in the pRF fitting procedure (e.g., Binda et al. 2013).

      We therefore repeated the pRF modelling procedure with the rod-scotoma being modelled as a black oval mask (1.25°x0.9°) over the stimulus aperture model. As expected, a visible difference between the two models is only apparent in the non-selective condition in controls, where the cones in the rod-free zone are being stimulated. In all the other conditions (rod-selective in controls, and both stimulation conditions in achromats) only the rods are stimulated, therefore the masked stimulus still matches the retinal activation, and no major differences can be observed. Performing the same statistical tests applied to the full model in the main text yields equivalent results: coverage is equivalent across groups in the rod-selective condition (t(47) = 0.78, p=0.43, BF10=0.31), and controls show a higher coverage than achromats in the non-selective stimulation condition (Mann-Whitney U(52)=141, p<0.01; unequal variance, reverted to non-parametric).

      This consistency in pRF properties when modelling the rod scotoma is in line with previous results from scotoma modelling: while Binda and colleagues found that this normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas, and as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors, it is unclear how artificial scotoma findings generalise to clinical populations. Our results are in line with a recent achromatopsia study (Anderson et al. 2024) which also found no change in pRF estimates with scotoma modelling.”
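
Modelling the scotoma amounts to blanking a central oval in the stimulus aperture that is fed to the pRF model. A minimal sketch is shown below; it assumes that 1.25° x 0.9° are the full width and height of the oval (the text does not say whether they are diameters or radii), and the aperture size, field extent and function name are illustrative only.

```python
import numpy as np

def mask_aperture(aperture, extent_deg, width_deg=1.25, height_deg=0.9):
    """Zero a central ellipse in a (..., n_y, n_x) binary stimulus aperture.

    extent_deg is the half-width of the modelled visual field, so the grid
    spans -extent_deg..+extent_deg on both axes.
    """
    n_y, n_x = aperture.shape[-2:]
    ys = np.linspace(-extent_deg, extent_deg, n_y)
    xs = np.linspace(-extent_deg, extent_deg, n_x)
    xx, yy = np.meshgrid(xs, ys)
    inside = (xx / (width_deg / 2)) ** 2 + (yy / (height_deg / 2)) ** 2 <= 1.0
    return np.where(inside, 0, aperture)  # mask broadcasts over time frames

# Hypothetical aperture: 3 time frames, 101 x 101 pixels, +/-10 deg field.
aperture = np.ones((3, 101, 101))
masked = mask_aperture(aperture, extent_deg=10.0)
```

If only rods drive the response, the masked and unmasked apertures describe nearly the same effective retinal stimulation, which is consistent with the small differences the authors report outside the non-selective control condition.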

      I'd also look at voxels that pass an R2>0.2 threshold for both the non-selective and selective stimulus. Are the pRF sizes the same for both stimuli? Are the eccentricity estimates? If not, that's another clear warning sign.

      Comparable results were obtained when using higher R2 thresholds. These results are now included in Supplement 6.

      (3) Connective field modelling

      Let's imagine a voxel on the edge of the scotoma. It will tend to have a connective field that borders the scotoma, and will be reduced in size (since it will likely exclude the cortical region of V1 that is solely driven by resting state activity). This predicts your rod monochromat data. The interesting question is why this doesn't happen for controls. One possibility is that there is top-down 'predictive' activity that smooths out the border of the scotoma (there's some hint of that in the data), e.g., Masuda and Wandell.

      One thing that concerns me is that the smaller connective fields don't make sense intuitively. When there is a visual stimulus, connective fields are predominantly driven by the visual signal. In achromats, there is a large swath of cortex (between 1-2.5 degrees) which shows relatively flat tuning as regards eccentricity. The curves for controls are much steeper; see Figure 2b. This predicts that visually driven connective fields should be larger for achromats. So, what's going on?

      The reviewer raises interesting points about the interpretation of our connective field results. The possibility of differential top-down modulation between controls and achromats is intriguing; however, it is not supported by the data. If top-down modulation were activating foveal V1 in controls, we should not see a drop in the number of significant vertices sampling from the fovea in the rod-selective condition compared to the non-selective condition, but in fact we see quite a large drop in that area. Therefore, at the moment we do not think there is a strong basis to assume our data could be explained by achromats lacking top-down predictive activity in the scotoma area that is present in controls.

      Regarding the concern about smaller CFs seeming counterintuitive given the flat eccentricity tuning in achromats' V1: we believe there is not a straightforward prediction from pRF properties to CF sizes. The relationship between V1 pRF characteristics and V3 CF sampling is complex and not well-established in the literature, and the two can be decoupled to some degree. For instance, in our data, controls show flat V1 pRF sizes in the rod-selective condition (similar to achromats), yet their V3 CF sizes maintain the typical eccentricity-dependent increase seen in the non-selective condition. This suggests that CF size patterns don't simply mirror V1 pRF properties or visual stimuli responses.

      Importantly, CF modelling fundamentally differs from pRF analysis in how it might be affected by scotomas. Unlike pRF analysis where a scotoma creates a "silent" region in visual space, in CF modelling the deprived cortex remains physically present and continues generating neural signals (albeit not visually-driven ones). If V3-V1 connectivity were anatomically fixed, V3 would continue sampling from deprived V1 regions even if they do not produce visual-driven signals. A change in this sampling pattern, as we see in our data, is therefore evidence for plasticity.

      Our data support this interpretation. First, in achromats, the CF size pattern observed cannot be easily explained by scotoma-edge artefacts. V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The effect is only significant further away from the scotoma (4°-6°).

      Second, to assess how the presence of a scotoma affects CF measures, we can compare the two conditions in the controls, since the rod-selective condition has a scotoma present and the non-selective condition does not. For this purpose, we performed an additional analysis, quantifying on a vertex-by-vertex level the differences in CF fitted parameters between the two stimulation conditions across V1. See results below. In achromats there are no systematic shifts between the stimulation conditions, as expected since both are rod-driven. In controls, this analysis reveals only subtle shifts (~0.45° in the rod-selective condition). CF size also changed slightly, although the change was not significantly different from that observed in achromats. These shifts are much smaller than the CF size and eccentricity differences between controls and achromats, so we consider it unlikely that our findings are driven by scotoma artefacts.

      Author response image 1.

      Results (CF size):

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.

      To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      The beta parameter is not described (and I believe it can alter connective field sizes).

      In Author response image 2, we plot the beta parameter of the pRF modelling in V1 with no R<sup>2</sup> filtering, error bars are 95% CIs:

      Author response image 2.

      The reviewer did not specify how beta might alter connective field sizes. We assume they meant that, as in pRF mapping, the slope of activity from deprived to non-deprived cortex will artefactually create a CF model fit with smaller CF sizes. To test this, we calculated the slope of beta values between 0° and 3° in each participant in the rod-selective condition, as this range includes the scotoma and the area at the edge of the scotoma. We then used the slope as a covariate in an ANCOVA when comparing the CF sizes across groups in each sampled V1 segment. Accounting for the beta slope of V1 did not change the reported results. This analysis still shows smaller CF sizes in V3 in the rod-selective condition between 4°-6° eccentricity – these differences remain significant (p<0.001 for 4°-5° and p<0.05 for 5°-6° when comparing achromats vs controls).
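
An ANCOVA of this kind is a linear model with a group term and a continuous covariate. The sketch below fits such a model by ordinary least squares and returns the covariate-adjusted group difference with its t statistic. It is an illustration under stated assumptions: the data are simulated, the function name is hypothetical, and the authors' actual software and model specification are not described in the response.

```python
import numpy as np

def ancova_group_effect(cf_size, group, covariate):
    """OLS fit of cf_size ~ intercept + group + covariate.

    Returns the adjusted group coefficient and its t statistic.
    """
    y = np.asarray(cf_size, dtype=float)
    design = np.column_stack([
        np.ones_like(y),
        np.asarray(group, dtype=float),      # 0 = control, 1 = achromat
        np.asarray(covariate, dtype=float),  # e.g. per-participant beta slope
    ])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return float(beta[1]), float(beta[1] / np.sqrt(cov[1, 1]))

# Simulated data: a true group difference of -1 plus a covariate effect.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 20)
beta_slope = rng.normal(0.0, 1.0, size=40)
cf_size = 3.0 - 1.0 * group + 0.5 * beta_slope + rng.normal(0.0, 0.2, size=40)
effect, t_stat = ancova_group_effect(cf_size, group, beta_slope)
```

If the group coefficient stays reliably non-zero once the covariate is in the model, the group difference cannot be attributed to the covariate alone, which is the logic of the control analysis described above.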

      Similarly, it's possible to get very small connective fields, but there wasn't a minimum size described in the thresholding.

      CF sizes were fit with a grid fit. Possible values were [0.5,1,2,3,4,5,7,10]. Therefore, the minimum size is 0.5. Filtering out the smallest connective field sizes does not change the results:
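
A grid fit over a fixed list of candidate sizes can be sketched as below. This simplified version assumes the CF centre is already known and searches only over size, whereas a full connective field fit would also search over centre location. The distances, time series and noise level are synthetic, and the candidate sizes are treated as being in the same units as the cortical distances.

```python
import numpy as np

CANDIDATE_SIZES = np.array([0.5, 1, 2, 3, 4, 5, 7, 10], dtype=float)

def fit_cf_size(target_ts, source_ts, dist_to_center, sizes=CANDIDATE_SIZES):
    """Pick the candidate size whose Gaussian-weighted source signal best
    explains the target time series (highest squared correlation)."""
    best_size, best_r2 = None, -np.inf
    for size in sizes:
        weights = np.exp(-dist_to_center ** 2 / (2 * size ** 2))  # (n_source,)
        prediction = source_ts @ weights                           # (n_time,)
        r2 = np.corrcoef(prediction, target_ts)[0, 1] ** 2
        if r2 > best_r2:
            best_size, best_r2 = float(size), float(r2)
    return best_size, best_r2

# Synthetic example: 200 time points, 300 source (V1) vertices, true size 3.
rng = np.random.default_rng(2)
source_ts = rng.standard_normal((200, 300))
dist_to_center = np.linspace(0.0, 30.0, 300)
true_weights = np.exp(-dist_to_center ** 2 / (2 * 3.0 ** 2))
target_ts = source_ts @ true_weights + 0.1 * rng.standard_normal(200)
best_size, best_r2 = fit_cf_size(target_ts, source_ts, dist_to_center)
```

Because the estimate can only take one of the listed values, the smallest candidate acts as the minimum CF size, which is the point made in the response.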

      Author response image 3.

      I might be missing something obvious, but I'm just deeply confused as to how the visual maps and the connectome maps can provide contradictory results given that the connectome maps are predominantly determined by the visual signal. Some intuition would be helpful.

      We agree that this appears counterintuitive, and have now added further clarification. The two models (pRF and CF) fundamentally differ in what they measure and how they relate to visual processing. V1 pRF sizes reflect the relationship between neural activity and visual stimuli - essentially how much of a visual stimulus drives a voxel's response - while V3 CF sizes reflect how V3 samples from the V1 cortical surface, indicating how many V1 voxels contribute to a V3 voxel's activity.

      The measures constrain each other, as a V3 voxel's pRF size is expected to match the pooling of its connected V1 inputs. But they can be decoupled: A V3 voxel could sample from a small area of V1 cortex (a small CF in mm) that happens to represent a large area of visual space if those V1 voxels have large pRFs. The aim of Figure 4B is to clarify that the measures are consistent with one another even though they diverge in direction. In achromats, where V1 voxels have larger pRFs (coarser spatial resolution), V3 appears to compensate by sampling more selectively from V1 via smaller CF sizes. Theoretically, this should reduce the pRF size difference between controls and patients in V3, a prediction that our data supports.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Some analyses might also help provide the reader with insight. For example, doing analyses separately on V3 voxels that project entirely to scotoma regions, project entirely to stimulus-driven regions, and V3 voxels that project to 'mixed' regions.

      We agree that it is important to plot the connective field dynamics across the scotoma region.

      In Figure 4A we split the V3 vertices based on the V1 area they sample from. Therefore the 0°-1° segment would be considered as mainly sampling from the “scotoma” region, and the higher the eccentricity is, the less “scotoma” it includes. The V3 vertices that have a significantly smaller CF size compared to controls are those sampling from mostly if not entirely stimulus-driven regions (4°-5° and 5°-6°). We are not sure how further binning the data by within, across and outside scotoma would be more informative.

      However, in Author response image 4, we plot in more detail the distribution of CF sizes sampling from a V1 segment clearly inside and clearly outside the scotoma. The top figure shows the CF size distribution of V3 vertices that sample from a V1 0°-1° segment, where V1 is deprived of input due to the rod scotoma. In achromats, there is a clear drop in vertices with a very small (0.5) CF size. The bottom figure shows the distribution of V3 vertices that sample from the V1 4°-5° segment which falls outside the scotoma and shows a significant difference in CF size across the groups. Here in achromats you can see a drop in larger V3 CF sizes sampling from the V1 region, and an increase in smaller ones (note that this further addresses a previous concern that connective field differences across groups are solely driven by very small CFs).

      Author response image 4.

      Following the reviewer’s comment we have added the following statement in the results section discussing CF size:

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.”

      The finding that pRF sizes are larger in achromats by a constant factor as a function of eccentricity is what differences in eye-movements would predict. It would be worth examining the relationship between pRF sizes and fixation stability.

      We found no relationship between fixation stability and pRF size in V1, although as we explain in response to an earlier point, this does not fully exclude the reviewer's alternative explanation, which we have now added to the discussion.

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Reviewer #2 (Public review):

      Summary:

      The authors inspect the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. They report increased cortical thickness in central V1 (eccentricities 0-2 deg) and the expansion of this effect to V2 (trend) and V3 in a cohort whose average age falls in adolescence.

      In analyzing the receptive fields, they show that V1 had increased receptive field sizes in achromats, but there were no clear signs of reorganization filling in the rod-free area. In contrast, V3 showed an altered readout of V1 receptive fields. V3 of achromats oversampled the receptive fields bordering the rod-free zone, presumably to compensate and arrive at similar receptive fields as in the controls.

      These findings support a retention of peripheral-V1 connectivity, but a reorganization of later hierarchical stages of the visual system to compensate for the loss, highlighting a balance between stability and compensation in different stages of the visual hierarchy.

      Strengths:

      The experiment is carefully analyzed, and the data convey a clear and interesting message about the capacities of plasticity. 

      Weaknesses:

      The existence of unstable fixation and nystagmus in the patient group is alluded to, but not quantified or modeled out in the analyses. The authors may want to address this possible confound with a quantitative approach.

      We have responded to this in the “Recommendations for the authors” section of this reviewer, as they included a more detailed description of these points there.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think the term rod monochromats should be included early in the paper since it's a more intuitive term to describe this population.

      We agree with the reviewer that the term “rod monochromats” is more intuitive as it clarifies the retinal source of the disease but have chosen the term achromats for consistency with a wide literature of published work in this group, including our own and our close collaborators’. To clarify, in the first mention of the group as achromats in the introduction we have now added this term:

      “Achromatopsia (also known as rod monochromacy) causes cone photoreceptors in the retina to be inactive from birth (Aboshiha et al., 2014).”

      (2) The paper essentially contains two definitions of 'eccentricity'. One (atlas/segments) comes from the Benson atlas and the other (functional) comes from pRF mapping. It would be good to make this distinction clearer earlier in the paper. It would also be good to use more consistent terminology. I assume 'sampled atlas V1 eccentricity' in 3A is the same as 'V1 segment' in 1A?

      For consistency we have now referred to these as V1 segment and sampled V1 segment in the figures when describing the atlas-based definition, and eccentricity for the measured pRF-based eccentricity.

      (3) The 'stability vs. plasticity' framing in the introduction could be tightened slightly.

      We have made the following changes following the reviewer’s comment:

      “In the visual domain, the focal point of the debate on plasticity and stability has hinged on the extent to which retinal input deprivation can drive local reorganisation in early visual cortex, for example, for deprived tissue to take on inputs from spared retinal locations (Adams et al., 2007; Baker et al., 2005, 2008; Baseler et al., 2002, 2011; Calford et al., 2005; Dilks et al., 2009; Dumoulin & Knapen, 2018; Ferreira et al., 2016; Goesaert et al., 2014; Haak et al., 2015; Molz et al., 2023; Ritter et al., 2019; Schumacher et al., 2008). In reality, visual impairment is a more global phenomenon, affecting all levels of visual processing, with complex dynamics beyond constricted local retinocortical projection zones (Carvalho et al., 2019).”

      (4) Figure 1A, define the x axis as degrees.

      We have now added the ° sign to all the tick labels indicating Benson map eccentricity.

      (5) Figure 2B, is there room for pictures of the silent substitution/standard stimulus?

      We have now added images in Supplement 5 to avoid cluttering the main Figure 2B.

      (6) Figure 2

      Panel A has a slightly weird organization. The reader is supposed to compare the square symbols to each other, and the circles to each other, why not organize the figure so they are adjacent in the graph (i.e. non selective control, non-selective achromat, selective control, selective achromat)? That also helps the reader orient that in the non-selective conditions you have almost complete pRF coverage. 

      We have taken on the reviewer’s suggestion and changed the order.

      In the inset, maybe use empty symbols? That's the traditional way to say that the square/circle applies to both red and black.

      We prefer the current format.

      Figure 2C - the symbols change to circles? Why not keep the symbols of A?

      We have now changed the symbols of 2C&D.

      I'd put the non-selective maps above the selective maps?

      We appreciate the feedback but prefer to keep it as it is, as we feel the critical point is conveyed by the rod maps.

      (7) 'We propose a new hierarchical model of neural adaptation'. These ideas are hardly new. There are also other models, that would explain your data (cumulative plasticity) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5953572/

      We thank the reviewer for the reference. We have now cited it in our discussion and removed the word “new” from the mentioned sentence.

      “Therefore, there is theoretically broader scope for experience-dependent reweighting of inputs (Beyeler et al., 2017; Makin & Krakauer, 2023) and to optimise use of inputs that are still available, more reliable, or more relevant in the impaired system. Conversely, higher-order visual areas may appear more plastic simply because they integrate the cumulative effects of learning from multiple lower stages (Beyeler et al., 2017).”

      “We propose a hierarchical model of neural adaptation…” [deleted the word “new”]

      (8) Line 508. No image of the stimulus is contained in the paper

      Corrected

      (9) Line 620. I believe the Figure is 1B, not 1C.

      Corrected

      (10) Figure 4A. CF Size - add mm2 to the axes.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I am not an expert on pRF mapping, and as such, I am unsure how to relate to pRF mapping performed in patients with unstable fixation (not quantified, but referred to) and nystagmus, such as the achromatic population here. Since the majority of the results hinge on this analysis, I would appreciate more data about the differences between the groups. Supplement 2, which is meant to speak to this, shows only the data from 3 typical participants, and in itself is not evidence for "no correlation between stable fixation and enhanced foveal". Additionally, I'd appreciate a clear methods explanation of how the authors address these confounds; this is too important a concern to be left for the discussion section.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: 1) as expected gaze was less stable in achromats than controls, 2) achromats with more stable gaze did not show more activation in the scotoma projections zone, which we might have observed if fixation instability masks signals in this region 3) Gaze instability was not correlated with pRF size and eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex - patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). These data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data were filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
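
The calibration and stability steps quoted above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the least-squares solver, and the choice to average the per-window standard deviations are assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import statistics

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / m[i][i]
    return x

def fit_affine(recorded, targets):
    """Least-squares affine map from recorded gaze (x, y) to screen (x, y).

    Returns [[a, b, c], [d, e, f]] with x' = a*x + b*y + c, y' = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in recorded]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):  # fit the x' and y' outputs separately
        atb = [sum(r[i] * t[k] for r, t in zip(rows, targets)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs

def apply_affine(coeffs, point):
    """Map one recorded gaze sample into screen space."""
    x, y = point
    return tuple(c[0] * x + c[1] * y + c[2] for c in coeffs)

def fixation_instability(x_positions, sample_rate_hz, window_s=1.0):
    """Mean SD of horizontal gaze over non-overlapping windows (assumed aggregation)."""
    n = int(round(sample_rate_hz * window_s))
    windows = [x_positions[i:i + n] for i in range(0, len(x_positions) - n + 1, n)]
    return statistics.mean(statistics.pstdev(w) for w in windows)
```

In use, `recorded` would hold the five median gaze positions from the fixation task and `targets` the five known screen locations; the fitted coefficients are then applied to every gaze sample of a run before computing the instability measure.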

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub>=0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
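
The fractional degrees of freedom reported for the group comparison (9.08 and 9.41) are what an unequal-variance (Welch) t-test produces. As a minimal sketch, assuming the standard Welch–Satterthwaite formula and a hypothetical helper name `welch_t`, the statistic and its degrees of freedom can be computed as follows:

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Squared standard errors of each group mean (sample variance / n).
    sa = statistics.variance(a) / len(a)
    sb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    return t, df
```

When one group is far more variable than the other, the degrees of freedom fall toward that group's n − 1, which would be consistent with values near 9 if the 10 achromats with usable gaze data dominate the variance.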

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective: spearman-r<sub>(8)</sub>=0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub>=0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”
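
For readers less familiar with the statistic, the Spearman correlations reported above are Pearson correlations computed on ranks. The sketch below is illustrative only: the helper names are hypothetical and ties receive average ranks, which is the usual convention but is not stated in the response. The subscript 8 presumably reflects n − 2 for the 10 achromats with usable gaze data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

With only 10 participants, a coefficient as large as 0.58 can still miss conventional significance, so the null results here limit, rather than exclude, a contribution of gaze instability.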

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      The field connectivity analysis similarly seems to be used only on task data from the same design; if it was replicated from resting-state data, that would be a good way to show consistency which is independent of measures requiring fixation. 

      We agree that resting-state data would be valuable; however, we did not collect such data in these individuals due to time limitations. Instead, we demonstrate the consistency and reliability of our results by replicating our findings across two different stimulation conditions (rod-selective and non-selective), which differ in luminance, contrast and signal amplitude in both groups and for controls also in the photoreceptors involved. The convergence of results across these distinct visual conditions strengthens our confidence in the reliability of the observed effects. Also, notably, CF estimates have been shown to be robust to large eye movements, and therefore also to differences in fixation stability across groups (Tangtartharakul et al., 2023).

      The authors may want to contextualize their findings in relation to what reorganization exists in cases of late-onset loss of part of the visual field on one hand (stroke recovery), and in the case of complete blindness from early life on the other, as both speak to different levels of plasticity the visual system is capable of.

      We thank the reviewer for their comment and have added a new paragraph discussing this topic.

      Discussion:

      “Our findings on hierarchical adaptation have broader implications for other visual disorders, depending on their timing and nature. For instance, a central scotoma acquired in adulthood, as in macular degeneration, may not trigger the same V3 sampling shifts (Haak et al., 2016), suggesting a sensitive window for this form of plasticity, after which connective fields remain more stable. This also raises questions about congenital blindness, where the absence of any driving input could lead to weakening or repurposing of hierarchical connections (Saccone et al., 2024). Moreover, principles may differ between a deprived but structurally intact cortex, as in retinal dystrophies, and a physically damaged cortex, as in stroke. In the latter, more extensive reorganisation may be required to sample effectively from surviving, and potentially disparate, regions of V1. Perceptual training effects in stroke rehabilitation may reflect such dynamics (Cavanaugh et al., 2025; Elshout et al., 2021).”

      A more minor point: Can the authors clarify what the dark adaptation is used for, and provide the supplementary analysis showing that the duration difference for some of the participants didn't impact the results (stated but not shown).

      The dark adaptation period before the rod-selective condition allowed rod photoreceptors to recover from bleaching caused by prior mesopic light exposure, ensuring optimal rod sensitivity under scotopic conditions. To verify that our 15-minute adaptation period was sufficient, we tested 10 control participants with an extended 45-minute adaptation period. As we found no differences in the resulting rod maps between standard and extended adaptation protocols, these participants were combined with the main control group for all analyses. Author response image 5 shows the plots for the two dark adaptation periods.

      Author response image 5.

    1. eLife Assessment

      This valuable study presents a hierarchical computational model that integrates locomotion, navigation, and learning in Drosophila larvae. The evidence supporting the model is convincing, as it qualitatively replicates empirical behavioral data. While some simplifications in neuromechanical representation and sensory-motor integration are limiting factors, the reported modular framework will be of interest for computational modeling of biological movement and adaptive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a three-layered hierarchical model for simulating Drosophila larva locomotion, navigation, and learning. The model consists of a basic locomotory layer that generates crawling and turning using a coupled-oscillator framework, incorporating intermittency in movement through alternating runs and pauses. The intermediate layer enables navigation by allowing larvae to actively sense and respond to odor gradients, facilitating chemotaxis. The adaptive learning layer integrates a spiking neural network model of the Mushroom Body, simulating associative learning where larvae modify their behavior based on past experiences. The model is validated through simulations of free exploration, chemotaxis, and odor preference learning, demonstrating close agreement with empirical behavioral data. This modular framework provides a valuable advance for modeling of larva behavior.

      Strengths:

      Every modeling paper requires certain assumptions and abstractions. The main strength of this paper lies in its modular and hierarchical approach to modeling behavior, making connections to influential theories of motor control in the brain. The authors also provide a convincing discussion of the experimental evidence supporting their layered behavioral architecture. This abstraction is valuable, offering researchers a useful conceptual framework and marking a significant step forward in the field. Connections to empirical larval movement are another major strength.

      Weaknesses:

      While the model represents a conceptual advance in the field, some of its assumptions and choices fall behind state-of-the-art approaches. One limitation is the paper's simplified representation of larval neuromechanics, in which the body is reduced to a two-segment structure with basic neural control. Another limitation is the absence of an explicit neuromuscular control system, which would better capture the role of segmental central pattern generators (CPGs) and neuronal circuits in regulating peristalsis and turning in Drosophila larvae. Many detailed neuromechanical models, as cited by the authors, have already been published. These abstractions overlook valuable experimental studies that detail segmental dynamics during crawling and the larval connectome.

      The strength of the model could also be its weakness. The model follows a subsumption architecture, where low-level behaviors operate autonomously while higher layers modulate them. However, this approach may underestimate the complexity of real neural circuits, which likely exhibit more intricate feedback mechanisms between sensory input and motor execution.

    3. Reviewer #2 (Public review):

      The paper proposes a hierarchically layered approach to larval locomotion, chemotaxis and learning. The model consists of a basic locomotor layer with two coupled oscillators, one for crawls and one for turns. The intermediate layer modulates the frequency and amplitude of turns to enable chemotaxis. The higher layer integrates a spiking neural network model of the Mushroom Body to modify the odor valence in response to experience, as during learning.

      The model is compared to experimental data with a good degree of agreement. This modular framework provides a valuable advance for modeling larva behavior.

      Strengths:

      A novel multilayer level model that reflects current thinking of the neuronal organisation of motor control. The model is very useful to investigate the neuronal architecture of central pattern generators and higher order motor control circuits that could be linked to larval connectome data.

      Weaknesses:

      All the limitations of the model are discussed and therefore the paper perfectly fits its purpose.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2 (Public review):

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.
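
The response does not spell out the preference index formula. A common definition in larval group assays, assumed here for illustration, is the difference between the number of larvae on the odor side and on the opposite side, divided by the total number of larvae including those in the middle zone. The function and argument names below are hypothetical.

```python
def preference_index(n_odor_side, n_other_side, n_middle=0):
    """Preference index in [-1, 1]; positive values indicate approach to the odor.

    Assumed definition: (odor side - other side) / total, where the total
    includes larvae that remain in the middle zone.
    """
    total = n_odor_side + n_other_side + n_middle
    if total == 0:
        raise ValueError("at least one larva is required")
    return (n_odor_side - n_other_side) / total
```

Because the same count-based formula can be applied to simulated end positions and to plate counts from an experiment, it allows the kind of direct comparison the authors describe, provided the zone boundaries match.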

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3 (Public review):

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading of the appendix. For the revised manuscript we thus aim at improving our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system—including our own—often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics—in this case, the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      See public review for main points. To summarize, I find the conceptual framework of the paper very valuable and an important advance. However, in this age of data, I would have expected that the authors would make an effort to build more realistic models that could relate directly to neural data (including connectome and activity) and muscular dynamics at the segmental level.

      This point is addressed in detail in our public review response. In brief, we agree that a segmental neuromechanical model informed by connectome data would provide richer mechanistic insight. However, such an approach would greatly increase complexity and reduce accessibility. Our aim here is to present a coarse-grained, kinematic-level framework that is modular, extensible, and designed to accommodate models at different levels of abstraction. Importantly, extensions that incorporate realistic neuromechanics or connectome-derived circuits can be readily integrated, provided they conform to the modular principles of the proposed behavioral architecture.

      The authors do not cite figures in order of appearance, which makes it hard to read.

      This has been corrected. Figures are now cited in the correct order throughout the revised manuscript.

      I would explain the model in more detail in the main text. Currently, the model is introduced through Figure 1 in an abstract way. It is really hard to make the connection between this figure to the nuts-and-bolts of neuromechanics. And, I believe, for this paper, the details of the modeling matter and are not just technical points to be hidden in the appendix. The video (video 1) is not helpful.

      We have restructured the Model section to provide more detail directly in the main text, moving explanations that were previously confined to the Appendix. This includes explicit description of the locomotory oscillator model, the intermittency module, and their empirical calibration. At the same time, we retained mathematical and implementation details in Materials & Methods to keep the reading flow accessible. Additionally, we expanded the caption of Video 1 and clarified in the text what it illustrates, making the video more informative.

      Modeling choices lead to further weaknesses. While the model can replicate observed locomotory patterns, it does not fully explain the underlying neurobiological mechanisms that govern behavioral intermittency. For example, the crawl-bend interference mechanism, while capturing observed phase-dependent attenuation of turning, is implemented in a simplified, statistical manner rather than being derived from detailed neuromuscular dynamics. The intermittent locomotion model, which generates alternating runs and pauses, relies on log-normal distributed stridechains but does not explicitly model neural mechanisms responsible for switching between movement states.

      We agree with this point. A fully mechanistic implementation of crawl-bend interference would require a detailed segmental neuromechanical model, which we deliberately refrained from integrating in order to keep the current study tractable and focused on a coarse-grained, kinematic-level description. Likewise, the intermittency module is currently based on data-fitted distributions of stridechains and pause durations, without explicit modeling of the neural mechanisms responsible for switching between these states. To our knowledge, these mechanisms remain unresolved, though alternative approaches have been suggested, for example, an artificial neural network model of intermittency (Sakagiannis et al., 2020). To ensure this limitation is transparent to the reader, we now explicitly state it in a newly added “Limitations of the study” subsection in the Discussion.

      We also highlight that the behavioral architecture is designed to be extensible, so that future work may incorporate such mechanistic models when available, while preserving the modular framework.
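
      The statistical (non-mechanistic) character of the intermittency module discussed above can be illustrated with a minimal sketch that alternates runs and pauses by drawing from fitted distributions. The function name and all parameter values are hypothetical and are not taken from the larvaworld package; pause durations are assumed to be log-normal here purely for simplicity, since the response only states that they follow data-fitted distributions.

```python
import random

def sample_bouts(n_bouts, rng, stride_mu=1.5, stride_sigma=0.8,
                 pause_mu=-0.5, pause_sigma=0.7):
    """Return an alternating list of ("run", n_strides) and ("pause", seconds).

    All distribution parameters are illustrative placeholders, not fitted values.
    """
    bouts = []
    for _ in range(n_bouts):
        # Stridechain length: log-normal draw rounded to a whole number of strides (at least 1).
        n_strides = max(1, round(rng.lognormvariate(stride_mu, stride_sigma)))
        # Pause duration in seconds (log-normal by assumption).
        pause_s = rng.lognormvariate(pause_mu, pause_sigma)
        bouts.append(("run", n_strides))
        bouts.append(("pause", pause_s))
    return bouts

rng = random.Random(0)  # fixed seed for reproducibility
sequence = sample_bouts(5, rng)
```

      Nothing in this sketch decides when to switch states on the basis of neural or sensory variables; the switching statistics are imposed from outside, which is the limitation the reviewer points to.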

      I am curious about why the authors chose to model the mushroom body with much more realism than other modules.

      We clarified that this choice was not due to a bias in modeling depth, but to demonstrate the modularity and flexibility of the architecture. The mushroom body (MB) model we integrated was developed in our previous work as a biologically realistic spiking neural network. By incorporating it into the current framework, we show that models of very different abstraction levels – from simple statistical oscillators to detailed spiking networks – can coexist and interact under the same architecture. This rationale is now explicitly stated in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      The manuscript from Sakagiannis et al. proposes a novel model for locomotion and foraging in Drosophila. Their ambition is to make a unified model that will incorporate distinct layers of complexity to describe and predict the locomotor behaviour of a larva, during exploration, chemotaxis and even learning. The paper fails in doing so, starting with a rather interesting exploratory model and becoming less and less convincing as it progresses, with thinner (chemotaxis) and thinner (learning) experimental and theoretical support. The model for chemotaxis is extremely simplified compared to the work of other laboratories. The associative learning paradigm is taken from another paper from the same research group and is not sufficiently explained. In its current form, the paper is of very limited theoretical and practical value. The analysis is insufficient to judge the overall quality and scalability of the model. It is hard to know if the model could be adopted by others in the larval community more widely in other animals. Would it be flexible and robust enough to be used to model other behavioural conditions?

      We appreciate this critical perspective. Our aim is not to present a final, fully parameterized model of all larval behaviors, but to introduce a flexible, modular behavioral architecture that integrates models at different levels of abstraction and can be expanded by the community. To support adoption, we have revised the manuscript to highlight the availability of the framework as a Python package (larvaworld), supplemented with documentation, tutorials, and code examples. This makes it easier for other researchers to reuse, extend, and test the architecture under additional behavioral conditions. We also explicitly refer to modeling studies that have adopted the proposed framework and the locomotory model itself.

      Below, we address the reviewer’s points layer by layer.

      (1) Exploratory behaviour. The strongest part of the paper. The authors propose a new method to analyse locomotion. They take into consideration the instantaneous linear and angular velocity. They assume the existence of two oscillators, which is really interesting. They incorporate the distribution of pause durations and the number of strides. The incorporation of the strides is very exciting. They do not include handedness, which has already been studied and incorporated in a model for exploration they seem to have missed (Wosniack et al 2022). Figure 4 shows the dispersion. At first glance, it is very obvious that the model larvae do not behave like the animal. The distance they move from the centre is wider (Figure 4A). What is measured in dispersion (Figure 4B)? Just the distance travelled during 40s? A better measure of the similarities or differences between the model and real larvae would be interesting, such as analysing the Mean Square Displacement. Would the model be good if compared to the long-term exploratory behaviour from Sims et al. 2020, that the author previously used?

      The authors should convince the readers that their model is better than, or at least as good as, the ones already available.

      We thank the reviewer for these constructive suggestions. In the revised manuscript we now reference and discuss handedness, citing Wosniack et al. (2022, eLife), and highlight its potential role as an additional axis of individual variability. We also clarified the distance metrics used in Figure 4: dispersal denotes the Euclidean distance from the origin at the end of the trajectory, while pathlength denotes the cumulative distance travelled. Since larvae typically encounter the arena boundary within the first 40 seconds of exploration, dispersal is shown only over this interval.

      With respect to the reviewer’s suggestion of using mean-squared displacement (MSD), we now explicitly describe the relation between dispersal and MSD. Dispersal is an individual-level displacement measure from which population-level metrics such as MSD can be directly derived.
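
      The relation between the three distance measures can be made concrete with a minimal sketch. It assumes each trajectory is a list of (x, y) positions sampled at common time points; the function names are illustrative and are not the larvaworld API, and the MSD is computed here only at the final time point and relative to the starting position.

```python
import math

def dispersal(track):
    """Euclidean distance between the start and the end of one trajectory."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    return math.hypot(x1 - x0, y1 - y0)

def pathlength(track):
    """Cumulative distance travelled along one trajectory."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(track, track[1:]))

def msd_at_end(tracks):
    """Population mean of the squared dispersal at the final time point."""
    return sum(dispersal(t) ** 2 for t in tracks) / len(tracks)

# Two toy trajectories starting at the origin.
tracks = [
    [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)],  # ends 5 units from the start
    [(0.0, 0.0), (0.0, 1.0), (0.0, 0.0)],  # returns to the start
]
d0, p0 = dispersal(tracks[0]), pathlength(tracks[0])
msd = msd_at_end(tracks)
```

      The second toy track shows why the two individual-level measures differ: its pathlength is 2 units while its dispersal is 0. Evaluating the squared dispersal at every time point, rather than only at the end, would give an MSD curve over time.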

      Regarding long-term exploration, we agree that extended trajectories—as reported by Sims et al. (2020) over timescales of up to one hour—constitute a valuable complementary regime. Our experimental dataset is limited to 3-minute recordings in a bounded Petri dish, which constrains the accessible timescales of dispersal analysis. We now explicitly note in the Results that comparison to long-horizon datasets such as Sims et al. (2020) represents an important future direction that will require larger or unbounded arenas.

      Together, these revisions strengthen the presentation of the exploration results and clarify how our model relates to established statistical measures of larval foraging behaviour.

      (2) Chemotaxis. The chemotaxis model is so briefly explained in the result section that it is hard to understand. A modulation of the frequency and amplitude of lateral oscillator as a function of the concentration? The authors cannot differentiate between weathervaning and turning in this model (at least I can't understand how). What happened with the distribution of pauses and the directions of turns in Figure 5? The authors do not use real behavioural data to contract their model. How do we know that the parameters they have used reflect the larval behaviour? For example: what is the success rate for larvae to reach the area of high concentration? How close do they get? What is the length of the tracks from start to a target area of high concentration? Where are the calibration data for chemotaxis? This information is critical to understand the model, it needs to be shown in the result section. The authors mention an 8.9uM peak concentration. Of what?

      The model is oversimplified in comparison with Davies et al. 2015 and it is not clear at all how it reflects the real chemotaxis, which is a rather complex behaviour.

      We thank the reviewer for these detailed comments. In the revised manuscript we substantially expanded the description of the chemotaxis model. We now provide an explicit mathematical formulation of how odor concentration modulates the lateral oscillator through the quantity A<sub>0</sub>, which perturbs both the frequency and amplitude of bending according to the mechanism proposed by Wystrach et al. (2016). We additionally clarify that the motor layer - including the intermittency module and all parameters governing crawling, pausing, and turning - remains fully identical to the configuration calibrated on the exploration dataset; no refitting was performed for the chemotaxis condition.

      To address the reviewer’s question regarding the distinction between weathervaning and head casting, we now explain that both behaviours emerge naturally from the same coupled-oscillator structure via stride-phase–dependent crawl–bend interference. High-amplitude headcasts occur during pauses when crawl-induced attenuation is lifted, whereas low-amplitude weathervaning arises during runs when the interference is active.

      This unified mechanism eliminates the need for separate modules.

      The chemotaxis experiments were implemented to qualitatively replicate the behavioural patterns described in Gómez-Marín et al. (2011, Fig. 1A–1F), and we now include explicit figure references in the captions. Because the present implementation is a proof of concept rather than a quantitatively calibrated chemotaxis model, we do not report success rates, approach distances, or track-length statistics, as these depend strongly on odorscape geometry and calibration against quantitative single-animal datasets that were not available for the current work. This clarification has been added to the text and is stated explicitly again in the Limitations section.

      Finally, we now specify that the reported odor concentrations (e.g. 8.9 µM) follow the values used in Gómez-Marín et al. (2011), and we added the precise Gaussian function used to generate the odorscape in the Materials & Methods. Together, these revisions provide a clear and transparent account of the chemotaxis model and its scope.
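
      As an illustration of what a Gaussian odorscape looks like in code, the following sketch evaluates a two-dimensional Gaussian concentration profile around a single source. Only the 8.9 µM peak comes from the response above; the source position, the spread values, and the function name are assumptions made for this example, and the exact function used by the authors is the one given in their Materials & Methods.

```python
import math

def gaussian_odor(x, y, source=(0.0, 0.0), peak=8.9, sigma=(0.02, 0.02)):
    """Odor concentration (µM) at position (x, y) for a Gaussian odorscape.

    peak: concentration at the source (8.9 µM, as quoted in the response).
    sigma: spread along x and y (hypothetical values, in metres).
    """
    dx = (x - source[0]) / sigma[0]
    dy = (y - source[1]) / sigma[1]
    return peak * math.exp(-0.5 * (dx * dx + dy * dy))

c_source = gaussian_odor(0.0, 0.0)    # concentration at the source
c_one_sd = gaussian_odor(0.02, 0.0)   # one spread unit away along x
c_far = gaussian_odor(0.10, 0.0)      # far from the source
```

      A virtual larva sampling this function along its path experiences concentration changes that depend on both the geometry of the odorscape and its own movement, which is why success rates and approach distances are sensitive to the chosen spread.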

      (3) Associative learning paradigm. I assume that the authors intended to incorporate a bias in chemotaxis behaviour towards a particular odorant (CS) that would have been associated with a reward food (US). However the model works slightly differently, it is represented by an aversive and an appetitive gradient.

      Theoretically, this is already an assumption (unless there is evidence for it, that should be referenced). It would be more conservative to have one neutral side and one appetitive (attractive) side. Second, the use of a mushroom body model, (even though it has already been published) to decide on the valence adds a layer of complexity that seems unnecessary. The learning process is different from the output process. Finally, the model intends to show us a "realist simulation of Drosophila locomotion" and we do not know how the larvae reach the right side during the test. It would be useful to have some comparison of the larval and model behaviour towards the preferred side.

      In this last section, the objective of the research unweaves and falls short of its ambition.

      We thank the reviewer for these helpful comments. In the revised manuscript we clarified that our implementation follows the standard larval conditioning protocol in which a rewarded odor (CS+) is tested against a neutral odor, not against an aversive one. The previously contradictory phrasing has been corrected, and the text now consistently reflects the established experimental procedure.

      We further explain that the mushroom body (MB) model is included not in order to increase biological complexity in this section, but to demonstrate the flexibility of the proposed behavioral architecture: detailed circuit models and more abstract motor modules can coexist under the same framework. The MB model implements associative plasticity independently of any behavioral simulation, and its output - a scalar odor valence - is transformed linearly into an odor-gain parameter that modulates turning during the test phase. This separation between learning and behavioral output mirrors the logic of the biological system while keeping the overall architecture modular.

      Regarding the reviewer’s request for insight into “how larvae reach the right side,” we note that standard group assays used in larval olfactory learning provide only population-level preference indices rather than detailed individual trajectories. Our comparison to empirical data therefore relies on these established preference indices, which the model successfully reproduces across training trials, including the characteristic saturation reported in Jürgensen et al. (2024). We now state explicitly that although the behavioral simulation does generate full trajectories for each virtual larva, the lack of corresponding experimental single-animal tracks precludes a direct trajectory-level comparison. This clarification has been added to the revised text.

      Together, we believe that these revisions improve clarity and better situate the learning simulations within both the behavioral architecture framework and the constraints of available experimental data.

      Reviewer #3 (Recommendations for the authors):

      Figure 1a is very dense and I am struggling with the terms "reactive" and "basic" due to a general lack of clarity about the details of the model organization. For example, why do all of the sensory inputs point to turning proprioception? Why is proprioception two different things for turning and crawling? Why are some senses in light green while olfaction is in dark green? Why is feedback only from feeding, when crawling, head casting, and turning will change the sensory environment as well? Why is head casting not a behavioral module here? Why focus on following/being constrained by the "subsumption architecture paradigm" over a focus on the known literature and neuroanatomy?

      We thank the reviewer for this careful inspection of Figure 1. In the revised version we improved both the figure and its caption, as well as the corresponding description in the text.

      Specifically:

      - The “basic” layer has been renamed the “motor” layer for clarity, and the caption has been expanded to better describe each component.

      - The sensory inputs are now shown to target the motor layer as a whole, rather than just the proprioceptive component of turning.

      - Each motor module is conceptualized as a sensorimotor loop (green-red), which explains why proprioception appears in both crawling and turning.

      - The color coding has also been clarified: modules used in the current simulations are shown in darker shades, while others are faded.

      - Sensory perturbations caused by body locomotion – as in the case of crawling and turning – are not depicted in the figure as feedback between modules. We make this more explicit in the caption. The signal from feeding to the above layers is neuromodulatory – as indicated by the purple arrowhead.

      Finally, we explain that head casting and weathervaning are not modeled as separate modules, since both behaviors emerge from the coupled oscillator mechanism through crawl-bend interference. Our adherence to the subsumption architecture paradigm is motivated by its success in robotics and its conceptual alignment with hierarchical sensorimotor loops, but we have now made clearer that this is a simplifying framework rather than a rigid constraint.

      "Stimulus free conditions" (line 102) don't really exist. Substrate and temperature will always be present, light will have some intensity, etc. Does this really refer to fictive behaviors?

      We thank the reviewer for raising this point. In the revised manuscript we have removed the term “stimulus-free conditions” entirely to avoid the misleading implication that larvae experience no sensory input. We now explicitly describe these experiments as free exploration in the absence of navigation-guiding gradients, which accurately reflects the laboratory assay while avoiding any suggestion of fictive behavior. This terminology has been updated consistently throughout the text.

      The first results section is closer to an introduction than the intro itself is, owing to its focus on the context of the work the paper actually does rather than a broad review of larval behaviors that are not considered within this work.

      We believe the reviewer is referring to the “Model” section rather than the “Results.” The Model section is deliberately separated to outline the theoretical background of the behavioral architecture and to make explicit the general modeling assumptions, which explains why it cites previous work in detail. By contrast, the Introduction is intended as a brief overview of the broader larval behavioral repertoire, since the larva serves here as the case study for our framework. Presenting this repertoire is important because it defines the behaviors that populate the different layers of the architecture, even if only a subset of them is implemented in the simulations presented in this study.

      While the model components are described in the modeling section, no question is actually discussed. What is the goal of this model?

      This broader question is addressed in our response to the public review.

      "Crawler" and "turner" are inconsistently described. They are described as "modules" in Figure 1, but they seem more like behavioral primitives.

      The specific terms "crawler" and "turner" refer to the computational modules, but the reviewer correctly points out that these generate the respective “crawling” and “turning” behavioral primitives. This has been made explicit in the Materials & Methods.

      Do larva-larva interactions matter here?

      In the revised manuscript we now state explicitly that larva–larva interactions are not included in the present simulations, as each virtual larva is modeled independently in accordance with the single-animal datasets used for calibration. We also point the reader to the Limitations section, where we note that although social interactions lie outside the scope of this study, the Larvaworld software package already supports tactile sensing and collision handling, enabling such interactions to be incorporated in future work.

      The description of the locomotor system, with coupled oscillators between crawling frequency and bending is very empirical. Is this because of the 2-segment model effectively limiting peristalsis to a single segment? What are the limits of this approach?

      The stride-phase–dependent modulation of bending amplitude was identified through kinematic analysis of full 12-segment larval datasets and is therefore independent of our later decision to implement a two-segment simplification. This means that the empirical relationship we describe should hold for any multisegment model, regardless of the reduced representation used in the present implementation. More generally, we performed our detailed empirical analyses with the goal of uncovering statistical relations, which in turn were used for our data-driven coupled-oscillator model in combination with the stochastic elements of stridechain length and pause duration.

      Line 190: The paper starts discussing experimental larva tracks. These experiments need to be described.

      The reviewer probably refers to the dataset analysed in this study. This is a public dataset as described in the Dataset Description section in Materials & Methods, along with a description of the experiment per se.

      The purpose of Figure 2 is not entirely clear. Several panels are not referenced in the text (F,G,H) and all panels are referenced extremely out of order. Figure 3 is similarly hard to follow for the same reasons of being referenced out of order. In fact, this section is largely duplicated by the "Model calibration" appendix, which I find to be much more clearly written and with more directly relevant figure panels.

      In the revised manuscript, all panels of Figures 2 and 3 are now cited in the correct order, and their roles in the narrative have been clarified. Figure 2 is explicitly presented as a summary of the empirical kinematic analyses that motivate the structure of the locomotory model, while Figure 3 illustrates the corresponding model components. To avoid redundancy with the “Model calibration” appendix, we streamlined the main text and replaced duplicated descriptions with cross-references to the appendix, which contains the full methodological detail.

      The data describe larvae behaving with a range of parameters, presumably both as individuals and across time. However, the models described seem to employ a population of larvae that shares a common best-fit parameter and the equations presented in the methods are all ordinary differential equations without noise or stochasticity. Where is the inter-individual variation coming from?

      The reviewer is correct to point out the importance of variability. Our approach is agent-based, and we model populations of non-identical individuals rather than replicates of a single average larva. The simulated larvae retain variability across several parameters, capturing the combined range observed in the data. This was described in the original manuscript, and to avoid possible misunderstandings, we have now expanded the “Inter-individual variability” section in the Materials & Methods and, where appropriate, clarified this point elsewhere in the text.

      The absolute orientation of trajectories in Figure 4A is not meaningful in your model. I suspect it would be more informative to show aligned trajectories in order to better visually assess the behavioral similarity. Also, the biological experiment needs to be described here. Time crawling seems to not be a great fit, although the peaks are fairly well aligned. Do you have thoughts on why this is?

      In Figure 4A, which is intended as a visual comparison between experimental and simulated trajectories, the experimental tracks were transposed so that all starting points coincide at the center of the arena. As the reviewer notes, they were not rotated to a common axis, since our subsequent analysis focuses on spatial dispersal rather than directional alignment. The description of the experimental dataset has been clarified in the revised text.

      The reviewer is also correct that the distribution of time spent crawling is narrower in the simulations than in the experimental data. This reflects the fact that in the present study only three crawling-related parameters were sampled to generate inter-individual variability, and time spent crawling was not among them. We deliberately chose to assess how well the model reproduces distributions for behavioral metrics that were not explicitly fitted or parameterized. This point has now been made explicit in the revised manuscript.

      How did you assess the agreement of chemotaxis results with Gómez-Marín et al.? It would be useful for the comparison to be made explicit within this paper, as well. How were the chemotaxis parameters fit?

      The agreement between experimental and simulated chemotaxis was assessed only qualitatively, as we did not perform quantitative locomotor analyses on chemotaxis datasets. For these simulations we used the same motor layer, including all its modules, as calibrated in the free-exploration condition (Fig. 4). The only additional adjustment was a single weighting parameter that translates the appetitive or aversive valence of odor sources into modulatory input for the bending module. This parameter was tuned manually using a visual criterion of performance, to ensure that both attractive and aversive chemotaxis were observable. We now make explicit in the text that for more complex simulations we retain the calibration obtained in simpler conditions and build upon it, rather than re-optimizing the model. Moreover, we now refer to the exact figure numbers in Gómez-Marín et al. (2011) to allow direct visual comparison, including of the perceived concentration metrics in our Figure 5E and 5F, where experimental and simulated data show a very good correspondence.

      Similarly, what are the key parameters for the mushroom body model and how did you fit their relationship to behavior? Was there actually feedback between the behavior of the larva and the training or was the SNN only used to generate the odor gain constant?

      The reviewer is correct to highlight this point. In the present study the mushroom body model was simulated independently to generate the odor-specific behavioral bias. This output was then translated into an odor gain constant, which served as input for the subsequent behavioral simulations of odor preference. There was no closed-loop interaction between the larval behavior and the training of the spiking network in this version. Establishing such a closed-loop connection is part of our future goals.

      It is unclear where feeding (as introduced in Figure 1) entered into the work presented, if at all.

      The reviewer is correct that the feeding module does not play a role in the present study. It was included in the behavioral architecture for completeness and because it is already implemented in the larvaworld package (see Sakagiannis et al., 2024). We have clarified this in the revised text.

      "During pauses, the input to the crawler module I_c = 0 and therefore forward..." The equations presented for the crawler module do not contain I_c.

      The inconsistency regarding the crawler module input has also been corrected. The equations now explicitly include the tonic input parameter, making them consistent with the descriptive text and our model implementation.

      Larvae do more than crawl forward; they can also hunch up, head cast with their head in the air, dig, crawl backward, roll, and perform other behaviors. Because the individual modules in this framework have been defined as coupled oscillators, how would you decide to implement such aspects? At what point does the oscillator approach break down? In this model, how does the larva decide whether to bend left or right, and how is that affected by the environment or internal state? Can a larva bend in the same direction twice in a row?

      The intermittent coupled-oscillator model presented here does not attempt to cover the full larval repertoire, such as hunching, digging, backward crawling, or rolling. Nor does it explicitly implement handedness as a directional bias. Nevertheless, the framework already allows for sequences of repeated turns: from a stationary position a larva can execute successive bends of varying amplitude, which may occur in the same direction, mimicking repeated head casts to one side.

      Extending the model to include additional locomotor primitives would require the development of new modules, which could expand the basic locomotor layer either alongside or in place of the current lateral oscillator module. As noted in the manuscript, the modules implemented here are not intended as definitive but as placeholders that demonstrate how the architecture can integrate more elaborate models in the future. In this context, future directions include introducing handedness as part of inter-individual variability and enriching the behavioral repertoire with additional modules to capture the broader range of larval actions.

      I was not able to install `larvaworld` either through pip in a fresh environment on OS X 15 and various Python versions between 3.8 and 3.12. I ran into a range of issues, from `tables` (which is understandable) to issues installing the old NumPy in Python 3.12 where `setuptools` is no longer included. The packaging should be made more robust, or the working environment could be better defined. For example, the version pinning of dependencies seems much more strict than I would expect for a user-focused Python library, particularly with out-of-date versions of core tools like NumPy.

      We thank the reviewer for going to such lengths to test the implementation and for pointing these issues out to us. We have recently updated the package (version 2.0.1, November 2025) to improve installation robustness, relaxed unnecessary dependency pinning, and provided an environment specification to facilitate reproducibility. The revised manuscript directs users to the recently updated installation instructions.

      Automated testing for Python versions 3.10-3.11 on macOS, Windows and Ubuntu is already implemented. Unfortunately, we have not yet tested the package on OS X 15. Please post any issues on the larvaworld GitHub page: https://github.com/nawrotlab/larvaworld.

    1. eLife Assessment

      This important study combines behavioural psychophysics with image-computable modelling to test whether face recognition relies on view-selective or view-tolerant mechanisms. Although the diagnostic orientation content of faces varies with viewpoint (more horizontal for frontal views, more vertical for profiles), human recognition remains predominantly tuned to horizontal information, consistent with the predictions of a view-tolerant model. The evidence for view-tolerant tuning to horizontal orientations is compelling, although questions remain about the plausibility of the computations implemented in the view-tolerant model and how they map onto mechanisms of everyday face recognition.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.
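
      The contrast between the two model observers can be sketched as follows. This is a toy illustration of the general idea of matching by image correlation, not the authors' implementation: images are represented as flat lists of pixel values, the library keys, view labels, and function names are all hypothetical, and the Pearson correlation stands in for whatever cross-correlation measure the study actually uses.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant pixel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_match(probe, probe_view, library, same_view):
    """Return the identity whose stored image correlates best with the probe.

    library: dict mapping (identity, view) -> flattened image.
    same_view=True  -> only candidates at the probe's view (view-selective).
    same_view=False -> only candidates at other views (view-tolerant).
    """
    candidates = {key: img for key, img in library.items()
                  if (key[1] == probe_view) == same_view}
    best_key = max(candidates, key=lambda key: pearson(probe, candidates[key]))
    return best_key[0]

# Toy library: two identities, each stored at two views (0 and 90 degrees).
library = {
    ("A", 0): [1, 2, 3, 4], ("A", 90): [1, 2, 4, 3],
    ("B", 0): [4, 3, 2, 1], ("B", 90): [4, 3, 1, 2],
}
probe = [1, 2, 3, 5]  # a noisy image of identity A at view 0
selective = best_match(probe, 0, library, same_view=True)
tolerant = best_match(probe, 0, library, same_view=False)
```

      In the study, the comparison of interest is how the performance of each observer varies with the orientation content of the filtered probe, which this toy example does not model.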

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I'll start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. I sort of understand the reasoning that this enforces tolerance of viewpoint variability, but I'm not clear on whether or not this is a version of face familiarity and recognition that the authors think has an analog in human visual processing.

      I do think that this model is interesting in terms of the differential tuning it exhibits, but don't find it easy to align with any theoretical perspective on face recognition. Specifically, do the authors think there is a stage of face processing in which tolerance as they've operationalized it in the model is extant? What I'm looking for is a concrete description of the circumstances that the authors are saying lead to this kind of model potentially being a meaningful analog of face recognition. For example, is the idea that one may become familiar with a face in some very limited set of viewpoints and then be presented with that face in other views?

      Alternatively, if the authors prefer to say that they simply thought this was a nice exercise in terms of identifying a different model and that it may not be a meaningful proxy for face recognition, I think that's fine, to be clear! I just still don't see anything in the text that convinces me of the ecological validity of this version of view-tolerance.

    3. Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 deg) but other viewpoints had biases that were slightly off horizontal (e.g. right profile: 80 deg, left profile: 100 deg). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Comments on revisions:

      I am happy with the response and changes to the comments in my review. The key findings from this study are: (1) There is a bias toward the use of horizontal information across all viewpoints for face recognition in humans using an old-new recognition task. (2) In contrast, the optimal information for matching faces varies as a function of viewpoint. The view-selective model shows horizontal information is dominant for frontal views and vertical information is dominant for profile views.

      The data from the view-tolerant model is less easy to interpret as it doesn't fit with any theoretically plausible model of face recognition. It might be a useful model for a face matching task in which participants had to match unfamiliar faces across viewpoints. This might be a possible extension of the current work.

      Nonetheless, I still think this is an interesting contribution to the literature.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.

      The design of the view-tolerant model aligned with the requirements of tolerant recognition and revealed the stimulus information that enables identity to be abstracted away from variations in face appearance. However, it did not involve the notion that such an ability may depend on a prototype or summary representation of face identity built up through varied encounters (Burton, Jenkins and Schweinberger 2011, Jenkins, White et al. 2011, Burton 2013, Burton, Kramer et al. 2016, Menon, Kemp and White 2018).

      We agree with the Reviewer that the average of the different views of a face is a good proxy of its central tendency (i.e., stable identity properties; Figure 1). We thus followed their suggestion and included an additional model observer that compared specific views to full-spectrum view-averaged identities. The examination of the orientation tuning profile of this so-called view-average model observer confirmed the crucial contribution of horizontal identity cues to view-invariant recognition as the horizontal range best predicted the average summary of full-spectrum face appearances across views. This additional model observer is now presented in the Discussion and Supplementary files 2 and 3.
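
      The idea of comparing a single view to a view-averaged identity can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: images are treated as flattened pixel lists, similarity is a plain Pearson correlation, and the names `pearson`, `view_average`, `best_matching_identity` and the toy `library` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length pixel vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def view_average(views):
    """Pixelwise mean of several views of one identity."""
    return [sum(pixels) / len(views) for pixels in zip(*views)]

def best_matching_identity(probe, library):
    """Return the identity whose view-average correlates best with the probe.

    `library` maps an identity label to a list of flattened views.
    """
    averages = {name: view_average(views) for name, views in library.items()}
    return max(averages, key=lambda name: pearson(probe, averages[name]))

# Toy data (hypothetical): two identities, three "views" of four pixels each.
library = {
    "id_A": [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.9, 0.1], [0.7, 0.3, 0.8, 0.2]],
    "id_B": [[0.1, 0.9, 0.2, 0.8], [0.2, 0.8, 0.1, 0.9], [0.3, 0.7, 0.2, 0.8]],
}
probe = [0.85, 0.15, 0.75, 0.25]  # a new view that resembles id_A
match = best_matching_identity(probe, library)
```

      In the study the probe would be an orientation-filtered view and the averages would be built from full-spectrum images; the toy vectors above only stand in for that.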

      Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
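
      As a rough sketch of this noise-addition step, the snippet below rescales a face and a fresh noise pattern to the reported RMS contrasts and then correlates a noisy target with a noisy probe. It assumes that RMS contrast is the standard deviation of pixel values and that SNR is the ratio of the two RMS contrasts (which matches .01 / .08 = .125); the Gaussian white noise, the function names and the random stand-in "face" are hypothetical.

```python
import math
import random
import statistics

def scale_to_rms(pattern, rms):
    """Zero-centre a pattern and rescale it so its standard deviation equals `rms`."""
    mean = statistics.fmean(pattern)
    sd = statistics.pstdev(pattern)
    return [(p - mean) * rms / sd for p in pattern]

def add_noise(face, rng, face_rms=0.01, noise_rms=0.08):
    """Combine a face with a fresh Gaussian noise pattern at fixed RMS contrasts."""
    signal = scale_to_rms(face, face_rms)
    noise = scale_to_rms([rng.gauss(0.0, 1.0) for _ in face], noise_rms)
    return [s + n for s, n in zip(signal, noise)]

def pearson(a, b):
    """Pearson correlation between two equal-length pixel vectors."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
face = [rng.random() for _ in range(10_000)]  # stand-in for flattened face pixels
snr = 0.01 / 0.08                             # ratio of RMS contrasts: 0.125

clean_r = pearson(face, face)                 # exact match, correlation of 1
target = add_noise(face, rng)                 # distinct noise pattern for the target
probe = add_noise(face, rng)                  # and another one for the probe
noisy_r = pearson(target, probe)              # far below 1 despite identical faces
```

      Repeating the last three lines with new noise patterns corresponds to iterating a trial several times, as described in the quoted Methods text.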

      We also added a supplementary file illustrating the d’ distributions of each model observer and of the human observers: ‘Sensitivity d’ of the view-tolerant model was much lower than view-selective model and human sensitivity (Supplementary File 2), even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity in the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and entirely relies on (less efficient) pixelwise comparisons.’
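
      For readers less familiar with the sensitivity index, d’ in an old/new task is conventionally computed as z(hit rate) minus z(false-alarm rate). The sketch below shows that standard formula; the log-linear correction for extreme rates and the example counts are assumptions, not details taken from the manuscript.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each cell) keeps rates of 0 or 1
    from producing infinite z-scores; this correction is an assumption.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 50 "old" trials and 50 "new" trials.
sensitive = d_prime(hits=40, misses=10, false_alarms=10, correct_rejections=40)
at_chance = d_prime(hits=25, misses=25, false_alarms=25, correct_rejections=25)
```

      An observer who cannot tell old from new faces has equal hit and false-alarm rates, so d’ is zero; larger values indicate better discrimination.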

      Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.

      Our stimuli were originally designed by Troje and Bulthoff (1996). These are 3D laser scans of white individuals aged between 20 and 40 years, posing with a neutral expression. Different views of the faces were shot under a fixed illumination. Ears and a small portion of the neck were visible while the hair region was removed. All face images had a normalized skin color and we further converted them to grayscale.

      While we agree that this stimulus set offers a restricted range of within- and between-identity variations compared to what is experienced in natural settings, we believe that the present findings generalize to more ecological viewing conditions. Indeed, past evidence showed that the recognition of face pictures shot under largely variable pose, age, expression, illumination, hair style is tuned to the horizontal range of the face stimulus (Dakin and Watt 2009, Dumont, Roux-Sibilon and Goffaux 2024). In other words, our finding that view-tolerant identity recognition is mainly driven by horizontal face information would likely replicate with the use of a more ecological stimulus set.

      Moreover, the skin color normalization and grayscale conversion, while limiting the range of face variability, did not eliminate the contribution of surface pigmentation in our study. It is thus unlikely that our findings exclusively reflect the orientation dependence of face shape processing. Pigmentation refers to all surface reflectance properties (Russell, Sinha et al. 2006) and hue (color) is only one among others. The grayscaled 3D laser scanned faces used here contained natural variations in crucial surface cues such as skin albedo (i.e., how light or dark the surface appears) and texture (i.e., spatial variation in how light is reflected); they have actually been used to disentangle the role of shape and surface cues to identity recognition (e.g., Troje and Bulthoff 1996, Vuong, Peissig et al. 2005, Russell, Sinha et al. 2006, Russell, Biederman et al. 2007, Jiang, Dricot et al. 2009). Moreover, a past study of ours demonstrated that the diagnosticity of the horizontal range of face information is not restricted to face shape cues; the specialized processing of face shape and surface both selectively rely on horizontal information (Dumont, Roux-Sibilon and Goffaux 2024).

      For these reasons, the present findings are unlikely to be fully determined by shape processing, and we expect them to generalize to more ecological stimulus sets. We discuss these aspects in the revised manuscript.

      Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’

      We also added a supplementary file illustrating the d’ distributions of each model observer and of the human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      All stimuli were matched for luminance and contrast. It is crucial to normalize image energy across orientations as natural image energy is disproportionately distributed across orientations (e.g., Hansen, Essock et al. 2003). Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Keil 2008, Keil 2009, Goffaux and Greenwood 2016). If not normalized after orientation filtering, such uneven distribution of energy would boost recognition performance in the horizontal range across views. Normalization was performed across our experimental conditions merely to avoid energy from explaining the influence of viewpoint on the orientation tuning profile.

      We were not aware of any systematic natural variations of energy across face views. To address this, we measured face average energy (i.e., RMS contrast) in the original stimulus set, i.e., before the application of any image processing or manipulation. Background pixels were excluded from these image analyses. Across yaws, we found energy to range between .11 and .14 on a 0 to 1 grayscale. This is moderate compared to the range of energy variations we measured across identities (from .08 to .18). This suggests that variations in energy across viewpoints are moderate compared to variations related to identity. It is unclear whether these observations are specific to our stimulus set or whether they are generalizable to faces we encounter in everyday life. They, however, indicate that RMS contrast did not substantially vary across views in the present study and suggest that RMS normalization is unlikely to have affected the influence of viewpoint on recognition performance.

      In the revised methods section, we explicitly motivate energy normalization: ‘Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Goffaux, 2019; Goffaux & Greenwood, 2016; Keil, 2009). Across yaws, we found face energy to range between .11 and .14 on a 0 to 1 grayscale, which is moderate compared to the range of face energy variations we measured across identities (from .08 to .18). To prevent energy from explaining our results, in all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%.’.
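
      The normalization described in this passage can be illustrated with a short sketch. It assumes that RMS contrast is the standard deviation of face-pixel intensities on the 0 to 1 scale and that a boolean mask separates face from background; the function name, the mask and the toy pixel values are hypothetical.

```python
import statistics

def normalize_face(pixels, is_face, luminance=0.55, rms=0.15):
    """Fix the mean and RMS contrast of face pixels; set background to the mean.

    Returns the normalized image (clipped to [0, 1]) and the percentage of
    pixels whose value had to be clipped.
    """
    face = [p for p, f in zip(pixels, is_face) if f]
    mean, sd = statistics.fmean(face), statistics.pstdev(face)
    out, clipped = [], 0
    for p, f in zip(pixels, is_face):
        value = luminance + (p - mean) * rms / sd if f else luminance
        if value < 0.0 or value > 1.0:
            clipped += 1
            value = min(1.0, max(0.0, value))
        out.append(value)
    return out, 100.0 * clipped / len(pixels)

# Toy image: four face pixels followed by two background pixels.
pixels = [0.2, 0.4, 0.6, 0.8, 0.1, 0.9]
is_face = [True, True, True, True, False, False]
normalized, percent_clipped = normalize_face(pixels, is_face)
face_out = normalized[:4]
```

      Applying the same target mean and contrast to every filtered image is what prevents differences in raw energy across orientations or views from driving performance differences.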

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Indeed, human performance data indicate that while identity recognition remains tuned to horizontal information, the horizontal tuning peak shows some variation across viewpoints. We primarily focused on the first aspect because of its direct relevance to our research objective, but also discussed the second aspect: with yaw rotation, certain non-horizontal morphological features such as the jaw line or nose bridge may increasingly contribute to identity recognition, whereas at frontal or near-frontal views, features are mostly horizontally oriented (e.g., Keil 2008, Keil 2009). In the revised Discussion, we directly relate the modest fluctuations of peak location to yaw differences in face feature appearance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Based on a discussion with the reviewers, we integrated the recommendations and reached a consensus on the eLife assessment. To move from a "solid" to a "compelling/convincing" strength-of-evidence rating, please address the reviewers' comments. Key points are to clarify and test the plausibility of the models (e.g., effects of different noise-addition steps, inclusion/exclusion of specific orientation channels in the view-dependent comparison, and alternative decision criteria), and to address or discuss the limitations of the stimulus set in capturing recognition under more naturalistic scenarios, for example, including texture cues.

      Reviewer #1 (Recommendations for the authors):

      I generally found the paper to be very well-written, so I have only a few minor comments here.

      (1) I didn't really follow why the estimation of the Gaussian functions described in the text was preferred over a simpler ML framework. Do these approaches differ that much? I see references to prior studies in which these were applied, so I can certainly go check these out, but I could see value in adding just a bit of text to briefly make the case that this is important.

      Employing a simpler linear framework, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze. The interaction term would almost certainly reach significance but its interpretation would be limited. We would either have to rely on numerous local comparisons, which are not particularly informative for our research objectives (e.g., knowing whether d’ differs significantly between two adjacent orientations at a given viewpoint is of little relevance), or to use a polynomial contrast approach (testing the linear, quadratic, … up to the 7th order trends), which would also be difficult to interpret. For such complex, approximately Gaussian-shaped data, the highest-order polynomial trend would likely provide the best fit, but without offering meaningful insight.

      In contrast, a nonlinear approach appears more appropriate. The Gaussian model we used allows us to characterize the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation (or bandwidth) and base amplitude. These parameters are not merely statistical parameters. Rather, they are directly interpretable in cognitive/functional terms. The peak location corresponds to the orientation at which the Gaussian curve is centred, i.e. the preferred orientation band for identity recognition. The standard deviation represents the width of the curve, reflecting the strength or selectivity of the tuning. The base amplitude is the height of the Gaussian curve base, indicating the minimum level of sensitivity, typically found near vertical orientation. Finally, the peak amplitude refers to the height of the Gaussian curve relative to its baseline, that is, it captures the advantage of horizontal over vertical orientations.
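
      The four parameters listed above map directly onto a Gaussian tuning function, sketched here. The parameter values are invented for illustration, the 22.5-degree spacing of the eight orientations is an assumption, and the circular wrap-around of orientation at 0/180 degrees is ignored for simplicity.

```python
import math

def gaussian_tuning(orientation, peak_location, peak_amplitude, sd, base_amplitude):
    """Predicted sensitivity (d') at a given orientation in degrees.

    base_amplitude: floor of the curve (minimum sensitivity)
    peak_amplitude: height of the peak above that floor
    peak_location:  orientation at which the curve is centred
    sd:             width of the curve (smaller = more selective tuning)
    """
    return base_amplitude + peak_amplitude * math.exp(
        -((orientation - peak_location) ** 2) / (2 * sd ** 2)
    )

# Hypothetical parameters for a horizontally tuned observer (90 deg = horizontal).
params = dict(peak_location=90.0, peak_amplitude=1.5, sd=20.0, base_amplitude=0.5)
orientations = [i * 22.5 for i in range(8)]  # 0, 22.5, ..., 157.5 degrees
profile = [gaussian_tuning(o, **params) for o in orientations]
```

      Fitting these four parameters to observed d’ values would require a nonlinear least-squares routine (for example `scipy.optimize.curve_fit`), which is left out of this sketch.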

      Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin and Watt 2009, Goffaux and Greenwood 2016). Orientation selectivity at primary stages of visual processing has also been modelled using Gaussian (or difference-of-Gaussians) functions (Ringach, Hawken and Shapley 2003).

      We revised the data analysis section to include a justification for our use of a Gaussian model: ‘Therefore, fitting the human sensitivity data using a simple Gaussian model seemed most appropriate as it allows characterizing the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation and base amplitude, which are directly interpretable in cognitive/functional terms. Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Simpler frameworks, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze and interpret.’

      (2) When reporting the luminance and contrast of your stimuli, please make clear what these units and measures are. This was a case where I had to take a second to assure myself that I knew what the values meant.

      We clarified that the luminance and contrast values reported in the manuscript are on a grey scale ranging from 0 to 1.

      (3) In your Procedure section, I think describing the familiarization task right away would help the text flow more clearly. At present, you began talking about the old/new task, and I was immediately wondering how familiarization worked!

      The procedure section now starts with the description of the familiarization task.

      (4) p. 3 - "Culminates" doesn't seem like the right word here.

      We agree and rephrased the sentence as follows: ‘The tolerance of face identity recognition is stronger for familiar than unfamiliar faces’.

      (5) p. 5 - I think "with the multiple" shouldn't have "the".

      Indeed, we removed the “the”.

      Reviewer #2 (Recommendations for the authors):

      I enjoyed reading the manuscript, but thought the Introduction was a bit long. I wasn't sure about the relevance of the section on temporal contiguity. I think this might have been more relevant if this had been a manipulation in the design. So, I wonder if this might be shortened or removed to focus on the key questions. On the other hand, I found the overview of the view-selective and view-tolerant to be a bit brief. There is plenty of detail here, but I found it difficult to break down what was done when I first read it. It might be good to provide an overview in the Discussion too.

      While past research on the contribution of temporal contiguity to face identity recognition brings interesting insights into the nature of the visual experience leading to view-tolerant performance, we agree with the Reviewer that this aspect is not directly at stake here. We reduced the review of this literature in the Introduction. We clarified the description of the model observers as suggested by the reviewer and made sure to provide an overview of the model observers in the Discussion as well.

      References.

      Burton, A. M., R. Jenkins and S. R. Schweinberger (2011). "Mental representations of familiar faces." Br J Psychol 102(4): 943-958.

      Burton, A. M., R. S. Kramer, K. L. Ritchie and R. Jenkins (2016). "Identity From Variation: Representations of Faces Derived From Multiple Instances." Cogn Sci 40(1): 202-223.

      Dakin, S. C. and R. J. Watt (2009). "Biological "bar codes" in human faces." J Vis 9(4): 2, 1-10.

      Dumont, H., A. Roux-Sibilon and V. Goffaux (2024). "Horizontal face information is the main gateway to the shape and surface cues to familiar face identity." PLoS One 19(10): e0311225.

      Goffaux, V. and J. A. Greenwood (2016). "The orientation selectivity of face identification." Sci Rep 6: 34204.

      Hansen, B. C., E. A. Essock, Y. Zheng and J. K. DeFord (2003). "Perceptual anisotropies in visual processing and their relation to natural image statistics." Network 14(3): 501-526.

      Jenkins, R., D. White, X. Van Montfort and A. M. Burton (2011). "Variability in photos of the same face." Cognition 121(3): 313-323.

      Jiang, F., L. Dricot, V. Blanz, R. Goebel and B. Rossion (2009). "Neural correlates of shape and surface reflectance information in individual faces." Neuroscience 163(4): 1078-1091.

      Keil, M. S. (2008). "Does face image statistics predict a preferred spatial frequency for human face processing?" Proc Biol Sci 275(1647): 2095-2100.

      Keil, M. S. (2009). ""I look in your eyes, honey": internal face features induce spatial frequency preference for human face processing." PLoS Comput Biol 5(3): e1000329.

      Menon, N., R. I. Kemp and D. White (2018). "More than a sum of parts: robust face recognition by integrating variation." R Soc Open Sci 5(5): 172381.

      Burton, A. M. (2013). "Why has research in face recognition progressed so slowly? The importance of variability." Q J Exp Psychol (Hove) 66(8): 1467-1485.

      Ringach, D. L., M. J. Hawken and R. Shapley (2003). "Dynamics of orientation tuning in macaque V1: the role of global and tuned suppression." J Neurophysiol 90(1): 342-352.

      Russell, R., I. Biederman, M. Nederhouser and P. Sinha (2007). "The utility of surface reflectance for the recognition of upright and inverted faces." Vision Res 47(2): 157-165.

      Russell, R., P. Sinha, I. Biederman and M. Nederhouser (2006). "Is pigmentation important for face recognition? Evidence from contrast negation." Perception 35(6): 749-759.

      Troje, N. F. and H. H. Bulthoff (1996). "Face recognition under varying poses: the role of texture and shape." Vision Res 36(12): 1761-1771.

      Vuong, Q. C., J. J. Peissig, M. C. Harrison and M. J. Tarr (2005). "The role of surface pigmentation for recognition revealed by contrast reversal in faces and Greebles." Vision Res 45(10): 1213-1223.

    1. eLife Assessment

      This Review Article provides a compendium of advice for MD-PhD students to consider when deciding which, if any, clinical field they will select for residency training. It is grounded in published data and effectively considers factors including the potential for clinical disciplines to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

    2. Reviewer #1 (Public review):

      Summary:

      This brief piece by Swartz and colleagues outlines the complexities surrounding the choice of clinical specialty for physician-scientists. It is, in general, clear and well-written, and it will be useful to research-oriented medical students choosing a path and to the mentors who are guiding them.

      Strengths:

      The writing is clear. The points made are not profound, but they are important and will be of use to the intended audience.

      Weaknesses:

      I have only minor suggestions for improvement. There are some areas of redundancy where the article could be tightened up by consolidating.

    3. Reviewer #2 (Public review):

      Summary:

      This article is a useful compendium of advice for MD/PhD students (and research-focused MD students) to consider when it is time to decide on a clinical field for residency training. The authors are a distinguished group of physician-scientists and program directors who are drawing on published data and their own experience as mentors to provide advice and resources to students about to make what can be a career-defining choice. It makes an effective argument for considering important differences between clinical fields in their ability to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

      Strengths:

      (1) A lot has been written about physician-scientists as an endangered species. Given the important role that physician-scientists can play if they engage in research that is informed by experience in patient care, not nearly enough has been written about the choices that students make during training that can keep them on track or throw them off.

      (2) The article provides not only general advice, but specific information in the 2 tables that can help trainees to weigh their priorities and consider their options.

      (3) Among the best advice is to weigh clinical demands, maintenance of procedural skills, recognition of the impact of research time on salary, and the impact of high salaries on the tension between research effort and clinical effort in clinical departments, which is where most physician-scientists in academia are employed.

      Areas for potential improvement:

      (1) Some of the most useful pieces of advice are scattered through the text when they might be more impactful if focused. For example, what are the 4 or 5 most essential factors that someone in an MD/PhD or an MD program should weigh when they are deciding between clinical disciplines? There are also published data on the experience of past graduates in achieving a research-focused career in each clinical discipline. How should that data be applied by trainees? What are the factors that should be weighed in deciding where to work as a research-focused physician once training has been completed?

      (2) Some clinical fields at academic institutions have proved to be much more hospitable to careers as research-focused physicians than others. Published data highlight the challenges. I believe the authors have tried very hard to present a balanced perspective, but in the process, they have, I believe, missed an opportunity to guide trainees and make them aware of what they should look for to avoid making a decision that may prove incompatible with their long-term goals.

      (3) An issue that hasn't been raised: Where will the jobs be for physician-scientists who have an MD ± PhD and want to do research and discovery? How many openings will there be for physician-scientists in academia 5-10 years from now? In industry? How are recent events in Washington affecting the continuation of those jobs? Unfortunately, I am not aware of labor statistics for physician-scientists, but perhaps the authors can find them.

      (4) Additional questions that can be raised and addressed in the article: Should one of the "smart choices" in the article's title be where you do the residency, and not just which residency you do? How important is it to be at a successful, research-intensive medical center/university, both during and after residency and fellowship training? If being in an institution where there are numerous very successful physician-scientists and scientists improves the likelihood of being able to sustain a physician-scientist career, how should graduating students improve their chances of being at one of those institutions?

      (5) In every clinical discipline, there are departments that value physician-scientists more than other departments and invest accordingly. What advice would the authors give to help graduating students identify those departments?

    4. Author response:

      Thank you for the valuable feedback. We will be updating the manuscript to incorporate the reviewers' terrific suggestions. We specifically have:

      • Reduced redundancy and streamlined overlapping sections (especially around research alignment, protected time, and clinical demands)

      • Made the core decision-making framework more explicit and easier to extract (in a new Table 1, with clearer synthesis in the text)

      • Strengthened the emphasis on institutional/program context as a key determinant of success—arguably as important as specialty choice

      • Added more actionable guidance for trainees on how to evaluate departments (e.g., NIH Reporter, T32 presence, R01 density, K→R track record)

      • Included a slightly more explicit statement acknowledging that while all specialties can support physician-scientist careers, the structural ease varies and may require different levels of negotiation/support

      We did not address the broader workforce/job market question, since it feels outside the scope.

    1. eLife Assessment

      This valuable paper provides convincing evidence that humans can navigate better through maps whose local transitions were learned in an intermixed order than maps whose local transitions were learned in neighboring groups. The authors put forward a potential mechanism in which the grouped learning resulted in mental fragmentation, though evidence for this mechanism is incomplete. The work will be of interest to researchers studying cognitive maps and curriculum learning.

    2. Reviewer #1 (Public review):

      This paper investigates how different learning curricula influence the way that humans piece together directly experienced transitions into a broader cognitive map. When adjacent learning trials were grouped within rows or columns of the map, subsequent navigation through the map was weaker than when adjacent learning trials came from disjoint spaces in the map. The authors speculate that the grouped curriculum resulted in mental fragmentation that made navigation across space more difficult later on.

      This is an interesting paradigm that contributes useful new findings in the domain of map learning to the growing literature on curriculum learning. The evidence for a difference between conditions is highly compelling, but, as the authors are very transparent in acknowledging in the Discussion, the evidence for their proposed mechanism - mental fragmentation under grouped learning - is somewhat weak. The study thus presents an intriguing empirical result but not an ironclad mechanistic account.

      An alternative - by their account, "less interesting" - explanation is that grouped learning was easier because trials in close succession had overlapping elements, and so participants were not trying as hard or as engaged. There is a literature on spaced (as opposed to massed) learning being better for subsequent memory because it increases retrieval effort. It seems very plausible that this could be going on here, and the control experiment reported in the supplement would not help to rule this out. This literature deserves some discussion.

      The Introduction focuses entirely on literature showing advantages in grouped over intermixed learning, setting that up as the most well-motivated expectation from the literature. Upon finding the opposite, the Discussion then mentions that interleaving has been found to be useful in "applied domains", but then returns to how surprising this is in light of recent findings in the category learning literature. But there is a substantial earlier literature on interleaved vs blocked curricula in category learning, very often finding advantages for interleaving. See, e.g., Carvalho & Goldstone, 2015, for a review. There is also a paper showing interleaving advantages in associative inference, Zhou et al., 2023, JEP:G, which is very relevant to several of the discussion section paragraphs. Thus, the treatment of the prior curriculum learning literature is currently sparse.

    3. Reviewer #2 (Public review):

      I think this paper is an excellent and timely contribution. It clearly shows that learning overlapping relationships in a disjoint training schedule (where the overlaps are not encountered close together in time) appears to aid the formation of an integrated associative memory structure (a cognitive map) and supports generalisation. I believe the methods are sound and the results are clear. I only have a couple of methodological questions that may not warrant any changes to the paper (or only very minor changes/additions):

      (1) The mixed effects models did not include random slopes for the within-subject factors ("spatial manipulation" and "block"), and so the corresponding fixed effect inferences may be unsafe. Having said that, it is likely that including these slopes may not be warranted given their contribution to the model's fit. I recommend that the authors check this.

      (2) The mixed effects models for accuracy appear to model average performance across trials rather than using a generalised linear model with a (e.g.) logit link function and the binomial distribution to characterise performance. I think this is a little sub-optimal, as the latter is often more sensitive. Nonetheless, it is not in any way wrong; the results are clear enough as is, and there may be a good reason to avoid a non-linear link function, which can alter the interpretation of effects close to the ceiling and floor.
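
      The caveat in point (2) about non-linear link functions can be illustrated with a minimal sketch. It is not the authors' analysis: the effect size and baseline accuracies below are hypothetical, and the helper names are introduced only for this illustration. It shows that a fixed effect on the log-odds scale corresponds to a smaller change in raw accuracy as baseline performance approaches the ceiling.

```python
import math


def logistic(x: float) -> float:
    """Inverse of the logit link: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Logit link: maps a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def accuracy_gain(baseline_acc: float, effect_log_odds: float) -> float:
    """Change in accuracy produced by a fixed log-odds effect at a given baseline."""
    return logistic(logit(baseline_acc) + effect_log_odds) - baseline_acc


EFFECT = 0.5  # hypothetical condition effect, in log-odds units

# The same log-odds effect is applied at three hypothetical baseline accuracies.
for baseline in (0.50, 0.75, 0.95):
    gain = accuracy_gain(baseline, EFFECT)
    print(f"baseline={baseline:.2f}  gain in accuracy={gain:+.3f}")
```

      On the accuracy scale the gain shrinks as the baseline rises, so an effect that is constant in a logit model would look like an interaction with baseline performance in a linear model of mean accuracy, and vice versa.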

      I think the introduction and/or discussion would benefit from contrasting the present results with Berens & Bird (2022, PLOS Comp Bio). In this paper, it is shown that blocking the training of discriminations in a linear hierarchy (what we call progressive training) substantially benefited transitive inference performance. This seems at odds with the authors' finding that "participants struggle to integrate information across rows and columns, i.e. across groups of transitions that were trained separately in time".

      I would really like to know what the authors think about this discrepancy (or, indeed, whether they think there is one at all). Is it possibly because "progressive" learning is some combination of "grouping", "blocking" and "chaining" (where there is a structured overlap between adjacently trained relationships)? Or is it something else, e.g., that there is a fundamental difference between learning associations and discriminations (personally, I lean on this explanation)?

      Relevant to this, the authors note that their "findings do contradict recent reports from the category learning literature, where blocking seems to help learning and generalisation (Dekker et al., 2022; Flesch et al., 2018; Noh et al., 2016). It may be that where the goal is not to learn a complex knowledge structure - like a map - but simply to compress exemplars by mapping them onto a smaller number of labels - the benefits of blocking emerge." However, the benefit of progressive (blocked) training in my own work was observed in a task that required learning a complex/relational structure in the form of a transitive hierarchy, which theoretical accounts suggest depends on learning map-like representations (Whittington et al., 2020).

    4. Reviewer #3 (Public review):

      Summary:

      This study examines how training regimes influence the formation of cognitive maps. Participants learned two relational maps over three days through pairwise transitions: one map was trained with grouped sequences that followed rows or columns, while the other was trained with disjoint transitions sampled randomly across the map. In addition, the study manipulated the temporal spacing of training blocks (blocked vs. semi-blocked) and tested whether the results generalized across two map geometries (a 5×5 grid and a 4×4 torus).

      Furthermore, the authors ran a follow-up experiment (or condition) in which rows and columns were shuffled in the grouped condition.

      While grouped training produced better performance during learning, the authors report that disjoint training led to superior performance at test on tasks probing the global map knowledge.

      Summarising experimental design:

      (1) Map geometry (between-subjects): 5×5 grid vs 4×4 torus

      (2) Training block schedule (between-subjects): Blocked vs Semi-blocked

      (3) Training regime/transition sampling (within-subject): Grouped or Disjoint (Day 1 and Day 2)

      Strengths:

      The study addresses a clear and timely theoretical question about how the training regime affects the formation of cognitive maps. A further strength is the well-controlled experimental design, allowing the authors to test their hypotheses in a systematic and informative way.

      Weaknesses:

      (1) If I understood correctly, participants learned one map on the first day and the other on the second day, with the training regime (grouped vs. disjoint) counterbalanced across maps. This raises the possibility that experience with one training regime on day one could influence performance on the second day. For example, it would be interesting to examine whether participants who experienced the disjoint regime first showed any differences when learning the grouped regime on the following day. While it may be difficult to fully disentangle such transfer effects from the main training regime effects, it would be informative to test whether performance on the second day depends on the regime experienced on the first day (e.g., whether prior exposure to the disjoint regime predicts performance on the subsequent grouped training, but not vice versa).

      (2) The authors mention a control experiment. Did the participants in the control experiment complete only the training phase or also the testing tasks used in the main experiment? If testing was included, it would be informative to report whether performance at test was comparable to that observed in the main experiment. Given that this condition appears to involve blocked transitions while moving across both rows and columns, I would expect performance to fall somewhere between the grouped and disjoint conditions.

      (3) Participants' performance did not differ between conditions in the map reconstruction task, suggesting that participants in both the grouped and disjoint regimes were ultimately able to form a cognitive map. Was this task always administered last during the testing session? I wonder whether the explicit request of the reconstruction task could have influenced participants' awareness of the map structure.

      (4) The manuscript describes the study as consisting of four experiments (two groups per map shape, differing in the blocked versus semi-blocked schedule). However, based on the design described in the Methods, this appears more accurately characterized as a single experiment with two between-participant factors, map geometry (grid vs. torus) and blocking schedule (blocked vs. semi-blocked), and one within-participant factor, training regime (grouped vs. disjoint).

      (5) It is not entirely clear to me from the Results section whether performance at test differed between the two map geometries (grid and torus), or whether the reported effects of training regime were consistent across them.

    1. eLife Assessment

      The authors combined human assembloids, fetal brain tissue, bulk and single-cell RNA sequencing, and live imaging to understand the molecular mechanisms affected by hypoxia during cortical development. The findings are very important to the neurodevelopmental field. They reveal new insights into how migration of cortical interneurons can be affected in hypoxic conditions, and provide exciting models to probe broad neurodevelopmental processes in health and disease. The evidence is compelling. The data and analyses are very rigorous and go beyond the state-of-the-art.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions that cause reduced cortical interneuron migration. The authors use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration after 24 hours of hypoxia. Bulk and scRNA-seq show up-regulation of adrenomedullin (ADM) as well as its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype and that blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that ADM is also up-regulated in this system). Finally, the authors demonstrate that PKA-CREB signalling mediates the effect of ADM addition and also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject: how hypoxia affects cortical interneuron migration. In my view it would be of great interest for the readers of eLife.

      Strengths:

      Its strengths are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that we do not know which interneuron subtypes are most affected by hypoxia and which may be rescued in their migration by ADM.

      A further weakness is that the few genes confirmed to be regulated after hypoxia do not help determine which statistical cut-off can be considered reliable, given that the authors did not compare strongly regulated versus weakly regulated genes.

      Comments on revisions:

      Unfortunately, the authors did not address my suggestions. While they show example stainings of interneuron subtypes, they do not show whether calretinin+, calbindin+ or somatostatin+ interneurons are differentially affected by hypoxia or by the rescue with ADM. I still consider this an important piece of information to add.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia: cortical forebrain assembloids and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and that exogenous ADM rescues them. The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and establishing exciting models for future studies. The authors use sufficient iPSC lines, including both XX and XY, so the analysis is robust. In addition, the RNA-seq data with re-oxygenation is a nice control to see which genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and of the involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall this is a very nice manuscript. I have a few comments and suggestions for the authors.

      Strengths/Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia - has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the hypoxic response.

      (2) Can the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Fig 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      (4) Can the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration)? This might already be known but was not discussed.

      (6) In the Discussion section, it might be worth detailing for readers what delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      Comments on revisions:

      The authors have addressed my comments thoroughly. I have no further comments or suggestions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform, particularly the combination of assembloids and live imaging, will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Comments on revisions:

      The authors have fully addressed my concerns by incorporating the relevant discussion into the manuscript, especially regarding how well the migration observed in hSO-hCO assembloids reflects in vivo conditions. I have no further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public review): 

      Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the hypoxic response.

      We thank the reviewer for reviewing our manuscript and for the important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. Various studies have found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Fig. 2D), and to the reviewer's point, we identified enrichment of the inflammatory signaling response following hypoxic exposure. Since hSOs at the time of analysis do contain some astrocytes, we think these contribute to the observed pro-inflammatory changes and emphasize the feasibility of capturing this response in organoids in vitro. This is also important because ADM is known to have anti-inflammatory properties and should be investigated as such in future studies focused on hypoxia-induced inflammation.

      In the manuscript, we included a few sentences in the discussion to address the lack of in-depth analyses of inflammation as a limitation of our study.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, and increased expression of RAMP2 receptors on neurons, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the current experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about possible different mechanisms in ventral versus dorsal areas, in the revision we plotted and included in the figures the data on the cell-type expression of ADM and its receptors in hCOs (Fig. S3).

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. The data from our study provide the first information about the cell-type-specific expression of ADM in baseline and hypoxic conditions, which is one of the key novelties of our study.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data along with our CREB-targeted findings in hypoxic interneurons, suggest ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy Syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration)? This might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We revised the manuscript to discuss that previous studies show that failure of interneurons to migrate and reach their designated targets within the appropriate developmental window leads to their elimination through apoptosis. Decreased numbers (or abnormal development) of interneurons are associated with neurodevelopmental impairments and abnormal functional connectivity in the brain.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, we used the single-cell RNA sequencing data and immunostainings in the manuscript to identify the subtypes present. As expected based on our previous reports, most cortical interneurons present in organoids express calretinin (CALB2), somatostatin (SST) or calbindin (CALB1). These data are now presented in Fig. S3.

      Separately, we used available scRNA-seq data from developing human brain and showed that at ~20 PCW, the developing human brain has similar types of cortical interneurons. These data are now included in Fig. S5.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we validated several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Fig. S1).

      We agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hSOs was sufficient to capture essential changes like ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and the opportunity to clarify. Because we manually separated the assembloids before the analyses, the hSO and hCO samples were processed separately, which is why they separate in the embedding. In the revision, we added data on the expression of ADM and its receptors in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we agree that it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g. cellular complexity, vasculature, immune response, etc.), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We expanded our discussion to include more details and the need to validate these findings using in vivo models.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation. We think a maturation phenotype would be difficult to capture at this point; detecting it would require analysis at later stages of in vitro assembloid maturation.

      Our speculation about a possible maturation defect is based on previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, and thus leads to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we speculate that these repetitive events are likely to induce cumulative delays in migration and an inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge, this experimental design is not technically feasible at this time, and thus this hypothesis remains speculative and is only included in the discussion.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other interneuron developmental processes during cortical development. In the manuscript, we included text in the discussion about the likely effects of hypoxia on interneuron proliferation, maturation and circuit integration.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we corrected it in our revision.

    1. eLife Assessment

      This valuable study proposes a novel rapid-entry mechanism for Staphylococcus aureus, involving the rapid release of calcium from lysosomes. The paper's strength lies in its very interesting hypothesis. The methods used are solid and adequately support the conclusions.

    2. Reviewer #2 (Public review):

      [Editors' note: This version was assessed by the editors. The authors have addressed a point raised by Reviewer #2, who thought the authors compared cells grown in low-serum and high serum conditions. This has been clarified in the latest version.]

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase, and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      A key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness.

      In the previous version, the authors perform experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation.