    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Weaknesses:

      There are quite a few weaknesses, some related to the actual study and some more strongly related to the reporting about the study in the manuscript. The concerns are listed roughly in the order in which they appear in the manuscript.

      We truly appreciate the time and effort you dedicated to reviewing our manuscript. The weaknesses you raised are all well founded, and we agree with almost all of the suggestions detailed below, particularly those on clarifying the statistics and the sample size determination. Please see specific responses below.

      Major Comments

      (1) In the introduction, the authors present procrastination nearly as if it were the most relevant and problematic issue there is in psychology. Surely, procrastination is a relevant and study-worthy topic, but that is also true if it is presented in more modest (and appropriate) terms. The manuscript mentions that procrastination is a main cause of psychopathology and bodily disease. These claims could possibly be described as 'sensationalized'. Also, the studies to support these claims seem to report associations, not causal mechanisms, as is implied in the manuscript.

      Thank you for this very practical suggestion. We agree that the current statements to underline the importance of procrastination are somewhat overreaching. Upon revision, we have overall toned down such claims by explicitly stating them as “associative evidence”, and rewritten a portion of terms in a more modest and balanced style. Please see specific revisions in the main text below:

      Introduction Section (Page 5, Line 64-81)

      “Procrastination is increasingly becoming a prevalent behavioral problem around the world, which reflects the irrational voluntary postponement of scheduled tasks albeit being worse off for such delays (Blake, 2019; Steel, 2007). In the epidemiological investigations, more than 15% of adults were identified as having chronic procrastination problems, and the situation for students was worse as 70-80% of undergraduates engaged in procrastination (American College Health Association, 2022; Ferrari et al., 2005). Moreover, the behavioral genetic evidence indicates a certain heritability of procrastination in human beings as well (Gustavson et al., 2017; Gustavson et al., 2014, 2015). In addition to its prevalence, the undesirable associations between procrastination behavior and health also warrant cautions. There is cumulative evidence to show the close associations between procrastination behavior and working performance, financial status, interpersonal relationships, and subjective well-being (Ferrari, 1994; Pychyl & Sirois, 2016; Steel et al., 2021). Further, as the prospective cohort studies indicated, many mental health problems emerge alongside procrastination, particularly in sleep problems, depression, and anxiety (Hairston & Shpitalni, 2016; Johansson et al., 2023). Even worse, chronic procrastination behavior has been observed to impair general health, as manifested by the intimate associations with close system disruption, gastrointestinal disturbance, as well as a high risk of hypertension and cardiovascular disease (Sirois, 2015; Sirois, 2016). ... ”

      (2) It is laudable that the study was pre-registered; however, the cited OSF repository cannot be accessed and therefore, the OSF materials cannot be used to (a) check the preregistration or to (b) fill in the gaps and uncertainties about the exact analyses the authors conducted (this is important because the description of the analyses is insufficiently detailed and it is often unclear how they analyzed the data).

      We are sorry to encounter a serious technical barrier making our preregistration invisible and inaccessible. The OSF has disabled my OSF account, as it claimed to detect “suspicious user’s activities” in my account (please see the screenshot below). This results in no access to all materials already deposited in this OSF account, including this preregistration. We have contacted the OSF team, but received no valid technical solution to recover this preregistered report. We reckon that this may be triggered by my affiliation change to the Third Military Medical University of the People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text, and added the “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Also, because an inaccessible preregistration falls short of best practice, apart from transparently reporting this case we have removed all other statements regarding preregistration throughout the revised manuscript. Furthermore, we fully understand that the inadequate methodological detail in our reporting made the statistics of this study hard to follow. Therefore, we have reported extensive details in the Methods section to clarify how the analyses were conducted, so that our conclusions can be evaluated. Please see what we have added in the responses below (Comments #4-9).

      Methods Section (Page 5, Line 186-191)

      “This study fully adhered to CONSORT reporting guidelines, and was originally preregistered in the OSF repository (10.17605/OSF.IO/Y3EDT). However, due to a technical constraint related to the OSF account service (see SM), this OSF page is no longer accessible. For transparency and best practices of open science, based on the original protocol documentation, a preregistration statement has been reconstructed to clarify a priori hypotheses, sample size determination, and analysis plans for this study (Table S1).”

      (3) Related to the previous point: I find it impossible to check the analyses with respect to their appropriateness because too little detail and/or explanation is given. Therefore, I find it impossible to evaluate whether the conclusions are valid and warranted.

      Again, we apologize for the confusion caused by inadequate statistical and methodological details. This manuscript was previously reviewed by Nature Human Behaviour, whose editorial policy constrained the paper length, so a substantial number of details had to be omitted or removed. As you kindly suggested, we have added extensive descriptions to clarify how we carried out the statistical analyses in the present study. Please see specific instances below.

      (4) Why is a medium effect size chosen for the a priori power analysis? Is it reasonable to assume a medium effect size? This should be discussed/motivated. Related: 18 participants for a medium effect size in a between-subjects design strikes me as implausibly low; even for a within-subjects design, it would appear low (but perhaps I am just not fully understanding the details of the power analysis).

      Thank you for raising this crucial question. We have determined this a priori effect size based on the existing work we published previously (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In our pilot study (Xu et al., 2023), we identified a significant interaction effect between the single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in the laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori. To clarify, we have explicitly justified the selection of this effect size in the Methods section.

      Methods Section (Page 5, Line 206-215)

      “A full randomized block design was used to assign participants to both groups (active neuromodulation group, NM; sham-control group, SC) (see Fig. 2C). As the pilot study probing into the effect of single-session tDCS stimulation to change procrastination willingness indicated (t = 2.38, p = .02, 95% CI [0.14, 1.49]; Xu et al., 2023), statistical power was predetermined by G*Power at a relatively medium effect size (1-β err prob = 0.80, f = 0.25), yielding the total sample size at 18 to reach acceptable power (see SM Methods and Fig. S1)....”

      We fully understand that this sample size to reach a medium effect size is seemingly low, and that the 18 participants for each group are apparently limited in any case. Upon double-checking these power analyses, we confirmed that this sample size requirement is indeed correct. Please see the G*Power outputs in Author response image 1.

      Author response image 1.

      Despite the absence of algorithmic errors in the power analysis here, we are aware that this limited sample size may hamper statistical robustness. To address this weakness, we have explicitly noted this caution in the Limitations section:

      Limitations Section (Page 12, Line 637-640)

      “... In addition to technical limitations, given the apparently limited size of the sample (total N = 46), it warrants caution in generalizing these findings elsewhere, and necessitates further validations in a large-scale cohort.”

      (5) It remains somewhat ambiguous whether the sham group had the same number of stimulation sessions as the verum stimulation group; please clarify: Did both groups come in the same number of times into the lab? I.e., were all procedures identical except whether the stimulation was verum or sham?

      Yes, we fully followed the CONSORT pipeline to carry out this double-blind trial, and thus confirmed that all the participants in both groups had the same number of stimulation sessions in our lab. That is to say, except for the stimulation type (verum vs sham), all the procedures, equipment and even the room were identical for all the participants. For clarification, we have clearly stated this in the main text:

      Results Section (Page 9, Line 419-423)

      “In both groups, almost all participants (93.2%, 41/44) reported perceiving acceptable pain stemming from current stimulation, and believed they were receiving treatment (91.30% (21/23) for active neuromodulation group (NM), 86.95% (20/23) for sham control group (SC), χ<sup>2</sup> = 0.224, p = .636). All the participants were engaged in identical experimental procedures except for the stimulation type (active vs sham). ...”
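
      As an illustrative check of the blinding comparison quoted above, the sketch below recomputes a Pearson chi-square statistic for the 2x2 table of participants who believed they were receiving treatment (21 of 23 in NM, 20 of 23 in SC). It assumes no continuity correction and one degree of freedom; the function name `chi_square_2x2` is hypothetical and is not taken from the authors' analysis code.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: NM, SC. Columns: believed treatment was real, did not believe.
chi2, p = chi_square_2x2(21, 2, 20, 3)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # chi2 = 0.224, p = 0.636
```

      Under these assumptions the result agrees with the reported statistic and p value to three decimals.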

      (6) The TDM analysis and hyperbolic discounting approach were unclear to me; this needs to be described in more detail, otherwise it cannot be evaluated.

      We apologize for the inadequate details, which hindered a precise understanding of the TDM and the hyperbolic discounting model. The Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), and theoretically conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., motivation to act now in pursuit of the task reward) and task aversiveness (i.e., motivation to refrain from acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis in this theoretical model is that task aversiveness is hyperbolically discounted when approaching the deadline: it is discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang et al., 2019). Considering the nonlinear dynamics inherent in this hyperbolic discounting, we therefore employed a log-spaced temporal sampling scheme (Myerson et al., 2001) to strengthen curve-fitting performance (please see the schematic diagram (https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time)):

      Specifically, based on the log-spaced temporal sampling rule, five time points were first selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00, sampling occurred at 10:00, 16:00, 18:00, 19:30, and 20:00). At each time point, participants reported task aversiveness (A) on a 0–100 Visual Analog Scale (VAS). Then, task aversiveness discounting was calculated as 1 - (A<sub>t</sub> / A<sub>earliest</sub>), where t<sub>earliest</sub> was the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), we estimated the AUC from these five data points based on the Myerson algorithm (Myerson et al., 2001), computed as the trapezoidal integration of task aversiveness discounting over time. Under this modelling method, a higher AUC reflects stronger temporal discounting of task aversiveness, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. That is to say, if a participant shows greater discounting of task aversiveness, as reflected by a higher AUC, she/he experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination. As you kindly suggested, we have added these details to explicitly clarify how the hyperbolic discounting approach was used to determine sampling time points and to calculate the AUC of task aversiveness discounting.
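
      The calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' GraphPad workflow: the ratings are hypothetical, the helper names are invented for the example, and rescaling elapsed time to the 0-1 range is an assumption (the response does not state the time unit used, so absolute AUC values here are not comparable with those reported).

```python
from datetime import datetime


def hours_since_first(clock_times):
    """Convert 'HH:MM' strings into hours elapsed since the first sampling point."""
    parsed = [datetime.strptime(t, "%H:%M") for t in clock_times]
    return [(p - parsed[0]).total_seconds() / 3600 for p in parsed]


def aversiveness_discounting(ratings):
    """Discounting at each point: 1 - A_t / A_earliest (the first rating is the reference)."""
    earliest = ratings[0]
    if earliest <= 0:
        raise ValueError("the earliest rating must be positive")
    return [1 - a / earliest for a in ratings]


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


# Sampling times from the example in the text, for a task due at 20:00.
times = ["10:00", "16:00", "18:00", "19:30", "20:00"]
# Hypothetical 0-100 VAS aversiveness ratings, for illustration only.
ratings = [80, 60, 50, 30, 20]

elapsed = hours_since_first(times)
x_norm = [h / elapsed[-1] for h in elapsed]  # assumption: time rescaled to 0-1
discounting = aversiveness_discounting(ratings)
auc = trapezoid_auc(x_norm, discounting)
print(f"AUC = {auc:.4f}")
```

      With these made-up ratings, aversiveness falls as the deadline approaches, so the discounting values rise from 0 and the area grows with the steepness of that decline.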

      Methods Section (Page 6, Line 268-283)

      “On the Task day, we developed a mobile app to implement the experience sampling method (ESM) for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives a given real-life task to be when performing it, whereas outcome value refers to the subjective benefits of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As theoretically conceptualized by the temporal decision model (TDM) of procrastination, the perceived task aversiveness is hyperbolically discounted when approaching the deadline, being discounted sharply when far from the deadline but slowly once nearing the deadline (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, the five recording moments of ESM were selected per task a priori by using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, such as moments of 10:00 (earliest), 16:00, 18:00, 19:30, 20:00 (deadline). The five sampling points meet the statistical prerequisite for hyperbolic model fitting, which requires ≥ 4 points (Green & Myerson, 2004). To do so, recording moments were individually tailored for each task per participant in this ESM procedure.”

      Methods Section (Page 7, Line 318-334)

      “... As articulated in the temporal decision model above, the task aversiveness evoked by executing a task was temporally dynamic in a hyperbolic discounting pattern, being discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang & Feng, 2020). To quantitatively characterize task aversiveness with consideration for its dynamics, the model-free area under the curve (AUC) was calculated. Specifically, based on the log-spaced temporal sampling rule, task aversiveness (A) was measured by a 100-point visual analog scale at the five sampling moments. Then, the task aversiveness discounting was calculated as 1 - (A(t) / A(earliest)), where t(earliest) was the earliest sampling point, serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), the AUC was computed as the trapezoidal integration of task aversiveness discounting over time across the five data points, based on the Myerson algorithm (Myerson et al., 2001). By doing so, a higher AUC reflects stronger temporal discounting of task aversiveness as the deadline nears, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. As for the task outcome value, it was theoretically posited as a relatively stable evaluation of the task (Zhang & Feng, 2020; Zhang et al., 2021).”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the experimental analysis of behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of experimental psychology. General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome Value and Task Aversiveness Impact Task Procrastination through Separate Neural Pathways. Cerebral cortex (New York, N.Y. : 1991), 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley interdisciplinary reviews. Cognitive science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of experimental psychology. General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (7) Coming back to the point about the statistical analyses not being described in enough detail: One important example of this is the inclusion of random slopes in their mixed-effects model, which is unclear. This is highly relevant, as omission of random slopes has repeatedly been shown to lead to extremely inflated Type 1 errors (e.g., inflating Type 1 errors by a factor of ten; a significant p value of .05 might be obtained when the true p value is .5). Thus, if indeed random slopes have been omitted, then it is possible that significant effects are significant only due to inflated Type 1 error. Without more information about the models, this cannot be ruled out.

      Thank you for sharing this very timely and crucial comment. After careful scrutiny, we identified the statistical flaw you pointed out: the original models included only random intercepts for participants, without random slopes. As you kindly suggested, we have reanalyzed all the statistics by adding random slopes (i.e., (1 + day|SubjectID)). Results showed a statistically significant interaction effect for both procrastination willingness (β = -7.8, SE = 1.8, DF = 45.6, p < .001) and actual procrastination rates (β = -7.4, SE = 2.4, DF = 46.6, p = .004), indicating the effectiveness of multi-session neuromodulation in mitigating procrastination. In the post-hoc simple effect analyses, participants who engaged in active neuromodulation (NM) showed a significant increase in task-execution willingness (i.e., decreased procrastination willingness; NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction) and a decrease in actual procrastination rates (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), while no such effects were identified for participants in the sham control group (for willingness, SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction; for actual procrastination, SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction). Taken together, we appreciate your pointing out this crucial statistical weakness, and have confirmed that our findings remain reliable after adding random slopes to guard against inflated Type 1 error. Moreover, as you kindly suggested, we have incorporated these statistical details, particularly those concerning the GLMM, into the main text to facilitate your evaluation. Please see specific revisions below:

      Methods Section (Page 8, Line 381-401)

      “To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) was constructed with a full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the standard issued by the National Bureau of Statistics of China (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided, on the basis of per capita annual household income, into seven hierarchical tiers from 1 (poor) to 7 (rich). To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The averaged score of those self-rated emotions at the two time points was then modeled in the GLMM as a covariate of no interest, yielding the final expression “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)” in the statistical model. This analysis was implemented using the “lme4” and “lmerTest” packages. Employing the “emmeans” package, simple effects were also tested at baseline and post-last-intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and random-effects structure. To validate statistical robustness, instead of continuous outcomes for parametric tests, we also conducted a between-group comparison of the number of tasks on which procrastination emerged, using the nonparametric χ<sup>2</sup> test with φ correction or the Fisher exact test....”

      Results Section (Page 9, Line 428-449)

      “To identify whether ms-tDCS targeting the left DLPFC can alleviate subjective procrastination willingness and actual procrastination behavior, a generalized linear mixed-effects model with the Satterthwaite approximation was built, with task-execution willingness and actual procrastination rates (PR) as primary outcomes, respectively. For procrastination willingness, results showed a statistically significant interaction effect between multi-session neuromodulation and group (β = -7.8, SE = 1.8, DF = 45.6, p < .001; Fig. 3A). The post-hoc simple effect analysis demonstrated a significantly increased task-execution willingness (i.e., decreased procrastination willingness) after neuromodulation in the active neuromodulation group (NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction), but no such effects were identified in the sham control group (SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction) (Fig. 3B-C). A linear uptrend for task-execution willingness was further observed across multiple sessions in the active NM group, indicating gradually increasing neuromodulation effects (Fig. 3D; p < .01, Mann-Kendall test). For actual procrastination behavior, changes in actual procrastination rates across all the sessions are detailed in Fig. 3E. Similarly, a statistically significant interaction effect was identified here (β = -7.4, SE = 2.4, DF = 46.6, p = .004), and the simple effect analysis further revealed decreased actual procrastination rates after ms-tDCS in the active neuromodulation group (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), but no such prominent changes were found in the sham control group (SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction) (Fig. 3F-G). Also, a significant downtrend for procrastination rates across all the sessions was identified in the active NM group (Fig. 3H; p < .01, Mann-Kendall test).”
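
      For readers unfamiliar with the Mann-Kendall test cited above, the following is a minimal sketch of the standard procedure, which tests for a monotonic trend across ordered observations. The session means are hypothetical, the function name `mann_kendall` is invented for this example, and the normal approximation used for the p value is only approximate for short series; none of this reproduces the authors' actual computation.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, z, two-sided p) for a Mann-Kendall trend test using the normal approximation."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    # S counts later-minus-earlier pairs that increase, minus pairs that decrease.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis, with the usual correction for tied values.
    var_s = n * (n - 1) * (2 * n + 5) / 18
    for t in Counter(values).values():
        if t > 1:
            var_s -= t * (t - 1) * (2 * t + 5) / 18
    # Continuity-corrected z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical mean task-execution willingness across seven sessions (illustration only).
willingness = [36, 41, 48, 55, 63, 72, 80]
s, z, p = mann_kendall(willingness)
print(f"S = {s}, z = {z:.2f}, p = {p:.4f}")
```

      A strictly increasing series gives the maximum possible S for its length, so the test flags an uptrend; a flat or noisy series would give S near zero.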

      (8) Related to the previous point: The authors report, for example, on the first results page, line 420, an F-test as F(1, 269). This means the test has 269 residual degrees of freedom despite a sample size of about 50 participants. This likely suggests that relevant random slopes for this test were omitted, meaning that this statistical test likely suffers from inflated Type 1 error, and the reported p-value < .001 might be severely inflated. If that is the case, each observation was treated as independent instead of accounting for the nestedness of data within participants. The authors should check this carefully for this and all other statistical tests using mixed-effects models.

      Thank you for underlining this very timely and helpful comment. As you correctly pointed out above, we did not include random slopes in the original GLMM, highly risking the inflation of the false-positive rate (i.e., Type-I error). By adding the random slopes, we reanalyzed all the statistics from the GLMM, and confirmed that all the findings are still reliable from those new GLMMs with random slopes. Again, thank you for this crucial statistical advice, and please see the above response for full details regarding what we have revised to address this comment you kindly raised.

      (9) Many of the statistical procedures seem quite complex and hard to follow. If the results are indeed so robust as they are presented to be, would it make sense to use simpler analysis approaches (perhaps in addition to the complex ones) that are easier for the average reader to understand and comprehend?

      We thank you for this practical and helpful comment. In the original manuscript, we incorporated a joint model of longitudinal and survival data (JM-LSD), in conjunction with machine learning algorithms, to strengthen the robustness of our statistical findings. Nevertheless, we all agree with you on this point: there is no need to complicate the analyses by repeatedly probing the same research question to increase methodological robustness, at the expense of readability and intelligibility for a broader audience. As you suggested, we have removed these complicated statistical methods, and retained only the primary ones - GLMM and χ<sup>2</sup> cross-tab test, as well as a complementary one - Mann-Kendall linear trend test. As a result, we have rewritten almost the whole Results section. Please see the specific instances below:

      Results Section (Page 9, Line 468-485)

      “Ms-tDCS changes task aversiveness and task-outcome value

      Both task aversiveness and task outcome value serve as key pathways determining whether one would procrastinate. To this end, we further utilized a generalized linear mixed-effects model to examine the effects of ms-tDCS on changes in task aversiveness and task outcome value. Task aversiveness changes across all the sessions are shown in the Fig. 4A and 4C. We demonstrated a statistically significant decrease in task aversiveness and an increase in task outcome value via ms-tDCS in the neuromodulation group (Task aversiveness: interaction effect, β = -0.12, SE = 0.04, DF = 46.7, p = .002; simple effect, NM-before <sub>(AUC)</sub>: 1.13 ± 0.53, NM-after <sub>(AUC)</sub>: 1.95 ± 0.85, t.ratio = 4.5, p < .001, Tukey correction; Outcome value: β = -6.8, SE = 1.74, DF = 46.2, p < .001; simple effect, NM-before: 35.86 ± 27.82, NM-after: 73.08 ± 23.33, t.ratio = 5.0, p < .001, Tukey correction; see Fig. 4B), but not in the sham control group (Task aversiveness: SC-before <sub>(AUC)</sub>: 1.07 ± 0.51, SC-after <sub>(AUC)</sub>: 1.28 ± 0.46, t.ratio = 1.3, p = .20, Tukey correction; Outcome value: SC-before: 34.00 ± 25.17, SC-after: 40.13 ± 28.94, t.ratio = 0.8, p = .41, Tukey correction; see Fig. 4D). In the neuromodulation (NM) group, task aversiveness steadily decreased with the cumulative number of stimulation sessions, while perceived task outcome value increased significantly (see Fig. 4E-F, p < .05, Mann-Kendall test). Thus, it provides causal evidence clarifying that neuromodulation to left DLPFC reduces task aversiveness and enhances task-outcome value meanwhile.”

      Results Section (Page 10, Line 525-542)

      “Long-term effects of ms-tDCS

      We also conducted a follow-up investigation to test the long-term retention of ms-tDCS effects in reducing actual procrastination. All participants except one in the neuromodulation group completed the follow-up 6 months after the last neuromodulation (N<sub>NM</sub> = 22, N<sub>SC</sub> = 23). Thus, the GLMM was constructed, with the PR before the first neuromodulation vs. PR 6 months after the last neuromodulation as covariates of interest. Results showed a statistically significant group*time interaction effect (β = 16.5, SE = 9.9, p = .049). The simple-effect model demonstrated a decrease in actual procrastination rates in the active neuromodulation group 6 months after the last stimulation compared to baseline (β = -22.05, SE = 10.0, p = .038, Tukey correction; NM-before: 40.68 ± 37.96, NM-after<sub>6-months</sub>: 18.63 ± 29.80), and revealed null effects in the SC group (β = 1.26, SE = 9.78, p = .99, Tukey correction; SC-before: 46.47 ± 40.75, SC-after<sub>6-months</sub>: 47.73 ± 39.18) (see Fig. 6). Furthermore, using a nonparametric χ<sup>2</sup> test to compare differences in the number of procrastinated tasks, we still found a statistically significant reduction in procrastination frequency in the NM group 6 months after neuromodulation compared to baseline (χ<sup>2</sup> = 3.30, p = .035, NM-before: 68.19% (15/22), NM-after<sub>6-months</sub>: 40.91% (9/22)), while no significant changes were observed in the SC group (χ<sup>2</sup> = 0.11, p = .74, SC-before: 69.56% (16/23), SC-after<sub>6-months</sub>: 73.91% (17/23)). Therefore, beyond short-term effects, the benefits of ms-tDCS neuromodulation in reducing procrastination show long-term retention.”

      (10) As was noted by an earlier reviewer, the paper reports nearly exclusively about the role of the left DLPFC, while there is also work that demonstrates the role of the right DLPFC in self-control. A more balanced presentation of the relevant scientific literature would be desirable.

      We are grateful to you for noticing the unbalanced presentation of the literature on left DLPFC. As you kindly suggested, we have added literature to support the association between self-control and the right lateralization of the DLPFC. Please see below for what we have revised:

      Introduction Section (Page 4, Line 137-143)

      “...In addition to the left lateralization, there is solid evidence indicating significant associations between self-control and the right DLPFC, particularly given that this region specifically functions in top-down regulation, future self-continuity representation and social decisions (Huang et al., 2025; Lin and Feng, 2024; Knoch & Fehr, 2007). Despite this, Xu and colleagues demonstrated null effects of anodally stimulating the right DLPFC to modulate either value evaluation or emotional regulation for changing procrastination willingness (Xu et al., 2023).”

      (11) Active stimulation reduced procrastination, reduced task aversiveness, and increased the outcome value. If I am not mistaken, the authors claim based on these results that the brain stimulation effect operates via self-control, but - unless I missed it - the authors do not have any direct evidence (such as measures or specific task measures) that actually capture self-control. Thus, that self-control is involved seems speculation, but there is no empirical evidence for this; or am I mistaken about this? If that is indeed correct, I think it needs to be made explicit that it is an untested assumption (which might be very plausible, but it is still in the current study not empirically tested) that self-control plays any role in the reported results.

      We truly appreciate your pointing out this weakness with regard to conceptualization. Yes, you are correct in understanding this causal chain: we conceptually speculate that HD-tDCS stimulation over the left DLPFC acts through self-control to change procrastination, rather than empirically validating this component in the chain: brain stimulation→increased self-control→increased task outcome value→decreased procrastination. We did not collect data to directly measure self-control at either baseline or post-neuromodulation. Therefore, we fully agree with your suggestion to state this explicitly in the main text. Following this advice, we have revised a portion of the Conclusion to clearly point out the hypothesis-generating role of self-control in mitigating procrastination, and have further acknowledged this in the Limitation section:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized roles of self-control in procrastination, and offers a validated, theory-driven strategy for interventions.”

      Results Section (Page 10, Line 489-492 and 520-522)

      “Given the dual neurocognitive pathways identified above—reduced task aversiveness and increased task-outcome value—we proposed that these changes, conceptually driven by enhanced self-control via ms-tDCS over left DLPFC, account for how neuromodulation reduces procrastination. ...”

      “In summary, these findings demonstrated a mechanistic pathway underlying procrastination: self-control, conceptualized to be governed by the left DLPFC, mitigates procrastination by plausibly increasing task-outcome value.”

      Discussion Section (Page 13, Line 642-645)

      “Moreover, this study did not collect data for assessing participants’ self-control at either baseline or post-neuromodulation, thereby limiting our ability to determine whether the effects on procrastination were uniquely attributable to neuromodulation-induced changes in self-control. ...”

      (12) Figures 3F and 3H show that procrastination rates in the active modulation group go to 0 in all participants by sessions 6 and 7. This seems surprising and, to be honest, rather unlikely that there is absolutely no individual variation in this group anymore. In any case, this is quite extraordinary and should be explicitly discussed, if this is indeed correct: What might be the reasons that this is such an extreme pattern? Just a random fluctuation? Are the results robust if these extreme cells are ignored? The authors remove other cells in their design due to unusual patterns, so perhaps the same should be done here, at least as a robustness check.

      Thank you for raising this highly important and helpful comment. Indeed, we fully understand that this result is somewhat extraordinary, a fact that was equally striking to us when unblinding the data. After carefully scrutinizing the data and statistics, we are thrilled to confirm that this pattern is true. In support of this observation, we were gratified to receive numerous thank-you letters from participants who engaged in active neuromodulation. They expressed gratitude to us, and reported that they have substantially ameliorated procrastination behavior in real-life activities after completing the trial. While this does not constitute formal scientific evidence, we are also glad to see the benefits of this neuromodulation for those procrastinators.

      Two reasons could account for this pattern. One interpretation is to attribute this pattern to “scalar inflation”. In the present study, the procrastination rate was calculated as 1 minus the task-completion rate (e.g., 80%, 60%, 40%) by the deadline. At sessions #6 and #7, all the participants completed their real-life tasks before the deadline, yielding a 0% (1 minus 100% completion rate) procrastination rate, without any between-individual variation. Thus, rather than there being no individual variation in procrastination, this scalar (the procrastination rate) is too insensitive to capture subtle differences. For instance, although Participants #1 and #2 both showed a 0% procrastination rate, meaning that both completed their tasks before the deadline, Participant #1 might have completed it 3 hours before the deadline, whereas Participant #2 might have completed it only 10 minutes before. In this case, “scalar inflation” leads us to perceive both participants as having equivalent procrastination rates, although Participant #2 may have a higher procrastination level than Participant #1. As conceptually defined in the field, procrastination is contextualized as “not completing a task before the deadline”. Thus, if a task is completed before the deadline, regardless of whether it was finished close to or far in advance of the deadline, this case is defined as “no procrastination”. In the present study, the primary outcome is whether a participant procrastinated on a real-life task before the deadline in real-world settings, irrespective of when she/he completed this task. Thus, this scalar, the procrastination rate, fits our conceptualization of procrastination.

      Another reason is the potential cumulative effect of sequential multi-session tDCS stimulation. As shown in Mann-Kendall trend tests, the procrastination rates show a significant monotonic downtrend in the active neuromodulation group across sessions, even after removing sessions #6 and #7. This indicates that the improvements in procrastination may accumulate as the number of sessions increases, implying a potential “dose-dependent effect”. Although this interpretation is speculative, such a “dose-dependent effect” in neuromodulation has been well-documented in previous studies, showing a robustly linear association between the number of sessions and effectiveness (cf. Cole et al., 2020; Hutton et al., 2023; Sabé et al., 2024; Schulze et al., 2018). Therefore, although this extreme pattern is somewhat extraordinary compared to previous observations, it is plausible.
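      For readers unfamiliar with the Mann-Kendall trend test mentioned above, the following is a minimal sketch in Python. It assumes the simplest form of the test (no tie correction, normal approximation with a continuity correction); the function name `mann_kendall` and the session values are hypothetical illustrations, not the study's data or analysis code.

```python
import math


def mann_kendall(values):
    """Return (S, tau, two-sided p) for a monotonic trend in `values`.

    Simplifications (assumptions of this sketch): no correction for ties,
    and a normal approximation that is only rough for very short series.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S sums the signs of all later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: move S one unit toward zero before standardising.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return s, tau, p


# Hypothetical mean procrastination rates (%) over five sessions.
s, tau, p = mann_kendall([42.0, 35.5, 28.0, 19.5, 11.0])
```

      A series that contains repeated values (for example, several sessions at 0%) would need the tie-corrected variance, which this sketch deliberately omits.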

      Yes, it is definitely a great idea to carry out a robustness check by removing sessions #6, #7, or both. We believe that this analysis supports statistical robustness against potential biases from extreme cells. By doing so, we found that all the group*treatment_day interaction effects remained significant when removing either session #6 or session #7 (or even both; all p-values < .05), indicating high statistical robustness. Please see Supplementary Tables S3 and S4.

      Taken together, in spite of their being extraordinary, we confirm that those findings are statistically robust to extreme outliers. As you kindly suggested, we have added those findings of the robustness check into the revised Supplemental Materials section.

      References

      Cole, E. J., Stimpson, K. H., Bentzley, B. S., Gulser, M., Cherian, K., Tischler, C., Nejad, R., Pankow, H., Choi, E., Aaron, H., Espil, F. M., Pannu, J., Xiao, X., Duvio, D., Solvason, H. B., Hawkins, J., Guerra, A., Jo, B., Raj, K. S., Phillips, A. L., … Williams, N. R. (2020). Stanford Accelerated Intelligent Neuromodulation Therapy for Treatment-Resistant Depression. The American journal of psychiatry, 177(8), 716–726. https://doi.org/10.1176/appi.ajp.2019.19070720

      Hutton, T. M., Aaronson, S. T., Carpenter, L. L., Pages, K., Krantz, D., Lucas, L., Chen, B., & Sackeim, H. A. (2023). Dosing transcranial magnetic stimulation in major depressive disorder: Relations between number of treatment sessions and effectiveness in a large patient registry. Brain stimulation, 16(5), 1510–1521. https://doi.org/10.1016/j.brs.2023.10.001

      Sabé, M., Hyde, J., Cramer, C., Eberhard, A., Crippa, A., Brunoni, A. R., Aleman, A., Kaiser, S., Baldwin, D. S., Garner, M., Sentissi, O., Fiedorowicz, J. G., Brandt, V., Cortese, S., & Solmi, M. (2024). Transcranial Magnetic Stimulation and Transcranial Direct Current Stimulation Across Mental Disorders: A Systematic Review and Dose-Response Meta-Analysis. JAMA network open, 7(5), e2412616. https://doi.org/10.1001/jamanetworkopen.2024.12616

      Schulze, L., Feffer, K., Lozano, C., Giacobbe, P., Daskalakis, Z. J., Blumberger, D. M., & Downar, J. (2018). Number of pulses or number of sessions? An open-label study of trajectories of improvement for once-vs. twice-daily dorsomedial prefrontal rTMS in major depression. Brain stimulation, 11(2), 327–336. https://doi.org/10.1016/j.brs.2017.11.002

      (13) The supplemental materials, unfortunately, do not give more information, which would be needed to understand the analyses the authors actually conducted. I had hoped I would find the missing information there, but it's not there.

      Sorry to offer uninformative supplemental materials (SM) in the original submission. As you suggested, we have added a substantial number of details to clarify how we conducted data analyses in the main text, and also tightened the whole SM section to improve readability and comprehensibility. We do hope that this revised manuscript could offer clear and adequate information in understanding methods and statistics for broader readers.

      In sum, the reported/cited/discussed literature gives the impression of being incomplete/selectively reported; the analyses are not reported sufficiently transparently/fully to evaluate whether they are appropriate and thus whether the results are trustworthy or not. At least some of the patterns in the results seem highly unlikely (0 procrastination in the verum group in the last 2 observation periods), and the sample size seems very small for a between-subjects design.

      Thank you for this very helpful summary. As you kindly suggested above, we have overhauled this manuscript to address the points that you listed here: we added relevant literature to balance our claims, added substantial detail to report statistics sufficiently and transparently, conducted a robustness check to confirm the statistical robustness of our findings to the plausibly extreme patterns (sessions #6 and #7), and justified how we determined a priori a sample size that fulfills medium statistical power. Please see above for full details regarding how we addressed those comments, point-by-point.

      Reviewer #2 (Public Review):

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct stimulation targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They report that tDCS effects on task-execution willingness and procrastination are mediated by task outcome value and claim that this neuromodulatory intervention reduces procrastination rates quantified by their task. Although the study addresses an interesting question regarding the role of DLPFC on procrastination, concerns about the validity of the procrastination moderate enthusiasm for the study and limit the interpretability of the mechanism underlying the reported findings.

      Strengths:

      (1) This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The approach is solid and aims to address an important question regarding the putative role of DLPFC in modulating chronic procrastination behavior.

      (2) The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Thank you for taking your invaluable time to review our manuscript, warmly applauding the strength in research design and the conceptualization of scaling task aversiveness, as well as kindly sharing such helpful and insightful evaluations. As you correctly pointed out, we are aware of the absence of detailed, clear and understandable reporting of measures (e.g., real-world procrastination), statistics and methods, in the original manuscript. Following all your suggestions, we have thoroughly revised this manuscript to address those comments that you kindly made, point-by-point. Please see the full response underneath.

      Weaknesses:

      (1) The lack of specificity surrounding the "real-world measures" of procrastination is problematic and undermines the strength of the evidence surrounding the DLPFC effects on procrastination behavior. It would be helpful to detail what "real-world tasks" individuals reported, which would inform the efficacy of the intervention on procrastination performance across the diversity of tasks. It is also unclear when and how tasks were reported using the ESM procedure. Providing greater detail of these measures overall would enhance the paper's impact.

      We genuinely appreciate your raising this very crucial comment. We are sorry for omitting a tremendous number of methodological details to comply with the editorial requirement on the manuscript’s length, which hampered the comprehension of how we measure “real-life tasks” and “real-world procrastination”.

      As shown in the schematic diagram for experimental procedure (Fig. 1), the experimental protocol alternated between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On each Neuromodulation Day, participants received either active or sham HD-tDCS and, critically, before stimulation, were instructed to specify a real-life task they were required to complete the following day, with a deadline between 18:00 and 24:00. This ensured ≥24 hours between neuromodulation and task execution, isolating offline after-effects. For instance, on Day #2 (Neuromodulation Day), before stimulation, participants were asked to report a real-life task with a deadline within 18:00 - 24:00 on the next day's “task day” (Day #3) (please see the schematic diagram in Author response image 2).

      Author response image 2.

      Examples of the real-life tasks reported in our experiment include: “Complete and submit a homework assignment”, “Complete a standardized English proficiency test”, “Complete an online course module required for applying for a Class C driver’s license”, “Prepare slides for a seminar presentation”, “Practice guitar”, “Practice Chinese calligraphy”, and “Do the laundry”. Reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating a plausible task diversity.

      On each “task day”, participants engaged in an intensive Experience Sampling Method (iESM) protocol via a custom-built mobile app. Using this app, participants were required to report a subjective task-execution willingness score (i.e., a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”; procrastination willingness = 100 – the task-execution willingness score), the subjective task aversiveness (i.e., a one-item 100-point visual analog scale), the subjective task outcome value (i.e., a one-item 100-point visual analog scale), and the objective procrastination rate, respectively.

      Rather than relying on self-reported scores from those one-item visual analog scales, we asked participants to report the real “task completion rate” for the objective quantification of “real-world procrastination behavior”. Specifically, at the deadline, each participant was asked to report whether she/he had completed this task. If she/he reported not having yet completed the task (i.e., procrastination behavior emerged), she/he was further required to report the percentage of the task completed (1% - 99%), which was defined as the task completion rate. By doing so, we could calculate the real-world procrastination rate for the real-life task as “1 – the task completion rate”. For instance, if a participant did not complete her/his real-life task before the deadline (i.e., she/he procrastinated on this task) and reported completing 75% of this task at the deadline, her/his real-world procrastination rate was computed as 25% (1 - 75%) (please see the schematic diagram in Author response image 3).

      Moreover, rather than merely a self-reported task completion rate, each participant was also asked to upload proof (e.g., screenshots of submitted assignments, photos of printed documents, system timestamps) to the ESM digital system for validation.
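      The two primary outcomes described above can be written out as a short sketch. The function names and input checks below are hypothetical conveniences for illustration, not the app's actual code; the arithmetic follows the definitions in the text (procrastination willingness = 100 minus the task-execution willingness score; procrastination rate = 100% minus the completion rate).

```python
def procrastination_willingness(tew_score):
    """Convert a 0-100 task-execution willingness score to procrastination willingness."""
    if not 0 <= tew_score <= 100:
        raise ValueError("willingness score must lie in 0-100")
    return 100 - tew_score


def procrastination_rate(completed, percent_done=None):
    """Return the procrastination rate (%) recorded at the deadline.

    A task completed by the deadline scores 0%, regardless of how early it
    was finished. Otherwise the participant reports 1-99% completed.
    """
    if completed:
        return 0.0
    if percent_done is None or not 1 <= percent_done <= 99:
        raise ValueError("an incomplete task needs a completion percentage in 1-99")
    return 100.0 - percent_done


# The worked example from the text: 75% completed at the deadline.
rate = procrastination_rate(completed=False, percent_done=75)  # 25.0
```

      Because every completed task maps to 0%, the measure does not distinguish a task finished hours early from one finished minutes before the deadline.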

      Author response image 3.

      To determine the sampling time points for this mobile app in the ESM, we capitalized on both the conceptual temporal decision model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), which theoretically conceptualizes procrastination as the failure of the trade-off between task outcome value (i.e., motivation to take actions now for pursuing task reward) and task aversiveness (i.e., motivations for avoiding taking action now for avoiding negative experiences). Once task aversiveness overrides the pursuits of task outcome values, procrastination emerges. One overarching hypothesis in this theoretical model is that the task aversiveness is hyperbolically discounted when approaching the deadline: it would be discounted sharply when far from the deadline but discounted slowly when nearing the deadline (Zhang et al., 2019). To maximize statistical power to fit dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) (please see the schematic diagram in https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time):

      By this fitting algorithm (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00). Once the task-specific five sampling time points were determined per participant, this mobile app sent a digital message to ask her/him to immediately report the task aversiveness and the task outcome value then. As the primary outcomes, the procrastination rate (i.e., 1 – the task completion rate) and the procrastination willingness were sampled at the deadline point.
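      To make the idea of log-spaced sampling concrete, here is a minimal sketch. The exact rule used to pick the five time points is not given above, so the geometric spacing of lead times, the function name `log_spaced_schedule`, and its default parameters (earliest prompt 10 hours before the deadline, last pre-deadline prompt 30 minutes before) are all assumptions chosen to resemble the 20:00 example.

```python
from datetime import datetime, timedelta


def log_spaced_schedule(deadline, earliest_lead_h=10.0, min_lead_h=0.5, n_points=5):
    """Return n_points prompt times that grow denser toward the deadline.

    Lead times before the deadline shrink geometrically from earliest_lead_h
    to min_lead_h; the final prompt falls exactly on the deadline.
    """
    if n_points < 3:
        raise ValueError("need at least 3 sampling points")
    n_geo = n_points - 1  # the last point is the deadline itself
    ratio = (min_lead_h / earliest_lead_h) ** (1.0 / (n_geo - 1))
    leads = [earliest_lead_h * ratio ** k for k in range(n_geo)] + [0.0]
    return [deadline - timedelta(hours=h) for h in leads]


# Hypothetical task due at 20:00.
schedule = log_spaced_schedule(datetime(2024, 1, 2, 20, 0))
for t in schedule:
    print(t.strftime("%H:%M"))
```

      Under these assumptions the first, fourth and fifth prompts coincide with the 10:00, 19:30 and 20:00 points of the example, while the two middle prompts fall somewhat later than 16:00 and 18:00, so the published schedule evidently rounds or uses a slightly different spacing.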

      Furthermore, yes, we fully concur with you on this great idea, that is, transparency about task diversity strengthens the generalizability of our findings. In response, we have tabulated these real-life tasks that were reported in this experiment in the independent Appendix 1, with automatic translations from Chinese to English via Qwen GPT. Please see below for what we have added to the main text:

      Methods Section (Page 6-7, Line 238-308)

      “Nested cross-sectional longitudinal design

      This study used a nested cross-sectional longitudinal design to investigate whether multiple-session anodal HD-tDCS targeting the left DLPFC could reduce actual procrastination behavior and to probe how this effect manifests. To assess procrastination in daily life, we implemented a 15-day protocol alternating between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On the Neuromodulation Days, 20-min anodal HD-tDCS neuromodulation targeting the left DLPFC was performed for the HD-tDCS active group at intervals of 2 days, while the sham-control group received sham HD-tDCS training. This HD-tDCS training was repeated for a total of seven sessions, and lasted 15 days (see Fig. 1a). Crucially, to capture procrastination in ecologically valid contexts, prior to receiving either active or sham HD-tDCS (administered between 09:00–18:00), participants were instructed to specify a real-life task they were personally obligated to complete the following day, with a self-defined deadline strictly constrained to 18:00–24:00 to ensure ≥24 hours between stimulation offset and task deadline, thereby isolating offline after-effects. This task had to meet the following three criteria: (a) it should be already assigned in real-world settings; (b) its deadline should be constrained to 18:00-24:00 (see above); (c) it should be likely to induce procrastination. By doing so, more than 300 real-life tasks were collected, spanning academic (e.g., “submit a statistics homework assignment”), occupational (e.g., “draft and email a project proposal”), administrative (e.g., “complete online application for Class C driver’s license”), self-improvement (e.g., “practice guitar for ≥30 minutes”), domestic (e.g., “do laundry”), and health-related (e.g., “running 2,000m for exercise”) domains. The full task list is tabulated in Appendix 1. 
As primary outcomes, all participants were required to report task-execution willingness (TEW) (Zhang & Feng, 2020; Zhang, Liu, et al., 2019) for a real-life task 24 hours post-neuromodulation. Procrastination willingness was then quantified as 100 - TEW score (see below for details). Furthermore, we asked participants to report the actual task completion rate (CR) at the deadline (e.g., participant A had finished 90% of a homework assignment at the deadline and reported this to us at that time). Accordingly, the actual procrastination rate (PR) was quantified as 1 - CR.

      On the Task Days, we implemented the experience sampling method (ESM) via a mobile app that we developed for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As theoretically conceptualized by the temporal decision model (TDM) of procrastination, the perceived task aversiveness is hyperbolically discounted when approaching the deadline, being discounted sharply when far from the deadline but slowly once nearing the deadline (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, the five recording moments of ESM were selected per task a priori by using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, such as moments of 10:00 (earliest), 16:00, 18:00, 19:30, 20:00 (deadline). The five sampling points meet the statistical prerequisite for hyperbolic model fitting (requiring ≥ 4 points; Green & Myerson, 2004). Accordingly, recording moments were individually tailored for each task per participant in this ESM procedure. To obviate the confounds of daily emotions in task aversiveness evaluation, we used the averaged scores of PANAS at 10:00 (noon) and 16:00 (afternoon) as anchoring points to quantify one’s daily emotions by using this ESM app. Before each session of HD-tDCS training, each participant was required to report a real-life task whose deadline was the next day. To obtain the long-term effect of HD-tDCS (i.e., the interval between HD-tDCS and task completion is at least 24 hours), the task deadline that participants reported was required to be between 18:00 - 24:00. 
Once a sampling time was reached, the app sent a digital message requiring participants to fill in an online form for data collection.

      Quantification of covariates of interest

      Outcome variables of this study were twofold: one is task-execution willingness and the other is procrastination rate (PR). Task-execution willingness is used to evaluate one’s subjective inclination to avoid procrastination (Zhang & Feng, 2020). In this vein, we asked participants to report their task-execution willingness on a 100-point scale (0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). This metric was recorded 24 hours after neuromodulation to examine its long-term effects. PR is used to quantify the extent to which a task has been procrastinated, and was calculated as 1 - CR (task completion rate). Critically, at the precise deadline, the app prompted participants to (a) indicate task completion status (yes/no), and if incomplete, (b) report the percentage completed (1–99%), defined as the Task CR, while simultaneously uploading objective evidence (e.g., screenshots of submitted files, photos of physical outputs, system-generated logs, or app-exported records). If the task was actually completed before the deadline, the CR would be 100% and the PR would be calculated as 0% (1-CR). PR was recorded at the actual task deadline for each participant. We were also interested in re-investigating their actual procrastination by using PR 6 months after the last neuromodulation to test the long-term retention of this neuromodulation effect.”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the experimental analysis of behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of experimental psychology. General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome Value and Task Aversiveness Impact Task Procrastination through Separate Neural Pathways. Cerebral cortex (New York, N.Y. : 1991), 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley interdisciplinary reviews. Cognitive science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of experimental psychology. General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (2) Additionally, it is unclear whether the reported effects could be due to differential reporting of tasks (e.g., it could be that participants learned across sessions to report more achievable or less aversive task goals, rather than stimulation of DLPFC reducing procrastination per se). It would be helpful to demonstrate whether these self-reported tasks are consistent across sessions and similar in difficulty within each participant, which would strengthen the claims regarding the intervention.

      Thank you for raising this very crucial comment. We agree with you that the reported effects may vary with task difficulty and task-execution proficiency, which potentially confound the effects of stimulation on mitigating procrastination. On the one hand, as you correctly note, because we collected no data on the difficulty or other relevant characteristics of tasks, we cannot completely rule out this confounder when interpreting our findings. As a result, we have explicitly stated this limitation in the Discussion section.

      On the other hand, despite the lack of quantitative evidence, this risk of confounding main effects with disparities in task characteristics was controlled experimentally. As we reported above, all the reported tasks were mandated to meet three criteria: (a) they were already assigned in real-world settings; (b) the deadline was constrained to 18:00-24:00; (c) they were likely to induce procrastination. To this end, each participant was clearly instructed to report a real-life task that was likely to be procrastinated in real-world settings, and was not allowed to report easy, achievable and cost-less tasks. Consistent with this, the reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating plausible task diversity and difficulty. This was echoed by the high within-subject task homogeneity. For instance, Participant #5 reported tasks that almost all concerned academic activities across all the sessions. Therefore, as the task list shows (please see Appendix 1), these self-reported tasks were plausibly consistent across sessions and similar in difficulty within each participant.

      In addition, as we tested, almost all the participants reported they were receiving treatment, with 91.30% (21/23) for the active neuromodulation group (NM) and with 86.95% (20/23) for the sham control group (SC) (χ<sup>2</sup> = 0.224, p = .636), indicating the effectiveness of the double-blinding methods. If participants had learned across sessions to report more achievable or less aversive task goals, their procrastination willingness and procrastination rates for their reported tasks would have progressively decreased, irrespective of whether they were in the active neuromodulation group or the sham group. However, no such progressive decrease across sessions was observed in the sham control group (Mann-Kendall test, for procrastination willingness, tau = 0.60, p = .13; for procrastination rate, tau = 0.61, p = .13), indicating no statistically significant learning effect or strategic effect on task performance. Again, thank you for this very crucial comment, and we do hope these clarifications could address it.
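      The blinding check above compares two proportions, which can be illustrated with a Pearson chi-square test on a 2×2 table. The sketch below is an assumption about the form of the test: the text does not say whether a continuity correction was applied, and the helper name `chi2_2x2` is hypothetical. Only the Python standard library is used.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: NM and SC groups; columns: believed "receiving treatment" vs not.
stat, p = chi2_2x2(21, 2, 20, 3)
print(round(stat, 3), round(p, 3))  # 0.224 0.636
```

      With the counts reported above (21/23 vs 20/23), the uncorrected statistic matches the reported χ<sup>2</sup> = 0.224 and p = .636.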

      Limitations Section (Page 12, Line 637-640)

      “In addition, despite instructing participants to report valid real-life tasks with a high probability of being procrastinated, we did not measure task difficulty or consistency across sessions for each participant. Consequently, interpreting the effects of neuromodulation on mitigating procrastination as “unique contributions” warrants caution. ...”

      (3) It would be helpful to show evidence that the procrastination measures are valid and consistent, and detail how each of these measures was quantified and differed across sessions and by intervention. For instance, while the AUC metric is an innovative way to quantify the temporal dynamics of task-aversiveness, it was unclear how the timepoints were collected relative to the task deadline. It would be helpful to include greater detail on how these self-reported tasks and deadlines were determined and collected, which would clarify how these procrastination measures were quantified and varied across time.

      We appreciate your highlighting the importance of clarifying how procrastination was measured, which substantially helps readers interpret these findings. As reported above, the primary outcomes of this experiment were subjective procrastination willingness and the objective procrastination rate. For subjective procrastination willingness, participants used the purpose-built mobile app to report a task-execution willingness score (i.e., a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). Procrastination willingness was then computed as “100 – the task-execution willingness score”. For the objective procrastination rate, rather than relying on self-reported scores from such one-item visual analog scales, we asked participants to report the real “task completion rate from 1% to 99%” to quantify “real-world procrastination behavior” objectively. Full details can be found in Response #1.

      To determine the sampling time points for quantifying the AUC, we drew on both the conceptual Temporal Decision Model and the statistical Myerson algorithm. The Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., motivation to act now in pursuit of task reward) and task aversiveness (i.e., motivation to avoid acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when the deadline is far away but slowly when the deadline is near (Zhang et al., 2019). To maximize statistical power for fitting dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001). Following this algorithm (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00).
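      The example schedule above can be checked for the stated property, namely denser sampling toward the deadline. This is a minimal sketch that uses only the clock times quoted in the example; the variable names and the placeholder date are illustrative, and this is not the authors' scheduling code.

```python
from datetime import datetime

deadline = datetime(2024, 1, 1, 20, 0)  # arbitrary date; only clock times matter
clock_times = ["10:00", "16:00", "18:00", "19:30", "20:00"]

samples = [
    deadline.replace(hour=int(t[:2]), minute=int(t[3:])) for t in clock_times
]

# Hours remaining until the deadline at each sampling point.
hours_left = [(deadline - s).total_seconds() / 3600 for s in samples]

# Gaps between consecutive sampling points, in hours.
gaps = [a - b for a, b in zip(hours_left, hours_left[1:])]

print(hours_left)  # [10.0, 4.0, 2.0, 0.5, 0.0]
print(gaps)        # [6.0, 2.0, 1.5, 0.5]
```

      The gaps shrink monotonically, which is what "increasing sampling density toward the deadline" implies for this example.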

      Once the five task-specific sampling time points had been determined for a participant, the mobile app sent a message at each time point asking her/him to immediately report the current task aversiveness and task outcome value. After task aversiveness had been captured at those five time points, task aversiveness discounting was calculated as 1 - (A(t) / A(earliest)), where A(t) is the aversiveness reported at time t and “earliest” denotes the earliest sampling point (e.g., 10:00), which served as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), we estimated the AUC from those five data points based on the Myerson algorithm (Myerson et al., 2001), computed via trapezoidal integration of task aversiveness discounting over time. Under this approach, a higher AUC reflects stronger temporal discounting of task aversiveness: participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. In other words, a participant with a higher AUC experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination.
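      As a rough illustration of the computation described above, the sketch below derives the discounting values 1 - A(t)/A(earliest) and integrates them with the trapezoidal rule. The aversiveness ratings are hypothetical, and normalizing the time axis to [0, 1] is an assumption borrowed from the common form of the Myerson AUC; the authors' exact Prism settings are not given in the response.

```python
def discounting_auc(hours_from_first, aversiveness):
    """Trapezoidal AUC of 1 - A(t)/A(earliest) over normalized time."""
    a0 = aversiveness[0]
    span = hours_from_first[-1] - hours_from_first[0]
    # Assumption: rescale time so the first sample is 0 and the deadline is 1.
    x = [(h - hours_from_first[0]) / span for h in hours_from_first]
    y = [1 - a / a0 for a in aversiveness]
    auc = 0.0
    for i in range(1, len(x)):
        auc += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2
    return auc

# Sampling at 10:00, 16:00, 18:00, 19:30 and 20:00, as hours after 10:00.
hours = [0, 6, 8, 9.5, 10]
ratings = [80, 60, 50, 44, 40]  # hypothetical 0-100 aversiveness ratings
auc = discounting_auc(hours, ratings)
print(round(auc, 3))
```

      With this definition, ratings that never fall below the first one give an AUC of 0, and a steeper drop in aversiveness gives a larger AUC, in line with the interpretation stated above.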

      Taken together, following your suggestion, we have added substantial detail to the revised manuscript on how procrastination was measured, when the data were sampled, and how the AUC was estimated. Please see Response #1.

      (4) There are strong claims about the multi-session neuromodulation alleviating chronic procrastination, which should be moderated, given the concerns regarding how procrastination was quantified. It would also be helpful to clarify whether DLPFC stimulation modulates subjective measures of procrastination, or alternatively, whether these effects could be driven by improved working memory or attention to the reported tasks. In general, more work is needed to clarify whether the targeted mechanisms are specific to procrastination and/or to rule out alternative explanations.

      Yes, we fully agree with you on this consideration: we should tone down the conclusions currently claimed in the main text, given the inherent shortcomings mentioned above. As you helpfully suggested, we have moderated our overall claims regarding the effects of multi-session neuromodulation in alleviating chronic procrastination. Please see specific instances below:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized roles of self-control on procrastination, and potentially offers a validated, theory-driven strategy for interventions.”

      Conclusion Section (Page 13, Line 657-664)

      “In conclusion, this study potentially provides an effective way to reduce both procrastination willingness and actual procrastination behavior by using neuromodulation on the left DLPFC. Furthermore, such effects have been observed for 2-day-interval long-term after-effects, and were also found for 6-month long-term retention in part. More importantly, this study identified that the ms-tDCS neuromodulation could decrease task aversiveness while increasing task outcome value, and further demonstrated that the increased task outcome value could predict decreased procrastination, a relationship conceptually driven by enhancing self-control. In this vein, the current study enriches our understanding of the neurocognitive mechanism of procrastination by showing the prominent role of increased task outcome value in reducing procrastination. Also, it may provide an effective method for intervening in human procrastination.”

      Moreover, as we clarified above, in addition to the objective measure of procrastination behavior, we used a one-item visual analog scale (i.e., a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”) to measure subjective procrastination willingness. Results demonstrated that subjective procrastination willingness decreased significantly across neuromodulation sessions in the active group but not in the sham control group, consistent with the observed reduction in the objective procrastination measure. In addition, we agree that it is crucial to note that we cannot conclude that the effects of neuromodulation on procrastination arise uniquely from increased task outcome value. Given that we have no measures of other factors, such as working memory and attention, we cannot rule out other neurocognitive pathways. To address this point, we have removed or rephrased such statements throughout the revised manuscript, and explicitly restricted the interpretation of this neurocognitive mechanism (i.e., increased task outcome value) to the theory-driven framework of the temporal decision model.

      Reviewer #3 (Public review):

      This manuscript explores whether high-definition transcranial direct current stimulation (HD-tDCS) of the left DLPFC can reduce real-world procrastination, as predicted by the Temporal Decision Model (TDM). The research question is interesting, and the topic - neuromodulation of self-regulatory behavior - is timely.

      Many thanks for kindly dedicating time to review our manuscript, and for the helpful comments detailed below. Thank you for appreciating the novelty of this study.

      However, the study also suffers from a limited sample size, and sometimes it was difficult to follow the statistics.

      Thank you for pointing out these crucial concerns. As you correctly raised, the sample size is somewhat small in any case, but we confirm that this sample size is adequate to obtain medium statistical power.

      For estimating the sample size, we determined the a priori effect size based on the existing work we published (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In this pilot study, we identified a significant interaction effect between single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori.

      Using the G*Power software with an estimated medium effect size, we determined that a total sample size of N<sub>total</sub> = 34 could reach adequate statistical power. Please see the G*Power output in Author response image 1.

      As for the statistics, we genuinely acknowledge that the vague methodological descriptions and complex algorithms indeed complicated the understanding of the methods and statistics. To address this, echoing the comment raised by Reviewer #1, we have removed the complicated statistics and methods, and further clarified how we used the generalized linear mixed-effect model (GLMM) for statistical analysis. Please see the specific revisions below:

      Methods Section (Page 8, Line 378-403)

      “Statistics

      All the statistics were implemented by R (https://www.rstudio.com/) and R-dependent packages.

      To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) was constructed with a full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the standard issued by the National Bureau of Statistics (China) (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers from 1 (poor) to 7 (rich) on the basis of per capita annual household income. To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The average of the self-rated emotion scores at the two time points was entered into the GLMM as a covariate of no interest, yielding the final model expression “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were also tested at baseline and post-last-intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and random-effects structure. To validate statistical robustness, instead of parametric tests on continuous outcomes, we also conducted a between-group comparison of the number of tasks on which procrastination emerged, using the nonparametric χ<sup>2</sup> test with φ correction or Fisher exact test.

      Regarding the 6-month follow-up investigation, this GLMM was also built to examine the long-term retention of neuromodulation on reducing actual procrastination.”

      The preregistration and ecological design (ESM) are commendable, but I was not able the find the preregistration, as reported in the paper.

      We regret that a serious technical barrier has rendered our preregistration invisible and inaccessible. OSF has disabled my OSF account, claiming to have detected “suspicious user’s activities” in it. This has prevented access to all materials deposited in this account, including the preregistration. We have contacted the OSF team but have received no valid technical solution for recovering the preregistered report (please see the screenshot below). We reckon that this may be due to my affiliation change to the Third Military Medical University of People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because this no longer meets best practices for preregistration, in addition to transparently reporting this case, we have removed the statements regarding preregistration elsewhere throughout the revised manuscript.

      Overall, the paper requires substantial clarification and tightening.

      We are grateful for your evaluation, and we fully agree with you. In response, we have added a tremendous number of details to clarify how to measure procrastination, how to conduct the statistical analyses, and how to collect real-life tasks, as well as other experimental materials. Please see the revisions in the Methods section of the revised manuscript. Again, thank you for those helpful suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Supplemental Materials, page 4, lines 163 to 167 seem to be from a different manuscript (as the section talks about neural markers, significant clusters, and brain networks).

      We are sorry for erroneously embedding this irrelevant section here. We have removed it, and have double-checked the document to avoid such mistakes.

      (2) I'm no expert here, but some of the trace and density plots in the SOM look problematic (e.g., Figure S5 top panel). But it's not made clear to which model/analysis these plots belong, so they are not very helpful without that information.

      Thank you for bringing these potentially problematic plots to our attention. Following your suggestion, these results have been removed from the SM to improve readability and comprehensibility.

      (3) Table S1 reports side effects "from the neurostimulation" (this is also the language used in the main manuscript), but having the flu is rather unlikely to be a side effect from the stimulation, isn't it? Thus, this language is highly confusing, and when reading the main text, it's not clear that these are just life events that are most likely unrelated to the stimulation, but have the potential to affect the measured variables (i.e., ultimately, they seem a source of noise).

      We apologize for this confusing wording. Here, the “side effects” are defined as confounding effects deriving from unexpected life events that uncontrollably disrupt task execution and task performance, such as “having the flu”, or “an unexpected mandatory CCP (Communist Party of China) meeting assignment”. To obviate misunderstanding, we have rephrased “side effects” as “unexpected life events disrupting task execution” in both the main text and the SM section.

      (4) The use of the English language could be improved.

      Thank you for your very practical suggestion. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include greater detail about the ESM procedure and details of the self-reported tasks. This would help rule out potential confounds of difficulty or learning (e.g., participants may have learned to identify more achievable and less difficult tasks across the sessions, which would mean they are learning to perform the task better rather than to procrastinate less). Further elaboration on the quantification of procrastination measures would help clarify the mechanism underlying this behavior, which is important for clarifying how these effects arise and what aspect of procrastination behavior is being targeted by the tDCS intervention (and rule out alternative explanations).

      We wholeheartedly appreciate your sharing this very crucial recommendation. As we mentioned above, we fully followed your helpful suggestions, particularly by adding massive details to fully report how to collect real-life tasks (with consistent and plausible difficulty across sessions), how to determine sampling time points, and how to quantify metrics (e.g., subjective procrastination willingness score, objective procrastination rate, AUC of task aversiveness, and task outcome value) to the revised manuscript. We do believe that these revisions and clarifications are imperative and necessary. By including these details, we do believe that the readability and clarity have been substantially improved in the current form. Please see the specific revisions and clarifications above.

      (2) It would be helpful to proofread for grammatical and spelling typos (e.g., DLPFC is spelled incorrectly in line 140, Satterwaite is spelled incorrectly in Line 415).

      Thank you for your kind suggestion. Both spelling typos have been corrected, and we have double-checked the revised manuscript to ensure no such typos remain. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      (3) Please clarify in Figure 4 that a higher AUC is associated with lower task aversiveness (which is stated in the methods but not clearly in the figure).

      Many thanks to you for your helpful suggestion. As you kindly suggested, we have clarified this case in the figure legend.

      Reviewer #3 (Recommendations for the authors):

      I want to see the preregistration.

      Thank you for your helpful recommendation. As we replied above, a serious technical issue on OSF occurred, making our preregistration invisible and inaccessible. OSF has disabled my account, claiming to detect “suspicious user’s activities” in my account. As a result, there is no access to all materials that were already deposited in this OSF account, including this preregistration. We have reconstructed this preregistration based on archived documents, and reported it in the SM. As we reported above, although this partially addresses the problem, it no longer fulfills the best practices of preregistration. Consequently, in addition to transparently reporting this case, we have removed all the preregistration statements throughout the revised manuscript.

    1. Author response:

      Both reviewers noted that some published studies question the association of HPV types with cervical cancer survival {PMIDs 36207323 and 33117670}, while others did not observe that {REFS 69-74 in Chakravarty}. We appreciate both reviewers pushing us to discuss and hypothesize (even speculate) on our finding that HPV types not in phylogenetic clade α9 types (including HPV18) had more recurrences than α9 types (including HPV16). The most likely explanation is that we analyzed 225 HPV types not just the most prevalent types. Specifically, each of the 5 recurrences in our pilot study had different HPV types (α7’s: 18, 39, 45, 70 & α5: 69). Similarly, on re-examination of the TCGA data set, we found that 80% of the 181 α9 samples had HPV16, while 52.5% of the non-α9 samples had HPV18, consistent with a broader variety of types in the latter. We note that PMID: 36207323 did assess a broad number of HPV types, but these were classified into three non-cladistic categories, HPV16, HPV18 and Other for comparison. More in line with the main point of that study, HPV18 was enriched, though not significantly, in the more pathogenic C2 group (which was defined by a deep analysis of specific genomic alterations). It can be speculated that perhaps α9 types are less proficient at effecting or interacting with some C2 characteristic(s). Overall, we suggest that these observations emphasize the importance of examining the full spectrum of HPV types including phylogenetic relationships in cervical cancers induced by these viruses.

      Reviewer #1:

      The detection of “non-tumor HPVs” was noted as a potential limitation. The highly multiplexed, HC+SEQ methodology that we use obviously detects many HPV types and thus can identify lesions with multiple HPV types as occurred in Patient 16 and in other HPV cancers. It is unclear what role multiple HPV types might play in tumorigenesis if any. Regardless of whether broad detection of HPV types proves to be a limitation or an advantage, it will be interesting. Our approach in this study focused on integration of HPV DNAs into human DNA, as this is a key event in cervical tumorigenesis. We believe that detection of clonally expanded cells with an integrated URR-E6-E7 DNA segment of any HPV type (whether high-risk, low-risk, or intermediate, or even perhaps non α-clade {PMID:40742260}) should be viewed with suspicion. For the small fraction of cervical cancers that contain only unintegrated HPV DNA, it will be interesting to see if these viral DNAs share any particular properties.

      The reviewer asked for details of the HPV DNA capture probes used. All were from the proprietary Roche Nimblegen SeqCap EZ System. They encompassed all HPV types from HPV1 through HPV225.

      The reviewer questioned why the data verifying the viral-human DNA junctions in primary tumor tissue by the orthogonal approach of PCR assays were not shown. The data summary and the approach used for PCR are in Figure 1, Table 1 and Supplementary Table 1. Only the dozens of agarose gel photographs were not shown. PCR assays that addressed key issues comparing primary and metastatic sites and confirming HPV16 + HPV18 coinfection are shown in Figure 2 and Figures 4A & 4B, respectively.

      Reviewer #2:

      The reviewer raised general issues about data quantification and statistical adequacy. Regarding data quantification, we used a strict, conservative guideline of a 10 read minimum per junction in the DNA from tumor samples. This was based on the sequence analysis pipeline design and on our requirement that some clonal expansion of cells containing specific junctions must have occurred. Extensive complications to comparing quantified read counts in different studies are detailed below in the responses to specific comments. The statistical methods used were based on the dichotomous variable of detection versus no detection of integrated HPV DNA. For this study, we also used the orthogonal method of verifying every junction by PCR with one primer in viral DNA and the other in flanking human DNA followed by Sanger sequencing. The statistical methods used were entirely appropriate for this dichotomous variable and time to event analyses. Nonetheless, we concur that quantification of HPV DNA integration would be an interesting variable to consider once carefully controlled methodologies are applied considering the issues detailed below.
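      The detection rule described above (a junction counts as detected only with at least 10 supporting reads, and the statistics then use detection versus no detection) can be sketched as a simple filter. The record layout and the junction labels below are hypothetical and are not taken from the authors' pipeline.

```python
MIN_READS = 10  # conservative minimum stated in the response

def detected_junctions(read_counts, min_reads=MIN_READS):
    """Return the junctions whose supporting read count meets the threshold."""
    return {j: n for j, n in read_counts.items() if n >= min_reads}

# Hypothetical junction labels and read counts for one tumor sample.
sample = {"junction_A": 152, "junction_B": 9, "junction_C": 10}
kept = detected_junctions(sample)

# The dichotomous variable used for the statistics: any junction detected?
integration_detected = bool(kept)
print(sorted(kept), integration_detected)
```

      A threshold of this kind trades sensitivity for specificity: low-level, possibly subclonal junctions such as the hypothetical `junction_B` are dropped.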

      Regarding the first point about variability in HPV-human junction number in different studies: The number of HPV DNA genome and junction read counts obtained from a sample are subject to numerous technical and biological variables. Extensive caution should be applied when comparing quantitative results among different studies, and this particularly includes the number of HPV-human DNA junctions detected. Among the factors that can be involved among different studies are the following: 1) inadequate deduplication of sequence reads; 2) “barcode-hopping” or “bleed-through” from one sample to another and thus cross-contamination of one sample with another during multiplexed short-read sequencing; 3) variation in the fraction of cells that are tumor cells in the post-clinical analysis sample of tissue obtained; 4) artifactual ligation of HPV and human DNA segments occurring at the adaptor ligation step of short-read sequencing; 5) variability in the mismatch settings of computational sequence aligners used; 6) perhaps most importantly, the level of genomic instability of each particular integration locus; and 7) subclonal variation in proliferation or survival of cells containing specific junctions within a lesion. The reviewer correctly implied that our requirement for a minimum of 10 sequence reads at each junction excludes low level, subclonal variants. Nonetheless, one tumor did have two integrations (Table 1). More importantly, we emphasize that all five tumor-recurrences at distant metastatic sites in our study had the exact same integration event as the primary tumor (determined at single nucleotide resolution at both ends). We judge this to be compelling evidence that the approach we use correctly identifies the key integration event underlying each cancer.

      Regarding the second point about ratios between genomic DNA copy numbers and junction read counts: Both human genome and HPV genome copy numbers deserve mention in regard to this issue. HPV HC+SEQ highly enriches for viral DNA, with the advantage gained of high read depth for viral sequences, but with human DNA largely excluded (except for the junction reads). Thus, ratios of junctions to the rest of the human genome cannot be assessed as they can be with whole genome sequencing methodologies. While HPV genome read depth can be ascertained with HC+SEQ reads (as in Figure 1C, 1D, 1E), and the reviewer’s suggestion raises the possibility of using junction to viral read ratios to normalize data to compare different integration loci and even perhaps different studies, there are nonetheless additional, biomedically relevant complications. HPV DNA segments are often present as tandem units with or without human DNA segments in tumors (Figure 1E shows the former), and this affects the ratio of junctions to viral genomes. Thus, using the suggested ratios would require additional normalization for tandem copy numbers, and thus, it would be difficult to use them in a manner analogous to gene-specific read counts per million total read ratios in RNA-seq.

      Regarding the third point about comparing read counts from primary tumor tissue with those from cfDNA: Ours was a retrospective study using archived samples that were available, and the HPV genome coverage obtained by HC+SEQ using cfDNA varied (Table 1). Assessment of viral DNA genome and human junction reads in a quantitatively reliable manner by HC+SEQ will require application of precise collection, storage, and processing of cfDNA samples. Nonetheless, the results presented in this study, while variable among the different samples, were entirely sufficient to test the dichotomous variable analyzed. We note that this included orthogonal, PCR verification of junctions, based on the straightforward, abundant identification of the junctions by HC+SEQ in the primary tumor samples. We emphasize that examination of HPV DNA integration directly interrogates a key, likely causal event in HPV cervical tumorigenesis.

      Regarding the fourth point about many of the initial cancer samples harboring no junction breakpoints: 100% of the 16 initial, cervical, primary tumor tissue samples harbored an integration (one sample had two). The reviewer is correct that many of the initial cfDNA samples lacked HPV DNA integration as assessed by HC+SEQ and by PCR based on the junctions detected in the primary tumor tissue. We interpret this to mean that these cancers were not spilling genomic DNA containing the integrated HPV DNA into serum at sufficient levels to be detected, and judge this to be due to underlying, unidentified, biomedically-relevant effects.

      Regarding the fifth point about HPV-human DNA junctions being used as a measure of tumor heterogeneity and subclonal variation: We concur with the reviewer that this is an interesting, important issue. We noted it in the response to the “first” point (numbers 6 and 7) above. Again, one of the samples had two integrations, and this patient did not suffer a recurrence (Table 1, Figure 1). Based on our ongoing experience, to take findings of junction subclonality beyond just detection of multiple integration junctions, we believe that development of in situ, single cell approaches is necessary to reveal the full meaningful picture of subclonality.

      Beyond these quantitative issues that we raise in response to Reviewer #2’s comments, the Reviewers’ comments point at important, incompletely understood aspects about HPV tumorigenesis. Our finding of the identical viral DNA insertions in primary tumors and metastases points to a central, constant role for these structures in viral tumorigenesis. Nonetheless, the issues raised point to key questions concerning subclonality, detailed structures and quantification of HPV and human tandem DNA units, intrachromosomal DNA vs. ecDNA, genomic instability of integrated HPV DNA loci, and cell-to-cell variation, and what roles these might play in tumorigenesis.

      Regarding the point about cell-free DNA breakpoints, we note the field of circulating tumor DNA fragmentomics that examines the sequences and a host of structural properties of circulating DNAs derived from tumors including specific, short sequences at the ends (breakpoints) of DNA fragments circulating in blood. These are of emerging significance as biomarkers for cancer {PMIDs:40038442 and 41043439}. We note that cell free DNA breakpoints are not synonymous with DNA junctions. We stress again that the main point of our manuscript was to investigate HPV-human DNA junctions in cfDNA, as this directly addresses a likely causal mechanism underlying HPV cervical tumorigenesis. Additional, future studies would be required to assess the effectiveness of our targeted, individualized approach relative to other aspects of fragmentomics in cervical cancer.

      In summary, we restate one of the reviewers’ points. “This study provides important foundational evidence for further evaluating the clinical utility of HPV DNA detection from cfDNA and specifically assessing for integration junctions.” Both reviewers raised thoughtful points about DNA integration and HPV tumorigenesis, and prospective studies are required to refine and evaluate clinical utility of the new findings presented here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This interesting paper probes the problematic relationships between the classical "spiralian" taxa, i.e., annelids, molluscs, brachiopods, platyhelminths and nemerteans, and shows that the branches leading to them are so short as to be unreliable guides to their relationships. This, in turn, has important implications for how we view the origin of the animal phyla.

      Strengths:

      A very careful analysis of a famous old problem with quite significant results. The results seem to be robust and support their conclusions.

      It often passes uncommented that many different trees are published about animal relationships, yet some parts of the tree seem extremely difficult to resolve; the spiralians are perhaps the most difficult case. More recently, problems about sponges or ctenophores as sister groups to the rest of the animals have alerted us to major areas of uncertainty in large-scale phylogenetic reconstruction; this paper is a welcome reminder that other, perhaps even harder, problems exist which may be difficult to ever resolve with the (molecular) data we have.

      Weaknesses:

      The paper could have perhaps drawn out some of the implications of its results in a clearer manner.

      Reviewer #2 (Public review):

      Summary:

      The relationships among the phyla making up Spiralia - a major clade of animals including molluscs, annelids, flatworms, nemerteans and brachiopods - have been challenging from a phylogenomic perspective despite decades of molecular phylogenetic effort. Every topology uniting subsets of these phyla has been recovered with apparent support in at least one study, yet no consensus has emerged even from large-scale genomic datasets. Serra Silva and Telford set out to determine whether this instability reflects a genuine biological signal being obscured by analytical limitations, or whether it reflects a rapid, near-simultaneous origin of these phyla that has left behind in modern genomes far too little phylogenetic information to resolve. They focused deliberately on five phyla, reducing the problem to a tractable set of 15 unrooted and 105 rooted topologies, and applied a suite of complementary approaches across two independent datasets and multiple substitution models to test whether any topology is significantly preferred over alternatives.
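      As a minimal illustration (not taken from the manuscript), the counts of 15 unrooted and 105 rooted topologies quoted above follow from the standard formulas for fully resolved trees on n labelled tips: (2n-5)!! unrooted and (2n-3)!! rooted trees. The function names below are hypothetical and chosen only for this sketch.

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_topologies(n_taxa: int) -> int:
    """Number of fully resolved unrooted trees on n labelled tips: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return double_factorial(2 * n_taxa - 5)


def n_rooted_topologies(n_taxa: int) -> int:
    """Number of fully resolved rooted trees on n labelled tips: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    # Five focal phyla, as in the study under review.
    print(n_unrooted_topologies(5), n_rooted_topologies(5))
    # Adding a sixth phylum shows how quickly the search space grows.
    print(n_unrooted_topologies(6), n_rooted_topologies(6))
```

      Equivalently, each of the 15 unrooted five-taxon trees has 2n - 3 = 7 branches on which a root can be placed, giving 15 x 7 = 105 rooted trees. Adding a sixth phylum raises the counts to 105 unrooted and 945 rooted topologies.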

      Strengths:

      (1) The conceptual framing of the problem is excellent, and the study makes a convincing case across several lines of evidence. By enumerating all possible topologies and demonstrating empirically that every one of the 15 unrooted arrangements has been recovered as the preferred solution in at least one published study, the authors make a strong argument about the state of the field. The use of two entirely independent datasets as a consistency check is great, and convergence between them, where it occurs, substantially strengthens confidence in the conclusions.

      (2) It is my view that the simulation framework is a particular strength. Generating data on a fully unresolved star tree and scoring those data under both correctly-specified and misspecified substitution models provides convincing evidence that the strong preference for rooting Spiralia on the flatworm branch is, at least partly, an analytical artefact driven by the exceptionally long branch in combination with compositional heterogeneity across sites. This is an important methodological demonstration with implications beyond spiralian phylogenetics, as the same issue is likely to affect other deep, long-branched lineages in the animal tree of life.

      (3) The randomised taxon-jackknifing approach is a very nice addition here. The demonstration that preferred topologies shift depending on which species happen to be sampled (even within the same phylum) is a convincing indicator of weak signal, and provides a practical caution for future studies that may report strong support for a particular spiralian arrangement based on a fixed taxon sample.

      (4) The branch-length analyses, benchmarking internal interphylum branches against the already disputed and extremely short branch uniting deuterostomes (work also by this group), are well-conceived and solid.

      (5) I think it is worth highlighting the notable intellectual honesty throughout the paper: the authors do not overstate their results, correctly acknowledging that while the unrooted topology grouping molluscs with brachiopods and flatworms with nemerteans emerges most consistently, this preference is not statistically significant under more adequate substitution models and may itself carry some artefactual component.

      Weaknesses:

      (1) The restriction to five phyla is the most significant limitation. The authors acknowledge this and give a clear computational justification, but readers should be aware that the paper's convincing conclusions apply specifically to the five focal phyla and the evidence remains incomplete with respect to spiralian phylogeny as a whole.

      (2) The treatment of substitution model adequacy, while commendably thorough for site-heterogeneous models, is necessarily bounded. The authors note that models accounting for non-stationarity, across-lineage compositional heterogeneity, or mixtures of tree histories might yield different results, and that even the most sophisticated currently available approaches have not produced consistent spiralian topologies across studies. This is not a criticism of what has been done here - the analytical scope is reasonable and well-implemented - but it means the paper cannot be read as a definitive demonstration that no model will ever resolve these relationships. The distinction between a true hard polytomy and a radiation that is effectively unresolvable given current data and methods could be drawn more sharply in the discussion.

      (3) The reticulation-aware coalescent analyses are presented somewhat briefly relative to the likelihood-based topology scoring. The finding that flatworms are recovered within a paraphyletic jaw-bearing animal clade in both summary trees - interpreted as long-branch attraction - is striking, and its implications for gene-tree-based approaches to spiralian rooting deserve more discussion than they currently receive.

      (4) The central conclusions - that interphylum branches in Spiralia are extraordinarily short, that topological preferences are strongly model-dependent and taxon-sampling-sensitive, and that an ancient rapid radiation is the most parsimonious explanation - are convincingly supported by the evidence presented. The identification of flatworm long-branch attraction as an important confounding factor in rooting analyses is itself an important and well-demonstrated result.

      Conclusion:

      This paper clearly makes an important contribution to the ongoing debate about spiralian relationships and, more broadly, to methodological discussions about how to handle anciently diversified clades where phylogenetic signal is genuinely limited. The exhaustive topology-scoring framework combined with taxon-jackknifing and simulation under unresolved trees is a valuable methodological template that could usefully be applied to other notoriously difficult nodes in the animal tree. I thoroughly enjoyed the discussion of the implications of these findings for interpreting Cambrian fossils and the evolutionary history of shells, segmentation, larval types and other characters - it is both thoughtful and thought-provoking and will be of broad interest well beyond the phylogenomics and zoology communities. From a very practical perspective, the data and scripts provided make the work useful to researchers wishing to apply similar approaches to other groups.

      Reviewer #3 (Public review):

      Summary:

      This paper addresses the controversial internal relationships within the Spiralia, a major clade of invertebrate animals including molluscs, annelids, brachiopods and flatworms.

      Strengths:

      Performs a range of empirical analyses and simulations that address the core question. Although a favoured unrooted topology finds some support, this is not strongly endorsed in the paper.

      Weaknesses:

      (1) Only considers a subset of relevant phyla (e.g. gastrotrichs are relevant to the phylogenetic position of Platyhelminthes), although how this would change the scale of the analyses (i.e. number of topologies) is addressed in the paper.

      (2) Discussion of Spiralia evolution and broader context, particularly the relevance for the fossil record. Line 448: our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction, which have unusual character combinations, have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      (3) This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense, see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We thank the reviewers for their kind comments. Please see below for detailed responses to all identified weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some minor comments that might help improve the paper:

      (1) Abstract L17. "Most analyses on the 15 unrooted trees showed a preference for the same topology but the support over other solutions was non significant" - I don't really understand this sentence in the context of the paper; it makes it sound as if the tree is, after all, well resolved! Also, would "non-significant" or "not significant" be better than "non significant"?

      Having read the rest of the paper I see what this refers to (uT4), but still I don't understand the second clause.

      Re-written to clarify.

      (2) Introduction L31. This makes it sound as if phoronids are actually part of brachiopods, and while that was recovered by Cohen and Weydmann 2005, I'm not sure if it's really a general result. In addition, rather than using "brachiopods plus phoronids" everywhere, you could use "Brachiozoa" (Cavalier-Smith 1998, Biol. Rev).

      We have updated our text and figures to use Brachiozoa.

      (3) L36-37. Yes, but the presence of Chaetognatha in this clade is suggestive that their primitive body size is not small.

      Have made clear that chaetognaths are not all tiny.

      (4) L85. Kumar et al. may have claimed that Spiralia are as old as 670, but many other analyses would suggest a range of different results. Why choose just this one? In addition, this age seems rather incompatible with your results.

      We agree this maximum age is highly improbable (the principal point remains the deep age of the protostomes). We have used a different reference and refer to a generally acceptable minimum age only.

      (5) L88. The key part of this sentence, "proving a hard polytomy", comes at the end of a long set of references that makes it hard to connect to the lead-in "given the age of", so I would suggest rephrasing.

      Rephrased for clarity.

      (6) L109. It is unclear what this means in the context: "and even support multiple topologies".

      Re-worded for clarity.

      (7) Figure 1. Why did you choose to indicate brachiopods plus phoronids as a larval form, unlike the other clades? Perhaps it's because we don't know what the last common ancestor of the two looked like (unless P is an ingroup of B), but that's arguably true for some of the other clades as well!

      Apologies, this was laziness as we already had a line drawing of an actinotroch larva. Have improved the images in figures 1 and 5 where required.

      (8) L164. Reticulation-aware analyses. As I understand it, this would include introgression, hybridization, etc. However, incomplete lineage sorting has also been invoked, not just for Cambrian-explosion age events but also for other major radiations, such as for angiosperms and birds. How significant might ILS be for generating the results you get?

      Section title amended. Results section updated to reflect this. We now explicitly mention the potential impact of ILS and introgression on spiralian relationships in our discussion.

      Unrooted trees analysis:

      (9) L405 on. Maybe it would be worth including a figure showing the relative branch lengths of uT4. All the images of trees show similar-length branches, which gives off the wrong impression within the context of the paper!

      We understand the motivation, but we worry that showing uT4 as the sole phylogram may end up with this being interpreted by a casual reader as being the main result of the paper. Hopefully the figures with branch lengths encompass this information well enough and with no danger of misinterpretation.

      (10) L430 on. Why is this a "conservative" interpretation?

      Yes, agreed, this was not clear. Have changed to “We interpret our results as showing that…”

      (11) You mention synapomorphy accumulation time and implicitly equate shortness of branches with shortness of time. However, other options are available under varying diversification rate models (e.g. ClaDS, Barido-Sottani et al. 2023, Syst. Biol.; CET, Budd and Mann 2025, Syst. Biol.). In particular, the latter paper shows that when unusually large clades are selected for study (as is arguably the case here), then those clades are likely to have started with very high "evolutionary tempo", which speeds up all aspects of evolution, including diversification rates.

      In the Budd and Mann scenario large clades begin with high tempo of cladogenesis, high substitution rate and high diversification rate (rapid origin of new characters). This would suggest that the period of the radiation was extra rapid (even less time than in a ‘normal’ period during which smaller clades emerge) so we feel the point stands.

      (12) L449. Maybe refer to the Song et al. paper again here on scaphopods plus bivalves, as it makes the same sort of points, albeit in a slightly different context.

      We thank the reviewer for the suggestion and have added the citation where relevant.

      (13) Finally, to return to L20. You mention implications for the Cambrian fossil record, but then fail to deliver any!

      We have hopefully addressed this remark in the discussion better (at least to the extent we are qualified to).

      Yet if you are correct, then synapomorphy accumulation would unite groups of phyla, and would surely lead to a scenario highly incompatible with clock models suggesting deep origins of clades (as they would all be more fossilisable).

      Apologies but we don’t completely understand this point as ‘synapomorphy accumulation would unite groups of phyla’ is a little ambiguous. Of course, this is generally true, but our results suggest there was little opportunity to accumulate identifiable synapomorphies linking pairs, triplets or quartets of our 5 spiralian phyla.

      In addition, clock results suggest rather long periods of time leading to the phyla, which would imply that there would have to be extremely slow rates of molecular evolution to yield the short early branches here. Also, it might be worth referring to papers compatible with this view, such as Wernström, J.V. et al., EvoDevo 13, 17 (2022). https://doi.org/10.1186/s13227-022-00202-8 or some of the palaeo literature, such as Budd and Jackson 2016, Phil Trans.

      The referee refers to clock results suggesting a (deep) Ediacaran origin of Lophotrochozoa/Spiralia. We interpret the spiralian radiation itself as rapid but, in the absence of a clock analysis, we cannot comment on when it took place.

      Reviewer #2 (Recommendations for the authors):

      (My not very) Major points - as I feel this is an excellent paper.

      (1) The coalescent-based summary tree analyses warrant expansion. The recovery of flatworms within a paraphyletic jaw-bearing animal clade in both summary trees is a striking result attributed to long-branch attraction, but this interpretation would be strengthened by examining whether pruning or downweighting the longest-branching taxa within those groups affects the outcome, or by reporting per-node quartet scores more fully. This would make the reticulation-aware results more directly informative and would bring this section into better balance with the detailed likelihood-based analyses.

      We thank the reviewer for the suggestion of the expanded analyses. We have now done these, and they yielded essentially the same results as the unpruned analyses. Additionally, while not discussed, we ran the Astral analyses on the subset of gene-trees where all groups of interest (spiralian phyla and superphyletic Ecdysozoa, Deuterostomia, etc.) were monophyletic and found no changes to interphylum quartet scores beyond those due to enforced (super)phylum monophyly, with Platyhelminths still recovered within Gnathifera.

      We have expanded our description of the results slightly as well as our discussion. Location of the tables with detailed quartet scores and local posterior probabilities has been added to Fig. S1’s legend.

      (2) It would strengthen the paper to include at least a brief analysis or explicit discussion of whether any currently available models accounting for non-stationary or across-lineage compositional heterogeneity show any change in the pattern of support, even if only tested on a subset of topologies. A null result here would itself be informative and would make the conclusions more robust to the concern that unexamined model classes might behave differently.

      We thank the reviewer for the suggestion, but this represents a considerable amount of new work and we think it falls outside the scope of the present work. We have, as suggested, included this as a discussion point.

      (3) The authors note that topologies grouping flatworms with ribbon worms appear among the higher-scoring arrangements even under model misspecification in simulations. It would be helpful to comment explicitly on whether the apparent signal for this grouping should therefore be regarded with particular scepticism, or whether it survives artefact correction in any of the analyses, as this is a grouping that has appeared repeatedly in the literature and readers will want guidance on how to interpret it.

      We do state that the nemertean+platyhelminth grouping seems likely to be at the least emphasised by an artefact (as the referee points out it is common to the higher scoring trees in the star tree simulations). We state that this suggests “…that this grouping derives some support from systematic errors.” We now return briefly to this in the discussion.

      Writing and presentation

      (1) The abstract states that rooting Spiralia on the flatworm branch "is a long-branch artefact" - this is slightly stronger than the language used in the body of the paper, where the authors correctly write that this preference is "at least enhanced by" the artefact. The abstract phrasing should be softened to reflect the more nuanced conclusion in the text.

      Good point. Done.

      (2) A brief signposting sentence near the start of the Results, setting out the overall analytical logic before the individual sections begin, would help orient readers. The strategy - score all topologies, test robustness to model choice and taxon sampling, then use simulation to identify artefactual signals - is clear in retrospect but would benefit from being made explicit upfront.

      We have taken this suggestion on board. The summary seemed in the end better placed as the final part of the introduction.

      (3) Figure 3 is complex and would be easier to interpret with a brief explanatory note in the legend clarifying what a wide versus narrow range of log-likelihood scores across topologies means in practical terms for statistical resolution between trees.

      Added sentence to legend.

      Minor Corrections:

      (1) The Figure 2 legend contains a typographical error: "shorter than the short, disputed deuterostome branch" should read "shorter than."

      Done

      (2) At least one reference appears to carry a future publication year (Ishii et al., 2026) and should be verified for accuracy before final submission.

      This reference is correct per the journal’s website. We did find Google Scholar to list it as being from 2025.

      Reviewer #3 (Recommendations for the authors):

      (1) Abstract/SI definitions of Spiralia/Lophotrochozoa

      While I don't have strong feelings about this, if Spiralia is being used as an apomorphy-based name, then it still might be equivalent to Lophotrochozoa, as spiral cleavage in Gnathostomula jenneri was illustrated by Riedl (1969). Although no other studies have replicated this observation, this should at least be mentioned.

      Sorry, this reference to gnathostomulid spiral cleavage was included in a longer version of the discussion of nomenclature. That discussion was first reduced in length (which was when the mention of gnathostomulid spiral cleavage was dropped) and then moved to the supplementary material. We have now re-included mention of this in the discussion in the supplementary information.

      The SI text suggests that the name Lophotrochozoa, as used in its original form by Halanych et al. (1995), was a node-based definition, and that this name is for the sister group of Ecdysozoa. However, in that paper, the name is actually defined as "the last common ancestor of the three traditional lophophorate taxa, the molluscs, and the annelids, and all of the descendants of that common ancestor". This definition would exclude Gnathifera, and depending on the internal relationships of the non-gnathiferan phyla, may be equivalent (or not) to the usage of the name Spiralia adopted in the present paper. The perils of mixing node- and apomorphy-based definitions of clades are clear, and the situation is less straightforward than the paper suggests, and (somewhat unhelpfully given the subject of the paper) may only become clearer if the relationships of non-ecdysozoan protostomes are resolved.

      We believe that the community universally understood the definition of Lophotrochozoa following the 1997 paper (by the authors who also provided the original 1995 definition). This 1997 definition included both chaetognaths and rotifers as examples of the Gnathifera. The Spiralia, in contrast, began life not even as a name for a clade but a description of a character shared by some apparently unrelated taxa – similar to a grouping of ‘carnivores’. The introduction of a new name was, we suggest, unhelpful. We hope that by defining our terms up front the meaning in the current paper is clear.

      (2) Introduction

      Line 76. Some references needed regarding claims that there was a polymeric brachiopod ancestor, e.g. Gutman (1978), Temereva and Malakhov (2011), Guo et al. (2023). Likewise for the chaetae of brachiopods, annelids and molluscs, e.g. Schiemann (2017), as it's key to trace where these ideas originated.

      Added

      Figure 1. This is a nice illustration of the uncertainty in the relationships of these groups. However, I kept checking which thumbnail image was which for nemerteans and annelids. A minor suggestion, but perhaps a polychaete instead for the annelid?

      We have replaced the rather poor image of an earthworm with a polychaete and also now include labels. We hope the improved images are more helpful. Good point.

      (3) Results

      Branch length comparison. I understand why the deuterostome stem was chosen as the branch for comparison from the point of view of phylogenetic uncertainty. However, what about the branch leading to Ecdysozoa or the branch subtending Lophotrochozoa and/or Gnathifera? Given that the short internodes are used as an argument underpinning uncertain relationships, can we be sure that Gnathifera is not nested within the group of interest, especially given that Gnathifera contains many long-branched taxa and the root may be misplaced within the group?

      We have added the Lophotrochozoa and Ecdysozoa median lengths to our plots and now discuss the lophotrochozoan branch in our results.

      Line 249. Given that Spiralia is the group of interest, why were the Gnathiferans also chosen at random?

      The point of the experiment was to see the effect of taxon sampling on the consistency of the resulting topology. Random sampling across the tree seems helpful in this context. We chose Gnathifera as one group to sample from as this ensured they would be present in all trees. This seems appropriate as they are the sister group of the clade of interest and as such their inclusion reflects a choice a typical investigator might make when choosing which species to include. Additionally, as noted in the reviewer’s earlier comment, Gnathifera includes many long-branched taxa and we wanted to ensure our root-placement results were robust to this aspect of taxon sampling.

      (4) Discussion

      Line 448. Our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction that have unusual character combinations have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense, see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We accept these points (though are clearly not experts on these fossils). We have (slightly tentatively given our lack of expertise) expanded our discussion to include these fossil taxa with their combinations of characters.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kim and Parsons present a timely overview of the NTR/prodrug system and its applications in regenerative biology research, with particular emphasis on tissue-specific cell ablation. The system has substantially advanced the field by enabling non-invasive, conditional cell elimination, and has proven especially powerful in zebrafish, though applications in other classical model organisms are also noted. The review covers the historical origins of the NTR system, its use in regeneration studies, small molecule screening, and genetic and CRISPR-based screening, as well as future directions, including the development of the highly efficient NTR2 enzyme variant.

      Strengths:

      This is a useful and well-structured contribution. The manuscript is a valuable resource for the regeneration biology community.

      Weaknesses:

      The impact and scientific value of this paper could be meaningfully enhanced by addressing several points outlined below. The concerns centre on completeness, conceptual precision, and the depth of mechanistic discussion.

      (1) Title: Species specificity.

      Given that the review's primary focus is the zebrafish model, it would be appropriate to include the species name in the title. This would improve discoverability and accurately set the scope of the article for prospective readers.

      Thank you for this suggestion. In revising the review, we have substantially expanded the content to address the reviewers' comments, including adding more detail on the use of NTR in other species. We agree that the majority of published work, and the research we cover, has been conducted in zebrafish, and we have clarified this in the abstract and introduction. However, our aim in writing the review was also to highlight that there is no intrinsic barrier to adopting this technique more broadly in other systems. Notably, NTR was first developed in mice, but with a prodrug that proved difficult to use, and it was not widely pursued. In mouse models, the development of DTR offered an alternative, though that approach carries risks of kidney toxicity and is incompatible with chronic ablation due to immunogenicity. Given this context, we would prefer to retain a title that does not limit the scope exclusively to zebrafish, so as not to discourage readers working in other model systems who might benefit from considering the NTR system.

      (2) Subchapter: Physical injury.

      The subchapter enumerates different types of physical injury models but would benefit from a more substantive comparative discussion. In particular, the authors are encouraged to address the following:

      (2.1) Outcome comparison: Surgical and other invasive approaches cause damage to entire tissue structures comprising multiple cell types, whereas tissue-specific genetic ablation eliminates a defined cell population while leaving the surrounding architecture largely intact. This fundamental distinction has direct implications for the interpretation of regenerative outcomes and should be clearly articulated.

      We appreciate the reviewer raising these important points, as well as those noted in Section 2.2. We addressed the concerns from Sections 2.1 and 2.2 throughout multiple parts of our review, specifically in the following sections:

      • Physical injury – where we highlight the importance of precisely characterizing the nature and extent of tissue damage in order to appropriately interpret subsequent biological responses.

      • Chemogenetic cell-specific ablation – where we expand on this theme by discussing the advantages of selectively eliminating discrete cell populations and how this improves mechanistic interpretation of regeneration.

      • Development of NTR as a suicide gene – where we examine apoptotic pathways and their relevance to nitroreductase-mediated cell ablation.

      • NTR/prodrug systems in regenerative studies – where we compare what is currently known about immune activation and inflammatory responses across different NTR-based ablation paradigms.

      (2.2) Inflammatory response: Invasive injuries typically trigger a robust inflammatory response, which itself can be a potent driver of regeneration. By contrast, genetic cell ablation may elicit a qualitatively different inflammatory reaction. A comparative discussion of this distinction would help readers appreciate a critical limitation of genetic ablation systems relative to models of natural, accidental tissue damage.

      Please see our response to 2.1 above.

      (3) Subchapter: Cell-specific toxins.

      This subchapter would benefit from several targeted expansions:

      (3.1) Off-target effects: The authors should include evidence that the exemplified drugs have known off-target activities, with a discussion of how these confounded the interpretation of experimental data. At least a few concrete published examples should be cited.

      Thank you very much for the comments. We have strengthened the discussion of off-target effects by adding concrete published examples. We now note that MPTP/MPP⁺ can affect noradrenergic and serotonergic systems in addition to dopaminergic neurons, that aminoglycoside antibiotics can damage support cells and afferent neurons at higher concentrations with compound-specific differences in ototoxicity, and that streptozotocin exhibits hepatotoxicity beyond pancreatic β-cells.

      (3.2) Completeness of the toxin list: The current list appears illustrative rather than comprehensive. A more complete enumeration would be valuable, particularly for neurotoxins and drugs targeting sensory cells, as these are highly relevant to the zebrafish regeneration field.

      We have now consolidated the toxins discussed throughout the review into Table 1, which includes additional entries alongside the previously listed agents. We have explicitly noted that this list is representative rather than exhaustive, as the full range of cell-specific toxins used across species is extensive.

      (3.3) Interspecies differences: It would be informative to specify whether drug specificity differs across species, as this is a practical consideration for researchers working in organisms other than zebrafish.

      We appreciate the reviewer’s question regarding potential interspecies differences in prodrug performance. Early work using NTR in mammals was conducted in mice, and all five published mouse studies relied exclusively on CB1954. No other NTR-activating prodrugs have been reported in mouse models, so direct comparisons are not available. Likewise, all published Xenopus studies used MTZ and thus do not provide internal comparisons across prodrugs. The Nematostella study employed NFP (citing rationale from a zebrafish study) and the approach yielded effective ablation.

      The only non-zebrafish study that directly compared prodrugs is the Drosophila work, which evaluated MTZ, RNZ, and NFP and reported lower activity for MTZ relative to the other compounds. Because it is not clear whether the authors were aware of the batch variability of MTZ or the need for freshly prepared solutions, interpreting this specific comparison is difficult.

      To address the reviewer’s comment, we have expanded the section on non-zebrafish organisms to clearly state which prodrug was successfully used in each species. However, given the limited number of studies, the absence of titration experiments, and the lack of standardized conditions across laboratories, we do not feel that the available evidence supports drawing conclusions about interspecies differences in prodrug performance.

      Consistent with our original discussion and based on the broader biochemical and empirical data available, we continue to recommend RNZ as the starting point for new experiments.

      (4) Subchapter: Optogenetic cell ablation.

      The authors note that optogenetic cell ablation has not yet been applied in conventional regeneration studies. It would strengthen this section to include a discussion of the underlying reasons for this gap, whether technical or biological, so that readers can appreciate the barriers and potential for future adoption.

      We thank the reviewer for this helpful suggestion. As recommended, we have added a concise, explicitly speculative statement discussing potential technical factors that may explain why optogenetic cell ablation has not yet been widely applied in regeneration studies. Specifically, we note that KillerRed-based ablation requires localized light delivery and ROS generation, making it best suited for discrete, optically accessible cells and less practical for targeting large or deep tissues. We also highlight that the dependence on microscopy-based illumination inherently limits throughput. This new text clarifies possible barriers to broader adoption while acknowledging that these points remain speculative.

      (5) Terminology: "Suicide gene".

      The application of the term "suicide gene" to nitroreductase is conceptually imprecise and merits reconsideration. Strictly speaking, a suicide gene is one whose expression alone is sufficient to kill the cell, as in the case of genes encoding direct triggers of apoptosis or the catalytic A subunit of diphtheria toxin (DTA). NTR does not meet this criterion: it requires the exogenous administration of a prodrug (e.g., metronidazole) to produce a cytotoxic metabolite and is therefore only conditionally lethal.

      It is worth noting that nitroreductases evolved in bacteria and fungi as enzymes involved in chemoprotection and detoxification, converting potentially toxic and mutagenic nitroaromatic compounds into less harmful metabolites (PMID: 18355273). This biological context further underscores that NTR is not inherently a lethal protein. The authors are encouraged to replace or qualify the term "suicide gene" and instead adopt terminology that more accurately reflects the conditional, prodrug-dependent nature of the system.

      We appreciate the reviewer’s thoughtful attention to terminology. We agree that, in its most classical and stringent sense, a suicide gene is one whose expression alone is sufficient to induce cell death. We also recognize that NTR does not meet this strict criterion.

      At the same time, we note that the term has broadened in contemporary usage, particularly within applied and translational contexts, to encompass prodrug-dependent systems. For example, the National Cancer Institute Thesaurus defines a suicide gene as “a gene which will cause a cell to kill itself, typically through interaction with a prodrug,” and Taber’s Medical Dictionary likewise states that it is “a gene that causes a cell to kill itself, usually by encoding an enzyme that converts a nontoxic prodrug into a toxic metabolite.” Under these widely used definitions, NTR is included within the scope of suicide gene systems.

      Nevertheless, we appreciate that terminology in this area is not universally standardized. To ensure clarity for all readers, we have added a brief definition in the revised manuscript explicitly noting the conditional, prodrug-dependent nature of NTR-mediated ablation. We are grateful to the reviewer for prompting this clarification.

      (6) NTR/MTZ in regenerative studies: Mechanistic depth.

      While the review catalogues several studies employing the NTR/MTZ system, it lacks mechanistic depth regarding the cellular basis of ablation. The following questions should be addressed, where evidence exists in the literature:

      (6.1) Temporal dynamics of cell death: What is known about the kinetics of NTR/MTZ induced lethality across different tissue types in larval and adult zebrafish, as well as other organisms? Are there age- and tissue-specific differences in the speed or completeness of ablation?

      Thank you for this important question. We have added text noting that the kinetics and completeness of NTR/prodrug-mediated ablation vary across experimental contexts, including differences in NTR expression, enzyme/prodrug pairing, dose, cell type, and developmental stage. Published studies illustrate that the time course of ablation can differ substantially between models. Because most studies were designed to optimize ablation within individual tissues rather than for direct side-by-side comparison, the literature does not yet support broad quantitative conclusions about age- or tissue-specific differences across systems.

      (6.2) Mechanism of cell death: What is the cellular basis of NTR/MTZ-induced cytotoxicity in zebrafish? In particular, do the toxic metabolites preferentially cause mitochondrial damage or nuclear DNA damage, and what downstream death pathways are engaged?

      Thank you for the comments. We have added text discussing the mechanism of NTR/MTZ-induced cell death. We now note that NTR-mediated reduction of MTZ generates reactive intermediates that cause DNA damage and oxidative stress, with cell death occurring predominantly through apoptosis. We have also more strongly emphasized that in dopaminergic neurons, mitochondrial damage was identified as the primary cytotoxic mechanism. We acknowledge that the relative contribution of these pathways is likely to vary by cell type and remains an important area for future study.

      (6.3) Proliferative versus post-mitotic cells: Are proliferating and non-proliferating cells equally sensitive to the NTR/MTZ system, or does the proliferative status of a cell influence susceptibility? This is a practically important question for researchers designing ablation experiments in tissues with mixed cell populations.

      We appreciate the reviewer’s insightful question. We have now added a brief clarification to this section explaining that the NTR/MTZ system has been shown to act in a cell-cycle–independent manner, and both proliferating and post-mitotic cells can be ablated effectively.

      (6.4) Ablation of progenitor cells: Are there published examples demonstrating that co-ablation of differentiated functional cells and organ-specific progenitor cells abolishes regenerative capacity? Such examples would be highly informative in illustrating the system's power to dissect the cellular requirements for regeneration.

      To our knowledge, the zebrafish lateral line currently provides the clearest example in which NTR-mediated ablation of progenitor populations results in a loss of regenerative capacity. In this system, targeted ablation of support-cell progenitors severely reduces hair-cell regeneration, illustrating how NTR enables direct testing of cellular requirements for tissue repair.

      Addressing the points above, particularly the comparative discussion of injury models and inflammatory responses, the clarification of terminology, and the mechanistic discussion of NTR/MTZ-induced cell death would substantially strengthen the review's scientific contribution and utility.

      Reviewer #2 (Public review):

      Summary:

      Kim and Parsons reviewed the nitroreductase (NTR)/prodrug system: when engineered cells expressing the enzyme NTR are treated with prodrug (e.g. metronidazole), NTR converts the prodrug into a cytotoxic compound that kills these cells. The review covers how the system has been developed, spatiotemporal control of targeted cell ablation, and its broad utility to study regenerative mechanisms, model human diseases, and screen chemicals to discover pro-regenerative and protective compounds. They further discussed the newer version of NTR, a more potent prodrug, and experimental design, which not only expands the possible utility of the NTR/prodrug system, but also allows the research community to develop a precise, reproducible and versatile platform.

      Strengths:

      The review summarizes landmark applications of the NTR/prodrug system, as well as recent studies, with a focus on the model organism zebrafish. The review provides a good gateway to understanding the system and considering regenerative studies.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Kim and Parsons presents an overview of the nitroreductase/metronidazole (NTR/MTZ) cell ablation system.

      Strengths:

      This manuscript nicely places the NTR/MTZ system in the context of other cell ablation methods, with a discussion of their respective advantages and disadvantages. This review is particularly useful for highlighting the many ways the NTR/MTZ system has been applied to study the regeneration of multiple cell types and to model different degenerative human diseases. The review concludes with a discussion on recent improvements made to the system and practical considerations and "best practices" for NTR-based experiments. This review could be a helpful resource, especially for researchers new to regeneration or cell ablation studies.

      Weaknesses:

      Although the NTR/MTZ system has been used in other model organisms, this review is primarily focused on its uses in zebrafish. While this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, discussion of the unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review. Additional minor revisions, as suggested below, could also improve readability.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Since the lab mouse is an important mammalian model system, with certain tissues harbouring some regenerative capabilities, including the peripheral nervous system (e.g., sciatic nerve regeneration after crush), and myelin, etc., it would be great if a section could be included to discuss the potential adoption of the NTR/prodrug system in future mouse studies.

      We appreciate the reviewer’s suggestion to discuss the potential future use of the NTR/prodrug system in mouse models. In surveying the literature, we identified only five mouse studies employing NTR, all of which used CB1954. These early studies were conducted primarily as proof-of-principle work in the context of gene-directed enzyme prodrug therapy (GDEPT) for cancer, rather than for regenerative or lineage-specific ablation applications. We added this point to the text.

      Since those reports, we have not found additional examples of NTR use in mice. We do not know the precise reasons for this limited adoption, but it may reflect the availability of alternative ablation systems that are widely established in mouse research, such as the diphtheria toxin receptor (DTR) system.

      We agree that certain mouse tissues exhibit regenerative capacity and that targeted ablation tools can be valuable in such contexts. To address the reviewer’s point, we have added text noting the very limited historical use of NTR/CB1954 in the mouse. We have no explanation as to why no one moved on to using NTR/MTZ in the mouse, but we note in two places in the text that DTR is the preferred method for mouse ablation experiments (even though DT does cause kidney damage and is incompatible with chronic studies!).

      Minor:

      (1) Line 174-176, the sentence was repeated.

      (2) Figure 1, for the transgenic line, please be consistent with the line name in italics.

      Reviewer #3 (Recommendations for the authors):

      (1) In the abstract as well as in the main text, the authors note that the NTR/MTZ system has been used in multiple model systems. Yet, most of the review, and especially the practical advice given at the end, is very zebrafish-focused. Although this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, the authors might consider revising the abstract to make it clearer that this review is primarily concerned with the use of the NTR/MTZ system in zebrafish.

      Thanks for the suggestion. We have changed the last half of the first paragraph in the abstract.

      That said, a brief discussion of any unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review.

      Agreed. We have expanded the text in several places to discuss the NTR system in non-zebrafish organisms. We especially expanded our discussion of NTR in the mouse.

      (2) Line 176: There is a repetition of the sentence, "NTR/MTZ-mediated ablation has also been adapted for other model organisms."

      Found and deleted. Thank you!

      (3) Line 177: To improve clarity, the authors should include species names to prevent confusion. For example, both Xenopus laevis and Xenopus tropicalis are commonly used model organisms. Similarly, multiple Drosophila species are used by researchers.

      Added melanogaster and laevis to the text.

      (4) Can the authors address whether alternatives to MTZ (RNZ, etc.) have the same issues with batch-to-batch variability? That might be an important consideration for potential users. It would also be useful to include practical guidance for accounting for batch variability, for example, how to determine optimal prodrug concentrations, whether effective concentrations need to be determined for every batch/replicate/experiment, etc.

      Added text noting that it is not yet known whether RNZ exhibits batch-to-batch variability similar to MTZ, as this has not been systematically reported. Given the potential for variability, it would be prudent for researchers to titrate each new batch of RNZ or, alternatively, adopt a dosing strategy that exceeds the minimum effective concentration to ensure consistent ablation results.

      (5) For the last section ("Experimental design: Practical and technical considerations"), readability would be improved by applying a consistent bullet point format.

      Made the changes as requested.

      (6) Figure 1: Asterisks are not defined.

      The asterisks were meant to link two boxes depicting the same transgene without rewriting the name of the transgene. Clearly, this wasn’t clear, so we have added an explanation to the legend too.

      (7) Figure 3: Given that the schematics specify expression of NTR1 and NTR1.1, I assume this figure is adapted or based on previous published report(s). If so, the reference(s) should be noted in the figure legend or on the figure itself (as done for Figure 1). If the schematic is meant to depict only in general terms how binary expression vectors can be used, a more inclusive "NTR" label might be less confusing.

      Changed the figure legend and the figure.

      (8) Figure 4: To improve readability and accessibility, the authors should consider modifying panels C-N to use a more colorblind-friendly palette (e.g., green/magenta) or to present each channel as separate grayscale images.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses of the methods and results:

      - Line 162: need to establish and verify the PKH26-labeled TSL cells were unaffected by the dhh-/- environment. No data to support the claim that they were unaffected.

      We thank the reviewer for this important comment. In dhh<sup>-/-</sup> recipient testes, PKH26-labeled TSL cells were observed within the interstitial compartment (Fig. 3C3). Importantly, these PKH26-positive cells could be induced by SAG treatment to differentiate into Cyp11c1-positive steroidogenic cells (Fig. 3E3), indicating that they remained viable in the dhh<sup>-/-</sup> environment.

      We have revised the Results section (line 171–173) to “These results suggest that SLC differentiation is inhibited, whereas the survival and engraftment of PKH26-labeled TSL cells were not affected in dhh<sup>-/-</sup> XY tilapia testes.”

      - The rescued phenotype caused by the addition of ptch2-/- to the dhh-/- model is compelling. To further define potential ptch1 contributions, it would be helpful to examine the expression level of ptch1 in the context of the ptch2-/- and ptch2-/-;dhh-/- mutant animals. Any compensatory increase in ptch1 in either case, without obvious phenotype changes, would support the dominant role for ptch2.

      We thank the reviewer for this valuable suggestion. We have now performed RT-qPCR analysis of ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. As shown in Fig. S8, no significant differences in ptch1 mRNA levels were detected among these genotypes, indicating that loss of ptch2 does not induce compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. We have revised the Discussion section (line 277–290) to “The specificity for Ptch2 in this context might stem from unique co-receptor interactions or expression patterns within the testicular niche. To preliminarily assess potential compensatory regulation, we examined ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. No significant differences in ptch1 mRNA levels were detected among these genotypes (Fig. S8), suggesting that loss of ptch2 does not trigger compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. Nonetheless, global ptch2 mutation affects multiple tissues, whereas our mechanistic focus is on SLC differentiation within the testicular niche. Moreover, the early embryonic lethality of global ptch1 mutation in tilapia (Liu et al., 2024) precludes direct assessment of its role in postnatal testis development. Therefore, although our findings strongly support a predominant role for Ptch2 in mediating Dhh signaling in SLCs, definitive resolution of receptor specificity will require future Leydig cell-specific conditional knockout models.”

      - Activity of individual gli factors need additional reconciliation. The expression profiles for both alternative gli factors should be quantified in each knockout cell line to establish redundancy and/or compensation.

      We agree that quantifying the expression of alternative gli genes might be informative. In the present study, TSL-gli1<sup>-/-</sup> cells completely lose responsiveness to Dhh stimulation in the 8×GLI luciferase assay, whereas TSL-gli2<sup>-/-</sup> and TSL-gli3<sup>-/-</sup> cells retain normal pathway activation (Fig. 5B), which strongly suggests that Gli1 is the principal transcriptional effector in tilapia SLCs under our experimental conditions. Redundancy and/or compensation among the alternative gli factors will need further genetic dissection in future studies.

      - Figure 5E: An important control is missing that includes evaluation of HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1.

      We do not consider HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1 to be an essential control in our study. In the dual-luciferase assays, we consider the pcDNA3.1 + pGL3 (empty reporter) and pcDNA3.1 + pGL3-sf1 controls to be sufficient.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation; minor corrections:

      - Include Park paper (Endocrinology 2007) somewhere near line 73. Need to acknowledge this paper as it is one of the first to connect Dhh to Sf1.

      We have now included the citation of Park et al. (Endocrinology 2007) in the Introduction (now line 81).

      - Include Kothandapani paper (PLoS Genetics 2020) somewhere near line 86. Need to acknowledge this paper as it is the only one to reconcile the data showing no difference in Gli1 or Gli2 knockouts, but loss of Leydig cell function due to Gli3 activity.

      We have now included the citation of Kothandapani et al. (PLoS Genetics 2020) in the Introduction (now line 97).

      - Please include sequences of B1 and B2 in sf1 promoter, how conserved are they to the canonical Gli binding sequence?

      We have revised the Results section (line 216–218) to “Functional annotation of its promoter region identified two conserved Gli1-binding motifs, B1 (AACCACCCA) and B2 (GAGCCACCCA)”.

      - Figure 1 or results text: please clarify that the dhh-/- model used is the delta13bp mutation.

      We have clarified in the Results section (line 133) that the dhh<sup>-/-</sup> model corresponds to the 13-bp (CAGGGATGCGGAC) frameshift deletion.
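      As a quick arithmetic check on the frameshift description, a deletion shifts the reading frame whenever its length is not a multiple of three. The minimal sketch below applies that rule to the deleted sequence quoted in the response; the function name is hypothetical and chosen only for illustration.

```python
def causes_frameshift(deleted_seq: str) -> bool:
    """Return True when removing this sequence would shift the reading frame."""
    # Codons are 3 nt long, so only deletions of 3, 6, 9, ... nt keep the frame.
    return len(deleted_seq) % 3 != 0


# Deleted sequence as quoted in the author response.
deleted = "CAGGGATGCGGAC"
print(len(deleted), causes_frameshift(deleted))
```

      The check covers the reading frame only; it says nothing about where a premature stop codon would arise downstream.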

      - Figure 5E legend: please clarify that HEK293 cells are used

      We have revised the Figure 5E legend to explicitly state that the dual-luciferase reporter assays were performed in HEK293 cells. Revised legend sentence (line 743-746): HEK293 cells were co-transfected with pRL-TK, pGL3, pcDNA3.1, pGL3-sf1, pcDNA3.1-OnGli1, and the indicated cold probe constructs, and luciferase activity was measured 48 hours post-transfection.

      - Figure S5E: * indicates the heteroduplex; it seems that there is a heteroduplex highlighted with the asterisk at ~600bp size. Based on the homozygous and mutant bands, it seems the asterisk should be highlighting the duplex near those sized bands. What are the bands up at ~600bp?

      We thank the reviewer for the careful observation. In Figure S5E, the bands observed at approximately ~600 bp represent heteroduplex products formed during the re-annealing of PCR amplicons derived from heterozygous individuals. During denaturation and re-annealing, WT and mutant strands can pair in different configurations, generating distinct heteroduplex conformations that migrate more slowly than homoduplex products in PAGE. As a result, two heteroduplex bands are visible at ~600 bp, reflecting alternative mismatched duplex structures. The homoduplex WT and mutant bands are indicated separately by arrows.

      - Figure S7F: dhh-/- data are missing

      We thank the reviewer for pointing out this omission. The missing dhh<sup>-/-</sup> dataset has now been added to Figure S7F, and the figure has been updated accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Comments on revised version:

      The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.

      Reviewer #2 (Public review):

      General comments on the revisions:

      My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).

      There are two areas where the authors had to make major changes/justifications and where further comment is merited. These were:

      RNA-seq.

      The most significant issue was the originally underpowered RNA-seq which had only two replicates. This has been repeated with four replicates now. This has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in the response to this was: "Given the robustness of the stage-specific transcriptome, and the legal constrains associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered and that maximum robustness of the data from the minimum sample size is an important part of experimental design for ethical use of animal models. Essentially the replication here could have been avoided if the original study had used 1 more animal. However, the new version of RNA-seq brings appropriate confidence to the interpretation of the data.

      Phosphoproteomics.

      The authors provide a robust justification of their strategy for the phosphoproteomics and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". The way missing values were dealt with is explained: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills in some of the gaps I was missing from the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/systems-based approach such as this one. The authors also edit the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they were presenting and switched to the terminology of "normalised phosphorylation level" - I think this is an appropriate response.
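      The inclusion and imputation rules quoted above can be illustrated with a minimal sketch. The data layout and function names are assumptions made for illustration only, and the sketch assumes both thresholds must be met within the same replicate, which the quoted text does not state explicitly. It is not the authors' pipeline, and the MLE imputation itself is not reproduced; only the condition that gates it is shown.

```python
from typing import Optional, Sequence, Tuple


def passes_inclusion(
    replicates: Sequence[Tuple[float, float]],
    max_fdr: float = 0.01,
    min_loc_prob: float = 0.75,
) -> bool:
    """Keep a phosphosite if at least one replicate meets both thresholds.

    Each replicate is a hypothetical (identification_fdr, localisation_prob) pair.
    """
    return any(fdr < max_fdr and loc > min_loc_prob for fdr, loc in replicates)


def eligible_for_imputation(values: Sequence[Optional[float]]) -> bool:
    """Allow imputation within one condition only if something was observed.

    `values` holds per-replicate intensities, with None marking a missing value.
    Returns True when at least one value is observed and at least one is missing.
    """
    observed = [v for v in values if v is not None]
    return 0 < len(observed) < len(values)


if __name__ == "__main__":
    # Second replicate satisfies both criteria, so the site is retained.
    print(passes_inclusion([(0.05, 0.90), (0.005, 0.80)]))
    # No observed value in the condition, so nothing is imputed.
    print(eligible_for_imputation([None, None, None]))
```

      Under this reading, a condition with no observed value stays missing rather than being filled in, which matches the constraint described in the quoted passage.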

      Overall, in the absence of follow up experiments on specific individual examples, some of the claims in the original submission were toned down and reflect a more neutral description of the data now. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

      Reviewer #3 (Public review):

      Summary:

      The authors proposed to use 5-layer systems level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani. This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view on how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      In the revised version, the increased number of RNA-seq samples makes the authors' assumptions consistent with the data obtained.

      The use of a proteasomal inhibitor adds an interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

      According to the reviewers’ comments, we made the following minor changes:

      As suggested by reviewer 1, we have extended the discussion of the results related to the analysis of the ubiquitination pattern by Western blot analysis as follows: “Proteasome inhibition blocked amastigote-to-promastigote differentiation, without inducing rapid global accumulation of ubiquitinated proteins (Figure S7C, upper panel) consistent with a quiescent-like state and low basal ubiquitin–proteasome system activity in amastigotes. After 18 h, ubiquitination levels remained similar to untreated cells, indicating that protein turnover and ubiquitin accumulation are primarily driven by developmental remodeling rather than acute proteasome inhibition. In promastigotes, the lack of detectable change (Fig. S7C, lower panel) may also reflect high basal ubiquitination, engagement of compensatory pathways such as autophagy, and/or only partial proteasome inhibition.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      - Supplementary figure 3 is not referenced in the main text.

      - The authors removed the "infinite" sign from figures 3 and 4 to better present the data according to their chosen approach to missing values when LFQ=0. However, the sign is still present in the respective figure legends, please adjust.

      Supplementary Figure 3 (Figure S3) is now referenced in the main text as requested.

      The "infinite" sign has been removed from the legends of Figures 3 and 4 as requested.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum - an organism that unusually shows an acristate mitochondrion during the asexual part of its life cycle, which then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins and look for timing of expression during the parasite life cycle and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes singly and in parallel and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using EM techniques they show that the morphology of gametocyte mitochondria is abnormal in the knockout lines, although there is great variation.

      Major comments:

      The manuscript is interesting and is an intriguing use of a well studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail in areas around methodology and statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates and the statistical tests and data presentation are not clear enough to conclude the reduction in transmission that is claimed. Perhaps this could be improved with clearer text?

      We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.

      To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.

      Regarding mosquito experiments, while we indeed reported a reduction in transmission and oocyst numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence “with the transmission reduction of [numbers]….” and we included the sentence “The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers”.

      More specific comments to address:

      Line 101/Fig1E (and figure legend) - What is this heatmap showing? It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide information.

      We added the information “high molecular mass gels with lower acrylamide percentage” to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).

      Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)?

      Please clarify.

      We thank the reviewer for pointing this out – this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and mitoTracker in U-ExM. The figure annotation has now been corrected.

      Figure 2C - what is the statistical test being used (the methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution." but what test is this?)?

      The statistical test is now included in the Materials and Methods section with the sentence “The fitted model was used to obtain estimated means and contrasts and were evaluated using Wald Statistics”. The test is now also mentioned in the figure legend.
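
      As a rough illustration of what a Wald test on a fitted contrast involves, the sketch below divides an estimate by its standard error and converts the resulting z statistic into a two-sided p-value under a standard normal reference distribution. The function name and the numbers are hypothetical and chosen only for illustration; this is not the analysis code or data from the manuscript, and it does not fit the mixed model itself.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald z statistic and two-sided p-value for H0: estimate = 0."""
    z = estimate / std_error
    # Two-sided tail probability of the standard normal: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical contrast on the log scale (knockout vs wild type); not study data.
z, p = wald_test(estimate=-0.8, std_error=0.4)
print(f"z = {z:.2f}, p = {p:.3f}")
```

      In a real analysis the estimate and its standard error would come from the fitted negative binomial mixed model rather than being typed in by hand.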

      Also the choice of a log10 scale for oocyst intensity is an unusual choice - how are the mosquitoes with 0 oocysts being represented on this graph? It looks like they are being plotted at 10^-1 (which would be 0.1 oocysts in a mosquito which would be impossible).

      As the data spans three orders of magnitude with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we went with a standard approach to handle 0s in log transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log-scaled y-axis and relabelled the lowest tick as ‘0’. This ensures that mosquitoes with zero oocysts are shown along the x-axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
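
      The separation between plotting positions and the values used for statistics can be sketched as follows. This is a minimal example under assumptions: the function name and the counts are hypothetical, and only the placeholder value of 0.001 is taken from the description above.

```python
import math


def log_plot_positions(counts, zero_placeholder=0.001):
    """Return log10 plotting positions; zeros are drawn at a placeholder value."""
    return [math.log10(c if c > 0 else zero_placeholder) for c in counts]


# Hypothetical oocyst counts per midgut; not data from the study.
counts = [0, 0, 3, 12, 150]

positions = log_plot_positions(counts)  # used only for drawing the points
mean_count = sum(counts) / len(counts)  # statistics use the true counts, zeros included

print(positions)
print(mean_count)
```

      Keeping the two lists separate means the placeholder can never leak into the estimated means or the test statistics.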

      Figure 2D - it is great that the data from all feeding replicates has been shared, however it is difficult to conclude any meaningful impact in transmission with the knock-out lines when there is so much variation and so few mosquitoes dissected for some datapoints (10 mosquitoes are very small sample sizes). For example, Exp1 shows a clear decrease in mic19- transmission, but then Exp2 does not really show as great an effect. Similarly, why does the double knock out have better transmission than the single knockouts? Surely there would be a greater effect?

      We agree with the reviewer and hope that the new sentence added in response to the major comment clarifies this point. Note that the original Figure 2D has been moved to the supplementary information, following a minor comment from another reviewer.

      Figure 3 legend - Please add which statistical test was used and the number of replicates.

      Done

      Figure 4 legend - Please add which statistical test was used and the number of replicates.

      Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.

      Figure 5C - the 3D reconstructions are very nice, but what does the red and yellow coloring show?

      Indeed, the information was missing. We added it to the figure legend.

      Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages."

      How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?

      Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.

      Furthermore, even though we do not discuss this in the article, we are aware of mitochondria-targeting drugs that are known to block mosquito transmission. We want to point out that it is difficult to distinguish the disruption of the ETC, and therefore an impact on energy conversion, from the impact on the essential pathway of pyrimidine synthesis, which is highly relevant in microgamete formation. Still, a recent paper from Sparkes et al. 2024 showed the essentiality of mitochondrial ATP synthesis during gametogenesis, so it is very likely that mitochondrial energy conversion is highly relevant for transmission to the mosquito.

      Reviewer #1 (Significance):

      This manuscript is a novel approach to studying mitochondrial biology and does open a lot of unanswered questions for further research directions. Currently there are limitations in the use of statistical tests and detail of methodology, but these could easily be addressed with a bit more analysis/better explanation in the text.

      This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research.

      My expertise is in Plasmodium cell biology.

      We thank the reviewer for the praise.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Major comments:

      (1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as it is the conformation of the dimers and rows that modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed, and likely dysfunctional (see below). Thus, I do not agree with the title that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.

      We thank the reviewer for taking the time to review our manuscript.

      Based on the reviewers’ interpretation we conclude the title does not come across as intended. We have changed the title to: “The role of MICOS in organizing mitochondrial cristae in malaria parasites”

      The Discussion section starting from line 373 also suffers from overinterpretation as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) compared to the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate on the basis of such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating neither is epistatic.

      We do agree with the reviewer’s notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.

      The following paragraphs (line 387 to 422) continue with such unnecessary overinterpretation to the point that it is confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript and should be shortened. Yes, maybe there are other putative MICOS subunits that may linger in the KOs that are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.

      We shortened this paragraph.

      (2) While the authors went to impressive lengths to detect any effect on lifecycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration, or membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have and not resorting to complexome profiling (although this would be definitive if it is feasible):

      i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.

      Interesting suggestion. As our staining and imaging conditions are suitable for such analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset which we collected for Figure 3. We did, however, not detect any difference in mitotracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary figure S6.

      ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.

      While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.

      iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.

      While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.

      To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.

      I will also conclude that complexome/shotgun proteomics may be a useful tool also for identifying other putative MICOS subunits by determining if proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.

      (3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus have concluded that these are truly acristate. This can very well be true, or there can be immature cristae forms that evaded detection at the resolution they used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are pretty small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be even smaller still. Minute levels of sampled complex III and IV plus complex V dimers in ABS that were detected previously by the authors by complexome profiling would argue for the presence of minuscule and/or very few cristae.

      I think that the authors should hedge their claim that ABS is acristate by briefly stating that there still is a possibility that minuscule cristae may have been overlooked previously.

      We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer’s point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing ‘fully acristate’ to ‘acristate’.

      This brings me to the claim that Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag; a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed, but below the expression limits of the assay, as the protein exhibits low expression levels when mitochondrial activity is upregulated.

      We agree with the reviewer that the absence of a detectable epitope-tag signal does not definitively exclude low-level expression, and we have therefore replaced the term ‘absent’ with ‘undetectable’ throughout the manuscript. In context with previous findings of low-level transcripts of the proteins in studies by López-Barragán et al. and Otto et al., we also added the sentence “The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.” to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall within the <25th percentile, suggesting that these low values likely represent background signal rather than biologically meaningful expression. This interpretation is further supported by proteomic datasets in PlasmoDB, which report PfMIC19 and PfMIC60 expression in gametocyte and mosquito stages, but not in asexual blood stages.

      To address this point, the authors should determine if mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack either transcript. RT-qPCR using polyT primers can be employed to detect these transcripts. If the levels of these mRNAs in WT ABS are equivalent to those in the dKO, the authors can make a pretty strong case for the absence of cristae in ABS.

      We appreciate the reviewer’s suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.

      They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).

      Searching for the CX9C motifs is a valuable suggestion. In response, we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1 F).

      Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.

      In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.

      (5) Statistical significance. Sometimes my eyes see population differences that are considered insignificant by the statistical methods employed by the authors, e.g. Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods such as Student's t-test for pairwise comparisons?

      The graphs in figures 3, 4 and 5 have been redrawn as violin plots on a linear scale (also following a suggestion from further down in the reviewer’s comments). We believe that this improves interpretability. ANOVA was kept as the statistical test to ensure correction for multiple comparisons, which cannot be performed with a standard t-test. A full overview of statistics and exact p-values can also be found in the newly added supplementary information 2.
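
      The concern about uncorrected pairwise t-tests can be made concrete with a short calculation. The sketch below is illustrative only: it assumes four lines are compared (the labels are placeholders), treats the pairwise tests as independent, which is a simplification, and shows a Bonferroni threshold merely as one familiar correction rather than the procedure applied in the manuscript.

```python
from itertools import combinations


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


# Placeholder labels: wild type, two single knockouts and the double knockout.
lines = ["WT", "mic19-", "mic60-", "dKO"]
pairs = list(combinations(lines, 2))

fwer = familywise_error(len(pairs))
bonferroni_alpha = 0.05 / len(pairs)  # a simple, conservative per-test threshold

print(len(pairs), round(fwer, 3), round(bonferroni_alpha, 4))
```

      With six uncorrected pairwise tests at the 0.05 level, the chance of at least one spurious difference rises to roughly one in four under these assumptions, which is why a correction is applied.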

      Minor comments:

      Line 33. Anaerobes (e.g. Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria.

      We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.

      Line 56: Unclear what authors mean by "canonical model of mitochondria"

      To clarify we changed this to “yeast or human” model of mitochondria.

      Lines 75-76: This applies to Mic10 only

      We removed the “high degree of conservation in other cristate eukaryotes” statement.

      Line 80: Cite DOI: 10.1016/j.cub.2020.02.053

      Done

      Fig 2D: I find this table difficult to read. If the authors keep the table format, at least get rid of the 'mean' column, as this data is better depicted in 2C. I suggest depicting this data as in 3B, showing the proportion of infected vs uninfected flies in all experiments, and then moving the modified table to the supplement. It is important to point out that experiment 5 appears to be an outlier with reduced infectivity across all cell lines, including WT.

      To clarify: the mean reported in the table indicates the mean per replicate while the mean reported in figure 2C is the overall mean for a given genotype that corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided to not include a graph for infected and non-infected mosquitoes as this information would be partially misleading, highlighting a phenotype we argue to be influenced by the strong variability.

      Fig. 3C-G: I feel like these data repeatedly lead to same conclusions. These are all different ways of showing what is depicted in Fig 2B: mitochondria gross morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to supplement and replaced by the beautiful images.

      Thank you for the nice comment on our images. We have now moved part of the graphs to supplementary figure 6 and only kept the Relative Frequency, Sphericity and total mitochondria volume per cell in the main figure.

      Line 180: Be more specific with which tubulin isoform is used as a male marker and state why this marker was used in supplemental Fig S6.

      We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.

      Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.

      To clarify the biological effect that we can conclude from the measurement, we added an explanation of it in the respective section of the results, and we decided to replace the raw results of the plug-in readout with the deduced relative dispersion.

      Line 222: Report male/female crista measurements

      We added Supplementary information 2, which contains the exact statistical tests and outcomes for all presented quantifications as well as a per-sex statistical analysis of the data from figure 4. Correspondingly, we extended supplementary information 2 by a per-sex colour code for the thin section TEM data.

      Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems like the data spread is wider in the double KO. This would also solve the problem with the standard deviation extending beyond 0%.

      We changed this accordingly.

      Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).

      This has been changed accordingly.

      Line 320: incorrect citation. Related to point 1 above.

      Correct citation is now included in the text.

      Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.

      This has been changed accordingly.

      Line 343-345: The sentence and citation 45 are strange. Regarding the former, it is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. About the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Fragmented mitochondria are often a physiological response to stress. I would just rewrite so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least this could be one of several possibilities.

      The sentence has been substituted following the indication of the reviewer, though we still include the data on human cells, as this has also been shown in Stephens et al. 2020.

      Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.

      Done

      Line 376-377; 'deplete functionality' does not make sense, especially in the context of talking about MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.

      We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.

      Other suggestions for added value

      (1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)?

      While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead comigrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial membrane space bridging (MIB) complex are not mentioned in the manuscript, we decided to not include the information in Author response image 1.

      Author response image 1.

      Reviewer #2 (Significance):

      The manuscript by Tassan-Lugrezin is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask the question whether MICOS is essential for this process. They conclude based on their data that the answer is no, which the authors consider unprecedented. But even if their claim is true that ABS is acristate, this supposed advantage does not really bring any meaningful insight into how MICOS works in Plasmodium.

      First the positives of this manuscript. As has been the case with this research team, the manuscript is very sophisticated in the experimental approaches that are made. The highlights are the beautiful and often conclusive microscopy performed by the authors. Only the localization of Mic60 and Mic19 was inconclusive due to their very low expression unfortunately.

      The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Mitochondrial deformation is tolerated by life cycle stage differentiation, with a modest but significant reduction in oocyst production being observed.

      However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which from the title and focus of the manuscript, is the main goal of the authors.

      In its current form, the manuscript reports some potentially important findings:

      (1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.

      (2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.

      (3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation

      (4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of Plasmodium itself than anything about the essentiality of Mic60, i.e. Plasmodium life cycle progression tolerates defects to mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth in examined conditions does differ from its essentiality in other eukaryotes. But fitness costs were not assayed (e.g. by competition between mutants and WT in infection of mosquitoes).

      (5) Decreased fitness of the mutants is implied by a reduction of oocyst formation.

      While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.

      This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS that sets it apart from other iterations is the apparent absence of the Mic10 subunit. Purification of Plasmodium MICOS via the epitope tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in Plasmodium.

      Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium.

      This is a salient issue as maybe crista morphology is decoupled from OXPHOS capacity in Plasmodium, which links to the apparent tolerance of mitochondrial morphology in cell growth and differentiation. I suggested in section A experiments to address this deficit.

      Finally, the authors could assay fitness costs of MICOS ablation and associated phenotypes by assaying whether mosquito infectivity is reduced in the mutants when they are directly competing with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation may contribute to Plasmodium's remarkable ability to adapt.

      I realize that the authors put a lot of effort into their study and again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are still better ways to increase the impact of the study aside from overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.

      We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.

      With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for the initial formation of cristae (as opposed to their organization) confirms something that the field has assumed without making it the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript, and that the finding that MICOS is not involved in the initial formation of cristae is relevant in Plasmodium biology and beyond. As for the insights into how MICOS works in Plasmodium, we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin section and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      We thank the reviewer for their time and compliment.

      Major comments:

      (1) The authors should present their findings more clearly in the right context, in particular by:

      i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      We extended the introduction to include this information.

      iii) at the end of the introduction, the motivating hypothesis is formulated ad hoc: "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      To clarify we rephrased the sentence to: “Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated.”

      (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum but this is not compared to the expected length or the size in S. cerevisiae.

      To solve the reference issue, we added the UniProt IDs that we compared to establish that the annotated ORF is bigger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast throughout the figure, but then talk about human in this specific sentence.

      Regarding whether the true N-terminus is known. Short answer: No, not exactly.

      However, we do know that the Pf version is about double the size of the yeast protein.

      As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.

      To clarify, we now emphasize that we ran the AlphaFold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chapell et al., now cited in the manuscript), and revised the wording to make clear what we are comparing in which sentence.

      (3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins?". The authors should explain which data they are referring to, as some of the data in Fig 3 and 4 look similar and all significance tests relate to the wild type, not between the different mutants, so it is not clear if any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      In reply to this and other comments from the reviewers, we added multiple-comparison testing across all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests used.

      (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      We deleted this statement.

      (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      This sentence has been removed.

      (6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      This sentence has been deleted in the revised version of the manuscript.

      Minor comments:

      (1) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      The title has been changed accordingly.

      - Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.

      Done, the paper is now cited

      - Page 2, line 58: for a more complete picture the authors should also cite the work of others here which shows that although at very low levels, e.g. complex III (a drug target) and ATP synthase do assemble (Nina et al, 2011, JBC).

      Done

      - Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.

      The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.

      - Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.

      We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).

      - Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked as low/very low. So, its existence is unknown, not just its "function".

      We adapted the domain description to “a stack of two parallel beta-sheets" and replaced the statement on unknown function by the statement “Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown.”

      - Fig 1B: The authors show two alphafold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is however an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible.

      We appreciate the reviewer’s suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full-length proteins, we believe that including fragment-based structures would be less informative in this context.

      - Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods - how well was the antibody working in their hands and how do they interpret the absence of a WB band compared to transcriptomics data?

      The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall below the 25th percentile, suggesting that these signals likely represent background. Nevertheless, we acknowledge that low-level protein expression below the detection limit of western blot analysis cannot be excluded. To reflect these considerations, we added the sentence: ‘The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.’

      - Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.

      Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondria), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases accessibility for the antibodies. However, U-ExM comes at the cost of being prone to dotty background signals, therefore potentially hiding low-abundance, naturally dotty signals such as the signal of MICOS proteins that localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we do not have experience with, nor access to, this particular technique/method.

      - Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).

      The limitations of other methods are described in the respective results section.

      We added a clarifying sentence in the results section of Figure 4:

      “Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae.”

      This statement refers to the length/width measurements of cristae.

      In the context of Figure 4D we mention the following (see preprint lines 229 – 230): “We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19, PfMIC60, or both.”

      For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 – 273): “Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range.”

      - Line 404: perhaps undetected or similar would be a better description than "hidden"?

      The sentence does not exist in the revised manuscript.

      Reviewer #3 (Significance):

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle and defects in infection effectivity are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism. The limitation of the study stems from what is already known about MICOS and its subunits in great detail in yeast and humans with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight on MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is however important given the divergence found in mitochondrial biology and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging as the morphological defects are diverse and the fitness defects appear moderate/mild.

      As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest for specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the role of the insulin receptor and the insulin growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.

      This is primarily a descriptive study and no technical concerns are raised. The mechanism of how insulin and IGF1 signaling regulates splicing is not directly addressed but potentially implicates the phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Direct measurement of GFR and serial serum creatinines might also enhance our understanding of progression of disease, although proteinuria is a strong sign of renal injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful but may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      Significance:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.

      Comments on revised version:

      I'm satisfied with the revised manuscript and the responses to my previous concerns.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long-read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well-designed, bars/superimposed dot-plots.

      Methods are generally well described.

      Comments on revised version:

      Coward and colleagues have done an excellent job of responding to all the reviewer comments.

      Thank you.

      Reviewer #4 (Public review):

      Summary and background:

      This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. Insulin/IGF signaling system in mammals evolved as three gene reduplicated peptides (insulin, IGF-1, and IGF-2) and their two receptors IR and IGF1R that cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor and metabolic mTORC1-mediated and other processes including the splicing of RNA for protein synthesis.

      Thank you for this additional review and for assessing our paper with new suggestions (we addressed the previous suggestions to the satisfaction of the other reviewers). Of note, regarding this introduction, the podocyte is a terminally differentiated cell and may have unique responses to insulin/IGF, as it is accepted that it does not generally proliferate (hence we consider understanding the actions of insulin/IGF and their receptors to be of interest). Indeed, we have recently shown a contrasting effect of IGF signalling in the podocyte: partial suppression of the IGF1 receptor is beneficial, in contrast to near-complete suppression, which results in mitochondrial dysfunction (PMID:38706850).

      Mouse IR/IGF1R double knockdown model:

      A double knockdown mouse model was generated by interbreeding mice with different genetic backgrounds carrying floxed sites for IR and IGF-1R to produce mixed background offspring with both floxed IR and IGF-1R genes. These mice were crossed so that the podocin promoter driven-Cre (that comes on at about embryonic day 12 as podocytes are developing) would delete IR and IGF-1R genes. Since podocin is believed to be an absolutely podocyte-specific protein, this podocin promoter is predicted to specifically knock down the IR and IGF1R genes only in podocytes. The weight and growth of double KO offspring was not different from controls, but some proportion of the double knockdown mice subsequently developed proteinuria by 6 months and 20% died, although no specific data is provided to identify the cause of the deaths since eGFR was not decreased. Surviving mice were evaluated at 6 months of age. The efficacy of knockdown was not demonstrated in the mouse model itself, although a temperature-sensitive cell line developed from these double knockdown mice showed that expression of IR and IGF-1R proteins in the Cre-treated cell line were both reduced by about 50% (no statistical analysis of this result provided).

      In the knockout mice, proteinuria was significantly increased by 6 months, but not at earlier time points. Histologic analysis showed proteinaceous casts, glomerulosclerosis and interstitial fibrosis. Podocyte number was stated to be reduced by about 30% in double knockdown mice, although the method by which this was evaluated seems to have been by counting WT1 positive nuclei in glomerular cross-sections, an approach that is well-known not to be a reliable way of assessing true podocyte number. No information is provided about podocyte size, density or glomerular volume.

      Comment: If IR/IGF1R deletion plays a significant role in normal podocyte function sufficient to cause proteinuria and glomerulosclerosis, then the effect of reduced IR and IGF1R protein expression on podocyte function would have been expected to produce a phenotype before 6 months. A more likely scenario to explain the overall result is that deleting the IR and IGF1R genes at about embryonic day 12 impacted podocyte development to a variable extent such that some mice developed fewer podocytes per glomerulus than other mice. As mice grow and their glomeruli and glomerular capillary area increase, those mice with fewer podocytes would not be able to completely cover the filtration surface with foot processes and would develop proteinuria and glomerulosclerosis. If reduced podocyte number per glomerulus is the proximate cause of the observed proteinuria, then modulation of the body and kidney growth rate by calorie restriction to slow growth (lower circulating IGF-1 levels) would be expected to be protective, while a high protein high calorie diet (higher circulating IGF-1 levels) or uni-nephrectomy to increase kidney growth rate would be expected to enhance proteinuria and glomerulosclerosis.

      Thank you for these comments. In response to them:

      (1) WT1 as a marker of podocyte number. We agree this may not be the most accurate way of precisely measuring podocyte number, but it is widely accepted in the field (PMID:33655004 / PMID:38542564) and we think it convincingly shows fewer podocytes at 6 months.

      (2) Podocyte size and density were not measured. This was not the focus of the paper and the histology obviously showed a significant phenotype in several mice (Figs 1D-F). Of note, we did objectively assess a glomerulosclerosis index (Fig 1D). We took the approach of understanding mechanism through non-biased proteomics and phospho-proteomics of conditionally immortalised podocytes in which we had convincingly knocked down the insulin and IGF1 receptors (Figure 2).

      (3) You did not study the mice earlier to ascertain the developmental phenotype. We concede we did not do this, but there was no significant proteinuria detected early in the mice, so we elected not to increase mouse numbers by studying them then (which we consider good practice for reduction, replacement and refinement). We suspect there would have been subtle changes in those mice that had significantly reduced simultaneous IR and IGF1R knockdown. It was precisely because of this that we generated a conditionally immortalised podocyte cell line with robust simultaneous knock-down of both receptors.

      (4) You did not show significant insulin and IGF1 receptor knockdown in the conditionally immortalised cell line (reviewer states it was 50%). We clearly knocked both receptors down (insulin and IGF1R) in the podocyte line by >80%, which was highly statistically significant (p<0.00001; Figure 2A). We agree this was crucial (and we made the cell line because of the variability in the mouse model).

      The model as used may be more representative of a variable degree of podocyte depletion than an effect of impaired IR/IGF1R signaling. Therefore, although the phenotype may be ultimately attributable to the IR/IGF1R gene deletions the proteinuria and glomerulosclerotic phenotype itself was probably a consequence of defective podocyte development. Examining podocyte number, size, density and glomerular volume at earlier time points (4 weeks) would help to answer this question. Therefore, a more appropriate title would be "The insulin/IGF axis is critically important (for) normal podocyte development and deployment". In this context the effect of the knockdowns on splicing would make more sense.

      Please see our response (above). We think our final conclusion that in the podocyte the insulin/IGF axis is important for spliceosome activity and control is valid. This is due to our findings (both total and phospho-proteomics results) and considering other recent papers showing that this axis can rapidly phosphorylate a variety of spliceosome proteins in different cell types (PMID:39939313 / PMID:32888406), all of which are discussed in detail in the manuscript.

      Cell culture studies. A cell line was generated using a temperature sensitive SV40 system that has been previously reported from this laboratory. A detailed analysis is provided to show that double knockout cells exhibited abnormal spliceosome activity. This forms the basis for the conclusion that "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte". There are several concerns that weaken this conclusion.

      (1) In the double knockdown cell culture system about 30% of cells were "lost" by 3 days and about 70% of cells were "lost" by 5 days. The studies were done at the 3 day time point. It is not clear whether "lost" cells were in the process of dying, had detached owing to stress, or were just growing more slowly than controls due to reduced IR and IGF-1R signaling. These processes could have impacted splicing in a non-specific way independent of IR/IGF1R signaling itself.

      (2) Can a single cell line derived from the double floxed mice be relied on to provide an unbiased picture of the effect of deleting IR and IGF-1R? Presumably, the transfection and selection process will select for cells that survive thereby including unknown biases, possibly related to spliceosome function. Is a single cell line adequate? These investigators have extensive experience with this type of analysis, but this question is not addressed in the discussion.

      (3) To determine whether the effect is specific to reduced IR/IGF1R signaling, the deletion of IR and IGF-1R could be corrected by transfecting full length IR and IGF-1R cDNAs into the cells to restore normal IR/IGF1R signaling. If restoring intact IR and IGF-1R expression and activity in the transfected cells returns spliceosome activity to normal, this would be evidence that the receptors themselves play some role in spliceosome activity, as opposed to the downstream effect of growth limitation/stress on the cells.

      (4) Other ways of testing whether the splicing effect is specifically due to reduced IR/IGF-1R signaling would be to (a) block IR and IGF1R receptors using available inhibitors, (b) remove or reduce insulin, IGF-1 and IGF-2 levels in the culture medium, (c) use low glucose and amino acid culture medium to slow growth rate independent of receptor function, (d) or block intra-cellular signaling via the IR and IGF-1R receptors through mTORC1 inhibition using rapamycin or other signaling targets.

      (5) It would be useful to determine whether stressing the cultured cells in other ways (e.g. ischemia, toxins, etc.) also results in the same splicing abnormalities.

      Point 1. 70% cell loss was observed at day 7 (not day 5). We found approximately 20% loss at day 3. We opted for this early time point, hypothesising that the key detrimental processes would be clear by then. This 3 day time point also ensures there has been enough time to allow for the expression of Cre recombinase, receptor gene excision and degradation of existing endogenous IR/IGF1R following lentiviral transduction. Interestingly, we did not find a major “death or apoptosis” signal in our data at that point, but we agree it should be considered. We think this is a specific pathway because we have previously used proteomics to examine several other detrimental conditionally immortalised podocyte cell lines with a much more severe cell death phenotype (e.g. podocyte GSK3 alpha/beta knockdown) and we detected NO spliceosome signal (PMID:30679422). Furthermore, other podocyte proteomics “stress” studies have now been published in which there is proteinuria and significant cell loss / death that also do not show spliceosome dysfunction. These include studies of the detailed proteomic signature of podocytes stressed with Doxorubicin and Lipopolysaccharide endotoxin (LPS) in mice (PMID:32047005) and of bradykinin stimulation of rat podocytes (PMID:32518694).

      Point 2. Yes, we think it is valuable and reproducible. We generated a podocyte cell line from insulin receptor and IGF1 receptor homozygous floxed cells. Hence there is no selection bias in the cells when generating the line as both receptors are effectively intact. We then temporally “knocked down” the receptors with extrinsic lentiviral Cre.

      Importantly we validated our cell line findings both back in the cells (with Western blotting) and in our transgenic receptor knockdown mice and found evidence of spliceosomal dysregulation (Figure 3E and 3F). Also as discussed above the spliceosome has been identified in other models in the insulin/IGF pathway.

      Point 3. We don’t think the experiment of knocking down the receptors and then reconstituting them would prove this hypothesis. This is because if the splicing abnormality was due to generalised cell dysfunction (which we do not think is the case in this situation) then putting the receptors back may simply restore cell health and spliceosomal function (i.e. it does not prove that the effect is via the receptors). Secondly, the process of transduction with multiple lentiviruses may be inherently stressful to the cell, and there may be a high level of extrinsic receptor inserted, which may also be confounding/detrimental. Finally, as discussed, there are now several lines of evidence describing insulin / IGF signalling to spliceosomal proteins which we consider important (discussed in the paper in detail).

      Point 4. We think modulating the receptors using the Cre-lox approach is the cleanest way (with fewer off-target effects) to interrogate the insulin / IGF axis. It allows us to differentiate the cells by thermo-switching (which is crucial for this terminally differentiated cell) and then robustly knock down both receptors simultaneously to investigate mechanism. We agree these supplementary approaches may give some extra information if their limitations (e.g. off-target effects of inhibitors) are also taken into consideration.

      Point 5. They do not. Please see response to point 1 above regarding GSK3, Doxorubicin, LPS and bradykinin challenge.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies which allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

      No weaknesses to address.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time on explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?

      Comments on revised manuscript:

      I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., w/o other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

      We agree with the reviewer that we may have advanced into discussing arousal-related effects in the previous version of the manuscript without providing a thorough explanation for why we think the slow drift axis is associated with changes in the monkey’s arousal levels. Arousal has been linked to the size of the pupil as well as movements of the eyes in numerous previous studies. We have made the following changes in the revised manuscript to address the reviewer’s concern:

      (1) When first describing how the spiking responses of SC neurons fluctuate over the course of a recording session (Lines 130-132), we have used the phrase "slow fluctuations in the spiking responses" rather than "arousal-related fluctuations in the spiking responses". Then, when describing these effects in more detail (Lines 136-147), we have explained why we think these fluctuations may be related to arousal. The following text has been added in the revised manuscript for clarification:

      “We found that this low-dimensional pattern of activity in the SC was also correlated with pupil size in the present study and with simultaneously recorded data in the prefrontal cortex (PFC), pointing to a link between this brain-wide fluctuation and changes in the monkeys’ arousal levels while performing the task.” (Lines 136-147)

      (2) We have changed the subheading in Line 183 of the revised manuscript from "Arousal-related fluctuations are present in the SC and correlated with pupil size and fluctuations in PFC activity" to "Slow fluctuations in SC spiking activity are correlated with pupil size and PFC activity". Given that we have not yet explained the results linking these fluctuations to arousal at this stage of the manuscript, we believe that this revised title is more accurate and avoids jumping too quickly to arousal-related fluctuations without first explaining the link between SC slow drift, pupil size and PFC activity.

      (3) We have provided additional justification for using pupil size and PFC activity to assess whether SC slow drift is associated with changes in the monkeys’ arousal levels. In a previous study, we computed an identical slow drift axis for spiking responses in visual cortex (V4) and PFC, and investigated how these low-dimensional neural activity patterns, which were themselves strongly correlated, were associated with various eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). Results showed that pupil size was the strongest predictor of slow drift in V4 and PFC. Given that the eye metrics were also strongly correlated with each other, we believe that the observed relationship between SC slow drift, pupil size and PFC activity provides sufficient evidence to suggest that the fluctuations observed in the SC are arousal-related. The following text has been added to the Results section of the revised manuscript:

      “Moreover, previous work in our laboratory computed a similar slow-drift axis using spiking activity in visual cortex (V4) and PFC, and investigated the relationship between these low-dimensional neural activity patterns and different eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). In addition to observing a strong correlation between V4 and PFC slow drift, we found that, relative to the other eye-related metrics, pupil size was the strongest predictor of these fluctuations (Johnston et al., 2022a). Thus, to further confirm the link between the SC slow drift axis and changes in the monkeys’ arousal levels while they performed the MGS task, we next sought to explore if projections onto the SC slow drift axis were associated with pupil size.” (Lines 236-344)
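      The link between projections onto a slow-drift axis and pupil size described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the number of time bins and neurons, the sinusoidal latent drift, the power-iteration PCA, and every name (`first_pc`, `project`, `pearson`) are assumptions made for the example.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def center(data):
    """Subtract each neuron's (column's) mean across time bins."""
    n_cols = len(data[0])
    col_means = [mean([row[j] for row in data]) for j in range(n_cols)]
    return [[row[j] - col_means[j] for j in range(n_cols)] for row in data]

def first_pc(centered, n_iter=200):
    """First principal axis via power iteration on the covariance matrix."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, axis):
    """Project each time bin's population vector onto the axis."""
    return [sum(a * b for a, b in zip(row, axis)) for row in centered]

# Synthetic session (all values hypothetical): one slow latent shared by all neurons.
random.seed(0)
n_bins, n_neurons = 60, 8
drift = [math.sin(2 * math.pi * t / n_bins) for t in range(n_bins)]
loadings = [random.uniform(0.5, 1.5) for _ in range(n_neurons)]
rates = [[10 + 3 * loadings[j] * drift[t] + random.gauss(0, 0.5)
          for j in range(n_neurons)] for t in range(n_bins)]
pupil = [d + random.gauss(0, 0.3) for d in drift]  # pupil tracks the same latent

centered = center(rates)
axis = first_pc(centered)             # candidate "slow-drift axis"
slow_drift = project(centered, axis)  # one projection value per time bin
r = pearson(slow_drift, pupil)        # the sign of a PCA axis is arbitrary in general
print(f"correlation between slow drift and pupil size: {r:.2f}")
```

      Because the sign of a principal axis is arbitrary, it is the magnitude of the correlation that is informative in a sketch like this one.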

      Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity like in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the neurons with lower motor index of the SC. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.

      Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.

      I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.

      Comments on revised manuscript:

      The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide more clear details below:

      In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.

      We thank the reviewer for this comment and apologize that the citation to Baumann et al., PNAS, 2023 was missing in the previous version of the manuscript. In addition to including this citation in the revised version, we have provided a much more comprehensive description of all three cited studies and clarified that, in addition to replicating the results of Jagadisan and Gandhi, Baumann et al., PNAS, 2023 showed that the subspaces for the visual and motor epochs are orthogonal to each other. The following lines have been added to the Introduction of the revised manuscript:

      “A similar separation has been observed for visual and motor responses in the SC (Jagadisan and Gandhi, 2022; Ayar et al., 2023; Baumann et al., 2023). For example, Jagadisan and Gandhi (2022) used linear microelectrode arrays to investigate why early eye movements are not triggered when neuronal responses to a visual target, presented before a delayed saccade to that target, cross a threshold. They found that population activity in the SC was less stable during the visual epoch of a delayed saccade task, relative to the saccade epoch. Moreover, saccades could be evoked more easily by patterned microstimulation when the temporal structure of the microstimulation was stable across electrodes, providing a potential explanation for how downstream regions differentiate between visual and motor responses. Similar results were reported by Baumann et al. (2023) who found that the strength of SC motor responses during a saccade to a visual image depends on the features of that image (e.g., contrast, orientation). When dimensionality reduction was applied to the spiking responses of neuronal populations in the SC, the population trajectory during the initial visual response to the image was orthogonal to that during the motor response. These findings replicate the separation in temporal population structure reported by Jagadisan and Gandhi (2022) and support the results of Ayar et al. (2023). They found that, although not completely orthogonal, population activity in the SC is distinct for visual and motor responses during the same oculomotor task and across different tasks, which could further facilitate the decoding of signals related to sensation, action and context by downstream regions.” (Lines 110-127)

      Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye also changes from the gray background. Thus, visually-responsive neurons will have different amounts of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the higher effects seen in the more visual neurons.

      We apologize that our analysis did not fully address the reviewer’s concern that the presence of fluctuations in visual neurons and their absence in motor neurons may have arisen indirectly due to changes in the amount of light entering the eye caused by changes in pupil size. As per the reviewer’s suggestion, we have now raised the possibility that visual neurons in the SC may have firing rates that are monotonically related to slow trends in overall luminance induced by pupil size changes, whereas motor neurons do not. Although we believe this to be an unlikely explanation, the paragraph from lines 374-398 has been modified to better describe this possibility, including the following text:

      “Given that slow drift is found in traditionally defined visual areas (e.g., area V4) and in regions that show mixed selectivity for multiple task variables (e.g., PFC) (Cowley et al., 2020), it seems unlikely that slow drift is caused by luminance fluctuations alone and more likely that it reflects global changes in arousal. At the same time, these arousal-related fluctuations covary with changes in pupil size (Johnston et al., 2022a), which could modulate the amount of light entering the eye from the display. This might affect visual neurons but not motor neurons due to their lack of visual sensitivity. Because SC neurons exist on a continuum, with visual responses decreasing and motor responses increasing from the intermediate to deep layers (Massot et al., 2019; Heusser et al., 2022) and no clear categorical boundary for motor-only neurons, any readout strategy would still need to avoid corruption of the motor output by slow drift, even if it were caused by changes in the amount of light entering the eye.” (Lines 387-398)

      The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.

      We thank the reviewer for bringing this to our attention. We believe this issue may have arisen during conversion of the manuscript file for review, as the figures were of sufficient quality and the equations visible in the version that appeared online (https://doi.org/10.7554/eLife.99278.2). In any case, we will ensure that high-resolution figures are submitted with the revised manuscript and apologize that they were low resolution in the previous version.

      I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? i.e. how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.

      We agree that clarification is needed here and thank the reviewer for their comment. The eccentricity of the targets was set to match the endpoints of the evoked saccades, which for some sessions were relatively close to the fovea. The mean eccentricity of the targets across sessions was 4.52° (SD = 2.89°). These values are now reported in the Methods section of the revised manuscript (Line 637). For the neuron shown in Figure 2–figure supplement 2, the eccentricity of the targets was 3°. Previous research has shown that some SC neurons respond during microsaccades as well as slightly larger saccades (see Hafed & Krauzlis, 2012, J. Neurophysiol., Fig. 4B). This likely explains why the neuron shown in Figure 2–figure supplement 2, which had a receptive field at ~3° based on saccades evoked by microstimulation, also responded during microsaccades. We apologize that this was not explained in the previous version and agree that it could have been confusing for the reader. To address this, the legend for this supplementary figure has been edited in the revised version and now reads:

      “(B) PSTH for an SC neuron that responded around the time of a microsaccade. Firing rates were computed in 1 ms bins, averaged across trials and smoothed using a Gaussian function (σ = 5 ms). Note that the targets were set to 3° in this session based on saccades evoked by microstimulation (see Methods). Previous research has shown that some SC neurons respond during microsaccades as well as to slightly larger saccades (Hafed and Krauzlis, 2012). This likely explains why this SC neuron, which had a RF at ~3° based on saccades evoked by microstimulation, also responded around the time of a microsaccade.” (Lines 1026-1031)
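      The PSTH computation summarized in this legend (1 ms bins, trial averaging, Gaussian smoothing with σ = 5 ms) can be sketched as follows. This is an illustrative sketch only: the function names, the ±4σ kernel truncation, the zero-padded edges, and the example spike times are assumptions, not details taken from the manuscript.

```python
import math

def gaussian_kernel(sigma_ms, half_width_sigmas=4):
    """Discrete Gaussian sampled at 1 ms steps, normalized to sum to 1."""
    half = int(round(half_width_sigmas * sigma_ms))
    weights = [math.exp(-0.5 * (k / sigma_ms) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def psth(spike_times_per_trial, t_start_ms, t_stop_ms, sigma_ms=5.0):
    """Trial-averaged firing rate (spikes/s) in 1 ms bins, Gaussian-smoothed.

    Spike times are in ms relative to the alignment event (e.g. microsaccade onset).
    """
    n_bins = t_stop_ms - t_start_ms
    counts = [0.0] * n_bins
    for trial in spike_times_per_trial:
        for t in trial:
            idx = int(math.floor(t - t_start_ms))
            if 0 <= idx < n_bins:
                counts[idx] += 1
    n_trials = len(spike_times_per_trial)
    # Mean spikes per 1 ms bin, converted to spikes per second.
    rate = [c / n_trials * 1000.0 for c in counts]
    kernel = gaussian_kernel(sigma_ms)
    half = len(kernel) // 2
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n_bins:  # zero padding: rates near the edges are underestimated
                acc += w * rate[j]
        smoothed.append(acc)
    return smoothed

# Hypothetical spike times (ms) for three trials, aligned to the event at t = 0.
trials = [[-2.0, 0.5, 3.2], [0.1, 1.7], [-40.0, 0.9, 2.4, 60.0]]
rate = psth(trials, t_start_ms=-100, t_stop_ms=100)
peak_time = rate.index(max(rate)) - 100  # ms relative to the alignment event
print(f"peak rate {max(rate):.1f} spikes/s at {peak_time} ms")
```

      Normalizing the kernel to unit sum keeps the area under the smoothed curve equal to the mean spike count per trial, provided no spikes fall close to the window edges.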

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target, but additional clarification regarding analyses and implications for vision and oculomotor control would broaden the impact of the study.

      We thank the editors and reviewers for their thorough evaluation of our work. We have carefully revised the manuscript and substantially reworked the Discussion to address all of the points raised, eliminate redundancies, streamline the text, and clarify the implications of our findings for vision and oculomotor control. We have also expanded the documentation of our power analyses and conducted the additional analyses requested by the reviewers. Our point-by-point responses are provided.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to midrange spatial frequencies (4-8 cycles/degree), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.

      Strengths:

      The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.

      Weaknesses:

      The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.

      We thank the reviewer for the helpful comments. In the Discussion, we have now considered additional factors that could have contributed to the observed attentional effects. First, the exogenous cue might have functioned as a temporal warning signal. However, the interval between cue and stimulus onset was fixed across trials, meaning that the cue did not provide temporal information beyond what participants could already anticipate. Furthermore, participants completed a large number of trials (≥ 4000), making it highly likely that the temporal relationship between trial onset and target onset was overlearned. These considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions.

      Another possibility is that the 100% validity of the exogenous cue could potentially have promoted endogenous attentional engagement. Yet, several characteristics of our task strongly limited the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to the observed attentional benefits in our task.

      Regarding the points on statistical reporting and participant details, we followed the reviewer’s suggestions by adding post hoc power analyses and providing more comprehensive reporting of the linear model outputs (see Appendices 1 and 2). We also expanded the description of the training procedures conducted with participants prior to formal data collection in the Methods section.

      We thank the reviewer for raising the important question of how our findings may relate to oculomotor control. To address this, we analyzed the trials that were excluded from the main analyses because they contained saccades. This analysis revealed that saccade latencies were shorter in the valid condition than in the neutral condition (see Figure 2–Supplementary Figure 2). This earlier saccade onset may reflect exogenously triggered preparatory activity in the oculomotor system in response to the salient cue. Future studies are needed to examine whether this preparatory mechanism serves to efficiently guide microsaccades or saccades toward behaviorally relevant stimuli in everyday vision. We have incorporated this point into the Discussion, highlighting a potential mechanistic link between exogenous attention and oculomotor behavior.

      Reviewer #2 (Public review):

      Summary:

      This study aims to test whether foveal and non-foveal vision share the same mechanisms for exogenous attention. Specifically, the authors aim to test whether they can replicate at the foveola previous results regarding the effects of exogenous attention for different spatial frequencies.

      Strengths:

      Monitoring the exact place where the gaze is located at this scale requires very precise eye-tracking methods and accurate and stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal vision and non-foveal vision, adding more data supporting this parallel.

      Weaknesses:

      The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.

      We thank the reviewer for raising these important issues. In response, we have expanded the Discussion to link our findings to prior work. First, we included a direct comparison of our effect sizes with those reported in previous studies. This analysis revealed that our effect sizes are highly comparable to those of earlier studies (see Figure 3 — Supplementary Figure 4). Second, we contextualized our findings within the widely used normalization model of attention in the Discussion. We detected a mixture of contrast and response gain effects, consistent with predictions from the normalization framework given our experimental design. Finally, we extended the Discussion to consider potential underlying neural mechanisms. Specifically, we suggested that differences in attentional modulation, particularly the manifestation in response gain vs. contrast gain between the fovea and extrafovea, may reflect distinct characteristics of foveal neurons relative to those in extrafoveal regions.

      Reviewer #3 (Public review):

      Summary:

      This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.

      Strengths:

      The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.

      Weaknesses:

      The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.

      We thank the reviewer for this comment. Our Methods section continues to transparently discuss these limitations, as well as the fact that these limitations are shared with most published studies in psychophysics. Additionally, we now include measures of uncertainty for all key effects (see Appendices 1 and 2), and we have reported effect sizes throughout the Results section. Finally, we have added post hoc power analyses to the Methods. Following previous approaches to power calculation for related experiments, we found that our study was sufficiently powered to detect the main effect of attention and had moderate power to detect the interaction between attention and spatial frequency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manipulation of attention raises some interpretive concerns. Since only valid and neutral cue conditions were included, the results might reflect differences in temporal predictability rather than true spatial reorienting of attention. In other words, the valid cue could act mainly as a temporal warning signal that reduces uncertainty about stimulus onset. Without invalid trials or a non-predictive control cue, it remains difficult to separate spatial and temporal contributions to exogenous attention.

      We thank the reviewer for raising this point. In this regard, we would like to clarify that there was no temporal uncertainty in stimulus onset: across all conditions and trial types, the stimulus was presented at the same time relative to the start of the trial, i.e., 600 ms after the start. Yet, we acknowledge that the closer temporal proximity between the cue and stimulus in valid trials could serve as an additional temporal warning signal, potentially conferring an advantage relative to the neutral condition. While we cannot completely rule out a contribution of such temporal cueing within the constraints of the current experimental design, we believe its impact was limited. Specifically, the fixed cue-stimulus interval reduced the cue’s ability to convey additional temporal information. Furthermore, observers completed a large number of trials (≥4000), and the temporal contingency between trial onset and target onset was likely overlearned. Taken together, these considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions. We now mention this in the revised Discussion (lines 309-318).

      We recognized that the original Figure 2 illustrating the experimental paradigm may have caused confusion regarding the timing structure of the task. We have therefore updated the figure to more explicitly illustrate the trial timeline in both conditions.

      (2) The reported effects seem small, and no power analysis is provided. With only seven participants, the study may not have enough statistical power to confirm that the observed differences are reliable or generalizable. Although the technical precision in gaze and stimulus control is impressive, it cannot offset the limitations of a small sample. The authors should include effect size estimates, confidence intervals, and ideally a post-hoc power analysis.

      The statistical results are reported only as χ² values from model comparisons, which do not show the direction or size of the effects. For clarity and transparency, these tests should be accompanied by fixed-effect estimates with their standard errors and confidence intervals, so readers can better assess both the reliability and perceptual relevance of the findings.

      The reviewer raised several important points regarding the study's statistical rigor.

      In the revised manuscript, we now report effect size estimates (Cohen’s d) in the Results section and Appendices. Effect sizes were in the medium-to-large range, including the effect of attention on contrast sensitivity at 4 and 8 CPD, and the difference in attentional benefit on contrast sensitivity between 4 and 12 CPD and between 8 and 12 CPD. We have also included the full model outputs, including standard errors and confidence intervals, in the Appendices.

      The sample size for the current study was determined based on the magnitude of the attentional effects observed in our previous work (Guzhang et al., 2021). The experimental design and dependent measures were highly similar across the two studies, and the prior study revealed a robust effect, which accounted for a substantial proportion of within-observer variance in a tightly controlled repeated-measures design.

      We have revised the manuscript, adding bootstrap-based power estimates, following the procedure described by Jigo and Carrasco (2020), using data from Guzhang et al. (2021). Assuming the effect size in our current study would be comparable to the prior one, 2 to 12 observers were randomly sampled with replacement, and a one-way repeated-measures ANOVA with attention as the main factor was used. This procedure was repeated 10,000 times, and power was estimated as the proportion of iterations yielding a significant main effect for each sample size. The results of this analysis indicate that a sample size of five observers would have been sufficient to achieve approximately 80% power to detect the main effect of attention in the prior study. Based on these estimates, the sample size used in the current study (seven observers) is adequately powered.
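
      The resampling logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the per-observer effects in `PRIOR_EFFECTS` are hypothetical placeholder values, the attention factor is assumed to have two levels (valid vs. neutral) so that the one-way repeated-measures ANOVA reduces to a paired t-test (F = t²), and the names `is_significant` and `bootstrap_power` are introduced here for illustration only.

```python
import math
import random
import statistics

# Two-sided critical t values at alpha = 0.05, indexed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}

# HYPOTHETICAL per-observer attention effects (valid minus neutral) standing in
# for the prior dataset; these are NOT values from Guzhang et al. (2021).
PRIOR_EFFECTS = [0.12, 0.18, 0.09, 0.15, 0.20, 0.11, 0.14]


def is_significant(effects):
    """Paired t-test on within-observer differences (a 2-level RM ANOVA, F = t^2)."""
    n = len(effects)
    sd = statistics.stdev(effects)
    if sd == 0:
        # Degenerate resample (all draws identical): counted as non-significant,
        # a conservative choice made for this sketch.
        return False
    t = statistics.mean(effects) / (sd / math.sqrt(n))
    return abs(t) > T_CRIT_05[n - 1]


def bootstrap_power(effects, n_observers, n_iter=10_000, seed=0):
    """Proportion of resampled datasets that yield a significant attention effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        sample = rng.choices(effects, k=n_observers)  # sampling with replacement
        hits += is_significant(sample)
    return hits / n_iter


if __name__ == "__main__":
    # Power as a function of sample size, for 2 to 12 observers.
    for n in range(2, 13):
        print(n, round(bootstrap_power(PRIOR_EFFECTS, n), 3))
```

      With real data, the smallest sample size whose estimated power reaches the desired level (for example 0.8) would be read off the printed curve.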

      We also conducted a post hoc power analysis to evaluate the power of our design to detect the main effects and their interaction. It was performed using the R package simr, which estimates statistical power for mixed-effects models through model-based simulation. Specifically, simr generated datasets based on the fixed- and random-effect structure of the fitted model, preserving the observed effect sizes and variance components. For each simulated dataset, the model was refit, and the effect of interest was tested. By repeating this procedure 501 times across different sample sizes, power was estimated as the proportion of simulations in which the effect was statistically significant. Based on these post hoc simulations, we estimated that our study had high power (>95%) to detect the main effects and moderate power (>65%) to detect the interaction. Although the estimated power for the interaction was lower than for the main effects, the observed effect size was substantial (as indexed by Cohen’s d), indicating that the interaction was not trivially small.

      We now describe these analyses in lines 501-532 in the Methods section.

      (3) The task seems quite demanding, requiring fine spatial discrimination, very small stimuli, and head stabilization with a bite bar. It is not clear whether participants were naïve or experienced observers. If they had prior psychophysical training, practice effects could have influenced the results, particularly given the lack of invalid trials. The manuscript would benefit from clarifying participants' experience level and describing any training or familiarization procedures.

      We appreciate the reviewer’s concern regarding potential training effects. All observers had prior experience with similar tasks, but were naïve to the scope of this study. Each participant underwent an initial familiarization phase of approximately 50 trials with the experimental setup of this study. They then completed an additional ~50 trials to estimate their individual contrast thresholds per spatial frequency level before we proceeded with data collection at the five predefined contrast levels.

      Based on our experience, we have found that, for experiments similar to the one described here, observers quickly adapt to the setup and are generally able to maintain reliable fixation and stable performance, even during the initial training phase. In addition, each participant completed approximately 400 trials before the data collection started. Even observers who began the session with no prior experience would have become practiced with the setup by the time the actual data-collection phase started, during which ~4000 trials were collected per observer. Therefore, whether an observer participated in previous experiments is unlikely to meaningfully affect the results, as the large number of trials ensures comparable levels of task familiarity across individuals.

      Crucially, valid and neutral trials were interleaved throughout the session. Any general learning or practice would therefore influence both conditions equally. Despite this, we still observed clear performance improvements in the valid condition relative to the neutral condition, indicating that the observed benefits cannot be attributed solely to practice and reflect an attentional enhancement. We have added elaboration on the training procedures in Methods (lines 411-429).

      Finally, we recognize that the lack of invalid trials may raise concerns given our 100% spatially predictive cue, as noted in Reviewer 3’s first comment. We refer the reader to our response to that point for a more detailed discussion of cue validity and the distinction between exogenous and endogenous influences in our paradigm.

      (4) The study would benefit from a clearer connection between the behavioral results and possible underlying neural mechanisms. How might the observed changes in contrast sensitivity relate to known physiological processes at the retinal, thalamic, or cortical level? The discussion could be strengthened by framing the findings within established models of attentional modulation or by referring to known effects of attention in the early visual cortex.

      This is an important point, and we agree that framing the findings within established models of attentional modulation can strengthen the discussion. We believe that the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) offers a useful framework for interpreting our behavioral findings, especially the attention-related changes in contrast sensitivity and asymptotic performance observed at the foveal scale. We have now added a more detailed discussion linking our results to this model and considering, explicitly as speculation, how known physiological processes at different stages may contribute to the observed effects in Discussion (lines 264-307).

      (5) The ecological relevance of the results is not fully developed. The authors propose that the observed effects may resemble natural attentional shifts triggered by salient events, yet the brief, highly localized flashes used here are somewhat artificial. A more likely interpretation is that these mechanisms relate to oculomotor control within the fovea, perhaps reflecting preparatory activity for microsaccades or fine fixation adjustments. Considering this view could broaden the impact of the findings and link them to current discussions on the relationship between attention and oculomotor control.

      We thank the reviewer for raising this important point regarding the ecological relevance of our findings, which we did not sufficiently address in the original manuscript. Although we briefly motivated scenarios that engage exogenous attention at high spatial resolution, such as detecting road signs or traffic lights at a distance while driving, we did not fully elaborate on how such attentional processes may link to downstream visual and oculomotor functions.

      In our experiment, observers maintained fixation and avoided saccades throughout the trial. Nevertheless, in a subset of trials (on average 17% ± 3%), observers made saccades after stimuli disappeared and prior to providing a response. Typically, these movements were microsaccades with amplitudes smaller than 0.5°, directed toward the target location, in both valid and neutral trials. These saccades were discarded prior to the analyses performed in the manuscript. Inspired by the reviewer’s feedback, we decided to examine the saccade latency in these trials relative to the onset of the response cue to assess whether exogenous cueing influenced oculomotor timing. Notably, we observed an earlier onset of microsaccades in valid compared to neutral trials (71 ms ± 50 ms faster, P < 0.01). We have now added this observation as Figure 2 — Supplementary Figure 2 in the manuscript. Because the presence of an exogenous pre-cue was the only difference between the two trial types, the earlier microsaccade onset likely reflects exogenously triggered preparatory activity in the oculomotor system in response to the salient pre-cue. Such fine-grained attention may prime potential eye movements toward behaviorally relevant stimuli for further examination. This interpretation is consistent with the reviewer’s suggestion and supports a mechanistic link between exogenous attention and oculomotor behavior, extending the ecological relevance of our findings. This point has been added to the Discussion on lines 329 to 340.

      We also conducted an analysis to examine ocular drift behavior following the response cue. Although trials included in the manuscript analyses were constrained such that fixation during target presentation remained within a small window (10’ radius) around the fixation marker, we did not assess whether gaze subsequently drifted closer to the target location after the response cue. One possibility is that exogenous attention might bias ocular drift, shifting the preferred locus of fixation closer to the target. To address this, we computed the average Euclidean distance between gaze position and the target location following response cue onset for valid and neutral trials. However, we found no significant difference in gaze-target distance between valid and neutral trials (p = 0.57).
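
      The distance measure used in this drift analysis can be sketched as below. This is an illustrative sketch only: the function name `mean_gaze_target_distance` and the assumption that gaze samples and the target are given as (x, y) pairs in the same units (for example arcminutes) are ours, and the statistical comparison between conditions is not shown.

```python
import math


def mean_gaze_target_distance(gaze_samples, target):
    """Average Euclidean distance between gaze samples and the target location.

    gaze_samples: iterable of (x, y) gaze positions after response-cue onset.
    target: (x, y) target location, in the same units as the gaze samples.
    """
    tx, ty = target
    distances = [math.hypot(x - tx, y - ty) for x, y in gaze_samples]
    return sum(distances) / len(distances)
```

      Computing this value per trial and averaging within each condition gives one gaze-target distance per observer for valid and for neutral trials.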

      Although the spatial cueing approach has long been used to probe exogenous attention in a controlled manner in psychophysical experiments, we fully recognize the importance of understanding attention under more naturalistic viewing conditions that allow observers to freely move their eyes. Developing paradigms that incorporate more naturalistic, salient stimuli would be an important direction for future work, enabling investigation of exogenous attention in ecologically valid settings and its influence on sequential actions and processes, including oculomotor behavior.

      (6) There is no statement about the availability of the data and code used for the experiment.

      We have now added the data and code for the analysis pipeline to the Open Science Framework (OSF).

      Reviewer #2 (Recommendations for the authors):

      (1) The study could discuss the strength of the effect and how it relates to previous studies.

      We thank the reviewer for raising this point. To facilitate direct comparison with the study by Jigo and Carrasco (2020), we computed attentional benefit as the ratio of contrast sensitivity between the valid and neutral conditions (now shown in Figure 3 — Supplementary Figure 4). In their data, the attentional benefit at 0° eccentricity peaked just below 4 CPD, with a ratio of approximately 1.2, corresponding to a ~20% increase in contrast sensitivity. This magnitude closely matches the benefit we observed for fine-grained attentional shifts within the foveola at spatial frequencies between 4 and 8 CPD (17% ± 12% and 16% ± 14% for 4 and 8 CPD, respectively). We have added this comparison to the Discussion (lines 246-262).
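
      The ratio-based measure of attentional benefit can be illustrated with a short sketch. It assumes that contrast sensitivity is defined as the reciprocal of the contrast threshold, which is the conventional definition but is stated here as an assumption; the function name `attentional_benefit` and the example thresholds are hypothetical.

```python
def attentional_benefit(threshold_valid, threshold_neutral):
    """Return (ratio, percent change) of contrast sensitivity, valid vs. neutral.

    Contrast sensitivity is taken as 1 / contrast threshold (an assumption of
    this sketch), so the valid/neutral sensitivity ratio equals
    threshold_neutral / threshold_valid.
    """
    sensitivity_valid = 1.0 / threshold_valid
    sensitivity_neutral = 1.0 / threshold_neutral
    ratio = sensitivity_valid / sensitivity_neutral
    return ratio, (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical thresholds: a ratio of about 1.2 is a ~20% sensitivity increase.
    print(attentional_benefit(0.10, 0.12))
```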

      In addition, we acknowledge that prior studies have reported heterogeneous attentional effects, including pure contrast gain, pure response gain, or a mixture of the two. We now explicitly reference these findings in the Discussion and use the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) to account for how differences in stimulus configuration, attention field size, and eccentricity may account for discrepancies between our findings and prior studies examining attention in the extrafovea or when broadly distributed across the fovea (lines 264-307).

      (2) Minor details:

      (a) The abstract mentions gaze-contingent-display, but if I understand correctly, the stimulus was not presented in a gaze-contingent manner.

      That’s correct. Although stimuli were not presented gaze-contingently, we used a gaze-contingent calibration procedure (see Methods, lines 386-389) to achieve higher precision in localizing the line of sight. This increased accuracy was essential for selecting trials in which stimuli remained at the intended eccentricity relative to the preferred locus of fixation. To avoid potential confusion, however, we have removed this detail from the abstract.

      (b) Line 361: What is the manual calibration the authors are referring to? It does not appear to be described.

      The text has been updated to explain more explicitly what auto and manual calibrations are.

      (c) Line 402: There may be a typo towards the end of the line "t0" should be "to"?

      Text has been updated. Thank you.

      (d) Line 405. What are the units of 30?

      It’s in arcminutes. Text has been updated.

      Reviewer #3 (Recommendations for the authors):

      I found this paper very interesting, with a solid methodological approach and excellent data analyses. The authors present a well-designed psychophysical study that contributes valuable insights into the mechanisms of attention in the foveola. The methodology is rigorous, and the analyses are thoughtfully conducted and clearly presented.

      That said, I would like to offer a few comments and suggestions for clarification and further consideration:

      (1) Exogenous attention:

      If a 100% spatially predictive cue is compared to a neutral cue, the observed attentional effect should not be described as (purely) exogenous, since the cue fully predicts where the post-cue will request a response. This situation represents a case in which attention is exogenously driven but endogenously maintained (see e.g., Chica et al., 2013, Behavioural Brain Research). I recommend clarifying this distinction in the manuscript (and title) to avoid conceptual ambiguity.

      We thank the reviewer for raising this important conceptual point. We agree that because the pre-cue was 100% spatially predictive, the resulting attentional allocation cannot be considered purely exogenous. Although the abrupt, salient onset of the cue obligatorily triggers an exogenous shift of attention, its validity could also promote endogenous maintenance of attention at the cued location. Yet, several characteristics of our task strongly limit the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to perceptual encoding in our task.

      We also considered the possibility that our response cue (a retro-cue indicating the target location) might recruit endogenous attention to the internal perceptual representation. Importantly, however, this retro-cue was equally informative in valid and neutral conditions. Any enhancement driven by the retro-cue should therefore benefit both trial types to the same extent. The fact that we still observe a robust advantage in valid trials supports the conclusion that the performance improvements predominantly reflect fast, spatially specific exogenous facilitation rather than slower endogenous processes.

      We have revised the manuscript to clarify that although the cue obligatorily triggers an exogenous attentional shift, its 100% validity could allow for endogenous attention maintenance as shown by Chica et al. (2013). We also added an explanation detailing why such endogenous contributions are unlikely to drive our main results, given the rapid cue-target timing in our task in Discussion (lines 319-327). Finally, to further prevent ambiguity, we updated the manuscript title to refer to “exogenously triggered attention,” rather than simply “exogenous attention.”

      (2) Interpretation of statistical effects:

      The statement "Therefore, asymptotic performance showed only independent, additive effects of frequency and attention, without a systematic influence of spatial frequency on the attentional benefit" seems not to be supported by the data, as the main effect of frequency was not significant.

      We thank the reviewer for this helpful observation. We agree that the original phrasing did not accurately reflect the results, as the main effect of spatial frequency was not significant (p = .0545). We have revised the sentence to “Therefore, asymptotic performance reflected an effect of attention alone, with no detectable contribution of spatial frequency or of the interaction between spatial frequency and attention” to avoid implying such an effect (lines 210-211).

      If data from two participants were missing in one condition, the authors should consider replacing this data with new participants.

      We agree with the reviewer that having two observers with missing data in one condition is not ideal. However, the 20 CPD condition was deliberately positioned near the resolution limit at the tested eccentricity and was therefore extremely demanding. Observers also had to monitor two stimulus locations simultaneously, further increasing task difficulty. This condition was challenging for all observers and, despite testing up to the highest contrast, two of seven observers were unable to perform above chance, indicating that for a non-trivial fraction of observers, this condition was effectively unmeasurable with our paradigm. As noted in the manuscript, the 20 CPD condition also has a statistical limitation: thresholds clustered near the upper bound (approaching 100% contrast), compressing the dynamic range and markedly reducing variance relative to lower spatial frequencies, which violates the homoscedasticity assumption of linear models. For these reasons, we did not pursue additional data collection in this condition. Nevertheless, we report the data that were successfully obtained, as they remain informative about performance near the resolution limit.

      We finally note that even when setting aside the 20 CPD condition, our data support this conclusion: comparisons between 4 and 12 CPD, as well as between 8 and 12 CPD, revealed large differences in the magnitude of the attentional benefit (d = 0.65, 95% CI [0.11, 1.18] and d = 0.62, 95% CI [0.08, 1.14], respectively). To further quantify these effects, we now report Cohen’s d for these spatial-frequency comparisons throughout the Results text as well as in the tables in the Appendices.

      (3) Sample size:

      As this is a psychophysical experiment with many trials and few participants, I am curious about how the authors determined the appropriate sample size and the number of trials required to detect the expected effects. Given that many effects were found to be significant, it seems that statistical power was adequate; however, it would be helpful if the authors could explain how this issue was addressed a priori during experimental planning.

      We appreciate that the reviewer raised this point. Please see the reply to the second point from Reviewer 1, who raised a related question about statistical power.

      (4) Figure 2 clarification:

      In Figure 2B, I do not fully understand the "Valid" and "Neutral" representation. Both conditions include a post-cue indicating the right position; however, in the neutral condition, there is a central fixation square, whereas in the valid condition, there is not. Please clarify this aspect of the figure. I think I understood the paradigm, but this part of the figure is misleading.

      The pre-cue exists only in the valid condition. However, there was a mistake in panel B, where the fixation marker was missing in the valid condition.

      We thank the reviewer for pointing this out. We have updated Figure 2 to explicitly show the sequence of valid vs. neutral trials. The fixation mark remained on the screen throughout the trial in both the valid and neutral conditions. After a 500 ms fixation period, an exogenous cue was presented for 30 ms in valid trials, followed by a 70 ms interval before stimulus onset. In neutral trials, no cue was presented, and the screen remained blank for 100 ms before the stimuli appeared. In both conditions, a response cue would appear 50 ms after stimulus offset.
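
      The trial sequence described in this response can be written out as a small timeline sketch. The durations are taken from the responses above (500 ms fixation, 30 ms cue, 70 ms interval, 50 ms stimulus, response cue 50 ms after stimulus offset); the function name `trial_timeline` and the event labels are hypothetical, and the duration of the response cue is not stated, so its offset is left as `None`.

```python
def trial_timeline(condition):
    """Return (event, onset_ms, offset_ms) tuples, with times from trial start."""
    if condition not in ("valid", "neutral"):
        raise ValueError("condition must be 'valid' or 'neutral'")
    fixation_ms, cue_ms, gap_ms, stimulus_ms, post_stimulus_ms = 500, 30, 70, 50, 50
    events = []
    if condition == "valid":
        # The exogenous pre-cue is shown only in valid trials.
        events.append(("pre-cue", fixation_ms, fixation_ms + cue_ms))
    # Stimulus onset is identical in both conditions: 500 + 30 + 70 = 600 ms.
    stimulus_on = fixation_ms + cue_ms + gap_ms
    stimulus_off = stimulus_on + stimulus_ms
    events.append(("stimulus", stimulus_on, stimulus_off))
    # Response-cue duration is not specified in the text, hence offset None.
    events.append(("response cue", stimulus_off + post_stimulus_ms, None))
    return events


if __name__ == "__main__":
    for condition in ("valid", "neutral"):
        print(condition, trial_timeline(condition))
```

      Laying the two conditions side by side makes explicit that they differ only in the presence of the pre-cue, while stimulus and response-cue timing are the same.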

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful consideration of our work and constructive comments. We are glad that reviewers appreciated the rigor and value of our work. In response to the reviewer comments we have made the following changes:

      (1) Addition of new experiments on EndoA localization at the Drosophila NMJ (Fig. 2).

      (2) Addition of new experiments on Dap160 localization at the Drosophila NMJ (Fig. 2).

      (3) Addition of new experiments to validate Dynamin, Dap160 and EndoA antibodies (Fig. 2 – figure supplement 1).

      (4) Assessment of the activity-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 3).

      (5) Assessment of the liprin-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 8).

      (6) Addition of a limitations section to the discussion to directly address that spontaneous release was not fully ablated in our studies and might contribute to recruitment.

      (7) Addition of an outlook to the same section on what experimental avenues could address the limitations in the future.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to transient activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α. Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by distinct upstream organizers compared to active zone proteins. It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.

      We thank the reviewer for the positive assessment of our study.

      Strengths:

      The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.

      We thank the reviewer for highlighting the technical strength of our work.

      Weaknesses:

      One notable limitation, however, is the absence of interrogation of endocytic proteins previously suggested to be recruited in an activity-dependent manner, in particular, endophilin.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for our findings compared to previous work on Endophilin, which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord versus Drosophila NMJ terminals.” Further, that study found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together with our work, these data suggest that Endophilin constitutively, but not completely, localizes to the periactive zone.

      Reviewer #2 (Public review):

      Summary:

      This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using a combination of genetic and pharmacological perturbations in Drosophila and mouse neurons, the authors show that proteins such as Dynamin, Amphiphysin, AP-180, and others are still recruited to periactive zones even when evoked release or active zone architecture is disrupted. While the results are mostly negative, the study is methodologically solid and contributes to a more nuanced understanding of synaptic vesicle recycling machinery.

      We thank the reviewer for deeming our work solid and for highlighting its importance for the field.

      Strengths:

      (1) The experimental design is careful and systematic, covering both fly and mammalian systems.

      (2) The use of advanced genetic models (e.g., Liprin-α quadruple knockout mice) is a notable strength.

      (3) High-resolution imaging (STED, Airyscan) is well used to assess spatial localization.

      (4) The findings clarify that certain core assumptions - such as strict activity dependence of endocytic recruitment - may not hold universally.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      (1) The study would benefit from a clearer positive control to demonstrate activity-dependent recruitment (e.g., Endophilin).

      We have added experiments to measure the localization of Endophilin, a protein previously reported to localize to the synaptic vesicle cloud [1], in Drosophila NMJs (Figs. 2 and 3). We observed that EndoA localized both to the synaptic vesicle cloud and to the periactive zone area. While stimulation did not enhance levels in either compartment, this outcome is not inconsistent with shuttling of protein between compartments during activity. Nevertheless, our data support a model in which EndoA, like the other tested endocytic proteins, is present at the periactive zone at rest.

      (2) The reliance on Tetanus toxin in the Drosophila NMJ experiments in my eyes is a limitation, as it does not block all presynaptic fusion events; this should be discussed more directly.

      We agree with the point of the reviewer. To more directly discuss it, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” (519-523).

      (3) The potential role of Dynamin in organizing other periactive zone proteins is not addressed and could be an important next step.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Some small changes in protein levels upon silencing are reported; their biological meaning (e.g., compensation vs. variability) is not fully clarified.

      These changes might include homeostatic adaptations. In the revised version of the manuscript, this is addressed on lines 135-137 and 405-407. We think it is overall difficult to assign biological meaning to small-magnitude changes, and chose to highlight the main point that there are no large-magnitude changes.

      (5) While alternative organizing mechanisms (actin, lipids, adhesion molecules) are mentioned, a more forward-looking discussion of how to test these models would be helpful.

      Following the reviewer’s suggestion, we have added an outlook section to the discussion where we provide suggestions for future studies (lines 510-543).

      (6) The authors should consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We have included new experiments on EndoA at the fly neuromuscular junction (Fig. 2, Fig. 3, Fig. 8, Fig. 3 – figure supplement 1) and have added appropriate discussion of these findings as outlined above.

      Reviewer #3 (Public review):

      Summary:

      This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.

      We thank the reviewer for reviewing our work.

      Strengths:

      The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.

      We are confident that our methods are sensitive enough to detect changes within synaptic compartments. First, for mouse neurons assessed with STED microscopy, we have demonstrated that we can distinguish between the N- and the C-termini of the presynaptic protein Bassoon, which are positioned only a few tens of nanometers apart [4]. We have subsequently been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart and have established that genetic manipulations of active zone proteins induce detectable disruptions as assessed by STED microscopy [4-12]. Given that the periactive zone is larger than the distances that we can resolve, we are confident that we can detect changes in this area with enough sensitivity. Second, for Drosophila NMJs, we use a carefully validated workflow that allows assessing the distribution of periactive zone proteins and can detect subtle changes [13]. Unfortunately, there are no known manipulations that lead to periactive zone disassembly that could serve as a positive control, which reflects the little knowledge available in this field. We acknowledge that there may be subtle changes in protein localization that escape the resolution of our microscopy methods or experimental design, but this would not undermine the conclusion that the periactive zone remains assembled across the manipulations that we have tested. Overall, none of the manipulations we test induces a detectable disruption of the periactive zone. Naturally, we cannot exclude milder effects and have added a limitations section to discuss this possibility and some of the subtle changes we observe.

      This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.

      We thank the reviewer for the support of the conclusion of our study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      This is a rigorous study that, while presenting largely negative data, delimitates the processes that control peri-active zone organization. In addition to the interpretive and technical comments below, we encourage the authors to consider extending this study in two areas. First, examining the activity-dependence of Endophilin, and perhaps other factors, being recruited to the PAZ, where previous research has indicated a positive role for activity. Second, further characterization of the role of miniature release events in potentially contributing to PAZ organization. Overall, this was a rigorous and well-executed study.

      We thank the reviewing editor for this positive assessment of our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The rationale for comparing chronic inhibition to acute depolarization could be more clearly articulated. While this approach may be grounded in prior studies, the physiological consequences of chronic silencing differ markedly from those of transient activity, and these distinctions should be more explicitly addressed in the interpretation of results. For example, might lower intensity, chronic stimulation be a better comparison? Since fixation takes place immediately after stimulation, the time window to capture changes in protein recruitment may be curtailed.

      We thank the reviewer for this comment. The introduction of the manuscript now includes a rationale on lines 110-112. By inhibiting evoked synaptic vesicle fusion throughout the lifespan of neurons, we assessed whether this process is necessary for periactive zone assembly and concluded that it is not a requirement. By acutely depolarizing neurons with 50 mM KCl or with a 40 Hz train of action potentials, we were able to test whether synaptic vesicle fusion triggers the rapid recruitment of endocytic proteins to the periactive zone and concluded that this is not the case for most of the endocytic proteins that we studied. While these results indicate that a constitutive pathway must exist to assemble the periactive zone, we remain agnostic as to whether stimulation paradigms not tested in our study can enhance the deployment of endocytic proteins, especially over long periods of time. This may be the case for low, chronic stimulation, as suggested by the reviewer. We clarify these limitations in a “limitations and outlook” section of the discussion (lines 510-543).

      (2) Amphiphysin stood out as the only protein showing a notable change in opposite directions under either active zone protein knockout/blockers and Liprin-α knockout. Given the predominance of negative results, it would be valuable to devote more discussion to why Amphiphysin behaves differently. What functional role might it play in this context that sets it apart from other endocytic components?

      As suggested by the reviewer, we have extended the discussion on Amphiphysin. One possible reason why Amphiphysin may respond differently to different genetic manipulations or changes in stimulation is that different endocytic proteins might belong to different endocytic submachineries. This is addressed on lines 421-424. On lines 444-449, we further discuss the subtle decrease in the levels of Amphiphysin and AP-180 in Liprin-α mutants. We suggest that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus, and that this link may be partially disrupted in Liprin-α mutants. Overall, we note that Amphiphysin is still localized to the periactive zone at rest, and hence that it fits with the overall model of constitutive deployment that we propose.

      (3) The claim of activity-independence may need to be nuanced. Although the data suggest no recruitment in response to acute stimulation, the subtle changes following chronic inhibition complicate this interpretation, especially when considering redundancy. If activity-dependence is considered bidirectional, these findings might reflect a more complex regulatory mechanism. The interpretation in lines 188-190 more accurately captures this complexity than earlier generalizations.

      We agree with the reviewer that the dependence on activity should be discussed in a nuanced fashion. We have scrutinized the manuscript on this point and state throughout that recruitment is independent of evoked activity and not necessarily of any kind of activity. We believe that this interpretation is accurate because evoked release of neurotransmitter was ablated by the pharmacological and genetic manipulations that we used. Furthermore, we have included a “Limitations of the study” section in the discussion where we openly address that spontaneous fusion of synaptic vesicles cannot be ruled out as a potential mechanism to sustain periactive zone assembly (lines 514-523). Finally, we have expanded on the complexity of periactive zone assembly relative to activity. In particular, homeostasis may contribute to increased levels of endocytic proteins upon chronic blockade of evoked transmission (lines 404-406).

      (4) Given published work on endophilin's role in activity-dependent endocytic recruitment, adding endophilin (at least in the Drosophila NMJ experiments) would be highly informative.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for these findings compared to previous work on Endophilin [3], which we discuss on lines 407-410:

      “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, that study found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are compatible with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (5) Line 57 might have a typo in the citation.

      We thank the reviewer for pointing this out. The citations now include: Bai et al., 2010; Jiang et al., 2024; Koh et al., 2007; Winther et al., 2013 and Winther et al. 2015. Please note that these two last citations are grouped as Winther et al. 2013, 2015 following our formatting style.

      (6) Line 208 might be missing a citation that justifies parameters.

      In the revision, this information is discussed on lines 222-224, where we cite our prior work describing these data: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023)”.

      Reviewer #2 (Recommendations for the authors):

      (1) Please consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for our findings compared to previous work on Endophilin [3], which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, that study found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are consistent with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (2) Expand the discussion of TeNT's limitations-specifically that it does not block spontaneous fusion or alternative fusion pathways-and consider referencing more stringent tools (e.g., Botulinum toxins or SNARE mutants), even if they weren't used here.

      Following the reviewer’s suggestion, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017)” (520-523).

      (3) We encourage the authors to briefly discuss whether Dynamin might contribute to periactive zone structure beyond its role in membrane fission. Loss-of-function data could be particularly informative in future work.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Clarify the interpretation of increased endocytic protein levels upon chronic silencing - are these interpreted as homeostatic responses or experimental variability?

      We suggest that these changes might include homeostatic adaptations. We note that this increase is of the same magnitude as the increase in active zone proteins following a similar pharmacological manipulation on lines 405-406, where we state that “a mechanism for this effect might be a homeostatic response (Wen and Turrigiano, 2024) similar in magnitude to the increase in active zone protein levels following activity blockade (Held et al., 2020).”

      (5) The Discussion could be strengthened by sketching out more concrete experimental approaches to test candidate mechanisms (e.g., roles for actin, lipids, adhesion molecules) in organizing periactive zones.

      The potential roles of the cell adhesion molecules (lines 430-440), cytoskeleton and lipids (442-452) are addressed in the discussion. Furthermore, following the reviewer’s suggestion, we have added the following statement (lines 541-543): “This work builds a foundation to assess alternative mechanisms and models of periactive zone assembly, including roles of the cytoskeleton, lipids, adhesion molecules, and intrinsic endocytic protein interactions”. We hope that the reviewer agrees that the discussion of our paper is not the right format to provide a concrete experimental plan for future work. In our view, the discussion should put the findings of our experiments in the context of the field.

      Reviewer #3 (Recommendations for the authors):

      (1) At a spine synapse, the endocytic zone is estimated to be between 100-200 nm from the active zone. The focus of the authors' analysis is largely outside of this region (0-150 nm), raising the question of whether the area studied may be outside of the area affected by the manipulations made. While STED systems claim ~80 nm resolution, this is rarely achieved in practice, and the authors do not report the effective resolution of their system. Reporting the resolution achieved would address this issue. In addition, super-resolution imaging does not appear to have been used at the Drosophila NMJ. The authors should clarify whether resolution limitations influenced the choice of analysis region and whether their imaging approach is sufficient to detect changes in the endocytic zone.

      We believe that it is unlikely that the relevant signals were missed. First, in mouse synapses, most signal corresponding to endocytic proteins was detected inside the selected region of interest. Our rationale to select the area was based on the fact that expanding the region analyzed would have reduced the sensitivity of our approach, as averaging over a larger area would dilute the signal. The resolution of our microscopy should not be a limitation either. In our previous work, we demonstrated that STED microscopy allows discriminating between the N- and C-termini of the presynaptic scaffold Bassoon, which are positioned only a few tens of nanometers apart [4]. This establishes that we can resolve differences at tens of nanometers in a biological context, which is more relevant than the resolution measured with fluorescent beads (which we have repeatedly assessed to be ~80 nm laterally). Subsequently, we have also been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart [4-12]. Given that the periactive zone spans a larger area than the distances that we can resolve experimentally in the examples above, we are confident that our measurements are sensitive enough to detect changes in this area.

      Second, for Drosophila NMJs, the choice of the region of interest and the overall analysis followed a workflow validated in our previous work [13]. This method analyzes both immediately adjacent and more distant regions from the active zone, and does not exclude any region based on distance from the active zone, as described on lines 222-224: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023).” In our previous study, we analyzed the distribution of periactive zone proteins at rest with STED microscopy and with Airyscan confocal microscopy. The resolution provided by Airyscan is reported to be ~175 nm in XY and ~400 nm in Z, which is sufficient to assess localization to the periactive zone compartment and is not inferior to imaging methods previously used to report changes in the distribution of endocytic proteins; for examples, see [1,2]. In the revised manuscript, we have added new data measuring the levels and distribution of EndoA and Dap160 using STED microscopy (Figure 3 – figure supplement 1). The results acquired with STED microscopy and with Airyscan confocal microscopy are consistent with one another.

      Overall, the accuracy of the imaging methods and analyses used in this study is sufficient to assess periactive zone structure given its size and organization.

      (2) Interestingly, in a number of cases, the authors observe significant differences in endocytic markers (Figure 1q, 4k, 6k, 6r). However, little is made of these differences. The authors should provide more discussion of these changes and how they make sense of them alongside their claims of a lack of effect from their manipulations.

      The reviewer raises a good point. We interpret these changes in two different ways. First, we suggest that changes observed in response to block of action potentials or disassembly of the active zone might be homeostatic. This is addressed on lines 135-137. Second, we discuss that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus. Several active zone proteins interact with the actin cytoskeleton. One of them is Liprin-α. This interaction may explain the decrease in the level of Amphiphysin and AP-180 at the periactive zone in Liprin-α null neurons. This is addressed on lines 444-449. We hope that the reviewer agrees that overall, we should focus on the main conclusion that deployment of endocytic proteins persists over a number of manipulations and synapse types.

      (3) The graphs in Figure 1c and 1g, 3g, 4c, 4e, 6c, and 6g do not appear to be identical. If the solid line represents the mean and the lighter color represents the distribution of these data, these data appear to be different from one another. It is surprising that these differences are not significant. What statistical tests were used to determine whether the differences in these graphs are not significant? Is the issue that a relatively low number of synapses were examined (30-60)? Did the authors conduct a power analysis?

      We apologize if the display of our data and analyses was not clear. We do not perform statistical analyses on the line profiles themselves. Instead, we perform them on two values that are extracted from the line profiles. These values are (1) the distance between the peak intensity values of the protein of interest and the marker and (2) the peak intensity values. For example, in Figure 1, distances are quantified and statistically analyzed in panel j, and the peak levels are quantified and statistically analyzed in panel k. We have clarified this in the legend of current Figures 1, 4, 5, and 7.
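
      As a minimal sketch of the quantification described above (an illustration only, not the analysis code used in the study), the two values could be extracted from a pair of line profiles roughly as follows. The function name `profile_metrics`, the use of NumPy, and the synthetic Gaussian profiles are assumptions made for this example.

```python
import numpy as np


def profile_metrics(positions_nm, protein_profile, marker_profile):
    """Return (peak-to-peak distance in nm, peak intensity of the protein).

    Hypothetical helper: both profiles are assumed to be sampled at the
    same positions along a line drawn perpendicular to the active zone.
    """
    positions_nm = np.asarray(positions_nm, dtype=float)
    protein = np.asarray(protein_profile, dtype=float)
    marker = np.asarray(marker_profile, dtype=float)

    # Index of the maximum of each profile
    protein_peak_idx = int(np.argmax(protein))
    marker_peak_idx = int(np.argmax(marker))

    # (1) distance between the two peaks, (2) peak level of the protein
    peak_distance_nm = positions_nm[protein_peak_idx] - positions_nm[marker_peak_idx]
    peak_intensity = protein[protein_peak_idx]
    return float(peak_distance_nm), float(peak_intensity)


# Synthetic example: marker peaks at 0 nm, protein of interest at 100 nm
positions = np.arange(-200, 401, 10)
marker = np.exp(-0.5 * (positions / 40.0) ** 2)
protein = 2.0 * np.exp(-0.5 * ((positions - 100) / 60.0) ** 2)

distance_nm, peak = profile_metrics(positions, protein, marker)
```

      In such a scheme, each synapse contributes one distance and one peak value, and the statistical comparison between conditions is run on those per-synapse values rather than on the averaged curves.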

      (4) The authors clearly state that their experiments address the role of evoked activity in endocytic zone positioning, but they do not examine whether spontaneous vesicle fusion might play a role. Given the availability of Drosophila mutants that decrease (Doc2, Dunc-13) or increase (syt1) spontaneous release, this is a notable omission. Ideally, these mutants should be examined. And at a minimum, the authors should discuss whether spontaneous release could contribute to endocytic zone organization.

      We agree with the reviewer that spontaneous fusion of synaptic vesicles may contribute to periactive zone organization. Many of the genetic manipulations that we used in mouse neurons result in a significant decrease in spontaneous release. This includes Ca<sub>V</sub>2 triple knockouts with a ~60% decrease in spontaneous fusion [10], RIM+ELKS quadruple knockouts with a ~70% decrease in spontaneous fusion [9] and Liprin-α quadruple knockouts with a ~50% decrease in spontaneous fusion [7]. We cannot rule out that the spontaneous release that is left is sufficient to mediate assembly functions. The conclusive way to address this possibility is using a manipulation that ablates spontaneous release without altering other pathways. However, to our knowledge, this is not available. The manipulations suggested by the reviewer might suffer from similar limitations, as they would change the frequency of spontaneous release without fully ablating it, and they would also affect evoked release. We have included a limitations section in the discussion where we address this (lines 514-523), specifically stating “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited. While many of the manipulations used here, including Ca<sub>V</sub>2 knockout (Held et al., 2020), RIM+ELKS knockout (Tan et al., 2022; Wang et al., 2016) and Liprin-α knockout (Emperador-Melero et al., 2024) in hippocampal neurons, and TeNT expression in fly NMJs (Sweeney et al., 1995), result in 50% to 70% decreased spontaneous release rates, it is possible that the remaining spontaneous release supports periactive zone assembly. Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” We hope that the reviewer agrees that assessing these mutants should be a topic of future studies, given that we already test many mutants in the paper.

      (5) In Figures 1 and 6, the authors assess presynaptic protein localization in cultured neurons, but it is unclear whether these are synaptic sites. Many presynaptic proteins traffic together and can accumulate at sites lacking postsynaptic specializations. The authors should validate that the observed spatial organization occurs at bona fide synapses, ideally by co-labeling with postsynaptic markers as done in Figure 4. If methods like these were used, providing more details on how synapses were identified and selected would be useful to the reader.

      While we understand the reviewer’s point, we are confident that the structures analyzed are bona fide synapses for three reasons, as we have established before across many papers [4-8,10-12,17].

      The diameter of the structures detected using the synaptic vesicle marker Synaptophysin aligns much more closely with the size of the large vesicle clusters found at presynaptic terminals than with that of a few transport vesicles.

      In side-view synapses, the bar-like distribution of the active zone marker (Bassoon or Munc13-1) at one edge of the vesicle cloud indicates that active zone proteins are organized at one edge of the vesicle cluster—consistent with the architecture of synapses.

      Synaptophysin is one of our key markers for detecting synapses. In our cultures, most of the Synaptophysin signal colocalizes with postsynaptic markers (either PSD-95 or Gephyrin), as we have established across many studies [4,7-12]. This indicates that the markers used here are sufficient to select synapses. Furthermore, the frequency at which synapses were identified using an active zone marker as the second marker was similar to that observed when using a postsynaptic marker, suggesting that we were not randomly including unrelated structures.

      (6) Many of the images, particularly of the Drosophila NMJ, are of low quality and are shown in very small images. In addition, the quality of the images throughout the paper makes it difficult to assess the authors' analysis and results. The authors should provide larger, higher-quality images that show examples of the means for each of the examples shown. This is an issue for most of the figures, but is particularly prominent in the dNMJ. A minor additional point is that the authors should be clear whether the dNMJ images are collected at super-resolution or using a conventional microscope.

      We believe that the quality of our images is sufficient for the assessments made for the following reasons:

      These images were acquired with enough spatial resolution to assess levels at the PAZ as discussed in response to this reviewer’s first comment. In our previous work, we used images acquired at the same resolution and presented in the same manner for both mouse hippocampal synapses [6,7] and Drosophila NMJs [13,18]. In those previous studies, we drew conclusions at a similar level of detail as in the current study.

      In our view, our representative images are not inferior in quality to other papers in the field addressing similar questions [1,2,19,20].

      We have selected sample images based on the quantified mean values per condition. Hence, we strived to select panels that are objectively representative regarding the quantified parameters.

      We have specified microscopy methods in the figure legends. Specifically, for Drosophila NMJs, we used Airyscan confocal microscopy and STED microscopy. For each experiment, it is now stated which microscopy method was used in the corresponding legend.

      References:

      (1) Winther, Å. M. E. et al. An Endocytic Scaffolding Protein together with Synapsin Regulates Synaptic Vesicle Clustering in the Drosophila Neuromuscular Junction. J Neurosci 35, 14756–14770 (2015).

      (2) Winther, Å. M. E. et al. The dynamin-binding domains of Dap160/intersectin affect bulk membrane retrieval in synapses. J Cell Sci 126, 1021–1031 (2013).

      (3) Bai, J., Hu, Z., Dittman, J. S., Pym, E. C. G. & Kaplan, J. M. Endophilin functions as a membrane-bending molecule and is delivered to endocytic zones by exocytosis. Cell 143, 430–441 (2010).

      (4) Wong, M. Y. et al. Liprin-alpha3 controls vesicle docking and exocytosis at the active zone of hippocampal synapses. Proc Natl Acad Sci U S A 115, 2234–2239 (2018).

      (5) Emperador-Melero, J., de Nola, G. & Kaeser, P. S. Intact synapse structure and function after combined knockout of PTPδ, PTPσ, and LAR. Elife 10, (2021).

      (6) Emperador-Melero, J. et al. PKC-phosphorylation of Liprin-α3 triggers phase separation and controls presynaptic active zone structure. Nat Commun 12, 3057 (2021).

      (7) Emperador-Melero, J. et al. Distinct active zone protein machineries mediate Ca2+ channel clustering and vesicle priming at hippocampal synapses. Nat Neurosci 1–15 (2024). doi:10.1038/s41593-024-01720-5.

      (8) Tan, C., Wang, S. S. H., de Nola, G. & Kaeser, P. S. Rebuilding essential active zone functions within a synapse. Neuron 110, 1498-1515.e8 (2022).

      (9) Wang, S. S. H. et al. Fusion Competent Synaptic Vesicles Persist upon Active Zone Disruption and Loss of Vesicle Docking. Neuron 91, 777–791 (2016).

      (10) Held, R. G. et al. Synapse and Active Zone Assembly in the Absence of Presynaptic Ca(2+) Channels and Ca(2+) Entry. Neuron 107, 667-683.e9 (2020).

      (11) Chin, M. & Kaeser, P. S. The intracellular C-terminus confers compartment-specific targeting of voltage-gated calcium channels. Cell Rep 43, 114428 (2024).

      (12) Nyitrai, H., Wang, S. S. H. & Kaeser, P. S. ELKS1 Captures Rab6-Marked Vesicular Cargo in Presynaptic Nerve Terminals. Cell Rep 31, 107712 (2020).

      (13) Del Signore, S. J., Mitzner, M. G., Silveira, A. M., Fai, T. G. & Rodal, A. A. An approach for quantitative mapping of synaptic periactive zone architecture and organization. Mol Biol Cell 34, (2023).

      (14) Sweeney, S. T., Broadie, K., Keane, J., Niemann, H. & O’Kane, C. J. Targeted expression of tetanus toxin light chain in Drosophila specifically eliminates synaptic transmission and causes behavioral defects. Neuron 14, 341–351 (1995).

      (15) Kaeser, P. S. & Regehr, W. G. Molecular mechanisms for synchronous, asynchronous, and spontaneous neurotransmitter release. Annu Rev Physiol 76, 333–363 (2014).

      (16) Santos, T. C., Wierda, K., Broeke, J. H., Toonen, R. F. & Verhage, M. Early Golgi Abnormalities and Neurodegeneration upon Loss of Presynaptic Proteins Munc18-1, Syntaxin-1, or SNAP-25. Journal of Neuroscience 37, 4525–4539 (2017).

      (17) de Jong, A. P. H. et al. RIM C2B Domains Target Presynaptic Active Zone Functions to PIP2-Containing Membranes. Neuron 98, 335-349.e7 (2018).

      (18) Del Signore, S. J. et al. An autoinhibitory clamp of actin assembly constrains and directs synaptic endocytosis. Elife 10, (2021).

      (19) Imoto, Y. et al. Dynamin 1xA interacts with Endophilin A1 via its spliced long C-terminus for ultrafast endocytosis. EMBO Journal https://doi.org/10.1038/S44318-024-00145-X

      (20) Imoto, Y. et al. Dynamin is primed at endocytic sites for ultrafast endocytosis. Neuron 110, 2815-2835.e13 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript entitled "Terminal tracheal cells of Drosophila are immune privileged to maintain their Foxo-dependent structural plasticity", Bossen and colleagues determine that the terminal cells of the tracheal system differ from other larval tracheal cells in that they do not typically show an Imd-dependent immune response to fungal and viral infections. The authors reach this conclusion based on the expression of a reporter line, Drs-GFP. The authors speculate that this difference may reflect differential expression of an immune pathway component, as tracheal terminal cells (TTCs) do not respond to forced expression of PGRP-LE. The authors then go on to show that, unlike the other cells of the tracheal system, terminal cells do not express PGRP-LC as reported by a GAL4 enhancer trap. Forced expression of PGRP-LC in terminal cells resulted in reduced branching, cell damage, and features of the cell death program. These effects could be suppressed by the depletion of AP-1 or Foxo transcription factors. The authors show that Foxo plays a negative role in the branching of TTCs, with ectopic branching occurring upon RNAi (or under hypoxic conditions). The authors speculate that the immune privilege of the TTCs may have evolved to permit Foxo regulation of TTC branching.

      Strengths:

      The authors provide compelling genetic data.

      Weaknesses:

      (1) The authors state that after infection 34% of larvae were not GFP+ as defined by the detection of Drs-GFP in dorsal branches. The authors should clarify if these larvae are completely without response to infection, with no Drs-GFP in dorsal trunks and/or other tracheal branches. If these larvae are entirely unresponsive, could the authors indicate why this might be? Also, at this point in the manuscript, the authors are somewhat misleading regarding TTC expression of Drs-GFP - they should state at this point that there are some TTCs that do express Drs-GFP, and also should address their prior study of Drs-GFP induction which does not claim exclusion of TTC Drs-GFP expression.

      GFP– indicates the absence of detectable fluorescence in regions proximal to the TTCs (dorsal branch and fusion cells). Our analysis specifically focused on these regions and did not assess fluorescence in other parts of the tracheal system. Therefore, the reported 34% of larvae classified as GFP– does not imply a complete absence of response in these animals; rather, no fluorescence was detected within our defined region of interest. To clarify how fluorescence in TTCs was quantified, we have added a schematic (new Fig. 1F). In addition, new Fig. S1 illustrates that AMP reporter activation frequently occurs in other tissues.

      Our observations are consistent with earlier reports. In the original description of the AMP reporter lines, Tzou et al. (2000; https://doi.org/10.1016/S1074-7613(00)00072-8) reported that “only a fraction of the flies or larvae exhibited fluorescence in surface epithelia, and the proportion of GFP-expressing animals was variable from one culture vial to the next. In addition, fluorescence was rarely distributed throughout the whole tissue and was limited to restricted areas of the epithelium,” suggesting that AMP reporter activation can occur locally rather than uniformly across tissues.

      In a previous study (https://doi.org/10.1186/1471-2164-9-446), we reported that airway epithelial cells, including the finest tracheal endings on target organs, can activate drosomycin transcription following infection. However, that study focused specifically on infected larvae. Importantly, it did not quantify the frequency of reporter activation or analyze TTC-specific phenotypes. As such, those statements should not be interpreted as implying uniform or ubiquitous reporter activation across all tracheal cells.

      (2) The authors describe the terminal cell phenotype as "shrunken" but this implies loss of size or pruning, however, it is not clear whether the defects could equally be due to lack of growth or slower growth.

      We have removed the term “shrunken” from the present manuscript to avoid potential misinterpretation.

      (3) Figure 1 suggests that GFP+ dorsal branches are not uniform in their expression of Drs-GFP, it seems more patchy. The authors should define the fraction of dorsal branch cells that are Drs-GFP positive. Also, are fusion cells Drs-GFP positive?

      We included a schematic illustrating our quantification approach (new Fig. 1F). We also revised the wording to clarify that GFP<sup>+</sup> animals include fluorescence not only in the dorsal branch (DB) but also in fusion cells (FCs), i.e., structures located between the dorsal trunks and the terminal tracheal cells (TTCs). Any structure in proximity to the TTCs that shows GFP expression was scored as GFP<sup>+</sup>. In most cases, GFP expression was observed in the dorsal fusion cells.

      (4) Drs-GFP expression is largely absent from terminal cells; however, a still significant number of terminal cells show expression (8%). Authors argue that PGRP-LC expression is absent based on a GAL4 transgenic line. If this line reflects endogenous PGRP-LC expression, should there not be 8% positive TTCs? Or is the 8% Drs-GFP expression independent of the IMD receptor?

      We detected PGRP-LE expression in approximately 3% of epithelial tracheal cells that expressed Drs after infection (Fig. 3F,G). This observation suggests that Drs activation can occur through a mechanism independent of PGRP-LCx. We have incorporated this finding into both the Results and Discussion sections.

      (5) Figure 2: the authors state that TTCs are negative even with induced PGRP-LE expression - should there not be at least 8% that are positive?

      We additionally infected larvae overexpressing PGRP-LE and observed Drs-GFP expression in 3% of cases, which was not seen without infection.

      (6) The authors compare PGRP-LC expression to induction of cell death by expression of reaper and hid. Reaper and Hid had stronger effects and eliminated TTCs. See cleavage of caspase Dcp-1 in PGRP-LC expressing cells. Is caspase cleavage always diagnostic of apoptosis or could the weaker than rpr/hid phenotype imply a different function?

      We have included the potential non-apoptotic functions of Dcp-1 in the Discussion. The weaker phenotype observed could therefore be explained by a non-apoptotic role of Dcp-1.

      (7) Drs-GFP expression is said to be "completely" absent from tracheal terminal cells when the entire tracheal system is expressing PGRP-LE.

      We have revised the wording accordingly.

      (8) Figure 5, TRE_RFP expression, is not convincing that it is higher or in terminal cells. https://doi.org/10.7554/eLife.102369.1.sa2

      We have revised the wording in line 230.

      Reviewer #2 (Public review):

      Summary:

      In this study, Bossen et al. looked at the immune status of the tracheal terminal cells (TTCs) in Drosophila larvae. The authors propose that these cells do not show PGRP-LCx expression and, hence, lack immune function. Artificial overexpression of the PGRP-LCx in the TTCs causes these cells to undergo apoptosis.

      Strengths:

      Only a few groups have tried to look at the immune status of the trachea, though we know that AMPs are expressed there after infection. This exciting study attempts to understand the differences in the tracheal cells that do not produce AMPs upon infection.

      Weaknesses:

      The reason why the TTCs have some immune privilege is still not completely clear. Whether the phenotype is cell autonomous or contributes to the cellular immune system is not evaluated. As we know, crystal cells also maintain oxygen levels in larvae; whether in the absence of terminal trachea, the crystal cells have any role is not explored. https://doi.org/10.7554/eLife.102369.1.sa1

      In addition to the Drs-GFP reporter line, we performed new infection experiments using additional antimicrobial peptide reporters to further support our observations. While these experiments confirm the humoral immune response, they do not address the mechanisms underlying the apparent immune privilege. Our analysis therefore focuses specifically on the humoral immune response and does not allow conclusions regarding potential contributions of the cellular immune system, including crystal cells, to maintaining oxygen levels in animals with impaired TTCs. Notably, complete loss of TTCs is lethal, as demonstrated by TTC ablation using hid;rpr expression (Fig. 4F).

      Reviewer #3 (Public review):

      Summary:

      The authors report that tracheal terminal cells (TTCs) in Drosophila do not activate innate immunity following bacterial infection. They attribute this to the lack of expression of PGRP-LCx in these cells. Forced activation of the Imd pathway in TTCs leads to cell death and a reduction in tracheal branching. The authors propose a mechanism for cell death induction via pathways involving JNK, AP-1, and foxo. They suggest that the suppression of innate immunity in TTCs may serve to maintain their plasticity, preparing them for responses to hypoxic conditions.

      Strengths:

      (1) The study addresses the understudied area of immune privilege in innate immunity, providing a potentially important example in Drosophila TTCs.

      (2) The molecular characterization of the cell death pathway induced by forced Imd activation is well-executed and provides solid mechanistic insights.

      (3) The authors draw interesting parallels between Drosophila TTCs and mammalian endothelial cells, suggesting broader implications for their findings.

      Weaknesses:

      (1) The core premise of the study - that TTCs do not activate innate immunity following bacterial infection - relies heavily on a single readout (Drs reporter). Additional markers of immune activation would strengthen this crucial claim.

      We included new experiments using additional antimicrobial peptide reporter genes that show results similar to those obtained with the Drs-GFP reporter (new Fig. 1).

      (2) The evidence for the lack of PGRP-LCx expression in TTCs is based on a single GAL4 reporter line. Given the importance of this observation to the authors' model, validation using alternative methods would be beneficial.

      Although we were not able to include alternative methods to further confirm our hypothesis, we performed additional infection experiments. Upon bacterial infection, we observed a strong increase in GFP fluorescence throughout the animal and in many other tissues, while still detecting no response in the TTCs. These results further support our hypothesis.

      (3) The phenotypes observed upon forced activation of the Imd pathway in TTCs, while intriguing, may be influenced by non-physiological levels of pathway activation. The authors should address this potential caveat and consider examining the effects of more moderate pathway activation. https://doi.org/10.7554/eLife.102369.1.sa0

      We used two independent UAS-PGRP-LCx lines located on different chromosomes. One line (III) produced a stronger phenotype than the other (II). We clarified this point in the Results section (Fig. 4C,D) and added supplementary data (new Fig. S2) showing that both lines produce comparable phenotypes when expressed using an alternative tracheal driver. The epithelial thickening observed follows the same pattern as the phenotype detected in TTCs, indicating that even moderate pathway activation leads to similar effects. However, we acknowledge that this represents ectopic pathway activation and therefore likely reflects a non-physiological level of signaling.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      My particular comments on the figures are as follows:

      (1) In Figure 2, the PGRP-LCx signal should be quantified as done for Drosomycin GFP, as shown in Figure 1.

      We agree and have added a quantification.

      (2) In Figure 2F and G are the larvae infected? If not, what happens to PGRP-LCx expression post Ecc15 infection?

      We also included infected larvae to test whether infection induces GFP expression in TTCs. However, GFP expression was never observed in TTCs, although overall fluorescence increased in other tissues.

      (3) Is the effect of overexpression of LCx exaggerated post-infection? In particular when it comes to the escape phenotype.

      We induced mild Imd pathway activation by expressing PGRP-LE using a tracheal driver active in all tracheal cells, including TTCs, for 24 hours. In addition, these larvae were infected and their sensitivity to hypoxia was assessed. Animals expressing PGRP-LE in the trachea showed increased sensitivity to hypoxia, which was further enhanced following infection.

      (4) Does overexpression of anti-apoptotic genes in TTC and PGRP-LCx rescue the TTC branching?

      This point was not addressed.

      (5) Have the authors tried to rescue the larvae with shallow food?

      This point was not addressed.

      (6) Is there any effect on the circulating hemocytes or lymph glands in the PGRP-LCx overexpressing animals?

      This point was not addressed.

      Reviewer #3 (Recommendations for the authors):

      The authors present an intriguing model of immune privilege in Drosophila tracheal terminal cells (TTCs). This model is built upon three key pillars: (1) the absence of innate immune activation in TTCs, (2) the lack of PGRP-LCx expression in TTCs, and (3) the induction of cell death when innate immunity is activated in TTCs. However, the experimental evidence supporting each of these critical points requires substantial strengthening. The reviewer recommends the following improvements and additional experiments to address these core issues:

      (1) Innate immune activation in TTCs:

      Evaluate the expression of additional antimicrobial peptide reporters to provide a more comprehensive assessment of innate immune activation in TTCs.

      In addition to the Drs-GFP reporter line, we performed new infection experiments using other antimicrobial peptide reporters to confirm our results.

      (2) PGRP-LCx expression in TTCs:

      Validate the PGRP-LCx-GAL4 line used in the study to ensure it accurately reflects endogenous PGRP-LCx expression.

      Employ complementary techniques such as in situ hybridization and antibody staining to corroborate the absence of PGRP-LCx in TTCs.

      We also included infection experiments using PGRP-LCx-Gal4 larvae. Infection did not trigger GFP expression in TTCs. However, the overall PGRP-LCx expression pattern observed in other larval tissues supports the view that the results reflect endogenous PGRP-LCx expression.

      (3) Cell death induction upon immune activation in TTCs:

      Address the possibility that the observed cell death is an artifact of strong, forced Imd pathway activation. To do that,

      perform control experiments activating the Imd pathway in non-TTC tracheal cells to determine if cell death is specific to TTCs.

      Use broader tracheal drivers (e.g., ppk4-GAL4 or btl-GAL4) to activate the Imd pathway and verify if cell death is indeed restricted to TTCs.

      We included results from PGRP-LCx overexpression using the tracheal driver ppk4-Gal4 and stained for the apoptosis marker Dcp-1 (new Fig. S3). We observed increased Dcp-1 signal in dorsal trunk cells, indicating that PGRP-LCx-mediated Dcp-1 cleavage is not restricted to TTCs.

      Ideally, generate a transgenic line expressing physiological levels of PGRP-LCx in TTCs and demonstrate that bacterial infection induces cell death specifically in TTCs through the proposed pathway. The reviewer acknowledges the complexity of this experiment but believes it would significantly strengthen the authors' conclusions.

      We did not generate a new transgenic line but instead used an alternative UAS-PGRP-LCx line (II), which exhibits a milder phenotype. This has now been clarified more prominently in the Results section (Fig. 4C,D). Additionally, we performed further experiments showing an epithelial thickening phenotype whose severity depends on the UAS-PGRP-LCx line used (new Fig. S2).

      In addition to the above major points

      (4) Quantitative data presentation:

      Provide quantitative analyses for the results presented in Figures 2 and 3J-K to allow for a more rigorous evaluation of the data.

      We included a quantitative analysis of the results shown in Fig. 2 (now presented in new Fig. 3). In addition, we added quantification of fluorescence in the TTCs of infected larvae.

      (5) Alternative hypothesis:

      Consider and address an alternative explanation for the lack of innate immune activation in TTCs: the potential gradient of bacterial ligands from proximal trachea to distal TTCs. If this hypothesis is correct, one might expect to see a gradient of Drs expression correlating with the distance from the proximal trachea. Addressing this possibility would strengthen the authors' proposed model.

      We have now included the following paragraph in the Discussion section.

      “An alternative explanation for the observed lack of an immune response in TTCs could be their maximal distance from the spiracles. In this scenario, a gradient of bacterial inducers along the tracheal system might be expected, resulting in a gradual decrease in immune activation from the spiracles toward the TTCs. However, this is not what we observed. In tracheae that displayed an immune response, the response was largely homogeneous along the entire length of the tracheal system, from the spiracles to the TTCs. Only at the transition to the TTCs did the immune response drop abruptly. This observation argues against the gradient hypothesis and suggests that TTCs are specifically excluded from the immune response.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      (1a) The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug with PpsB. However, it is unclear why only the leucine uptake pathway was affected.

      We observe that semapimod treatment interferes with the uptake of L-leucine, but not of pantothenate, in the mc<sup>2</sup>6206 strain. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the underlying mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      (1b) We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      Using BLI-Octet, we did not detect any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      (1c) Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture, making it restrictive to L-leucine. However, the exact mechanism is not yet clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and alterations in PDIM by mass spectrometry.

      (1d) Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      Based on the published report by Mulholland et al. and on the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.

      (2) The authors show an interesting result where they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. What do the authors think is the basis of this observation? It is possible semapimod has an immunomodulatory effect on the host since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      (3) The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by including validations using PDIM mutants, including del-ppsB Mtb and other genes of the PDIM locus, whether in vivo this mutant would be more susceptible (or resistant) to semapimod treatment.

      PDIM is a virulence factor, and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      (4) Prolonged subculturing can introduce mutations in PDIM, which can be overcome by supplementing with propionate (Mulholland et al, Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would result in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Because extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial screening of drugs, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc<sup>2</sup>6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review): 

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy. 

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      (1) The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While Leucine uptake involves specific transporters in other bacteria, such transport system is not known in Mtb. By screening small molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc2 6206), but not the WT Mtb. To understand the underlying mechanism of differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoration of leuCD and panCD expression on susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of leuCD genes, mc2 6206 strain becomes resistant to killing by semapimod. In contrast, no effect of panCD expression was observed on semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod treated mc2 6206, which exhibits differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall results thus provide first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insight into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which carries point mutations in four genes including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      (2) Also, the fact that the drug does not function on WT bacteria makes it a weak candidate for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibitory effect on the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which helps the pathogen establish chronic infection. However, a significant reduction in bacterial loads in the lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review): 

      (1) Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine. 

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM that make it restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      (2) Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the level of other amino acids including BCAA by mass spectrometry.

      (3) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We will explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

      Recommendations for the authors:

      (1A) Intracellular leucine can decrease from:

      inhibition of transport/uptake via semapimod as the authors claim or

      decreased uptake/requirement of many metabolites due to cells entering static growth arrest from challenge by semapimod

      To rule out the growth-inhibitory effect of semapimod on L-leucine uptake, we estimated intracellular L-leucine in Mtb after a brief exposure of 24 hours to 50 ng/ml semapimod (please refer to the Materials and Methods). We confirmed that 24 hours of treatment with 50 ng/ml semapimod does not cause cells to enter static growth arrest.

      (1B) increased consumption/utilization of leucine for some programmed response to semapimod challenge

      Our results show reduced expression of genes involved in leucine catabolism such as accD1, bkdA and bkdB in semapimod-treated cells, and thus the above hypothesis seems unlikely.

      (1C) Additional metabolites should be measured to determine the specificity of the semapimod challenge.

      As mentioned below, we measured intracellular valine in the semapimod-treated Mtb 6206 by LC-MS/MS, which shows no change in its level. These observations thus corroborate a specific effect of semapimod on L-leucine level in the cell.

      (2) The effect of Semapimod on L-leucine uptake is largely based on indirect evidence, without showing reduced transport of the amino acid. Gene expression data is not enough to prove that the amino acid transport is blocked. More compelling evidence is required to confirm this mechanism.

      The authors could perform leucine uptake assays to directly confirm the functioning of Semapimod, inhibiting L-leucine transport. Another possibility would be to try out measuring intra-bacterial leucine levels for drug-treated versus untreated M. tuberculosis strains.

      Data presented in Fig. 3b show lower intracellular L-leucine upon semapimod treatment; in contrast, the Sem<sup>R</sup> strain exhibits ~3-fold more intracellular L-leucine, as estimated by mass spectrometry (please refer to our response to comment #6 below). Together, these observations indicate an inhibitory effect of semapimod on L-leucine uptake by the auxotroph.

      (3) The authors show that the overexpression of leuC-leuD restores Semapimod resistance in the auxotroph (Figs. 3C-3E). Is it possible to examine Semapimod resistance of WT-H37Rv or the complemented mutant grown in leucine-limiting conditions? This sort of evidence will be more direct on the specific drug-target beyond the auxotroph (mc<sup>2</sup> 6206).

      Because the endogenous L-leucine synthesis pathway is functional in WT-H37Rv, as well as in the complemented auxotrophic strain, leucine-limiting conditions are not expected to have any effect on susceptibility to semapimod.

      Author response image 1.

      (4) Biolayer Interferometry (BLI) shows Semapimod binds to PpsB (Fig. 6); however, there is no clear evidence that it disrupts PDIM synthesis. More direct evidence would be to study the effect of Semapimod on a ppsB mutant (may be a knock-down). This would prove the specificity of Semapimod for PpsB. Likewise, it would be worth looking into the effect of Semapimod using mutant M. tuberculosis defective for PDIM synthesis.

      As recommended by the peer reviewer, we created a ppsB knockdown (KD) strain in Mtb mc2 6206 by CRISPRi and examined its vulnerability to semapimod treatment. As can be seen in Author response image 1, the ppsB KD strain shows lower susceptibility to semapimod than the pDcas9-control strain, which exhibits significant growth inhibition on the 7H11-OADS-PL agar plate containing 200 nM semapimod.

      (5) Metabolomics experiments would benefit from including other control BCAAs like isoleucine and valine to determine if decreased intracellular levels of leucine are specific to semapimod or a general consequence of growth arrest from an antimicrobial agent.

      As suggested by the reviewer, we measured intracellular valine as well as proline levels in the semapimod-treated Mtb 6206 by LC-MS/MS; data presented in Supplementary Figure 5 clearly show no change in their levels upon semapimod treatment.

      (5) Figure 3c, pyrazinamide susceptibility assay could be included on the panCD strain to ensure complementation leads to functional panCD. Parent strain would be resistant to PZA, complement strain would be susceptible. (doi: 10.1038/s41467-019-14238-3).

      The wild-type Mtb 6206 is unable to grow in the absence of pantothenate. We verified resumption of growth of Mtb 6206 in 7H9-OADS-L-leucine medium lacking pantothenate upon PanCD overexpression, which provides more direct evidence of the expression of functional copies of panCD genes.

      (6) Does the Sem-R mutant have increased levels of leucine?

      As can be seen in Supplementary Figure 7, the Sem<sup>R</sup> strain shows a ~3.0-fold increase in the intracellular L-leucine level when compared with the WT strain. In contrast, a comparable level of another BCAA, valine, is observed in both strains.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to elucidate the role of RNA as a context-dependent modulator of liquid-liquid phase separation (LLPS), aggregation, and bioactivity of the amyloidogenic peptides PSMα3 and LL-37, motivated by their structural and functional similarities.

      Strengths:

      The authors combine extensive biophysical characterization with cell-based assays to investigate how RNA differentially regulates peptide aggregation states and associated cytotoxic and antimicrobial functions.

      Weaknesses:

      While the study addresses an interesting and timely question with potentially broad implications for host-pathogen interactions and amyloid biology, several aspects of the experimental design and data analysis require further clarification and strengthening.

      Major Comments:

      (1) In Figure 1A, the author showed "stronger binding affinity" based on shifts at lower peptide concentrations, but no quantitative binding parameters (e.g., apparent Kd, fraction bound, or densitometric analysis) are presented. This claim would be better supported by including: (i) A binding curve with quantification of free vs bound RNA band intensities (ii) Replicates and error estimates (mean ± SD).

      We thank the reviewer for this suggestion. To quantitatively support the binding differences observed in Figure 1A, we have now performed densitometric analysis of the EMSA data and included the results in Figure S1. The analysis showed that the Kd values for PSMα3 binding to polyAU and polyA RNA are of the same order of magnitude, but the Kd is lower for polyAU, indicating stronger binding. A description was added to the results in lines 137-145 of the revised version.
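
      For readers unfamiliar with this type of quantification, the following is a minimal sketch of how band intensities can be converted into a fraction bound and an apparent Kd. It is an illustration only, not the procedure used for Figure S1: it assumes a single-site binding model with peptide in excess over RNA, and the function names, grid search, and any numbers passed to them are hypothetical.

```python
def fraction_bound(free_intensity, bound_intensity):
    """Fraction of RNA in the shifted (bound) band of one EMSA lane."""
    total = free_intensity + bound_intensity
    if total <= 0:
        raise ValueError("lane has no detectable RNA signal")
    return bound_intensity / total


def fit_apparent_kd(concentrations, fractions, kd_grid):
    """Least-squares grid search for Kd in theta = c / (Kd + c).

    Assumes single-site binding with peptide in excess over RNA
    (an assumption of this sketch, not a statement about the study).
    """
    best_kd, best_sse = None, float("inf")
    for kd in kd_grid:
        sse = sum((f - c / (kd + c)) ** 2
                  for c, f in zip(concentrations, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd
```

      Comparing the fitted values for two RNAs on the same grid gives the kind of ranking described above, with the lower apparent Kd corresponding to the stronger binder.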

      (2) The authors report droplet formation at low RNA (50 ng/µL) but protein aggregation at high RNA (400 ng/µL) through fluorescence microscopy. However, no intermediate RNA concentrations (e.g., 100-300 ng/µL) are tested or discussed, leaving a critical gap in understanding the full phase diagram and transition mechanisms.

      Our initial choice of 50 ng/µL (low RNA) and 400 ng/µL (high RNA) was guided by a broader RNA titration performed by turbidity measurements across 0, 10, 20, 50, 100, 200, and 400 ng/µL (Figure S2 in the revised version). In this screen, turbidity increased up to 50 ng/µL and then decreased dose-dependently from 100–400 ng/µL. We interpret this non-monotonic behavior as consistent with a transition from a droplet-rich regime (maximal light scattering at intermediate dense-phase volume) toward conditions where assemblies become larger and/or more compact and sediment out of the optical path. This is described in lines 158-161 of the revised version.

      Of note, additional intermediate RNA conditions (100 and 200 ng/µL) are included in Figure S14 (of the revised version). While these experiments were performed under the heat-shock perturbation, they nevertheless support the central point that RNA tunes assembly state across intermediate concentrations rather than producing a binary low/high outcome.

      Importantly, we agree with the reviewer that a full phase diagram would be the most rigorous way to define the transition mechanism. However, establishing csat and constructing a complete phase diagram would require systematic measurements of dilute-phase concentrations (e.g., centrifugation/quantification or fluorescence calibration), controlled ionic strength titrations, and time-resolved mapping, which is beyond the scope of the present study. We have therefore revised the text to avoid implying that we provide a complete phase diagram. Instead, we frame our results as a qualitative, multi-assay characterization showing that RNA concentration drives a shift from liquid-like condensates (at low RNA) toward solid-like assemblies (at high RNA), with an intermediate regime suggested by the turbidity transition and supported by additional imaging under stress. Finally, to address the “critical gap” concern directly, we added a sentence (lines 239-241) stating that: “Future work will be required to quantitatively define the phase boundaries and delineate the dominant mechanisms, such as sedimentation, dissolution, or coarsening/aging, across intermediate RNA concentrations”.

      (3) Additionally, the behaviour of PSMα3 in the absence of RNA under LLPS conditions is not shown. Without protein-only data, it is difficult to assess if droplets are RNA-induced or if protein has a weak baseline LLPS that RNA tunes. The saturation concentration (csat) for PSMα3 phase separation, either in the absence or presence of RNA, should be reported.

      In response to the reviewer’s request, we have added Figure 2F, which shows PSMα3 alone in the absence of RNA under the same conditions. PSMα3 does not form droplets in this condition, indicating that condensate formation is RNA-dependent in the tested conditions. This is referred to in the text in lines 190-193 of the revised version. Please see our response about determining the csat in the response to the previous comment.

      (4) For a convincing LLPS claim, it is important to show: Quantitative FRAP curves (mobile fraction and half-time of recovery) rather than only microscopy images and qualitative statements.

      We have included quantitative FRAP analysis in Figure S4 of the revised version, showing normalized recovery curves along with extracted mobile fractions and half-times of recovery (t₁/₂). These quantitative measurements support the dynamic nature of the PSMα3–RNA assemblies. This is referred to in the text in lines 179-184 of the revised version.
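
      As a point of reference, the two quantities requested (mobile fraction and half-time of recovery) can be read off a normalized recovery trace as sketched below. This is a generic, model-free illustration and not necessarily the analysis used for Figure S4: the function name, the plateau estimate from the last 20% of frames, and the linear interpolation are all assumptions of this sketch.

```python
from statistics import mean


def frap_metrics(times, intensities, pre_bleach, tail_fraction=0.2):
    """Return (mobile_fraction, half_time) from a normalized FRAP trace.

    times[0] / intensities[0] are taken as the first post-bleach frame.
    The plateau is estimated as the mean of the last `tail_fraction`
    of the trace (an assumption of this sketch).
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need matching time and intensity series")
    f0 = intensities[0]
    if pre_bleach <= f0:
        raise ValueError("pre-bleach intensity must exceed post-bleach intensity")
    n_tail = max(1, int(len(intensities) * tail_fraction))
    f_inf = mean(intensities[-n_tail:])
    mobile_fraction = (f_inf - f0) / (pre_bleach - f0)
    if f_inf <= f0:
        return mobile_fraction, None  # no measurable recovery
    half_level = f0 + 0.5 * (f_inf - f0)
    # Find the first frame at or above the half-recovery level and
    # interpolate linearly between it and the preceding frame.
    for i in range(1, len(times)):
        if intensities[i] >= half_level:
            frac = (half_level - intensities[i - 1]) / (intensities[i] - intensities[i - 1])
            t_half = times[i - 1] + frac * (times[i] - times[i - 1])
            return mobile_fraction, t_half - times[0]
    return mobile_fraction, None
```

      A liquid-like assembly would be expected to give a high mobile fraction and a short half-time, whereas a solid-like assembly would give a mobile fraction near zero.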

      (5) The manuscript highly relies on fluorescence microscopy to show colocalization. However, the colocalization is presented in a qualitative manner only. The manuscript would benefit from the inclusion of quantitative metrics (e.g., Pearson's correlation coefficient, Manders' overlap coefficients, or intensity correlation analysis).

      In response, we have added quantitative colocalization analysis to the revised manuscript. Specifically, we now report Pearson’s correlation coefficients and Manders’ overlap coefficients for the dual-channel fluorescence microscopy datasets in Figure S5 of the revised version. These metrics provide an objective measure of co-distribution and complement the qualitative imaging.

      The analysis supports that at low RNA concentrations (droplet/condensate conditions), PSMα3 and RNA show strong colocalization, consistent with RNA being incorporated within, or closely associated with, the peptide-rich phase. In contrast, at high RNA concentrations, where the assemblies are more solid-like/amyloid-positive, the quantitative coefficients decrease, consistent with reduced overlap and an apparent spatial demixing in which RNA becomes partially excluded from the peptide-rich structures. This is referred to in the text in lines 194-203 of the revised version.
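
      For clarity, the two colocalization metrics named above follow their standard definitions, sketched here on flattened pixel lists from the two channels. This is an illustrative sketch only; the function names and the zero default thresholds are assumptions and do not describe the software or settings used for Figure S5.

```python
import math


def pearson(ch1, ch2):
    """Pearson's correlation coefficient between two channels (flattened pixels)."""
    n = len(ch1)
    if n != len(ch2) or n < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    m1, m2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    var1 = sum((a - m1) ** 2 for a in ch1)
    var2 = sum((b - m2) ** 2 for b in ch2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var1 * var2)


def manders(ch1, ch2, thr1=0.0, thr2=0.0):
    """Manders' coefficients (M1, M2).

    M1: fraction of channel-1 intensity in pixels where channel 2 exceeds thr2.
    M2: fraction of channel-2 intensity in pixels where channel 1 exceeds thr1.
    Thresholds default to zero here, which is an assumption of this sketch.
    """
    total1, total2 = sum(ch1), sum(ch2)
    if total1 <= 0 or total2 <= 0:
        raise ValueError("each channel needs non-zero total intensity")
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / total1
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / total2
    return m1, m2
```

      In practice the thresholds are usually set above background, since Manders' coefficients are sensitive to that choice.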

      (6) In Figures 3 B and 3C, the contrast between "no AT630 at 30 min, strong at 2 h" (50 ng/μL) and "strong at 30 min" (400 ng/μL) is compelling, but a simple quantification (e.g., mean fluorescence intensity per area) would greatly increase rigor.

      We have included quantitative analysis of AmyTracker630 fluorescence intensity in Figure S6 of the revised version, reporting the mean fluorescence intensity per area for the indicated conditions and time points. This quantification supports the qualitative differences observed in Figures 3B and 3C. This is now referred to in the text in lines 233-236 of the revised version.

      (7) In Figure S3 ssCD data, if possible, indicate whether the α-helical signal increases with RNA concentration or shows a non-linear dependence, which might link to the LLPS vs solid aggregate regimes.

      The ssCD spectra displayed in Figure S7 in the revised version (corresponding to Figure S3 in the original submission) show that the α-helical signature of PSMα3 is markedly enhanced in the presence of RNA compared to peptide alone, as evidenced by increased signal intensity, deeper minima, and more pronounced spectral features characteristic of α-helical structure. Importantly, this enhancement is more pronounced at 400 ng/µL Poly(AU) RNA than at 50 ng/µL, particularly after 2 hours of coincubation, indicating that RNA concentration influences the stabilization of α-helical assemblies. This is now more specifically detailed in the text in lines 258-263 of the revised version.

      We note that solid-state CD does not allow direct quantitative deconvolution of secondary structure content (e.g., % helix) in the same manner as solution CD, due to sample anisotropy, scattering, and orientation effects inherent to dried or aggregated films. Consequently, our interpretation is qualitative rather than strictly quantitative. The ssCD data therefore suggest a non-linear dependence on RNA concentration, rather than a simple linear dose–response. This is also expected considering that phase transition, suggested by the other findings, is intrinsically non-linear.

      (8) In Figure 5B, FRAP recovery in dying cells may reflect artifactual mobility rather than biological relevance. Additionally, the absence of quantification data limits interpretation; providing recovery curves would clarify relevance.

      We added quantitative FRAP analysis of the effect on PSMα3 within HeLa cells, shown in Figure S8 of the revised version. Compared to PSMα3 assemblies in vitro, nucleolar PSMα3 exhibits slower fluorescence recovery and a reduced mobile fraction. The nucleolus represents a highly crowded, RNA-rich cellular environment, which is expected to impose additional constraints on molecular mobility and likely contributes to the slower recovery kinetics observed in cells. This is now more specifically detailed in the text in lines 324-333 and discussed in lines 597-607 of the revised version.

      (9) The narrative conflates cytotoxicity endpoints (membrane damage, PI staining, aggregates) with localization data (nucleolar foci), creating ambiguity about whether nucleolar targeting drives toxicity or is a consequence of cell death. Separating toxicity assessment from localization analysis, or clearly demonstrating that nucleolar accumulation precedes cytotoxicity, would resolve this ambiguity.

      We thank the reviewer for raising this important point. We agree that, in the current dataset, cytotoxicity readouts (membrane damage, PI staining, aggregate formation) and subcellular localization (nucleolar accumulation) are observed in close temporal proximity, which limits our ability to unambiguously assign causality. In the experiments presented here, PSMα3 was applied at concentrations known to induce rapid membrane disruption and cytotoxicity in HeLa cells. Under these conditions, PSMα3 accumulates on cellular membranes and penetrates into the cell and nucleus on very short timescales (seconds to minutes), likely preceding the temporal resolution accessible by standard live-cell fluorescence microscopy. As a result, nucleolar accumulation and cytotoxic endpoints are detected essentially concurrently, precluding a definitive determination of whether nucleolar association actively drives toxicity or occurs as a downstream consequence of membrane permeabilization and cell damage.

      We therefore emphasize that, in this study, nucleolar localization is presented as a phenomenological observation consistent with RNA-rich compartment association, rather than as a demonstrated causal mechanism of cytotoxicity. We have revised the Discussion (lines 597-607) to clarify this distinction and to avoid implying that nucleolar targeting is the primary driver of cell death.

      We agree that resolving this ambiguity would require systematic time-resolved and concentration-dependent experiments, including analysis at sub-toxic PSMα3 concentrations below the membrane-disruptive threshold, combined with orthogonal imaging approaches. Such experiments are planned for future work but are beyond the scope of the present study.

      (10) In Figure 8, to strengthen the LLPS assignment for LL-37, additional evidence, such as FRAP analysis or observation of droplet fusion events, would be valuable. This is particularly relevant given that the heat shock conditions (65 °C for 15 minutes) could potentially induce partial denaturation or nonspecific coacervation.

      In response to this comment, we have added FRAP analysis of LL-37 assemblies in the revised manuscript (Figure S12), including representative images and corresponding fluorescence recovery curves. The FRAP measurements show minimal fluorescence recovery over the acquisition window, indicating that the LL-37–RNA assemblies formed under these conditions are largely immobile and solid-like, rather than liquid-like droplets. This is now referred to in the text in lines 458-462 of the revised version.

      Reviewer #2 (Public review):

      In this paper, Rayan et al. report that RNA influences cytotoxic activity of the staphylococcal secreted peptide cytolysin PSMalpha3 versus human cells and E. coli by impacting its aggregation. The authors used sophisticated methods of structural analysis and described the associated liquid-liquid phase separation. They also compare the influence of RNA on the aggregation and activity of LL-37, which shows differences from that on PSMalpha3. 

      Strengths:

      That RNA impacts PSM cytotoxicity when co-incubated in vitro becomes clear. 

      Weaknesses:

      I have two major and fundamental problems with this study:

      (1) The premise, as stated in the introduction and elsewhere, that PSMalpha3 amyloids are biologically functional, is highly debatable and has never been conclusively substantiated. The property that matters most for the present study, cytotoxicity, is generally attributed to PSM monomers, not amyloids. The likely erroneous notion that PSM amyloids are the predominant cytotoxic form is derived from an earlier study by the authors that has described a specific amyloid structure of aggregated PSMalpha3. Other authors have later produced evidence that, quite unsurprisingly, indicated that aggregation into amyloids decreases, rather than increases, PSM cytotoxicity. Unfortunately, yet other groups have, in the meantime, published in-vitro studies on "functional amyloids" by PSMs without critically challenging the concept of PSM amyloid "functionality". Of note, the authors' own data in the present study, which show strongly decreased cytotoxicity of PSMalpha3 after prolonged incubation, are in agreement with monomer-associated cytotoxicity as they can be easily explained by the removal of biologically active monomers from the solution.

      We thank the reviewer for this important critique and agree that direct cytotoxicity is most plausibly mediated by soluble PSM species, while extensive fibrillation generally reduces toxicity by depleting these forms, a conclusion supported by our data and by other studies (e.g., Zheng et al 2018 and Yao et al 2019). We do not propose mature amyloid fibrils as the primary toxic entities. Rather, we use the term functional amyloid in a regulatory sense, consistent with other biological amyloids whose fibrillar states modulate activity (e.g., hormone storage amyloids or RNA-binding proteins).

      In line with emerging findings, we interpret PSMα3 toxicity as arising from a dynamic assembly process rather than from a single static molecular species. We previously showed that PSMα3 forms cross-α fibrils that are thermodynamically and mechanically less stable than cross-β amyloids and readily disassemble upon heat stress, fully restoring cytotoxic activity (Rayan et al., 2023). This behavior contrasts with PSMα1, which forms highly stable cross-β fibrils that do not recover activity after heat shock, suggesting that the limited thermostability of PSMα3 is an evolved feature enabling reversible switching between inactive (stored) and active states.

      Consistent with this view, both PSMα1 and PSMα3 are cytotoxic in their soluble states, yet mutants unable to fibrillate lose activity, indicating that fibrillation is required but not itself the toxic end state (Tayeb-Fligelman et al., 2017, 2020; Malishev et al., 2018). Our other studies further show that cytotoxicity toward human cells correlates with inherent or lipid-induced α-helical assemblies, rather than with inert β-sheet amyloids (Ragonis-Bachar et al., 2022, 2026; Salinas 2020, Bücker 2022). Together, these findings support a model in which membrane-associated, dynamic α-helical assembly, which requires continuous exchange between soluble species and growing fibrils, drives membrane disruption, potentially through lipid recruitment or extraction, analogous to mechanisms proposed for human amyloids such as islet amyloid polypeptide (Sparr et al., 2004).

      In the present study, we further show that RNA reshapes this dynamic landscape: while PSMα3 alone progressively loses activity upon incubation, co-incubation with RNA preserves cytotoxicity by stabilizing bioactive polymorphs and condensate-like states, whereas high RNA concentrations promote solid aggregation but nevertheless preserve activity. Thus, aggregation is neither inherently functional nor toxic, but context dependent and environmentally regulated. Taken together, our data support a model in which PSMα3 amyloids act as a dynamic reservoir, enabling S. aureus to tune virulence by reversibly shifting between dormant and active states in response to environmental cues such as heat or RNA.

      This is now discussed in lines 56-76 and 523-553 of the revised version.

      (2) That RNA may interfere with PSM aggregation and influence activity is not very surprising, given that PSM attachment to nucleic acids - while not studied in as much detail as here - has been described. Importantly, it does not become clear whether this effect has biologically significant consequences beyond influencing, again not surprisingly, cytotoxicity in vitro. The authors do show in nice microscopic analyses that labeled PSMalpha3 attaches to nuclei when incubated with HeLa cells. However, given that the cells are killed rapidly by membrane perturbation by the applied PSM concentrations, it remains unclear and untested whether the attachment to nucleic acids in dying cells makes any contribution to PSM-induced cell death or has any other biological significance.

      We thank the reviewer for this important point and agree that PSM–nucleic acid interactions are not unexpected and that our data do not support a direct intracellular role for RNA binding in mediating cytotoxicity. Accordingly, we do not propose nucleolar or nuclear association of PSMα3 as a causal mechanism of cell death. At the concentrations used, PSMα3 induces rapid membrane disruption, and nucleic acid association is observed along with membrane attachment, precluding conclusions about intracellular function. This limitation is now explicitly clarified in the revised manuscript. The biological significance of our findings lies instead in extracellular and environmental contexts, where PSMα3 encounters abundant nucleic acids, such as RNA or DNA released from damaged host cells or present in biofilms, as now addressed in lines 622-631. Our data show that RNA modulates PSMα3 aggregation trajectories, shifting the balance between liquid-like condensates and solid aggregates, and thereby regulates the persistence and timing of cytotoxic activity. In this framework, RNA acts as a context-dependent regulator of virulence, rather than as an intracellular cytotoxic cofactor, an aspect that will be studied in depth in future work. This is now addressed in the text in lines 597-607 of the revised version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to investigate the role of RNA in modulating both virulent amyloid and host-defense peptides, with the objective of understanding their self-assembly mechanisms, morphological features, and aggregation pathways. 

      Strengths:

      The overall content is well-structured with a logical flow of ideas that effectively conveys the research objectives.

      Weaknesses:

      (1) Figure 2 displays representative FRAP images demonstrating fluorescence recovery within seconds. To gain a more comprehensive understanding of how recovery after photobleaching varies under different conditions, it is recommended to supplement these images with corresponding quantitative fluorescence recovery curves for analysis.

      In response to this comment, we have supplemented the representative FRAP images with quantitative fluorescence recovery curves, reporting normalized recovery kinetics for the indicated conditions. These data are now provided in Figure S4 of the revised manuscript, allowing direct comparison of recovery behavior across conditions (shown by microscopy in Figure 2). In addition, we have included quantitative FRAP analyses for the cellular imaging shown in Figure 5 (presented in Figure S8) and for LL-37 assemblies formed under heat-shock conditions (Figure S12). Together, these additions provide a quantitative framework for interpreting the FRAP results and strengthen the distinction between liquid-like and solid-like assembly states.

      (2) Ostwald ripening typically leads to the shrinkage or even disappearance of smaller droplets, accompanied by the further growth of large droplets. However, the droplet size in Figure 2D decreases significantly after 2 h of incubation. This observation prompts the question, what is the driving force underlying RNA-regulated phase separation and phase transition?

      We thank the reviewer for this observation. Across multiple samples, we consistently observe a coexistence of small droplets and larger aggregates, rather than systematic growth of larger droplets at the expense of smaller ones or a uniform decrease in droplet size. In addition, the timescales examined do not allow us to reliably assess whether diffusion-driven droplet coalescence is fast enough to draw firm conclusions about droplet size evolution. This is now addressed in the text in lines 181-184 of the revised version.

      A decrease in droplet size over time is nevertheless observed in some instances and is more consistent with a time-dependent conversion of initially liquid-like condensates into more solid-like assemblies, which would reduce molecular mobility and suppress droplet coalescence. In parallel, progressive fibril formation may act as a sink for soluble peptide, leading to partial dissolution or shrinkage of less mature condensates. Together, these observations are consistent with a non-equilibrium aging process, in which RNA-regulated assemblies evolve from dynamic condensates toward more solid structures rather than following equilibrium Ostwald ripening.

      (3) The manuscript aims to study the role of RNA in modulating PSMα3 aggregation by using solution-state NMR to obtain residue-specific structural information. The current NMR data, as described in the method and figure captions, were recorded in the absence of RNA. Does RNA binding induce conformational changes in PSMα3, and how do these changes alter the NMR spectra? Also, the sequential NOE walk between neighboring residues can be annotated on the spectrum for clarity.

      The solution-state NMR experiments were performed specifically to characterize the potential binding of EGCG to PSMα3. Due to the strong tendency of PSMα3 to undergo rapid aggregation and line broadening upon RNA addition, solution-state NMR spectra in the presence of RNA could not be obtained at sufficient quality for residue-specific analysis. As suggested, we have annotated the sequential NOE walk between neighboring residues on the relevant NOESY spectra to improve clarity.

      (4) The authors claim that LL-37 shares functional, sequence, and structural similarities with PSMα3. However, no droplet formation was observed for LL-37 in the presence of RNA alone. The authors then applied thermal stress to induce phase separation of LL-37. What are the main factors contributing to the different phase behaviors exhibited by LL-37 and PSMα3? What are the differences in the conformation of amyloid aggregates and the kinetics of aggregation between the condensation-induced aggregation in the presence of RNA and the conventional nucleation-elongation process in the absence of RNA for these two proteins?

      We appreciate this important question and have clarified both the basis of the comparison and the origin of the divergent phase behaviors of LL-37 and PSMα3. While PSMα3 and LL-37 share key properties as short, cationic, amphipathic α-helical peptides that self-assemble and interact with nucleic acids, they differ fundamentally in their assembly architectures. PSMα3 is an amyloidogenic peptide that forms cross-α amyloid fibrils, in which α-helices stack perpendicular to the fibril axis. In contrast, LL-37 can form fibrillar or sheet-like assemblies (observed on cryo grids), but these lack canonical amyloid features: the crystal structures available so far show no clear cross-α or cross-β amyloid order. This is now clarified in several places in the revised text. Thus, the comparison between the two peptides is functional and physicochemical rather than implying identical amyloid mechanisms. These structural differences likely underlie their distinct phase behaviors.

      Because LL-37 does not follow a classical amyloid nucleation–elongation pathway, and high-resolution structural information (e.g., cryo-EM) is currently lacking, partly due to its sheet-like, non-twisted morphology (unpublished results), it is not possible to directly compare aggregation kinetics or nucleation mechanisms between LL-37 and PSMα3. It is possible that amyloidogenic systems such as PSMα3 exhibit greater flexibility in prefibrillar and fibrillar polymorphism, enabling RNA-regulated phase behavior, whereas non-amyloid assemblies such as LL-37 are more prone to stress-induced solid aggregation. We note that this interpretation is necessarily tentative and does not imply a general rule, but rather reflects differences evident in the present system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) In the abstract, replacing the word "overriding" with "counteracting" may provide a scientifically neutral tone.

      In the course of revision, the abstract was substantially rewritten to more precisely convey the mechanistic framework and key conclusions of the study. As part of this rewrite, the term "overriding" was removed and the language throughout was revised to adopt a more scientifically neutral tone, consistent with the reviewer's suggestion.

      (2) In the abstract, the final sentence is ambitious but heavy. It may benefit from being split into two shorter sentences, for example:

      "These findings establish RNA as a potent, context-dependent modulator of both virulent amyloids and host-defense peptides. They further reveal phase transitions as tunable regulators of peptide activity and potential therapeutic targets across infectious and neurodegenerative diseases."

      As part of the broader abstract revision, the final sentence was restructured and the abstract as a whole was rewritten to improve clarity and readability, in the spirit of the reviewer's recommendation.

      (3) In the Introduction section,

      "The phenol-soluble modulins (PSMs) produced by Staphylococci contain amyloid-forming short peptides which play multiple functional roles...", consider "Staphylococcal phenol-soluble modulins (PSMs) are short, amyloidogenic peptides that perform multiple roles central to pathogenesis...."

      In accordance with the suggestion, the sentence has been revised.

      (4) To improve narrative flow in the final paragraph of the Introduction, a short bridging sentence could be added, such as:

      "Given these nucleic acid interactions, we next examined whether RNA can drive phase separation or structural reorganization of these amyloidogenic peptides."

      We thank the reviewer for this helpful suggestion. It provided an opportunity to clarify an important distinction between the two peptides studied. While LL-37 can self-assemble into higher-order α-helical structures, it is not amyloidogenic, in contrast to PSMα3. We therefore revised the bridging sentence in the final paragraph of the Introduction to read: “Given their shared cationic, amphipathic α-helical character, but distinct amyloidogenic properties, we sought to examine whether RNA differentially influences the assembly landscapes and bioactivity of PSMα3 and LL-37.”

      (5) The rationale for selecting Poly(A) and Poly(AU) would benefit from further clarification. It would be helpful to specify whether these RNAs are intended to model particular host or bacterial RNA species, such as AU-rich elements, rRNA-like sequences, or mRNA-like contexts.

      Poly(A) and Poly(AU) RNAs were selected as simplified, well-defined model RNAs to probe general peptide–RNA interactions in an unbiased manner, as no prior information was available regarding whether such interactions occur or which specific RNA species might be involved. This rationale is now clarified in the revised text (lines 128–131).

      These RNAs are not intended to represent a single biological transcript, but rather generic RNA features relevant to both host and bacterial contexts, including single-stranded homopolymeric regions and AU-rich elements commonly found in mRNAs and stress-related RNAs. The use of such reductionist RNA models to study RNA–protein interactions, phase behavior, and RNA-modulated aggregation is well established. We nevertheless agree that RNA sequence and structure may influence peptide assembly and activity, and future studies will address sequence-specific and biologically derived RNAs.

      (6) In Figure 1A, essential EMSA controls (RNA alone, peptide alone, and a nonspecific peptide or PSMα3) should be included to distinguish specific complexes from artifacts, even if presented in the supplementary information. In addition, a competition assay using unlabeled RNA would help confirm binding specificity and rule out predominantly nonspecific electrostatic interactions; these data could also be reported in the supplementary figures.

      An RNA-alone control is already included in Figure 1A of the revised version. The first lane (“0 µM”) shows free Poly(A) or Poly(AU) RNA in the absence of peptide and serves as the negative control against which PSMα3-induced mobility shifts are evaluated. A peptide-alone EMSA cannot be performed, as PSMα3 is highly cationic and does not migrate into the gel in the absence of RNA; moreover, EMSA in this format reports on RNA mobility rather than peptide migration.

      With respect to binding specificity, we compared Poly(A) and Poly(AU) RNAs and observed distinct binding behaviors, which would not be expected for purely nonspecific electrostatic interactions. In addition, the extracted Hill coefficients (>1) are consistent with cooperative binding, further arguing against simple charge-driven association. Finally, the RNA-dependent association of PSMα3 is independently supported by fluorescence microscopy and quantitative colocalization analyses, which corroborate the EMSA results. Together, these orthogonal approaches support the relevance of the observed peptide–RNA interactions.
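
      To make the cooperativity argument concrete, the sketch below shows how a Hill coefficient can be estimated from fraction-bound values by linear regression on a Hill plot. It is an illustration only: the function names and the regression approach are assumptions, and the fitting procedure used for the EMSA data in the manuscript may differ (for example, a nonlinear least-squares fit).

```python
import math


def hill_fraction_bound(conc, kd, n):
    """Fraction of RNA bound at peptide concentration `conc` (Hill equation).

    kd is the apparent half-saturation concentration; n is the Hill coefficient.
    """
    return conc**n / (kd**n + conc**n)


def fit_hill_plot(concs, fractions):
    """Estimate (n, kd) by linear regression on the Hill plot:

        log10(theta / (1 - theta)) = n * log10(conc) - n * log10(kd)

    Points with theta <= 0 or theta >= 1 are skipped (the logit is undefined).
    """
    xs, ys = [], []
    for conc, theta in zip(concs, fractions):
        if conc > 0 and 0.0 < theta < 1.0:
            xs.append(math.log10(conc))
            ys.append(math.log10(theta / (1.0 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope = Hill coefficient
    intercept = mean_y - n * mean_x    # intercept = -n * log10(kd)
    kd = 10 ** (-intercept / n)
    return n, kd
```

      A fitted slope above 1 corresponds to positive cooperativity, whereas independent, non-cooperative binding to equivalent sites would give a slope close to 1.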

      (7) In Figure 1B, there is a time mismatch between EMSA (30 minutes) and TEM (2 hours). If aggregation progresses over time, the EMSA pattern at 2 hours may differ. This point could be acknowledged or experimentally addressed, as RNA-peptide assemblies may evolve from liquid-like condensates to more solid aggregates.

      The EMSA and TEM experiments were intentionally performed at different time points to capture distinct stages of the PSMα3–RNA assembly process. The EMSA assay (30 minutes) was designed to probe early RNA–peptide complex formation and binding interactions, before extensive higher-order aggregation occurs. At this stage, we aim to detect mobility shifts reflecting complex formation rather than mature assemblies. In contrast, TEM was performed after 2 hours to visualize later-stage structural outcomes, including fibrillation and morphological reorganization. As aggregation progresses over time, the assemblies evolve from early RNA–peptide complexes into more ordered fibrillar structures, which are best assessed by electron microscopy at later time points. To improve clarity and avoid potential confusion, we have streamlined Figure 1 to focus on the EMSA data, which specifically addresses early binding events. The TEM data were removed from Figure 1 and are now presented in Figure 3, where later-stage structural transitions and fibrillation are shown more comprehensively and in the appropriate mechanistic context.

      (8) In Figure 1B, if feasible, complementing TEM with a confirmatory fibril assay (e.g., ThT kinetics) under the same conditions would strengthen the conclusion that the morphology difference is robust, but it is not mandatory.

      We attempted to perform ThT fibrillation kinetics under the same RNA-containing conditions; however, these assays were not informative for this system. PSMα3 aggregates extremely rapidly, producing an immediate and steep increase in ThT fluorescence (Fig. S9 in the revised version), which prevents reliable resolution of RNA-dependent differences in aggregation kinetics or lag phases. In addition, Poly(AU) RNA interferes with ThT readout through electrostatic interactions between the negatively charged RNA and the cationic dye, as well as through RNA-induced changes in fibril morphology, both of which complicate quantitative interpretation of fluorescence kinetics. Based on these technical constraints and prior experience with RNA–amyloid systems, ThT kinetics under identical RNA conditions would not provide a robust or interpretable confirmation of the morphological differences observed by TEM.

      (9) In Figure 1B, PSMα3 alone control is missing in TEM images.

      A TEM image of PSMα3 alone is included in Figure 3, where we systematically present fibrillation outcomes across different RNA concentrations alongside the peptide-only control. Figure 1 was streamlined to focus on early RNA–peptide interactions assessed by EMSA, whereas Figure 3 provides a comprehensive TEM analysis of later-stage structural outcomes. This organization was chosen to clearly separate early binding events from subsequent assembly transitions and to avoid redundant presentation of TEM images under similar conditions.

      (10) Although it is experimentally practical to focus on Poly(AU), the justification is very one-sided. The Poly(A) condition, which yields amorphous aggregates, may be equally informative for understanding toxicity, LLPS, or nonfibrillar states and could be discussed more explicitly.

      We agree that Poly(A)-induced amorphous aggregation is informative for understanding non-fibrillar assembly states. However, the primary aim of this study was to dissect RNA-dependent regulation of fibrillar assembly and phase behavior, which is most clearly captured using Poly(AU). Poly(A) was therefore included as a comparative condition rather than as a focus for detailed mechanistic analysis. A more systematic comparison of different RNA classes and their effects on non-fibrillar states and toxicity is an important direction for future work but is beyond the scope of the present study.

      (11) To improve readability of the manuscript, the main text should follow the order of the figure panels (e.g., A, B, C, D, and E) and numbers (Figure 1, 2...) sequentially, so that readers can easily align with the corresponding images.

      We have revised the manuscript to improve alignment between the main text and the figures, adjusting panel ordering and numbering where appropriate so that the text now follows the figure panels and figure numbers more sequentially. These changes were made to enhance readability while maintaining a logical visual flow within the figures.

      (12) In the result section of Figure 2, the analogy to Ddx4-like systems is a helpful concept, but should be clearly framed as an analogy, not evidence. It would be more accurate to say that the behavior is "conceptually similar to" those systems, while noting that the molecular context is significantly different.

      We have revised the text to explicitly frame the comparison to Ddx4-like systems as a conceptual analogy rather than evidence: lines 158-161 in the revised version.

      (13) In Figure 4, inclusion of positive and negative controls to validate assay performance (e.g., untreated bacteria or HeLa cells, lysis buffer, media alone) would strengthen confidence in the bioactivity measurements.

      We wish to clarify that appropriate positive and negative controls were included in all bioactivity assays and were used to normalize the data presented in Figure 4. For the HeLa cytotoxicity assay (LDH), untreated cells were used to determine spontaneous LDH release (negative control), and cells treated with the manufacturer-supplied lysis buffer were used to determine maximum LDH release (positive control). The percent cytotoxicity shown in Figure 4B was calculated relative to these internal controls, as described in the Methods. For the antibacterial assay (PrestoBlue), wells containing E. coli without peptide served as the positive control for 100% viability, while wells containing sterile LB medium alone were used as blanks. Viability values in Figure 4A were normalized to these controls. We have ensured that the Methods section explicitly describes these controls to reinforce confidence in the bioactivity measurements.
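
      The normalization to internal controls described above can be sketched as follows. The formulas are the conventional ones for LDH-release and blank-corrected viability assays and are assumed here; the exact expressions used for Figure 4 are those given in the Methods. Function and argument names are hypothetical.

```python
def percent_cytotoxicity(sample, spontaneous, maximum):
    """LDH-based cytotoxicity relative to internal controls.

    spontaneous: signal from untreated cells (negative control)
    maximum:     signal from lysis-buffer-treated cells (positive control)
    """
    if maximum == spontaneous:
        raise ValueError("maximum and spontaneous release must differ")
    return 100.0 * (sample - spontaneous) / (maximum - spontaneous)


def percent_viability(sample, untreated, blank):
    """Blank-corrected viability relative to untreated bacteria (100%).

    untreated: signal from bacteria without peptide
    blank:     signal from sterile medium alone
    """
    if untreated == blank:
        raise ValueError("untreated control and blank must differ")
    return 100.0 * (sample - blank) / (untreated - blank)
```

      For example, a well whose LDH signal lies halfway between the spontaneous and maximum controls would be reported as 50% cytotoxicity.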

      (14) To enhance clarity, consider presenting the RNA concentration and time-dependent effects on PSMα3 bioactivity in a comparison table within the main text or as a supplementary figure.

      We appreciate this suggestion and carefully considered presenting the data in tabular form. However, we found that graphical representation more effectively conveys the trends, transitions, and comparative patterns between conditions. A table would not adequately capture these relationships.

      Reviewer #2 (Recommendations for the authors):

      Further remarks:

      (1) Circumstantial evidence based on the "amyloid inhibitor", EGCG: The results with EGCG, which has been shown to have a moderate amyloid-reducing effect on PSMalpha 1 and PSMalpha4, should not be taken as evidence for amyloid-based cytotoxicity. While increased concentrations of EGCG reduced the cytotoxic effect of PSMalpha3, it is not convincingly shown that this is due to a lower concentration of amyloid vs. monomeric PSM.

      We agree that the effects of EGCG should not be interpreted as evidence for amyloid fibrils being the cytotoxic species. Our data instead support a mechanism in which EGCG primarily targets soluble PSMα3, thereby redirecting its assembly pathway and depleting bioactive species. Specifically, solution-state NMR (Fig. 7) shows that EGCG binds defined residues of monomeric PSMα3, consistent with sequestration of soluble peptide rather than selective inhibition of fibrils. Complementary light and electron microscopy, together with kinetic measurements, indicate that EGCG does not simply stabilize monomers but instead diverts PSMα3 into amorphous, non-functional aggregates, as visualized by TEM (Fig. 6B) and reflected in altered ThT responses (Fig. S9). Importantly, these EGCG-induced aggregates are non-cytotoxic (Fig. 6A/C) and fail to associate with membranes or cells, in contrast to untreated PSMα3, which forms membrane-associated assemblies and induces disruption (newly added Movies S1-S2). Thus, EGCG potentially reduces cytotoxicity by remodeling the aggregation landscape and depleting active soluble species, rather than by selectively inhibiting specific fibril formation. This clarification is now added to the Discussion in lines 554-564 of the revised version.

      (2) It is appreciated that the authors refrain from presenting the unsubstantiated concept of "functional" PSM amyloids in the discussion. However, wording in that direction must also be removed from other parts of the manuscript (e.g., "bioactive fibrillar polymorphs", "The formation of cross-alpha amyloids has been correlated with toxic activity", etc.), generally refraining from uncritically implying that amyloid formation underlies PSM biological activity, and rather discussing that the much more likely explanation of the findings is a lowering of cytolytically active, monomeric PSM concentration.

      As detailed in our response to Major Comment #1, we agree that uncritical language implying that amyloid fibrils themselves are the cytotoxic species should be avoided. Accordingly, we have revised the manuscript to consistently frame amyloid formation in regulatory terms. Aggregation, depending on context, modulates activity by altering the availability, persistence, and assembly pathways of these species. Distinct aggregation states are therefore presented as correlated with, but not equivalent to, cytotoxic activity, and as components of a dynamic assembly landscape rather than as direct toxic entities.

      (3) Discussion: "PSM alpha3 interaction with nucleic acids within human cells ...supports a comparable mechanism...". Please delete this as it is unsubstantiated.

      We agree that the original phrasing overstated the evidence. The sentence was removed and the Discussion was revised to clearly frame nucleolar accumulation as a phenomenological observation reflecting PSMα3's intrinsic nucleic acid–binding capacity, rather than as evidence for a comparable intracellular mechanism. Specifically, the revised Discussion (lines 597–607) states that nucleolar localization is "unlikely to represent a distinct intracellular toxic mechanism" and instead "reflects binding competence within RNA-rich compartments following cellular entry." The biological relevance of this interaction, particularly at sub-cytotoxic concentrations, is noted as an open question requiring further investigation.

      (4) The authors should also cite papers that have argued against their central hypothesis of "functional" PSM amyloids.

      We thank the reviewer for this suggestion. Accordingly, we have revised the manuscript to explicitly cite and discuss studies that argue against amyloid fibrils as the primary cytotoxic species, and that instead attribute PSM cytotoxicity to soluble or membrane-associated forms. These perspectives are now incorporated in the Discussion to provide a balanced view of the field and to clarify how our findings align with, and differ from, existing models of PSM activity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a comprehensive single-cell atlas of mouse anterior segment development, focusing on the trabecular meshwork and Schlemm's canal. The authors profiled ~130,000 cells across seven postnatal stages, providing detailed and solid characterization of cell types, developmental trajectories, and molecular programs.

      Strengths:

      The manuscript is well-written, with a clear structure and thorough introduction of previous literature, providing a strong context for the study. The characterization of cell types is detailed and robust, supported by both established and novel marker genes as well as experimental validation. The developmental model proposed is intriguing and well supported by the evidence. The study will serve as a valuable reference for researchers investigating anterior segment developmental mechanisms. Additionally, the discussion effectively situates the findings within the broader field, emphasizing their significance and potential impact for developmental biologists studying the visual system.

      Weaknesses:

      The weaknesses of the study are minor and addressable. As the study focuses on the mouse anterior segment, a brief discussion of potential human relevance would strengthen the work by relating the findings to human anterior segment cell types, developmental mechanisms, and possible implications for human eye disease. Data availability is currently limited, which restricts immediate use by the community. Similarly, the analysis code is not yet accessible, limiting the ability to reproduce and validate the computational analyses presented in the study.

      In the revised version, we have added a paragraph to the Discussion section highlighting the human relevance of our work. Additionally, the data are now public on the Single Cell Portal and GEO, and the accession numbers have been updated. The code is available on GitHub (https://github.com/revathi-balasubramanian/Anterior-segment-development-single-cell-data-analysis).

      Reviewer #2 (Public review):

      Summary:

      This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).

      Strengths:

      This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma.

      Weaknesses:

      (1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (see Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) into BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types; however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.

      (2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage?

      (3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features?

      (4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study.

      (5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.

      (1) We agree that inferring biological relationships from the spatial arrangement of UMAP clusters has limitations, and we have qualified our interpretation accordingly in the text. We have also added clarifying language to the trajectory analysis in Figure 5. The intended developmental trajectory is PEC → VP → BEC and SEC; however, the cluster labels in Figure 5 were applied incorrectly. Specifically, the "VP, BECs" cluster was mislabeled as "BECs", which led to the confusion. This cluster contains VPs that transition into BECs as well as VPs that are precursors to SECs.

      (2) We recently published the P60 dataset separately (Tolman, Li, Balasubramanian et al., eLife 2025); these data consist of integrated single-nucleus multiome profiles that were subjected to in-depth analysis. Additionally, we found that integrating the P60 dataset with the developmental datasets obscured sub-clustering of mature cell types. In future manuscripts, we will pursue a more detailed analysis of TM development and perform time point–specific clustering, similar to the approach we used for endothelial cells (Figure 4e).

      Proportions of cells at different ages need to be compared cautiously, because the eye grows over this period. Notwithstanding these limitations, the proportions of the TM1, TM2, and TM3 clusters are expected to be similar between P14 and P21, as the proportions at P14 are similar to those in the separately analyzed P60 data. Importantly, our dissection strategy changed with age: from P2 to P14, we removed approximately one-third of the cornea, whereas at P21 and P60 we removed most of the cornea to help maximize representation of limbal cells as the eyes grew. This change in dissection likely contributed to the reduced number of TM3 cells observed at P21. TM3 cells are enriched anteriorly (at least in the adult) and so lie closer to the corneal cut during dissection of P21 eyes, which, despite being larger than eyes at younger ages, are still small and more delicate to dissect accurately than at P60; these cells are therefore more likely to be lost. Additional details are provided in the Methods section, and the caveats surrounding our dissection method have now been included.

      (3) For Figure 3a and b, we have now pseudo-colored the spaces and provided a quantification of how both TM volume and intratrabecular spaces change with developing age (Figure 3c).

      (4) We have now included a supplemental table of markers for developing and mature TM and SC cell types (Table S3).

      (5) We have highlighted DG genes in rectangular boxes in Figure S20.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 affects the transcriptional activation of pachytene. Znhit1 is a subunit of the SRCAP chromatin remodeling complex and a depositor of H2AZ, and in cKO spermatocytes, H2AZ is not deposited into the gene region. The authors claim that this is why the PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during the meiotic prophase.

      Strengths:

      The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.

      Weaknesses:

      Overall, the data is inconsistent with the authors' claims and does not support their final conclusions. In addition, the sample used may not be the most suitable for the analysis, but a more suitable sample would dramatically improve the overall quality of the paper.

      Thank you for your comprehensive summary of our study and your thoughtful insights into its strengths and weaknesses. We greatly appreciate this valuable feedback, which helps us further improve our work. Below, we provide a detailed response addressing each of the points you raised.

      Reviewer #1 (Recommendations For The Authors):

      Major revisions:

      Surprisingly, many genes were upregulated in the scRNA-seq results. How many XY genes are included? Discuss why many genes are up-regulated in Fig. 5E whereas bulk RNA-seq showed only 70 genes were down-regulated. Since apoptosis-related factors are up-regulated in Fig. 5E, could these up-regulated genes be due to the high content of the transcriptome of dead cells? As you know, once cell death starts, it randomly and violently disrupts the transcriptome, so we think it is not desirable to analyze the transcriptome with dead cells in the mix. Describe this point appropriately in the text or generate new data without dead cells.

      We sincerely appreciate the reviewer’s critical points. Below, we address each point sequentially:

      (1) To address the question about XY-linked genes, we utilized scRNA-seq data to identify differentially expressed sex chromosome genes in spermatocytes at different stages. Our analysis revealed an aberrant activation of XY-linked genes relative to controls. Specifically, 120 XY-linked genes were aberrantly activated in zygotene-stage spermatocytes, and 119 XY-linked genes showed aberrant activation in pachytene-stage spermatocytes (revised Fig. 4F). This observation directly indicates that Znhit1 knockout impairs Meiotic Sex Chromosome Inactivation (MSCI), a finding that aligns with our prior characterization of XY chromosome synapsis defects in Znhit1-deficient spermatocytes.

      (2) Two key reasons explain the discrepancy between scRNA-seq and bulk RNA-seq results:

      First, scRNA-seq employs a more permissive threshold for identifying DEGs (log2 fold change [log2FC] = 0.25), thereby enhancing sensitivity to subtle expression changes and enabling the detection of more upregulated genes. In contrast, bulk RNA-seq uses a stricter threshold (log2FC = 1), which filters out these subtly upregulated transcripts, resulting in fewer DEGs overall.

      Second, scRNA-seq can capture cell subset-specific differential expression. In contrast, bulk RNA-seq averages signals across mixed cells, masking such subset-specific expression changes.

      These clarifications have been included in the Data Analysis section of the revised manuscript.
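
      The effect of the two thresholds can be illustrated with a toy example. The gene names and log2FC values below are hypothetical, the thresholds are applied symmetrically to up- and downregulated genes as an assumption, and significance filtering (adjusted p-values) is deliberately omitted to isolate the effect of the fold-change cutoff.

```python
def split_degs(log2fc_by_gene, threshold):
    """Return (upregulated, downregulated) gene lists for a |log2FC| cutoff."""
    up = sorted(g for g, fc in log2fc_by_gene.items() if fc >= threshold)
    down = sorted(g for g, fc in log2fc_by_gene.items() if fc <= -threshold)
    return up, down


# Hypothetical log2 fold changes (knockout vs. control), for illustration only.
toy_log2fc = {
    "geneA": 0.30,   # subtle increase: passes 0.25, fails 1
    "geneB": 1.40,   # strong increase: passes both
    "geneC": -0.50,  # subtle decrease: passes 0.25, fails 1
    "geneD": -1.20,  # strong decrease: passes both
    "geneE": 0.10,   # below both cutoffs
}

permissive_up, permissive_down = split_degs(toy_log2fc, 0.25)  # scRNA-seq-style cutoff
strict_up, strict_down = split_degs(toy_log2fc, 1.0)           # bulk-style cutoff
```

      In this toy set, the permissive cutoff reports four DEGs and the strict cutoff two, so the same underlying changes yield different DEG counts depending solely on the threshold.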

      (3) We fully agree with the reviewer’s concern that dead cells could confound transcriptomic analyses. Before downstream analysis, we excluded non-viable cells via stringent QC: cells with mitochondrial RNA (mtRNA) content exceeding 15% were removed, as high mtRNA content is a well-established marker of cell death or compromised viability. To further validate that upregulated genes were not driven by dead cell contamination, we analyzed the correlation between the expression of apoptosis-related genes and mtRNA fractions in our data. This analysis revealed no significant correlation (Pearson correlation coefficient, r = -0.02; please see Author response image 1). These results collectively rule out dead cell transcriptome contamination as the primary cause of the observed gene upregulation.

      Author response image 1.

      Scatter chart showing the Pearson correlation between apoptosis-related genes and mitochondrial RNA fractions in scRNA-seq data.
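
      The QC filter and correlation check described in point (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the per-cell field names (`mito_fraction`, `apoptosis_score`) are hypothetical, and how the apoptosis-related expression was summarized per cell in the actual analysis is not specified here.

```python
import math


def filter_viable(cells, max_mito_fraction=0.15):
    """Keep cells whose mitochondrial RNA fraction does not exceed the cutoff."""
    return [c for c in cells if c["mito_fraction"] <= max_mito_fraction]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def apoptosis_vs_mito_correlation(cells, max_mito_fraction=0.15):
    """Correlate per-cell apoptosis score with mtRNA fraction after QC."""
    kept = filter_viable(cells, max_mito_fraction)
    return pearson_r(
        [c["mito_fraction"] for c in kept],
        [c["apoptosis_score"] for c in kept],
    )
```

      A coefficient near zero, as reported above (r = -0.02), would indicate that apoptosis-related expression does not track mitochondrial RNA content among the retained cells.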

      Line 280-286: The data in Figures 7I and J are confusing: as shown by KAS-seq, it is natural that ssDNA is not formed in the promoter region in the Znhit1-cKO sample because transcription does not proceed, but why is ssDNA formed in the enhancer region in the first place in the control and then lost in the Znhit1-cKO sample? Generally, it is said that in the enhancer region, including the super-enhancer region, double-stranded DNA is not dissociated, thus not forming ssDNA. Discuss why the loss of ssDNA in the enhancer region affects transcription with appropriate citations. Also, show whether genes downstream of the missing ssDNA in the promoter region have abnormal transcriptional activity, along with the RNA-seq data. Furthermore, in the region shown in Figure 7I, why is the chromatin even more open in Znhit1-cKO, as shown by ATAC-seq? Discuss whether this is related to transcriptional progression or aberrant substitution with H2A. If the function of ZNHIT1 is to replace H2A with H2AZ for PGA, it is not necessary to show the H2A level in Znhit1-cKO.

      We appreciate the reviewer’s constructive comments.

      (1) ssDNA dynamics in enhancer regions: Emerging evidence demonstrates that active enhancers undergo transient DNA unwinding to form ssDNA, a process critical for transcriptional regulation through the transcription of enhancer RNAs (eRNAs). KAS‑seq is sufficiently sensitive to detect ssDNA in enhancer regions (Kim et al., 2010; Wu et al., 2020). It has been shown that H2A.Z (deposited by the ZNHIT1-SRCAP complex) is required for maintaining enhancer accessibility and dynamic unwinding (Sporrij et al., 2023). In this study, we found that Znhit1 deletion and defective H2A.Z incorporation impaired enhancer ssDNA formation, indicating that ZNHIT1-H2A.Z plays an important role in the activity of both promoters and enhancers.

      (2) Impact of ssDNA loss on transcription: To address how missing ssDNA affects transcriptional activity, we further analyzed changes in KAS‑seq signals following Znhit1 knockout. Overall, KAS‑seq signals were significantly reduced upon Znhit1 depletion, confirming that Znhit1 is essential for ssDNA formation. Further examination of KAS‑seq signals at promoters of downregulated genes also revealed reduced signals (revised manuscript, Fig. S8). In contrast, KAS-seq signals of upregulated genes remained relatively low in both the control and knockout groups and did not change between them; the upregulation of these genes probably results from indirect regulation. These results underscore the importance of ZNHIT1-mediated chromatin states in regulating ssDNA formation and gene expression.

      (3) Aberrant chromatin openness in Znhit1-cKO (ATAC-seq): The increased chromatin accessibility detected by ATAC-seq likely represents a disorganized, nonfunctional state rather than productive transcriptional openness. H2A.Z normally constrains chromatin dynamics to facilitate ordered transcriptional regulation (Cole et al., 2021); its absence in Znhit1-cKO leads to higher ATAC-seq signals, suggesting that this aberrant openness fails to support proper assembly of the transcriptional machinery.

      Minor revisions:

      Line 106. The text says that they looked for chromatin factors, but the legend says that they looked for epigenetic factors. The text must be consistent.

      We have corrected it in the revised manuscript (line 801).

      Line 107. Although it is stated that the transcriptional data published here were used, it appears from the cited references that they are scRNA-seq data. A clear explanation is required in the text or legend.

      We have clarified that these are scRNA-seq data (line 107).

      Line 141-143: Using TUNEL analysis in Figure 4F, the authors show that Znhit1-cKO testis cells contain many dead cells. Describe the type or stage of the apoptotic cells.

      We appreciate the reviewer’s suggestion. We performed TUNEL staining on testes isolated from P14 mice, a critical time point for pachytene development (revised Fig. 2D). We addressed the stage of the apoptotic cells by showing that apoptosis-related genes were significantly upregulated in pachytene-stage spermatocytes in scRNA-seq data (revised Fig. 4D). To further validate this observation, we performed scRNA-seq on P35 testis samples. The results revealed a significant reduction in late pachytene-stage spermatocytes in Znhit1-cKO samples (revised Fig. 2F), consistent with apoptotic loss of pachytene cells. Collectively, these data confirm that Znhit1 knockout impairs pachytene-stage spermatocyte development.

      The authors claimed that the loss of Znhit1 lowers the transcription of a group of genes involved in homologous recombination, including Rnf212, causing a delay in homologous recombination; however, if the process of homologous recombination is delayed, homologous chromosome pairing and synapsis are affected unless DSB repair is completed. Provide a satisfactory explanation for the fact that DNA damage remains on autosomes despite complete synapsis, as shown in Figure 3C, which is likely not solely due to delayed homologous recombination.

      Thank you for this insightful comment. We fully agree that persistent autosomal DNA damage cannot be explained solely by delayed homologous recombination. To resolve this question, we further analyzed autosomal synapsis through SYCP1 and SYCP3 staining. While autosomal synapsis appeared morphologically complete, we identified subtle but significant synapsis defects in autosomal terminal regions (revised Fig. 3A). This suggests that Znhit1 knockout also results in autosomal synapsis defects. We speculate that these synapsis defects are associated with the unresolved autosomal DNA damage we observed.

      Lines 150-163. With regard to XY unpairing in Znhit1-cKO pachytene spermatocytes, there is insufficient discussion as to whether this is due to transcriptional aberrations.

      Thank you for highlighting the need to link transcriptional aberrations to XY unpairing in Znhit1-cKO pachytene spermatocytes. To address this, we analyzed sex chromosome transcription using scRNA-seq data. Relative to controls, 120 XY-linked genes were aberrantly activated at zygotene, and 119 were upregulated at pachytene in Znhit1-cKO spermatocytes (revised Fig. 4F), directly demonstrating that Znhit1 knockout disrupts Meiotic Sex Chromosome Inactivation (MSCI). Given that intact MSCI is required to stabilize XY synapsis in pachytene spermatocytes, we conclude that the observed XY unpairing is likely a direct consequence of these sex chromosome transcriptional abnormalities. We have added this information to the revised manuscript (lines 221-226).

      Line 187-194. Analysis of the scRNA-seq data is shown in Figure 4, but it lists several genes as stage-specific markers, some of which do not have well-understood meiotic functions. Please cite a reference paper that provides sufficient evidence to qualify this stage.

      In response to this comment, we have refined the presentation of marker genes used for cell annotation (revised Fig. S4B). We have incorporated relevant references supporting their utility as stage-specific markers for the meiotic stages (line 187).

      Line 225-233: If Znhit1 is important for H2AZ deposition and regulates PGA through it, how does it regulate HR-related genes that are expressed earlier through H2AZ deposition during the pachytene stage? For example, Rnf212 is not specifically expressed during the pachytene stage but is one of the targets of MEIOSIN, so it is expressed at an earlier stage.

      Thank you for this insightful comment. We fully acknowledge the reviewer’s key observation that HR-related genes such as Rnf212 are MEIOSIN targets that initiate transcription at earlier meiotic stages, before the pachytene stage. Our stage-resolved scRNA-seq data further showed that the expression of Ccnb1ip1 and Rnf212 was significantly upregulated from zygotene to pachytene, following their initial transcriptional onset. We next showed that the loss of H2A.Z deposition induced by Znhit1 deletion specifically impaired this pachytene-specific secondary transcriptional activation, rather than the early MEIOSIN-driven expression onset (please see Author response image 2).

      Author response image 2.

      Plots showing the expression levels of the indicated genes in scRNA-seq data.

      Line 245-251: As shown in Figure 6E, more than 14,000 genes have H2AZ peaks. In contrast, only approximately 60% of the genes downregulated by Znhit1-cKO appeared to be directly affected by H2AZ. Are the remaining 40% of genes regulated in a different way that is not mediated by H2AZ? Also, only a few percent of the genes with H2AZ peaks are affected, but why are only genes with A-MYB involvement affected, as shown in Figure 7?

      Thank you for these insightful and constructive comments. The ~40% of downregulated genes not directly linked to H2A.Z are likely regulated through indirect mechanisms. H2A.Z deposition mediated by ZNHIT1 may influence upstream transcriptional regulators (e.g., transcription factors or coactivators), whose dysregulation in turn affects these genes.

      The selective effect of H2A.Z loss on A-MYB target genes is explained by the strict context-dependent function of H2A.Z, which requires stage-specific partner transcription factors to exert its regulatory activity. During the zygotene-to-pachytene transition, A-MYB acts as the master regulator of pachytene gene activation and forms a functional collaborative complex with H2A.Z to drive target gene transcription. Disrupted H2A.Z deposition upon Znhit1 deletion specifically impairs the activity of this A-MYB-H2A.Z complex, leading to selective downregulation of A-MYB targets. Other H2A.Z peak-associated genes may rely on alternative cofactors and compensatory mechanisms.

      Line 245-256: Figures 6 and F show that the localization of H2AZ is reduced in Znhit1-cKO mice, which means that no substitution with H2A occurs. If so, show it in the data because the localization of H2A should be increased compared to that in the control.

      To clarify the status of H2A, we have now performed immunofluorescence staining for H2A. While H2A.Z deposition was clearly impaired following Znhit1 deletion, the global level of H2A did not change significantly (Author response image 3). We speculate that the absence of a compensatory increase in H2A is likely due to the intrinsically low abundance of the histone variant H2A.Z relative to canonical histone H2A under physiological conditions.

      Author response image 3.

      Immunostaining of SYCP3 and H2A in spermatocyte testis sections of control and Znhit1-sKO mice. Scale bar, 40 μm.

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.

      Strengths:

      The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.

      Weaknesses:

      (1) Current literature demonstrates that meiotic mutants arrest at one of two stages: midpachytene (stage IV of the seminiferous cycle) or metaphase I (stage XII of the seminiferous cycle). This study documents that in the Znhit1 KO the midpachytene marker H1t appears normally, but that cells arrest before diplotene. If this is true, then arrest must occur during late pachytene, which based on my knowledge has never been documented for a meiotic KO. To resolve this, the authors should present stronger histological substaging evidence to support their claim.

      Thank you for this insightful and constructive comment. To achieve high-resolution tracking of cell lineage progression, we performed scRNA-seq analysis using P35 testes in this revised manuscript. The scRNA-seq data showed that, in the control group, germ cells progressed normally through all meiotic stages and successfully gave rise to spermatids. By contrast, in the Znhit1 knockout group, late pachytene spermatocytes decreased significantly, and only very few subsequent germ cell types were observable (revised Fig. 2F, G). Although a few diplotene spermatocytes and meiotic metaphase I cells were detectable in the scRNA-seq data, these cells still appeared abnormal, as evidenced by their extremely low Pou5f2 expression. We have revised our description of the meiotic arrest stage in the manuscript.

      (2) The authors overlooked the possible effects of Znhit1 deletion on MSCI. Defective MSCI is a well-established cause of pachytene arrest. Actually, the fact that they see X-Y pairing failure should alert them even more strongly to this possibility because MSCI failure is often associated with defective X-Y pairing. This could be easily addressed by examination of their RNAseq data.

      To address the concern that Znhit1 deletion may impact Meiotic Sex Chromosome Inactivation (MSCI), we analyzed XY-linked gene expression using scRNA-seq data from spermatocytes at distinct stages. Our analysis revealed aberrant activation of XY-linked genes in Znhit1-cKO spermatocytes relative to controls. Specifically, 120 XY-linked genes were activated at zygotene, and 119 XY-linked genes were upregulated at pachytene (revised Fig. 4F). This observation directly demonstrates that Znhit1 knockout impairs MSCI, which aligns with our prior characterization of defective X-Y chromosome synapsis in Znhit1-deficient spermatocytes. To explicitly resolve this concern, we have integrated these MSCI-focused RNA-seq analyses into the revised Results section (lines 221-226).
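
The per-stage count of aberrantly activated XY-linked genes described above can be sketched as follows. This is an illustrative outline only: the record layout, the placeholder gene names, the toy fold changes, and the 0.25 log2 fold-change cutoff applied here are assumptions, not the authors' actual analysis code.

```python
from collections import Counter

SEX_CHROMOSOMES = {"X", "Y"}
LOG2FC_THRESHOLD = 0.25  # assumed cutoff for calling a gene upregulated


def count_xy_upregulated(de_records, threshold=LOG2FC_THRESHOLD):
    """Count X- or Y-linked genes upregulated in knockout vs. control, per stage."""
    counts = Counter()
    for record in de_records:
        on_sex_chromosome = record["chrom"] in SEX_CHROMOSOMES
        if on_sex_chromosome and record["log2fc"] > threshold:
            counts[record["stage"]] += 1
    return counts


# Hypothetical differential-expression records (knockout vs. control).
de_records = [
    {"gene": "x_gene_1", "chrom": "X", "stage": "zygotene", "log2fc": 0.9},
    {"gene": "x_gene_2", "chrom": "X", "stage": "zygotene", "log2fc": 0.1},
    {"gene": "y_gene_1", "chrom": "Y", "stage": "pachytene", "log2fc": 1.4},
    {"gene": "x_gene_1", "chrom": "X", "stage": "pachytene", "log2fc": 0.6},
    {"gene": "auto_gene_1", "chrom": "5", "stage": "pachytene", "log2fc": 2.0},
]

xy_counts = count_xy_upregulated(de_records)
print(dict(xy_counts))
```

Autosomal genes are ignored regardless of fold change, so the counts reflect only sex-chromosome genes that escape silencing at each stage.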

      (3) The recombination assays need attention.

      In the text the authors state that they studied RPA2 and DMC1, but the figures show RPA2 and RAD51.

      The RPA counts are not quantitated.

      The conclusion that crossover formation fails (based on MLH1 staining) is not justified. This marker does not appear in wt males until late pachytene, so if cells in this mutant are dying before that stage, MLH1 cannot be assessed.

      The authors state that gH2AX persists in the KO, but I'm not convinced that they are comparing equivalent stages in the wt and KO. In Figure 3C, the pachytene cell is late, whereas in the mutant the pachytene cell is early or mid (when residual gH2AX is expected, even in wt males).

      Previous work (PMID: 23824539) has shown that antibodies reportedly detecting pATM in the sex body are non-specific. I therefore advise caution with the data shown in Figure 3D.

      We appreciate the reviewer’s detailed feedback on our recombination assays and have addressed each concern as follows:

      (1) Discrepancy between text and figures (RPA2/DMC1 vs. RPA2/RAD51): We have corrected this in the revised manuscript.

      (2) Quantitation of RPA2 foci: We have supplemented quantitative analysis of RPA2 foci (revised Fig. S3).

      (3) Conclusion on crossover failure: Single-cell RNA sequencing data from P35 testes definitively confirmed that Znhit1 knockout spermatocytes successfully progressed to the late pachytene stage, ruling out the possibility that our MLH1 staining results are confounded by cell death or arrest before this critical stage. In addition, analysis of transcriptome datasets revealed significant downregulation of important genes required for homologous recombination and crossover formation, including Ccnb1ip1 and Rnf212. Reduced expression of these essential factors may impair the assembly of MLH1 crossover foci. These data demonstrate that ZNHIT1 is essential for proper homologous recombination and crossover formation during male meiosis. We have revised the text to emphasize this context.

      (4) γH2AX persistence and stage matching: We have replaced the images with more representative, stage‑matched pachytene spermatocytes from wild‑type and Znhit1‑KO mice (revised Fig. 2C). Furthermore, prompted by the insightful comment from Reviewer 1, we carefully re‑examined autosomal synapsis and identified abnormal synapsis specifically at the terminal regions of autosomes in Znhit1‑deficient spermatocytes (revised Fig. 3A). These data together confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) pATM staining issue: Following the reviewer’s advice, we carefully reviewed the relevant literature (PMID: 23824539) and confirmed that the anti‑pATM antibody may exhibit non‑specific staining on the XY chromosomes. Accordingly, we have removed the pATM staining data presented in Figure 3D from the revised manuscript to ensure the accuracy and rigor of our results.

      (4) RNAseq data. The authors show convincingly that Znhit1 activates genes that are normally upregulated at the zyg-pachytene transition. They should repeat the analysis for genes normally upregulated at the prelep-lep and lep-zyg transitions to show that this effect is really pachytene-gene specific.

      We appreciate this suggestion. To clarify the stage specificity of ZNHIT1’s regulatory role, we analyzed genes upregulated at the prelep-lep and lep-zyg transitions. Our results showed that Znhit1 knockout had little impact on the overall expression levels of these genes (as shown in revised Fig. 4B). In contrast, as we previously reported, genes upregulated at the zygotene-pachytene transition were markedly downregulated in Znhit1-cKO. These findings further confirm the specificity of ZNHIT1 in regulating pachytene gene expression.

      (5) I am puzzled that the title and overall gist of the study focuses on H2A.Z, when it is Znhit1 that has been deleted.

      We appreciate the reviewer’s observation and have revised the study title as suggested. Specifically, the title is now updated to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis.”

      Reviewer #3 (Public Review):

      Summary:

      Sun et al. present a manuscript detailing the phenotypic characterization of loss of Znhit1 in male germ cells. Znhit1 is a subunit of the chromatin regulating complex SRCAP that functions to deposit the histone variant H2A.Z. Given that meiosis, and specifically meiotic recombination, occurs in the context of the dynamic condensing of chromosomes, the role of chromatin regulators in general, and histone variants specifically, in mammalian meiosis is an active area of research. Previous work has shown that H2A.Z is found at the locations of recombination in plants, although H2A.Z was previously not found at recombination sites in mammalian meiosis. Here the authors use a conditional approach to ablate Znhit1 in spermatocytes and characterize a block in meiosis in prophase I in the transition from pachytene to diplotene stage.

      Strengths:

      The authors combine current methods in immunohistochemistry and functional genomics to provide strong evidence of meiotic block upon the loss of Znhit1. They find that loss of Znhit1 leads to reduced incorporation of the histone variant H2A.Z, specifically at promoters and enhancers. Further, RNA sequencing found more genes are down-regulated upon loss of Znhit1 compared to upregulated, suggesting that incorporation of H2A.Z is critical for the expression of genes necessary for successful meiotic progression.

      A strength of the manuscript is tying the locations of changes in H2A.Z deposition with binding of the transcription factor A-MYB, providing a mechanism that can potentially combine the changes in chromatin regulation with variable binding of a transcription factor in gene expression in pachytene stage spermatocytes.

      Weaknesses:

      A weakness is the single-cell RNA experiment using cells from 16-day-old male mice. The authors suggest that the rationale for the experiment was to determine where the Znhit1-sKO mutant showed an arrest in meiosis, and claim that this is the pachytene stage. However, in the 'first wave' of meiosis 16-day-old mice are just beginning to enter pachytene, so cells from later meiotic stages will be largely absent in these tubules. This is clear from the UMAP showing a similar pattern of cell distributions between wild-type and mutant mice. Using older mice would have better demonstrated where the mutant and wild-type mice differ in cell-type composition.

      We appreciate the reviewer’s constructive comment. To resolve this issue, we have added new scRNA‑seq data from testes of P35 mice, which harbor a full spectrum of meiotic stages, including late pachytene, diplotene, metaphase I spermatocytes, and post-meiotic spermatids. Compared with wild-type controls, Znhit1-sKO testes exhibited a marked reduction in late pachytene spermatocytes and a near-complete loss of post-pachytene cell types, directly validating the pachytene-stage meiotic arrest (revised Fig. 2F, G). All updated analyses have been integrated into the manuscript to strengthen our conclusions.

      The authors use the term pachytene genome activation (PGA) in the manuscript to suggest a novel process by which genes are specifically increased in expression in the pachytene stage of meiotic prophase I, without reference to literature that establishes the term. If the authors are putting forward a new concept defined by this term, it would strengthen the manuscript to describe it further and delineate what the genes are that are activated and discuss potential mechanisms.

      We appreciate the reviewer’s valuable feedback on our use of the term "pachytene genome activation (PGA)".

      To address this, we have revised the text to explicitly frame PGA as a stage-specific transcriptional program observed in our data, defined by the coordinated upregulation of a distinct set of genes during the pachytene stage of meiotic prophase I.

      (1) Definition and Gene Set: Using the scRNA-seq dataset, we formally defined PGA as the transcriptional wave characterized by genes with increased expression in pachytene vs. zygotene spermatocytes (n = 1,560 genes). Functional enrichment analysis shows these genes are primarily involved in DNA repair, cilium organization, and spermatid development (Table S3), consistent with the biological process of germ cell development.

      (2) Relationship to existing literature: While PGA as a term is not widely established, our data align with prior observations of pachytene-specific transcriptional upregulation (Alexander et al., 2023; Ernst et al., 2019; Turner, 2015). Importantly, Alexander et al. reveal that in late meiotic stages, starting from pachynema, chromatin has a ~3-fold increase in transcription. We have added these citations to clearly illustrate the relevant advances in the field (lines 68-71).

      (3) Regulation of pachytene-stage gene expression: We further delineate that PGA is regulated by ZNHIT1-dependent H2A.Z deposition. Znhit1 deletion resulted in significant downregulation of 70.1% (1,094 out of 1,560) of these genes. This links PGA to chromatin-based regulation, where ZNHIT1-dependent H2A.Z deposition enables pachytene-specific transcription.
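
The gene-set logic in points (1) and (3) above can be sketched as a pair of set operations followed by a percentage. This is a hedged illustration on invented values: the gene names, the toy fold changes, and the use of a 0.25 log2 fold-change cutoff for this particular definition are assumptions. Only the counts 1,094 and 1,560 are taken from the response.

```python
LOG2FC_THRESHOLD = 0.25  # assumed cutoff; the response does not state one for this step


def upregulated(log2fc_by_gene, threshold=LOG2FC_THRESHOLD):
    """Genes whose log2 fold change is above the threshold."""
    return {gene for gene, lfc in log2fc_by_gene.items() if lfc > threshold}


def downregulated(log2fc_by_gene, threshold=LOG2FC_THRESHOLD):
    """Genes whose log2 fold change is below the negative threshold."""
    return {gene for gene, lfc in log2fc_by_gene.items() if lfc < -threshold}


# Hypothetical log2 fold changes (toy values, not real data).
pachytene_vs_zygotene = {"geneA": 1.2, "geneB": 0.8, "geneC": 0.1, "geneD": 2.0, "geneE": -0.5}
cko_vs_control = {"geneA": -1.0, "geneB": -0.6, "geneC": -0.9, "geneD": 0.05, "geneE": 0.3}

# Step 1: the pachytene-activated set (genes up in pachytene vs. zygotene).
pga_genes = upregulated(pachytene_vs_zygotene)

# Step 2: the members of that set that are downregulated in the knockout.
affected = pga_genes & downregulated(cko_vs_control)
toy_fraction = len(affected) / len(pga_genes)

# Step 3: the same ratio using the counts reported in the response.
reported_percent = 100 * 1094 / 1560
print(sorted(pga_genes), sorted(affected), round(reported_percent, 1))
```

With the reported counts, 1,094 of 1,560 rounds to the 70.1% quoted above.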

      Generally speaking, the authors present solid evidence for a pachytene block in male germ cell development in mice lacking Znhit1 in spermatocytes. The evidence supporting a change in gene expression during pachytene, that more genes are downregulated in the mutant compared to increased expression, and changes in histone modification dynamics and placement of H2A.Z all support a role in alterations in meiotic gene regulation. However, the support that changes in H2A.Z impacting meiotic recombination (as suggested in the manuscript title) is less supported, rather than a general cell arrest in the pachytene stage leading to cell death. The conclusions around the role of Znhit1 influencing meiotic recombination directly could use further justification or mechanistic hypothesis.

      We acknowledge the reviewer’s comments. Indeed, existing data support the presence of a pachytene block in spermatocytes of Znhit1-deficient mice, along with aberrant pachytene gene expression and impaired H2A.Z deposition.

      In response, we made the following revisions: (1) we adjusted the manuscript title and conclusion to reduce emphasis on a direct H2A.Z-recombination link and to focus instead on ZNHIT1/H2A.Z in pachytene gene regulation and meiotic progression; (2) we now state that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery (lines 314-319).

      Reviewer #3 (Recommendations For The Authors):

      Quality of the images for meiotic spreads - images have low contrast and are tiny. It is difficult to see the SYCP3 results even when the images are magnified on the computer screen.

      We have provided new images with high resolution to ensure a clear visualization of SYCP3 signals.

      Line 165 - indicates the results for DMC1, although the figure suggests the results are for RAD51 foci.

      We have corrected this mistake.

      Line 306 - this manuscript 'confirms' that H2AZ is not found at mammalian recombination sites, a result already in the literature.

      We have corrected this mistake (lines 309-312).

      Reviewing Editor Comments:

      Major points and revisions highlighted by the reviewers:

      (1) Meiotic prophase in Znhit1KO: The main questions to clarify are the stage and status of progression, the analysis of apoptosis, and the consequences of gene expression on the X and Y. Additional analysis for DSB repair foci, gH2AX is also required. These analyses are needed to respond to reviewer 2. Even if H2AZ was not detected at recombination hotspots, it may be possible that it plays a role in DSB repair but the level is too low for detection. This should be discussed as H2AZ was shown to be involved in DNA repair.

      We sincerely appreciate the reviewing editor’s constructive comments.

      (1) Stage and progression of meiotic prophase: We performed additional scRNA-seq on P35 testes. The results confirmed that Znhit1-KO spermatocytes arrest at late pachytene and that post-pachytene stages (diplotene, metaphase I) were nearly absent (revised Fig. 2F, G).

      (2) Apoptosis analysis: We addressed this by showing that apoptosis-related genes were upregulated in pachytene spermatocytes at the single-cell level (revised Fig. 4D). To further validate this finding, we performed scRNA-seq analysis on P35 testis samples. Our results revealed a marked reduction in late pachytene spermatocytes in Znhit1-cKO testes (revised Fig. 2F, G), consistent with apoptotic depletion of pachytene-stage cells. Together, these data confirm that Znhit1 ablation impairs pachytene-stage spermatocyte development.

      (3) X/Y gene expression consequences: To address this key point, we performed stage-resolved analysis of XY-linked gene expression using scRNA-seq data from different-stage spermatocytes. Compared with controls, we detected aberrant ectopic activation of XY-linked genes in Znhit1-KO spermatocytes: 120 XY-linked genes were inappropriately activated at zygotene, and 119 remained abnormally upregulated at pachytene (revised Fig. 4F). These results provide direct evidence that Znhit1 deletion impairs Meiotic Sex Chromosome Inactivation (MSCI).

      (4) DSB repair issue: We have replaced the images with more representative, stage‑matched pachytene spermatocytes (revised Fig. 3C). The revised images show consistently increased γH2AX signals in Znhit1-KO spermatocytes. Prompted by Reviewer 1’s comment, we identified abnormal synapsis at autosomal terminal regions in mutant cells. Together, these results confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) Potential role of H2A.Z in DSB repair: Though H2A.Z was nearly undetectable at recombination hotspots, we discuss two possibilities: (1) ZNHIT1-H2A.Z depletion dysregulated DSB repair-related genes; (2) Current ChIP-seq sensitivity may miss low-abundance H2A.Z at hotspots, which could support repair via chromatin remodeling. Future high-resolution assays (super-resolution imaging, DSB-targeted ChIP-seq) are proposed to validate this. We agree that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery.

      (2) Gene expression analysis. The first consequence of H2AZ depletion is gene expression downregulation. However, it may be not surprising that some genes are down and others upregulated. There are likely secondary and indirect effects including the upregulation of some genes. The authors should explain and discuss this point such as to answer to questions raised by reviewer 1 and 2.

      The primary consequence of H2A.Z depletion in pachytene spermatocytes is indeed widespread downregulation of genes. We explain the coexistence of upregulated genes through three key points.

      (1) Technical differences between scRNA-seq and bulk RNA-seq (addressing Reviewer 1): scRNA-seq captures cell-type-specific differentially expressed genes that bulk RNA-seq masks (bulk averages signals across mixed cells, hiding changes in rare subsets). Additionally, scRNA-seq uses a lower log2(fold change) threshold (0.25 vs. 1 in bulk RNA-seq), detecting subtle upregulations missed by bulk analysis.

      (2) No dead cell contamination (addressing Reviewer 1): Stringent quality control excluded cells with >15% mitochondrial RNA. Apoptosis-related genes showed no significant correlation with mitochondrial RNA fractions (Pearson correlation coefficient, r = -0.02; please see Author response image 1), ruling out dead cell transcriptome interference.

      (3) Secondary/indirect effects (addressing Reviewers 1 & 2): Upregulated genes likely result from indirect regulatory cascades. H2A.Z depletion may disrupt upstream transcription factors, leading to compensatory upregulation of their downstream genes or cell stress responses to meiotic arrest. Notably, Znhit1 knockout specifically impacts genes upregulated at the zygotene-pachytene transition, while genes upregulated at preleptotene-leptotene or leptotene-zygotene transitions remain largely unaffected (revised Fig. 4B), confirming the specificity of H2A.Z’s direct regulatory role and framing upregulation as non-targeted indirect effects.
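
Point (1) above, on why single-cell and bulk analyses can disagree, can be illustrated with a small worked example. The sketch below uses invented expression values and an assumed 10% rare-cell fraction. The two cutoffs (0.25 and 1) are the ones quoted in the response, and applying them to the absolute log2 fold change is an assumption.

```python
import math

SC_THRESHOLD = 0.25   # scRNA-seq log2FC cutoff quoted in the response
BULK_THRESHOLD = 1.0  # bulk RNA-seq log2FC cutoff quoted in the response


def log2_fold_change(knockout_mean, control_mean):
    """log2 ratio of mean expression in knockout vs. control."""
    return math.log2(knockout_mean / control_mean)


def passes(log2fc, threshold):
    """True if the absolute log2 fold change meets the cutoff (assumed rule)."""
    return abs(log2fc) >= threshold


# Hypothetical scenario: a gene doubles in a cell type that makes up 10% of the
# sample and is unchanged in every other cell.
rare_fraction = 0.10
control_expr = 1.0
subset_ko_expr = 2.0

# Within the rare cell type, the change is a full doubling.
subset_lfc = log2_fold_change(subset_ko_expr, control_expr)

# Bulk measurement averages over all cells, diluting the change.
bulk_ko_expr = rare_fraction * subset_ko_expr + (1 - rare_fraction) * control_expr
bulk_lfc = log2_fold_change(bulk_ko_expr, control_expr)

# Separately, a modest 1.3-fold change clears the lower cutoff only.
modest_lfc = log2_fold_change(1.3, 1.0)

print(round(subset_lfc, 3), round(bulk_lfc, 3), round(modest_lfc, 3))
```

In this toy case the doubling is visible within the rare cell type but falls well below the bulk cutoff once averaged, and the 1.3-fold change is called only under the lower single-cell cutoff.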

      (3) The authors should also test the effect of Znhit1KO on the 1196 genes (up PreL/L) and 1325 (up L/Z) as shown in Figure 5D for the PGA. Also in Figure 5B, there is no evaluation of the statistical significance of the variation, this should be revised. X and Y genes should be analysed. KAS-Seq should be correlated with gene expression analysis, and several points as mentioned in the reviews below should be better explained and discussed.

      (1) Effect of Znhit1-KO on PreL/L- and L/Z-upregulated genes: We analyzed the 1196 genes upregulated at the PreL/L transition and the 1325 genes upregulated at the L/Z transition. Znhit1 knockout had minimal effect on the expression of these early meiotic gene sets (revised Fig. 4B), whereas genes activated at the zygotene‑pachytene transition were strongly downregulated in Znhit1-KO spermatocytes. These results confirm the specific role of ZNHIT1 in regulating pachytene‑stage gene expression. We have also added a statistical evaluation for the variation shown in Fig. 4B.

      (2) X/Y-linked gene analysis: Analysis of stage‑resolved scRNA‑seq revealed aberrant ectopic activation of 120 XY‑linked genes at zygotene and 119 at pachytene in Znhit1-KO spermatocytes (revised Fig. 4F), demonstrating impaired Meiotic Sex Chromosome Inactivation (MSCI).

      (3) KAS-seq correlation with gene expression: We analyzed the link between KAS‑seq signals and gene expression, and we found that Znhit1 depletion caused a global reduction in KAS‑seq signals, especially at promoters of downregulated genes (revised Fig. S8). Genes with increased expression showed low KAS‑seq signals in both control and mutant groups, likely reflecting indirect regulation. These results highlight the essential role of ZNHIT1 in transcriptional regulation.

      (4) The title should refer to Znhit1, and the effect on meiotic recombination activities may be an indirect consequence of prophase progression arrest, even if some recombination genes are downregulated. This point is important as noted by reviewer 3.

      We fully acknowledge Reviewer 3’s key point and have revised the manuscript title to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis” to reduce emphasis on a direct H2A.Z-recombination link.

      Regarding meiotic recombination activities: The downregulation of recombination-related genes (e.g., Ccnb1ip1, Rnf212) stems from impaired pachytene-stage transcriptional programs caused by ZNHIT1-dependent H2A.Z deposition defects, which in turn leads to prophase progression arrest. Thus, the observed recombination abnormalities may be a secondary consequence of the meiotic prophase arrest, rather than a direct regulatory effect of ZNHIT1 on recombination machinery. This clarification has been integrated into the Discussion section (lines 314-318).

      (5) The recent structural analysis of SRCAP should be cited: Yu et al. Cell Discovery (2024) 10:15 https://doi.org/10.1038/s41421-023-00640-1.

      We have cited this reference in this revised manuscript (lines 234-236).

      (6) The authors should read and answer the specific revisions asked for by the reviewers.

      We have thoroughly read and systematically addressed all specific revisions requested by Reviewers 1, 2, and 3, as detailed in the revised manuscript and supplementary data.

      References

      Alexander, A.K., Rice, E.J., Lujic, J., Simon, L.E., Tanis, S., Barshad, G., Zhu, L., Lama, J., Cohen, P.E., and Danko, C.G. (2023). A-MYB and BRDT-dependent RNA Polymerase II pause release orchestrates transcriptional regulation in mammalian meiosis. Nature communications 14.

      Cole, L., Kurscheid, S., Nekrasov, M., Domaschenz, R., Vera, D.L., Dennis, J.H., and Tremethick, D.J. (2021). Multiple roles of H2A.Z in regulating promoter chromatin architecture in human cells. Nature communications 12, 2524.

      Ernst, C., Eling, N., Martinez-Jimenez, C.P., Marioni, J.C., and Odom, D.T. (2019). Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nature communications 10, 1251.

      Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187.

      Sporrij, A., Choudhuri, A., Prasad, M., Muhire, B., Fast, E.M., Manning, M.E., Weiss, J.D., Koh, M., Yang, S., Kingston, R.E., et al. (2023). PGE(2) alters chromatin through H2A.Z-variant enhancer nucleosome modification to promote hematopoietic stem cell fate. Proceedings of the National Academy of Sciences of the United States of America 120, e2220613120.

      Turner, J.M. (2015). Meiotic Silencing in Mammals. Annu Rev Genet 49, 395-412.

      Wu, T., Lyu, R., You, Q., and He, C. (2020). Kethoxal-assisted single-stranded DNA sequencing captures global transcription dynamics and enhancer activity in situ. Nature methods 17, 515-523.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the relationship between 3D chromatin architecture and innate immune gene regulation in monocytes from patients with alcohol-associated hepatitis (AH). Using Hi-C technology, they attempt to identify structural changes in the genome that correlate with altered gene expression. Their central claim is that genome restructuring contributes to the hyper-inflammatory phenotype associated with AH.

      Strengths:

      (1) The manuscript employs Hi-C technology, which, in principle, is a powerful approach for studying genome organization.

      (2) The focus on disease-relevant genes, particularly innate immune loci, provides a contextually important angle for understanding AH.

      Weaknesses:

      (1) Sample Size: The study relies on an exceptionally small cohort (4 AH patients and 4 healthy controls), rendering the results statistically underpowered and highly susceptible to variability.

      (2) Hi-C Resolution unpaired to RNA seq: The data are presented at a resolution of 100kb, which is insufficient to uncover meaningful chromatin interactions at the level of individual genes. This data is unpaired.

      (3) Functional Validation: The manuscript lacks experiments to directly link changes in chromatin architecture with gene expression or monocyte function, leaving the claims speculative.

      (4) Data Integration: The lack of Hi-C with ATAC and RNA-seq data handicaps the analysis and really makes it superficial. In short, it does not convincingly demonstrate a functional relationship.

      (5) Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      Appraisal of the Aims and Results:

      The manuscript sets out to establish a connection between chromatin architecture and AH pathology. However, the study fails to achieve its stated aims due to inadequate methods and insufficient data. The conclusions drawn from the Hi-C analyses alone are poorly supported, and the lack of functional validation undermines the credibility of the proposed mechanisms. Overall, the results do not provide compelling evidence to substantiate the authors' claims.

      Impact on the Field and Utility to the Community:

      The work, in its current form, is unlikely to have a meaningful impact on the field. The limited scope, methodological shortcomings, and lack of robust data significantly diminish its potential utility. Without addressing these critical gaps, the study does not offer new insights into the role of genome architecture in AH or provide useful methodologies or datasets for the community.

      Additional Context:

      The manuscript would benefit from a more comprehensive analysis of potential mechanisms underlying the observed changes, including the interplay between chromatin architecture and epigenetic modifications. Furthermore, longitudinal studies or therapeutic interventions could provide insights into the dynamic aspects of genome restructuring in AH. These considerations are entirely absent from the current study.

      Conclusion:

      The manuscript does not achieve its stated goals and does not present sufficient evidence to support its conclusions. The limitations in sample size, resolution, and experimental rigor severely hinder its contribution to the field. Addressing these fundamental flaws will be essential for the work to be considered a meaningful addition to the literature.

      Reviewer #2 (Public review):

      Summary:

      Dr. Adam Kim and collaborators study the changes in chromatin structure in monocytes obtained from patients with alcohol-associated hepatitis (AH) when compared to healthy controls (HC). Through the usage of high throughput chromatin conformation capture technology (Hi-C), they collected data on contact frequencies between both contiguous and distal DNA windows (100 kb each), mainly within the same chromosome. From the analyses of those data in the two cohorts under analysis, the authors describe frequent pairs of regions subject to significant changes in contact frequency across cohorts. Their accumulation onto specific regions of the genome (referred to as hotspots) motivated the authors to narrow down their analyses to these disease-associated regions, in many of which, the authors claim, a number of key innate immune genes can be found. Ultimately, the authors try to draw a link between the changes observed in chromatin architecture in some of these hotspots and the differential co-expression of the genes lying within those regions, as ascertained in previous single-cell transcriptomic analyses.

      Strengths:

      The main strength of this paper lies in the generation of Hi-C data from patients, a valuable asset that, as the authors emphasize, offers critical insights into the role of chromatin architecture dysregulation in the pathogenesis of alcohol-associated hepatitis (AH). If confirmed, the reported findings have the potential to highlight an important, yet overlooked, aspect of cellular dysregulation, namely chromatin conformation changes, not only in AH but potentially in other immune-related conditions with a component of pathological inflammation.

      Weaknesses:

      The two weaknesses of the work that I regard as most important are more methodological than conceptual. The first concerns the insufficient description of how some key types of genomic regions are defined, such as topologically associated domains, DNA hotspots, or even DNA loci showing significant changes in contact frequency between AH and HC. In spite of the importance of these concepts in the paper, no operational, explicit description of how they are defined, from a statistical point of view, is provided in the current version of the manuscript.

      Without these definitions, some of the claims that authors make in their work become hard to sustain. Some examples are the claim that randomizing samples does not lead to significant differences between cohorts; the claim that most of the changes in contact frequency happen locally; or the claim that most changes do not alter the structure of TADs, but appear either within, or between TADs. In my viewpoint, specific descriptions and implementation of proper tests to check these hypotheses and back up the mentioned specific claims, along with the inclusion of explicit results on these matters, would contribute very significantly to strengthening the overall message of the paper.

      The second notable weakness of the study pertains to the characterization of the changes observed around immune genes in relation to genome-wide expectations. Although the authors suggest that certain hotspots contain a high number of immune-related genes, no enrichment analysis is provided to verify whether these regions indeed harbor a higher concentration of such genes compared to other genomic areas. It would be important for readers to be promptly informed if no such enrichment is observed, for in that case, the presence of some immune genes within these hotspots would carry more limited implications.

      Additionally, the criteria used to define a hotspot are not clearly outlined, making it difficult to assess whether the changes in contact frequencies around the immune genes highlighted in figures 5-8 are truly more pronounced than what would be expected genome-wide.

      Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy individuals and of patients with Alcohol-associated Hepatitis.

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture, then they should carefully FACS-sort the equivalent monocyte populations from the healthy and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs), and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

      We thank the reviewers for their careful and thorough examination of our manuscript. We agree with all of their comments regarding the limitations of the study. Many of the criticisms focus on the small sample size of our study (n=4 for healthy controls and disease patients) in both Hi-C and single-cell RNA-seq experiments, and that these experiments are unpaired, or in other words, PBMCs came from different patients for each experiment.

      Unfortunately, these experiments are fairly complicated to perform, requiring patient cells and very expensive deep sequencing. We are not currently in a position to be able to easily or cost effectively increase sample size. In the case of Hi-C, we still believe our study to be of value as Hi-C is not a commonly used technique to study disease effects on chromatin, and very few studies have employed a large enough sample size to perform statistical comparisons. Additionally, to analyze the data at a higher resolution would require deeper sequencing, and unfortunately we do not have the resources to sequence these libraries deeper. Regarding the single-cell RNA-seq data, this dataset was generated for an earlier study [1] focusing on gene expression responses to LPS, and we were unable to get PBMCs from exactly the same patients to perform the Hi-C study.

      We disagree that our study has limited scientific value. Our study is the first to use Hi-C to show that the 3D genome architecture of primary monocytes is changed in a disease context. The only other study to follow a similar approach performed Hi-C in monocytes from 2 healthy and 2 systemic lupus erythematosus (SLE) patients, and in that study the data from both patients were combined prior to comparison. No statistics were performed, and the authors concluded that there were no differences in genome architecture due to disease. They did find differences between primary monocytes and the THP1 monocytic cell line, but this comparison also lacked statistical analysis. Their conclusion was that inflammatory disease may not lead to genome-wide changes in architecture. Our study, though it concerns a very different disease than SLE, shows statistically significant differences between AH and healthy controls. We believe our study lays the groundwork for how Hi-C can be used to study genome architecture in human disease, and the possible downstream effects.

      Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      This is an interesting suggestion. This dataset contains only 4 AH patients, for whom we have included basic clinical data in Supplemental Table 1, including Age, HCA1c, Bilirubin, AST, ALT, Creatinine, Albumin, and MELD score. Three of the four patients have severe AH, while one (AH2) has moderate AH. Despite one patient being moderate, all four AH patients had similar correlations with each other, suggesting that the disease-specific differences we observed do not reflect severity. More patient samples are needed to determine if genome architecture changes throughout disease progression. We have added this important discussion to the manuscript (page 12, lines 5-14).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The criteria used to determine which pairs of regions exhibit significant differences in contact frequency between alcohol-associated hepatitis (AH) and healthy controls (HC) are not disclosed. It would be beneficial for the authors to provide this information, including details such as the number of pairs tested, the nature of the statistical tests conducted, the method of multiple testing correction applied, as well as the significance thresholds used, and the number of loci-pairs below these thresholds for each chromosome. This information would greatly enhance the reader's understanding of the relevance of the reported findings.

      Thank you for this comment, though we are not sure we totally understand. All of our statistics were performed using multiHiCcompare [2], where we input all 8 datasets (.hic files from Juicer), then measured statistical differences between defined groups (HC vs AH). For our randomization studies, we randomized the group comparisons, so each group contained a mix of HC and AH.

      Second, a formal statistical definition of what constitutes a hotspot would be valuable for clarity.

      Thank you for this suggestion. Initially, hotspots were defined simply as regions of the genome with a high frequency of highly significant differential contacts. We have now adopted a more formal definition of “hotspot” based on similar criteria. A hotspot is defined by both adjusted p value and frequency of locations. First, we filtered all pair-wise chromosomal interactions with a very stringent cutoff (padj < 0.0000001) to focus on only the most changed coordinates (Supplemental Table 4). Then we looked for regions of the genome with a high frequency of these differential locations. Borders for each hotspot were determined more liberally by looking at the full list of differential spots (padj < 0.05). Then we used code to list genes within each interacting region. We have added these important details to the Methods (page 14, lines 11-14).
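
      The two-step definition above (a stringent filter, then a search for dense regions) can be illustrated with a minimal sketch. Only the padj < 0.0000001 cutoff comes from the response; the 1 Mb counting window, the minimum of three hits, the tuple layout, and the function name are hypothetical choices made for illustration, and the more liberal border step (padj < 0.05) is not implemented here. This is not the authors' code.

```python
from collections import Counter

STRICT_PADJ = 1e-7   # stringent cutoff quoted in the response
WINDOW = 1_000_000   # assumption: 1 Mb counting window
MIN_HITS = 3         # assumption: minimum stringent hits per window


def find_hotspots(contacts, window=WINDOW, min_hits=MIN_HITS):
    """Count stringent differential contacts per genomic window.

    contacts: iterable of (chrom, bin_start, padj) tuples (hypothetical layout).
    Returns a sorted list of (chrom, window_start, n_hits) for dense windows.
    """
    counts = Counter(
        (chrom, (start // window) * window)
        for chrom, start, padj in contacts
        if padj < STRICT_PADJ
    )
    return sorted(
        (chrom, win, n) for (chrom, win), n in counts.items() if n >= min_hits
    )


# Toy input, invented purely for illustration.
example = [
    ("chr1", 100_000, 1e-9),
    ("chr1", 300_000, 1e-8),
    ("chr1", 700_000, 1e-10),
    ("chr1", 900_000, 0.01),     # significant, but fails the stringent cutoff
    ("chr2", 5_200_000, 1e-12),  # isolated hit, too sparse to form a hotspot
]
print(find_hotspots(example))
```

      Stating the window size and the minimum hit count explicitly, whatever values are actually used, is the kind of operational detail the reviewer asked for.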

      Third, a clear definition of the criteria used to identify different topologically associated domains (if these were indeed defined in the data and/or utilized in the analyses) would also be a helpful addition.

      Thank you for this suggestion. We did not identify TADs or use them in any of these analyses.

      Likewise, several statements throughout the paper lack support from specific analyses, although it should be feasible to implement such analyses (or at least present them if they have already been conducted) to substantiate these claims:

      If randomizing samples does not result in significant differences between (randomized) cohorts, it would be beneficial to provide insights into the number of loci pairs that exhibit differences in frequency when using both the actual and randomized cohorts.

      Thank you for asking this question, as this is an important point. Using multiHiCcompare, if we compare HC (n=4) to AH (n=4), we get the results in the figures and supplementary data, but if we randomize Group 1 (HC, HC, AH, AH) vs Group 2 (HC, HC, AH, AH), we get almost no significant changes in contact frequency. To show this more robustly, we performed 5 randomized comparisons and found far fewer changes in contact frequency between groups. This shows that the changes in contact frequency we report are not random, but rather reflect a real difference between AH and HC. This point has been added to the Results (page 6, lines 15-17) and Methods (page 14, lines 16-21).
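
      As a rough guide to how many such randomized comparisons exist, the sketch below enumerates every balanced split in which each group holds two healthy and two AH samples. The sample labels and the function name are hypothetical, and the differential analysis itself (run with multiHiCcompare in the response) is not reproduced.

```python
from itertools import combinations

# Hypothetical sample labels: four healthy controls and four AH patients.
hc = ["HC1", "HC2", "HC3", "HC4"]
ah = ["AH1", "AH2", "AH3", "AH4"]


def balanced_splits(hc, ah, k=2):
    """Return all unordered splits into two groups with k HC and k AH each."""
    samples = set(hc) | set(ah)
    splits = set()
    for hc_pick in combinations(hc, k):
        for ah_pick in combinations(ah, k):
            group1 = frozenset(hc_pick + ah_pick)
            group2 = frozenset(samples - group1)
            # Store as an unordered pair so swapped groups count only once.
            splits.add(frozenset((group1, group2)))
    return splits


print(len(balanced_splits(hc, ah)))
```

      With four samples per cohort this gives 18 distinct balanced splits, so the five randomized comparisons described above are a subset of the possible ones.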

      If most changes in contact frequency occur locally, it would be useful to visualize the relationship between effect sizes and/or significance levels for the observed differences in frequency in relation to the distance between the involved loci. Additionally, comparing these results to the average baseline contact intensities as a function of distance would be informative. This comparison could help determine whether the distance decay in effect size/significance for the differences between AH and HC is faster or slower than the decay rates for baseline contact frequencies.

      This is a good suggestion. In our initial analysis, we made a number of figures relating chromosome positions, distance between loci, and statistics regarding the differential contact frequency. In the initial submission, we only showed Figure 3, which shows the logFC (log fold change) for the differential contact frequency by chromosomal position on both sides. To address this question, we have added a supplemental figure showing logFC as a function of the distance between two loci (new Supplemental Figure 3).

      Similarly, the assertion that most changes do not affect the structure of topologically associated domains (TADs) but occur either within or between TADs should be supported by specific testing, or otherwise removed.

      Thank you. Yes, we have adjusted the language in the Discussion.

      Furthermore, the authors should clarify whether differences in chromatin conformation are more pronounced around immune genes compared to genome-wide expectations. If this is not the case, it would be helpful to quantify the intensity of these differences around the highlighted genes in relation to the rest of the genome. To achieve this, I would suggest the following:

      Conduct enrichment analyses on the genes located within the most prominent hotspots to determine whether they are significantly enriched in immune genes (and, or, alternatively, in any other functional category).

      Estimate the average absolute fold change in contact frequency within all topologically associated domains (TADs) identified in the study. This would allow for the identification of immune gene-containing TADs highlighted in Figures 5-8, providing readers with a quantitative understanding of how anomalously different these genomic regions are with regards to the magnitude of its alterations in AH, compared to the rest of the genome.

      While some of the selected gene clusters appear to co-localize well with topologically associated domains (e.g., Figures 5A, 8A), others seemingly encompass either multiple TADs (Figure 6) or only portions of them (Figure 7). This should be clarified.

      Thank you, this is a great suggestion. In order to be as unbiased as possible, we took all genes present in the regions with the most significant changes in contact frequency (Supplemental Table 4), which we had used to identify the hotspots. You are correct: we do in fact see enrichment of genes involved in innate immune signaling. This has been added to Results (page 7, lines 19-25) and Figure 4.
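
      An over-representation test of the kind the reviewer requested can be sketched with a one-sided hypergeometric test. The response does not say which enrichment method was used, so this is only one common choice, and every count in the example is hypothetical rather than taken from the study.

```python
from math import comb


def hypergeom_pvalue(k, population, successes, draws):
    """One-sided P(X >= k) when drawing `draws` genes without replacement
    from `population` genes, of which `successes` carry the annotation."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 1,000 annotated as innate
# immune, 200 genes inside hotspots, 25 of which are innate immune genes.
p = hypergeom_pvalue(25, population=20_000, successes=1_000, draws=200)
print(f"enrichment p-value: {p:.3g}")
```

      Under these made-up numbers about 10 immune genes would be expected by chance, so 25 observed would count as enriched; real counts would have to come from the gene lists in Supplemental Table 4.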

      Finally, there are several minor issues concerning the figures that could be easily addressed to substantially enhance their readability:

      Font sizes in most figures should be increased, particularly for some axis labels and tick marks. This issue affects most figures; for instance, in Figure 4, it hinders the reader's ability to interpret the ranges of the data presented.

      Thank you, the figures have been adjusted.

      Figures 5 to 8 (panels A and B) would benefit significantly from a more consistent format. Specifically, the gene cluster boxes should also be included in the right panels, and the gene locations should be displayed on the left in a uniform format across all figures (e.g., formatting Figures 7 and 8 to match the style of Figures 5 and 6).

      Figures 5 and 6 have a similar structure to each other because we were focusing on all of the genes in that chromosomal region. Figures 7 and 8 are different because we are focusing on how the region around a certain hotspot of interest changes.

      It is also important to note that the genes plotted in Figures 8C and 8D are not the same. Concerning these two panels, it would be valuable to clarify whether the data presented pertains exclusively to monocytes. If so, information regarding the number of cells analyzed and the number of donors from which they were drawn would also be beneficial.

      These figures are generated using scRNA-seq data. They represent all of the genes expressed in that region of the genome, in their chromosomal position. If a gene is not expressed in the scRNA-seq data, then it is not shown. I have debated with myself a lot on how to show gene expression in a region of the genome, but I think this is the clearest way to show this; including the genes that have no expression would make it more confusing. But yes, if you compare HC and AH, you see some differences in the list of genes. We have added more clarity to the figure legend for this figure.

      References

      (1) Kim, A., Bellar, A., McMullen, M. R., Li, X. & Nagy, L. E. Functionally Diverse Inflammatory Responses in Peripheral and Liver Monocytes in Alcohol-Associated Hepatitis. Hepatol Commun 4, 1459-1476 (2020). https://doi.org/10.1002/hep4.1563

      (2) Stansfield, J. C., Cresswell, K. G. & Dozmorov, M. G. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 35, 2916-2923 (2019). https://doi.org/10.1093/bioinformatics/btz048

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.

      Weaknesses:

      The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text, and could benefit from a clearer presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.

      Thank you for your thoughtful feedback. Regarding the discrepancy between experiment and theory in relation to Michaelis-Menten kinetics, we recognize that our initial explanation may not have been explicit enough. Our intent was to illustrate that if DNA binding is a saturable process, then while the absolute concentration of Dl bound to DNA will increase with total Dl levels, the fraction of Dl bound to DNA will decrease. We used Michaelis-Menten kinetics only as a familiar example to convey this concept but did not intend to suggest that the system strictly follows Michaelis-Menten behavior. To clarify this point, we removed mention of Michaelis-Menten as an illustrative analogy and stuck specifically with discussing the system as “saturating.” This primarily affected text in the paragraph starting on Line 204, but also Lines 323-325.
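
      The saturation argument can be made concrete with a minimal sketch, assuming the simplest case of a fixed number of DNA sites that bind Dl hyperbolically. The symbols below ($c$ for free nuclear Dl, $b$ for DNA-bound Dl, $T$ for total Dl, $B_{\max}$ for the site capacity, $K$ for the half-saturation concentration) are illustrative assumptions and are not taken from the manuscript.

```latex
\begin{align}
b &= \frac{B_{\max}\, c}{K + c}, \qquad T = c + b, \\
f &= \frac{b}{T} = \frac{B_{\max}}{K + c + B_{\max}}.
\end{align}
```

      Under these assumptions $b$ rises with $c$ (and hence with total Dl, $T$), while the bound fraction $f$ falls, which is the qualitative behavior described above.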

      Regarding the concern about potential confounding effects due to the presence of wildtype GFP-tagged Dorsal (Dl[wt]-GFP): we understand the importance of addressing this point more directly. Therefore, we have imaged the Dorsal-GFP gradient in embryos expressing the UAS-dl[S280P]-GFP or the UAS-dl[S317A]-GFP constructs in the absence of the BAC-recombineered Dl-GFP construct. In both cases, the dl mutants by themselves were not able to recapitulate enough of the Dl gradient to test our hypotheses. We have added this analysis to Supplemental Figure 4 and mentioned this figure on Lines 333-336 and 354-358. Furthermore, we explicitly mention that it is possible the reason why we failed to reject the null hypothesis in the Toll phosphorylation mutant case may be due to the additional copy of Dl[wt]-GFP (the BAC recombineered construct), with text added to Lines 343-345, 365-369 (Results) and 408-418 (Discussion).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al., use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect is in a location where Dl is nearly absent. As can be seen in Figure 3F, the immobile fraction is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling as well, since the fraction is very low anyway.

      We thank the reviewer for pointing out the places where we could strengthen our explanations. Here we first address the criticism, also raised by the other reviewer, that the fraction of immobile Dl increases only a small amount (Fig. 5A). [In our reply to the next comment, we address the question of biological implications.] We attempted to explain this small effect size in the manuscript; however, we understand that we could clarify further and, given the fact that eLife has no restraints on space, we added more explanation in the main text.

      In essence, even though the effect was statistically significant, the effect size was small because the mutation was “diluted” by the presence of a wildtype Dl protein tagged with GFP. We were willing to deal with this dilution because the alternative was that, according to previous literature, without any wildtype Dl, no Dl gradient would be present in the reduced Toll phosphorylation mutants, and only a very weak Dl gradient (weakened on both ends) would be present in mutants that reduced Cact binding. We were confident that, with our quantitative approaches, we would be able to detect the diluted effect.
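
      To illustrate the dilution argument above, the following minimal Python sketch treats the measured immobile fraction as a signal-weighted average of the wildtype and mutant pools. The function name `pooled_immobile_fraction`, the linear-mixing assumption (every GFP-tagged molecule equally bright), and all numerical values are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def pooled_immobile_fraction(psi_wt, psi_mut, mutant_share):
    """Immobile fraction of a mixed pool of GFP-tagged Dl.

    Assumes every tagged molecule is equally bright, so the pooled
    fraction is a weighted average of the two populations.
    """
    if not 0.0 <= mutant_share <= 1.0:
        raise ValueError("mutant_share must lie in [0, 1]")
    return mutant_share * psi_mut + (1.0 - mutant_share) * psi_wt


# Hypothetical values, for illustration only.
psi_wt = 0.05        # immobile fraction of wildtype Dl-GFP alone
psi_mut = 0.25       # immobile fraction the mutant would show alone
mutant_share = 0.3   # share of the GFP signal contributed by the mutant

observed = pooled_immobile_fraction(psi_wt, psi_mut, mutant_share)
true_effect = psi_mut - psi_wt
observed_effect = observed - psi_wt  # equals mutant_share * true_effect
print(f"true effect: {true_effect:.3f}, observed effect: {observed_effect:.3f}")
```

      Under these assumptions the apparent effect shrinks in proportion to the mutant's share of the signal, which is one way a real difference can remain statistically detectable yet small.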

      However, because both reviewers have criticized this diluted effect, in this resubmission, we have included analysis of GFP-tagged mutants without the presence of wildtype Dl protein. Unfortunately, these embryos lack a discernible Dl gradient and cannot be analyzed in such a way as to test the hypotheses that the mutants were generated for.

      Even so, the effect of the Cact-binding mutant was strong enough that we were able to statistically distinguish it from embryos expressing only wildtype Dl-GFP, even with the dilution effect. On the other hand, we have also included a caveat that our failure to statistically distinguish Toll phosphorylation mutants from wildtype may be due to the dilution effect. We now also explicitly state the concerns about the lack of a discernible Dl gradient and have included figures of full mutants in the supplement. See also our discussion of Reviewer 1’s similar comment.

      While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is written more as a 'tools' paper (i.e., to exemplify how FSC methods and analysis can be used for biological discovery). This is ok, but I think that the authors should discuss further what the biological implications of these findings are, beyond the contribution to uncovering the biophysical parameters.

      Here we underscore the biological implications of our discovery that Cact is present in the nucleus on the dorsal side. The reviewer mentioned that Cact in the nucleus on the dorsal side appears to have little overall effect, because this is the location of the embryo where there is very little Dl in the first place, which raises the question of whether this discovery is impactful.

      While we previously used the final paragraph of the discussion to touch on the implications of this discovery, we acknowledge that we could have spent more time on the explanation. As such, we have expanded this final paragraph into two paragraphs. In the first of the two, we discuss in more detail the implications specifically of the Dl/Cact interactions in the dorsal-most nuclei, as understood by the results of this paper. In brief, knowing that Dl in the dorsal-most nuclei is bound by Cact results in an updated understanding of the Dl gradient, with increased dynamic range, robustness, and precision (but unknown shape).

      In the second of the two paragraphs, we discuss this result in light of our recent work on imaging Cact in live embryos, in which we have shown that Cact is present in all nuclei at roughly uniform levels. Taken together, we suggest that it is possible that Cact is bound to Dl in all nuclei (not just the dorsal-most), which would allow us to estimate the shape of the overall Dl gradient by subtracting off the fluorescence that stems from Dl/Cact complex.

      For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What is then the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should report in Figure 5 also what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

      We appreciate the reviewer’s suggestion that the rejection of the hypothesis that phosphorylation of Dl by Toll impacts Dl/DNA binding could be expanded upon further. For the role of Dl phosphorylation by Toll: we previously mentioned that this phosphorylation is known to enhance the nuclear import or retention of Dl, and that mutation of serine 317 to an alanine abolishes Toll-mediated phosphorylation of Dl, which results in embryos with no Dl gradient. We had also mentioned that phosphorylation of Dl is not known to affect its DNA binding, which is the hypothesis we sought to test by creating the dl[S317A]-GFP mutants. We did not image any mutants, or the UAS-dl[wt]-GFP control, in the lateral regions, for two reasons. First, this region is easily the smallest of the three regions, in terms of the percentage of the DV axis (see Fig. 1A). Second, because of the dilution effect, we knew the effect size would be small, and as such, we imaged only on the extreme ends of the gradient so that the most clear conclusion could be drawn about the effect that Toll phosphorylation might have on DNA binding of Dl.

      The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.

      We agree that there is some distortion of the relative spatial extents of the Dorsal gradient when NCR is used as an independent variable on a plot. However, we prefer the NCR on the horizontal axis because it is closer to the functional variable (Dl concentration, rather than spatial location) for the properties we studied.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I really enjoyed the first part of this paper and have only minor suggestions for improvement of the presentation. I am confused about the experimental approach for the final figure, distinguishing phosphorylation and cactus-dependent effects. I'll divide my comments between "First Part/General Suggestions", "Last Part", and finish with some minor typo observations.

      The gist of the issues with the last part of the paper could boil down to insufficient detail/explanation of the section. The discrepancy with expectation with Michaelis-Menten kinetics is presented in a total of three sentences and is not necessarily obvious to the general readership of eLife. The mutants chosen to distinguish the phosphorylation and cactus mechanisms could be described more (why these? aren't other residues phosphorylated?) and possibly why also having wild-type GFP-Dl in the measurements isn't confounding. Since there is unlimited space in this journal, it may be advisable to use this space to fill out these rationales and ideas.

      First part/General Suggestions:

      (1) For the RICS data, (Figures 1 and 2) there is a nice correlation between WT NC ratio and the selected low/med/hi Dl activity mutants. More-or-less the median values in, say, Figure 1E-G are reflected in Figure 1H. However, with the ccRICS data (Figure 3), it looks like there is less correspondence between the range of fraction bound estimates in, for instance, "ventral" in Figure 3D and '10b' in Figure 3E. Can the authors comment on this? Should the reader be able to make this kind of comparison, or does something about data collection for the wt/NCR measurements preclude direct comparison of magnitudes with the panel of mutants? (imaging setup, laser power, etc)?

      The reviewer is correct that there seems to be a discrepancy in the values of ψ between the wt embryos (ventral side) and the Toll10B embryos. It should be noted that the Toll10B embryos are not “ventral-like” in every way, in part because they have unknown activated Toll levels that might be above or below what is seen at the ventral midline in wildtype embryos, and in part because there is no DV gradient, and thus no shuttling in these embryos that would accumulate total Dorsal on the ventral midline. As such, comparisons between Toll10B embryos and the ventral side of wildtype embryos are not exactly one-to-one, and we are more confident in comparing among the mutants in an allelic series. To address this question, we have added a sentence to the end of the second paragraph of the “Dorsal/DNA binding exhibits a spatial gradient” subsection of the Results (Lines 233-235).

      (2) Materials and methods: Mounting and imaging of Drosophila embryos: the authors cite the "488 nm laser intensity ranged from 0.5% to 3.0%..." The values presented here are not useful for the general reader or an individual looking to replicate these conditions, as emission power produced from such values will vary from instrument to instrument. It is standard in these cases to report an estimated laser power (measured in watts) for each laser line, and a clear description of how such measurements were made (stationary beam, under scanning conditions, with what detector, etc). These measurements are valuable and the authors are strongly encouraged to report such measurements for their setup.

      We appreciate the reviewer’s suggestion and understand the importance of providing absolute laser power values for reproducibility. We have now included the laser power (in watts) for the laser lines on both microscopes used in this study. The revised text can be found in the Materials and Methods section, on Lines 535-536 and 540.

      (3) The presentation of the data in Figure 4 is difficult to understand. Are the kymographs (A lower) representing the entire length of the big white arrow in A upper? Or do the dashed lines indicate the x-axis limits of the kymograph? It is difficult to tell from the figure legend, where the dashed lines are described as "areas where Dl-GFP movement is measured out of the nucleus." I believe that the authors can make these measurements and that Figure 4B reflects properties of "movement" of Dl out of the nucleus, but how they get there from these data is not clear to this reader. Perhaps a cartoon explaining the green lines and the orange lines in the kymograph or tightening the legend would help.

      We thank the reviewer for their feedback and understand the need for greater clarity in the text of the pCF section and in Figure 4. The widths of the kymographs in the lower panels correspond to the full widths of the images in the upper panels. The pCF measurements were taken at the y-coordinates at the level of the white arrows. The dashed vertical lines connecting the upper and lower panels illustrate two cases of locations along the x-axis of the image where Dl is crossing from inside a nucleus to outside. In the two illustrated cases, these crossings are accompanied by either zero Dl molecules being observed to cross the nuclear barrier (ventral image/kymograph on left) or delayed crossing of Dl molecules (dorsal image/kymograph on right). To address this concern, we have added more detail to the Fig. 4 legend and greatly expanded on a discussion of what pCF does in the text (the second and third paragraph of the section). We have also updated Fig. 4 to align with new explanations from the text: namely, describing the y-axis of the kymographs as Δt (instead of log(time)) and explicitly showing that the pair correlation is for pairs of pixels that are Δx = 6 pixels apart. Further details were also added to the relevant Methods section.
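
      As a rough illustration of what a pair correlation calculation does, the Python sketch below correlates the fluorescence at one pixel with the fluorescence a fixed distance away (Δx = 6 pixels, as in Fig. 4) at a later time Δt. It is a simplified sketch under stated assumptions, not the analysis code used in this study: the function name `pair_correlation`, the linear (rather than logarithmic) lag spacing, and the synthetic test signal are all hypothetical.

```python
import numpy as np


def pair_correlation(carpet, x0, dx, max_lag):
    """Pair correlation between pixel x0 and pixel x0 + dx.

    carpet: 2D array with shape (time, position), e.g. repeated line scans.
    Returns the lags (in frames) and G(lag), the normalized correlation of
    the signal at x0 with the signal at x0 + dx `lag` frames later.
    """
    a = carpet[:, x0].astype(float)
    b = carpet[:, x0 + dx].astype(float)
    norm = a.mean() * b.mean()
    lags = np.arange(max_lag + 1)
    g = np.empty(len(lags))
    for tau in lags:
        n = len(a) - tau
        g[tau] = np.mean(a[:n] * b[tau:tau + n]) / norm - 1.0
    return lags, g


# Synthetic example: the signal at pixel 0 reappears at pixel 6 four frames later.
rng = np.random.default_rng(0)
n_frames, n_pixels, delay = 5000, 10, 4
carpet = rng.exponential(1.0, size=(n_frames, n_pixels))
carpet[:, 6] = np.roll(carpet[:, 0], delay)

lags, g = pair_correlation(carpet, x0=0, dx=6, max_lag=20)
peak_lag = int(lags[np.argmax(g)])
print("pair correlation peaks at a lag of", peak_lag, "frames")
```

      In this toy case the correlation peaks at the imposed delay. In a kymograph, a peak that appears only at long Δt, or not at all, would correspond to delayed or absent crossing between the two pixel positions.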

      (4) DV position in the wild-type imaging experiments is operationally determined through measurement of the Dorsal NC ratio. This makes sense, but the strategy is buried in the first paragraph of the results, and not discussed in the M & M. For readers unfamiliar with imaging the fly embryo or the nuances of the Dl gradient, perhaps a sentence or two explaining that embryos were oriented randomly along the DV axis, and DV positions of the imaging region were estimated by measuring the Dl NC ratio.

      We thank the reviewer for this helpful suggestion. To improve clarity, we have added a description of how DV position was determined to the Materials & Methods section (paragraph starting on Line 520). Specifically, we now state that embryos were randomly oriented along the DV axis and that we used the Dorsal NC ratio of intensity as a proxy for measuring the DV position in imaging experiments. Additionally, we have added a statement to the Results section to ensure that this strategy is more clearly introduced (Lines 143-144). We appreciate this recommendation, as it will help readers unfamiliar with fly embryo imaging better understand our approach.

      (5) It would be nice to report the corresponding NC-ratio values for Dl in each of the mutant conditions, perhaps as a supplement to Figure 1. Currently, Figure 1H relies on the (admittedly well-established) properties of the three mutants, but it feels that an additional nice quantitative link in the data can be drawn out here. Do the authors see the strict correlation between the wt and mutant diffusivity measurements at specific NC-ratios?

      We are hesitant to try to draw direct comparisons between the mutants and the behavior of the wildtype embryo at the corresponding NCR. This is because, in the context of these uniform mutants, the NCR is determined by a combination of at least three factors that we cannot measure or control for: the unknown strength of Toll signaling, the unknown capacity of Toll signaling (i.e., the potential saturation of the cytoplasmic enzymes controlled by Toll signaling), and, most importantly, the lack of a shuttling mechanism that concentrates Dl on the ventral side of the embryo. As such, the NCR does not represent a continuous variable that transforms the behavior of one mutant into another (or from mutants into wt DV coordinates), as it does along the DV axis in wildtype embryos. This is why the mutant studies are presented as boxplots. At best, we were comfortable only in using the uniform mutants as an allelic series to produce gross trends. We have added a brief statement describing the shuttling caveat to the Results section (Lines 173-177).

      (6) In the section related to Dl nuclear export, the language used to describe Dl kinetics is ambiguous. The term "movement" is used seemingly as a catch-all for nuclear import-export as distinguished from diffusion. However, diffusion is also a form of movement. Could this section be reworked to explicitly distinguish nuclear import-export and diffusive movements?

      We appreciate the reviewer’s suggestion and agree that the language used to describe Dl kinetics could be more precise. By way of explanation, the pCF analysis calculates the time scale on which Dl can exit the nucleus. pCF only gives a signal if it sees the same Dl molecule twice, at two different locations after some Δt amount of time has passed. Because of this, if a given Dl molecule in a ventral nucleus is being tracked, then that molecule has some probability that it is bound to DNA initially, which means it will take, on average, longer to exit the nucleus than a Dl molecule not initially bound to DNA. Therefore, on the ventral side, the time scale on which Dl exits the nucleus is longer than on the dorsal side (where DNA binding is not happening). This can be true even if the nuclear export rate constants are the same on the ventral side vs the dorsal side. As such, we were careful to choose language that did not imply that we were talking about a nuclear export rate constant. We have added this discussion to the end of the relevant Results section (Lines 308-315).

      We have also revised this section to explicitly distinguish between the mobility associated with exiting the nucleus and diffusive movement, while still trying to distinguish between the time scale of exiting the nucleus vs the nuclear export rate. Specifically, we now refer to ‘time scale of nuclear export’ when discussing transport across the nuclear envelope and reserve the term ‘diffusion’ for passive intracellular movement. Furthermore, we have edited a sentence in this section (Lines 291-293) to describe the distinction we are making between the time scale measured by pCF and the time scale commonly associated with nuclear export (that is, the reciprocal of the rate constant). We hope this clarification improves readability and conceptual clarity.

      Last Part:

      (1) There is an undersold argument centered on Michaelis-Menten kinetics that needs to be explicitly presented, especially since it motivates the final experiments of the paper, which are challenging. In the two sections describing how the data do not adhere to expectations based on Michaelis-Menten kinetics, the assertion that "the fraction of immobile Dl is expected to decrease with increasing nuclear total Dl concentration" is only intuitively true if the system is saturated. Is the system demonstrably saturated? Another interpretation of this would be that these results demonstrate that the system is likely not saturated. In any case, the authors need to devote some space in the introduction and/or results and/or discussion to fully motivate this point.

      We agree that the reviewer has raised an important point: if the system is very far from saturation, then the fraction of immobile Dl is not expected to decrease with increasing nuclear total Dl concentration. But neither would it increase; it would instead stay flat. To correct this mistake, we have edited the sentences in question to acknowledge the far-from-saturation scenario, saying “at best, [the fraction bound] remain[s] constant” (Line 209). As such, our original point, which is that in no case would the fraction immobile increase [unless something else is going on besides affinity-based binding to DNA], is still valid.
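
      The distinction between bound concentration and bound fraction can be sketched with a simple equilibrium model. The Python example below assumes 1:1 binding of Dl to a fixed pool of DNA sites with a single dissociation constant; the function name `bound_fraction` and the parameter values are hypothetical and are not fitted to any data in the manuscript. Under these assumptions the bound fraction is flat far from saturation and falls as total Dl rises, but never increases.

```python
import numpy as np


def bound_fraction(total_dl, total_sites, k_d):
    """Equilibrium fraction of Dl bound to DNA for 1:1 binding.

    The complex C solves C**2 - (D + S + Kd) * C + D * S = 0. The physical
    root is written in a numerically stable form, and C / D is returned.
    """
    b = total_dl + total_sites + k_d
    root = np.sqrt(b * b - 4.0 * total_dl * total_sites)
    return 2.0 * total_sites / (b + root)


# Hypothetical parameters in arbitrary concentration units.
total_sites = 100.0
k_d = 50.0
total_dl = np.logspace(-1, 4, 200)  # from far below to far above saturation

fraction = bound_fraction(total_dl, total_sites, k_d)
low_limit = total_sites / (total_sites + k_d)  # plateau far from saturation

print(f"fraction at lowest total Dl:  {fraction[0]:.3f} (plateau {low_limit:.3f})")
print(f"fraction at highest total Dl: {fraction[-1]:.3f}")
```

      An observed increase in the immobile fraction with total nuclear Dl therefore cannot arise from this kind of affinity-based binding alone, whether or not the system is saturated.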

      (2) Wouldn't any argument on the basis of Michaelis-Menten need to rely on the assumption that the system is at steady-state? Reeves 2012 concludes that during the times measured here, Dl does not reach a steady state. It would be good, in the context of the point above, for the authors to clarify how this impacts the expectations of saturation and the application of M/M kinetics.

      We thank the reviewer for raising this important point. We apologize for not being clear on our points about M/M kinetics and would like to stress again that we are not claiming the system has M/M kinetics. We appealed to M/M kinetics only as a simple, intuitive example of a saturating system to point out the difference between bound concentration vs bound fraction as functions of total concentration. We did this because previous feedback on our manuscript suggested that the difference between these two variables needed to be made clearer. Because this point seemed controversial with both reviewers, we removed all mention of M/M kinetics and simply refer to the system as “saturating.” For further explanation, see the first paragraph of our response to Reviewer 1’s “weaknesses” in the public review.

      (3) It is not clear to me how the inclusion of wild-type, GFP-tagged dorsal in the experimental setup for Figure 5 is not confounding. For the S317 (phospho-) mutant, GFP-tagged alleles of both phospho- and wild-type Dl are expressed. The reasoning is that not enough phospho-mutant Dl gets into the nucleus, and this makes it difficult to distinguish the dorsal from the ventral side of the embryo, so in a dl mutant background, there is expression of wt GFP-dl from a BAC, and nos>Gal4 driven expression of a GFP-tagged S317A mutant dl. The measurements show that on the ventral side of the embryo, there is no difference in the fraction of bound Dl. Couldn't this be predominantly binding of wildtype GFP-Dl? How is this interpretable? Wouldn't it be easier to perform these measurements in a Tl 10b background (or to cross in UAS>Tl[10b]) and for the only GFP-tagged dl to be S317A? The same goes for the S234 mutant (could be done in the pelle mutant background).

      We thank the reviewer for raising the point that the confounding effect of wildtype Dl makes it difficult to interpret the results from the 317A mutant. Under the circumstances of the experimental design, we can best conclude that, if the null hypothesis is incorrect, the effect size was too small to detect with our sample size. As such, we have modified our discussion of the results of this experiment to carefully explain this caveat (rather than confidently saying that Toll phosphorylation has no effect). For further explanation, see the second paragraph of our response to Reviewer 1’s “weaknesses” in the public review, as well as our response to the related question raised by Reviewer 2 in the public review.

      Minor issues/typo stuff:

      (1) This reviewer notes that the submitted materials contain neither line numbers nor page numbers.

      We appreciate the reviewer’s feedback. We have now included line numbers and page numbers in the revised manuscript for easier reference.

      (2) First paragraph of results: "We imaged small regions of the embryo..." The parenthetical statement only cites pixel size and directs the reader to the methods. Without the total number of pixels, the pixel size value does not clarify how "small" the imaged region is. Consider including the xy area, pixel dimensions, and pixel size here to assert the smallness of the imaged area.

      We have added the requested information.

      (3) Second paragraph, Introduction: "Dorsal, one of three (Drosophila) homologs to mammalian NF-kB" (Add Drosophila). Also, aren't these orthologs?

      We have made these changes.

      (4) Last sentence of last paragraph in the introduction: Kind of a throw-away sentence. Consider revising.

      We thank the reviewer for making this point; the sentence was originally constructed to state that our quantitative measurements resulted in a biologically significant discovery. However, because Reviewer 2 also mentioned the question of biological significance, we have changed this final sentence to explicitly mention what the biological significance is: namely, an understanding of the Dl gradient that has superior dynamic range, spatial range, robustness, and precision.

      (5) Where is the median line in the S317A boxplot in Fig 5C?

      The median line is at ψ = 0. We have added an explanation of this to the Figure legend.

      (6) Materials & Methods: Fly transformation, typo: "Drosophila embryos were injected with 0.5 µl of each pUAST construct..." The volume of an entire Drosophila embryo is less than 0.5 µl; please revise the units to reflect the value injected. Most likely an absolute volume was stated where a concentration of the injection solution, delivered at significantly smaller volumes, was intended.

      We thank the reviewer for catching this typo. It was intended to indicate a concentration of 0.5 ng/μL, and we have made the appropriate changes.

      Reviewer #2 (Recommendations for the authors):

      (1) Perhaps this has been described in a prior publication (if this is the case, please simply state this somewhere in the Methods section where Dl-GFP embryos are described), but since Dl-GFP embryos have one copy of endogenous dl and one copy of Dl-GFP, how do potential differences in tagged vs. non-tagged Dl interactions with DNA or Cact affect their findings?

      The reviewer brings up a good point, and we acknowledge that any time a protein is tagged with GFP, the behavior of the protein may be affected. We have now explicitly added this caveat to our discussion in a new paragraph on Lines 420-429.

      (2) In the Discussion section, the authors argue that a major implication of their findings is that, if Cact binds Dl in the nuclei, the true (active) Dl gradient may be unknown unless the unbound Dl is separated from the Dl/Cact (inactive) form. While this is an interesting point, this idea is not supported by the findings of Figure 5B where there is no effect in the fraction of Dl bound to DNA in the reduced Cactus binding mutants. The authors should report what happens in lateral regions in Figure 5 because perhaps there is an effect there (see comment on this in the Public Review).

      We thank the reviewer for the insight, as we did not directly discuss the implications of the middle column of Fig. 5B on our hypothesis. Indeed, our hypothesis is not supported by Fig. 5B; it is instead inconclusive (failure to reject H0). This is why we designed the second experiment (Fig. 5C) to test the Cactus hypothesis, because the effect size would be greater on the dorsal side.

      Furthermore, as pointed out by both reviewers, the presence of wildtype Dl-GFP in these experiments is confounding. We have discussed this elsewhere in our rebuttal, but briefly, this problem resulted in needing larger effect sizes to detect a statistically significant difference between wt and the mutant populations. This was a necessary evil that we were willing to deal with in order to ensure the Dl gradient could be established so that the dorsal vs ventral sides would be distinguishable. We have added a fuller discussion of these issues to the relevant Results section (Lines 333-336, 343-345, 354-359, 365-369) and also the Discussion section (Lines 412-418), including underscoring the fact that, from a falsification standpoint, the results in Fig. 5B do not allow us to reject either null hypothesis, possibly due to the confounding effect of wildtype Dl. We appreciate the reviewer’s point about this, and believe the changes suggested by the reviewer have improved the manuscript.

      On the other hand, we respectfully disagree with the reviewer that investigating either mutant in the lateral regions of the embryo would bear fruit. To a first approximation, the behavior there would be the average of the behaviors on the ventral and dorsal sides. For the S317A mutant, neither the ventral nor the dorsal side was conclusive in regards to our hypotheses. (Although we admit here that further investigation into why the S317A column in Fig. 5C was statistically different from wildtype, in the opposite direction from the S234P mutant, may be interesting in future work.) For the S234P mutant, the data were more conclusive on the side of the embryo where the effect size was expected to be large enough to detect a difference. In the lateral regions, the expectation would be that the effect size would be intermediate, which would make the interpretation of the results more difficult (i.e., more likely to be inconclusive). In contrast, as Fig. 5C is already conclusive, we are not confident there would be more information gained by imaging the lateral regions.

      (3) Is Figure 5A a wild-type embryo? If so, I think that the labels are misleading or unclear. Also, is it the same image as in Figure 1A? If so, I suggest replacing this with a schematic since it does not add any new data.

      We have eliminated the labels for the mutants and have added the following comment to the figure 5 legend “Same embryo as in Fig. 1A”.

      (4) Also in Figure 5, I suggest using labels to indicate the schematics instead of simply using their location. You could use 5A', 5A' and 5A', for example.

      We have made the suggested changes.

      (5) The use of some technical labels makes some figures difficult to read. I suggest using more simple labels for mutants in Figure 3F (replace R063C) or Figure 5B, C (replace S234P and S317A).

      We have made changes to Fig. 3F, Fig. 5B,C, and the corresponding places in the figure legends. We have labeled R063C as ↓DNA, S317A as ↓Toll, and S234P as ↓Cact.

      (6) I suggest reporting p-values consistently. For example, in Figure 4B, they use one or two asterisks to denote p-values less than 0.07 and 0.05, respectively, which is somewhat arbitrary and unconventional. Why not report the actual values as in Figure 5C, for example? (By the way, I would report in Figure 5B the actual p-values as well, since a nonsignificant value is also reported in Figure 5C. Also in Figure 5C, report values in the same notation (decimal or scientific), i.e., either put 0.005 as 5x10^-3 or 10^-3 as 0.001).

      We have made the suggested changes.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen, Tu, and Lu focused on how brain-wide dopamine release dynamically changes during sleep/wake state transitions. Using multi-site fiber photometry to monitor DA release, alongside simultaneous EEG and EMG recordings, the authors show distinct DA dynamics during transitions from NREM to WAKE, REM to WAKE, WAKE to NREM, and NREM to REM. Next, they analyze temporal coordination between regions using cross-correlation analysis. Finally, chemogenetic activation of VTA or DRN but not SNc dopamine neurons is shown to promote wakefulness.

      Strengths:

      The manuscript addresses an interesting question: how brainwide dopamine activity evolves across sleep/wake transitions. The combination of multi-site DA recordings with simultaneous EEG/EMG monitoring is technically sophisticated. The experimental logic is generally clear, and the dataset is rich. The results include several interesting observations.

      Weaknesses:

      The authors used the GRAB-DA2m sensor to monitor dopamine release. Although DA2m exhibits higher affinity for dopamine compared to NE (around 15-fold difference in EC50 in HEK cell assays), it is still possible that NE contributes to the recorded signals, particularly during sleep/wake transitions when locus coeruleus activity is strongly modulated. Given the widespread and state-dependent dynamics of NE, this potentially needs to be addressed.

      We thank the reviewer for raising this important methodological consideration. While we acknowledge that a minor contribution from norepinephrine (NE) to the DA2m signal cannot be categorically excluded, several convergent lines of evidence give us confidence that the signals we recorded primarily reflect dopamine release.

      First, DA2m has substantially lower affinity for NE compared to dopamine. The reported EC50 for NE is ~1200 nM [1], which is ~15-fold higher than for dopamine. In contrast, extracellular NE levels in the prefrontal cortex are typically in the low nanomolar range (generally <5 nM under basal conditions) [2,3]. Because physiological NE concentrations are orders of magnitude below the sensor’s EC50 threshold, NE is highly unlikely to drive significant DA2m activation in vivo.
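
      The magnitude of this argument can be checked with a back-of-the-envelope dose-response calculation. The Python sketch below assumes a Hill-type response with a Hill coefficient of 1, which is an assumption made for illustration and not a measured property of DA2m; the function name `hill_activation` is likewise hypothetical. The NE values (EC50 of ~1200 nM, basal level of 5 nM) are those quoted above.

```python
def hill_activation(conc_nM, ec50_nM, hill_n=1.0):
    """Fractional sensor activation for a Hill-type dose-response curve."""
    ratio = (conc_nM / ec50_nM) ** hill_n
    return ratio / (1.0 + ratio)


NE_EC50_NM = 1200.0   # reported EC50 of DA2m for NE, as quoted above
NE_BASAL_NM = 5.0     # upper bound on basal cortical NE, as quoted above

# Assumes a Hill coefficient of 1 (illustrative, not measured).
ne_activation = hill_activation(NE_BASAL_NM, NE_EC50_NM)
print(f"predicted activation by basal NE: {100 * ne_activation:.2f}% of maximum")
```

      Under these assumptions, basal NE would be expected to drive the sensor to well under 1% of its maximal response.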

      Second, our optogenetic experiments provide direct functional validation. The targeted stimulation of midbrain dopaminergic neurons elicited robust DA2m signal responses across both cortical and subcortical brain areas. This confirms that the sensor reliably captures evoked dopamine release within our specific experimental paradigm.

      Finally, the spontaneous DA2m signal dynamics we observed across sleep-wake states functionally diverge from previously reported patterns of cortical NE release [4]. For example, in Figure 1C, our DA2m recordings in the mPFC revealed high activity during wakefulness, alongside pronounced, sharp changes during NREM-to-WAKE transitions. In contrast, a prior study [4] shows that NE exhibits comparatively mild fluctuations during wakefulness and at transitions involving NREM. This temporal and kinetic divergence further supports that our recorded signals isolate region-specific dopaminergic dynamics rather than generalized NE arousal activity.

      Taken together, these physiological, functional, and kinetic distinctions indicate that while a negligible contribution from NE cannot be entirely ruled out, it is highly unlikely to account for a substantial portion of the DA2m signals observed during sleep-wake transitions in our study.

      Similarly, the chemogenetic experiments rely on CNO to activate hM3Dq-expressing dopamine neurons. However, it is well established that CNO can be converted to clozapine in rodents, and clozapine itself is known to influence sleep/wake. Although the authors included non-hM3Dq-expressing mice as controls, the potential confounding effects of clozapine on sleep regulation remain a concern.

      We appreciate the reviewer raising this important point regarding the metabolism of CNO. We are aware of the evidence suggesting that CNO can undergo back-metabolism to clozapine in rodents, which could potentially exert independent effects on sleep-wake architecture. To mitigate this concern, we employed the following experimental safeguards:

      (A) Non-hM3Dq Control Group: As noted by the reviewer, we included a cohort of mice that did not express the hM3Dq receptor but received the same dosage of CNO (1 mg/kg). In these animals, we observed no significant alterations in sleep-wake states compared to saline baseline (Figure S3), suggesting that at this dosage, any clozapine produced was below the threshold for behavioral modulation of sleep.

      (B) Dosage Selection: We utilized a relatively low dose of CNO (1 mg/kg), which is widely reported in the literature to minimize the accumulation of clozapine to levels that would interfere with EEG-defined sleep states in rodents [5]. Furthermore, studies have demonstrated that while higher doses of CNO (e.g., 5–10 mg/kg) can produce clozapine-like effects on sleep architecture, lower doses around 1 mg/kg do not yield significant alterations in cortical EEG power distribution or sleep-wake amounts in control animals [6,7].

      Midbrain dopamine neurons exhibit both tonic and phasic firing patterns. In Figure 1, most reported dopamine transitions appear relatively slow. However, some faster, phasic-like components are observable. For example, in NAc-L during REM-to-WAKE transitions, there are 2 phasic-like decreases between −20 and 0 s. The authors used laser-evoked stimulation experiments in the VTA and DRN and showed that 2 s versus 10 s stimulation produces distinct dopamine kinetics, suggesting that different firing patterns generate distinct DA dynamics. Moreover, the temporal profiles vary not only across regions but also across transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid decrease, whereas REM-to-WAKE displays a much slower decline. Similarly, some regions (e.g., NAc-L NREM-to-WAKE, DRN REM-to-WAKE) show faster changes, while others (e.g., mPFC WAKE-to-NREM, VTA NREM-to-WAKE) show slower kinetics. These observations argue against a simple region-specific explanation and instead suggest that distinct firing modes may differentially contribute depending on transition type.

      We thank the reviewer for this insightful comment. We agree that midbrain dopamine neurons exhibit both tonic and phasic action-potential firing patterns. As summarized by Grace et al., dopamine neurons recorded using in vivo electrophysiology can display a slow, irregular, single-spike “tonic” firing pattern, typically around 2–10 Hz, as well as burst-like “phasic” firing patterns [8].

      However, our recordings were performed using GRAB-DA2m fiber photometry. Therefore, our measurements reflect extracellular dopamine dynamics in the recorded target regions rather than the action-potential firing patterns of midbrain dopamine neurons. GRAB-DA2m has subsecond sensor kinetics and is suitable for detecting extracellular dopamine transients occurring over hundreds of milliseconds to seconds, as well as slower dynamics occurring over seconds to tens of seconds [1], which matches the timescale of the sleep–wake transition-related dynamics observed in previous studies [9,10]. Nevertheless, GRAB-DA2m fiber photometry in our study does not directly resolve dopamine neuron spike timing or distinguish tonic from phasic firing modes. Accordingly, we interpret our signals as extracellular dopamine concentration dynamics rather than as direct measurements of tonic or phasic neuronal firing.

      Therefore, the transition-aligned dopamine signals shown in Figure 1 should be interpreted as dopamine dynamics occurring over seconds-to-tens-of-seconds around sleep–wake transitions, rather than as dopamine neuron firing patterns. In addition, these traces represent GRAB-DA2m signals averaged across sessions and mice within a ±30 s window centered on each sleep/wake transition. Thus, they do not necessarily represent individual dopamine transient patterns on single transitions. We also acknowledge the reviewer’s observation that faster phasic-like components are visible in some traces, including the decreases in the NAc-L preceding REM-to-WAKE transitions. Direct electrophysiological recordings of dopamine neuron firing during sleep–wake transitions would be useful in future studies to determine how tonic and phasic firing modes contribute to the observed dopamine dynamics.

      We thank the reviewer for the thoughtful interpretation of the laser-evoked stimulation experiments shown in Figure 3. The results indicate that different stimulation durations can produce distinct dopamine release dynamics in downstream projection regions. Moreover, prolonged optogenetic stimulation was associated with more sustained dopamine responses, suggesting that the temporal profile of extracellular dopamine dynamics depends, at least in part, on the duration and region of dopaminergic input [1]. We also agree with the reviewer that the temporal profiles of the GRAB-DA2m signals vary not only across regions, but also across sleep/wake transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid dopamine decrease, whereas the REM-to-WAKE transition displays a slower decline.

      Similarly, faster dopamine changes are observed in some region/transition combinations, such as NAc-L during NREM-to-WAKE and DRN during REM-to-WAKE, whereas slower kinetics are observed in others, such as mPFC during WAKE-to-NREM and VTA during NREM-to-WAKE. Together, these effects reflect both region-specific mechanisms and transition-dependent differences in dopaminergic activity.

      While cross-correlation analysis provides insight into the temporal coordination of DA signals across regions, several limitations should be considered. Sleep/wake transitions are inherently non-stationary events, whereas cross-correlation assumes relatively stable signal properties within the analysis window. This mismatch may bias lag estimates and obscure transient lead-lag relationships. Moreover, the temporal resolution of fiber photometry and the kinetics of genetically encoded DA sensors limit the precision with which timing relationships can be interpreted, particularly for sub-second lags.

      We thank the reviewer for raising these important considerations. The temporal relationships between regional dopamine signals were assessed using cross-covariance analysis. We agree that cross-covariance analysis has limitations when applied to sleep/wake transitions, because these transitions are inherently non-stationary events. Although cross-covariance centers the signals by subtracting their means and is therefore less sensitive to baseline offsets than raw cross-correlation, it still summarizes the lag-dependent covariance between two signals over the selected analysis window. Therefore, the inferred lag should be interpreted as a transition-level measure of temporal coordination rather than a precise estimate of instantaneous lead–lag timing.
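
      To illustrate what a cross-covariance lag estimate summarizes, the following is a minimal sketch rather than our analysis code. It assumes two equally sampled traces and the convention c(k) = (1/n) Σ (x[t] − mean(x)) · (y[t+k] − mean(y)), so that a positive peak lag means the second trace follows the first. The synthetic bump signals, the function names, and the variable names `region_a` and `region_b` are hypothetical.

```python
import math


def cross_covariance(x, y, max_lag):
    """Return {lag: c(lag)} for mean-centred signals x and y.

    c(k) = (1/n) * sum_t (x[t] - mean_x) * (y[t + k] - mean_y),
    summed over the samples where both indices are valid.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    cov = {}
    for k in range(-max_lag, max_lag + 1):
        valid_t = range(max(0, -k), min(n, n - k))
        cov[k] = sum(xc[t] * yc[t + k] for t in valid_t) / n
    return cov


def bump(center, n=200, width=3.0):
    """Synthetic Gaussian-shaped transient centred on sample `center`."""
    return [math.exp(-0.5 * ((t - center) / width) ** 2) for t in range(n)]


# Hypothetical example: region_b repeats region_a's transient 5 samples later.
region_a = bump(50)
region_b = bump(55)

cov = cross_covariance(region_a, region_b, max_lag=20)
peak_lag = max(cov, key=cov.get)
print(peak_lag)  # positive lag: region_b follows region_a
```

      Because the estimate is a single summary over the whole window, it returns one lag even if the true lead–lag relationship changes within the window, which is the limitation acknowledged above.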

      To minimize the influence of brief or unstable state fluctuations, we only included transitions in which both the preceding and following sleep/wake epochs lasted at least 30 s, and excluded epochs shorter than 30 s [4]. This criterion helped ensure that the analyzed events represented well-defined transitions between sustained behavioral states rather than transient or fragmented episodes. Although dopamine signals may still change dynamically within the transition window, and the temporal resolution of fiber photometry and the kinetics of genetically encoded GRAB-DA2m sensors limit the precision with which fine-scale timing relationships can be interpreted, dopamine signals were relatively stable within each behavioral state, as shown in Fig. 1B and reported previously [1,9,10]. Thus, we believe that cross-covariance analysis provides useful information about the temporal coordination of dopamine dynamics across regions.
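
      The 30 s inclusion criterion described above can be sketched as follows. This is an illustrative sketch only: it assumes the hypnogram is stored as one state label per fixed-length scoring epoch, and the 5 s epoch length, the function names, and the example hypnogram are hypothetical.

```python
from itertools import groupby


def bouts(states, epoch_s):
    """Run-length encode per-epoch labels into (state, onset_s, duration_s)."""
    out, onset = [], 0.0
    for state, run in groupby(states):
        duration = len(list(run)) * epoch_s
        out.append((state, onset, duration))
        onset += duration
    return out


def stable_transitions(states, epoch_s=5.0, min_bout_s=30.0):
    """Keep transitions whose preceding AND following bouts last >= min_bout_s."""
    b = bouts(states, epoch_s)
    kept = []
    for (prev_state, _, prev_dur), (next_state, onset, next_dur) in zip(b, b[1:]):
        if prev_dur >= min_bout_s and next_dur >= min_bout_s:
            kept.append((f"{prev_state}-to-{next_state}", onset))
    return kept


# Hypothetical hypnogram: the 10 s WAKE bout is too brief, so both
# transitions touching it are excluded.
hypnogram = ["NREM"] * 8 + ["WAKE"] * 2 + ["NREM"] * 7 + ["REM"] * 6 + ["WAKE"] * 10
stable = stable_transitions(hypnogram)
print(stable)
```

      In this example only the NREM-to-REM and REM-to-WAKE transitions survive, because each is flanked by bouts of at least 30 s.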

      In the Introduction, the authors state that they aim to address 'which dopaminergic populations causally drive these patterns.' However, the chemogenetic approach used operates on a relatively slow timescale: CNO-induced activation takes 15-30 minutes to produce effects, and the induced changes are long-lasting. In contrast, the dopamine transitions described in Figure 1 occur on a much faster timescale compared to CNO manipulation. Thus, while chemogenetic activation demonstrates that stimulating VTA or DRN dopamine neurons promotes wakefulness, it does not directly establish that these populations causally drive the rapid transition-related DA dynamics observed in the photometry recordings.

      We thank the reviewer for this thoughtful comment. We agree that chemogenetic manipulation operates on a much slower timescale than the rapid dopamine transients observed during sleep–wake transitions, and therefore does not directly recapitulate these fast dynamics. In particular, CNO-induced activation unfolds over minutes and produces sustained changes in neuronal activity, whereas the DA signals we report fluctuate on a sub-second to second timescale. Our intention with the chemogenetic experiments was not to mimic the precise temporal profile of endogenous DA signals, but rather to test whether increasing the activity of specific dopaminergic populations is sufficient to influence behavioral state.

      In this context, our results show that activation of VTA or DRN dopaminergic neurons robustly promotes wakefulness, supporting a causal role for these populations in sleep–wake regulation at the circuit level. However, we agree that these data do not by themselves establish that these neurons directly generate the rapid transition-related DA dynamics observed in the photometry recordings.

      Reviewer #2 (Public review):

      In "Brainwide dopamine dynamics across sleep-wake transitions", Chen et al. provide a thorough description of how dopamine dynamics fluctuate across sleep-wake transitions and in transitions between sleep states. To achieve this, the authors used multi-channel fiber photometry and a genetically encoded fluorescent dopamine reporter to simultaneously measure dopamine dynamics in 8 brain regions. They also used EEG measurements to precisely quantify and time transitions between sleep states and wakefulness. Finally, the authors used channelrhodopsin to examine dopamine dynamics following subregion stimulation and chemogenetics to test the causal relationship between activation of distinct dopamine neuron populations and their effects on sleep state.

      The conclusions made by the authors in this study are modest and appropriate given the largely observational nature of the principal findings. The use of optogenetics to probe regional dopamine signaling following activation of distinct nuclei is interesting, but not entirely novel and constrained in interpretability. Similarly, the chemogenetics experiment largely confirms previous studies, which the authors correctly cited in the text.

      The principal findings of this study are based on strong methodological and analytical methods. Implanting 8 optical fibers in a single mouse, along with EEG/EMG electrodes, is technically challenging, providing valuable, simultaneous measurements of dopamine fluctuations across the brain. This enables the strong correlational and time-locked analyses performed by the authors in Figure 2. What's more, the use of EEG/EMG electrodes provides time-locked descriptions of sleep states, enabling precise comparisons between the dopamine signal and sleep state transitions.

      The paper has some weaknesses that the authors could address. The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states. The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results. Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states.

      We appreciate the reviewer’s thoughtful suggestion. We agree that the directionality and kinetics of dopamine changes during sleep/wake transitions may provide important information beyond state-level dopamine quantification.

      In this study, mice were recorded for 4–5 h during each sleep session. Across the recording period, mice frequently transitioned from NREM to WAKE, WAKE to NREM, NREM to REM, and REM to WAKE. Transitions from WAKE to REM were rarely observed and therefore were not included in the transition analysis. Accordingly, we focused our analysis on the four major transition types: NREM-to-WAKE, WAKE-to-NREM, NREM-to-REM, and REM-to-WAKE [4,9,11].

      For each transition type, dopamine dynamics were analyzed separately by aligning the z-scored GRAB-DA2m signal to the transition onset and averaging across all epochs of the same transition type. To minimize the influence of brief or unstable state fluctuations, we excluded transitions in which either the preceding or following sleep/wake epoch lasted less than 30 s. The resulting transition-triggered dopamine traces were then averaged across sessions and mice for each transition type independently.
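
      The alignment-and-averaging step described above can be sketched as follows. This is a minimal illustration, not our analysis pipeline: it assumes the signal is z-scored over the whole recording and that transition onsets are given as sample indices. The function names, the toy step signal, and the window length are hypothetical.

```python
from statistics import mean, pstdev


def zscore(signal):
    """Z-score a trace using its own mean and population standard deviation."""
    mu, sd = mean(signal), pstdev(signal)
    return [(v - mu) / sd for v in signal]


def transition_triggered_average(signal, onsets, half_window):
    """Average signal segments of +/- half_window samples around each onset.

    Onsets too close to either end of the recording are skipped.
    """
    segments = [
        signal[i - half_window : i + half_window + 1]
        for i in onsets
        if i - half_window >= 0 and i + half_window < len(signal)
    ]
    return [mean(column) for column in zip(*segments)]


# Toy trace: the signal steps up at samples 10 and 30 (two "transitions").
raw = [0.0] * 10 + [1.0] * 10 + [0.0] * 10 + [1.0] * 10
z = zscore(raw)
avg = transition_triggered_average(z, onsets=[10, 30], half_window=3)
print(avg)
```

      Running this per transition type, and then averaging the per-session results across sessions and mice, preserves the direction of each state change instead of pooling all transitions together.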

      Thus, the transition analysis preserves the directionality of state changes rather than pooling all sleep/wake transitions together. Because dopamine signals differ across behavioral states, transitions between neighboring states produce distinct temporal profiles when aligned to the transition point [4,9-11]. For example, REM-to-WAKE transitions may show a rapid increase in dopamine in the mPFC, whereas WAKE-to-NREM or NREM-to-REM transitions may show slower and more modest decreases. These transition-specific kinetics may reflect distinct underlying mechanisms, including changes in dopamine neuron firing or local terminal modulation.

      The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results.

      We agree with the reviewer that precise histological validation is essential for the correct interpretation of our optogenetic and chemogenetic findings.

      Regarding the chemogenetic experiments, we provide examples of virus expression in the VTA, DRN, and SNc in Figure 4. Targeting was consistent and restricted across the entire cohort (VTA, SNc, and DRN), supporting the regional specificity of the observed sleep effects. We included only mice with accurate targeting and no substantial virus "leakage" into adjacent nuclei.

      We thank the reviewer for this insightful observation regarding the regional dopamine (DA) responses following SNc stimulation. While the SNc is traditionally associated with the dorsal striatum (DLS), several studies have demonstrated that SNc dopaminergic neurons also project to the nucleus accumbens, particularly the lateral shell [12,13]. Furthermore, recent work characterizing the functional heterogeneity of midbrain DA neurons suggests that SNc subpopulations can drive significant DA release in ventral striatal subregions [14]. We appreciate the reviewer’s caution regarding potential off-target effects. While our histological criteria for post-recording validation were stringent, we acknowledge that in any midbrain manipulation, the close anatomical proximity of the VTA and SNc makes it technically challenging to guarantee zero involvement of neighboring VTA neurons. However, by using mice with the most restricted virus expression and fiber targeting, we have minimized this potential confound as much as is technically feasible with current viral and optogenetic methods.

      Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      We thank the reviewer for this comment. The experiments in Figure 3 were designed to systematically map the sources of dopaminergic inputs to key brain regions examined in this study [15], including the mPFC, DLS, NAc, and CeA. Establishing these input–output relationships is important for interpreting the photometry signals observed during sleep–wake transitions.

      Specifically, we found that optogenetic activation of VTA dopaminergic neurons elicits DA responses in all four regions, whereas activation of DRN dopaminergic neurons induces responses in the mPFC, DLS, and CeA, and activation of SNc dopaminergic neurons induces responses in the mPFC, NAc, and DLS. These results reveal partially overlapping but distinct projection patterns across dopaminergic populations.

      Taken together, these data provide a circuit-level framework suggesting that VTA, SNc, and DRN dopaminergic neurons may contribute differentially and with distinct weights to the DA signals observed in these regions during sleep–wake transitions.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      We appreciate the reviewer’s assessment that our study convincingly demonstrates how dopamine fluctuates across sleep states. We agree that the primary contribution of this work is descriptive and foundational. At the same time, we respectfully emphasize that rigorous, comprehensive descriptive studies are essential, particularly when addressing phenomena that have not been systematically characterized. Prior to this work, dopamine dynamics during natural sleep–wake transitions had not been measured simultaneously across multiple brain regions.

      Our multi-site photometry approach advances the field in several important ways. Technically, the combination of simultaneous eight-region fiber photometry with EEG/EMG recordings represents a substantial methodological advance, enabling brainwide, network-level analysis of dopamine dynamics during natural state transitions. This approach reveals emergent features, such as temporal coordination and inter-regional lead–lag relationships, that cannot be captured using single-site recordings. Moreover, integrating brainwide measurements with region-specific manipulations allows circuit-level insights that would not be accessible from either approach alone.

      Conceptually, our findings reveal region-specific, transition-type-specific, and bidirectional dopamine dynamics, in contrast to the prevailing view of dopamine as a uniform arousal signal: dopamine decreases in certain limbic regions, such as the central amygdala and nucleus accumbens lateral shell, during arousal transitions, while increasing in cortical and other striatal regions. These results refine simplified models of dopaminergic regulation of arousal. In addition, our data reveal differential circuit contributions, with the VTA and DRN, but not the SNc, promoting wakefulness, highlighting functional specialization within the dopamine system.

      We acknowledge that some aspects of our study, including the optogenetic mapping and chemogenetic experiments, build on established methodologies and in part confirm prior findings. However, these experiments also provide several new insights. First, whereas individual dopamine sources have often been studied in isolation, our systematic comparison across VTA, SNc, and DRN using consistent methods reveals distinct brainwide functional contributions that were not previously established. Second, our optogenetic mapping does not simply recapitulate known projection patterns, but instead uncovers quantitative differences in dopamine release kinetics and magnitude across source–target pairs, which inform the heterogeneity of the transition dynamics. Finally, our findings provide a crucial anatomical and temporal framework for future research on the specific mechanisms driving these dynamics and their precise functional consequences.

      References:

      (1) Sun, F. et al. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 17, 1156-1166, doi:10.1038/s41592-020-00981-9 (2020).

      (2) Ihalainen, J. A., Riekkinen, P., Jr. & Feenstra, M. G. Comparison of dopamine and noradrenaline release in mouse prefrontal cortex, striatum and hippocampus using microdialysis. Neurosci Lett 277, 71-74, doi:10.1016/s0304-3940(99)00840-x (1999).

      (3) Berridge, C. W. & Abercrombie, E. D. Relationship between locus coeruleus discharge rates and rates of norepinephrine release within neocortex as assessed by in vivo microdialysis. Neuroscience 93, 1263-1270, doi:10.1016/s0306-4522(99)00276-6 (1999).

      (4) Silverman, D. et al. Activation of locus coeruleus noradrenergic neurons rapidly drives homeostatic sleep pressure. Sci Adv 11, eadq0651, doi:10.1126/sciadv.adq0651 (2025).

      (5) Anaclet, C. et al. The GABAergic parafacial zone is a medullary slow wave sleep-promoting center (vol 17, pg 1217, 2014). Nat Neurosci 17, 1841-1841, doi:10.1038/nn1214-1841d (2014).

      (6) Ma, C. Y. et al. Microglia regulate sleep through calcium-dependent modulation of norepinephrine transmission. Nat Neurosci 27, 249-258, doi:10.1038/s41593-023-01548-5 (2024).

      (7) Traut, J. et al. Effects of clozapine-N-oxide and compound 21 on sleep in laboratory mice. Elife 12, doi:10.7554/eLife.84740 (2023).

      (8) Grace, A. A., Floresco, S. B., Goto, Y. & Lodge, D. J. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci 30, 220-227, doi:10.1016/j.tins.2007.03.003 (2007).

      (9) Darmohray, D. et al. Brainstem circuit for sickness-induced sleep. Sci Adv 11, eady0245, doi:10.1126/sciadv.ady0245 (2025).

      (10) Hasegawa, E. et al. Rapid eye movement sleep is initiated by basolateral amygdala dopamine signaling in mice. Science 375, 994, doi:10.1126/science.abl6618 (2022).

      (11) Ding, X. et al. Neuroendocrine circuit for sleep-dependent growth hormone release. Cell 188, 4968-4979 e4912, doi:10.1016/j.cell.2025.05.039 (2025).

      (12) Poulin, J. F. et al. Mapping projections of molecularly defined dopamine neuron subtypes using intersectional genetic approaches. Nat Neurosci 21, 1260-1271, doi:10.1038/s41593-018-0203-4 (2018).

      (13) Lerner, T. N. et al. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits. Cell 162, 635-647, doi:10.1016/j.cell.2015.07.014 (2015).

      (14) Azcorra, M. et al. Unique functional responses differentially map onto genetic subtypes of dopamine neurons. Nat Neurosci 26, 1762-1774, doi:10.1038/s41593-023-01401-9 (2023).

      (15) Eban-Rothschild, A., Rothschild, G., Giardino, W. J., Jones, J. R. & de Lecea, L. VTA dopaminergic neurons regulate ethologically relevant sleep-wake behaviors. Nat Neurosci 19, 1356-1366, doi:10.1038/nn.4377 (2016).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pečak et al. have deciphered the conformational dynamics of a heterodimeric model ABC transporter, TmrAB, a functional homolog of the human antigen transporter TAP, using single-molecule Förster resonance energy transfer (smFRET) with fluorophores attached to residues at either the nucleotide-binding domains or the periplasmic gate. The analysis not only differentiated ATP-free and ATP-bound states but also enabled real-time monitoring of protein conformational changes, precisely dissecting transport cycles and resolving transient intermediates. This study is highly significant in providing and establishing a general pipeline for delineating the conformational dynamics of heterodimeric ABC transporters.

      We thank the reviewer for this accurate and thoughtful summary of our work and its broader significance. We agree that the combination of single-molecule FRET with orthogonal validation approaches enables mechanistic resolution of conformational states and transitions that are not accessible by ensemble measurements. In particular, this framework allows direct discrimination of ATP-free and ATP-bound conformations, real-time tracking of transport cycle progression, and identification of transient intermediates in the heterodimeric ABC transporter TmrAB. We further agree that these capabilities support a generalizable strategy for dissecting conformational dynamics in related ABC transporters.

      Strengths:

      The scientific study is very well documented for experimental design, results, and conclusions supported by the experimental data. The authors have determined the conformational dynamics of TmrAB across different ATP concentrations, including physiological ones, and resolved an outward open state and other conformational states consistent with previous cryoEM and DEER studies.

      Weaknesses:

      The scientific study needs a bit of in-depth analysis with respect to consistency in K<sub>d</sub> and its implications on the mechanism.

      The apparent K<sub>d,ATP</sub> values were determined using two complementary approaches that report on different aspects of the system. Ensemble FRET measurements yielded values of 51 ± 38 µM (TmrAB<sup>NBD</sup>), 68 ± 25 µM (TmrAB<sup>PG</sup>), and 95 ± 26 µM (TmrAB<sup>PG_EQ</sup>), which are in good agreement with previously reported biochemical estimates (~100 µM for TmrAB<sup>EQ</sup>) (Stefan et al, 2020). The slightly elevated value observed for the E→Q variant may reflect modest perturbation of nucleotide handling in this slow-turnover background. Notably, the close agreement between labeled and unlabeled variants indicates that fluorophore attachment does not measurably affect ATP binding.

      In contrast, smFRET-derived K<sub>d,ATP</sub> values (13 ± 1 µM for TmrAB<sup>NBD</sup> and 2 ± 1 µM for TmrAB<sup>PG</sup>) are systematically lower. This difference likely arises from the difficulty of deconvoluting overlapping FRET populations at sub-K<sub>d,ATP</sub> concentrations, particularly for TmrAB<sup>PG</sup>, where state assignment is less well separated. Despite this quantitative offset, both approaches consistently indicate ATP saturation well below physiological concentrations and therefore support the same mechanistic conclusion that ATP binding drives conformational switching in TmrAB.
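
      For readers less familiar with how an apparent K<sub>d</sub> is read from a titration, the following is a minimal sketch under stated assumptions, not the fitting procedure used in the manuscript. It assumes a single-site binding isotherm, a response already normalized between 0 and 1, noiseless synthetic data, and a simple grid search over candidate values; all names and the titration points are hypothetical.

```python
def bound_fraction(atp_uM, kd_uM):
    """Single-site binding isotherm: fraction of transporter with ATP bound."""
    return atp_uM / (kd_uM + atp_uM)


def fit_kd(atp_uM, response, kd_grid_uM):
    """Return the grid value of Kd that minimises the sum of squared errors."""
    def sse(kd):
        return sum((r - bound_fraction(a, kd)) ** 2 for a, r in zip(atp_uM, response))
    return min(kd_grid_uM, key=sse)


# Synthetic, noiseless titration generated with a hypothetical Kd of 50 uM.
atp = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0]
response = [bound_fraction(a, 50.0) for a in atp]

kd_grid = [float(k) for k in range(1, 501)]
fitted_kd = fit_kd(atp, response, kd_grid)

# Occupancy at 3 mM ATP for the largest ensemble value quoted above (95 uM).
saturation_at_3mM = bound_fraction(3000.0, 95.0)
print(fitted_kd, round(saturation_at_3mM, 3))
```

      The last line illustrates the saturation argument: even with the highest apparent K<sub>d,ATP</sub> reported above, the isotherm predicts that the large majority of transporters are ATP-bound at millimolar ATP.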

      Reviewer #2 (Public review):

      In their manuscript entitled 'ATP-driven conformational dynamics reveal hidden intermediates in a heterodimeric ABC transporter', Pečak et al. use elegant single-molecule FRET experiments in detergent to investigate the heterodimeric ABC transporter TmrAB. By combining simulations of the transporter's accessible volume with elegant trapping strategies, the authors identify an unresolved outward-facing open state and conclude that it is usually obscured by a rapidly interconverting ATP-bound ensemble. Overall, the study demonstrates that smFRET can resolve the short-lived intermediate states of TmrAB and potentially other ABC transporters that are obscured in ensemble measurements.

      It is a very interesting study that highlights the power of combining high-resolution structural information with spectroscopic approaches. I have three major points and a few minor criticisms.

      We thank the reviewer for the thoughtful and constructive evaluation of our manuscript and for highlighting the strength of combining structural and single-molecule approaches. We have addressed all major and minor points in detail below and revised the manuscript where appropriate to clarify limitations, justify analysis choices, and improve transparency.

      Major points:

      (1) The main weakness is that the authors base their conclusions on a very limited set of FRET pairs. While TmrAB has been extensively studied in terms of its structure, the authors should at least acknowledge this limitation more clearly.

      We agree that our conclusions are based on a limited number of FRET reporter pairs, and we now explicitly state this limitation in the revised manuscript. The chosen labeling positions were selected to probe two functionally critical regions—the nucleotide-binding domains and the periplasmic gate—based on prior structural and spectroscopic evidence. While this represents sparse sampling of the full conformational space, it is consistent with typical smFRET studies of membrane transporters, where experimental constraints generally limit the number of simultaneously accessible labeling positions (Asher et al, 2021; Asher et al, 2022; Levring et al, 2023; Wang et al, 2020).

      Importantly, both independent reporter variants yield consistent ATP-dependent population shifts, supporting the robustness of the observed trends. We further clarify that additional labeling sites could, in principle, resolve finer structural sub-states; however, given the already limited population separation in the current variants, such extensions would likely provide diminishing returns in state resolvability under the present experimental conditions. This trade-off is now explicitly discussed.

      (2) Most smFRET distributions were fitted with one, two, or three Gaussians. However, in several cases, additional populations with noticeable amplitudes appear to be present (e.g., Figure 3c at 0.1 mM and 3 mM ATP; Figure 4a, apo; Figure 4c, 0.3 mM R9L). Could the authors clarify why these populations were not included in the analysis?

      We thank the reviewer for this careful observation. Low-amplitude subpopulations are occasionally detected in individual histograms; however, they were not included in the quantitative model because they do not meet criteria for reproducibility, amplitude robustness, or structural assignability. Specifically, these features vary between replicates, contribute minimally to the total population, and cannot be mapped to structurally or biochemically defined states based on available cryo-EM (Hofmann et al, 2019), DEER/PELDOR (Barth et al, 2018; Barth et al, 2020), or accessible-volume simulations.

      Similar minor subpopulations have been reported in smFRET studies and often attributed to photophysical or labeling heterogeneity effects (Asher et al, 2022; Husada et al, 2018). To avoid over-parameterization, we therefore restricted analysis to reproducible, structurally supported states. This rationale is now clarified in the revised manuscript.

      (3) Figure 3c (3 mM ATP): Is it truly possible to distinguish the two states in this distribution?

      We agree that state separation in the TmrAB<sup>PG</sup> variant is limited (ΔE = 0.11), and we now explicitly acknowledge this constraint in the manuscript. To improve robustness under these conditions, we used a constrained fitting strategy in which the apo-state distribution was fixed from the nucleotide-free measurement, reducing parameter degeneracy during fitting of ATP-bound datasets.
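
      The idea behind the constrained fit can be sketched as follows. This is an illustrative sketch, not the fitting code used in the manuscript: it assumes both states are Gaussian with a shared width, holds the apo component fixed, scans the center of the second component on a grid, and solves for its weight in closed form. The FRET efficiencies, width, weight, and all names are hypothetical apart from the 0.11 separation.

```python
import math


def gauss(e, mu, sigma):
    """Normalised Gaussian density evaluated at FRET efficiency e."""
    return math.exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_bound_state(bins, hist, mu_apo, sigma, mu_grid):
    """Fit hist = (1 - w) * apo + w * bound with the apo component held fixed.

    For each candidate bound-state centre, the weight w has a closed-form
    least-squares solution; the centre with the smallest residual wins.
    """
    apo = [gauss(e, mu_apo, sigma) for e in bins]
    best = None
    for mu in mu_grid:
        diff = [gauss(e, mu, sigma) - a for e, a in zip(bins, apo)]
        denom = sum(d * d for d in diff)
        if denom == 0.0:
            continue  # candidate coincides with the apo state: weight undefined
        w = sum((h - a) * d for h, a, d in zip(hist, apo, diff)) / denom
        w = min(1.0, max(0.0, w))  # keep the weight a valid fraction
        sse = sum((h - a - w * d) ** 2 for h, a, d in zip(hist, apo, diff))
        if best is None or sse < best[0]:
            best = (sse, mu, w)
    return best[1], best[2]


# Synthetic, noiseless histogram: two states separated by 0.11 in E.
bins = [i / 100 for i in range(101)]
MU_APO, SIGMA = 0.40, 0.08                      # hypothetical apo centre and width
mu_grid = [0.45 + 0.01 * i for i in range(16)]  # candidate bound-state centres
true_mu, true_w = mu_grid[6], 0.8               # 0.51, i.e. 0.11 above the apo centre
hist = [(1 - true_w) * gauss(e, MU_APO, SIGMA) + true_w * gauss(e, true_mu, SIGMA)
        for e in bins]

fit_mu, fit_w = fit_bound_state(bins, hist, MU_APO, SIGMA, mu_grid)
print(round(fit_mu, 2), round(fit_w, 2))
```

      Fixing the apo component removes two free parameters from the fit, which is what reduces the degeneracy between strongly overlapping populations; with real, noisy histograms the recovered weight would carry correspondingly larger uncertainty.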

      While single-molecule trajectory-based approaches such as Hidden Markov Modeling would be ideal for resolving dynamic interconversion, this was not feasible due to the low fraction of dynamic traces at the available temporal resolution. We therefore rely on population-level analysis, which remains consistent across replicates and reporter variants.

      Notably, independent measurements from two reporter positions (TmrAB<sup>NBD</sup> and TmrAB<sup>PG</sup>) yield similar ATP-bound population fractions at saturating ATP concentrations (~77% vs. ~80%), supporting the robustness of the inferred state distribution despite partial overlap.

      References

      Asher WB, Geggier P, Holsey MD, Gilmore GT, Pati AK, Meszaros J, Terry DS, Mathiasen S, Kaliszewski MJ, McCauley MD, Govindaraju A, Zhou Z, Harikumar KG, Jaqaman K, Miller LJ, Smith AW, Blanchard SC, Javitch JA (2021) Single-molecule FRET imaging of GPCR dimers in living cells. Nat Methods 18: 397–405. doi:10.1038/s41592-021-01081-y

      Asher WB, Terry DS, Gregorio GGA, Kahsai AW, Borgia A, Xie B, Modak A, Zhu Y, Jang W, Govindaraju A, Huang LY, Inoue A, Lambert NA, Gurevich VV, Shi L, Lefkowitz RJ, Blanchard SC, Javitch JA (2022) GPCR-mediated beta-arrestin activation deconvoluted with single-molecule precision. Cell 185: 1661–1675 e1616. doi:10.1016/j.cell.2022.03.042

      Barth K, Hank S, Spindler PE, Prisner TF, Tampé R, Joseph B (2018) Conformational coupling and trans-inhibition in the human antigen transporter ortholog TmrAB resolved with dipolar EPR spectroscopy. J Am Chem Soc 140: 4527–4533. doi:10.1021/jacs.7b12409

      Barth K, Rudolph M, Diederichs T, Prisner TF, Tampé R, Joseph B (2020) Thermodynamic basis for conformational coupling in an ATP-binding cassette exporter. J Phys Chem Lett 11: 7946–7953. doi:10.1021/acs.jpclett.0c01876

      Hofmann S, Januliene D, Mehdipour AR, Thomas C, Stefan E, Brüchert S, Kuhn BT, Geertsma ER, Hummer G, Tampé R, Moeller A (2019) Conformation space of a heterodimeric ABC exporter under turnover conditions. Nature 571: 580–583. doi:10.1038/s41586-019-1391-0

      Husada F, Bountra K, Tassis K, de Boer M, Romano M, Rebuffat S, Beis K, Cordes T (2018) Conformational dynamics of the ABC transporter McjD seen by single-molecule FRET. EMBO J 37: e100056. doi:10.15252/embj.2018100056

      Levring J, Terry DS, Kilic Z, Fitzgerald G, Blanchard SC, Chen J (2023) CFTR function, pathology and pharmacology at single-molecule resolution. Nature 616: 606–614. doi:10.1038/s41586-023-05854-7

      Nocker C, Pečak M, Nocker T, Fahim A, Sušac L, Tampé R (2026) Single-molecule dynamics reveal ATP binding alone powers substrate translocation by an ABC transporter. Nat Commun 17 doi:10.1038/s41467-026-70021-1

      Nöll A, Thomas C, Herbring V, Zollmann T, Barth K, Mehdipour AR, Tomasiak TM, Bruchert S, Joseph B, Abele R, Olieric V, Wang M, Diederichs K, Hummer G, Stroud RM, Pos KM, Tampé R (2017) Crystal structure and mechanistic basis of a functional homolog of the antigen transporter TAP. Proc Natl Acad Sci U S A 114: E438–E447. doi:10.1073/pnas.1620009114

      Stefan E, Hofmann S, Tampé R (2020) A single power stroke by ATP binding drives substrate translocation in a heterodimeric ABC transporter. eLife 9: e55943. doi:10.7554/eLife.55943

      Wang L, Johnson ZL, Wasserman MR, Levring J, Chen J, Liu S (2020) Characterization of the kinetic cycle of an ABC transporter by single-molecule and cryo-EM analyses. eLife 9: e56451. doi:10.7554/eLife.56451

    1. Author response:

      We appreciate the extremely helpful feedback from the reviewers and editors for our manuscript. We are happy that the reviewers have appreciated what we are doing here, performing the initial work that should set the stage with Drosophila larva as a model for hyperactive stimulant response. Every comment is certainly addressable within a reasonably short time period and we look forward to improving our paper in an upcoming revision.

      We have some confusion about the “fundamental issue” of using nicotine, as we see the excitation as the fundamental effect we are studying, but we can continue to discuss and clarify this.

      We plan to make significant edits to our introduction and background sections to better frame the goals of the work, and will clarify and expand on our methods, and more carefully make any claims about neural mechanisms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for their constructive and insightful comments and agree with the importance of the points raised. We recognize that aspects of our original presentation may have been unclear or overly strong in their interpretation. We have therefore revised the manuscript to clarify our intended scope, moderate our claims, and strengthen the analysis. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript. Our detailed responses are provided below.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s comment. The expressions noted by the reviewer (e.g., closely mimicked, nearly identical, recapitulate) have been replaced with alternative wording that conveys a more moderate meaning (Line 16-17, 65-66, 83, 96, 120, 212).

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer pointed out, behaviorally trained RNNs can admit multiple internal solutions that produce the same behavioral output, and we acknowledge the non-uniqueness of such internal solutions. However, we do not interpret the fact that only a subset of trained RNNs exhibit dynamics similar to those observed in the claustrum as evidence that this solution is fragile. Notably, the claustrum-like dynamics emerged spontaneously during training and were not explicitly enforced. Furthermore, our finding suggests that the emergence of this particular dynamical regime depends on relatively specific structural constraints.

      Our criterion for selecting RNNs that could inform the computational principles of the claustrum was their ability to reproduce the behavioral and physiological observations obtained in the delayed escape experiments. RNNs that were excluded may reflect information-processing strategies used by other brain regions or may rely on artificial logical structures. The computational demand of the task, which integrates temporally separated signals, naturally drives convergence toward networks with recurrent excitatory connectivity capable of maintaining persistent activity. Indeed, all networks that exhibited a claustrum-like cluster shared a common structural feature: strong recurrent excitatory connectivity within Cluster 1. This property is consistent with biological characteristics observed in the slice experiments shown in Fig 2.

      Importantly, the computational principles derived from this RNN were found to be quantitatively consistent with in vivo single-neuron activity patterns. Specifically, analysis using an eigenvalue-based metric (λ<sub>3</sub>/Σλ) revealed the same directional effect in both the RNN and the claustrum neuron data. In addition, a leave-one-neuron-out analysis showed that this pattern was broadly distributed across in vivo claustral neurons rather than being driven by a small subset (see Fig. 4).

      Taken together, these convergent lines of evidence suggest that the computational model is not simply one arbitrary solution among many possible alternatives, but rather implements a computational principle that may underlie claustral functions.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      As the reviewer pointed out, the GPFA trajectory comparison presented in the original manuscript remained largely qualitative, and we agree that this alone was insufficient to establish robustness or provide convincing evidence for population-level structure. In the revised manuscript, we have therefore added the requested quantitative analysis (see Fig. 4).

      Before describing the analysis, we would like to clarify several methodological limitations associated with pseudopopulation and single-trial data. GPFA estimates latent trajectories based on assumptions about covariance structure among neurons and temporal smoothness. In pseudopopulation datasets, the true simultaneously recorded covariance structure cannot be fully reconstructed, which is an inherent limitation. Because our dataset is based on single trials, the analysis does not directly exploit trial-to-trial variability. Nevertheless, the estimation of the latent space still depends on the covariance structure among real claustral neurons, suggesting that the inferred trajectories remain tied to biologically meaningful population dynamics.

      Accordingly, the quantitative metric we introduce is not entirely independent of the GPFA estimation step. Rather, it is intended to evaluate the geometric structure of the single-trial latent trajectories estimated by GPFA. We acknowledged this limitation in the revised manuscript.

      Specifically, for the biological data, we reanalyzed the GPFA-derived latent trajectories in PCA space and computed an eigenvalue-based metric (λ<sub>3</sub>/Σλ). For each of the 20 time bins, we applied a sliding window of 10 bins and calculated the covariance matrix within that window. The eigenvalues of PC1, PC2, and PC3 were then obtained, and the third eigenvalue (λ<sub>3</sub>) was normalized by the total variance (Σλ = λ<sub>1</sub> + λ<sub>2</sub> + λ<sub>3</sub>). This metric quantifies the degree to which the trajectory locally deviates from a planar structure that can be explained by two dominant axes. An increase in λ<sub>3</sub>/Σλ indicates that the population-state trajectory forms a higher-dimensional geometric structure beyond a simple two-dimensional combination.
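
      The metric described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `lambda3_fraction`, the input layout (time bins by three PCA coordinates), and the choice to use only windows that lie fully inside the trajectory are all assumptions made for the example.

```python
import numpy as np

def lambda3_fraction(traj, window=10):
    """Sliding-window lambda_3 / sum(lambda) for a 3-D latent trajectory.

    traj   : array of shape (n_bins, 3), trajectory in PC1-PC3 coordinates
             (assumed layout, not taken from the manuscript).
    window : number of consecutive time bins per covariance estimate.
    Returns one value per fully contained window (an assumption about
    how window edges are handled).
    """
    traj = np.asarray(traj, dtype=float)
    n_bins = traj.shape[0]
    values = []
    for start in range(n_bins - window + 1):
        segment = traj[start:start + window]
        cov = np.cov(segment, rowvar=False)       # 3 x 3 covariance matrix
        eig = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues, descending
        eig = np.clip(eig, 0.0, None)             # guard against tiny negatives
        total = eig.sum()
        values.append(eig[2] / total if total > 0 else 0.0)
    return np.array(values)

# Illustrative inputs: a planar circle and a helix, each with 20 time bins.
t = np.linspace(0.0, 4.0 * np.pi, 20)
planar = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
planar_vals = lambda3_fraction(planar)
helix_vals = lambda3_fraction(helix)
```

      A trajectory confined to a plane gives values near zero, whereas a trajectory that leaves the plane (the helix here) gives positive values bounded above by 1/3, since λ<sub>3</sub> is the smallest of the three eigenvalues.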

      For the RNN data, in contrast, the activity of all units can be observed simultaneously and sufficient trial repetitions are available. Therefore, GPFA was not applied; instead, PCA was performed directly on the population activity for each trial. We then computed an average trajectory across trials and applied the same λ<sub>3</sub>/Σλ metric. Thus, although the initial dimensionality reduction steps differ between the two systems, the definition and calculation of the final quantitative metric are identical. The focus of the comparison is therefore not the dimensionality reduction technique itself, but the geometric dimensional structure of the population trajectories evolving over time.

      Importantly, within the biological dataset, the GPFA estimation procedure, preprocessing steps, pseudopopulation construction, subsampling strategy, temporal alignment criteria, and smoothing parameters were applied identically across conditions. Likewise, the same analysis pipeline was used for all conditions in the RNN. If structural biases had been introduced during covariance estimation or dimensionality reduction, they would be expected to affect all conditions within each system similarly. Nevertheless, the λ<sub>3</sub>/Σλ value was consistently and significantly higher in the CS condition than in the Neutral condition, and this directional pattern was observed in both the RNN and the claustral neuron data. This suggests that the effect reflects condition-specific differences in population dynamical structure rather than artifacts arising from a particular dimensionality reduction method.

      To further test whether the observed effect might be driven by a small subset of neurons or specific neuron combinations, we performed a leave-one-neuron-out analysis on the claustrum dataset. Recomputing λ<sub>3</sub>/Σλ while removing one neuron at a time showed that, in the CS group, most neurons contributed relatively evenly to this metric, whereas the Neutral group did not show such a distributed contribution pattern. This indicates that the observed three-dimensional structure is not driven by a few outlier neurons or incidental covariance patterns, but rather reflects an organized population-level phenomenon.
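
      The leave-one-neuron-out logic can be illustrated with the sketch below. It is a simplified stand-in under stated assumptions: the latent trajectory is obtained by plain PCA on a time-by-neuron matrix rather than by GPFA, and the names `project_top3` and `leave_one_neuron_out` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def lambda3_fraction(traj, window=10):
    """Sliding-window lambda_3 / sum(lambda) for a (n_bins, 3) trajectory."""
    values = []
    for start in range(traj.shape[0] - window + 1):
        cov = np.cov(traj[start:start + window], rowvar=False)
        eig = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)
        total = eig.sum()
        values.append(eig[2] / total if total > 0 else 0.0)
    return np.array(values)

def project_top3(activity):
    """Project a (n_bins, n_neurons) matrix onto its first three PCs.

    Assumption: PCA is used here in place of GPFA to keep the sketch short.
    """
    centered = activity - activity.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T

def leave_one_neuron_out(activity, window=10):
    """Change in the mean metric when each neuron is removed in turn."""
    activity = np.asarray(activity, dtype=float)
    full = lambda3_fraction(project_top3(activity), window).mean()
    deltas = []
    for i in range(activity.shape[1]):
        reduced = np.delete(activity, i, axis=1)   # drop neuron i
        value = lambda3_fraction(project_top3(reduced), window).mean()
        deltas.append(value - full)
    return full, np.array(deltas)
```

      Under this scheme, a metric carried by a few neurons would show a small number of large entries in `deltas`, while a distributed effect would show many entries of comparable size.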

      If the result were primarily due to structural artifacts introduced by the pseudopopulation construction or dimensionality reduction procedures, it would be unlikely for consistent selective differences to repeatedly emerge between conditions under identical analysis pipelines. The consistently higher λ<sub>3</sub>/Σλ values observed in the CS condition therefore provide indirect support that this pattern reflects condition-specific population dynamics rather than estimation bias.

      Taken together, these results suggest that the observed three-dimensional structure reflects condition-specific population dynamics rather than analysis artifacts. The fact that the same quantitative metric yields consistent effects in both the RNN and claustral data further strengthens the correspondence between the two systems.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer. We now explicitly state that references to these theories are speculative, and we have substantially reduced both their emphasis and prominence in the manuscript (Line 444-446, 451).

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      We agree with the reviewer’s concern. We now refer to the delayed escape behavioral task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals” (Line 7-8). We also removed references to the term “inference” throughout the manuscript (Line 46, 51, 67, 397).

      Reviewer #2 (Public review):

      We sincerely thank the reviewer for their constructive and insightful comments. Through the revision process, the manuscript has been substantially improved, with increased reproducibility, more appropriate acknowledgment of prior work, and a clearer and more logical presentation of the study.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We agree with the reviewer that this distinction should be made clearer. In the original manuscript, we indicated in the Figure 1 legend that panels A, D, E, F, and L (left) were reproduced from Han et al. (2024). To further clarify this point, we explicitly noted this distinction again in the main text (Line 74, 85). In addition, we described the behavioral experiments and in vivo electrophysiological recordings performed in Han et al. (2024) in the Methods section and included the appropriate citation (Line 463-530).

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      We agree with the reviewer’s comment and have revised the manuscript to provide a more detailed description of the model training procedure, weight initialization, and parameter selection.

      We expanded the explanation of the model training procedure and weight initialization. Specifically, the recurrent (W<sub>rec</sub>) and output (W<sub>out</sub>) weight matrices were initialized using a Glorot normal distribution, whose standard deviation is √(2/(n<sub>in</sub> + n<sub>out</sub>)), to ensure stable signal propagation during early training. In addition, we now explicitly describe the training algorithm and optimization procedure. The network was trained using the Adam optimizer implemented in TensorFlow (v2.1.0) with a batch size of 256 for 1.2 million training iterations, minimizing the per-trial loss function defined in the manuscript.

      We also explicitly stated how Dale’s principle was maintained throughout training: rows in W<sub>out</sub> corresponding to inhibitory units were zeroed out, and recurrent weights were continuously constrained so that excitatory and inhibitory neurons preserved their respective positive and negative synaptic projections. To illustrate how the weight structure evolved during training, we explicitly reference Figure 2A, which visualizes the final mean inter-cluster synaptic weights and highlights the strong recurrent connectivity that emerged within Cluster 1.

      Regarding Equations 2 and 3 and their constants, we clarified that the target escape times used to anchor the network were based on experimentally measured behavioral latencies (48.7 s for the CS-present condition and 111.3 s for the CS-absent condition). Furthermore, the regularization coefficients (λ = 0.01 and λ<sub>FR</sub> = 0.95) were selected through a grid search procedure to maintain biologically plausible firing rates while preventing overfitting.
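
      The initialization and sign constraint described above can be sketched in NumPy as follows. This is an illustrative sketch rather than the training code: the standard deviation is the standard Glorot normal definition, sqrt(2 / (fan_in + fan_out)); the convention that each row of a weight matrix belongs to one presynaptic unit, the clipping-based projection, and the helper names `glorot_normal` and `apply_dale` are assumptions made for the example.

```python
import numpy as np

def glorot_normal(shape, rng):
    """Draw weights with the standard Glorot normal standard deviation."""
    fan_out, fan_in = shape[0], shape[1]
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=shape)

def apply_dale(w_rec, w_out, is_inhibitory):
    """Project weights onto the set allowed by Dale's principle.

    Assumed convention: row i of w_rec and w_out holds the outgoing
    weights of unit i. Excitatory rows are clipped to be non-negative,
    inhibitory rows to be non-positive, and inhibitory rows of w_out
    are set to zero so that only excitatory units drive the output.
    """
    w_rec = w_rec.copy()
    w_out = w_out.copy()
    exc = ~is_inhibitory
    w_rec[exc] = np.maximum(w_rec[exc], 0.0)
    w_rec[is_inhibitory] = np.minimum(w_rec[is_inhibitory], 0.0)
    w_out[is_inhibitory] = 0.0
    return w_rec, w_out

# Illustrative network: 50 units, the last 10 inhibitory (hypothetical sizes).
rng = np.random.default_rng(0)
n_units = 50
is_inhibitory = np.arange(n_units) >= 40
w_rec0 = glorot_normal((n_units, n_units), rng)
w_out0 = glorot_normal((n_units, 1), rng)
w_rec, w_out = apply_dale(w_rec0, w_out0, is_inhibitory)
```

      In a training loop, a projection of this kind would typically be reapplied after each optimizer step so that the sign structure is preserved throughout training.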

      We detailed the surgical procedures that were previously omitted. This includes the specific anesthesia protocol (sodium pentobarbital, 50 mg/kg, i.p.), stereotaxic mounting, and the exact coordinates for the rsCla (AP +2.95, ML ±1.95, DV -3.85 mm). To define "sparse expression," we specified that the AAV was diluted 1:4 in sterile saline. Finally, we included the precise injection parameters: delivery at 20 nL/min via a pressure injection system, with the pipette left in place for 10 minutes post-infusion to ensure adequate diffusion. We have added these details to the Methods section (Line 635, 636-639, 641-643).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We agree with the reviewer’s comment and have reorganized the figures to focus on the key results. Specifically, we separated the original figures so that they correspond to (1) presentation of an RNN model consistent with the results of actual claustral recordings, (2) identification of dimensionality-reduced population activity patterns in the model, (3) comparison of these patterns with population activity patterns derived from recorded claustral neurons, (4) proposal of a nonlinear integration mechanism, and (5) the suggestion that such integration may be implemented through dynamic coding. Using this figure organization, we first identify RNN models trained on behavioral metrics whose dynamics are consistent with experimental claustral recordings. We then compare the dimensionality-reduced population activity patterns of these models with those derived from recorded claustral neurons to evaluate their biological plausibility. After selecting the models that satisfy this criterion, we perform further analyses that would be difficult to achieve using real neural recordings alone. These analyses ultimately allow us to propose dynamic coding exhibiting nonlinear integration as a plausible computational mechanism.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We agree with the reviewer’s suggestion and have included a reference to Orman (2015). We have clarified that neuronal activity can persist for extended periods and that such persistent activity has been observed in claustral slices prepared at a specific slicing angle (Line 144).

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We agree with the reviewer’s comment and have revised the relevant sentences in the Introduction section. We also included and acknowledged the contributions of previous studies by the Mathur group and the Citri group by adding additional references to their works (Line 36, 429).

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

      Regarding the E–E metric, we obtained the following result. When including recordings in which the whole-cell recording could not be completed, optogenetically evoked responses were observed in 38 out of 43 patched cells. This suggests that approximately 90% of the cells receive intra-claustral excitatory input. However, the current dataset does not allow us to quantify the connection probability or the strength of these connections.
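
      For reference, the proportion quoted above and its sampling uncertainty can be computed as in the sketch below. The Wilson score interval is used here only as an illustrative choice; it is not an analysis reported in the manuscript, and the function name `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half

# Counts taken from the text: 38 responsive cells out of 43 patched cells.
proportion = 38 / 43
low, high = wilson_interval(38, 43)
print(f"proportion = {proportion:.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

      With only 43 cells the interval is fairly wide, which is consistent with the statement that the dataset does not support a precise estimate of connection probability.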

      As the reviewer pointed out, the RNN developed in this study is specifically designed for the delayed escape task, and we do not intend to claim direct generalization to other proposed functions of the claustrum, such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism of the claustrum observed in this specific task. We have included this in the Discussion section. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a novel toolkit for visualizing and manipulating neurotransmitter-specific vesicles in C. elegans neurons, addressing the challenge of tracking neurotransmitter dynamics at the level of individual synapses. The authors engineered endogenously tagged vesicular transporters for glutamate, GABA, acetylcholine, and monoamines, enabling cell-specific labeling while maintaining physiological function. Additionally, they developed conditional knockout strains to disrupt neurotransmitter synthesis in single neurons. The study reveals that over 10% of neurons in C. elegans exhibit co-transmission, with a detailed case study on the ADF sensory neuron, where serotonin and acetylcholine are trafficked in distinct vesicle pools. The approach provides a powerful platform for studying neurotransmitter identity, synaptic architecture, and co-transmission.

      Strengths:

      (1) This toolkit offers a generalizable framework that can be applied to other model organisms, advancing the ability to investigate synaptic plasticity and neural circuit logic with molecular precision.

      (2) Through the use of this toolkit, the authors uncover molecular heterogeneity at individual synapses, revealing co-transmission in over 10% of neurons, and offer new insights into neurotransmitter trafficking and synaptic plasticity, advancing our understanding of synaptic organization.

      Weaknesses:

      (1) While the article introduces valuable tools for visualizing neurotransmitter vesicles in vivo, the core techniques are based on previously established methods. The study does not present significant technological breakthroughs, limiting the novelty of the methodological advancements.

      The reviewer is correct that this study does not introduce fundamentally new molecular or imaging techniques. Rather, the goal of this work is to establish a generalizable and experimentally validated framework for investigating neurotransmission in vivo at single-cell resolution. To achieve this, we deliberately integrate robust and well-established approaches, including CRISPR-based genome engineering, endogenous tagging, intersectional labeling strategies, and behavioral genetics, into a unified toolkit that enables questions that were previously difficult to address in intact animals.

      The novelty of the work therefore lies not in the invention of individual technologies, but in their systematic integration, functional validation, and deployment to reveal new biological insights, such as the prevalence and spatial organization of co-transmission in vivo.

      (2) The article does not fully explore the potential implications or the underlying mechanisms governing this process, while the discovery of co-transmission in over 10% of neurons is an intriguing finding. A deeper investigation into the functional uniqueness and interactions of neurotransmitters released from individual co-transmitting neurons - perhaps through case study examples - would strengthen the study's impact.

      We agree with the reviewer that this study does not exhaustively explore the functional implications or mechanisms of co-transmission. The primary goal of this work is to introduce and share a validated set of strains that enable monitoring and cell-specific disruption of the major neurotransmitter systems in C. elegans, using molecular components that are broadly conserved across species. By establishing this toolkit, we aim to enable the mechanistic, single-cell analyses of co-transmitting neurons that extend beyond the scope of the present study but represent important next steps for the field.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors developed fluorescent reporters to visualize the subcellular localization of vesicular transporters for glutamate, GABA, acetylcholine, and monoamines in vivo. They also developed cell-specific knockout methods for these vesicular transporters. To my knowledge, this is the first comprehensive toolkit to label and ablate vesicular transporters in C. elegans. They carefully and strategically designed the reporters and clearly explained the rationale behind their construct designs. Meanwhile, they used previously established functional assays to confirm that the reporters are functional. They also tested and confirmed the effect of cell-specific and pan-neuronal knockout of several of these transporters.

      Strengths:

      The tools developed are versatile: they generated both green and red fluorescent reporters for easy combination with other reporters; they established the method for cell-type-specific KO to analyze the function of the neurotransmitter in different cell types. The reagents allow visualization of specific synapses among other processes and cell bodies. In addition, they also developed a binary expression method to detect co-transmission: "We reasoned that if two neurotransmitters were co-expressed in the same neuron, driving Flippase under the promoter of one transmitter would activate the conditional reporter - resulting in fluorescence - only in cells also expressing a second neurotransmitter identity". Overall, this is a versatile and valuable toolkit with well-designed and carefully validated reagents. This toolkit will likely be widely used by the C. elegans community.

      Weaknesses:

      The authors evaluated the positions of fluorescent puncta by visually comparing their positions with the positions of synapses indicated by EM reconstruction. It would provide stronger supportive evidence if the authors also examined co-localization of these reporters with well-established synaptic reporters previously published by their lab, such as reporters that label presynaptic sites of AIY interneurons.

      We have now included images of the synaptic vesicle marker RAB-3 in neurons such as ASE (new Figure S2) and RIB (new Figure S4D). We mention in the text that the patterns observed with VGLUT/EAT-4 (in Figure 2E) and VGAT/UNC-47 (Figure 3D) resemble those observed in the RAB-3 images (Figure S2 and S4D, now discussed in lines 180-182 and line 244, respectively), supporting labeling of presynaptic vesicles.

      Additionally, we now show that in the ADF neuron, a mutation in the conserved presynaptic kinesin KIF1A results in the accumulation of VAChT/UNC-17 and VMAT/CAT-1 in the cell soma and the elimination of the signal from the ADF axon (new Figure 7D-D’). These results are also consistent with the idea that these labeled transporters localize to synaptic vesicles that fail to be transported into the axon in the absence of a functional KIF1A/UNC-104 protein (lines 408-411).

      This toolkit will likely be widely used by the C. elegans community. To facilitate the adoption of the approach and method by worm labs, the authors should include their plan for the dissemination of all of the reagents included in the kit, along with all of the associated information, including construct sequences and the protocols for their use.

      We thank the reviewer for this suggestion, and in response we now: (1) have deposited all strains that we developed in this study to the Caenorhabditis Genetics Center, (2) have created a public website with sequences and genotyping information for each allele developed (https://www.intralab.app/research-papers/cuentas-condori_etal-2026) and (3) have named the toolkit SynaptoTagMe and included the name in the title and in the text. We also added the information of the public website to the main text (lines 140-142) and methods section (lines 540-542).

      Reviewer #3 (Public review):

      Summary:

      Cuentas-Condori et al. generate cell-specific tools for visualizing the endogenous expression of, as well as knocking out, four different classes of neurotransmitter vesicular transporters (glutamatergic, cholinergic, GABAergic, and monoaminergic) in C. elegans. They then use these tools in an intersectional strategy to provide evidence for the coexpression of these transporters in individual neurons, suggesting co-transmission of the associated neurotransmitters.

      Strengths:

      A major strength of the work is the generation of several endogenous tools that will be of use to the community. Additionally, this adds to accumulating evidence of co-transmission of different classes of neurotransmitters in the nervous system.

      Weaknesses:

      A weakness of the study is a lack of comparison to previously published single-cell sequencing data. These tools are alternatively described in the manuscript as superior to the sequencing data and as validation of the sequencing data, but neither claim can be assessed without knowing how they compare and contrast to that data. It is thus not clear to what extent the conclusions of this paper are an advance over what could be determined from the sequencing data on its own. Finally, some technical considerations should be discussed as potential caveats to the robustness of their intersectional strategy for concluding that certain genes are indeed co-expressed. Overall, claims about co-transmission should be tempered by the caveats presented in the discussion, suggesting that co-expression of these transporters is not in and of itself sufficient for neurotransmitter release.

      To clarify, we do not claim that our tools are superior to single-cell sequencing data. Rather, we view the characterization of neurotransmitter identity as an iterative process of discovery and validation across complementary approaches. Moreover, while this study provides an additional lens through which to examine neurotransmitter identity, its primary advance is not in redefining transmitter identity per se, but in establishing a toolkit that enables direct, in vivo monitoring and manipulation of neurotransmitter use at single-cell resolution.

      We do agree on the importance of explicitly comparing our findings with prior studies. In the revised manuscript we have therefore strengthened this integration by:

      (1) Revising Figure S9 and its legend to indicate the source of information for each neuron;

      (2) Adding a new Table 3 summarizing neurons consistently reported to have co-transmission potential;

      (3) Adding a new Table 4 listing neurons previously suggested to be co-transmitter neurons but not consistently supported across datasets;

      (4) Revising the Results to clarify these comparisons (lines 372-374 and 381-383); and

      (5) Incorporating this discussion into the main text (lines 482–488).

      In the Discussion we also now acknowledge technical caveats of the intersectional strategy, emphasizing that co-expression of vesicular transporters indicates co-transmission potential but is not, on its own, sufficient evidence of functional co-release (lines 482–488).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The design of different recombination sites for the transporters is a key strength of this paper. While the authors have provided justification and validation for the chosen sites, it would be valuable to know whether alternative insertion sites were tested as controls. A comparative analysis of multiple sites would provide important insights, especially for the design of similar sites in other proteins or in mammalian systems.

      Our paper lists all the sites tested for labeling each synaptic vesicle transporter. To summarize this information, we have added Table 5 in the Methods section (line 591).

      (2) Given the endogenous nature of the transporter design, it would be interesting to know if the authors have observed dynamic vesicle trafficking to explain the partial overlap shown in Figure 7. A dynamic approach could better capture the potential synergism and heterogeneity of co-transmission. I recommend that the authors try time-lapse imaging to explore this dynamic process further.

      We agree that dynamic imaging approaches, including time-lapse analysis of vesicle trafficking, represent an exciting avenue to further investigate the spatial and temporal organization of co-transmission. Such experiments are part of ongoing work in our laboratory and will be the focus of future studies aimed at dissecting the dynamic regulation of transmitter-specific vesicle populations in vivo.

      (3) The paper identifies co-transmission across a significant proportion of neurons, but the functional implications and interactions of neurotransmitters released from individual co-transmitting neurons are not fully explored. A case study focusing on the uniqueness and interactions of neurotransmitter release in these neurons would provide further clarity on the biological relevance of co-transmission.

      We agree with the reviewer on the importance of dissecting the functional implications of co-transmission and understanding how different neurotransmitters interact within individual co-transmitting neurons in vivo. The primary goal of this study is to establish and share tools that enable such investigations, and we anticipate that future work using these reagents will examine the functional roles of co-transmission on a neuron-by-neuron basis.

      (4) Minor Comments:

      (a) Figure S1D: The label "eat-4" in the eat-4::GFP image appears in italics.

      We have corrected this.

      (b) Figure 2C: The figure legend is missing the statistical significance notation (*** p).

      We have corrected this.

      (c) Figure 2D: The scale bar should be labeled as 10 μm.

      We have added the label.

      (d) Figure S4B: The image quality could be improved for better clarity.

      We have replaced the image.

      (e) Figure S8: The figure legend formatting needs attention, and the scale bar is missing in Figure S8C.

      We have added panel labels and the scale bar.

      Reviewer #3 (Recommendations for the authors):

      (1) A comparison of the results generated in this paper to the Cengen data (or other previously published data) would greatly strengthen the paper. Figure S7 seems to be a compilation of several different data sets, but this is very unclear if so, and there is no indication of which neurons are from which data, and whether there is any conflicting evidence (or what cutoffs were used to determine co-expression from Cengen). If there are indeed conflicting results, the ramifications should be discussed. Finally, given the caveat introduced in the discussion regarding the I2 neuron not expressing GABA synthesis or reuptake machinery, a more thorough analysis of which neurons identified here do or don't express other relevant genes may be warranted.

      In the revised version, we have added Tables 3 and 4 to explicitly compare our findings with CeNGEN and prior studies. Table 3 lists neurons consistently reported across independent datasets to have co-transmission potential, while Table 4 highlights neurons that have been suggested, but not consistently supported, across studies. We now also provide explicit references for each neuron in these tables and have clarified data sources and annotations in the legend to Figure S7 (now Figure S9). These additions are intended to make points of agreement and discrepancy across datasets transparent and to better contextualize our findings within existing resources.

      (2) The intersectional strategy used to identify co-expression of different transporters has some caveats that should be discussed. Specifically, removing the entire open reading frame of the eat-4 gene (as opposed to employing a T2A strategy) could potentially also remove some negative regulatory elements (for example, located within introns), leading to the inappropriate expression of the fluorescent reporter. This should at least be mentioned as a potential caveat.

      We have added this caveat into the discussion section (lines 511-513).

      (3) The colocalization experiments performed in Figure 7 seem to rely on the use of a transgenic allele (syb7882) that was not previously validated for functionality. This is only a problem because: a) another allele with a constitutive mRuby in the same position (ot907) did not seem to be fully functional in the thrashing assays (Figure S4F), and thus it is at least conceivable that the differences in localization are due to the non-functional transporters being relegated to compartments destined for degradation. Validating this strain (after pan-neuronal Flippase expression) in the thrashing assay would dispel this concern.

      We have performed thrashing assays with allele syb7882 (UNC-17::mRuby3 FLP-on) (new Figure S6), in which we find that labeling UNC-17 with C. elegans-optimized mRuby3 (driven by pan-cellular Flippase) results in animals whose thrashing behavior is indistinguishable from that of wild-type animals. This result is consistent with the idea that the distinct subsynaptic localizations observed between VMAT/CAT-1 and VAChT/UNC-17 in ADF neurons arise from endogenous cellular subsynaptic organization programs.

      We additionally note that allele ot907 labels UNC-17 with mKate2, not mRuby3, and that this allele differs from wild-type animals in a thrashing assay (Figure S5F). The syb7882 allele that we generated labels UNC-17 with mRuby3 and is not different from wild type in a thrashing assay. We are unsure what accounts for the distinct phenotypes of ot907 and syb7882, but note that in addition to the use of different fluorescent proteins, each allele also employs distinct linker sequences between UNC-17 and the fluorescent protein (new Figure S6). We now explain this difference in the figure legend of Figure S5 (lines 1184-1189).

      Minor comments:

      (1) Is there a difference between the strains imaged in Figures 3D and S3D? If so, this is not clear. If not, why are they shown twice, and why do they look so different from each other?

      We have replaced panel S3D with an endogenous RAB-3::mScarlet marker in RIB neurons to show that the localization of this synaptic vesicle marker parallels the punctate pattern of UNC-47::gfp11x3 reconstituted specifically in RIB neurons. See new panel S4D and line 244.

      To explain the difference between the original panels: GFP1-10 is expressed from an extrachromosomal array, which drives variable expression and can account for the difference.

      (2) Strains are alternatively denoted by their effect in the main figures, and by their allele names in the supplementary figures. This can be confusing when trying to compare data between the two figures (e.g., Figures 4C and S4F). Perhaps adding the allele names as parentheticals in the main figure might help.

      We have modified the paper to include the name of the alleles used in the panels of the main figures. Additionally, we now mention the specific alleles used for the functional assays in the figure legends.

      (3) To better understand the ramifications and efficiency of the cat-1 FLP-mediated removal (Figure 5E), it would be interesting to compare it directly to the ADF-specific removal of tph-1 referenced in the text.

      We agree that a direct comparison between the FLP-mediated removal of cat-1 and ADF-specific removal of tph-1 would be informative for assessing the efficiency and functional consequences of these manipulations. These experiments represent an interesting direction for future work, and we plan to pursue such comparisons in subsequent studies.

      (4) ADF seems to express very low levels of cho-1 (reuptake transporter), based on the images in Figure S8. Does it express higher levels of cha-1 (synthesis)?

      We have not directly compared the relative expression levels of cho-1 and cha-1 in ADF neurons in this study. Such quantitative comparisons of synthesis and reuptake machinery represent an interesting direction for future work but fall beyond the scope of the present manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma et al. show that melanoma cells induce an EMT-like state in nearby keratinocytes and that when this state is induced experimentally by Twist-overexpression the resulting alteration in keratinocytes is inhibitory for melanoma invasion. These conclusions are based on experiments in vivo with zebrafish and, in vitro, with human cells. The work is carefully done and provides new insights into the interactions between melanoma cells and their environment.

      We appreciate your support for our overall conclusions.

      Strengths:

      The use of both zebrafish and human cells adds confidence that findings are relevant to human melanomas while also further demonstrating the utility of the zebrafish system for discovering important new features of melanoma biology that could ultimately have clinical impacts. The work also combines a nice suite of approaches including different models for induced melanomagenesis in zebrafish, single-cell RNA-sequencing, and more. Some of the final observations are intriguing as well, especially the possibility of EMT-induced melanocyte-keratinocyte interactions via Jam3 expression; it will be interesting to see if this is indeed a mechanism for restraining melanoma invasion. The paper is clearly written and the inferences are appropriate for the results obtained. Overall the work makes a solid contribution to our understanding of important, but too often neglected, roles of the tumor microenvironment in promoting or inhibiting tumor progression and outcome.

      Weaknesses:

      No critical weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Ma et al. utilizes a zebrafish melanoma model, single-cell RNA sequencing (scRNA-seq), a mammalian in vitro co-culture system, and quantitative PCR (Q-PCR) gene expression analysis to investigate the role keratinocytes might play within the melanoma microenvironment. Convincing evidence is presented from scRNA-seq analysis showing that a small cluster of melanoma-associated keratinocytes upregulates the master EMT regulator, transcription factor, Twist1a. To investigate how Twist-expressing keratinocytes might influence melanoma development, the authors use an in vivo zebrafish model to induce melanoma initiation while overexpressing Twist in keratinocytes through somatic transgene expression. This approach reveals that Twist overexpression in keratinocytes suppresses invasive melanoma growth. Using a complementary in vitro human cell line co-culture model, the authors demonstrate reduced migration of melanoma cells into the keratinocyte monolayer when keratinocytes overexpress Twist. Further scRNA-seq analysis of zebrafish melanoma tissues reveals that in the presence of Twist-expressing keratinocytes, subpopulations of melanoma cells show altered gene expression, with one unique melanoma cell cluster appearing more terminally differentiated. Finally, the authors use computational methods to predict putative receptor-ligand pairs that might mediate the interaction between Twist-expressing keratinocytes and melanoma cells.

      Strengths:

      The scRNA-seq approach reveals a small proportion of keratinocytes undergoing EMT within melanoma tissue. The use of a zebrafish somatic transgenic model to study melanoma initiation and progression provides an opportunity to manipulate host cells within the melanoma microenvironment and evaluate their impact on tumour progression. Solid data demonstrate that Twist-expressing keratinocytes can constrain melanoma invasive development in vivo and reduce melanoma cell migration in vitro, establishing that Twist-overexpressing keratinocytes can suppress at least one aspect of tumour progression.

      Weaknesses:

      While the scRNA-seq analysis of melanoma tissue and RT-PCR analysis of EMT gene expression in isolated keratinocytes provide evidence that a subpopulation of host keratinocytes upregulates Twist and other EMT marker genes and potentially undergoes EMT, the in vivo evidence for keratinocyte EMT within the melanoma microenvironment is based on cell morphology in a single image without detailed characterization and quantification. No EMT marker gene expression was examined in melanoma tissue sections to determine the proportion and localization of Twist+ve keratinocytes within the melanoma microenvironment.

      We agree this needed better support. To address this, we have collaborated with the Sorger lab, which has performed spatial transcriptomics on early human melanoma samples (n=8 samples). The advantage of this method is that they can dissect microregions of interest (MRs) for RNA-seq to discern keratinocytes vs. melanocytes. We queried regions that had higher or lower numbers of atypical melanocytes in these biopsies with our TAK or TWIST signature. While the normal sample had no enrichment, we found that a subset of the human samples had evidence of these signatures in the keratinocytes, particularly the ones which had a higher proportion of atypical melanocytes. These data support our model that early melanomas enact an EMT-like program in a subset of nearby keratinocytes.

      The scRNA-seq UMAP suggests the proportion of EMT keratinocytes within the melanoma microenvironment is very small, raising questions about their precise location and significance within the tumour microenvironment. Although both in vivo and in vitro evidence demonstrates that Twist-expressing keratinocytes can suppress melanoma progression, the conditions modelled by the authors involve over-expression of Twist in all keratinocytes, which does not naturally occur within the melanoma microenvironment and, therefore, might not be relevant to naturally occurring melanoma progression. The authors did not test whether blocking EMT through down-regulation of Twist in keratinocytes may influence melanoma development, which would establish the role of Twist-expressing keratinocytes in the melanoma microenvironment.

      We entirely agree, and ideally would do the exact experiment you suggested, which is to knock out TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and would not be expected to be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (i.e. proliferative, invasive), although we agree this would be an important question for future studies.

      To address the potential mechanism by which Twist-expressing keratinocytes suppress melanoma progression, a second scRNA-seq analysis was conducted. However, this analysis is not adequately presented to provide strong evidence for proposed mechanisms for how Twist-expressing keratinocytes suppress melanoma cell invasion. CellChat analysis was used to attempt to identify receptor-ligand pairs that might mediate keratinocyte-melanoma cell interaction, but the interactions between tumour-associated keratinocytes (TAK) and melanoma cells were not included in the analysis. Furthermore, although genetic reporters were used to label both keratinocytes and melanoma cells, no images showing the detailed distribution and positional information of these cells within melanoma tissue are presented in the report. None of the gene expression changes detected through Q-PCR or scRNA-seq were validated using immunostaining or in situ hybridization.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      Overall, the data presented in this report draw attention to a less-studied host cell type within the tumour microenvironment, the keratinocytes, which, similar to well-studied immune cells and fibroblasts, could play important roles in either promoting or constraining melanoma development.

      Counterintuitively, the authors show that Twist-expressing EMT keratinocytes can constrain melanoma progression. While the detailed mechanisms remain to be uncovered, this is an interesting observation.

      Reviewer #3 (Public review):

      Summary:

      In this study the authors use the zebrafish model and in vitro co-cultures with human cell lines, to study how keratinocytes modulate the early stages of melanoma development/migration. The authors demonstrate that keratinocytes undergo an EMT-like transformation in the presence of melanoma cells which leads to a reduction in melanoma cell migration. This EMT transformation occurs via Twist; and resulted in an improvement in OS in zebrafish melanoma models. Authors suggest that the limitation of melanoma cell migration by Twist-overexpressing keratinocytes was through altered cell-cell interactions (Jam3b) that caused a physical blockage of melanoma cell migration.

      Strengths:

      The authors describe a new cross-talk between melanoma and its major initial microenvironment, the keratinocytes, and how, instructed by melanoma cells, keratinocytes undergo an EMT transformation, which then controls melanoma migration. Overall, the paper is very well written, and the results are clearly organized and presented.

      Weaknesses:

      (1) To really show their last point it would be important to CRISPR KO Jam3b in melanoma with twist OE keratinocytes, in vivo or in vitro.

      The CellChat data suggest that Jam3b is likely important in melanoma development, as it has been shown to be important in melanocyte development (Eom, Dev Biol 2021). Studying this specifically in melanoma progression is an area of ongoing study in our lab, and we have begun to generate the Jam3b knockouts as you suggested. Since this set of experiments is quite extensive, we feel this set of data deserves a separate manuscript, which we hope to complete in the near future.

      (2) The use of patient biopsies from early-stage melanomas vs healthy tissue to assess if there is a similar alteration of morphology of adjacent keratinocytes and an increase in vimentin in human samples would strengthen the authors' findings.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      (3) The cell-cell junctions and borders between cells (melanoma/ keratinocytes) should be characterized better, with cellular and sub-cellular resolution. Since melanocytes can "touch" with their dendrites ~40 keratinocytes - can authors expand and explain better their model? Can this explain that in some images we cannot observe a direct interface between the cells?

      We have now added higher-resolution images of these junctions. Our overall hypothesis, related to point (1) above, is that Jam3b mediates these junctions between melanoma cells and keratinocytes, which is why we are now pursuing this in a follow-up study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please say a little more about any phenotypes that might have been evident in Twist-overexpression fish in the absence of melanomas, and clarify in the text that these were mosaic animals, as a first (incorrect) reading left the impression that stable lines had been made.

      In these experiments, we co-injected the melanoma plasmids along with the krt4-TWIST plasmids, creating mosaic animals. Because of this, we did not have a way of specifically looking at the effect of TWIST in the absence of melanoma. We agree this needs better clarification and have added this to the Results.

      (2) Violin plot colors in main and Supplementary Figures tend to obscure data points. Colors for keratinocyte clusters are not discernible in Figure 4C.

      We have remade the plots in a different color scheme to try and make these stand out more easily.

      (3) Clarify that N-cadherin = cdh2 in Figure 1

      We have fixed this in the legend for Figure 1.

      (4) Clarify the relationship between keratinocytes highlighted in Figure 2B and used for Hallmark expression in Figure 2B, and those analyzed for expression of candidate genes in Figure 2E. The last shows many NKC whereas even the larger group circled in Figure 2B as keratinocytes seems to have far fewer cells, unless massively overplotted. Is the rest of that cluster in Fig. 2B keratinocytes as well?

      In the analysis in Figure 2E, we first calculated genes differentially expressed in the TAK vs. NKCs (found in Figure 2B). We used those genes as input into GSEA analysis, which showed enrichment for EMT programs specifically in the TAKs. We recognize that the number of TAKs is relatively small (compared to all of the other cells in the single-cell UMAP) but that is the most we were able to get from this particular scRNA run, because the melanoma cells naturally make up the vast majority of the cells in the 10X run. This is why we performed downstream mechanistic analysis (in the rest of the paper) to ensure this result was not an artifact of a small number of TAKs.

      (5) Define "NES" in the Figure 2 legend.

      NES indicates “Normalized Enrichment Score”, a standard output of GSEA. This has been added to the legend.

      (6) Indicate how many control vs. Twist+ fish were found to have invasive vs non-invasive tumors upon histological examination. Were tumors in the latter fish always contained within the epidermis proper, or did some extend deeper if given enough time?

      In the histology analysis, we used n=3 control fish and n=3 TWIST-overexpressing fish. Main Figure 3 shows n=1 of these fish from each group, and the other n=2 from each are shown in Supplemental Figure 1. In this cohort (taken at 26 weeks), all of the TWIST tumors were contained within the epidermis, but we did not let them grow longer to see if (given enough time) they could have invaded below this. Around 26 weeks, survival decreased, which made this experiment unfeasible at later time points. We have added a statement about this to the Results section.

      Reviewer #2 (Recommendations for the authors):

      Going through the data presented in the figures, here are my comments:

      (1) Figure 1: To strengthen the evidence that keratinocytes in the melanoma microenvironment undergo EMT, it would be beneficial to provide immunostaining or in situ data for EMT marker genes within melanoma tissue sections co-stained with a keratinocyte marker (such as an anti-GFP antibody).

      We agree this type of analysis is an important validation of our findings. Doing this in zebrafish tumors is difficult, as human/mouse antibodies for EMT marker genes typically do not work in fish. In addition, we felt that validating our results in human melanomas would make our findings more generalizable. Therefore, we established a collaboration with Peter Sorger’s lab, who have been performing high-resolution spatial transcriptomics on early melanoma samples from humans. While these are difficult to obtain (since most early lesions are processed for clinical diagnosis), they have a collection of n=8 samples that they subjected to GeoMx spatial analysis. In this method, the samples are first stained with antibodies to definitively mark keratinocytes (PANCK) vs. melanoma cells (SOX10) and all samples are reviewed by expert pathologists. From this, microregions (MRs) of interest are selected to then undergo RNA-seq. After control analysis to ensure both keratinocytes and melanocytes were present in the samples, they then used our TAK or TWIST signatures as a query. Both signatures were enriched in the keratinocytes adjacent to early melanomas, but not in normal skin samples or in samples with few atypical melanocytes. This provides further evidence that the altered keratinocytes we see in our fish are present and enriched in human biopsy specimens.

      (2) Figure 2: In panel B, the UMAP shows the separation of single cells, and keratinocytes are circled. However, there are two clusters of keratinocytes, and the graph does not indicate which cluster represents tumour-associated keratinocytes (TAKs) versus normal keratinocytes (NKCs). The two clusters also appear to differ in abundance, so it would be helpful to report the proportion of keratinocytes that are TAKs undergoing EMT, according to the individual dots in Figure 2E. In Figure 2E, TAKs seem to have very few cells compared to the other clusters. Given the relatively small number of EMT-TAKs detected in the single-cell RNA-seq data, I wonder how much direct influence these cells could exert on the bulk of melanoma cells in vivo. The evidence would be strengthened if an IHC analysis could show the location of Twist-expressing keratinocytes within the melanoma microenvironment and whether they are associated with certain melanoma cell markers but not others (i.e., markers indicating different differentiation states of melanoma cells). To further support the role of Twist-expressing keratinocytes in the melanoma microenvironment, it would be beneficial to perform a knockout (KO) of Twist in keratinocytes within the melanoma microenvironment.

      In Figure 2B, we agree that the color scheme made it difficult to discern TAKs vs. NKCs.

      We have changed the color scheme to make this more clear.

      The number of TAKs undergoing EMT is relatively small, and this is why we performed the overexpression studies of TWIST in order to expand the field of keratinocytes undergoing EMT. To get at the question of whether these are really important in tumor initiation and progression, we ideally would do the exact experiment you suggested, which is to knock out TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and would not be expected to be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (i.e. proliferative, invasive), although we agree this would be an important question for future studies.

      (3) Figure 4: Co-culture results show that melanoma cells migrate further on a control HaCaT cell monolayer compared to a TWIST-overexpressing HaCaT cell monolayer. While this phenotype might support the conclusion that TWIST-expressing keratinocytes reduce melanoma cell invasion, it should be interpreted with caution. The data can be interpreted as TWIST-HaCaT cells inhibiting melanoma cell migration; however, an alternative explanation cannot be ruled out. For example, wild-type HaCaT cells might provide a suitable substrate for melanoma cells to migrate, whereas TWIST-HaCaT cells lack this property. To address this, the baseline melanoma cell migration should be established in this assay by coating the plate with cells from the same melanoma cell line and allowing melanoma cells from the flipped cover slip to migrate out.

      We have performed the experiment you suggested using Hs.294T and SKMEL2 cells and provided this as a new Supplemental Figure 2. This demonstrated that the melanoma cells in this context could indeed migrate out of the coverslip at baseline. Thus, it is possible, as you indicated, that the phenotype we have observed might be due to something lacking in the TWIST keratinocytes that promotes migration. Since we cannot differentiate between these two possibilities (i.e. that TWIST KCs actively inhibit migration vs. lacking something that promotes migration), we have modified the text to indicate both of these possible mechanisms could be at play.

      (4) In the representative images shown in the figure, it appears that both HaCaT cells and melanoma cells in the upper and lower panels are at very different densities. "Contact inhibition" and "cell sorting" are well-known phenomena in tissue-cultured cells, so when cells are seeded at different densities, their ability to move away from the initial location could vary. From the Materials and Methods section, it is unclear why cell densities are drastically different in the images presented. Images in the upper panel show both melanoma cells and keratinocytes at lower densities, and in the TWIST group, melanoma cells under the cover slip appear to aggregate into clusters with TWIST-expressing keratinocytes surrounding each aggregated cluster. This suggests that cell sorting might be occurring, potentially mediated by cadherins or Eph-ephrins.

      We recognized this discrepancy as well. In the setup of the experiment, we seeded the exact same number of cells for both the Hs.294T (Figure 4E) and SKMEL2 (Figure 4G) experiments. But when we took the images after 20 hours of co-culture, it was clear that the HaCaT densities were different, as seen in the figures. We suspect this might be because these two melanoma cell lines may secrete different factors (i.e. growth factors) that affect HaCaT proliferation, adhesion or cell sorting. Despite this, in terms of the ability of the melanoma cells to migrate into the HaCaT cells, we saw similar results across both experiments, suggesting that it is not HaCaT density alone that explains the results. But we agree we need to address this point about cell density more clearly in the manuscript, and we have amended the Discussion to indicate the above points.

      (5) Figure 5: Single-cell RNA-seq analysis comparing cells from control melanomas with cells from melanomas developed in a Twist-expressing keratinocyte background could provide valuable information on how melanoma cells alter their phenotype and how Twist-expressing keratinocytes respond to melanoma development. However, the information presented in the manuscript is not persuasive in this regard (appears to be minimal).

      (a) In Figure 5C, the differences between melanoma cells in a control background versus those in a Twist-expressing keratinocyte background include cells from more than one unique cluster, but most of the different clusters are not discussed, except for one prominent cluster indicated by an arrow.

      The reason we pointed out that one cluster is that it was the major thing that was different in the control melanomas vs. the TWIST melanomas. To better clarify this point, we have made a new Supplemental Figure 3 comparing the clusters in each situation: 7 in the control melanomas vs. 8 in the TWIST melanomas (Supp. Figure 3d). To then better understand the nature of the TWIST melanomas, we performed Gene Set Enrichment Analysis (GSEA) compared to the control melanomas. Interestingly, this revealed a striking enrichment for pathways related to oxidative phosphorylation using both GO and Hallmark terms. Because we had previously shown that melanoma cells with high ox-phos are typically in the more melanocytic and less invasive state (Lumaquin-Yin, Nature Communications 2023), we therefore analyzed our TWIST melanomas by comparing this unique cluster to the well-annotated melanoma cell state signatures from Tsoi et al (Cancer Cell, 2018). This showed that most of the TAKs and TWIST-KCs were in the melanocytic/transitory cluster, which are thought to be the least invasive of all the melanoma cell states. Thus, it seems likely that high levels of TWIST in the keratinocytes induces a low invasion state in the melanoma cells. We have added this data and interpretation to the Results and Discussion sections of the manuscript.

      (b) In Figure 5D, it is unclear whether TAKs include both wild-type keratinocytes and Twist-expressing keratinocytes. 

      We oversimplified this plot for the sake of visualization, but realize that in doing so we obscured some important details. In the plot, we separate normal keratinocytes (NKCs) vs. tumor associated keratinocytes (TAKs). TAKs are, by definition, TWIST<sup>hi</sup>/EMT<sup>hi</sup> and represent upregulation of endogenous TWIST. In contrast, when we force overexpression of TWIST in the keratinocytes, then we see an entirely new cluster appear, as expected. 

      (c) In Figure 5F, TAKs are interacting with melanoma cells so it is unclear why the CellChat analysis did not include TAKs. 

      This was an oversight on our part, and the Figure has now been corrected to include this. TAKs in both the control and TWIST melanomas have numerous interaction partners, whereas the TWIST-KCs have relatively fewer and more specific interactions.

      (d) Finally, Figure 5G needs clearer labelling; it is currently unclear which gene is expressed by the sender and which by the receiver.

      This has been clarified in Figure 5F with specific indicators of “sender” vs. “receiver”.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1E - in this figure, it is possible to observe the altered morphology of keratinocytes, but these cells are not in the vicinity of the melanoma cells. Can the authors please provide a zoom-in of the region of the interface and quantify the distance between cells? At least in the image shown, the cells that are most deformed appear to be far away from the melanoma, but perhaps this is just this example. Please clarify. Or are there patches of keratinocytes that go through EMT and others that maintain their epithelial structure?

      We have now added zoom-in images of the interface (Figure 1E). In nearly all sections examined, some keratinocytes maintain their hexagonal normal epithelial structure, but the majority of the cells appear altered. We have attempted to quantify this effect, along with the distance between cells with this EMT-like morphology, but have not found a reliable method given the heterogeneity across samples. That is why we instead chose to quantify the EMT-like keratinocytes (what we refer to as TAKs) using single-cell RNA seq, which showed that 32% of the population had the TAK signature, whereas 68% resembled normal keratinocytes. We feel this is more quantitative than imaging alone.

      This data has been added to the Results section.

      (2) Figure 3B - could not find the number of fish analyzed.

      This was an oversight on our part. We studied n=135 control melanomas vs. n=118 TWIST melanomas. This data has now been added to Figure 3B.

      (3) Figure 3D - missing a graph with quantification and zoom images in the tail keratinocytes/ melanoma interface.

      In this particular cohort of animals, we unfortunately did not specifically track body vs. fin melanomas, so we are not able to quantify this.

      (4) Figure 4 - it would be nice again to have a zoom-in to observe the interface of cells- maybe use a phalloidin staining to visualize better how cells are touching each other.

      We have added a zoom-in image of the interface to the image (Figure 4E). We have very much wanted to do immunohistochemistry (not just for phalloidin, but for other markers as well) on these coverslip co-cultures and have tried, but we have not been successful. This is likely because the assay requires plastic plates, which are incompatible with doing this, but we agree that getting this to work would be an important area for future development.

      (5) I believe the paper deserves a last figure - with the model.

      We agree and this has now been added as Figure 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high-density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period, as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One potential concern is the apparent mismatch between the neural and behavioral data. Neural analyses indicate a clear separation of the activity associated with each mixture that is independent of the animal's ultimate choice. This would seemingly indicate that the animals are making errors despite correctly encoding the stimulus. Based solely on the neural data, one would expect the psychometric curve to be more "step-like" with a significantly steeper slope. One potential explanation for this observation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding at the single neuron level. Given the difficulty of isointensity matching concentrations, this concern is not paramount. However, the apparent mismatch between the neural and behavioral data should be acknowledged/addressed in the text.

      We thank the Reviewer for the insightful comments and thoughtful suggestions. Our electrophysiological recordings show that GC dynamically encodes stimulus concentration of mixture elements, dominant perceptual quality, and decisions of directional lick. With regard to the encoding of mixtures, the clear separation of activity associated with each mixture (Figure 3) is present at a trial-averaged pseudo-population level, and average activities associated with more similar, intermediate mixtures are closer to each other in this space. At the single-trial level, activities evoked by similar, intermediate mixtures are much harder to separate. This increased similarity can lead to behavioral errors resulting from either incorrect encoding of the stimulus or from the inability to interpret the stimulus to guide the correct decision. The psychometric function, which shows that more distinct stimuli (100/0 vs 0/100) lead to fewer mistakes than more ambiguous, intermediate mixtures (55/45 vs 45/55), is consistent with the increased ambiguity of responses to intermediate mixtures.

      The Reviewer is correct that there could be a slight mismatch in the perceived intensity of the mixture components. This mismatch could be the reason for the slight asymmetry in our psychometric function (Figure 1B). However, it is not uncommon for mice in these 2AC tasks to also have a motor laterality bias in their responses that manifests itself for the more ambiguous stimuli. We chose not to model this bias given its subtlety and its unknown origin. Rather, we chose to model an ideal scenario in which stimuli have matched intensity and no motor bias exists. In the revised manuscript we discuss this issue.

      Reviewer #1 (Recommendations for the authors):

      (1) The apparent mismatch between neural and behavioral data. I am providing more details in this section to hopefully better illustrate my concern.

      (a) Based on the authors' psychometric curve, sucrose appears to be a more salient signal causing the behavior to be shifted (e.g., a 50/50 mixture results in a >60% predicted behavioral performance). If both sucrose and salt were intensity-matched, a 50/50 mixture should result in a behavioral performance near 50%. The increased salience of sucrose could cause the animals to have lower overall performance despite accurate neural encoding. Alternatively, certain animals could display a strong side bias, skewing the data slightly. These issues have seemingly been fixed in the model data, which displays a more balanced psychometric curve. Accordingly, the model data seemingly displays a larger shift in error trials as compared to correct trials (Figure 6A).

      The reviewer is correct in observing that the average experimental psychometric curve in Figure 1B shows a slight shift in favor of the sucrose side with a 50/50 mixture. We fit psychometric curves to each session and the mean value of P(Sucrose choice | Stimulus = 50/50) across sessions was significantly different from 0.5 (one-sample t-test, p = 0.003), with 5 probabilities below 0.5 and 18 above it.
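
      As a rough illustration of the test described above, the sketch below computes a one-sample t statistic against a null mean of 0.5 using only the Python standard library. The function name and the example values are hypothetical placeholders, not the study's per-session data, and the p-value lookup (which requires the t distribution) is deliberately left out.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all observations are identical; t is undefined")
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-session P(sucrose choice | 50/50) values, NOT the study's data.
example_sessions = [0.6, 0.7, 0.5, 0.6]
t_stat, dof = one_sample_t(example_sessions, mu0=0.5)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      The resulting statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a two-sided p-value.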

      This slight bias could be attributed to a slight mismatch in the perceived intensity of the mixture components and/or lateral motor biases. In any case, it is subtle and its origins were not a focus of this study.

      Models were not trained to match the animals’ psychometric curves, but rather to choose correctly in an ideal scenario where stimuli have matched intensities. This explains why the model simulations lack the bias observed in animal behavior data.

      We do not believe that there is a mismatch between the experimental behavioral and neural data, as trial-averaged pseudo-population trajectories are farther in neural space for more discriminable stimuli and closer in neural space for more similar stimuli, consistent with behavioral performance that is high for more discriminable stimuli and low for more similar stimuli. Moreover, as the model also shows, a clear separation of trial-averaged trajectories still results in a sigmoidal performance function for trial-to-trial behavior.

      Finally, subtle behavioral biases would not necessarily be expected to appear in our dPCA analyses since we used this technique to find a single axis that best separates all stimuli conditions regardless of choice when the pseudo-population data are projected upon it. Additional modes of activity that explain less overall variance might better reflect biases.

      (b) Although I am not an expert at these analyses, I wonder whether the elevated bump (i.e., >0) in Figure 3C of the 55/45 mixture that occurs early in the stimulus presentation further supports the hypothesis mentioned above and could indicate an early signal of salience/increased intensity?

      The reviewer is correct that the 55/45 trajectory features a brief positive wave right after stimulus delivery before going negative. While this may be related to stimuli not being explicitly balanced for intensity, it could also reflect a signal related to ambiguity or balanced mixtures. We are hesitant to interpret this positive deflection as conclusive evidence of a bias in neural activity, given its short duration and the natural variability of neural signals.

      (2) The increase in step-perception neurons after the decision period is confusing (Figure 4C). The text states (line 246) "the analysis reveals a small and time-invariant proportion of step-perception neurons". However, the proportion doubles after the decision-making process, which is seemingly a significant change. Why does this occur? This observation is noticeably missing from the network data. Could it be attributed to a mislabeling of "step-choice" neurons, given the correlation between the left/right decision and sweet/salty? Either way, it is very noticeable and should be addressed.

      We cannot be sure of the reason for the increase in step-perception neurons after decisions. One possibility is that they are acting as feedback for learning, encoding the percept to compare with choice and outcome to improve performance. The model, which presumably learns the task differently from the animals, does not seem to leverage this signal for its own learning. We have modified the text, now referring to a “small but consistently present proportion” of step-perception neurons, and included this proposed explanation in the Discussion.

      (3) Optional: I think the authors are missing an opportunity to analyze the temporal aspect of this multiplex code using their network-based modeling approach. A significant proportion of neurons fall into different categories (i.e., step-perception/linear, etc.) at different time points. However, the virtual ablation experiments remove any neuron that falls into one of these categories at any time. By limiting the cell-specific virtual ablation to specific time windows, you could (I think) provide stronger evidence for the temporal sequence of the encoding of these perceptual aspects.

      This was an excellent suggestion for an additional modeling experiment, so we performed it. A new supplemental figure (Figure S8) and additional text in the revised manuscript showcase the results. In summary:

      In terms of behavioral results, ablating the linear coding units in the beginning (that is, silencing all units that are labeled linear in any bin within the first 1.2 s after stimulus onset for the entirety of the 1.2 s) significantly reduces performance, as does ablating the step-perception or step-choice coding units at the end (1.2 s prior to choice). The remaining combinations of coding type and timing of the ablation do not affect performance.

      Regarding the dynamics of coding types (compare Figure 7A), stimulus coding activity was significantly blunted only by ablating the linear coding units in the beginning, whereas choice coding activity was diminished by ablating the choice coding units at the end or by ablating the linear coding units at either the beginning or the end.
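
      The selection logic of such a time-windowed ablation can be sketched as follows. This is a minimal illustration under assumed data structures (per-unit, per-bin label lists and activity lists); all names are hypothetical and the authors' actual implementation is not described here. In a real recurrent network the silencing would be applied during simulation so that downstream units are affected; the post-hoc masking below only shows how units are chosen and over which window they are zeroed.

```python
def units_to_ablate(labels, bin_starts, target, t_start, t_end):
    """Return indices of units labeled `target` in any bin starting in [t_start, t_end).

    labels[u][b] is the coding label of unit u in time bin b (assumed layout).
    """
    in_window = [b for b, t in enumerate(bin_starts) if t_start <= t < t_end]
    return [
        u for u, unit_labels in enumerate(labels)
        if any(unit_labels[b] == target for b in in_window)
    ]


def silence(activity, units, bin_starts, t_start, t_end):
    """Return a copy of activity[u][b] with the chosen units zeroed over the whole window."""
    out = [row[:] for row in activity]
    for u in set(units):
        for b, t in enumerate(bin_starts):
            if t_start <= t < t_end:
                out[u][b] = 0.0
    return out


# Toy example: 3 units, 4 bins starting at these times (seconds after stimulus onset).
bin_starts = [0.0, 0.6, 1.2, 1.8]
labels = [
    ["linear", "other", "other", "other"],       # linear early
    ["other", "other", "linear", "linear"],      # linear only late
    ["other", "step-choice", "other", "other"],  # never linear
]
activity = [[1.0] * 4 for _ in labels]

# Silence units that are linear at any point in the first 1.2 s, for that whole window.
early_linear = units_to_ablate(labels, bin_starts, "linear", 0.0, 1.2)
ablated = silence(activity, early_linear, bin_starts, 0.0, 1.2)
print(early_linear, ablated)
```

      Note that a unit labeled linear in a single early bin is silenced for the entire window, which mirrors the description above.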

      Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I have a couple of suggestions to further enhance the authors' important conclusions:

      My main comment is the distinction between constrained and unconstrained units. The authors train a small percentage of units to match the real neural data (constrained units), and then find some unconstrained units that are similar to the real neural data and some that are not. As far as I could tell, the relative fraction of constrained and unconstrained units in the trained RNN is not reported; I assume the constrained ones are a much smaller population, but this is unclear. The selection of different groups of neurons for the RNN ablation experiments appears to be based on their response profiles only. Therefore, if I understood correctly, both constrained and unconstrained units are ablated together for a given response category (e.g., linear or step-perception). It would be useful, therefore, to separately compare the effects of constrained vs. unconstrained RNN units.

      We thank the Reviewer for the constructive feedback. The Reviewer is correct that ablations were carried out with respect to response categories only and included both constrained and unconstrained units.

      The ratio of total units to constrained units was fixed at 5.88, thus constrained units were ~17% of the network and unconstrained units were ~83%. This value is specified in the Methods (RNN: Components and dynamics), but we have reported it in the Results of the revised manuscript for clarity.

      We have also edited the Methods because they wrongly stated that the ratio of unconstrained (rather than total) units to constrained units was 5.88.
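
      The arithmetic behind these percentages, and why the wording of the ratio matters, can be checked with a short calculation. The variable names are illustrative; only the value 5.88 comes from the response.

```python
RATIO = 5.88  # value stated in the response

# Corrected reading: total units / constrained units = 5.88
constrained_frac = 1 / RATIO
unconstrained_frac = 1 - constrained_frac

# Earlier Methods wording: unconstrained units / constrained units = 5.88
constrained_frac_alt = 1 / (1 + RATIO)

print(f"total/constrained reading:         {constrained_frac:.1%} constrained, "
      f"{unconstrained_frac:.1%} unconstrained")
print(f"unconstrained/constrained reading: {constrained_frac_alt:.1%} constrained")
```

      Under the corrected reading the constrained units make up about 17% of the network, whereas the earlier wording would have implied about 14.5%.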

      Specifically:

      (1) For the analyses in the initial version of the manuscript, the authors should specify how many units in each ablation category are constrained and unconstrained.

      In the revised manuscript, we have specified the numbers of constrained and unconstrained units within each response category. For convenience, they are reported here: linear = 194 constrained and 691 unconstrained units; step-perception = 147 constrained and 840 unconstrained units; step-choice = 129 constrained and 814 unconstrained units; “other” = 353 constrained and 1739 unconstrained units.

      (2) The authors should repeat Figure 6, but only for unconstrained units to test how much of the effects in the initial version of Figure 6 are driven by constrained vs. unconstrained RNN units.

      In the revised version we have included two additional supplemental figures (Figures S5-6) where the analyses of Figure 6 are carried out separately for constrained and unconstrained units. In short, the results for the constrained units strongly resemble those for the experimental data, while the results for the unconstrained units strongly resemble those for all model units.

      (3) The authors should repeat Figure 7, but performing ablations separately on the constrained and unconstrained units to examine how the network behaves in each case and the resulting "behavioral" effect.

      The revised version includes a supplemental figure (Figure S7) with the results of these additional ablation simulations.

      In summary:

      In terms of behavioral performance, the prior results showing that ablating linear, step-perception, or step-choice units significantly impairs performance, while ablating “other” has no significant effect, hold even if ablation is restricted to only constrained or only unconstrained units. There is a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs performance more, most likely due to their larger population size.

      In terms of dynamics, to impair stimulus coding by ablating step-choice units, you must ablate them all; to impair stimulus coding by ablating linear or step-perception units, however, ablating just the unconstrained ones suffices. As before, ablating linear, step-perception, or step-choice units significantly impairs choice coding activity, while ablating “other” units does not; these results hold even if ablation is restricted to only constrained or only unconstrained units. Finally, there is again a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs dynamics more, most likely due to the larger population size.

      Reviewer #2 (Recommendations for the authors):

      (1) In addition to panel 5B, it would be informative to show data from individual mice and the corresponding RNNs trained on each mouse, to assess how closely they match. If available, including one representative example of a good match and one of a less accurate match would help the reader get a better sense of the data.

      Figure 5B shows the average behavioral performance of the model. Individual models were not trained directly on the psychometric curves of experimental sessions; they were trained to perform the task correctly. After successful training, model simulations were run with input noise to be able to produce a sigmoidal psychometric curve. However, although the input noise was tuned to capture the overall correct rate of the corresponding experimental session, we did not attempt to match the details of the psychometric curve. See also the next reply.
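
      To illustrate why adding input noise to an otherwise perfect decision rule yields a sigmoidal psychometric curve, the sketch below uses a simple ideal-observer stand-in rather than the RNN itself. It assumes additive Gaussian noise on the perceived sucrose percentage and an unbiased decision boundary at 50%; the noise level and function name are hypothetical.

```python
from statistics import NormalDist


def p_choose_sucrose(sucrose_pct, noise_sd, boundary=50.0):
    """P(noisy percept > boundary), where percept = sucrose_pct + Gaussian noise."""
    return 1.0 - NormalDist(mu=sucrose_pct, sigma=noise_sd).cdf(boundary)


mixtures = [0, 45, 50, 55, 100]  # % sucrose in the mixture (illustrative levels)
noise_sd = 15.0                  # hypothetical noise level, not fitted to any session

curve = [p_choose_sucrose(m, noise_sd) for m in mixtures]
for m, p in zip(mixtures, curve):
    print(f"{m:3d}% sucrose -> P(sucrose choice) = {p:.3f}")
```

      With no noise the rule would be a step function; increasing `noise_sd` flattens the curve and lowers the overall correct rate, which is the quantity the input noise was tuned to capture.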

      (2) In addition to panel 5C, it would be useful to add examples of experimentally observed PSTHs and the corresponding activity trajectory for the units in the RNN trained to match them, for all the other coding patterns (step-perception and step-choice).

      We note that the PSTH in 5C is not an example of a linear coding unit as the Reviewer implies, but simply one with a good fit, and here the model’s output was produced in the absence of input noise. In order to classify step-perception and step-choice responses one needs error trials, but the model was trained without this input noise that induces errors (and produces a sigmoidal psychometric function) to match experimental PSTHs from correct trials only. Post-training simulations were then run with input noise to induce error trials, and model unit response profiles were classified based on this. However, there is no guarantee that error trials in the model match the error trials in the experiment; therefore, step-perception and step-choice units in the model may or may not be step-perception and step-choice units in the data. Despite this limitation, the revised manuscript includes additional examples, in Figure S2, of experimentally observed PSTHs and their corresponding model activity, to supplement Figure 5C and provide a better sense of the goodness-of-fit.

      (3) Electrophysiological data in Figure 2 - It would be helpful to provide statistics on how many neurons change their activity in each session.

      In the revised manuscript we have included across-session statistics for proportions of neurons that are taste-responsive and that show decision preparatory activity. We have also included tables (Tables S1 and S3) with the numbers of neurons that are taste-responsive and that show preparatory activity for each session in the experimental and model data.

      (4) Peak auROC selection - How was the peak auROC selected? Selecting only one bin for the peak could be potentially problematic and may result in the incorrect identification of an outlier that does not faithfully represent the neuron's overall activity. The peak selection could instead be based on several consecutive bins showing a consistent trend. If this approach was already implemented, the authors should explicitly describe it in the Methods section.

      Peak auROC was selected from a single bin (with an average duration of about 50 ms). While it is true that this may result in outlier neurons that transiently prefer one stimulus strongly but more consistently prefer the other, we opted for a simple criterion to sort the neurons into two categories for visualization. Adopting more stringent criteria that consider multiple bins may result in neurons that cannot be placed in either category, and we wanted a way to examine the entire pseudo-population. Also, the entire auROC trace is visualized in the heatmap, so potential outliers are not hidden and can be assessed by eye.
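
      The trade-off between the two criteria can be made concrete with a small sketch. It assumes that "peak" means the bin with the largest deviation from chance (0.5), which is not spelled out above, and the function names, labels, and example traces are hypothetical.

```python
def _side(value, chance):
    """Return +1 above chance, -1 below chance, 0 exactly at chance."""
    return (value > chance) - (value < chance)


def preference_single_bin(auroc, chance=0.5):
    """Classify a neuron by the single bin deviating most from chance."""
    peak = max(auroc, key=lambda a: abs(a - chance))
    return {1: "above", -1: "below", 0: None}[_side(peak, chance)]


def preference_consecutive(auroc, k, chance=0.5):
    """Classify only if the peak bin sits in a run of >= k same-side bins, else None."""
    i = max(range(len(auroc)), key=lambda j: abs(auroc[j] - chance))
    s = _side(auroc[i], chance)
    if s == 0:
        return None
    lo = hi = i
    while lo > 0 and _side(auroc[lo - 1], chance) == s:
        lo -= 1
    while hi < len(auroc) - 1 and _side(auroc[hi + 1], chance) == s:
        hi += 1
    return ("above" if s > 0 else "below") if hi - lo + 1 >= k else None


# Hypothetical traces: one transient outlier bin vs. a sustained preference.
transient = [0.45, 0.42, 0.90, 0.44, 0.40, 0.43]
sustained = [0.55, 0.70, 0.80, 0.60, 0.45]

print(preference_single_bin(transient), preference_consecutive(transient, k=2))
print(preference_single_bin(sustained), preference_consecutive(sustained, k=3))
```

      The single-bin rule always assigns a category (unless the trace never leaves chance), while the consecutive-bin rule leaves the transient neuron unclassified, which is the situation the authors wanted to avoid for visualization.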

      Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision-making tasks, reflecting both sensory and decision variables. In the present study, Lang et al. set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods, with reference to the behavioral task and electrophysiological recordings/data analysis, are straightforward, solid, and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice design that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally-defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

      We thank the Reviewer for the positive assessment of our study.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I'm missing a clearly stated specific hypothesis and what is predicted on the basis of that hypothesis. What is the alternative?

      The null hypothesis is that single neuron activity patterns, even when clearly structured, do not matter for population activity or behavior. Alternatively, they do matter for these phenomena, and our model supports the alternative hypothesis. We have made this hypothesis clearer in the Introduction.

      (2) Discussion: Much of the text is a recap of the Introduction and Results sections. Please elaborate on the specific insights gained from the findings. The idea that tuned neurons in the sensory cortex are the basis for perception and perceptual decisions concerning the features being represented by those neurons is generally accepted. What the present study adds to this insight could be described more explicitly. On the other hand, the idea that small populations of tuned neurons are responsible for perception of taste/perceptual decisions about taste appears in contrast with previous accounts where stimulus features/decisions are reflected in correlated changes in activity across distributed populations of taste cortical neurons, including ones that are not necessarily tuned or even overtly responsive. How do the present findings relate to this idea?

      This is a very good point about reconciling these findings with past ones that have focused on coordinated changes across ensembles of neurons, i.e., metastable dynamics of internal (hidden) states. There is a brief mention of metastability toward the end of the Discussion, but we agree it deserves elaboration.

      This work does emphasize single unit activity, but in the context of, and as relevant to, population activity. We believe that the findings and frameworks of previous studies and those presented here are compatible rather than mutually exclusive. There is no reason why neurons with the coding patterns we studied here cannot coordinate with others to participate in the formation of different metastable states. The question of which—neurons with specific response profiles, or ensemble activity patterns that may involve these neurons?—is necessary and sufficient for producing perception and behavior during the mixture-based decision-making task is interesting but rather difficult to answer because of the single units’ contribution to both alternatives. One would need to utilize a manipulation that disrupts ensemble coordination without disrupting single unit activity to differentiate between them. We have made these points clearer in the Discussion.

      (3) Results: RNNs were based on data from single sessions -- how many neurons of each tuning type were observed in each session? In particular, there were 23 sessions but only 25 neurons total tuned to choice, suggesting that modelled choice neurons were based on ~1 neuron.

      The revised manuscript includes the session-by-session breakdown of response types for both experiment and model in two supplementary tables (Tables S2 and S4). We note that there are 25 neurons tuned to choice during the last 500 ms of the trial prior to decision, but 114 out of 626 neurons in total are tuned to choice in some time bin in the experimental data.

      (4) Minor: Indicate the time windows used for analysis of stimulus sampling, delay, and choice on the figures.

      The revised manuscript now includes the illustration of sampling and delay windows in Figure 2C-D, since we averaged the values over these windows for use in a 2-way ANOVA. All other figures either are associated with bin-by-bin analyses and have the first central and lateral licks (T and D) indicated, or have the time windows specified (e.g., Figure 4B, which uses [T, T + 0.5 s] and [D - 0.5 s, D]).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterise the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence.

      Strengths:

      (1) Elegant combination of electroencephalography and computational modelling.

      (2) The authors describe results of two separate experiments, with very similar results, in effect providing an internal replication.

      (3) Innovative task design, including different gap durations.

      Weaknesses:

      (1) The authors introduce the CPP as tracking an intermediary (motor-independent) evidence integration process, and the MBL as motor preparation that maintains a sustained representation of the decision variable. It would help if the authors could more directly and quantitatively assess whether their current data are in line with this. That is, do these signals exhibit key features of evidence accumulation (slope proportional to evidence strength, terminating at a common amplitude that reflects the bound)? Additionally, plotting these signals report locked (to the button press) would help here. What do the results mean for the narrative of this paper?

      The reviewer is correct that properties such as temporal slope scaling with evidence strength and stereotyped threshold-like amplitude were key in establishing that the CPP reflects evidence accumulation in conventional continuous-stimulus tasks, and its motor independence was demonstrated in how it exhibited the same evidence-dependent dynamics in the absence of motor requirements (e.g. O'Connell et al 2012). We agree that it is of interest to check any such properties that can be feasibly tested in the current, distinct task context of intermittent evidence with delayed responses. Given the way in which participants performed our delayed-response task, sometimes terminating decisions early, it is in the CPP-P1 that conventional patterns of coherence-dependence in slope and amplitude would be expected. Indeed, we found that the CPP-P1 reached higher amplitudes (Fig. 3A, Author response image 1) and exhibited a steeper build up in high- compared to low-coherence trials (Author response image 1). The slope and amplitude profile of the CPP-P2 is complex due to the variability in baseline activity across our various delay conditions and the bounded process that participants engaged in, but it is still consistent with an accumulation process. Our simulations provide a full account of how an accumulating signal could produce the observed results.

      Author response image 1.

      Grand-averaged (± sem) CPP-P1 traces in both experiments (top). Bottom boxplots show the slope, computed for each subject individually between 0.2 s post P1 onset (when the CPP begins its buildup) and the time at which peak amplitude was reached within the [0.4-0.6 s] interval. Red crosses indicate outliers, defined as values more than 1.5 times the interquartile range away from the bottom or top of the box. Grey lines indicate single-subject estimates, and asterisks reflect the significance of paired t-tests for the estimated slope and amplitude effects; **p<0.01, *p<0.05. H = high coherence, L = low coherence.
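
      The slope measure in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `cpp_slope`, the 10 ms sampling grid, and the synthetic triangular waveform are all hypothetical.

```python
import numpy as np

def cpp_slope(t, erp, start=0.2, window=(0.4, 0.6)):
    """Slope from `start` (s) to the peak found inside `window` (s).

    t   : 1-D array of time points relative to pulse onset, in seconds
    erp : 1-D array with one subject's average waveform at those times
    Returns (slope, peak_amplitude, peak_time).
    """
    t = np.asarray(t, dtype=float)
    erp = np.asarray(erp, dtype=float)
    in_window = np.flatnonzero((t >= window[0]) & (t <= window[1]))
    i_peak = in_window[np.argmax(erp[in_window])]   # peak within the window
    i_start = int(np.argmin(np.abs(t - start)))     # sample closest to buildup onset
    slope = (erp[i_peak] - erp[i_start]) / (t[i_peak] - t[i_start])
    return slope, erp[i_peak], t[i_peak]

# Synthetic waveform: flat until 0.2 s, linear rise to 2 units at 0.5 s, then decay.
t = np.linspace(0.0, 1.0, 101)
erp = np.where(t < 0.2, 0.0,
               np.where(t <= 0.5, (t - 0.2) / 0.3 * 2.0, 2.0 - (t - 0.5) * 4.0))
slope, peak_amp, peak_time = cpp_slope(t, erp)
```

      Applying such a function once per subject and condition yields paired values (high vs. low coherence) that could then be compared with a paired t-test.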

      As in other delayed-response tasks (Twomey et al 2016; McCone et al 2026), we observe here that the CPP peaks and falls well before the response is cued or indeed executed (here, in fact, peaking and falling for each individual pulse). Thus, its pre-response dynamics will not relate to stimulus-driven evidence accumulation in the way they do in immediate-response contexts (e.g. O’Connell et al. 2012; Steinemann et al. 2018). We therefore do not analyse response-aligned CPPs in the experiment.

      As to the intermediary role we have interpreted for the CPP, in addition to the local pulse driven peak-and-fall dynamics compared to the sustained profiles of motor preparation signals, we can point to the obvious temporal delay between the signals, where evidence-dependent buildup in the CPP substantially precedes that of motor preparation, as observed in all previous studies comparing the two (e.g. Kelly & O'Connell 2013).

      (2) The novelty of this work lies partly in the aim to characterize how the CPP and MBL interact (page 5, line 3-5). However, this analysis seems to be missing. E.g., at the single-trial level, do relatively strong CPP pulses predict faster/larger MBL? The simulations in Figure 5 are interesting, but more could be done with the measured physiology.

      As exemplified in the extant EEG-decision literature, the low signal-to-noise ratio of EEG is such that attempts are seldom made to link two EEG signals on a single-trial basis, and studies instead favour testing single-trial relationships between each individual EEG signal and behaviour, or, most commonly, comparing patterns of variation in the EEG signals across experimental conditions (e.g. difficulty). Accordingly, here we show that trials with high coherence P1 evoked 1) higher CPP amplitudes (Fig. 3A,C), and 2) stronger MBL (Fig. S2 & S3). Further, we showed that particularly high CPP amplitudes following the first pulse led to stronger weights on choice for the first pulse (Fig. S11), which could only be mediated by the motor system.

      (3) The focus on CPP and MBL is hypothesis-driven but also narrow. Since we know only a little about the physiology during this "gaps" task, have the authors considered computing TFRs from different sensor groupings (perhaps in a supplementary figure?).

      While we agree that it might be interesting to explore frequency bands and sensors more broadly, we feel that such an exploration would detract from the hypothesis-driven focus on how prominent, well-characterised decision signals in the brain behave in a context where evidence is presented in an atypical, seldom-studied manner, namely in the form of temporally separate pulses. Our aim was not to explore whole-brain dynamics that might be engaged during the task, but rather to get a better understanding of the functional roles of the neural processes underlying the CPP and MBL during decision making. Providing a detailed description of whole-scalp responses is thus beyond the scope of this paper, but given that all data will be made publicly available this can be pursued in future work and by other researchers.

      (4) The idea of a potential bound crossing during P1 is elegant, albeit a little simplistic. I wonder if the authors could more directly show a physiological signature of this. For example, by focusing on the MBL or occipital alpha split by the LL, LH, HL and HH conditions, and showing this pulse- as well as report-locked. Related, a primacy effect can also be achieved by modelling (i) self-excitation of the current one-dimensional accumulator, or (ii) two competing accumulators that produce winner-take-all dynamics. Is it possible to distinguish between these models, either with formal model comparison or with diagnostic physiological signatures?

      In addition to the CPP amplitude effects we report in the main paper, the reviewer is correct that pulse-locked MBL can also provide a physiological signature of the greater number of pulse-1 bound crossings when that pulse is high-coherence. This is shown in Figure S3, where we see this coherence-dependent effect consistently across all gap durations and both experiments. Figure S2 also shows that the MBL step-change after P2 is greater in P1-low coherence trials in Experiment 1, as predicted by the bound-crossing account, and consistent with the CPP findings. We note that this effect appears absent in Experiment 2, but this is likely because the greater proportion of shorter gap durations (0, .12, .36s) means that updates following P2 are likely to still capture P1-driven changes, due to signal-transmission delays. Please also note that Fig. S2 and S3 have been updated from the previous version, because while revising the paper we noticed a mistake whereby we were plotting alpha band power (8-13Hz) rather than the intended beta (13-30Hz). The results remain qualitatively unchanged. Although there isn’t sufficient single-trial signal-to-noise ratio to be able to categorise individual trials as having crossed a threshold or not, this is strong evidence in support of the coherence-dependent amplitudes of the CPP and motor updates. Analyzing beta locked to the report would not be informative in this case because of the delayed reporting structure of the task and the threshold-crossing relationship beta exhibits with response execution (O’Connell et al. 2012). That is, beta will reach the same amplitude immediately prior to the response regardless of whether or not decisions were terminated during P1. Instead, we believe that the empirical CPP-P2 traces we show provide direct evidence that the second pulse was not fully integrated in all trials, and as our modelling confirms, this is consistent with bound crossings occurring sometimes before P2. First, the fact that CPP-P2 amplitudes were overall lower than CPP-P1 amplitudes mirrors the behavioural observation that the first pulse had a stronger weight on choice than the second one. Second, we show that trials where the CPP was particularly high after the first pulse were also trials where P1 exerted a particularly strong influence on choice (see Fig. S11), further validating the idea that higher CPP amplitudes are directly related to behaviour.

      Regarding self-excitation (SE) and winner-take-all competition (WTAC), these could indeed contribute to the behavioural primacy effects, but they would not detract from our central finding that the CPP does not encode a sustained representation of a decision variable, but rather reflects two rounds of evidence accumulation feeding into a single decision process. Further, it is not immediately clear whether/how these alternative models might also account for the CPP-P1/CPP-P2 results as simply as our bounded model does. While it might be theoretically possible for SE/WTAC models to explain 1) why the CPP-P2 is generally lower than the CPP-P1 across conditions, and 2) why the maximum CPP-P2 amplitudes in P1-high trials are smaller than in P1-low trials, these patterns of results are not an immediate consequence of standard implementations. Further, while the question of whether the accumulation process is perfect integration or involves SE or WTAC is certainly of additional interest, given that this is a delayed response task and does not provide information on termination timing through RT distributions, arbitrating between these modes of integration would not be straightforward with the current data.

      (5) The way the authors specify the random effects of the structure of their mixed linear models should be specified in more detail. Now, they write: "Where possible, we included all main effects of interest as random effects to control for interindividual variability." This sounds as if they started with a model with a full random effect structure and dropped random components when the model would not converge. This might not be sufficiently principled, as random components could be dropped in many different orders and would affect the results. Do all main results hold when using classical random effects statistics on subject-wise regression coefficients?

      The equations in the paper include the full details of the random effects structure we used for each model. We note that only two of our four equations did not include a full random effect structure, indeed due to convergence issues. We have now fit these models with a maximal random effects structure (i.e. including all fixed effects as random effects as well) with the ‘bobyqa’ optimiser. This resulted in singular fits for both Eq. 2 (Exp. 1 and Exp. 2) and Eq. 3 (Exp. 2 only). Following previous suggestions, we used a weakly informative Wishart prior (Chung et al. 2015) to regularise the random effects covariance matrix using the blme package (Chung et al. 2013), which resolved the singular fit problem. However, some models still produced convergence warnings. To assess these models’ robustness, we compared the fixed effect parameter estimates across multiple optimisers, as suggested by the lme4 developers (see lme4 documentation). Parameter estimates across optimisers rarely deviated by more than one decimal point across 6 optimisers (see Bates et al. 2011), and we thus concluded the model estimates were robust and the convergence warnings were false positives, a known issue in lme4. For all models in the paper, we report the parameters estimated using the “bobyqa” optimiser. All main inferential results remain unchanged (except for one interaction that was not of interest in Exp. 1), and the estimated slopes and statistical results for all models have been updated in the manuscript. We also included all these details in the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript examines decision-making in a context where the information for the decision is not continuous, but separated by a short temporal gap. The authors use a standard motion direction discrimination task over two discrete dot motion pulses (but unlike previous experiments, fill the gaps in evidence with 0-coherence random dot motion of differently coloured dots). Previous studies using this task (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023) or other discrete sample stimuli (Cheadle et al., 2014; Wyart et al., 2015; Golmohamadian et al., 2025) have shown decision-makers to integrate evidence from multiple samples (although with some flexible weighting on each sample). In this experiment, decision-makers tended not to use the second motion pulse for their decision. This allows the separation of neural signatures of momentary decision-evidence samples from the accumulated decision-evidence. In this context, classic electroencephalography signatures of accumulated decision-evidence (central-parietal positivity) are shown to reflect the momentary decision-evidence samples.

      Strengths:

      The authors present an excellent analysis of the data in support of their findings. In terms of proportion correct, participants show poorer performance than predicted if assuming both evidence samples were integrated perfectly. A regression analysis suggested a weaker weight on the second pulse, and in line with this, the authors show an effect of the order of pulse strength that is reversed compared to previous studies: A stronger second pulse resulted in worse performance than a stronger first pulse (this is in line with the visual condition reported in Golmohamadian et al., 2025). The authors also show smaller changes in electrophysiological signatures of decision-making (central parietal positivity and lateralised motor beta power) in response to the second pulse. The authors describe these findings with a computational model which allows for early decision-commitment, meaning the second pulse is ignored on the majority of trials. The model-predicted electrophysiological components describe the data well. In particular, this analysis of model-predicted electrophysiology is impressive in providing simple and clear predictions for understanding the data.

      Weaknesses:

      Some readers may be left questioning why behaviour in this experiment is so different from previous experiments, which use almost exactly the same design (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023). The authors suggest this may be due to the staircase procedure used to calibrate the coherence of (single-pulse) dot motion stimuli for individuals at the start of the experiment. But it remains unclear why overall performance in this experiment is so bad. Participants achieved ~85% correct following 400 ms of 33 - 45% coherent motion. In previous work, performance was ~90% correct following 240ms of 12.8% coherent motion. It seems odd that adding the 0% coherent motion in the temporal gaps would impair performance so greatly, given it was clearly colour-coded. There is a lack of detail about the stimulus presentation parameters to understand whether visual processing explains the declined performance, or if there is a more cognitive/motivational explanation.

      We thank the reviewer for highlighting this. We apologise for not providing full details about the visual display, which we have included now.

      The moving dots were presented centrally on the monitor, within a 5-degree aperture, and moved at a speed of 5 degrees/second. The monitor refresh rate was 60Hz for 19 participants and 85Hz for 3 participants in Experiment 1, while it was 85Hz for 19 participants and 60Hz for 2 participants in Experiment 2. Dot density in our task was similar to previous studies (16.7 dots/degree/s², as in Kiani & Shadlen 2013; Tohidi-Moghaddam et al. 2019; Azizi et al. 2021, 2023). However, in contrast to previous studies, we did not include any feedback on a trial-by-trial basis, instead only providing feedback at the end of each block indicating the average accuracy. This would have made it harder for participants to continually assess how well they were performing and to adjust their strategies (e.g. increase their bound for better accuracy) accordingly. We agree that the inclusion of 0% coherence dots during the gap between pulses is unlikely to have caused the participants’ relatively low overall performance, especially since we did not find accuracy to be overall lower for longer 0%-coherence gaps.

      Further, as the reviewer notes, we used a staircasing procedure at the beginning of the experiment which used only single pulses of evidence. This may have encouraged participants to set a bound that can usually be reached by one pulse, and the resultant early terminations meant that they seldom used the full 400ms of evidence that were available to them. In fact, we would like to thank the reviewer for pointing out Golmohamadian et al., 2025, which used a similar variable delays task structure but with different visual stimuli. They, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, also like us, reported a stronger choice reliance on pulse-1. This suggests that these two factors may suffice to induce a primacy rather than a recency effect.

      There are other reasons why performance may have been different in our task compared to previous studies. For example, our task included a lead-in period that was longer than in previous studies and contained 0%-coherence dots, in order to minimise interfering VEP components (the lead-in period was between 700 and 1050 ms in our study, compared to 200–500 ms in Kiani & Shadlen 2013; Tohidi-Moghaddam et al. 2019 & Azizi et al. 2023, and 400–1000 ms in Azizi & Ebrahimpour 2021). This longer and visually explicit preparation period may have acted as a warning cue, allowing participants to fully prepare before the first pulse, and again making it easier for them to hit a bound with only that information.

      We have added a more detailed discussion about how our stimuli and the task characteristics may have resulted in a substantially different performance in our task compared to previous studies in the discussion section.

      Recommendations for the authors:

      Reviewing Editor:

      Please consider the following reviewer suggestions for how to strengthen the evidence for your central claims, which could translate into an improved assessment of the "strength of evidence".

      Apart from these useful suggestions, I had some concerns about scholarship, because the list of studies currently cited in your introduction is exclusively from your group, while one of the phenomena of interest - motor beta power lateralization (MBL) in decision-making - has been widely studied by several groups, using also other techniques.

      I was wondering why you chose not to cite the ample MEG evidence for the role of MBL in decision-making. This has been shown both in classical random dot motion tasks (Donner et al, Curr Biol, 2009; de Lange et al, J Neurosci, 2013; Pape et al, Nat Commun, 2016; Urai et al, Nat Commun, 2022) as well as in tasks involving discrete evidence samples (Wilming et al, Nat Commun, 2020; Murphy et al, Nat Neurosci, 2021). Another relevant EEG study is by Ian Gould et al, J Neurosci, 2010. There is also quite a bit of monkey LFP work (mainly by Saskia Haegens) on choice-selective beta power in the motor system of the macaque, although the link to the lateralized beta power suppression in your work and the above human E/MEG studies remains a bit elusive. I feel it would be important to provide a more balanced reflection of the existing literature on this phenomenon.

      We thank the editor for this fair comment, and we apologise for having provided an overly narrow, EEG-centric view of the literature, arising from our interest in the CPP component, which hasn’t yet been characterised in MEG or LFPs. We have now substantially expanded the introduction to provide a more balanced and comprehensive overview of the literature.

      Reviewer #1 (Recommendations for the authors):

      (1) The diffusion model needs to be explained in more detail. For example, it should be explicitly stated that the model was fit to only choices, as most readers would expect reaction times. Further, it needs to be stated if the model was fit separately for each subject or in one go to the group-level data. If the former, it is important to add error bars of the between-subjects variability (in simulated and empirical data) to Figure 4A. If the latter, it would be important to determine uncertainty using bootstrapping.

      The original model was fit to grand-average data, as stated in the methods section. To assess between-subjects variability, we have re-fitted the model to each individual subject, for each experiment. The average of the individually-estimated model parameters closely recapitulated the values obtained from the fit to grand-averaged data (Fig. S12). We then simulated N = 10000 trials for each individual, and we report the grand-averaged results with error bars indicating the standard error of the mean as a supplementary figure (Fig. S13). The results replicate the ones reported in the main manuscript. We have also made it explicit that the models are fit to accuracy data but not RT.

      (2) The authors write numerous times that the MBL exhibits an "evidence-dependent" buildup. However, should this not be "choice-dependent"? In Figure 2A, one can clearly see that the sign of MBL follows choice and not objective evidence.

      We thank the reviewer for this comment. By evidence-dependent, we mean that lateralisation towards the correct response is strongest in high-coherence trials (see Fig. S2, S3). This is indeed because the sign of MBL is choice-dependent, and participants are less likely to make mistakes in high-coherence trials. We have added a clarification sentence in the text.

      (3) It would aid readability to add sub-conclusions at the end of each Results section.

      We have added clarifications where needed.

      (4) In Figure 1B, I cannot see a dashed line for the HL condition. I understand that it must lie under the LH condition, but it would be good to show it separately.

      We thank the reviewer for this comment. Since we cannot show both lines separately without additional panels, given the HL and LH lines perfectly overlap, we indicate at the end of the caption that this is the case as follows: “Note that a perfect accumulator predicts identical accuracies for the HL and LH conditions, and therefore the two lines overlap.”

      (5) In Figure 4B, is the horizontal dashed line important? It is confusing because the legend incorrectly states that this is "data".

      Thanks for this observation - it was only there to indicate 50% as a benchmark for assessing how frequent early terminations are, but we agree that it was unnecessary and potentially confusing, so we have removed it from the plot.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors should more directly address how behaviour in their task differs quite substantially from previous experiments with very similar designs (including why such high coherence levels are required, over a longer duration, to reach overall worse performance). Some readers may also be interested in a broader discussion of how decision-makers may use flexible weights when integrating evidence across samples over time. While the explanation of bounded accumulation is convincing in this context, Tsetsos et al., (2012) suggest recency effects (as in Cheadle et al., 2014; Wyart et al., 2015) cannot be explained by bounded accumulation, but rather integration leak. Other factors may include stimulus consistency (Glickman et al., 2022) or even choice consistency across decisions (Bronfman et al., 2015). Golmohamadian et al., 2025 demonstrated flexibility in decision strategies across sensory modalities.

      As we described above, we have added some more detailed explanation about why it might be the case that behaviour in our study differs from previous reports using similar tasks. We agree that the reversed pulse-reliance in our study compared to others presents an opportunity to discuss flexibility in decision strategy and so we have now added a broader discussion on different patterns of integration in various task contexts. We thank the reviewer for pointing out Golmohamadian et al., 2025, as they, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, like us, reported a stronger choice reliance on pulse-1.

      (2) Another open question is how central parietal positivity reflects an accumulation signal in the case of continuous evidence, but reflects momentary evidence in the case of discrete evidence samples. If, in both cases, the parietal evidence is passed along to motor processes for bounded decision commitment, how do motor processes deal with the changes in what is represented? Can the relationship between MBL and CPP in the model-simulated data shed some light on this? Specifically, how is the 0-gap condition treated in this simulation (which shows only 1 CPP peak but with a longer time to decay) compared to non-zero gap conditions (which show 2 peaks)?

      This is a very interesting and important point, and we thank the reviewer for raising it. We believe that the CPP in our intermittent-dots task reflects dot-motion evidence integration in the same way as in conventional continuous evidence tasks, building at an evidence dependent rate (see Author response image 1), with the only difference being that integration processes can be turned “on” or “off” depending on whether evidence is present, and can thus be temporally split into multiple “rounds” of accumulation when there is a gap.

      Our model simulations assume that evidence integration is triggered by the dots turning yellow, indicating the presence of evidence, and feeds continuously to the motor system in these periods. However, it is switched off either when 1) a bound has been hit, or 2) the dots turn blue again, at which point the CPP falls (see various rates of signal decay in Fig. S7). The reason the CPP continues longer before it peaks and falls in the zero-gap condition, by this account, is that there is no dot-colour change at the end of pulse-1 to switch it off, and thus the accumulation process continues until either a bound is hit, or the yellow dots turn blue after pulse-2. When there is a non-zero gap, despite the CPP being switched off, the decision variable itself remains encoded at the motor level so that no information is lost. This requires that the same instruction that turns off the CPP must also break or pause the flow from the CPP to the motor level and allow it to hold its current level until either a second pulse resumes a feed from a newly-triggered CPP, or response execution is cued. Thus, in our account, the accumulation process underlying the CPP in our intermittent-evidence task is identical to that in conventional continuous-evidence tasks, but since it can be turned “on” and “off” as a function of whether or not evidence is clearly present or absent, it produces two “rounds” of integration in non-zero gap conditions. The motor process also receives a feed from the CPP as in conventional continuous-evidence tasks, but with this feed similarly gated by the presence of evidence.
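
      The gating logic described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the fitted model from the paper: the function name `simulate_two_pulse`, the drift, bound and noise values, the pulse duration, and the absence of leak during the gap are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_two_pulse(drift1, drift2, bound=1.0, noise=1.0,
                       pulse_dur=0.2, dt=0.005, n_trials=2000, seed=0):
    """Bounded accumulation over two evidence pulses separated by a gap.

    Integration is "on" only during pulses and stops for good once a bound
    is hit. The motor-level decision variable (dv) is held during the gap,
    so the gap duration does not enter this simplified sketch.
    Positive drift is assumed to point toward the correct choice.
    Returns (proportion correct, proportion committed during pulse 1).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(pulse_dur / dt))
    dv = np.zeros(n_trials)                  # held at the motor level
    done = np.zeros(n_trials, dtype=bool)    # True once a bound was crossed
    committed_in_p1 = None
    for drift in (drift1, drift2):
        for _ in range(n_steps):
            step = drift * dt + noise * np.sqrt(dt) * rng.standard_normal(n_trials)
            dv = np.where(done, dv, dv + step)   # no integration after commitment
            done |= np.abs(dv) >= bound
        if committed_in_p1 is None:
            committed_in_p1 = done.copy()        # snapshot at the end of pulse 1
    correct = dv > 0
    return correct.mean(), committed_in_p1.mean()

acc_high_first, early_high_first = simulate_two_pulse(drift1=3.0, drift2=1.0)
acc_low_first, early_low_first = simulate_two_pulse(drift1=0.5, drift2=1.0)
```

      In this sketch, a stronger first pulse produces more commitments before pulse 2, so fewer trials integrate the second pulse, which is the qualitative pattern the bound-crossing account relies on.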

      A slightly different and perhaps more challenging question (which the reviewer was perhaps alluding to) relates to tasks where evidence comes not in short noisy snippets, but rather as static tokens (e.g. Wyart et al. 2012, 2015; Murphy et al. 2021; Parés-Pujolràs et al. 2025). In these instances, the CPP exhibits transient evoked responses to each token, which scale with the belief updates resulting from it (Parés-Pujolràs et al. 2025). However, it remains unclear whether these transient potentials reflect a temporally-evolving integration process to compute the appropriate belief update afforded by that token in the context of a particular task, or rather reflect the output of such a process. The former account would be similar to our interpretation of the transient deflections observed in this gaps task, which we believe capture the same temporal integration processes as those commonly observed in conventional continuous noisy stimuli paradigms, only short-lived. The latter account would instead be specific to low-noise stimuli like tokens, where the computations required for belief updating may not require a temporally-extended integration process, but rely on different mechanisms to compute belief updates (e.g. prior-based modulations of sensory encoding, attention or neural gain). These questions remain open for future investigation.

      (3) From what I understand, the model suggests all-or-none integration of the second pulse: either the bound has not been reached and the pulse is perfectly integrated, or the bound has been reached and so the pulse is not integrated. The CPP amplitude at pulse 2 is therefore determined not only by the strength of the evidence at pulse 2 but also by the proportion of trials where the evidence is not ignored: CPP at pulse 2 is of lower amplitude because it is calculated as an average across trials where it is either similar to CPP at pulse 1 or otherwise completely absent. Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper). It would be nice to show the dichotomy predicted by the model in the empirical data. I'm thinking of something similar to this 'bifurcation' analysis from Sergent et al., 2021. Or more simply, estimates of CPP amplitude from single trials (perhaps an average over a short window around the peak) should be more variable at pulse 2, with some reaching similar amplitudes to pulse 1, and many close to baseline, whereas at pulse 1, there should be a more uniform cluster of amplitudes. If all CPP peak amplitudes were lower, would this motivate a model comparison where, for example, additional evidence from the second pulse was down-weighted according to certainty following the first pulse (leading to all trials down-weighting the second pulse)? This could link in nicely with some of the more nuanced analyses related to attention in the supplementary figures.

      We thank the reviewer for this insightful comment, which will help us clarify how our model works. The integration of the second pulse does not work in an all-or-none manner. In our model, the accumulation stops whenever a bound is reached at the downstream motor level. This can happen 1) at some point during the 1st pulse (no integration of pulse 2 at all), 2) during the 2nd pulse (partial integration of pulse 2, until the bound is hit), or 3) not at all, if the bound is never crossed (full integration of pulse 2). Our model thus allows for partial integration of the second pulse rather than all-or-none. Author response image 2 shows 3 example trials that illustrate how the model works. The CPP amplitudes at pulse 2 are thus determined by two main factors: 1) whether or not accumulation of P2 is precluded by an earlier bound crossing in P1 (if it is, the CPP amplitude is assumed to equal 0), and 2) whether and when accumulation ended if it did take place. Our interpretation is that, given that trials where pulse 1 was low coherence were 1) less likely to terminate early (Fig. 4B) and 2) had achieved lower levels of accumulated evidence (Fig. 4C), the LL and LH conditions are linked to a higher proportion of trials where accumulation at pulse 2 does occur, and it lasts for a longer amount of time because the distance required to reach a bound is longer than in their pulse 1 high-coherence counterparts. We have clarified this point in the results section describing the model.
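
The three trial types described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the fitted model: the function name `classify_trial`, the symmetric bound, and the hand-picked evidence increments are all hypothetical, and the gap is modelled simply as the decision variable being held at its current level.

```python
def classify_trial(p1_increments, p2_increments, bound):
    """Accumulate evidence over two pulses and report where accumulation stopped.

    Returns (label, dv, n_p2_used): label is 'P1', 'P2' or 'none' depending on
    when the bound was hit, dv is the final decision variable, and n_p2_used is
    the number of pulse-2 samples that were integrated.
    Hypothetical helper for illustration only.
    """
    dv = 0.0
    # Pulse 1: integrate until the (assumed symmetric) bound is reached.
    for step in p1_increments:
        dv += step
        if abs(dv) >= bound:
            return "P1", dv, 0  # bound hit in pulse 1: pulse 2 is not integrated
    # Gap: the decision variable is held at its current level (no change).
    # Pulse 2: resume integration from the held level.
    for used, step in enumerate(p2_increments, start=1):
        dv += step
        if abs(dv) >= bound:
            return "P2", dv, used  # partial integration of pulse 2
    return "none", dv, len(p2_increments)  # full integration of pulse 2


# Made-up increments standing in for high- and low-coherence pulses.
print(classify_trial([0.6, 0.6], [0.5, 0.5], bound=1.0))
print(classify_trial([0.2, 0.2], [0.4, 0.4, 0.4], bound=1.0))
print(classify_trial([0.1, 0.1], [0.1, 0.1], bound=1.0))
```

In this toy version, a weaker first pulse leaves more distance to the bound, so more of the second pulse is integrated, which is the qualitative pattern described for the LL and LH conditions.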

      The reviewer notes: “Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper)”. However, our interpretation in fact predicts that the vast majority of trials should indeed exhibit smaller amplitudes. That can again be explained by the three trial types mentioned above. Unlike in CPP-P1, there would be a majority of trials where integration does not occur at all. Only trials where evidence was at least partially integrated during P2 would be predicted to have CPP-P2 amplitudes that are overall positive, and even in those instances, average amplitudes would be overall lower than CPP-P1 in trials that terminated early, because of the lower distance remaining to be covered before hitting a bound. Author response image 2 illustrates this point. Thus, the prediction regarding how CPP amplitude variance or distribution shape would compare between P1 and P2 is less straightforward than if it were all-or-none on P2, not to mention the fact that EEG noise would likely drown out distributional features like this. We therefore focus on a comparison of the means, for which our model has the clear prediction that most trials should exhibit lower CPP-P2 amplitudes. To assess whether empirical observations meet this prediction, and following the reviewer’s suggestion, we extracted the mean amplitudes around 0.45-0.55 s after P1 and P2, for each single trial. CPP-P2 data were baselined using the amplitude 100 ms before P2 onset, as in Fig. S5 - note that this is likely to introduce spurious drifts due to overlapping potentials from P1, but given that grand averaged traces still qualitatively captured the key effects we assume it is a valid approach. We then pooled CPP-P1 and CPP-P2 amplitudes across pulses, and z-scored them for each participant separately. In both experiments, in a majority of participants (Exp. 1: 16/22, Exp. 2: 17/21) the median z-CPP-P1 amplitude was higher than that of z-CPP-P2. Author response image 3 illustrates the pooled distributions.
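
The per-participant normalisation and median comparison described above could be implemented roughly as follows. This is a sketch with made-up amplitudes; the function names are hypothetical, and the use of the population standard deviation for z-scoring is an assumption, since the response does not specify it.

```python
from statistics import mean, median, pstdev


def median_z_by_pulse(p1_amps, p2_amps):
    """z-score single-trial amplitudes within one participant, pooling both pulses.

    Returns (median z-scored P1 amplitude, median z-scored P2 amplitude).
    Assumes the population standard deviation (pstdev) for normalisation.
    """
    pooled = list(p1_amps) + list(p2_amps)
    mu, sd = mean(pooled), pstdev(pooled)
    z1 = [(a - mu) / sd for a in p1_amps]
    z2 = [(a - mu) / sd for a in p2_amps]
    return median(z1), median(z2)


def count_p1_higher(participants):
    """Count participants whose median z-CPP-P1 exceeds their median z-CPP-P2."""
    count = 0
    for p1_amps, p2_amps in participants:
        m1, m2 = median_z_by_pulse(p1_amps, p2_amps)
        if m1 > m2:
            count += 1
    return count


# Made-up single-trial amplitudes for two hypothetical participants.
participants = [
    ([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]),
    ([2.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
]
print(count_p1_higher(participants), "of", len(participants))
```

Because z-scoring is a monotonic transform within a participant, it does not change which pulse has the higher median for that participant; the normalisation matters when trials are pooled across participants, as in the distribution plot.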

      Author response image 2.

      Decision variable simulations illustrating sample single trials (top) and CPP traces averaging data across conditions and N = 1000 trials (bottom), using model fits from Exp 2, in the long gap condition. Overlaid text indicates the percentage of trials in each subset, for each condition. The horizontal line indicates the bound; shaded areas indicate pulse presentation times. A. The bound was hit during P1, and therefore no further accumulation occurred during P2. B. The bound was hit during P2, and therefore P2 was only partially accumulated. C. No bound was hit, and therefore all evidence from P2 was accumulated.

      Author response image 3.

      Pooled distributions of CPP-P1 and CPP-P2 amplitudes (450-550 ms post-pulse), normalised within-participant, and baselined 100 ms before pulse onset. In both experiments, CPP-P2 amplitudes had a lower median (vertical line) normalised amplitude than CPP-P1.

      (4) A minor note: Full details of stimulus presentation (size, number of dots, dot size, speed, lifetime) would be appreciated.

      Thank you - we have now provided these details in the methods section (see also reply to public reviews above).

      (5) Are the authors sure they want to use this 'Gaps task' name? It seems a bit strange to introduce this name in this context, where there isn't really a 'Gap' (random dot motion fills the gap). A reader could get the impression the name was given in the Kiani et al., 2013 study (page 3, paragraph 1: "This scenario has begun to be studied using an intermittent-evidence or "gaps" task (Kiani et al., 2013) ...") but this is not true, Kiani et al. never use the term "Gaps task", nor has any other study since (as far as I know).

      We thank the reviewer for noting this oversight on our part - we have now made it clear in the introduction that “gaps task” is our own name for the task originally developed by Kiani et al. (2013). We have decided to still use this name because it is a convenient shorthand, in the understanding that “gap” refers to a “gap” in coherent motion as in Kiani et al. (2013), albeit not a proper blank as in the original implementation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be overextended given the nature of the data (solely behavioral), the reliance on repeated d′ measures may obfuscate some of the results without clearer psychometric or regression-based analyses, and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

      We thank the editors for their positive assessment of the data quality and the novelty of our behavioral task, and for pointing out the limitations inherent in behavioral studies.

      We would like to clarify one important point regarding the use of d′ measures. While d′ was included to quantify sensitivity, our conclusions are not based solely on repeated d′ measures. In addition to d′, we analyzed raw behavioral data (correct and incorrect choice rates), and categorization performance was assessed using psychometric curves fitted with logistic regression models. These complementary analyses provide converging evidence and ensure that our interpretations are supported by multiple robust measures.

      In the revised manuscript, we have further strengthened the analyses by including additional regression-based assessments, reporting effect sizes for subtle effects, and refining the statistical methods for clarity and transparency.

      We fully acknowledge that this work is behavioral and does not directly reveal the underlying neural mechanisms. Nonetheless, the translational framework we have developed establishes a robust foundation for future studies. This platform can be directly applied in clinical research on autism and other neuropsychiatric conditions involving sensory-cognitive interactions, and provides a solid basis for subsequent mechanistic, causal, or computational investigations to uncover the neural circuits mediating these effects.

      We greatly appreciate the editors’ and reviewers’ guidance and believe the revisions have clarified and strengthened the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      We appreciate the reviewer’s statement highlighting the importance of our study.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

      We thank the reviewer for these constructive comments. We acknowledge that aspects of the analyses were previously difficult to follow, and we have reworked the Results section to improve clarity and transparency.

      We would like to emphasize that all d′ measures are complemented by analyses of raw response rates (correct and incorrect choices), ensuring that our interpretations are not solely dependent on this metric. In addition, we applied standard psychometric analyses wherever possible. For the training phase, only two stimulus amplitudes were presented, which precluded the construction of full psychometric curves; however, for the categorization phase, psychometric analyses were feasible and are reported in Figure 3. Specifically, psychometric functions were fitted to the data using logistic regression, allowing us to estimate both categorization bias (threshold) and precision (slope) across stimulus intensities. These analyses revealed no evidence of altered categorization bias or precision in Fmr1<sup>-/y</sup> mice across stimulus strengths.
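
As a minimal sketch of what a logistic psychometric fit of this kind involves, the following Python function estimates a threshold and slope from choice counts by Newton's method. It is not the authors' analysis code: the function name, the stimulus amplitudes, and the choice counts are all made up for illustration, and the threshold is taken to be the amplitude at which "high" is chosen on 50% of trials.

```python
import math


def fit_logistic_psychometric(amplitudes, n_high, n_trials, n_iter=50, tol=1e-10):
    """Fit P(high choice) = 1 / (1 + exp(-(b0 + b1 * x))) by maximum likelihood.

    amplitudes: stimulus amplitudes; n_high: number of 'high' choices at each
    amplitude; n_trials: number of trials at each amplitude.
    Returns (threshold, slope): threshold is the amplitude giving 50% 'high'
    choices, slope is b1. Hypothetical helper for illustration only.
    """
    # Centre the stimulus axis to keep the Newton updates well conditioned.
    centre = sum(amplitudes) / len(amplitudes)
    xs = [a - centre for a in amplitudes]
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(xs, n_high, n_trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            # Gradient of the binomial log-likelihood.
            g0 += k - n * p
            g1 += (k - n * p) * x
            # Negative Hessian (Fisher information) entries.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 Newton system analytically.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    threshold = centre - b0 / b1
    return threshold, b1


# Made-up amplitudes (in µm) and choice counts, 50 trials per amplitude.
amps = [12, 14, 16, 18, 20, 22, 24, 26]
high_counts = [2, 4, 9, 20, 31, 42, 46, 49]
threshold, slope = fit_logistic_psychometric(amps, high_counts, [50] * len(amps))
print(f"threshold = {threshold:.2f} µm, slope = {slope:.3f} per µm")
```

A shift in the threshold between genotypes would correspond to a categorization bias, and a shallower slope to lower precision; comparing these two parameters is what the fit is used for here.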

      Following the reviewer’s suggestion, we have also added general linear model analyses that account for trial history, providing a complementary perspective on decision-making dynamics. Finally, while the calculation of d′ is detailed in the Methods, we have revised the Results to clearly explain its use and appropriateness in each relevant analysis.

      These revisions aim to provide a clearer, more comprehensive picture of the data while ensuring that all conclusions are supported by multiple complementary measures.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative.

      We thank the reviewer for the careful reading of our manuscript and for these constructive comments. We agree that our study is purely behavioral, and we appreciate the opportunity to clarify the scope and interpretation of our findings. The primary goal of this work was to characterize behavioral patterns during tactile discrimination and categorization in a translationally relevant mouse model of autism.

      Although we did not include direct neural recordings, causal manipulations, or computational modeling, our analyses combining choice behavior, sensitivity measures from signal detection theory, psychometric curves, and regression-based models of trial history provide a detailed and robust characterization of perceptual learning, stimulus discrimination, categorization, and the interplay of cognitive processes with tactile perception. The manuscript has been revised to explicitly state that our conclusions are behavioral, emphasizing that this work establishes a foundation for future studies aimed at elucidating the neural and circuit mechanisms underlying these sensory–cognitive interactions.

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered.

      Alternative explanations for our findings including differences in motivation, fatigue, satiety, stereotyped licking, or reward valuation were carefully considered. As described in the Methods, only testing sessions with >70% correct performance on the training stimuli (12 µm and 26 µm) were included, excluding sessions with reduced motivation, fatigue, satiety, or stereotyped licking that could confound performance on low- or high-salience stimuli.

      Although differences in reward valuation could affect learning speed, we observed no genotype differences in training duration (Fig. 1B-D, Fig. S1C-D). Sessions with disengagement were analyzed only during epochs of active task performance (information added to the revised Methods section, lines 619-620). Reward-driven choice biases were unlikely, as no genotype differences were observed in categorization bias (Fig. 3F) and GLM analyses confirmed that previous reward outcome did not affect current choices (Fig. 4D).

      Finally, altered reward valuation could increase miss rates. Elevated miss rates in Fmr1<sup>-/y</sup> mice were restricted to the lowest-intensity stimulus (12 µm) under high cognitive load, demonstrating a salience- and context-specific effect inconsistent with generalized motivational or reward deficits. The Discussion has been updated to clarify these points and delimit the scope of our interpretations (lines 483-499).

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not our intention. References to Load Theory were meant to provide conceptual inspiration for assessing attention in high cognitive load conditions during categorization, rather than to indicate a formal test. Moreover, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced facilitation of across-category discrimination. Finally, we agree that citing Adaptive Resonance Theory, which is grounded in artificial neural network models, could be misleading, and we have revised the text accordingly.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      We thank the reviewer for this comment and agree that our study is purely behavioral and does not provide direct mechanistic evidence for top-down pathway dysfunction. In the first version of the manuscript, the term “top-down” was used at the behavioral level, referring to the influence of higher-order cognitive processes (e.g., categorization, attention, sensory and choice history integration) on tactile perception, rather than to imply specific neural circuits.

      We acknowledge that identifying the neural pathways underlying these effects would require extensive mechanistic experiments, including identifying the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself and performing pathway-specific recordings and manipulations. Such work represents a substantial mechanistic research program beyond the scope of the present study.

      To clarify that our study does not provide insights into the neural underpinnings of the studied behavioral processes, we have revised the manuscript, removing the term “top-down” or replacing it with “higher-order processes” where appropriate. We also explicitly noted that future work using neural recordings or causal manipulations will be needed to uncover the neural underpinnings of these behavioral phenomena (lines 508-510).

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      We recognize that terms such as “reduced top-down categorization influence” and “choice consistency bias” are derived from behavioral observations. However, we respectfully note that these behavioral inferences are widely used in clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021) and are not inherently speculative.

      The translational impact of our work lies in the development of a robust behavioral platform that allows precise dissection of tactile perception and cognitive influences in a manner directly comparable to clinical studies. While we agree that neural, circuit-level, or causal manipulations would provide valuable mechanistic insight, the current study establishes a foundational behavioral framework that can guide and inform future investigations into the underlying neurobiological substrates.

      To ensure clarity, we have revised the manuscript throughout to explicitly indicate that all conclusions are based on behavioral measures and do not imply mechanistic evidence.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      We chose to present both statistically significant effects and trends to ensure transparency and to highlight that commonly used aggregate measures, such as d′, can sometimes obscure meaningful underlying patterns. In the text, p-values between 0.05 and 0.1 are described as trends without over-interpreting their significance. To further support interpretation, we have now computed effect sizes (Hedges’ g) for all subtle effects. In the revised manuscript, all interpretations of non-significant effects have been reworded to avoid overstatement.
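
For reference, Hedges' g for two independent groups can be computed as below. This is a generic sketch of the standard formula (Cohen's d with pooled standard deviation, multiplied by the usual small-sample correction), not the authors' code; the function name and the example values are made up.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g for two independent samples.

    Uses the pooled sample standard deviation and the common approximate
    small-sample correction J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    correction = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return cohens_d * correction


# Made-up values for two small groups.
print(hedges_g([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

The correction factor shrinks the estimate most for small groups, which is why g rather than d is the natural choice with the sample sizes typical of behavioral studies in mice.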

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      The number of mice used per genotype is consistent with standard practices in behavioral studies of sensory processing. To complement statistical analyses and account for small sample sizes, we have calculated effect sizes (Hedges’ g) for all subtle or trend-level effects (p ≈ 0.05–0.1), providing a measure of effect magnitude independent of sample size.

      As the reviewer correctly noted, no animals were excluded as outliers, since observed variability reflects true biological differences rather than experimental or technical errors. In the revised manuscript, we re-examined all datasets for potential outliers, and when identified, analyses were performed both with and without the data point. Any results sensitive to single animals are explicitly reported. This procedure is now detailed in the Methods section (lines 675-679).

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      We thank the reviewer for highlighting this important point. To control for false positives arising from multiple comparisons, we applied the Bonferroni correction. This information has been added to the Methods section (line 682) to ensure transparency and reproducibility of all statistical tests.
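
The Bonferroni correction itself is simple to state in code. The sketch below is generic and uses made-up p-values; how comparisons were grouped into families in the manuscript is not specified here, so the family size is whatever list is passed in.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust a family of p-values.

    Returns (adjusted, reject): each p-value is multiplied by the number of
    comparisons (capped at 1), and a test is rejected if its adjusted p-value
    is at most alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Made-up p-values for a family of three comparisons.
adjusted, reject = bonferroni([0.01, 0.02, 0.04])
print(adjusted, reject)
```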

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      We thank the reviewer for raising this point, as this was not done intentionally. In the revised manuscript, miss rates for high- and low-salience stimuli were reanalyzed using a mixed-effects linear model, which appropriately accounts for repeated measurements within sessions (Fig. 5; Results section: lines 320-340). This analysis confirmed that Fmr1<sup>-/y</sup> mice exhibit increased miss rates specifically at the 12 µm amplitude, with the effect disappearing at higher low-salience amplitudes (18 µm). Post-hoc comparisons with Bonferroni correction revealed a strong trend for increased misses at 12 µm (t-test: t = -2.8437, p = 0.058, Hedges’ g = 1.23), while no significant differences were found at other amplitudes. The Methods section has been updated to detail this statistical approach for analyzing miss rates (lines 686-687).

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

      As mentioned above, our goal was not to directly test theoretical frameworks such as Adaptive Resonance Theory, Load Theory of Attention, or Weak Central Coherence, but rather to provide a context for interpreting our behavioral findings. In the revised manuscript, we have removed references to the Load Theory from the Results section and reframed the Discussion to emphasize that our results are consistent with certain predictions from these cognitive theories, without implying that the experiments directly assessed them. This clarifies that the interpretations are based on observed behavioral patterns, while still acknowledging the potential relevance of these frameworks to better understand tactile perception and cognition in autism.

      Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      We appreciate the reviewer’s positive assessment regarding our study’s translational value and the importance of our behavioral findings.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, provides additional insights into learning dynamics. In response, we have added these analyses to the revised manuscript (Fig. S1, Fig. S2), which illustrate both individual and group-level learning trajectories and trial-by-trial licking patterns.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While this is an interesting and important question, and is motivated by previous preclinical and clinical findings, it falls outside the scope of the current manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main Comments

      (1) This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism vs. WT controls. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention. The experiments seem well performed, with interesting results. I found certain aspects of the analysis not clearly explained, which made it difficult at times to understand.

      Please see specific details in the comments below.

      (2) To measure sensitivity, the authors present many comparisons of d' - sometimes between pairs of stimuli (or sometimes even for a single stimulus level).

      (a) Firstly, the calculation of d' for a single stimulus value is unclear (because the same proportion of high/low choices for a given stimulus can result from shifts in bias/criterion).

      We agree with the reviewer that calculating d′ for a single stimulus conflates sensitivity with response bias/criterion differences. For this reason, the panels showing d′ for individual stimulus amplitudes during training (Fig. 1F and 1G in the original manuscript) have been removed from the manuscript.

      In addition, we revised our d′ (Fig. 1E) and criterion calculations (Fig. 2A), treating the high-amplitude stimuli as “signal” and the low-amplitude stimuli as “noise”, following Signal Detection Theory. The formulas used in the revised manuscript take into account correct responses to high-amplitude stimuli and wrong responses to low-amplitude stimuli to calculate the sensitivity and bias of the mice during discrimination in the training period.

      Sensitivity (d′) is now computed as:

      d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)

      and the criterion (c) as:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]
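
      As an illustration only, the two formulas above can be sketched in a few lines of Python. The function names and the rate-clipping rule are assumptions made for this sketch and are not taken from the manuscript; z is the inverse of the standard normal cumulative distribution function.

```python
from statistics import NormalDist


def z(p: float) -> float:
    """Inverse standard normal CDF (the z-transform used in signal detection theory)."""
    return NormalDist().inv_cdf(p)


def clip_rate(p: float, n_trials: int) -> float:
    """Keep a rate away from 0 and 1 so z() stays finite.

    The 0.5 / n_trials bound is an assumed, commonly used correction;
    it is not described in the response above.
    """
    bound = 0.5 / n_trials
    return min(max(p, bound), 1.0 - bound)


def d_prime(p_right_high: float, p_right_low: float) -> float:
    """Sensitivity: z(lick right | high) - z(lick right | low)."""
    return z(p_right_high) - z(p_right_low)


def criterion(p_right_high: float, p_right_low: float) -> float:
    """Criterion: -1/2 * [z(lick right | high) + z(lick right | low)]."""
    return -0.5 * (z(p_right_high) + z(p_right_low))
```

      With symmetric rates such as 0.8 (right licks to high amplitude) and 0.2 (right licks to low amplitude), the criterion is zero and only d′ is non-zero; a negative criterion corresponds to a tendency to lick right regardless of the stimulus.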

      (b) Secondly, while calculating d' makes sense for comparing two stimulus levels (like in the training condition), in the test condition (with a spread of stimuli), this becomes a little tedious - at times difficult to follow and unclear.

      I would have thought that sensitivity (at least for overall performance) would be better compared using data from all the stimuli - e.g. either using:

      (i) the sigma of the psychometric curve (although the downside of that approach is that it ignores history effects), or

      (ii) a logistic regression for the choices, given the stimuli, where the weights assigned to the stimulus magnitude indicate sensitivity (the advantage of that approach is that history effects, like the previous trials/choices can be used as regressors in the model). Accordingly, it can simultaneously also quantify the history effects. This could even be expanded to a GLMM (mixed effects for different mice).

      We thank the reviewer for this very valuable feedback. Indeed, during the testing phase, we calculated sensitivity d’ to probe the overall categorization sensitivity (Fig. 3H).

      (i) This analysis was only complementary to the psychometric curves (fitted on the rightward lick rate for each stimulus amplitude using a general linear model – Fig. 3A). As the reviewer proposes, we had calculated the sigma of the psychometric curve (Fig. 3G, slope) to assess categorization precision. Sensitivity calculations have also now been revised using the aforementioned formula (d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)).

      (ii) To incorporate history effects, we implemented generalized linear models (GLMs) with a binomial error distribution to predict high-salience licks (right-lick choices) based on the current stimulus, trial history, genotype, and their interactions. A main-effects model included current stimulus, previous stimulus, previous outcome, previous choice, and genotype, followed by interaction terms to assess genotype-specific modulation of history effects. These analyses are now presented in the new Figure 6.

      The resulting coefficients are shown in Fig. 6A. As expected, decisions were primarily driven by current stimulus amplitude (Fig. 6A, B). Both genotypes displayed a tendency to repeat previous choices (Fig. 6A, C), while previous reward outcomes did not influence current choice (Fig. 6A, D). Notably, stimulus amplitude history showed genotype-specific effects: WT mice were negatively influenced by the previous stimulus, whereas Fmr1<sup>-/y</sup> mice remained unaffected (Fig. 6A, E).

      To clearly visualize these findings, we plotted psychometric curves and marginal effects accounting for current stimulus, previous choice, previous outcome, and previous stimulus (Fig. 6B-E). These analyses are now fully integrated into the Methods (lines 688-702), Results (Fig. 6, lines 341-369), and Discussion (lines 469-479) sections of the revised manuscript.
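
      For readers unfamiliar with history-dependent choice models, the following is a minimal sketch of how the regressors named above (current stimulus, previous stimulus, previous choice, previous outcome, genotype) could be assembled from a trial sequence before being passed to a binomial GLM. The Trial fields, the function name, and the assumption that miss trials have already been removed are all hypothetical and are not taken from the manuscript.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    stimulus_um: float  # vibration amplitude in micrometres
    choice_right: int   # 1 = right lick (reports "high"), 0 = left lick
    rewarded: int       # 1 = rewarded (correct), 0 = not rewarded


def history_design_rows(trials, genotype):
    """Pair every trial with its predecessor to build one regressor row per trial.

    The first trial of a session has no predecessor and is skipped.
    Assumes miss trials were excluded beforehand (an assumption of this sketch).
    """
    rows = []
    for prev, cur in zip(trials, trials[1:]):
        rows.append({
            "choice_right": cur.choice_right,   # response variable
            "stimulus": cur.stimulus_um,        # current stimulus
            "prev_stimulus": prev.stimulus_um,  # stimulus history
            "prev_choice": prev.choice_right,   # choice history
            "prev_outcome": prev.rewarded,      # outcome history
            "genotype": genotype,               # e.g. "WT" or "Fmr1-/y"
        })
    return rows
```

      Each row can then serve as one observation in a logistic model of choice_right, with genotype interaction terms added on top of the main effects.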

      (3) I find some of the terminology used confusing/misleading:

      (a) The term "Categorization thresholds" can be misleading - in psychometric curves, "thresholds" often refer to the sigma (SD) of the fitted curve used to measure sensitivity (inversely related). Here, I think that the meaning is in terms of the PSE/ criterion. Perhaps the terminology can be improved to prevent confusion on this matter. E.g., I think that here the authors mean a measure of bias/criterion/PSE or similar. Correct? Not really a perceptual "threshold".

      We thank the reviewer for pointing this out. In our analysis, the term “threshold” referred to the inflection point (i.e., the midpoint parameter μ) of the fitted logistic psychometric function used to categorize high- versus low-amplitude stimuli. We termed it “threshold” in the categorization of high and low amplitude stimuli. We agree with the reviewer that we could also use the term “Categorization bias”. We originally opted to avoid this term, not to confuse the readers when referring to the criterion (signal detection theory) as “response bias”. However, seeing as the term “threshold” may be confusing as well, we adopted the term “Categorization bias” in the updated version of the manuscript (lines 282, 284, 637-638, 785, Fig. 3F).

      (b) Similarly, I think that "Categorization accuracy" can be misleading when describing the slope of the psychometric curve. Performance could have a steep slope but still be quite inaccurate (e.g., if there is a big bias). Perhaps "precision" is a better description of the slope?

      We thank the reviewer for this suggestion. The slope of the psychometric curve is often referred to as “sensitivity” in the literature (Carandini and Churchland, 2014), but in our original manuscript we used the term “accuracy” to avoid confusion with the d′ measure from signal detection theory. We have revised the manuscript and Figures with the term “precision” as the reviewer suggested (lines 282, 284, 637-638, 786, Fig. 3G).
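
      The distinction between “Categorization bias” (the midpoint μ) and “precision” (the slope) can be made concrete with a small sketch. The logistic parameterization below, with a spread parameter sigma, is an assumption chosen for illustration; the exact form fitted in the manuscript is not stated in this response.

```python
import math


def psychometric(amplitude_um: float, mu: float, sigma: float) -> float:
    """P(right lick) for a logistic curve with midpoint mu and spread sigma."""
    return 1.0 / (1.0 + math.exp(-(amplitude_um - mu) / sigma))


def midpoint_slope(sigma: float) -> float:
    """Slope of the logistic curve at its midpoint, equal to 1 / (4 * sigma)."""
    return 1.0 / (4.0 * sigma)
```

      Shifting mu moves the curve along the amplitude axis without changing its steepness, whereas a smaller sigma gives a steeper curve at the midpoint. A curve can therefore be steep (high precision) and still be shifted away from the true category boundary (large bias).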

      Minor Comments

      (1) Abstract: "determines how autistic individuals engage" - there are other factors too. So, I think that "determines" is a little strong. Perhaps "influences" is more appropriate.

      We have incorporated the reviewer’s suggestion (line 7).

      (2) Figure 1 F, G. On the one hand, d' is defined as "sensitivity (d') in discriminating between high- and low-salience stimuli" - that seems to make sense. But then d' is also calculated and presented for each salience level on its own. How was this done? Namely, percent correct (or proportion of choices high/low salience) could be affected by criterion shifts as well as sensitivity. This makes calculating the d' for a single (low or high) salience stimulus ambiguous. So, how do these authors make this conclusion?

      We agree that calculating d′ for a single stimulus amplitude is ambiguous, because the resulting value conflates true stimulus sensitivity with shifts in response bias or criterion. Consequently, all analyses and figures reporting d′ for individual high- or low-salience stimuli (e.g., Figures 1F and 1G) have been removed from the revised manuscript.

      In the updated analyses, d′ is calculated only across high- versus low-salience stimuli, following standard Signal Detection Theory procedures, ensuring that it reflects true discriminability between the two categories (Methods, line 631; Figure 1E).

      (3) "Our results showed comparable correct choice rates in Fmr1-/y and WT mice (Fig. 1H), for both high- and low-salience stimuli (Fig. S1C-D). In contrast, Fmr1-/y mice presented a significantly higher rate of incorrect choices (Fig. 1I)." - aren't correct choices and incorrect choices complementary (i.e., 1-x) in a 2AFC? How is this possible?

      We thank the reviewer for pointing this out. Correct and incorrect choices are complementary at the single-trial level if miss trials are excluded. However, in our analyses, correct and incorrect choice rates were calculated by normalizing the number of correct or incorrect responses to the total number of trials (including misses), which breaks this complementarity and contributes to the differences observed in Fig. 1H–I. This was clarified in the Methods section (lines 616-617). Moreover, incorrect responses were less frequent than correct ones and are thought to reflect lapses, response bias, and impulsive responding rather than sensory performance, making them more sensitive to genotype-dependent differences in behavioral control. Based on this concept, we further examined whether incorrect choices were preferentially associated with specific stimulus amplitudes and assessed response bias and prior effects.
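
      A short sketch shows why the two rates stop being complementary once miss trials are kept in the denominator. The outcome labels and the function name are hypothetical.

```python
from collections import Counter


def choice_rates(outcomes):
    """Rates of correct, incorrect and miss trials, normalized to ALL trials."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    counts = Counter(outcomes)
    n_total = len(outcomes)
    return {label: counts[label] / n_total
            for label in ("correct", "incorrect", "miss")}
```

      For a session with 6 correct, 2 incorrect and 2 miss trials, the correct and incorrect rates are 0.6 and 0.2. They sum to 0.8 rather than 1, so two groups can share the same correct rate and still differ in their incorrect rate if their miss rates differ.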

      (4) The conclusion that "they showed a strong trend toward reduced sensitivity for low-salience stimuli (Fig. 1G)" has a confound - it could be that there was a criterion shift (rather than differences in sensitivity)?

      We agree with the reviewer that the previously reported trend in sensitivity for low-salience stimuli could reflect a criterion shift rather than true differences in sensory sensitivity. Because sensitivity estimates for individual stimulus amplitudes are not well-defined in a 2AFC framework, we have removed the sensitivity calculations for high- and low-salience stimuli considered independently. Instead, we now present salience-specific differences using correct and incorrect response rates for each stimulus amplitude, which more directly capture performance differences without assuming changes in sensory sensitivity (Fig. 1G-I, S1E-F).

      (5) Figure 3D, E - I stumbled over this in comparison to Figure 3B, C. That is because (a) In D and E, the authors compare right-lick responses (reporting high salience) to stimuli of 12 μm and 14 μm amplitude (Figure 3D) and low-salience lick rates for the same (Figure 3E). I would have thought that these approaches are simply complementary (1-x) - see related minor question above/below. So, what is the advantage of presenting them both?

      We presented both panels to clarify the source of the observed differences in performance. Specifically, showing right-lick responses (reporting high-salience choices) alongside low salience lick rates allows us to distinguish whether reduced high-salience reporting arises from an actual shift in choice (e.g., increased leftward licking) versus an increase in miss trials at the lowest amplitude (12 µm). By presenting both, we can demonstrate that the effect is primarily driven by an increase in leftward choices rather than by missed responses, providing a more precise interpretation of behavioral changes. The complementary analysis for leftward choices has now been moved to the supplemental material (Fig. S5A) and the reason for this analysis has been clarified in the Results (lines 275-276).

      (b) In B and C, the authors compare two differences in stimulus magnitude (2 and 4 μm), but in Figure 3D and E, only one difference (2 μm) from two perspectives. I was expecting a comparison with stimuli differing by 4 μm in amplitude (comparable to the high stimulus comparison of 26 μm vs. 22 μm stimuli).

      We have indeed analyzed the 12 μm versus 16 μm stimulus pair, which corresponds to a 4 μm difference and is reliably discriminated by both genotypes. In the original manuscript, we did not include this comparison because of the differences already seen at a 2 μm amplitude difference. Based on the reviewer’s suggestion, we have now included the 12 μm vs. 16 μm comparison in the revised manuscript (Results, lines 270-272; Fig. 3E) to provide a complementary perspective consistent with the high-salience comparisons (26 μm vs. 22 μm).

      (c) "Sensitivity d' for high- and low-salience stimuli was calculated based on the Correct and Incorrect choice rate for high- and low-salience stimuli respectively." How were trials for which the animal did not respond taken into account? Were these part of the denominator? Or were these excluded when calculating proportions? (related to the Q regarding Figure 3 D,E above).

      Indeed, the Miss trials were part of the denominator. This is now clarified in the Methods section (line 631).

      (d) "c = d'(high)- d'(low)." - I did not understand this fully. There were several high and several low stimuli - so how were these calculated? Pooled for high and pooled for low? Per stimulus difference?

      This was indeed calculated for pooled high and low amplitudes during testing. In the revised manuscript, criterion c has been recalculated based on the average correct high rate (for stimuli of 20-26 µm amplitude) and average incorrect low rate (for stimuli of 12-18 µm amplitude), using the same formula as in the analysis of the training dataset:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]

      Pooling across amplitudes allows us to obtain a single summary measure of response bias toward the right lickport, independent of stimulus discriminability. This approach is consistent with standard signal detection theory practices when multiple stimulus levels are present.

      If the inter-trial interval is 5-10s, how is a 5s timeout a punishment?

      The 5 s timeout serves as a punishment by temporarily delaying access to the next trial and potential reward, thereby reducing the overall reward rate. Even though the inter-trial interval (ITI) varies between 5 and 10 s, the timeout increases the effective delay before the next opportunity to earn a reward, discouraging incorrect responses. This is consistent with standard operant conditioning procedures, where brief timeouts act as negative consequences without being overly severe. Across most trials, the timeout effectively reduces expected reward rate, though its impact is minimal when the ITI is already long.

      Reviewer #2 (Recommendations for the authors):

      Task-related questions:

      (1) What evidence is there that the 40 Hz, 12 μm stimulus is "low salience" while the 40 Hz, 26 μm stimulus is "high salience"? This seems like an arbitrary distinction without showing sensitivity curves across a group of animals. Better definitions of the stimuli and the actual forces applied are necessary.

      We thank the reviewer for this comment. Based on our previous work (Semelidou et al., bioRxiv; Accepted in Advanced Science), both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli are clearly suprathreshold. In the present study, however, stimulus salience is defined in a relative and operational manner within this suprathreshold range.

      Specifically, analysis of miss trials (Fig. S3E) shows that the 40 Hz, 12 μm stimulus consistently elicited a higher proportion of missed responses compared to the 40 Hz, 26 μm stimulus across animals, indicating lower behavioral performance for the lower-amplitude stimulus. We therefore refer to the 12 μm stimulus as “low salience” and the 26 μm stimulus as “high salience” to denote relative differences in perceptual strength and attentional engagement within the suprathreshold range, rather than differences in detectability or absolute sensory sensitivity. This definition has been clarified in the Methods (lines 583-587) and Results sections (lines 115-119; lines 225-227).

      (2) Sensitivity curves/detection thresholds for each mouse should be included in the study.

      We thank the reviewer for this suggestion. Sensitivity curves and detection thresholds for low-amplitude and low-frequency vibrotactile forepaw stimulation have been systematically characterized in our previous study (Semelidou et al., bioRxiv, Accepted in Advanced Science). In that work, we demonstrated that stimuli with similar amplitudes and even lower frequency (10 Hz) than those used in the present study are reliably detectable by mice, confirming that both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli fall within the suprathreshold range.

      Because the goal of the present study was not to determine absolute detection thresholds but rather to examine discrimination and categorization performance within a suprathreshold range, we did not re-establish full psychometric detection curves for each mouse.

      We have clarified this rationale in the revised manuscript (Results, lines 108-113; Methods, lines 577-579).

      (3) What force is being applied during stimulus presentations? 12 or 26 μm does not provide enough information about the stimuli applied. What are the physical parameters of the indenter? What material, what tip size?

      Vibrotactile stimuli were delivered to the forepaw via a piezoelectric actuator. A 12.7 mm stainless steel post (ThorLabs) was mounted on the actuator vertically and a 0.6 mm stainless steel rod (ThorLabs) was clamped horizontally onto this post. The horizontal rod served as the contact bar on which the animal rested its right forepaw.

      Stimuli were sinusoidal vibrations at 40 Hz with peak-to-peak displacements of 12 μm (low salience) or 26 μm (high salience). The actuator displacement was calibrated prior to experiments to ensure accurate vibration amplitudes.

      Animals were positioned in the setup to ensure stable and consistent forepaw contact with the rod delivering the vibration. Pilot experiments with an extra sensor to monitor forepaw placement confirmed that the mice did not remove their forepaws from the bar before stimulus delivery. All this information is now added in the Methods section (lines 552-555, 580-582).

      (4) Only one vibration stimulus was used (40 Hz) - this preferentially activates specific subsets of low-threshold mechanoreceptors and not others. A range of vibrotactile stimuli (with varying frequencies) would be more useful. From this limited range of stimuli, it is difficult to assess whether the findings would extrapolate to other types of stimuli.

      We agree that using a single vibration frequency limits the generalization of our findings across the full range of mechanoreceptor subtypes and vibrotactile stimulus conditions. In the present study, we deliberately focused on amplitude discrimination within the flutter range (<50 Hz), as this frequency preferentially activates subsets of low-threshold mechanoreceptors relevant for flutter perception and is commonly used in clinical studies of tactile amplitude discrimination (Puts et al., 2014, 2017; Asaridou et al., 2022). By holding frequency constant and varying only amplitude, we were able to isolate amplitude-dependent perceptual and decision-making processes while minimizing frequency-dependent variability and to facilitate direct translational comparisons with human studies using similar flutter stimuli.

      We acknowledge, however, that extending the paradigm to additional, higher frequencies would help determine whether the observed effects generalize across mechanoreceptor channels. We have now added this point as a future direction in the Discussion section (lines 510-514).

      (5) The methods indicate that during the implementation of the water-restriction protocol, mice had access to a solid water supplement in their home cage. How did they control for how much water supplement was consumed by each mouse before the testing sessions?

      We thank the reviewer for raising this point. The solid water supplement was divided into premeasured individual portions, and each mouse received its allotted amount only after the daily training/testing session. Daily body weight measurements were used to monitor hydration and ensure that all animals maintained stable body weight. If necessary, supplemental water was adjusted to maintain animals within the approved weight range. This procedure is now described in the Methods section (lines 567-571).

      (6) A control version of the test, perhaps using a different sensory modality, would be useful for making conclusions.

      We agree that testing other sensory modalities would provide a useful control for assessing the generalizability of the observed effects. However, in the present study, we intentionally focused on the tactile modality, as touch has been shown to play a critical role in autism across sexes and predict other core behavioral symptoms. This makes touch particularly relevant for investigating translational mechanisms in this model.

      By specifically targeting tactile perception, we aimed to investigate the link between sensory discrimination, decision-making, and cognitive modulation within a modality that is strongly implicated in autism. Previous studies in autistic individuals have demonstrated similar interactions between cognitive processes and perceptual decision-making in the visual domain, suggesting that such effects may not be modality-specific. Nevertheless, extending this paradigm to additional sensory systems would be valuable to directly test whether comparable cognitive influences on perception generalize across modalities. We have now incorporated this perspective as a future direction in the Discussion section (lines 514-518).

      Reviewer #3 (Recommendations for the authors):

      There are several questions:

      (1) It is important to show stimulus intensity-response curves representing tactile responses for both WT and Fmr1-/y mice.

      We thank the reviewer for this important comment. Detection sensitivity curves for low-amplitude and low-frequency vibrotactile stimulation of the forepaw have been characterized in detail in our previous study (Semelidou et al., bioRxiv; now accepted in Advanced Science). In that work, we showed that stimuli at or above 8 µm amplitude and 10 Hz frequency are reliably detected by both WT and Fmr1<sup>-/y</sup> mice.

      Based on these findings, the current study employed vibrotactile stimuli at a higher frequency (40 Hz) and amplitudes of 12 µm and above, ensuring that all stimuli were well within the suprathreshold range for both genotypes. This experimental choice was made to specifically probe discrimination, categorization, and decision-making processes, rather than basic sensory detection. As a result, the behavioral effects reported here cannot be attributed to differences in stimulus detectability.

      We have clarified this rationale in the revised manuscript to make explicit that the absence of full intensity-response curves in the current study reflects a deliberate focus on suprathreshold perceptual and cognitive processes rather than sensory threshold differences (Results, lines 108-113; Methods, lines 577-579).

      (2) There is no difference in the time it takes to learn the task between WT and Fmr1-/y mice. But how does the learning rate curve look? Is there a difference in the slope between WT and Fmr1-/y early vs late into learning?

      We thank the reviewer for this suggestion. To directly address whether learning dynamics differed between genotypes, we analyzed learning curves across training.

      We first computed the correct choice rate per day for each animal (Fig. S2A) and fit a mixed-effects model including training day, genotype, and their interaction. This analysis revealed no genotype differences in baseline performance or learning rate, with minimal Genotype × Day interaction (Fig. S2A-top, Fig. S2C).

      We additionally computed the slope of the learning curve for each individual, which also showed no difference across genotypes (Fig. S2B). In addition, within-animal day-to-day performance variability was also comparable across groups (Fig. S2A-bottom, S2D).

      These analyses indicate that WT and Fmr1<sup>-/y</sup> mice exhibit similar learning trajectories during training. The learning curves are now included in Figure S2, described in the Results (lines 140–151) and detailed in the Methods (lines 644-658).

      (3) It would be useful to see raster plots of licks for different trials and the corresponding lick density plots for early vs late trials.

      We thank the reviewer for this suggestion. To visualize trial-by-trial behavior, we included example lick traces from an early 100-trial session and a late 100-trial session, alongside the corresponding raster plots of licks (Fig. S1A–B).

      (4) Consistent with the first question, examples of intermediate learning stages would help gain more insight into how both WT and Fmr1-/y mice learn.

      In line with the reviewer’s suggestion, we examined whether WT and Fmr1<sup>-/y</sup> mice showed different performance during intermediate stages of learning. To this end, we defined the middle three days of the training period of each animal as the intermediate learning phase. We compared both the mean correct-choice rate and individual learning slopes across this interval. Statistical analyses revealed no significant genotype differences in either measure, indicating comparable performance and learning dynamics during the intermediate phase of training (lines 152-156).

      (5) How does the learning rate change with increased cognitive load for both WT and Fmr1-/y mice?

      We thank the reviewer for this question. While our experimental design did not include a manipulation of cognitive load during the learning phase itself, we assessed whether increased cognitive load affected performance by analyzing behavior on the first day of testing, when animals were required to categorize and discriminate among a larger set of stimuli compared to training.

      Using performance on the training stimuli during this first testing session as a proxy, we found no significant difference between WT and Fmr1<sup>-/y</sup> mice in correct choice rate (Author response image 1). This indicates that increased cognitive load did not differentially affect performance on familiar stimuli across genotypes at this stage.

      Because this analysis does not reflect learning rate per se, but rather performance under increased task demands after learning had already occurred, we did not incorporate it into the main Results section. Instead, it is presented here to directly address the reviewer’s question.

      Author response image 1.

      Correct choice rate for the 12 µm and 26 µm stimuli during the first day of testing when the cognitive load is high.

      (6) How does the learning rate change if the sensory stimuli are more challenging for both WT and Fmr1-/y to detect?

      We thank the reviewer for this question. In the present study, animals were deliberately trained using well-separated, suprathreshold low- and high-salience stimuli to ensure reliable stimulus detection and to avoid confounding learning rate with perceptual difficulty or discrimination limits.

      A recent study (Heimburg et al., 2025) has shown that learning is slower when the difference between the two training stimuli is reduced. Based on these results, we would expect that decreasing the separation between low- and high-salience stimuli would similarly increase training duration for both WT and Fmr1<sup>-/y</sup> mice, since our results do not indicate any discrimination or categorization deficits in the mouse model of autism. However, directly testing how stimulus difficulty modulates learning rate would require a dedicated manipulation of stimulus spacing during training and was beyond the scope of the current study.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals.

      These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We sincerely thank the editors and reviewers for their careful evaluation and constructive feedback, which has helped us substantially improve the clarity and rigor of the manuscript. In the revised version, we have clarified the interpretation of the electrophysiological experiments, corrected the labeling of recorded signals as light-evoked EPSCs, and removed statements implying differences in absolute synaptic strength. To address concerns about the interpretation of Fig. 7, we have added quantitative analyses of EPSC kinetics and revised the text to focus on synaptic response dynamics rather than amplitude differences. We have also removed analyses that could cause confusion and expanded the Methods section to provide additional experimental details, including the optogenetic stimulation configuration in slice recordings. Together, these revisions strengthen the interpretation of the electrophysiological results and improve the overall clarity and transparency of the study.

      Public Reviews:

      Reviewer #1 (Public review):

      Weakness:

      The authors focused primarily on female mice, limiting generalizability and leaving readers with questions about the impact of sex differences on their results. The tube test is used as a manipulation of the "emotional state" in several of the experiments. While the authors show the changes to corticosterone levels as a consequence of win/loss in the tube test, stronger claims might be made with comparisons to other gold standard stressors such as forced social defeat or social isolation.

      We thank the reviewer for these thoughtful comments.

      First, we acknowledge that the present study was conducted primarily in female mice, which may limit the generalizability of the findings. Female mice were selected to reduce variability associated with male aggression and housing-related stress, which can complicate behavioral assays such as social interaction and dominance testing. While focusing on a single sex allowed us to maintain experimental consistency across multiple behavioral paradigms, we agree that sex differences could influence the neural circuits underlying emotional and social behaviors. We have now added a statement in the Discussion acknowledging this limitation and noting that future studies will be necessary to determine whether similar circuit mechanisms operate in male mice.

      Second, we appreciate the reviewer’s suggestion regarding the use of other stress paradigms. In this study, the tube test was used primarily to establish social dominance relationships between paired mice rather than as a classical stress-induction paradigm. Nevertheless, repeated win/loss outcomes in the tube test were associated with significant increases in corticosterone levels in brain lysates of loser mice, indicating that the paradigm produced measurable physiological responses consistent with stress-related processes. These findings suggest that repeated social competition in this context can induce transient physiological and behavioral changes associated with social hierarchy. We agree that paradigms such as chronic social defeat stress or social isolation represent well-established models for inducing sustained stress responses. We have therefore revised the manuscript to clarify that the tube test in our study serves as a model of social competition and rank establishment rather than a canonical stress paradigm, and we highlight the comparison with other stress models as an important direction for future work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      In relation to figure 7. Their response does not really clarify the issue:

      (a) They argue that they are not making claims about synapse strength. However, they still state: "In the mPFC→NAc pathway, blue light stimulation evoked larger excitatory postsynaptic currents (EPSCs) in winner mice compared to losers (Fig. 7E). This suggests stronger synaptic transmission in winners' mPFC→NAc circuits." They don't show this; they only show that, normalized to some arbitrary value, the responses at the earlier durations are higher or lower, which is very hard to interpret.

      They argue in the rebuttal that the aim of this is to highlight response kinetics, but these are not quantified or discussed in any way.

      We thank the reviewer for this helpful comment. We agree that the normalized input-output curves shown in the original submission did not allow conclusions about absolute synaptic strength, and we also acknowledge that response kinetics were not previously quantified despite being mentioned in the rebuttal.

      To address both concerns, we have revised Fig. 7 and added quantitative analyses of EPSC kinetics. Specifically, we measured the rise and decay slopes of light-evoked EPSCs recorded in postsynaptic neurons within the NAc and BLA of winner and loser mice. In the mPFC→BLA pathway, both the EPSC rise and decay slopes were significantly increased in loser mice compared with winners (rise slope: p = 0.0138; decay slope: p = 0.0392), suggesting enhanced synaptic responsiveness and faster charge transfer kinetics in BLA neurons of losers. In contrast, in the mPFC→NAc pathway, neither the EPSC rise slope nor the decay slope differed significantly between groups.

      These results provide a quantitative characterization of synaptic response dynamics and reveal pathway-specific differences in synaptic properties associated with social hierarchy. Importantly, this analysis does not rely on amplitude normalization and therefore allows a more interpretable comparison of synaptic response profiles between groups. We have updated Fig. 7 and the corresponding Results section to include these analyses. 
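
      As an illustration of the kind of kinetic measurement described above, the following is a minimal sketch of how rise and decay slopes could be computed from a single baseline-subtracted EPSC trace. It is not the analysis used in the study: the function name `epsc_slopes`, the 10-90% amplitude window, and the sign convention (inward current negative) are all assumptions made for this example.

```python
def _first_index(indices, condition):
    """Return the first index satisfying `condition`, or raise ValueError."""
    for i in indices:
        if condition(i):
            return i
    raise ValueError("trace never crosses the requested amplitude level")


def epsc_slopes(trace_pa, dt_ms, lo=0.1, hi=0.9):
    """Return (rise_slope, decay_slope) as positive magnitudes in pA/ms.

    Hypothetical helper. Assumes `trace_pa` is a baseline-subtracted
    inward EPSC (negative values, in pA) sampled every `dt_ms` ms, and
    measures each slope between the `lo` and `hi` fractions of the peak.
    """
    amp = [-v for v in trace_pa]  # flip sign so the inward peak is positive
    peak_idx = max(range(len(amp)), key=lambda i: amp[i])
    peak = amp[peak_idx]
    if peak <= 0:
        raise ValueError("no inward peak found")
    lo_level, hi_level = lo * peak, hi * peak

    # Rising phase: first samples at or above the lo and hi levels.
    r_lo = _first_index(range(peak_idx + 1), lambda i: amp[i] >= lo_level)
    r_hi = _first_index(range(r_lo, peak_idx + 1), lambda i: amp[i] >= hi_level)
    # Decay phase: first samples after the peak at or below the hi and lo levels.
    d_hi = _first_index(range(peak_idx, len(amp)), lambda i: amp[i] <= hi_level)
    d_lo = _first_index(range(d_hi, len(amp)), lambda i: amp[i] <= lo_level)
    if r_hi == r_lo or d_lo == d_hi:
        raise ValueError("sampling too coarse to estimate a slope")

    rise = (amp[r_hi] - amp[r_lo]) / ((r_hi - r_lo) * dt_ms)
    decay = (amp[d_hi] - amp[d_lo]) / ((d_lo - d_hi) * dt_ms)
    return rise, decay
```

      Because both slopes scale with the absolute response amplitude, a measurement of this kind would still be affected by differences in opsin expression between slices unless the trace is first scaled to its own peak.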

      (b) They still haven't labeled the responses correctly. The responses in figure 7 are not "voltage spikes" but light-evoked EPSCs.

      We apologize for the incorrect terminology. All instances of “voltage spikes” have been corrected to “light-evoked EPSCs” in the figure legends and text.

      (c) They argue that responses do not vary across experiments/slices because they use a constant viral injection volume targeted to the same coordinates and identical placement of the fiber and recording location. While I am sure they aim to do that, it is almost impossible to ensure that this was identical across experiments and that the degree of opsin labelling in their slices was the same (see, for example, Mao et al., 2011, PMID: 21982373, who pioneered the approach of using within-slice comparisons to account for this). If I understand their explanation of their strategy correctly, the authors' own rebuttal highlights this point: they seem to have needed to vary the LED duration by an order of magnitude (1-10 ms) to ensure reliable responses across experiments, even for the same projection.

      We thank the reviewer for raising this important point. We agree that it is not possible to ensure identical opsin expression or light delivery across experiments. We have revised the manuscript to explicitly acknowledge this limitation and clarify that normalization was used to mitigate, but not eliminate, inter-slice variability. We now avoid any interpretation that relies on absolute response amplitude across animals.

      Regarding “LED duration variability (1-10 ms)”, we agree that the need to adjust stimulation duration reflects variability in effective opsin activation across slices. We now clarify this point in the Methods and Results and emphasize that stimulation parameters were optimized to reliably evoke responses rather than to equate absolute light input across experiments.

      Importantly, our main conclusions do not rely on absolute EPSC amplitude comparisons. Instead, they are supported by analyses that are less sensitive to variability in opsin expression or light delivery, including EPSC kinetics (rise and decay slopes), paired-pulse ratio measurements, and AMPA/NMDA ratios. These complementary measures provide a more robust characterization of synaptic properties across conditions.

      (d) Similarly in Fig S6 it is unclear what they are showing. The Y axis is still labeled in pA, yet they claim this is an action potential? Also this analysis is rather irrelevant to the data shown in figure 7 as the pathway between PFC and BLA/NAc is not preserved.

      We thank the reviewer for pointing out the lack of clarity in Fig. S6. We agree that it does not directly inform the interpretation of Fig. 7 and may cause confusion. To improve the clarity and focus of the manuscript, we have therefore removed Fig. S6 from the revised manuscript. The removal of this supplementary figure does not affect the main conclusions of the study.

      (e) It now also seems that these experiments were performed by placing a fiber optic into the slice to elicit responses. This should be detailed in the methods.

      We thank the reviewer for noting this omission. We have added a detailed description of fiber-optic placement within the slice for optogenetic stimulation to the Methods section. Specifically, we clarify that blue light was delivered through a fiber optic positioned above the recorded slice to activate ChR2-expressing mPFC axon terminals within the BLA or NAc. The placement of the fiber relative to the recorded neurons and the stimulation parameters are now explicitly described in the revised Methods section.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells.

      Strengths:

      High-quality datasets for mantle tissue in Crassostrea gigas and thorough comparisons with existing datasets for this species and other spiralians. Balanced discussion.

      Weaknesses:

      No major weaknesses. The analyses follow fairly standard approaches in the field that have been previously applied and developed in similar systems.

      We thank the reviewer for the positive evaluation of our work. We are encouraged that the reviewer finds our conclusions balanced and the analyses appropriate. Although no major concerns were raised, we will incorporate clarifications and improvements prompted by the other reviewers to further strengthen the manuscript.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Validation of cell types

      Cell type identities are not convincingly validated. Although the authors cite previous studies (l. 92), the referenced marker genes are largely not used, and the cited works do not provide sufficient spatial validation. Without in situ data, the inferred locations of cell types (e.g. Figure 2A) are not supported. Spatial validation of marker genes (e.g. via HCR) is essential, particularly for a study addressing shell field evolution. In addition, the gastrula dataset is not meaningfully analyzed, and its inclusion remains unclear.

      We thank the reviewer for this important comment regarding cell type validation. In the previous version of the manuscript, we provided a detailed compilation of referenced marker genes from previous studies in Supplementary File 2. It is possible that, due to an incorrect or unclear reference in the main text, this information was not readily accessible. We will correct and clarify these citations in the revised manuscript to ensure that these resources are clearly presented.

      We agree that spatial validation would provide important support for cell type identities. In the revised version, we will strengthen this aspect by selecting more specific marker genes for each SEC cluster and performing fluorescence in situ hybridisation (FISH) to validate their spatial localization.

      Regarding the gastrula dataset, our original intention was to investigate the developmental transition of shell gland-related cell populations from gastrula to trochophore stages. However, following the reviewer’s suggestion and considering the limited interpretability of the gastrula dataset in its current form, we agree that its inclusion does not substantially strengthen the study. We therefore plan to remove the gastrula dataset from the revised manuscript, and instead focus on the trochophore stage as a representative developmental stage for larval shell formation, enabling a clearer comparison between larval and adult shell-forming cell populations. We note that this change does not affect the main conclusions of the study. In addition, we will curate a refined set of experimentally supported marker genes, and provide an updated supplementary table summarizing detailed information, including cell type annotations, literature sources, and experimental validation methods.

      (2) Robustness of cell type classification 

      Several proposed cell types may not represent distinct entities (not individuated) but rather reflect over-clustering. Marker genes are often not specific and are shared across clusters (e.g. Sec1/Sec2), making it difficult to distinguish cell types reliably.

      In the revised manuscript, we will refine marker gene selection by prioritizing genes with higher specificity and stronger discriminatory power to improve the robustness of cell type identification. To further support cell identity assignment, we will select representative marker genes for SEC clusters and perform FISH to validate their spatial localization. These revisions will lead to a more robust and conservative interpretation of cell populations.
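
      One way to make "specificity" concrete when ranking candidate markers is a per-gene specificity score computed from mean expression per cluster. The sketch below uses the tau index, a metric commonly applied to tissue specificity; it is offered only as an example and is not stated to be the criterion the authors will use. The function name `tau_specificity` and the choice to return 0 for an unexpressed gene are assumptions.

```python
def tau_specificity(mean_expr):
    """Tau specificity index for one gene across clusters.

    `mean_expr` holds the gene's non-negative mean expression in each
    cluster. Returns a value in [0, 1]: 0 for uniform expression across
    clusters, 1 for expression restricted to a single cluster.
    """
    n = len(mean_expr)
    if n < 2:
        raise ValueError("need at least two clusters")
    if min(mean_expr) < 0:
        raise ValueError("expression values must be non-negative")
    x_max = max(mean_expr)
    if x_max == 0:
        return 0.0  # assumption: treat an unexpressed gene as non-specific
    return sum(1 - x / x_max for x in mean_expr) / (n - 1)
```

      Under this score, a gene shared at similar levels between two clusters (as described for Sec1/Sec2) would score well below 1 and could be deprioritized in favor of more restricted markers.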

      (3) Comparative analysis of secretory cells

      The comparative framework is not sufficiently supported. Secretory cells are highly diverse, and without proper validation, their comparison across taxa is not meaningful. The transcription factor analysis is limited, as only a few genes are shared and many are inconsistently expressed (Figure 3E). The conclusion of a conserved regulatory program across spiralians is therefore overstated.

      We agree that secretory cell types are highly diverse across spiralians and that cross-species comparisons require careful interpretation. In the revised manuscript, we will adopt a more cautious framework, highlighting partial conservation of the regulatory program alongside functional convergence in secretory processes. We will also strengthen the comparative framework by integrating functional annotations, which may provide complementary support beyond individual gene overlaps. Importantly, we will improve the reliability of oyster SEC annotations through FISH-based spatial validation, thereby increasing confidence in cross-species comparisons. These revisions will provide a more balanced and biologically grounded interpretation of secretory cell evolution across spiralians.

      (4) Clarity and interpretation of results

      Results are at times difficult to follow and remain superficial. Marker genes are insufficiently annotated (especially for Crassostrea), and comparisons across taxa lack functional interpretation. Unvalidated and heterogeneous cell types are grouped together, and transcriptional similarities are overinterpreted. Overall, key conclusions are not adequately supported by the presented data.

      In the revised manuscript, we will re-evaluate marker gene annotations to ensure support from existing experimental evidence. For SEC populations, we will validate representative markers using FISH. We will also expand the functional annotation of marker genes and strengthen cross-species comparisons. In addition, we will substantially revise the Results and Discussion sections to improve clarity and depth, reduce overinterpretation of transcriptional similarities, and ensure that all conclusions are more tightly aligned with the strength of the supporting evidence.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) My main concern is that the authors rely primarily on previous studies for the experimental and functional characterisation of the identified cell types. The cited papers (Piovani, 2023 and de la Forest Divonne et al., 2025) deal with distinct stages or tissues (larvae and hemocytes, respectively), which limits their direct relevance. The authors also cite other papers for in situ expression data; it would be helpful to summarise somewhere (e.g. in a table) which genes have been experimentally characterised and what their expression domains are, or alternatively to provide HCR or in situ staining on the mantle. For instance, what is the rationale for the claim that proliferative cells give rise to the mantle? The trajectory inference approach used (Monocle) would likely yield a similar result regardless of the reference cell type, so additional justification is needed.

      We agree that our reliance on previous studies for functional and experimental characterization requires clearer justification and integration. In the revised manuscript, we will compile a new supplementary table summarizing marker genes with available experimental validation, including their associated cell types, literature sources, and experimental methods. For SEC populations, we will select representative marker genes and perform FISH to validate their spatial localization, thereby providing independent support for cell identity.

      Regarding trajectory inference, we agree that methods such as Monocle are sensitive to assumptions. We will clarify the rationale for root cell selection, test alternative root assignments to assess robustness, and revise our interpretation to avoid strong lineage claims. Rather than stating that proliferative cells give rise to mantle cells, we will describe the observed trajectory as being consistent with a potential developmental relationship, while acknowledging that this does not constitute direct evidence of lineage progression.

      (2) More broadly, I find that the functional properties of the identified cell types and their relationship to the expressed genes deserve more detailed discussion. For example, at L100, several genes are mentioned, but their functional roles are not discussed. Similarly, the basis for annotating the proliferative cells is not explained. How was gene orthology assessed? Throughout the manuscript, vertebrate-style gene names are used without explicitly establishing orthology status in oyster, which should be addressed.

      We thank the reviewer for this important comment. In the revised manuscript, we will expand the functional interpretation of key genes by incorporating available literature and, where possible, functional annotations. We will also clarify the basis for cell type annotation and explicitly describe the criteria used, including for proliferative cell populations (e.g. cell proliferation-associated markers).

      Regarding gene annotation, gene names in oyster were assigned based on sequence similarity searches against the eggNOG database. In the revised manuscript, we will provide a comprehensive supplementary table linking gene IDs to their annotations, along with the corresponding database sources. In addition, we will clearly describe how orthology relationships were assessed, including the methods and criteria used (e.g. sequence similarity searches and orthology databases). Throughout the revised manuscript, we will ensure that the use of vertebrate-style gene names is accompanied by appropriate annotation information and does not imply unsupported one-to-one orthology relationships.

      (3) More detail is needed on the methods and quality control for the single-cell data. The authors should clarify that the platform used (BMKMANU) is a droplet-based technology comparable in principle to Drop-seq. BMKMANU is not widely used in the field. How does it compare to 10x Genomics in terms of sensitivity and cell recovery? The authors appear to use the 10x Chromium cellranger pipeline for data analysis, which suggests compatibility, but this should be stated explicitly. Additionally, no information is provided on the number of sequencing runs or biological replicates, nor on how reproducible the results are across samples.

      In the revised manuscript, we will expand the Methods section to provide a clearer and more detailed description of the experimental and analytical procedures. BMKMANU is a droplet-based single-cell RNA-seq platform, conceptually comparable to Drop-seq and similar in principle to 10x Chromium. We will also explicitly state that the data generated are compatible with the Cell Ranger pipeline, which was used for downstream processing and analysis. Although BMKMANU is less widely used than 10x Genomics platforms, it has been successfully applied in several recent studies (e.g. Li et al., 2024: https://doi.org/10.1007/s11427-023-2548-3; Li et al., 2025: https://doi.org/10.1038/s41559-025-02642-6; Wei et al., 2024: https://doi.org/10.1038/s41467-024-46780-0), demonstrating its applicability for single-cell transcriptomic analyses across different biological systems. Regarding platform performance, based on technical information provided by the manufacturer, BMKMANU shows comparable sensitivity and cell capture efficiency to 10x Genomics platforms (http://www.biomarker.com.cn/zhizao/dg1000danxibao). In this study, the mantle sample was obtained from a single individual oyster and processed in a single sequencing run, without batch effects introduced by multiple runs. We will clearly state this in the revised manuscript. In addition, we will provide detailed quality control metrics, including the number of cells retained, gene detection rates, and filtering criteria.

      (4) A limitation of the phylostratigraphic analysis is that it is restricted to mantle tissue, making it difficult to place the results in a whole-organism context. How do the age profiles of mantle-expressed genes compare to those of more evolutionarily conserved tissues, such as the nervous system? I appreciate the methodological and experimental constraints, but this is a genuine limitation of the study. The authors could at least discuss it explicitly, and ideally consider generating a broader single-cell atlas of the oyster to provide this comparative baseline.

      We agree that restricting the phylostratigraphic analysis to mantle tissue represents a limitation when attempting to place our findings in a whole-organism evolutionary context. In the revised manuscript, we will explicitly acknowledge this limitation and expand the Discussion to address how gene age profiles in mantle tissue may differ from those in more evolutionarily conserved tissues. In particular, we will clarify that the enrichment of younger, lineage-specific genes observed in shell-forming cells may reflect tissue-specific functional specialization, and therefore should not be directly generalized to other cell types.

      We acknowledge that a broader single-cell atlas spanning multiple tissues would provide an important comparative baseline for interpreting gene age patterns across the organism. While generating such a dataset is beyond the scope of the present study, we will highlight this as an important direction for future research.

      (5) Have the authors considered the potential importance of lineage-specific gene duplication? It is well established that spiralians, including oysters, have undergone extensive lineage-specific duplication of transcription factors such as homeobox genes, and many structural shell-associated proteins may similarly have been duplicated. This could be relevant to interpreting both the phylostratigraphic results and the expansion of secretory gene families.

      We thank the reviewer for this insightful suggestion. Lineage-specific gene duplication is likely to play an important role in shaping both transcription factor repertoires and shell-associated gene families in spiralians, including oysters. In the revised manuscript, we will incorporate a discussion of lineage-specific duplication, particularly in relation to transcription factors and biomineralization-related proteins. We will also, where feasible, explore its potential contribution to our observations and highlight how such duplications may drive the expansion and diversification of secretory gene families.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This paper reports a previously unrecognized mechanism by which platelets compact fibrin fibers during clot retraction. Rather than simply pulling on fibers, the authors propose that platelets generate swirling motions that wind and loop fibrin into dense structures.

      While the results are intriguing, the underlying physical mechanism remains unexplained. In particular, it is unclear how platelets generate swirling motion capable of inducing fibrin coiling, especially when suspended in 3d fibrin mesh. This raises concerns about the conclusions.

      We explained our hypothesis concerning the physical mechanism by which platelets may generate the swirling motion in lines 200-215 and in the discussion under "ideas and speculations". We will, however, provide a more detailed explanation of this process in the revised version.

      The reviewer is right: it is difficult to imagine how platelets in a 3D fibrin mesh can accumulate fibers at the base of their extensions to form a cage-like fiber organisation around the center of the platelets. We therefore developed the 2D fiber-retraction assay, which we believe provides important insights into the coiled fiber accumulations above spread platelets in the 2D situation and also provides a framework for interpreting similar processes that may occur within a 3D clot. In response, we will place greater emphasis on clarifying and strengthening the comparison between the potential mechanistic aspects in the 2D and 3D assays, in order to better support our proposed model.

      Also, does fibrin have inherent chirality or structural asymmetry that could promote coiling independently of platelet activity?

      Yes, double-stranded fibrin protofibrils have a helical twist [1]. Furthermore, a clot formed in the absence of platelets and other cellular components shows intrinsic tensile forces [2]. However, we show that inhibition of actomyosin activity prevents fibrin fiber accumulation in the 2D fiber-retraction assay, providing evidence that platelet actions are necessary to observe the coiled fibers above spread platelets.

      Furthermore, platelet retraction typically involves platelet aggregation rather than isolated cells, and it is unclear how fibrin coiling would proceed in clustered platelets.

      Under the in vitro fiber retraction conditions used in our study (constrained or unconstrained clots, or even the 2D assay), individual platelets are homogeneously distributed within the forming clot or on the coverslip. Therefore, there are no large platelet aggregates or clusters under our experimental conditions, and the results can only demonstrate how individual platelets act on the fibrin fibers. We will emphasize this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Grichine et al. investigate platelet-mediated fibrin compaction using human donor platelets and propose a novel mechanistic model in which platelets generate contractile forces and wind fibrin fibers into compact coiled structures. Using a combination of 2D spread assays, 3D clot imaging via expansion microscopy, live-cell imaging, and computational modelling, the authors present evidence of cage-like fibrin architectures, coiled-fibre morphologies, and platelet-centred "rosette" structures present during fibre compaction. They further suggest that actomyosin-driven cytoskeletal dynamics, potentially involving rotational or swirling motion, underlie this proposed winding mechanism, analogous to DNA looping and compaction. The study addresses an important and longstanding question in thrombosis and hemostasis and offers a conceptually novel perspective on clot compaction.

      Strengths:

      The integration of multiple imaging modalities is a notable strength of this paper. In particular, the 2D fiber-retraction assay provides a useful model for understanding the spatio-temporal dynamics of platelet-mediated fibrin compaction, which can be applied to other systems and may yield detailed mechanistic insights into biological processes. The live-imaging approaches are particularly well executed and offer valuable dynamic insight.

      Weaknesses:

      The primary weakness of this paper lies in its descriptive nature and its reliance on correlative rather than causal evidence. Several interpretations are not uniquely supported by the data presented. For example, the categorisation of fibrin accumulation in 2D assays as "fiber winding" and "fibre compaction" remains descriptive without establishing winding as a mechanism.

      In the revised version, we will avoid the terms fiber winding/compaction when introducing the 2D fiber-retraction assay (figure 3) to better align with the level of evidence, since coiled fibers cannot be distinguished in this figure. However, coiled fibers above spread platelets are clearly visible in figures 4 and 8, and dynamic fiber rotations or winding are observed in figure 12 and video 9. These observations will be presented more cautiously, as indicative rather than definitive evidence of a winding mechanism.

      Alternative mechanisms, such as circular bundling, stacked fibers under tension, or fibrin crosslinking-induced aggregation, are neither excluded nor investigated.

      Fibrin fiber bundling and the formation of staggered or crosslinked protofilaments do not require platelet action, as described previously [2, 3]. Since we observed a clear difference between +/- blebbistatin conditions in the 2D fiber-retraction assay, the fiber compaction we observe depends on platelet actions. Consequently, we consider these alternative mechanisms unlikely based on our data. This will be stated explicitly in the results section.

      Although the authors present compelling live imaging, establishing winding as a dynamic phenotype would require quantitative analyses, such as measuring angular velocities and coiling rates.

      We will incorporate quantitative measurements to complement the observations obtained from live imaging. It is important to note, however, that angular velocities and coiling rates are likely influenced by the number of fiber–fiber contacts present at the time coiling occurs. Specifically, an increased number of contacts is expected to elevate tension within the network, thereby modulating the forces generated by platelets and, consequently, affecting both velocity and coiling dynamics.
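
      As a hedged illustration of what such a quantitative measurement might look like, the sketch below estimates a mean angular velocity from the tracked positions of a fiber landmark around a platelet center in a time-lapse series. The function name `mean_angular_velocity`, the input format, and the assumption that a landmark can be tracked frame to frame are all hypothetical; this is not the authors' planned analysis.

```python
import math


def mean_angular_velocity(points, center, dt_s):
    """Mean angular velocity (rad/s) of a tracked point around `center`.

    `points` is a list of (x, y) positions in consecutive frames taken
    `dt_s` seconds apart. Positive values mean counter-clockwise rotation
    in an x-right, y-up frame; in image coordinates (y pointing down)
    the sign is reversed. Assumes the point moves by less than half a
    turn between consecutive frames.
    """
    if len(points) < 2:
        raise ValueError("need at least two tracked positions")
    cx, cy = center
    angles = [math.atan2(y - cy, x - cx) for x, y in points]
    total = 0.0
    for a0, a1 in zip(angles, angles[1:]):
        step = a1 - a0
        # Unwrap the step into [-pi, pi) so crossing the +/-pi branch
        # of atan2 does not register as a jump of a full turn.
        step = (step + math.pi) % (2 * math.pi) - math.pi
        total += step
    return total / (dt_s * (len(points) - 1))
```

      A coiling rate in turns per minute would follow by dividing the result by 2π and multiplying by 60.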

      The use of a second fluorophore-labelled fibrin population could further strengthen evidence for rotational dynamics.

      These live videos are quite difficult to acquire for the following reasons:

      Small platelet size.

      Heterogeneity of platelets within the population (10 d half-life; old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time needed to adjust parameters for image acquisition necessitate an arbitrary choice of the acquisition window, and only one acquisition (90 min) per sample preparation is possible.

      Furthermore, the laser-induced illumination can perturb the observed processes. We therefore use high-spatial-resolution 3D confocal time-lapse imaging, performed in photon-counting mode with very low laser excitation.

      For these reasons, the use of additional markers would be technically challenging and could perturb the delicate equilibrium and dynamics of the process under investigation.

      Similarly, the inference of rotational contractility or actomyosin "swirling", based on chiral actin organisation and blebbistatin treatment, is not sufficiently supported to conclude that platelets actively wind or loop fibrin fibers.

      Importantly, in the 2D fiber-retraction assay, we do not propose that the rotational actomyosin activity generates a platelet contractility that would allow fiber retraction. Rather, we suggest that cytoskeletal actomyosin swirling (as demonstrated for nucleated cells by Bershadsky's team) can induce rotational dragging of extracellular bound fibrin fibers around the pseudonucleus of spread platelets, thereby promoting accumulation of fibrin fibers. Consistent with this interpretation, inhibition of myosin by blebbistatin prevents the accumulation of fibrin fibers above spread platelets in the 2D fiber-retraction assay (Fig. 3).

      The mathematical model, while complementary and well-constructed, relies on multiple assumptions and lacks predictive validation.

      We thank the reviewer for this insightful comment and acknowledge that the proposed model relies on several important assumptions. In our view, the most significant assumption is that integrin molecules undergo rotational downstream motion as a consequence of their coupling to the swirling cytoskeleton. To assess the necessity and impact of these assumptions, we will perform additional calculations and include the results in the Supplementary Information. These analyses will also provide further validation of the proposed model and underlying mechanism. At the same time, it is important to emphasize that the primary purpose of the model was to examine whether the hypothetical swirling dynamics of the cytoskeleton, together with the associated receptors, could in principle reproduce the experimentally observed fibrin organization.

      Appraisal:

      While the authors successfully document intriguing fibrin architectures and provide a compelling descriptive framework, they do not fully demonstrate a mechanistic model of active fibrin winding by platelets. The conclusions regarding platelet-driven winding and rotational dynamics are not sufficiently supported by direct or quantitative evidence. To substantiate these claims, the study would benefit from experiments that directly link platelet dynamics to fibrin organisation, including coordinated measurements of platelet motion and fibre rearrangement. As it stands, the results are suggestive but do not definitively support the proposed mechanism.

      Discussion and Impact:

      Despite these limitations, the study addresses an important question in thrombosis and hemostasis and introduces a potentially impactful conceptual framework for understanding clot compaction. The imaging approaches and datasets presented will be valuable to the community, particularly for researchers interested in platelet mechanics and fibrin organisation. However, the overall impact will depend on whether the proposed mechanism can be more rigorously validated. In its current form, the study presents an interesting and thought-provoking model, but would benefit from either stronger experimental support for the proposed mechanisms or a more cautious interpretation of the findings.

      We agree that the proposed mechanism requires further validation. In the revised manuscript, we will therefore present a more cautious and explicitly hypothesis-driven interpretation of the mechanism. We hope that the publication of our observations will be of interest to researchers in the field of thrombosis and clot mechanics who possess the specialized tools and expertise necessary to rigorously evaluate and either substantiate or refute the proposed mechanistic model.

      Reviewer #3 (Public review):

      Summary:

      This work aims to understand the mechanisms that platelets use to interact with and compact fibrin fibers during clot formation. This is an important process during wound healing, and recent work has demonstrated that platelets play a critical role in generating the force required to drive the accumulation of fibrin. The authors argue that current models are insufficient to account for the observed reduction in clot volume and propose that platelets actively 'wind up' these fibers by undergoing myosin-dependent rotation. While interesting, the experiments performed by the authors do not directly test this mechanism, and further evidence is required to support their claims.

      Weaknesses:

      (1) The motivation to switch from the system used in Figures 1 and 2 to the '2D fiber-retraction assay' is not clear. While the authors state that this system has 'reduced complexity', the differences between these assays appear to disrupt the 'cage-like' organization of fibrin around platelets shown in Figures 1 and 2 (compare images in Figure 2 with those in Figure 4). An in-depth comparison of two methods is needed to support the conclusions from the 2D system.

      We agree that the cage-like fibrin organization around platelets is disrupted in the 2D fiber-retraction assay when platelets are completely spread on the coverslip before they have encountered fibrin fibers (Fig. 4). However, some platelets form the same number of extensions as platelets in a 3D clot (Fig. 9 A, B) and are not completely spread on the glass surface. For these platelets a cage-like fibrin organisation is retained under the 2D conditions (Fig. 5 and 6). However, the fiber density at the base of the bulbs is higher in the 2D assay than under the constrained 3D clot retraction conditions (Fig. 1C and Fig. 2), probably because in the 2D condition the fibers are less constrained and readily available for compaction.

      Furthermore, the change in plasma volume (Figure 2 vs Figure 7) should also be tested - the authors state that this increases fibrin fiber formation, but this is not quantified or demonstrated in the figures. Notably, this appears to change the morphology of the fibrin fibers shown (comparing Figure 2 and Figure 7).

      We thank the reviewer for raising this point. We would like to clarify that Figure 2 and Figure 7 correspond to two distinct experimental setups: the constrained clot retraction assay (Figure 2) and the 2D fiber-retraction assay (Figure 7). As such, they are not directly comparable. We understand, however, that the reviewer is likely referring to the apparent differences between Figures 3–6 (lower plasma volume, higher fiber density) and Figures 7–8 (higher plasma volume, lower apparent fiber density).

      The reduced number of visible fibers in the latter condition is not solely a consequence of plasma volume per se, but rather results from the formation of a labile fibrin gel at higher plasma concentrations, which is lost during the fixation and aspiration steps. This effect was initially observed across samples from two donors with differing plasma fibrinogen levels. In one case, an unusually low fibrinogen concentration allowed the addition of higher plasma volumes without inducing gel formation. In contrast, in the other sample, a more typical fibrinogen level resulted in gel formation under the same conditions.

      Importantly, we performed all experiments using matched donor plasma and platelets. As a result, the precise fibrinogen concentration could not be determined prior to experimentation. Nonetheless, post hoc measurements confirmed that fibrinogen levels in most donor samples fell within the normal physiological range, which allowed us to always use the same plasma volumes for low and high plasma concentrations (4 µl/ml PBS and 7 µl/ml PBS, respectively), except for one donor as mentioned above.

      (2) It is unclear how the classification of platelets as 'fiber-winding' versus 'fiber compaction' differs in Figure 2. The criteria used for these classifications should be stated. Further, it seems premature to characterize fibers as wound without having established this earlier in the manuscript.

      The reviewer probably refers to Figure 3 and is right: it is premature to mention fiber winding at this stage of the Results section (see our response to Reviewer #2). In the revised version, we will therefore present the criteria used to classify the different degrees of fiber accumulation without referring to fiber winding.

      (3) Is the 'gearwheel' different from the 'cage' of fibrin fibers? They appear similar, but it is difficult to distinguish between them with only qualitative descriptions of these phenotypes.

      The "gearwheel" is observed for completely spread platelets in the 2D fiber-retraction assay and a figure illustrating our hypothetical speculations to compare the 2D gearwheel with the 3D clot situation is presented in the discussion under the "Ideas and Speculations" paragraph (Fig. 13). We will give a more comprehensive explanation in the revised version.

      (4) The quantification of platelet extensions in Figure 9 is confusing. While those in 9A are clear, those in 9B are not. For instance, what is the difference between #7 and #8 in the middle panel of 9B? It does not seem like #8 is labeling an extension.

      For the platelet shown in the middle panel of Figure 9B, the extensions cannot be clearly distinguished in the MIP (Maximum Intensity Projection) image because extension #8 is positioned above extension #7 and is therefore superimposed in the projection. However, the two extensions can be differentiated when examining the 3D image stack (Video 4). As indicated in the figure legend, the number of extensions was determined manually by scrolling through the z-stack image sequence. In the revised version, we will also define the abbreviation “MIP” as Maximum Intensity Projection.

      (5) It is unclear what the modeling accomplishes, as there is no comparison between the results of these simulations and their experiments.

      We thank the reviewer for this valuable concern. We chose not to combine the experimental fibrin organization and the modeling results within the same figure panel, as the resulting image would be too complex and difficult to interpret. However, we will provide a more detailed comparison between the experimental observations and the modeling results in the Results section. It is also important to emphasize that the comparison between the model and the experimental data was intended to be primarily qualitative rather than quantitative.

      (6) The data presented in Figure 12 provides the most direct support for their mechanism, but falls short of directly testing their claims. These experiments should be repeated to include blebbistatin to test the contribution of myosin and include quantitative rather than qualitative comparisons of these experiments.

      As mentioned above, these live videos are difficult to acquire for the following reasons:

      Small platelet size

      Heterogeneity of platelets within the population (10 d half-life, old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time required to optimize imaging parameters necessitate the selection of an arbitrary acquisition window. Consequently, only a single acquisition of approximately 90 min can be performed per sample preparation, with no guarantee that relevant platelet-fibrin interactions will be captured within that window.

      Furthermore, after blood donation, the first sample is usually ready for acquisition around 3 pm, and each acquisition takes 90 min. At least 10 successful acquisitions per condition would be required to ensure statistical robustness, but at most 4 can be acquired per donor because platelet samples start to deteriorate within twelve hours of blood donation.

      Taken together, the intrinsic heterogeneity of the platelet population, the low likelihood of capturing informative events, and the limited availability of suitable imaging resources at our institute render a robust and quantitative comparison between conditions with and without blebbistatin extremely challenging, if not impractical, within a reasonable timeframe.
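
      The acquisition arithmetic described above can be sketched as follows. This is a minimal illustration, not a power calculation: the function name and the `success_rate` parameter are hypothetical, and only the figures of 10 acquisitions per condition, two conditions (with and without blebbistatin), and at most 4 acquisitions per donor come from the response.

```python
from math import ceil


def min_donors(acquisitions_per_condition, conditions, max_per_donor, success_rate=1.0):
    """Smallest number of donors needed to reach the target acquisition count.

    success_rate is a hypothetical fraction of acquisitions that capture a
    usable platelet-fibrin event; it is treated as an average, not a guarantee.
    """
    needed = acquisitions_per_condition * conditions
    usable_per_donor = max_per_donor * success_rate
    return ceil(needed / usable_per_donor)


# Figures from the response: 10 acquisitions per condition, 2 conditions,
# at most 4 acquisitions per donor.
best_case = min_donors(10, 2, 4)                       # every acquisition succeeds
half_usable = min_donors(10, 2, 4, success_rate=0.5)   # hypothetical 50% success

print(best_case, half_usable)
```

      Even in the best case, several independent donors would be needed, and any realistic success rate below 100% raises that number further.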

    1. Author response:

      eLife Assessment

      This valuable study reports that the ALDH-abundant cells display stem cell properties and may play a key role in the endometrial epithelial development in the mouse. The data supporting the main conclusion are solid, although further improvements are needed to strengthen the conclusions. This work will be of great interest to reproductive biologists and biomedical researchers working on women's reproductive health.

      We thank the reviewers and editor for their critical reading and assessment of our manuscript. We carefully considered each of the points raised by the reviewers. In this document and in the edited manuscript and figures, we have carefully addressed each of the comments and requested modifications. In light of these changes, we expect that you will find that the manuscript has improved.

      We indicate our responses to the reviewers below in blue font and highlight the changes in the manuscript using the line numbers corresponding to the tracked version of the revised document.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Tang et al. characterizes the expression dynamics and functional roles of aldehyde dehydrogenase 1 activity in uterine physiology. Using a combination of in vivo lineage tracing and cell ablation coupled with organoid culture, the authors propose that Aldh1a1 lineage-marked cells contribute to uterine gland development and epithelial regeneration. The descriptive data will be of interest to reproductive biologists and clinicians and will build on established hypotheses in the field. The manuscript is well written and scientifically sound; however, several experimental limitations and interpretation caveats should be addressed.

      We thank the reviewer for their comments and expert assessment of our paper.

      (1) The methods surrounding the passage number and duration of culture following sorting prior to transcriptomic profiling should be clarified in the figure legends. Related to this, the representative images in Figures 1D and 1E do not appear consistent with the quantification presented in Figures 1F-H and should be reconciled.

      Thank you for this comment. We have now clarified this in the Figure 1 legend as follows:

      Lines 1026-1029: “Organoid formation assay performed immediately after luminal epithelial cell isolation and by plating equal numbers of viable ALDH<sup>LO</sup> (D) and ALDH<sup>HI</sup> (E) epithelial cells. ALDH<sup>LO</sup> and ALDH<sup>HI</sup> organoids were cultured for two weeks and passaged once prior to the organoid formation assays and transcriptomic analyses.”

      Regarding the second comment, we recognize that the images we showed may not have been the most representative of our quantification. As such, we replaced them with the organoid images below so that they better reflect the quantification outlined in Figure 1F-H.

      (2) The conclusion that ALDH1A1+ cells are enriched in populations with stem cell characteristics relies primarily on transcriptomic analysis. Protein-level co-localization should be performed to strengthen this claim.

      We thank the reviewer for this comment. Unfortunately, the antibodies for many of these stem cell markers (such as LGR5, AXIN2, and SUSD2) are not well-suited for immunostaining. Others that have been proposed in human and are amenable to immunostaining are not suitable markers for mouse endometrial stem cells (such as CDH2). We hope that by showing that ALDH1A1 is expressed in patterns that are similar to the previously published stem cell markers LGR5 and AXIN2 (i.e., throughout the epithelium in the developing uterus and subsequently enriched in the tips of the endometrial glands of adult mice), along with transcriptomic studies, we can demonstrate its utility as a marker for mouse endometrial stem cells.

      (3) The overlap of 19 genes between the data set here and AXIN2 HI data is presented as evidence of shared stemness identity, but no statistical assessment of this overlap is provided. A hypergeometric test should be performed to determine whether this overlap is greater than expected by chance.

      Thank you for this suggestion. We have performed a hypergeometric test and determined that the number of genes shared between the two datasets is greater than expected by chance. We have updated the Results section to state the following:

      Lines 133-141: "We determined that the overlap between ALDH<sup>HI</sup> and Axin2<sup>+</sup> stemness marker genes was significantly greater than expected by chance for both upregulated (21/346 genes, 1.81-fold enrichment, p = 0.0067) and downregulated (19/674 genes, 1.67-fold enrichment, p = 0.021) gene sets (hypergeometric test, universe = 23,182 genes)."
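
      For readers who want to see how such an overlap test works, a minimal sketch follows. It is not the authors' analysis code. The universe size (23,182), the ALDH<sup>HI</sup> upregulated list size (346), and the overlap (21) are taken from the quoted text; the size of the Axin2<sup>+</sup> gene list is not reported here, so `AXIN2_UP` is a hypothetical placeholder and the printed values will not reproduce the reported statistics.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(overlap, universe, set_a, set_b):
    """P(X >= overlap) when set_a genes are drawn without replacement
    from a universe that contains set_b marked genes."""
    total = comb(universe, set_a)
    max_overlap = min(set_a, set_b)
    tail = sum(
        comb(set_b, k) * comb(universe - set_b, set_a - k)
        for k in range(overlap, max_overlap + 1)
    )
    # Exact rational arithmetic avoids overflow in the huge binomials.
    return float(Fraction(tail, total))


def fold_enrichment(overlap, universe, set_a, set_b):
    """Observed overlap divided by the overlap expected by chance."""
    expected = set_a * set_b / universe
    return overlap / expected


UNIVERSE = 23_182   # from the quoted text
ALDH_HI_UP = 346    # from the quoted text
OVERLAP_UP = 21     # from the quoted text
AXIN2_UP = 800      # hypothetical placeholder; not reported in the response

p_value = hypergeom_upper_tail(OVERLAP_UP, UNIVERSE, ALDH_HI_UP, AXIN2_UP)
enrichment = fold_enrichment(OVERLAP_UP, UNIVERSE, ALDH_HI_UP, AXIN2_UP)
print(f"fold enrichment = {enrichment:.2f}, p = {p_value:.4g}")
```

      The upper tail, P(X ≥ observed overlap), is the one-sided quantity that corresponds to asking whether the overlap is greater than expected by chance.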

      (4) The impact of tamoxifen injection on Aldh1a1 expression should be characterized in the neonatal uterus, as tamoxifen itself has known estrogenic activity that could confound interpretation of the lineage tracing results at early postnatal timepoints.

      Although we took measures to control for this possibility by using multiple time-points and models to trace the impact of Aldh1a1<sup>+</sup> cells in development and adulthood, we recognize the importance of this comment and acknowledge that this is a limitation in the design of our study. We have included the following text to the Discussion acknowledging this point:

      Lines 434-442: “Given the well-documented impacts of tamoxifen for lineage tracing studies, it is imperative to use doses of tamoxifen that minimize estrogenic impacts and off-target effects (Rios et al., 2016). This often requires administration at doses that will achieve maximal recombination of the desired gene, while ensuring that the potential deleterious impacts of tamoxifen are minimized (Chen et al., 2023; Pimeisl et al., 2013). The cre/ERT2 tamoxifen-inducible model is widely used to study uterine biology, where it serves as a useful tool to interrogate the spatiotemporal impact of key genes, either through inactivation or for lineage tracing. Despite its widely documented utility across many tissue types and developmental timepoints, the use of tamoxifen and its impacts on the endometrium remain a limitation of our study, which we tried to address by implementing multiple timepoints, doses, and orthogonal assays in our experimental design.”

      (4b) Related to this, while low-dose tamoxifen is shown to label individual cells within 24 hours of injection, the translation dynamics of the label following Cre-mediated recombination can require up to 72 hours. The presence of only a few labeled clones at PND8 but multiple separate clones per cross-section at later timepoints warrants discussion and may reflect labeling kinetics rather than clonal expansion.

      The reviewer raises an important point. We agree that the translation kinetics of the label following Cre-mediated recombination, which can require up to 72 hours, are a legitimate consideration for interpreting our data. We have addressed this by adding the following text to the Discussion:

      Lines 418-423: We hypothesized that the singly labeled cells observed from one day tracing experiments expanded in a clonal fashion during the various timepoints we measured. We note that the translation kinetics of the labeled cells following cre-mediated recombination may contribute to the limited labeling observed at PND8/PND15 and there is a potential for delayed labeling of cells between 24 and 72 hours of tamoxifen administration. However, the continuous increase in labeled cells at the subsequent timepoints favors our interpretation of clonal expansion as the primary explanation.

      (5) It would strengthen the in vivo ablation data to validate the degree of cell death following diphtheria toxin treatment directly. It is possible that a general decrease in cell number rather than specific loss of a stem cell population is responsible for the observed reduction in gland number and FOXA2 expression (Tongtong et al 2017).

      We agree that this is an important control to incorporate into our experimental design. To rule out this possibility, we performed immunohistochemistry of cleaved caspase 3 in the uterine tissues of DTR<sup>flox/flox</sup> and DTR<sup>flox/flox</sup>;Aldh1a1<sup>cre/ERT2</sup> mice 4 days after administration of diphtheria toxin. The results indicate similar levels of cleaved caspase 3 detection in both genotypes, suggesting that the decrease in FOXA2<sup>+</sup> cells is not due to non-specific cell death, but rather results from the ablation of ALDH1A1<sup>+</sup> cells. These data and the following text have been added to the manuscript:

      Lines 321-325: “We determined that the decrease in FOXA2<sup>+</sup> cells in the experimental mice was not the result of non-specific DT-mediated cell death, as similar levels of cleaved caspase 3-positive cells were detected in the DT-treated control ROSA26<sup>DTR/DTR</sup> and ROSA26<sup>DTR/DTR</sup>;Aldh1a1<sup>cre/ERT2/+</sup> mice 4 days post-diphtheria toxin administration (Figure S3G-H’).”

      (6) The lineage tracing data in the postpartum endometrium demonstrate that Aldh1a1-marked cells are present during regeneration, but it remains unclear whether these cells are preferentially activated or expanded in response to tissue injury. Coupling these studies with diphtheria toxin-mediated ablation during active regeneration would more directly test the proposed regenerative role of this population.

      This is a great point and one that we would be very interested in pursuing in follow-up studies. Regrettably, due to the long generation time and experimental procedures associated with these proposed studies, we are not able to include these experiments in the current manuscript. Thus, we have changed our wording and conclusions throughout the manuscript to be less definitive in terms of the role of Aldh1a1 in regeneration, since this will be the focus of future studies.

      The contribution of stromal Aldh1a1 lineage-positive cells is underexplored in the discussion, given the lineage tracing data showing stromal labeling across multiple timepoints and its potential relevance to mesenchymal-to-epithelial transition.

      Thank you for the suggestion. We have now expanded this section in the Discussion to include the following:

      Lines 497-505: We also found ALDH1A1<sup>+</sup> stromal cells were more prevalent when tracing began in adult mice. Other studies have shown that mesenchymal cells contribute to endometrial regeneration in the postpartum phase or after induced menses through a process of MET (Cousins et al., 2014; Kirkwood et al., 2022; Li et al., 2025). Similarly, lineage tracing studies have shown that MET is an active process and contributes to epithelial cell regeneration in the post-partum phase (Huang et al., 2012; Patterson et al., 2013). Although this is an area of active investigation in the field, with some contradicting reports, it is plausible to hypothesize that endometrial tissue has the capacity to undergo wound-healing and regeneration via several mechanisms (Ang et al., 2023; Ghosh et al., 2020). The process of MET in wound healing is widely documented in other organs, such as the kidney, liver and lung, where MET is associated with depletion of the resident epithelial cell pool (Bi et al., 2012; Niayesh-Mehr et al., 2024; Zeisberg et al., 2005).

      Finally, the word 'control' may overstate the functional evidence presented. 'Contribute' may be more accurate given the partial and context-dependent nature of the phenotypes observed.

      We agree with the reviewer’s point that the word 'control' may overstate the evidence that we provide in the manuscript. To reflect this, we have edited the manuscript title and text accordingly.

      Reviewer #2 (Public review):

      Tang et al. investigated the contribution of Aldh1a1+ cells, as putative stem/progenitor cells, to endometrial development, maintenance during the estrous cycle, and postpartum repair in mouse models. They employed in vitro organoid formation and in vivo lineage tracing models coupled with RNA-seq to test the stem-ness of Aldh1a1+ cells. They found that mouse endometrial cells with high ALDH activity (using the ALDEFLUOR assay) formed more and larger organoids and were enriched for stem/progenitor cell gene signatures. Similar results were shown using endometrial cells from a human patient sample. Epithelial ALDH1A1 expression was shown to be hormonally regulated, becoming more restricted to the glands, a putative epithelial stem cell niche, under estrogen stimulation. Using lineage-tracing initiated postnatally/prepubertally, Aldh1a1+ epithelial cells were shown to expand, contributing to both the luminal and glandular epithelium into adulthood, whereas adult initiation of labeling showed expansion of stromal Aldh1a1+ cells but not epithelial. Postnatal ablation of single-labeled Aldh1a1+ epithelial cells resulted in impaired gland development. Lastly, Aldh1a1-lineage traced cells (adult labeled) were present during postpartum endometrial repair as were epithelial/mesenchymal transitional cells.

      This study addresses an important area of research in the field of endometrial stem/progenitor cell biology. The authors are commended for their use of multiple complementary methods, including lineage tracing, DTR-mediated cell ablation, organoid assays, and RNA-seq in mouse and human models to assess the stem-like nature of Aldh1a1+ cells. The data support the stem/progenitor phenotype of Aldh1a1+ epithelial cells during endometrial development; however, there are noted discrepancies between organoid formation assays and lineage tracing experiments regarding the stemness of Aldh1a1+ epithelial cells in adults. Specifically, organoids were generated from adult cells and demonstrated in vitro stem cell activity; however, in vivo lineage-tracing of adult cells either during the estrous cycle or postpartum repair does not show expansion of Aldh1a1+ cells, suggesting they do not have stem/progenitor activity. Additionally, the stem-ness of epithelial vs stromal Aldh1a1+ cells is confounded in the study because epithelial cells were not purified for organoid experiments, epithelial cells were not exclusively lineage-traced as stromal cells were also labeled, and mesenchymal-epithelial transition was suggested to occur during postpartum repair. The following specific comments are presented to detail these concerns:

      We thank the reviewer for their critical reading of our manuscript and constructive comments.

      (1) The statement in the brief summary, "...critical for lifelong endometrial regeneration," is not supported by the data provided.

      We have edited the brief summary to exclude this statement; it now reads as follows:

      Lines 4-5: “We uncover ALDH1A1<sup>+</sup> cells as a group of hormone sensitive stem cells contributing to endometrial development and regeneration.”

      (2) ALDH1A1 is not restricted to the endometrial epithelium, and epithelial cells were not purified by flow cytometry for experiments in Figure 1. Figure 2 clearly shows the presence of mesenchymal cells, even using the described method for enriching for epithelial cells. Therefore, contaminating mesenchymal cells with high ALDH activity may confound the experimental results in Figure 1, either through promoting epithelial cell growth or through MET. The authors should provide clear evidence of epithelial purity in organoid experiments or that mesenchymal cells are not contained in the ALDHhi population. These comments also apply to the human organoid experiments in Figure 7.

      We thank the reviewer for raising this important point. Our group has been using the enzymatic method to routinely separate epithelial from stromal cell populations from the mouse uterus (see references dating back to 2015, PMID 26721398, 28324064, 34099644). In these experiments we typically obtain >98% purity in the epithelial and stromal cell compartments, respectively. We can directly observe this purity in the immunofluorescence images shown below, where mouse endometrial epithelial cells and stromal cells were enzymatically separated and immunostained with E-cadherin and vimentin antibodies to detect epithelial and mesenchymal cells in both cell preparations. The images show very few contaminating epithelial and stromal cells in either cell preparation. We have observed similar results when preparing epithelial and stromal cell preparation from the human endometrium, where the epithelial cell organoids display high purity with ~100% epithelial cell expression when we perform immunostaining.

      Author response image 1.

      Purity of mouse endometrial epithelial cells obtained via enzymatic and mechanical dissociation. A-B) Shows the epithelial (A) and stromal (B) cells plated on glass coverslips and immunostained with an epithelial cell marker (cytokeratin 8, red), a stromal cell marker (vimentin, green), and DAPI.

      Author response image 2.

      Human endometrial epithelial organoids were fixed and immunostained with cytokeratin 8 (green) and DAPI. The images are typical for our epithelial cell cultures and demonstrate that all epithelial cells are CK8-positive.

      (3) Lines 186-187: Susd2 was increased in EpSC clusters, yet this is a mesenchymal stem/progenitor marker in humans. The authors should discuss the implications of this.

      We thank the reviewer for highlighting this. We have now included the following in our Discussion to address this point:

      Lines 528-533: Clustering with this population of EpSCs were Susd2<sup>+</sup> cells, which are well-characterized mesenchymal progenitors that are enriched in the perivascular regions of the human endometrium (Darzi et al., 2016; Khanmohammadi et al., 2021). The presence of Susd2<sup>+</sup> cells, while unexpected in an epithelial stem cell niche, could indicate the presence of a transitional mesenchymal or perivascular cell that is differentiating into epithelium. Evidence for both mesenchymal and Nestin2<sup>+</sup> pericytes have been recently described in the mouse endometrial epithelium (Kirkwood et al., 2022; Li et al., 2025).

      (4) In Figure 5, RFP+ epithelial cells should be quantified as in previous figures to substantiate the statement in lines 279-280, "At PPD5, the proportion of RFP+ epithelial cells had expanded relative to PPD1 and PPD3 (Figure 5E-E')." Especially because in the low mag images (C-E), RFP+ epithelial cells appear to be most abundant at PPD1 and decrease at PPD3 and PPD5, suggesting that they may not be involved in endometrial regeneration/repair (contradicting the interpretation in line 285). Further, if there is in fact a decrease over postpartum repair, then regeneration should be removed from the title of the manuscript. RFP+ stromal cells should also be quantified.

      We appreciate the reviewer’s comment and agree that, as stated, the conclusion is not fully supported by the data. To address this comment, we have edited the Results so that they state the findings clearly and unambiguously:

      As requested, we quantified the number of RFP+ stromal and epithelial cells during the postpartum phase and noted that RFP+ cells were prominent in the stromal compartment of the endometrium. While RFP+ epithelial cells were also observed during these timepoints, they were less abundant than RFP+ stromal cells. Because the number of RFP+ cells did not significantly change over the postpartum phases in either the stromal or the epithelial compartment, we have modified our conclusion to state that ALDH1A1+ cells are transiently detected in the regenerating endometrium.

      Results:

      Lines 286-295: “By analyzing the uterine tissues near the placental detachment site, we observed that RFP positive cells were prominent in the endometrial stromal cells that were adjacent to the luminal epithelium (Figure 5C-C’, green arrows). RFP<sup>+</sup> cells were also observed in the stromal cells near the placental detachment sites at PPD1 and PPD3 (Figure 5D’-E’, red & blue arrows) and in limited luminal epithelial cells (Figure 5D”,E”). Quantification of RFP<sup>+</sup> cells throughout these postpartum phases indicated that ALDH1A1<sup>+</sup> stromal cells (360 ± 103, PPD1, n=3; 217 ± 107, PPD3, n=3; 254 ± 32, PPD5, n=4) were more frequent than ALDH1A1<sup>+</sup> epithelial cells (65 ± 65, PPD1, n=3; 20 ± 10, PPD3, n=3; 114.25 ± 39, PPD5, n=4) in the regenerating endometrium (Figure S4).”

      Discussion:

      Lines 513-521: “We also noted that a majority of ALDH1A1<sup>+</sup> cells were localized to the active areas of endometrial regeneration near the placental detachment sites at PPD1 with a pronounced expression in the sub-epithelial stromal cells. As regeneration progressed, we continued to observe ALDH1A1<sup>+</sup> cells in the stromal compartment within the placental detachment sites at PPD3 and PPD5, with a progressive, but not statistically significant, increase in ALDH1A1<sup>+</sup> epithelial cells. Collectively, our data demonstrate that ALDH1A1<sup>+</sup> lineage cells participate in the restoration of endometrial architecture and functional compartments in the postpartum phase, even if their direct contribution is transient. Future detailed and mechanistic studies will be necessary to fully characterize their role in this process and their long-term consequence in postpartum regeneration.”

      (5) For Figure 7F, it should be clearly stated in the main text that the results are from one patient sample and the data presented are experimental replicates, so as not to be confused with biological replicates (the same for Supplementary Figure S4). Were B and G in Figure 7 also from one patient?

      Thanks for pointing this out. We have edited the figure legends in the main text and supplemental figures to indicate this.

      Lines 337-338: “…main figures show representative results from one patient sample performed in technical replicates, with additional patient samples included in the supplement…”

      (6) Lines 425-427: "Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed a complete restriction of ALDH1A1 to the glandular crypts." In Figure 2 S' ALDH1A1+ cells are visible in the LE (the staining is lighter than in the GE but looks real), contradicting this statement.

      This is an important distinction. We have now edited this part of the manuscript to state:

      Lines 459-462: “Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed enriched ALDH1A1 in the glandular crypts with weak luminal epithelial staining, while the ovariectomized controls had strong ALDH1A1 expression throughout the luminal and glandular epithelium.”

      (7) Lines 466-467: "In cycling mice, we found sporadic cells that expressed both stromal and epithelial markers in the ALDHA1+ cells." These data are not presented.

      We apologize for the confusion; this sentence has been removed from the Discussion.

      (8) These data support the role of Aldh1a1+ cells in endometrial epithelial development, but conclusions about their role in repair/regeneration should be tempered as the data are much weaker here.

      We thank the reviewer for their overall assessment. To address this point, we have thoroughly edited the appropriate areas to temper the conclusions and ensure that they are strongly supported by our data. We have also edited the manuscript’s title to reflect this.

      Reviewer #3 (Public review):

      Summary:

      Tan et al demonstrated the importance of ALDH-high cells in the epithelial development in the mouse endometrium, and these cells displayed properties of stem cells.

      We thank the reviewer for their assessment of our manuscript.

      Strengths:

      The findings are solid, supported and validated through a combination of technical methods. I appreciated this combined use of mouse and human endometrial cells to strengthen the findings. Genomic results from a single-cell sequencing dataset were informative as they depicted the different stages of the estrus cycle during the regeneration process. Verification with immunostainings with various markers made it convincing for readers to visualize the cell's location, progression, and status at different timepoints. Utilizing human endometrial cells further demonstrated that the phenomenon observed in mice can be translated to humans.

      This work will greatly advance the understanding of endometrial regeneration for reproductive biologists.

      We thank the reviewer for their expert assessment and positive comments regarding our manuscript.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      Reference

      Ang, C.J., Skokan, T.D., and McKinley, K.L. (2023). Mechanisms of Regeneration and Fibrosis in the Endometrium. Annu Rev Cell Dev Biol 39, 197-221.

      Bi, W.R., Jin, C.X., Xu, G.T., and Yang, C.Q. (2012). Bone morphogenetic protein-7 regulates Snail signaling in carbon tetrachloride-induced fibrosis in the rat liver. Exp Ther Med 4, 1022-1026.

      Chen, M.Y., Zhao, F.L., Chu, W.L., Bai, M.R., and Zhang, D.M. (2023). A review of tamoxifen administration regimen optimization for Cre/loxp system in mouse bone study. Biomed Pharmacother 165, 115045.

      Cousins, F.L., Murray, A., Esnal, A., Gibson, D.A., Critchley, H.O., and Saunders, P.T. (2014). Evidence from a mouse model that epithelial cell migration and mesenchymal-epithelial transition contribute to rapid restoration of uterine tissue integrity during menstruation. PLoS One 9, e86378.

      Cousins, F.L., Pandoy, R., Jin, S., and Gargett, C.E. (2021). The Elusive Endometrial Epithelial Stem/Progenitor Cells. Front Cell Dev Biol 9, 640319.

      Darzi, S., Werkmeister, J.A., Deane, J.A., and Gargett, C.E. (2016). Identification and Characterization of Human Endometrial Mesenchymal Stem/Stromal Cells and Their Potential for Cellular Therapy. Stem Cells Transl Med 5, 1127-1132.

      Ghosh, A., Syed, S.M., Kumar, M., Carpenter, T.J., Teixeira, J.M., Houairia, N., Negi, S., and Tanwar, P.S. (2020). In Vivo Cell Fate Tracing Provides No Evidence for Mesenchymal to Epithelial Transition in Adult Fallopian Tube and Uterus. Cell Rep 31, 107631.

      Huang, C.C., Orvis, G.D., Wang, Y., and Behringer, R.R. (2012). Stromal-to-epithelial transition during postpartum endometrial regeneration. PLoS One 7, e44285.

      Khanmohammadi, M., Mukherjee, S., Darzi, S., Paul, K., Werkmeister, J.A., Cousins, F.L., and Gargett, C.E. (2021). Identification and characterisation of maternal perivascular SUSD2(+) placental mesenchymal stem/stromal cells. Cell Tissue Res 385, 803-815.

      Kirkwood, P.M., Gibson, D.A., Shaw, I., Dobie, R., Kelepouri, O., Henderson, N.C., and Saunders, P.T.K. (2022). Single-cell RNA sequencing and lineage tracing confirm mesenchyme to epithelial transformation (MET) contributes to repair of the endometrium at menstruation. Elife 11.

      Li, S.Y., Whiteside, S., Li, B., Sun, X., and DeFalco, T. (2025). Mesenchymal-to-epithelial transition of perivascular cells contributes to endometrial re-epithelialization. Nat Commun 16, 10174.

      Niayesh-Mehr, R., Kalantar, M., Bontempi, G., Montaldo, C., Ebrahimi, S., Allameh, A., Babaei, G., Seif, F., and Strippoli, R. (2024). The role of epithelial-mesenchymal transition in pulmonary fibrosis: lessons from idiopathic pulmonary fibrosis and COVID-19. Cell Commun Signal 22, 542.

      Patterson, A.L., Zhang, L., Arango, N.A., Teixeira, J., and Pru, J.K. (2013). Mesenchymal-to-epithelial transition contributes to endometrial regeneration following natural and artificial decidualization. Stem Cells Dev 22, 964-974.

      Pimeisl, I.M., Tanriver, Y., Daza, R.A., Vauti, F., Hevner, R.F., Arnold, H.H., and Arnold, S.J. (2013). Generation and characterization of a tamoxifen-inducible Eomes(CreER) mouse line. Genesis 51, 725-733.

      Rios, A.C., Fu, N.Y., Cursons, J., Lindeman, G.J., and Visvader, J.E. (2016). The complexities and caveats of lineage tracing in the mammary gland. Breast Cancer Res 18, 116.

      Seishima, R., Leung, C., Yada, S., Murad, K.B.A., Tan, L.T., Hajamohideen, A., Tan, S.H., Itoh, H., Murakami, K., Ishida, Y., et al. (2019). Neonatal Wnt-dependent Lgr5 positive stem cells are essential for uterine gland development. Nat Commun 10, 5378.

      Zeisberg, M., Shah, A.A., and Kalluri, R. (2005). Bone morphogenic protein-7 induces mesenchymal to epithelial transition in adult renal fibroblasts and facilitates regeneration of injured kidney. J Biol Chem 280, 8094-8100.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distributions of recruitment probabilities were not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). That said, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the complex interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
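
      The refractory-period criterion quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example spike times are hypothetical, and only the 1 ms limit and the 2% threshold come from the quoted Methods text.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.001):
    """Return the fraction of inter-spike intervals shorter than the refractory limit."""
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        # Fewer than two spikes: no intervals, so no violations can be counted.
        return 0.0
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(isis)


def passes_isolation_check(spike_times_s, refractory_s=0.001, max_fraction=0.02):
    """True if no more than max_fraction of intervals violate the refractory limit."""
    return refractory_violation_fraction(spike_times_s, refractory_s) <= max_fraction


# Hypothetical spike trains (seconds): one regular, one with a 0.5 ms interval.
clean_unit = [i * 0.01 for i in range(100)]
contaminated_unit = [0.0, 0.010, 0.0105, 0.020, 0.030]
print(passes_isolation_check(clean_unit), passes_isolation_check(contaminated_unit))
```

      A check of this kind flags contamination by other units but does not by itself detect a unit that has been split into two, which is presumably why the authors also inspected cross-correlograms manually.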

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then calculated the correlation between a unit’s waveform over either trials or bins in which at least 30 spikes were present. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”
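
      One way to picture the waveform-stability measure described above is the sketch below. It assumes that spike snippets are already grouped by trial (or by 50 ms stride bin) and that stability is summarized by pairwise Pearson correlations between group medians; the pairwise scheme, the function names, and the toy waveforms are assumptions, while the 30-spike minimum comes from the quoted text.

```python
import math
from statistics import median


def median_waveform(snippets):
    """Sample-by-sample median across equal-length spike snippets."""
    return [median(samples) for samples in zip(*snippets)]


def pearson(x, y):
    """Pearson correlation between two equal-length waveforms."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def waveform_stability(snippets_by_group, min_spikes=30):
    """Pairwise correlations between median waveforms of groups with enough spikes."""
    medians = [median_waveform(group) for group in snippets_by_group
               if len(group) >= min_spikes]
    return [pearson(a, b)
            for i, a in enumerate(medians)
            for b in medians[i + 1:]]


# Toy data: same shape at two amplitudes, plus a group too small to include.
template = [0.0, 1.0, -2.0, 1.0, 0.0]
groups = [[template] * 30, [[2 * v for v in template]] * 30, [template] * 5]
print(waveform_stability(groups))
```

      Because correlation ignores overall scaling, this summary is sensitive to changes in waveform shape but not to amplitude changes alone.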

      We have included a figure supplement to Figure 1 to highlight the effectiveness of our spike sorting.

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while signal-to-residual is often an excellent metric, it is not ideal for our high-resolution EMG signals since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting (only) the largest motor unit, since subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1 that demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. These common inputs may be reflected in sparsely active motor units being recruited in overlapping rather than different strides. Indeed, in the following text added to the Results, we report that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units is correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary across locomotor strides.”
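
      The comparison between empirical and conditional recruitment probability described above can be sketched as below. The per-stride recruitment vectors and function names are hypothetical; the sketch only computes the two probabilities for a pair of units and omits the Wilcoxon signed-rank test the authors applied across all pairs.

```python
def recruitment_probability(recruited):
    """Fraction of strides in which a unit fired at least once."""
    return sum(recruited) / len(recruited)


def conditional_recruitment_probability(target, other):
    """P(target recruited | other recruited), or None if other was never recruited."""
    target_when_other_active = [t for t, o in zip(target, other) if o]
    if not target_when_other_active:
        return None
    return sum(target_when_other_active) / len(target_when_other_active)


# Hypothetical stride-by-stride recruitment (1 = recruited) for two units
# recorded from the same muscle over the same eight strides.
unit_a = [1, 1, 0, 0, 1, 0, 0, 0]
unit_b = [1, 1, 0, 0, 0, 0, 0, 0]

p_a = recruitment_probability(unit_a)
p_a_given_b = conditional_recruitment_probability(unit_a, unit_b)
print(p_a, p_a_given_b)
```

      If recruitment were independent across units, the conditional probability would on average equal the empirical one; a systematic excess of the conditional value across pairs is the pattern the quoted passage reports.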

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated Author response image 1 to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e. that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure 5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4) Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
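
      The difference between the two rate measures discussed above can be illustrated with a short sketch. The Gaussian kernel width, time grid, and spike times are assumptions chosen for illustration (the response does not state the kernel parameters), and the function names are hypothetical.

```python
import math


def peak_smoothed_rate(spike_times_s, sigma_s=0.005, t_start=0.0, t_stop=0.2, dt=0.0005):
    """Peak (spikes/s) of a spike train convolved with a unit-area Gaussian kernel."""
    norm = 1.0 / (sigma_s * math.sqrt(2.0 * math.pi))
    n_steps = int(round((t_stop - t_start) / dt)) + 1
    peak = 0.0
    for i in range(n_steps):
        t = t_start + i * dt
        rate = sum(norm * math.exp(-0.5 * ((t - s) / sigma_s) ** 2)
                   for s in spike_times_s)
        peak = max(peak, rate)
    return peak


def peak_isi_rate(spike_times_s):
    """Peak instantaneous rate (1 / shortest ISI); None with fewer than two spikes."""
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:]) if later > earlier]
    if not isis:
        return None
    return 1.0 / min(isis)


# Hypothetical strides: no spikes, a single spike, and a three-spike burst.
for stride in ([], [0.05], [0.05, 0.06, 0.065]):
    print(len(stride), peak_smoothed_rate(stride), peak_isi_rate(stride))
```

      The smoothed measure returns a value for every stride, including those with zero or one spike, whereas the ISI-based measure is undefined for them. That is the practical reason the authors give for reporting the smoothed peak as their primary analysis.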

      (5) Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show relationships in unit activation with both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample size than the main analysis) do not reach statistical significance in all cases.

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistical recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons’ operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically” - as defined in our paper - refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically sound to compare the results of separate statistical tests that are based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follows is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate<sub>fast</sub>/rate<sub>slow</sub> of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)<sub>fast</sub>/p(recruitment)<sub>slow</sub> of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”
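
      The Reviewer's suggestion of bootstrapped error bars can be illustrated with a minimal sketch. It resamples per-unit fast/slow ratios with replacement and reports the spread of the resampled means. The function name `bootstrap_se` and the example ratios are hypothetical, and this is not necessarily how the SE values quoted above were obtained.

```python
import random
import statistics

def bootstrap_se(values, n_boot=1000, seed=0):
    """Bootstrap standard error of the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = []
    for _ in range(n_boot):
        # Resample the units with replacement and record the resampled mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))
    return statistics.stdev(boot_means)

# Hypothetical per-unit fast/slow ratios (illustrative numbers, not study data).
ratios = [1.05, 1.20, 0.95, 1.40, 1.10, 1.00, 1.30, 1.15]
se = bootstrap_se(ratios)
print(f"mean ratio = {statistics.fmean(ratios):.2f} +/- {se:.2f} (bootstrap SE)")
```

      Resampling at the level of motor units, as here, treats each unit as one observation; resampling strides within each unit would be a different design choice.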

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds is extremely hard to understand - after reading many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. Particularly, this allowed for multiple solutions to explain why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to have no recruitment of a unit. However, to create a model of motor unit activity that includes both recruitment and rate, it must be possible that a recruited unit can have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate along the following piecewise function:

      Eq. 1:

      Eq. 2:

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.”
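
      One common model that matches this verbal description is a zero-inflated Poisson, in which a zero arises either from non-recruitment (probability 1 - p) or from a recruited unit that happens to produce no spikes. The sketch below assumes that form and assumes integer-valued observations; whether it matches Eq. 1 and Eq. 2 exactly is not confirmed by the text. The fit uses expectation-maximization rather than a general-purpose optimizer, which reaches the same likelihood optimum for this model. All function names are hypothetical.

```python
import math
import random

def zip_nll(p, lam, counts):
    """Negative log-likelihood of a zero-inflated Poisson (assumed model form)."""
    nll = 0.0
    for y in counts:
        if y == 0:
            # A zero comes from non-recruitment OR recruitment with zero spikes.
            nll -= math.log((1.0 - p) + p * math.exp(-lam))
        else:
            nll -= math.log(p) - lam + y * math.log(lam) - math.lgamma(y + 1)
    return nll

def fit_zip(counts, n_iter=200):
    """Estimate (p, lam) by EM; each iteration does not increase zip_nll."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    p, lam = 0.5, max(total / n, 1e-6)
    for _ in range(n_iter):
        # E-step: probability that a zero-count stride was nonetheless recruited.
        z0 = p * math.exp(-lam) / ((1.0 - p) + p * math.exp(-lam))
        recruited = (n - n_zero) + z0 * n_zero
        # M-step: update recruitment probability and Poisson rate.
        p = recruited / n
        lam = total / recruited
    return p, lam

def sample_poisson(lam, rng):
    """Knuth's multiplication method for drawing a Poisson variate."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Simulated strides with known parameters (illustrative, not study data).
rng = random.Random(1)
true_p, true_lam = 0.6, 3.0
counts = [sample_poisson(true_lam, rng) if rng.random() < true_p else 0
          for _ in range(5000)]
p_hat, lam_hat = fit_zip(counts)
print(f"p_hat = {p_hat:.2f}, lam_hat = {lam_hat:.2f}")
```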

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (𝜆, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p<sub>fast</sub>-p<sub>slow</sub> and 𝜆<sub>fast</sub>-𝜆<sub>slow</sub>). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the null model parameters for each new group and found the difference between like terms. To estimate the distribution of possible differences, we bootstrapped this result using 1000 random redistributions of the pooled set of strides. Following the permutation test, the 95% confidence interval of this final distribution reflects the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the true difference in rate or recruitment terms exceeds this confidence interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
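
      The pooling-and-reshuffling procedure described above can be sketched as follows. For brevity, the statistic here is the fraction of strides with at least one spike, a simple stand-in for the fitted recruitment term; the function names and the example data are hypothetical.

```python
import random

def recruitment_fraction(spike_counts):
    """Fraction of strides with at least one spike (a simple stand-in statistic)."""
    return sum(1 for c in spike_counts if c > 0) / len(spike_counts)

def permutation_test(slow, fast, statistic, n_perm=1000, seed=0):
    """Return (observed difference, null 95% interval, reject flag)."""
    rng = random.Random(seed)
    observed = statistic(fast) - statistic(slow)
    pooled = list(slow) + list(fast)
    null_diffs = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Split the shuffled pool into groups matching the original sizes.
        perm_slow, perm_fast = pooled[:len(slow)], pooled[len(slow):]
        null_diffs.append(statistic(perm_fast) - statistic(perm_slow))
    null_diffs.sort()
    lo = null_diffs[int(0.025 * n_perm)]
    hi = null_diffs[int(0.975 * n_perm) - 1]
    # Reject the null if the observed difference lies outside the null interval.
    return observed, (lo, hi), not (lo <= observed <= hi)

# Hypothetical spike counts per stride in the slowest and fastest quartiles.
slow = [0] * 80 + [1] * 20
fast = [0] * 20 + [2] * 80
observed, interval, reject = permutation_test(slow, fast, recruitment_fraction)
print(observed, interval, reject)
```

      Swapping `recruitment_fraction` for a function that fits the joint model and returns p or λ would bring the sketch closer to the procedure described in the quoted Methods text.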

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point, and we agree that the claim made about the nervous system is too strong given the results. Even with the aggregate PSTH that the Reviewer suggested (Author response image 2), there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.
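
      As a rough illustration of how such an aggregate histogram can be built, the sketch below pools spike phases (expressed as a fraction of the stride cycle) across units and bins them. The function name, the bin count, and the example phases are hypothetical.

```python
def aggregate_psth(spike_phases_by_unit, n_bins=20):
    """Pool spike phases (0 <= phase <= 1 within the stride) across units and bin them."""
    counts = [0] * n_bins
    for phases in spike_phases_by_unit:
        for phase in phases:
            # Clamp so that a phase of exactly 1.0 falls in the last bin.
            idx = min(int(phase * n_bins), n_bins - 1)
            counts[idx] += 1
    return counts

# Two hypothetical units with spike phases within the stride cycle.
example_units = [[0.05, 0.15], [0.15, 0.95, 1.0]]
print(aggregate_psth(example_units, n_bins=10))
```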

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim on the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the dataset. As the Reviewer points out, stride frequency increases with increased speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), dissociating the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force would be difficult. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence, of works that showed that recruitment speed is the main determinant of the rate of force development (see for example Dideriksen et al. (2020) J Neurophysiol; J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion on the synaptic inputs from the data presented by the authors. The differences in firing rates may solely arise from other factors than distinct synaptic inputs, such as the different intrinsic properties of the motoneurons or the reception of distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To lower the strength of our claim, we replaced usage of the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring units had >98% of spikes outside of 1 ms of each other. Moreover, as described above, we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms across both the duration of individual recording sessions (roughly 30 minutes) and across the rapid changes in limb position within individual stride cycles (roughly 250 msec).
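
      A minimal sketch of this kind of isolation check is given below. It interprets the criterion as the fraction of inter-spike intervals longer than 1 ms, which is an assumption about how the >98% figure was computed; the function name and spike times are hypothetical.

```python
def fraction_isi_above(spike_times_s, min_isi_s=0.001):
    """Fraction of inter-spike intervals longer than min_isi_s (times in seconds)."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        # With fewer than two spikes there are no intervals to violate the criterion.
        return 1.0
    return sum(1 for isi in isis if isi > min_isi_s) / len(isis)

# Hypothetical spike times (seconds); the second interval is only 0.5 ms.
example_times = [0.0, 0.010, 0.0105, 0.030]
fraction = fraction_isi_above(example_times)
print(f"{fraction:.3f} of intervals exceed 1 ms")
```

      Under this reading, a unit would pass the check when the returned fraction exceeds 0.98.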

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis (Author response image 3), which shows bell-shaped distributions for some (but not all) distributions. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper, and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions especially Figure 6: The text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). FUNCTIONAL SIGNIFICANCE OF CELL SIZE IN SPINAL MOTONEURONS. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7(2007), 1–26. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. Author response:

      The following is the authors’ response to the current reviews.

      We are pleased that Reviewer 3 appreciated our findings and found the temporal lag between the expression of TFF1 and TFF3 during signaling particularly interesting. The reviewer also advised us not to overemphasize that this lag arises from phase separation of ERα at the TFF1 locus, as the use of 1,6-hexanediol alone is not sufficient to conclusively establish whether ERα condensates undergo liquid–liquid phase separation. We agree with this assessment and have revised the manuscript accordingly. Specifically, we have modified the title to remove reference to phase separation and have updated the text throughout the manuscript to avoid claiming that the observed condensates are a result of phase separation. The revised title is: “Ligand-dependent Enhancer Activation Indirectly Modulates Non-target Promoters in a Chromatin Domain.”

      With these changes, we are proceeding with the Version of Record using the revised version of the manuscript.

      ———

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns related to the approach, data presentation, and discussion, but I think they can be fixed with more effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      The deletion of enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. Authors should elaborate more on this as this is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion of the enhancer requirement for TFF1 expression in the revised manuscript (lines 368-385).

      In Fig. 1, TFF3 expression is shown to be induced upon E2 signaling through qRT-PCR, while smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, does GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig.1 should be adjusted to TFF3 expression to appreciate its expression changes.

      We have now included a browser shot of the TFF3 region showing the GRO-seq signal over the E2 time course (Fig. S1C). We observed increased transcription towards the 3’ end of the TFF3 gene body at 3h, which corroborates the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts are somewhat different. We speculate that biases in measurements that depend on PCR amplification could be larger for genes expressed at low levels, and that smFISH using intronic probes may be a more sensitive assay to detect such changes.

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snap shots in fixed cells, can authors comment on whether the same cell that expresses TFF1 at 1h, expresses TFF3 at 3h? Perhaps, the calculations taking total number of cells that express these genes at 1 and 3h would be useful.

      As the reviewer points out, since these are fixed cells, we cannot comment on the fate of the same cell at two time points. To address this limitation, future work could employ cells with endogenous tags for TFF1 and TFF3 and utilize live-cell imaging techniques. In a fixed-cell assay, as the reviewer suggests, it can be investigated whether a similar fraction of cells shows high TFF3 expression at 3h as shows high TFF1 expression at 1h. To quantify these fractions, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identify truly high-expressing cells by taking the mean plus one standard deviation (for single-cell-level data) at E2-1hr as the threshold for TFF1 (80 and above transcript counts) and the mean plus one standard deviation (for single-cell-level data) at E2-3hr as the threshold for TFF3 (36 and above transcript counts). The fraction with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We should note that if the transcript counts were normally distributed, a predetermined fraction would be expected to be above these thresholds, and comparable fractions could arise just from the underlying statistics. But in our experiments this is unlikely to be the case, given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion in the single-cell distributions. Of course, despite the fractions being comparable, we cannot be certain that it is the same set of cells that goes from high expression of TFF1 to high expression of TFF3, but that is definitely a possibility. We thank the reviewer for suggesting this comparison.

      Author response image 1.

      The graph represents the percent of cells that show high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The threshold was determined by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C). The mean and standard deviation over single-cell data were calculated, and the mean plus one standard deviation was used as the threshold for identifying high-expressing cells. For TFF1, which is maximally expressed at 1h, the threshold was 80. For TFF3, which is maximally expressed at 3h, the threshold was 36. The fractions of cells expressing above 80 (TFF1) and 36 (TFF3) were calculated from three different repeats. The mean of means and the standard deviations from the three experiments are plotted here.
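
      The thresholding described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the toy counts are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the response does not say which was used. The last helper gives the fraction expected above mean plus one standard deviation if the counts were normally distributed, which is the "predetermined fraction" mentioned in the response.

```python
import math
import statistics


def high_expression_threshold(counts):
    """Mean plus one standard deviation of per-cell transcript counts.

    Assumption: population SD (pstdev); the source does not specify.
    """
    return statistics.mean(counts) + statistics.pstdev(counts)


def percent_above(counts, threshold):
    """Percent of cells with a transcript count at or above the threshold."""
    return 100.0 * sum(c >= threshold for c in counts) / len(counts)


def normal_percent_above_one_sd():
    """Percent of a normal distribution lying above mean + 1 SD."""
    return 100.0 * 0.5 * math.erfc(1.0 / math.sqrt(2.0))


# Hypothetical, strongly skewed per-cell counts (not data from the study).
toy_counts = [0, 2, 3, 5, 5, 8, 10, 12, 20, 150]
toy_threshold = high_expression_threshold(toy_counts)
toy_percent = percent_above(toy_counts, toy_threshold)

print(f"threshold: {toy_threshold:.2f}")
print(f"percent of cells at or above threshold: {toy_percent:.1f}")
print(f"expected under normality: {normal_percent_above_one_sd():.2f}")
```

      In this toy example a single outlier inflates both the mean and the standard deviation, so the fraction of cells above the threshold departs from the value expected under normality. This is the kind of effect the response points to when arguing that comparable fractions do not follow trivially from the choice of threshold.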

      The authors conclude that TFF3 is not directly regulated by the enhancer or the estrogen receptor. Does ERα bind the TFF3 promoter?

      The ERα ChIP-seq performed at 1h and 3h of signaling suggests that the TFF3 promoter is not bound by ERα, as shown in supplementary Fig. 1B and S1B. However, one peak upstream of the TFF1 promoter is visible, and it is lost at 3h.

      Minor comments:

      The figures would benefit from resizing of the panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included it in the discussion in the revised manuscript (lines 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      A high level of ERalpha expression seems to diminish the transcriptional response. Thus, the results in Fig. 4 offer potential insight into ER-mediated transcription. Yet this observation is not pursued in great depth, for example with mutagenesis of ERalpha. However, this phenomenon - which falls under the general description of non-monotonic dose response - is treated at great depth in the literature (e.g. PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the discussion section (lines 400-413). We thank the reviewer for the insight on non-monotonic behavior.

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment, and apologize if they feel we misrepresented the argument from Cote et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are just representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting with a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was first verified by comparison to manual counts. Even in the 2D representations, the extragenic intronic signals show up at similar thresholds to the transcription sites.

      The signal does not arise from non-specific background labeling, for the following reasons:

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we analyzed the number of alleles firing (sites of transcription) at a given time in a cell under the three conditions. We counted the sites of transcription in a given cell and calculated the percentage of cells showing 1, 2, 3, 4 or >4 sites. The percentage of cells showing a single site of transcription for TFF1 is very high in uninduced cells and decreases at 1h. At 1h, the percentage of cells showing 2, 3 and 4 sites of transcription increases, and it goes down again at 3h (Author response image 2A). This agrees with the interpretation made from mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2, 3 and 4 sites of transcription increases slightly at 3hr compared to uninduced and 1hr (Author response image 2B). We can also see that several cells have no alleles firing at a given time, as quantified in the graphs on the right showing the total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • There is literature on post-transcriptional splicing of RNA beyond our work, which suggests that intronic signal can be found at relatively large distances from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear dispersed intronic signals can arise because more than one allele can fire at a time in different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only 1 allele is firing at E2-1 hour (Author response image 2C) or in uninduced cells (Author response image 2D). Furthermore, Cote et al. discuss that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intron signal dispersal can occur in the case of posttranscriptional splicing (Coté et al., 2023). Additionally, other studies have suggested that transcripts in cells do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint exists to ensure intron removal before release, so that splicing and release are kinetically uncoupled from each other (Coulon et al., 2014).
      Similarly, using live-cell imaging, it was shown that splicing is not always coupled with transcription, and that this could depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which results in delayed recognition) (Vargas et al., 2011). Drexler et al. showed that, as opposed to Drosophila transcripts, which are shorter, in mammalian cells splicing of the terminal intron can occur post-transcriptionally (Drexler et al., 2020). Using RNA polymerase II ChIP-Seq time-course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production (Honkela et al., 2015). This was attributed to the rapid completion of transcription on shorter genes, which can lead to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lapse between transcription and splicing for genes that are early responders during signaling (Zambrano et al., 2020). The presence of significant numbers of TFF1 nascent RNAs in the nucleus in our data corroborates the above observations.

      • Uniform intensities across many transcripts suggests these are true signal arising from RNA molecules which would not be the case for non-specific, background signal (Author response image 2E).

      • Splicing occurs in the nucleus, and intron-containing pre-transcripts should be nuclear localized. Thus, intronic signals should remain localized to the nucleus, unlike the mature mRNA, which translocates to the cytoplasm after processing, so that exonic signals can be found both in the nucleus and in the cytoplasm. In keeping with this, we observe no signal in the cytoplasm for the intronic probes: it remains localized within the nucleus, as expected and as can be seen in Author response image 2F, while exonic signals are observed in both compartments. This suggests to us that the signal comes from true pre-transcripts. There is no reason for non-specific background labelling to remain restricted to the nucleus.

      • We observe that the mean intronic label counts for both genes, TFF1 and TFF3, increase upon E2 induction compared to the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes are drastically reduced in the TFF1-enhancer-deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically upon induction and enhancer deletion suggests that the signal is not an artefact and arises from true nascent transcripts that are sensitive to stimulus or enhancer deletion.

      • We expect colocalization of intronic signal with exonic signals in the nucleus, while there can be exonic signals that do not colocalize with intronic, representing more mature mRNA. Indeed, we observe a clear colocalization between the intronic and exonic signals in the nucleus, while exonic signals can occur independent of intronic both in the nucleus and the cytoplasm. This clearly demonstrates that the intronic signals in our experiments are specific and not simply background labelling (Author response image 2G).

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus, away from the site of transcription, is not an artefact. We hope the reviewer will agree with us. These analyses have now been included in the manuscript as Supplementary Figure 6 and are described at lines 106-111, 201-204, 215-217 and 231-235. We thank the reviewer for raising this important point.
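
      The allele-firing tabulation described in the first bullet above can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the function name, the bin labels, and the toy input are hypothetical. It assumes that each cell has already been assigned an integer number of detected transcription sites, that percentages per bin are computed only over cells with at least one site, and that silent cells are reported separately, as stated in the caption of Author response image 2.

```python
from collections import Counter

BIN_LABELS = ["1", "2", "3", "4", ">4"]


def summarize_transcription_sites(sites_per_cell):
    """Summarize per-cell counts of active transcription sites.

    Returns (percent_by_bin, fraction_silent):
      percent_by_bin  - percent of active cells (>= 1 site) in each bin
      fraction_silent - fraction of all cells with zero sites
    """
    if not sites_per_cell:
        raise ValueError("need at least one cell")
    active = [n for n in sites_per_cell if n > 0]
    bins = Counter(">4" if n > 4 else str(n) for n in active)
    percent_by_bin = {
        label: (100.0 * bins[label] / len(active)) if active else 0.0
        for label in BIN_LABELS
    }
    fraction_silent = (len(sites_per_cell) - len(active)) / len(sites_per_cell)
    return percent_by_bin, fraction_silent


# Hypothetical site counts for eight cells (not data from the study).
toy_sites = [0, 0, 1, 1, 1, 2, 3, 5]
percent_by_bin, fraction_silent = summarize_transcription_sites(toy_sites)

for label in BIN_LABELS:
    print(f"{label} sites: {percent_by_bin[label]:.1f}% of active cells")
print(f"fraction of cells with no site: {fraction_silent:.2f}")
```

      Reporting silent cells separately keeps the per-bin percentages comparable across conditions even when the overall fraction of transcribing cells changes with induction.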

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF1 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted, and cells with no alleles firing were not included. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4 or greater than 4 sites of transcription for TFF3 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted, and cells with no alleles firing were not included. The graph in the middle shows the number of cells with 2, 3, 4 or greater than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. C. Images from a single molecule RNA FISH experiment showing transcripts for InTFF1 in cells induced for 1 hour with E2. The image shows that when a single allele of TFF1 is firing, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. D. Images from a single molecule RNA FISH experiment showing transcripts for InTFF1 in uninduced cells. The image shows that when a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. E. Line profiles through several transcripts in the nucleus show uniform and similar intensities, indicating that these are true signals. F. 60X representative images from a single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1 (top) and InTFF3 and ExTFF3 (bottom). The image shows that there is no intronic signal in the cytoplasm, while exonic signals can be found both in the nucleus and the cytoplasm. The scale bar is 5 microns. G. 60X representative images from a single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1. The image shows that all intronic signals are colocalized with exonic signals, but, as expected, not all exonic signals are colocalized with intronic signals; those that are not represent more mature mRNA. The scale bar is 5 microns.

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson's group are particularly insightful (Rodriguez et al., 2019). We have now included them in the discussion (lines 106-111 and 420-424), where we discuss the differences and similarities between our work and that of Larson's and Mancini's groups (Patange et al., 2022; Stossi et al., 2020).

      The 4C-Seq data from Oh et al. 2021 are fully consistent with our observations in Fig. 3: they also observed little to no interaction between TFF1e and TFF3p in WT cells, and TFF1e became engaged with TFF3p only upon TFF1p deletion. In agreement with this, we also observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show an interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, showing that when the primary promoter is not available the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells or at any time point of E2 signalling.

      In the opinion of this reviewer, there are few - if any - experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6 hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).  

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown, using an ImmunoFISH assay, that ERα forms condensates on the TFF1 chromatin region (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and random regions, clearly showing the appearance of the condensate at the TFF1 site. Further, deletion of TFF1e reduces the size of this condensate. Thus, we expect that these ERα condensates are characterized by higher-order interactions and become disrupted on treatment with 1,6-hexanediol. These condensates are sub-micron in size, as mentioned by the reviewer, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment that causes several irreversible changes to the chromatin, although we have tried to use it at a low concentration for a short period of time, and it has been used in several papers (Chen et al., 2023; Gamliel et al., 2022). The opposite patterns of TFF1 vs. TFF3 expression upon 1,6-hexanediol treatment suggest that there is specificity. Further, mutants of ERα (N-terminal IDR truncations) can be used to perturb condensates; however, the transcriptional response of these mutants is also altered due to perturbed recruitment of coactivators that recognize the N-terminus of ER, which makes it difficult to distinguish ERα functions from condensate formation.

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869), Article 7869. https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10), Article 10. https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides a valuable characterization of individual sarcomere's contractility and synchrony in spontaneously beating cardiomyocytes as a function of substrate stiffness. The authors, however, provide an incomplete explanation for the observed heterogeneous and stochastic dynamics, so that the work remains mainly descriptive. The work will be of interest to scientists working on muscle biophysics, nonlinear dynamics, and synchronization phenomena in biological systems.

      We appreciate the reviewer’s insightful comments. A detailed explanation of the described phenomena in the form of a theoretical model and simulations was not included in our manuscript, because we believed it would be most impactful to present a detailed quantitative statistical description of the experiments in one manuscript and then introduce the model, which we already had in preparation, in a separate manuscript to avoid diluting the overall message.

      However, following the reviewers’ advice, we have now included a comprehensive model in the revised manuscript. This model qualitatively and quantitatively explains the experimentally observed phenomena and introduces a novel class of coupled relaxation oscillators based on a non-monotonic force-velocity relationship of individual sarcomeres. We believe that this addition significantly strengthens the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors experimentally demonstrated the heterogeneous behavior of sarcomeres in cardiomyocytes and that a stochastic component exists in their contractile activity, which cancels out at the level of myofibrils.

      Strengths:

      The experiments and data analysis are robust and valid. With very good statistics and unbiased methods, they show cellular activity at the individual level and highlight the heterogeneity between biological networks. The similarity of the results to the study cited in [24] demonstrates the validity of the in vitro setup for answering these questions and the feasibility of such in-vitro systems to extend our knowledge of physiology.

      Weaknesses:

      Compared to the current literature ([24]), the study does not show a high degree of innovation. It mainly confirms what has been established in the past. The authors complemented the published experiments by developing an in vitro setup with stem cells and by changing the stiffness of the substrate to simulate pathological conditions. However, the experiments they performed do not allow them to explain more than the study in [24], and the conclusions of their study are based on interpretation and speculation about the possible mechanism underlying the observations.

      We thank the reviewer for contextualizing our work with the literature. We appreciate the comparison to the study by Kobirumaki-Shimozawa et al. which we cite prominently. They observed stochastically varying beating patterns of individual sarcomeres on a beat-to-beat basis. They propose that this arises from a "titin-based mechanism" operating stochastically, which they interpret as being fundamentally linked to sarcomere-length-dependent effects. This interpretation differs from our model. We feel that the inclusion of our comprehensive model in the revised manuscript will emphasize the significance and novelty of our findings. Our work proposes a distinct alternative mechanistic explanation for the observed stochasticity, grounded in the force-velocity relationship and intrinsic stochasticity, and presents additional novel dynamic phenomena (such as popping and high-frequency oscillations) not reported in the literature yet. We outline the key advancements of our study below:

      (1) Physiologically Relevant Human Model System: Our study utilizes human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs). Using a human cell model provides direct relevance for understanding human cardiac physiology and pathophysiology, overcoming limitations inherent in translating findings from rodent models. The hiPSC-CMs exhibit key physiological differences from the mouse ventricular myocytes observed in [24], most notably beating at a significantly lower frequency (~1 Hz or 60 bpm) compared to mice (~5-8 Hz or 300-500 bpm). This difference in timescale is critical as it allowed us to resolve complex intra-beat dynamics that may be different and also harder to observe in mouse cardiomyocytes.

      (2) Advanced Experimental Methodology and Resolution: We developed a novel assay incorporating our SarcAsM algorithm for high-throughput tracking and analysis of individual sarcomere dynamics. This approach gave us spatial resolution better than 20 nm at significantly higher sampling rates than previous studies, including Kobirumaki-Shimozawa et al. Furthermore, our high-throughput in vitro approach made it possible to analyze vastly larger datasets than, e.g., the study by Kobirumaki-Shimozawa et al. (which reports observations from fewer than 20 myofibrils, encompassing less than 200 sarcomeres in total). While we recognize that in-vivo tissue studies present unique experimental challenges, the substantially greater statistical power of our study is crucial for reliably characterizing the complex, stochastic dynamics we report. The enhanced resolution and statistical robustness are not merely incremental; they enable the detailed identification and analysis of heterogeneous behaviors that were previously inaccessible or could not be characterized with the same level of confidence.

      (3) Novel Observed Phenomena: Our high-resolution data reveals specific dynamic behaviors, such as sarcomere "popping" and high-frequency oscillations during contraction, which, to our knowledge, have not been previously reported or characterized in cardiomyocytes. The resolution limitations and the high beating frequency in mouse models may not have permitted the observation of these subtle, but potentially important phenomena.

      (4) Distinct Mechanistic Explanation and Model: Kobirumaki-Shimozawa et al. propose a qualitative model where sarcomere motion variability primarily arises from length-dependent activation. This view is essentially a static one, based on a long history of isometric skeletal muscle experiments, where time-dependent forces are not relevant. We argue that in highly dynamic cardiomyocytes this may not be the most useful approach. While we acknowledge length dependence can play a role, our integrated experimental-theoretical work proposes a different primary mechanism. Our model demonstrates that the observed stochastic heterogeneity and beat-to-beat variations, including the oscillatory motion and popping, can be quantitatively explained by dynamic instabilities arising from a non-monotonic force-velocity relationship of individual sarcomeres in conjunction with intrinsic sarcomere-level stochastic fluctuations. The model emphasizes the active, transient nature of force generation rather than solely assuming length dependence. Our model provides an alternative explanation for the observed dynamics, and a quantitative, mechanism-based understanding.

      Reviewer #2 (Public Review):

      Summary:

      Sarcomeres, the contractile units of skeletal and cardiac muscle, contract in a concerted fashion to power myofibril and thus muscle fiber contraction.

      Muscle fiber contraction depends on the stiffness of the elastic substrate of the cell, yet it is not known how this dependence emerges from the collective dynamics of sarcomeres. Here, the authors analyze the contraction time series of individual sarcomeres using live imaging of fluorescently labeled cardiomyocytes cultured on elastic substrates of different stiffness. They find that reduced collective contractility of muscle fibers on unphysiologically stiff substrates is partially explained by a lack of synchronization in the contraction of individual sarcomeres.

      This lack of synchronization is at least partially stochastic, consistent with the notion of a tug-of-war between sarcomeres on stiff substrates. A particular irregularity of sarcomere contraction cycles is 'popping', the extension of sarcomeres beyond their rest length. The statistics of 'popping' suggest that this is a purely random process.

      Strengths:

      This study thus marks an important shift of perspective from whole-cell analysis towards an understanding of the collective dynamics of coupled, stochastic sarcomeres.

      Weaknesses:

      Further insight into mechanisms could be provided by additional analyses and/or comparisons to mathematical models.

      We thank the reviewer for the feedback. We have enhanced the manuscript with a comprehensive dynamic model, which we also contrast with previously proposed models.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript of Haertter and coworkers studied the variation of length of a single sarcomere and the response of microfibrils made by sarcomeres of cardiomyocytes on soft gel substrates of varying stiffnesses.

      The measurements at the level of a single sarcomere are an important new result of this manuscript. They are done by combining the labeling of the sarcomeres z line using genetic manipulation and a sophisticated tracking program using machine learning. This single sarcomere analysis shows strong heterogeneities of the sarcomeres that can show fast oscillations not synchronized with the average behavior of the cell and what the authors call popping events which are large amplitude oscillations. Another important result is the fact that cardiomyocyte contractility decreases with the substrate stiffness although the properties of single sarcomeres do not seem to depend on substrate stiffness.

      The authors suggest that the cardiomyocyte cell behavior is dominated by sarcomere heterogeneity. They show that the heterogeneity between sarcomeres is stochastic and that the contribution of static heterogeneity (such as composition differences between sarcomeres) is small.

      Strengths:

      All the results are to my knowledge new and original and deserve attention.

      Weaknesses:

      However, I find the manuscript a bit frustrating because the authors only give very qualitative explanations of the phenomena that they observe. They mention that popping could be explained by a nonlinear force-velocity relation of the sarcomere leading to a rapid detachment of all motors. However, they do not explicitly provide a theoretical description. How would the popping depend on the parameters and in particular on the substrate stiffness? Would the popping statistics be affected by the stiffness? It is also not clear to me how the dependence on the soft gel stiffness of the cardiomyocyte cell can be explained by the stochasticity of the sarcomere properties. Can any of the results found by the authors be explained by existing theories of cardiomyocytes? The only one I know is that of Safran and coworkers.

      I also found the paper very difficult to read. The authors should perhaps reorganize the structure of the presentation in order to highlight what the new and important results are.

      We are grateful for this detailed and critical feedback. The observed phenomena (stochastic heterogeneity, popping, high-frequency oscillatory motion) can indeed be explained by a nonmonotonic force-velocity relation along with stochastic fluctuations of individual sarcomeres. At the time of initial submission of this manuscript, we already had a theoretical model in preparation, which both qualitatively and quantitatively explains the observed phenomena. As a result, we included certain interpretations preemptively, which caused some lack of clarity in the absence of the full model. We have now added the model to this manuscript, providing a mechanistic interpretation of our findings. The model is different from prior models in that it emphasizes time-dependent forces, typically disregarded in models built to understand isometric skeletal muscle experiments.

      We have shortened, streamlined and restructured our manuscript to improve the readability and accessibility of our study.

      Recommendations for the authors:

      There is a consensus among reviewers that the link between the stiffness dependence of the observed stochastic dynamics and the proposed tug-of-war mechanism is unclear. More quantitative support and discussion is required, possibly using theoretical modeling.

      We are grateful for the insightful and comprehensive feedback from both the editor and the reviewers. As suggested, we have now added a comprehensive model explaining the observed phenomena and presenting a new conceptual view on cardiac muscle dynamics.

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed an interesting question related to the dynamics of cardiac cells and their multiscale dynamics. They did a good job in terms of experimental design and data analysis. However, I fear that they do not contribute enough new information to the topic.

      The authors should refer to the study in [24] and explain better the difference between these two studies. Although the different approaches are quite obvious, it is not clear to me what additional insights they add to the problem. They conducted their experiments with different stiffnesses. However, the conclusions they draw from the study are based on speculation (e.g. about the behavior of myosin heads in relation to shortening and relaxation), while their data mainly confirm previous studies. They need to address more explicitly the novelty of their study.

      Novelty and Comparison with Previous Studies: We understand the concern about distinguishing our contribution from prior work, specifically Kobirumaki-Shimozawa et al., 2021.

      As detailed in our public response, these are the key advances:

      Use of a medically relevant human iPSC-CM model vs. mouse cardiomyocytes.

      Superior spatial and temporal resolution via our SarcAsM algorithm, revealing novel phenomena like popping and high-frequency oscillations not previously reported.

      Significantly greater statistical power due to our high-throughput in vitro assay.

      We added a distinct mechanistic explanation based on the dynamic force-velocity relationship and sarcomere-level stochasticity, contrasting with the static, deterministic titin/length-dependence focus of previous studies.

      Interpretation and Speculation: We acknowledge that without the explicit model, some interpretations in the initial submission appeared speculative. As noted in our public response, we had already started to develop a theoretical model explaining our observations at the time of submission, targeting a second follow-up publication. Including interpretations based on this unpublished model prematurely clearly caused confusion. We now include the full model in the revised manuscript.

      Integration of the Theoretical Model: We have now fully integrated the model into the revised manuscript. The model explicitly demonstrates how the non-monotonic force-velocity relationship of individual sarcomeres leads to dynamic instabilities around a critical force threshold. This instability along with stochasticity drives a 'tug-of-war' between coupled sarcomeres, generating complex emergent behaviors.

      Mechanistic Explanation Beyond Length-Dependence: Our model quantitatively reproduces all key experimental findings (stochastic heterogeneity, popping, oscillations) without relying on length-dependent activation effects. This strongly supports our conclusion that the active, transient dynamics of individual sarcomeres governed by the force-velocity relationship are fundamental drivers of these complex contractile patterns. We believe this provides a significant conceptual advance, highlighting a potentially underappreciated aspect of sarcomere dynamics. Previous models focused mostly on length-dependence, historically based on skeletal muscle fiber experiments that were often done under static, isometric conditions. We feel that the new model represents a substantial paradigm shift in understanding highly dynamic muscles such as heart muscle.

      We are confident that the inclusion of the model addresses the majority of the reviewer's concerns.

      Additional comments:

      The authors write of a tug-of-war competition between the sarcomeres, and I'm not sure what they mean by that. I would spend more words explaining this point, especially because it seems to be an important point to describe their results. Similarly, they talked about an all-or-nothing phenomenon when they described the elongation of sarcomeres. What do they mean by this?

      We have revised the manuscript where clarification was needed and now define the terms mentioned more explicitly.

      (1) "Tug-of-War": We used this term metaphorically to describe the mechanical competition between linearly coupled sarcomeres within a myofibril, especially when contracting against rigid external boundary conditions. While it is not a perfect analogy, the metaphor intuitively captures the inherent instability of this interaction: similar to how a team in a real tug-of-war might suddenly yield when one person tires and the rest of the team gets overloaded, rather than steadily losing ground, the dynamic instability arising from the non-monotonic force-velocity relationship (detailed in our model, lines 300ff) can cause individual sarcomeres to abruptly change state (e.g., shorten or rapidly lengthen) while under tension from their neighbors. We have removed the term from the title and now use it more sparingly within the manuscript to better reflect its role as an illustrative analogy.

      (2) "All-or-Nothing" Elongation (Popping): The term "popping" describes our experimental observation of sudden, rapid, and extensive elongation of individual sarcomeres. This typically occurs late in the contraction cycle during early relaxation, when overall force may be declining, but individual sarcomeres can still experience significant tension from their neighbors. We described this specific type of rapid elongation in the original manuscript as an "all-or-nothing" phenomenon because, typically, sarcomeres in these events yield rapidly and strongly overshoot their resting length without recovering in a given activation cycle. The speed of popping events is substantially higher than the speed of coordinated gradual shortening observed during systoles that is driven by bound myosin heads. This observation strongly suggests an instability-driven, avalanche-like unbinding of myosin heads from the actin filaments during these events.

      We agree that the term "all-or-nothing" is not precise, and we have removed it, as it is not essential for describing the observed "popping" dynamics.

      The authors claim that the popping frequency increases as a function of stiffness. However, Figure 4E does not really seem to be a common practice in terms of statistical significance. A better description could help to remove this doubt.

      We clarified the presentation of popping frequency data and its statistical interpretation.

      (1) Popping Frequency vs. Substrate Stiffness (previously Figure 4D, now Figure 3G):

      We first note that the dependence of popping frequency on substrate stiffness was presented in Figure 4D of the original manuscript, not in Figure 4E. In the revised, shortened manuscript it can now be found in Fig. 3G. Due to the large number of observations (N) in our dataset, the slight upward trend in popping frequency with increasing substrate stiffness shown in that panel does reach statistical significance using standard tests. For details, see the figure caption.

      (2) Popping Frequency vs. Sarcomere Resting Length (previously Figure 4E, now Figure 3H):

      The original Figure 4E (now Figure 3H) addresses the relationship between popping frequency and the individual sarcomere's resting length. To generate this plot, we binned sarcomeres based on their measured resting length (in intervals of 0.02 µm) and calculated the mean popping frequency within each bin across all conditions. We have now clarified this in the figure caption.
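The binning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our analysis code: the function name, the two input lists (one resting length and one popping frequency per sarcomere), and the convention of labelling each bin by its lower edge are hypothetical; only the 0.02 µm bin width is taken from the text.

```python
import math
from collections import defaultdict


def mean_popping_frequency_by_length(rest_lengths_um, popping_freqs, bin_width_um=0.02):
    """Group sarcomeres into resting-length bins and average their popping frequency.

    rest_lengths_um: resting length of each sarcomere in micrometres (hypothetical input).
    popping_freqs:   popping frequency of each sarcomere, between 0 and 1 (hypothetical input).
    Returns a dict {bin lower edge in micrometres: mean popping frequency in that bin}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length, freq in zip(rest_lengths_um, popping_freqs):
        # Integer bin index; the small epsilon guards against rounding at bin edges.
        idx = math.floor(length / bin_width_um + 1e-9)
        sums[idx] += freq
        counts[idx] += 1
    # Label each bin by its lower edge, rounded to suppress floating-point noise.
    return {round(idx * bin_width_um, 4): sums[idx] / counts[idx] for idx in sorted(counts)}
```

Pooling all conditions before binning, as in this sketch, mirrors the "across all conditions" wording above; a per-condition version would simply call the function once per condition.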

      (3) Interpretation of Length Dependence:

      While Figure 3H clearly shows that longer sarcomeres are more prone to popping, we argue this is likely a modulating factor rather than the sole underlying cause. Two key observations support this interpretation:

      Even very short sarcomeres (e.g., < 1.65 µm resting length) exhibit a non-zero popping frequency (around 5-10%), indicating that popping is not exclusive to long sarcomeres.

      The distribution of resting lengths, now added to the graph, is narrower than the wide range (1.6-2.0 µm) plotted in Figure 3H. Popping still occurs stochastically within a myofibril whose sarcomeres have relatively similar resting lengths.

      Therefore, while length clearly influences the probability of popping, the phenomenon itself appears to be fundamentally stochastic, occurring across a range of lengths. This is consistent with our model, in which dynamic instabilities (driven by the non-monotonic force-velocity relationship) and stochastic fluctuations are the primary triggers, while length affects the probability of occurrence.

      Changes in Manuscript:

      We have revised the text associated with Figures 3G and 3H to clarify the distinction between stiffness and length dependence.

      We have added a statement in the Methods section and figure legends (e.g., Legend for Fig 3) explaining our approach to statistical analysis and interpretation for large datasets where standard p-values may be less informative.

      We believe these clarifications directly address the reviewer's concerns about the data presentation and interpretation in Figure 3.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting study, which however could and should be extended, see below. The current manuscript contains much less information than its length suggests; its figures contain partially redundant data.

      Taking into account this critical feedback, we have restructured, streamlined and shortened the manuscript to improve readability and accessibility.

      (1) How regular are the cellular contraction cycles?

      Have the authors computed a coefficient of variation of cycle durations?

      Does this regularity depend on substrate stiffness?

      We have substantially improved the detection accuracy of contraction intervals compared to our initial submission (for details, see the SarcAsM companion paper, https://www.biorxiv.org/content/10.1101/2025.04.29.650605v1). We calculated the beating rate variability (defined as the standard deviation of cycle durations) and found a low variability, on average less than 0.05 s, across the tested conditions. The distribution of this variability is positively skewed, with the majority of values clustering near zero. We have added new panels showing these results to Fig. S2B.
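For concreteness, the variability measure can be sketched as below, together with the coefficient of variation the reviewer asked about. This is an illustration only: the function name and the input (a list of cycle start times in seconds) are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
import statistics


def cycle_variability(cycle_starts_s):
    """Return (standard deviation, coefficient of variation) of contraction cycle durations.

    cycle_starts_s: start time of each detected contraction cycle in seconds
                    (hypothetical input; at least three starts are required).
    """
    # A cycle duration is the time between two consecutive cycle starts.
    durations = [later - earlier for earlier, later in zip(cycle_starts_s, cycle_starts_s[1:])]
    sd = statistics.stdev(durations)      # sample standard deviation (assumption)
    mean = statistics.fmean(durations)
    return sd, sd / mean                  # the coefficient of variation is dimensionless
```

The standard deviation has units of seconds, matching the value quoted above, whereas the coefficient of variation normalizes by the mean cycle duration and is therefore comparable across cells with different beating rates.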

      (2) Which experiments could the authors perform to identify the origin of the apparent 3-Hz oscillations?

      Would these oscillations persist even if the cardiomyocytes would not beat?

      We now address these questions in the revised manuscript.

      (1) Active Nature: The ~3 Hz oscillations are clearly linked to active contraction. They are absent in quiescent, non-beating cardiomyocytes observed under identical conditions, confirming that they are not passive fluctuations or baseline cellular tremors.

      (2) Signal Fidelity: We are confident these are genuine physiological events, not artifacts. Our high temporal resolution (~15 ms frame time) and tracking accuracy (< 20 nm) allow reliable detection because events are well above system noise. This is now explained in the revised manuscript.

      (3) Can the authors augment their study by modeling?

      For example, could the experimental data be fitted by a Kuramoto-type model of the form d phi_i / dt = eps*sin( Omega - phi_i ) + lambda*sin( phi_i - phi_i+1 ) + xi_i, combining phase-locking of sarcomere oscillations with phase phi_i to intracellular calcium oscillations with phase Omega, and anti-phase synchronization between neighboring sarcomeres, as well as noise xi?

      If yes, how would the coupling strength depend on substrate stiffness?

      We have now added a model. While a Kuramoto-type phase model is powerful for studying synchronization, we determined that a more mechanistic approach was required. Crucially, sarcomeres are mechanically coupled in series within a myofibril, and this direct physical linkage is not well-represented by the abstract, phase-based coupling of a Kuramoto model.

      Instead, our model comprises serially coupled sarcomeres, each governed by an underdamped Langevin equation. This framework allowed us to infer the force-velocity relation directly from our experimental data, without any prior assumptions, revealing a critical non-monotonic characteristic. As we now emphasize in the revised manuscript, this behavior is mathematically equivalent to a van der Pol relaxation oscillator, which reflects the instability-driven nature of the system.

      Furthermore, and in line with the reviewer's suggestion, our model incorporates a stochastic noise term which we found essential for reproducing the observed phenomena. Without this noise term, the characteristic sarcomere dynamics do not emerge (Fig. 5).
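To illustrate the general idea, the sketch below integrates a single noisy oscillator with a non-monotonic force-velocity term. It is explicitly not the model of the manuscript: it uses the textbook Rayleigh form x'' = mu (v - v^3/3) - x + noise for one uncoupled unit, whose velocity obeys the van der Pol equation in the noise-free case. The function name, all parameter values, and the cubic force-velocity curve are assumptions chosen for illustration, not quantities inferred from data.

```python
import math
import random


def simulate_rayleigh(mu=1.0, sigma=0.0, dt=1e-3, n_steps=50_000, x0=0.1, v0=0.0, seed=0):
    """Integrate x'' = mu*(v - v**3/3) - x + sigma*noise with a semi-implicit Euler-Maruyama scheme.

    The term mu*(v - v**3/3) is a non-monotonic function of velocity: it drives motion
    at low speed and damps it at high speed. Returns (positions, velocities).
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x, v = x0, v0
    xs, vs = [], []
    for _ in range(n_steps):
        # Deterministic acceleration: non-monotonic velocity term plus linear restoring force.
        accel = mu * (v - v ** 3 / 3.0) - x
        # Noise enters the velocity update with amplitude sigma*sqrt(dt).
        v += accel * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        # The position update uses the freshly updated velocity (semi-implicit step).
        x += v * dt
        xs.append(x)
        vs.append(v)
    return xs, vs
```

With `sigma=0.0` the trajectory settles onto a limit cycle regardless of the small initial displacement; setting `sigma` above zero perturbs the timing and amplitude of individual cycles, which is the qualitative role the noise term plays in the discussion above.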

      (4) What is the maximally extended length of titin, and how does this length correspond to the maximal length of popping sarcomeres?

      The force-extension curves of titin have been measured in single-molecule experiments (and the packing density of titin is known) - can the authors use this information to infer the forces acting inside sarcomeres?

      We thank the reviewer for this thoughtful question. While sarcomere length during popping can be measured, inferring the corresponding intra-sarcomeric force is not straightforward in a living, contracting cardiomyocyte. The relationship between extension and force is complex and dynamic, involving multiple molecular components.

      Our data show elongations up to 0.5 μm during popping events. While this magnitude is plausibly within the extensibility range of titin and other mechanically relevant components (Caporizzo & Prosser, 2021; Loescher & Linke, 2023), directly inferring force from this observation is challenging. In such a multi-component system with both active and passive elements, total force comprises several factors that cannot be disentangled from a simple length measurement alone. First, the system is dominated by active, velocity-dependent force generation of cross-bridges, which our model shows is non-monotonic. Second, titin exhibits a restoring force that is strongly strain-rate dependent (Rief et al., 1997), critical during rapid elongation. Third, viscous drag forces within the sarcomere are also highly strain-rate dependent, contributing significantly during rapid length changes. Fourth, other structural elements such as microtubules and intermediate filaments contribute to viscoelastic properties, particularly at high strains (Caporizzo & Prosser, 2021). This complex interplay makes it impossible to map a given sarcomere length to a unique force value using single-molecule titin data alone.

      (5) I urge the authors to make their raw data openly available.

      We agree on the importance of data availability. While the complete raw imaging dataset is several hundred gigabytes and thus impractical to deposit, we have uploaded a comprehensive dataset to Zenodo to ensure full reproducibility. This repository includes a representative subset of raw imaging data (50 cells per condition), with corresponding sarcomere motion data provided in a readable JSON format. Crucially, the deposition also contains the complete aggregated data underlying all figures and statistical analyses presented in the manuscript. All provided data can be programmatically accessed and analyzed using our `SarcAsM` Python API. The data can be accessed at: https://doi.org/10.5281/zenodo.17564384.

      Minor

      (1) How did the authors determine the start and end of contraction cycles when analyzing their data?

      The start and end points of each contraction cycle were identified using ContractionNet, a custom convolutional neural network we developed for this purpose. This method, used for all analyses in the revised manuscript, detects contraction intervals with high accuracy directly from sarcomere dynamics time-series data and significantly outperforms the threshold-based approach used previously. The complete methodology, algorithm description, and validation of ContractionNet are detailed in our companion paper on the SarcAsM analysis software (www.biorxiv.org/content/10.1101/2025.04.29.650605v1, see Fig. S6).

      (2) What are the measurement errors in determining Delta_SL?

      The measurement error for the Z-band trajectories is approximately 17 nm. This high tracking accuracy is achieved with our deep-learning-based Z-band segmentation approach, which employs a 3D convolutional neural network (3D U-Net) to leverage both spatial and temporal context for robust Z-band segmentation in noisy, high-speed recordings. A full description of this validation is available in our SarcAsM companion paper (see Figure S3 therein).

      (3) Does popping occur while other sarcomeres are still contracting?

      This is an important point. Yes, popping frequently occurs while other sarcomeres within the same myofibril are still actively shortening. This simultaneity is clearly visualized in the newly added Movie M1, which displays a phase-space plot (velocity vs. length change relative to rest) for all tracked sarcomeres over time. In this visualization, popping events appear as trajectories moving into the top-right quadrant (rapid elongation), while concurrently, other sarcomeres are represented by points in the left quadrants (negative velocity), indicating ongoing shortening. We have included Movie M1 as supplementary material.

      (4) The authors argue that their data on popping sarcomeres is consistent with homogeneous popping probabilities.

      (5) Can the authors assess in simulations how dispersed the popping probabilities of individual sarcomeres could be before they would notice a statistically significant difference to the homogeneous case?

      This question touches on a key challenge in analyzing these complex dynamics. A direct statistical test of popping probability for each individual sarcomere is not feasible, as the number of events per sarcomere over our observation time is too low for robust single-unit analysis. Consequently, our approach relies on testing the cumulative distributions of inter-event spatial distances and temporal gaps across all sarcomeres within a given region (LOI).

      In nearly half of the analyzed LOIs, these cumulative distributions were statistically indistinguishable (p > 0.05) from the geometric distribution expected for a single, homogeneous stochastic process. This provides strong support for our primary conclusion that popping is fundamentally a random phenomenon.

      For the cases that deviate from the homogeneous model, we argue that this does not refute the underlying stochasticity of the events. Instead, we propose this is the expected statistical signature of pooling data from a population of sarcomeres that have slight, intrinsic variations in their individual popping probabilities due to factors like resting length or structural integrity. Even if each sarcomere's popping is a locally random event, a cumulative test performed on a population with varied baseline probabilities is expected to detect a deviation from a simple, homogeneous model.
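The logic of the comparison with a geometric distribution can be sketched as follows. If every sarcomere pops independently with the same probability p, the gap k between consecutive events follows P(k) = (1 - p)^(k - 1) p. The helper names and the 0/1 event encoding below are hypothetical, and the sketch only tabulates empirical against expected gap frequencies; it does not reproduce the statistical test used in the manuscript.

```python
def inter_event_gaps(events):
    """Gaps between consecutive events in a 0/1 sequence (frames or sarcomere positions)."""
    positions = [i for i, flag in enumerate(events) if flag]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def geometric_pmf(k, p):
    """Probability that the next event occurs exactly k steps after the previous one."""
    return (1.0 - p) ** (k - 1) * p


def empirical_vs_geometric(gaps, p):
    """Rows of (gap, observed fraction, geometric expectation) for gaps 1..max(gaps)."""
    n = len(gaps)
    return [(k, gaps.count(k) / n, geometric_pmf(k, p)) for k in range(1, max(gaps) + 1)]
```

Pooling gaps from units with different individual values of p produces a mixture of geometric distributions, which is in general not itself geometric; this is the statistical signature referred to above for the LOIs that deviate from the homogeneous model.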

      Regarding the requested simulation study: While we agree this would be methodologically informative, the sensitivity to detect probability dispersion depends on multiple interacting factors (number of sarcomeres per LOI, observation time, event rates, and the assumed form of heterogeneity). Any single simulation scenario would therefore be highly model-dependent and of limited generality. Rather than introducing additional assumptions, we base our conclusions on the observed agreement with the homogeneous model in approximately half of LOIs and the correlation of deviations with measurable properties (Fig. 4E). A comprehensive statistical analysis would constitute a substantial methodological study beyond the scope of this mechanistically focused manuscript.

      (6) Can the authors measure sarcomere rest length and check if this rest length is correlated with the popping probability of individual sarcomeres?

      Yes, we performed this analysis. As shown in Figure 3H (previously Fig. 4E), we found a positive correlation between sarcomere resting length and popping frequency, confirming that longer sarcomeres have a higher probability of popping.

      Importantly, however, the popping probability remains non-zero even for shorter sarcomeres. As detailed in our response to Reviewer #1 regarding this figure, we interpret resting length as a significant modulating factor that influences popping probability, rather than the sole determinant of the phenomenon.

      (7) Several mathematical models of sarcomere contraction exist (e.g., crossbridge models).

      (8) Could the authors perform computer simulations of several such stochastic sarcomere models coupled in series?

      Alternatively, could the authors discuss this?

      As I understand, references 16-18 model myofibril contraction assuming static variability of sarcomeres, but do not account for stochasticity in the contractility of individual sarcomeres.

      We thank the reviewer for this excellent suggestion. We have performed such simulations, and the theoretical model is a central component of our revised manuscript (new Figures 4 and 5; manuscript lines 316ff).

      As the reviewer points out, previous models (e.g., refs 12 and 14 in our manuscript) have often relied on predefined static variability between sarcomeres to explain heterogeneous behavior. Our work takes a fundamentally different approach. We model the myofibril as a chain of serially coupled sarcomeres, where the dynamics of each unit are governed by an underdamped Langevin equation. This formulation inherently incorporates stochasticity and describes the interplay between a non-monotonic, velocity-dependent active force, a length-dependent passive force, and the mechanical coupling to its neighbors.

      Crucially, the model parameters were not assumed, but were instead inferred by fitting the model directly to our experimental data using a gradient-free optimization algorithm. This data-driven stochastic model was sufficient to quantitatively reproduce key observed phenomena, including high-frequency oscillations and popping events. Our central finding is that these complex behaviors emerge naturally from the coupled system, driven by the non-monotonic force-velocity relationship and intrinsic stochastic fluctuations. This demonstrates that predefined static heterogeneity is not required to explain the observed dynamics.

      (9) The manuscript could be shortened (e.g., lines 52-56 in the introduction provide little extra value).

      We have significantly revised the entire manuscript to improve clarity and readability. We have removed sentences in the introduction as suggested and substantially restructured major sections. One of the main reasons for this was the integration of our theoretical model, which was originally prepared as a separate manuscript. This required us to completely reframe the introduction and reorganize the figures and results.

      We are confident that these extensive changes have resulted in a stronger, more concise and impactful paper that now integrates our experimental findings with a theoretical model.

      (10) Figure 2 is overloaded with data. Several panels could be moved to the SM without compromising the key message.

      Introducing the notation in panels Figures 2A-C does not seem ideal to me; maybe add a cartoon?

      We agree that Fig. 2 was dense. We have redesigned panels A-F to improve clarity and better guide the reader. We now use a consistent color-coding scheme to link the extrema in the phase portraits (A-C) to the corresponding distributions of individual sarcomeres (E-G). We have also revised the accompanying text to make the figure's logic more transparent.

      We have considered moving panels A-C to the supplementary materials. However, we believe their placement in the main text is crucial for two reasons:

      (1) Revealing Core Dynamics: The length-velocity phase portrait is the first visualization that reveals the underlying near-oscillatory dynamics of individual sarcomeres. This was not an assumed behavior but a critical experimental observation that directly motivated our entire theoretical modeling effort. We now also provide animated versions of these plots (Movies X-Y) to further illustrate these complex dynamics.

      (2) Enabling Model-Experiment Comparison: A phase portrait is a standard tool for comparing experimental data with theoretical models. Retaining it in the main text allows us to directly compare data and model in our new Figures 4 and 5, providing a clear validation of our model.

      (11) Similarly, Figures 4F, G, and H seem dispensable to me.

      (I also wonder how clear the analogy of a coin flip is if a biased coin with probabilities p and 1-p needs to be used.)

      We agree that the previous Figure 4F, which served a purely illustrative purpose, was dispensable and have removed it. The "coin flip" analogy was potentially confusing and we have removed it.

      As part of a broader restructuring of the manuscript, the quantitative analyses from the original Figures 4G and 4H are now presented as Figures 3I and 3J. They provide important supporting evidence for the stochastic nature of the resulting popping events. We believe retaining this quantitative analysis is valuable, and we hope that by streamlining the figure and removing the analogy, we have addressed the reviewer's concerns.

      (12) Equation (1) is unnecessarily complicated. The same holds for Equation (2).

      It might make sense to separate definitions for serial and mutual correlations.

      (This would also simplify the axes labels in Figure 3C.)

      (13) The notation used in Equation (1) is not fully clear.

      I assume t denotes a unit-less time index and T is the unit-less duration of a contraction cycle, measured in multiples of a fixed time interval?

      Regarding comments (12) and (13):

      We thank the reviewer for these helpful suggestions. In response to comment (12), we have separated the definitions for the mutual (r<sub>m</sub>) and serial (r<sub>s</sub>) correlation coefficients, presenting them as distinct calculations rather than as special cases of a single, more complex formula. This makes their definitions more direct and explicit. The calculation for the serial correlation coefficient has also been streamlined into a concise inline definition.

      In response to comment (13), we have clarified the notation in Equation (1). In the manuscript text (lines 208f), we now explicitly state that 𝑡 represents the discrete, unitless time index (i.e., the frame number) within a time-series, and 𝑇 is the total number of frames (i.e., the total duration in frames) of a given contraction cycle.

      While Equation (1) itself is the standard definition for the uncentered correlation coefficient and cannot be algebraically simplified, we have added text to specify this and justify its use. This metric (equivalent to cosine similarity) is appropriate for our analysis as it assesses the similarity in the shape of motion patterns, independent of their mean values.
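
      To illustrate the metric named above, the following is a minimal sketch of an uncentered correlation coefficient (cosine similarity) between two time series indexed by frame. The function name and the example traces are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
import math

def uncentered_correlation(x, y):
    """Uncentered correlation (cosine similarity) of two equal-length series."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("series must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for an all-zero series")
    return dot / (norm_x * norm_y)

# Hypothetical length-change traces over T = 5 frames (illustrative values only).
trace_a = [0.0, 0.2, 0.5, 0.2, 0.0]
trace_b = [0.0, 0.4, 1.0, 0.4, 0.0]     # same shape, twice the amplitude
trace_c = [0.0, -0.2, -0.5, -0.2, 0.0]  # mirrored shape

r_same = uncentered_correlation(trace_a, trace_b)
r_opposite = uncentered_correlation(trace_a, trace_c)
```

      In this sketch, two traces with the same shape but different amplitudes give a coefficient of 1, and a mirrored trace gives -1, because the means are not subtracted and only the normalized overlap of the two series enters.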

      Finally, to further streamline the paper, we have removed the velocity correlation analysis and the corresponding parts of Figure 3.

      (14) The authors should make clear in all figures what is experiment and what is simulation.

      We have now clarified the nature of each graph in the figure captions.

      (15) The caption of Figure 3C could be simplified.

      We have simplified all figure captions.

      (16) I found Figure 3A hard to understand.

      We concluded that Figure 3A was confusing and did not add essential information to the manuscript. We have removed it entirely.

      Reviewer #3 (Recommendations For The Authors):

      In conclusion, I think that the manuscript would gain a lot if some more precise and more quantitative interpretation of the results were given. This might require a collaboration with theorists.

      We have integrated a novel theoretical framework into the revised manuscript (new Figures 4 and 5; manuscript lines 300ff), as described above.

      This new section introduces a data-driven, stochastic dynamical model that simulates the myofibril as a chain of serially coupled sarcomeres. Each sarcomere's motion is governed by an underdamped Langevin equation, a formulation that inherently accounts for stochasticity. Crucially, our model incorporates a non-monotonic force-velocity relationship inferred directly from our experimental data, rather than relying on predefined static variability between sarcomeres, a key distinction from previous theoretical work.

      This integrated model successfully and quantitatively reproduces all major experimental phenomena described in the paper, including high-frequency oscillations and stochastic "popping" events. It demonstrates that these complex behaviors emerge naturally as dynamic instabilities from the coupled system. This addition elevates the manuscript from a descriptive study to one that provides a predictive, mechanism-driven framework for understanding sarcomere dynamics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a theoretical analysis that gives compelling evidence that length control of bundles of actin filaments undergoing assembly and disassembly emerges even in the absence of a length control mechanism at the individual filament level. Furthermore, the length distribution should exhibit a variance that grows quadratically with the average bundle length. The experimental data are compatible with these fundamental theoretical findings, but further investigations are necessary to make the work conclusive concerning the validity of the inferences for filamentous actin structures in cells.

      We think this is an excellent assessment of the article. We suggest adding a sentence after the first one: “The distribution of bundle lengths is not Gaussian but Gumbel, since the bundle length is the length of the longest filament in the bundle.”

      Public Reviews:

      Reviewer #1 (Public Review):

      Actin filaments and their kinetics have been the subject of extensive research, with several models for filament length control already existing in the literature. The work by Rosario et al. focuses instead on bundle length dynamics and how their fluctuations can inform us of the underlying kinetics. Surprisingly, the authors show that irrespective of the details, typical "balance point" models for filament kinetics give the wrong scaling of bundle length variance with mean length compared to experiments. Instead, the authors show that if one considers a bundle made of several individual filaments, length control for the bundle naturally emerges even in the absence of such a mechanism at the individual filament level. Furthermore, the authors show that the fluctuations of the bundle length display the same scaling with respect to the average as experimental measurements from different systems. This work constitutes a simple yet nuanced and powerful theoretical result that challenges our current understanding of actin filament kinetics and helps relate accessible experimental measurements such as actin bundle length fluctuations to their underlying kinetics. Finally, I found the manuscript to be very well written, with a particularly clear structure and development which made it very accessible.

      We are grateful to Reviewer #1 for this very favorable assessment.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a theoretical study of the length dynamics of bundles of actin filaments. They first show a "balance point model" in which the bundle is described as an effective polymer. The corresponding assembly and disassembly rates can depend on bundle length. This model generates a steady-state bundle-length distribution with a variance that is proportional to the average bundle length. Numerical simulations confirm this analytic result. The authors then present an analysis of previously published length distributions of actin bundles in various contexts and argue that these distributions have variances that depend quadratically with the average length. They then consider a bundle of N-independent filaments that each grow in an unregulated way. Defining the bundle length to be that of the longest filament, the resulting length distribution has a variance that scales quadratically with the average bundle length.

      Strengths:

      The manuscript is very well written, and the computations are nicely presented. The work gives fundamental insights into the length distribution of filamentous actin structures. The universal dependence of the variance on the mean length is of particular interest. It will be interesting to see in the future, how many universality classes there are, and which features of a growth process determine to which class it belongs.

      Weaknesses:

      (1) You present the data in Fig. 3 as arguments against the balance point model. Although I agree that the data is compatible with your description of a bundle of filaments, I think that the range of mean lengths you can explore is too limited to conclusively argue against the balance point model. In most cases, your data extend over half an order of magnitude only. Could you provide a measure to quantify how much your model of independent filaments fits better than the balance point model?

      Indeed, we agree that the experimental data we present, each on their own, provide inconclusive evidence of the scaling predicted by our model. However, in aggregate, as presented in Fig. 3E, the data make for compelling evidence of scaling of the variance with the average length squared, as quantified by the power-law fit. Also, we think that Fig. 3E argues strongly against the Balance Point Model, because the data do not conform with simple linear scaling (indicated by the dashed line in Fig. 3E). Regardless, we agree with the referee that better data is needed to make a more convincing case, and we see this paper as a call to arms to collect such data in the future. The published data we used (other than our own data from experiments on yeast actin cables) is from experiments that were not designed with this question in mind, i.e., how do length fluctuations scale with the mean?
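
      As an illustration of how a power-law exponent can separate quadratic from linear scaling of the variance with the mean, here is a minimal sketch of a least-squares fit in log-log space. The data are synthetic and noise-free, and the function name `fit_power_law` is hypothetical; this is not the fitting procedure or the data used in the paper.

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = log(c) + alpha * log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # fitted exponent
    prefactor = math.exp(y_bar - alpha * x_bar)
    return alpha, prefactor

# Synthetic, illustrative data (not the measurements from the paper).
mean_lengths = [1.0, 2.0, 4.0, 8.0, 16.0]
quadratic_variances = [0.3 * m ** 2 for m in mean_lengths]  # variance ~ mean^2
linear_variances = [0.3 * m for m in mean_lengths]          # variance ~ mean

alpha_quadratic, _ = fit_power_law(mean_lengths, quadratic_variances)
alpha_linear, _ = fit_power_law(mean_lengths, linear_variances)
```

      With real, noisy measurements spanning less than an order of magnitude, the uncertainty on the fitted exponent would be large, which is consistent with the point that the individual data sets are inconclusive on their own.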

      (2) Concerning your bundled-filament model, why do you consider the polymerizing ends to be all aligned? Similarly to the opposite end, fluctuations should be present. Furthermore, it is not clear to me, where the presence of crosslinking proteins enters your description. Finally, linked to my first remark on this model, why is the longest filament determining the length of the bundle in all the biological examples you cite? I am thinking in particular about the actin cables in yeast.

      In the case of the yeast actin cables (which grow from the bud neck into the mother cell), we know that the formins that polymerize the actin filaments are spatially aligned at the bud neck. In the cases of stereocilia and microvilli, again the polymerizing ends of the actin filaments are well-aligned at the growing tips of these bundled actin structures, as indicated by classic EM studies from Lew Tilney and others. The alignment of polymerizing actin filament ends is more difficult to assess at the leading edge of lamellipodia, because of the undulating shape of the polymerization (membrane) surface. In fact, this could be the reason why data from the lamellipodia experiments deviate from the line in Fig. 3E, in contrast to the data from the other three structures (this is discussed in some detail in the Supplement). Regarding the actin crosslinkers, the only role they play in our model is keeping the filaments connected in the bundle. As far as the question of why the longest filament in the actin cable is the one that specifies the length of the cable, this is addressed in more detail in our McInally et al., 2024 (PNAS) paper, where we measured cable length by segmenting the fluorescence signal of the cable. Therefore, the filaments in the bundle that extend the furthest define the reported length. Also, given the function of the cables for transporting vesicles, the furthest reach of the filaments in the bundle defines the area from which the vesicles are collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      An important result of the model proposed by the authors is that the relationship between bundle mean length and variance should also inform the number of filaments in the bundle (Equation 13). In the SI the authors thus predict from fitting experimental results that bundles should be made of around 173 filaments, which is larger than most values proposed in the literature (and quoted in this work), except for stereocilia. Can the authors comment on this?

      This is an interesting point that we have been thinking about. Indeed, the model does relate the number of filaments to the variance of the length, but this dependence is logarithmic and therefore insensitive to changes in the number of filaments. Consequently, the number 173 comes with very large error bars and should be thought of more like a few hundred filaments in terms of the precision with which we can extract this number from data. We make this point more clearly in the revised SI, where we now say that based on the data the best we can do is say that the number of filaments is between 80 and 400.

      Along the same lines, in their derivation of Equations 12 and 13 (a key result of the manuscript) the authors make some approximations that are only valid for large N (number of filaments in the bundle). Is this approximation valid for actin cables or filopodia, estimated to comprise only around 10 filaments?

      Indeed, even for N=10 filaments the approximate formulas have errors that are well below what can be measured. We consider the details of the approximation in deriving Equations 12 and 13 from the exact distribution (Equation 11) in the Supplemental section “Distribution of bundle lengths when individual filament lengths are exponentially distributed”. For example, the exact result involves the harmonic number, which for N=10 is 2.93, while the approximate formula ln(N) + gamma that we use yields 2.88, a fractional error of less than 2%.
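
      The comparison between the harmonic number and the approximation ln(N) + gamma can be checked numerically. The following is a minimal sketch; the function names are hypothetical, and gamma is the Euler-Mascheroni constant.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic_number(n):
    """Exact harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def log_approximation(n):
    """Large-n approximation H_n ~ ln(n) + gamma."""
    return math.log(n) + EULER_GAMMA

n = 10
exact = harmonic_number(n)
approx = log_approximation(n)
fractional_error = abs(exact - approx) / exact
print(f"H_{n} = {exact:.3f}, ln({n}) + gamma = {approx:.3f}, "
      f"error = {fractional_error:.2%}")
```

      For N=10 the exact harmonic number is about 2.93 and the approximation is about 2.88, so the fractional error stays below 2% even for a bundle of only ten filaments.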

      A key assumption of the model is that the bundle length corresponds to the maximum individual filament length inside the bundle. Couldn't bundles comprise several filaments one after another, head-to-tail? What do the authors expect then?

      Excellent point. Indeed, this is precisely the geometry of the yeast actin cable. In our previously published McInally et al., 2024 (PNAS) paper we worked out the math in that case and found that the main result about the variance holds. In this paper we presented a simpler model that retains the same features as the one presented in the PNAS paper to better accentuate the origins of the scaling of the variance with the mean length, which is simply the result of bundling and identifying the length of the bundle with the length of the longest filament (or, more precisely, furthest extending filament) in the bundle.

      The model also allows us to relate the bundle length fluctuations and average to the individual filament characteristic length (Equations 12 and 13 again). Can the authors comment on the values of 〈l〉 they would obtain for experimental data?

      It is hard to give a precise number, as we would also need to know the number of filaments in the bundle, and for that we would need better electron microscopy data (which has proven difficult for the field to obtain). Still, with typical numbers in the 10s to 100s, the expected average filament lengths are smaller than the average bundle length by a factor of roughly ln(10) to ln(100), that is, 2-5 times smaller.

      I find the Methods section a bit underwhelming. In particular, can the authors give more details on their treatment of experimental data? Bootstrapping sampling is mentioned but there is no information on the size of the original data sets, which could affect the validity of such a method.

      Thanks for the criticism. We have added details regarding the sizes of the data sets used in the analysis in the Methods section.

      Along the same lines, is the graph in Figure 1E the result of a simulation like the ones the authors used to obtain their result or is it just a schematic? If the first, I would suggest replacing it with an actual simulated length trajectory. In general, I think this work would benefit from more detailed explanations and examples of how stochastic trajectories were computed and analysed.

      This is also a good point. We still prefer to keep the schematic in this figure since our goal here is to define the question before we commence with computations and data analysis. The stochastic trajectories were generated using the standard Gillespie algorithm, and the statistics of length were gathered once the dynamics of length reached steady state. We explain this in the Methods section and give more details in the Supplement.
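
      For readers unfamiliar with this kind of simulation, here is a minimal sketch of a Gillespie-type simulation of a single effective polymer. It assumes, purely for illustration, a constant assembly rate `k_on` and a disassembly rate `k_off * length` proportional to length; these rate laws, the parameter values, and the function name are assumptions and are not taken from the manuscript. Statistics are time-weighted and collected only after a burn-in period.

```python
import random

def gillespie_length(k_on, k_off, t_end, t_burn_in, seed=0):
    """Simulate subunit addition (rate k_on) and removal (rate k_off * length).

    Returns the time-weighted mean and variance of the length after t_burn_in.
    """
    rng = random.Random(seed)
    t, length = 0.0, 0
    weight = sum_l = sum_l2 = 0.0
    while t < t_end:
        rate_add = k_on
        rate_remove = k_off * length
        total = rate_add + rate_remove
        dt = rng.expovariate(total)  # waiting time to the next event
        # Accumulate statistics for the time spent at the current length.
        start = max(t, t_burn_in)
        stop = min(t + dt, t_end)
        if stop > start:
            weight += stop - start
            sum_l += length * (stop - start)
            sum_l2 += length * length * (stop - start)
        t += dt
        # Choose which event fires, with probability proportional to its rate.
        if rng.random() * total < rate_add:
            length += 1
        else:
            length -= 1
    mean = sum_l / weight
    return mean, sum_l2 / weight - mean * mean

# Illustrative parameters only (arbitrary units).
mean_length, variance = gillespie_length(
    k_on=50.0, k_off=1.0, t_end=2000.0, t_burn_in=20.0
)
```

      For this particular choice of rates the steady-state length distribution is Poisson with mean `k_on / k_off`, so the variance is expected to be close to the mean, that is, to scale linearly with it.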

      Finally, while I find the writing in this manuscript to be excellent, I think the figures require some work. The schematics and drawings, which are very low resolution, the font size for the axes, and the choice of colours all make it more cumbersome than necessary to understand what is being shown.

      Thank you for pointing this out. We have made better versions of the figures.

      Reviewer #2 (Recommendations For The Authors):

      "In this case, the length distribution of the bundle derived from extreme value statistics, leads to a peaked non-Gaussian distribution, even when filaments within the bundle are unregulated and exponentially distributed."

      You mention "extreme value statistics" only once, in the introduction. I would suggest that you come back to this notion and explain how your results connect to extreme value statistics or delete it from the manuscript.

      Good point. We added a sentence to draw the reader’s attention to the fact that our result is an extreme value distribution (Equation 11 is the Gumbel distribution) used in statistics of extreme events.

      This is a follow-up of one of my major points of criticism: Fig. 3A: why do you fit (if I understand correctly) the blue and orange data points with the same power law? For (A-D), the data extend over less than an order of magnitude. Why is a power law fit appropriate? Can you quantify how much better your fits are compared to a linear dependence? Bundling the data of all structures yields a common master curve (with the exception of filopodia). This is quite remarkable, I think, and merits some more discussion than currently given in the manuscript.

      Good point. We should have been more clear. In Figures 3A-D we show individual data sets for the different bundle structures and compare the prediction of the Balance Point Model (dashed line) to the data. We also do a fit to a power law to show that the data is consistent with the Bundle model. This comparison is made much more clear in Figure 3E.

      Fig 1B, right does not show the addition and removal of subunits - Fig. 1C does. Panel C is not explained in the caption. The second appearance of (D) in the caption could be omitted.

      Good points. We fixed these issues in the new version of the Figure and caption.

      "For individual actin filaments (...)" I found this and the following paragraph slightly confusing at first reading: as long as you write about single filaments, do you have annealing in mind, where two filaments merge and form a longer filament? In case you consider a bundle, do you consider a filament that is cross-linked to other filaments and thereby added to the bundle? Similarly for removing filament segments (severing or unbundling)? Probably, my confusion is a consequence of you seemingly using filament to describe bundles as well as single actin filaments.

      Sorry for the confusion. We tried to be consistent throughout the text and use “filament” to denote a single actin filament and “bundle” a collection of parallel filaments crosslinked together. The assembly and disassembly dynamics of the filaments in the bundle are only relevant to the extent that they affect the length distribution of individual filaments. The main result is largely independent of that (as demonstrated in the Supplement by considering different single filament distributions) once we decide that the length of the bundle is given by the length of the longest filament in the bundle. This is the point of extreme value statistics where a universal, Gumbel distribution for the length of the longest filament in the bundle arises independent of the length distribution of a single filament (this result is akin to the Central Limit Theorem which predicts a Gaussian distribution of the mean of a large number of random numbers irrespective of the distribution they’re drawn from.)
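
      The scaling argument can be made concrete for the simplest case discussed in this response, in which filament lengths are independent and exponentially distributed and the bundle length is the length of the longest filament. The sketch below uses the standard results for the maximum of N i.i.d. exponential variables (mean equal to the single-filament mean times the harmonic number, variance equal to the squared single-filament mean times the sum of 1/k^2). The function names and parameter values are hypothetical, and the code is not the authors' analysis.

```python
import random

def bundle_length_moments(n_filaments, mean_filament_length):
    """Exact mean and variance of the maximum of N i.i.d. exponential lengths."""
    h1 = sum(1.0 / k for k in range(1, n_filaments + 1))
    h2 = sum(1.0 / k ** 2 for k in range(1, n_filaments + 1))
    return mean_filament_length * h1, mean_filament_length ** 2 * h2

def sample_bundle_lengths(n_filaments, mean_filament_length, n_bundles, seed=0):
    """Monte Carlo sample: each bundle length is the longest of N filaments."""
    rng = random.Random(seed)
    rate = 1.0 / mean_filament_length
    return [max(rng.expovariate(rate) for _ in range(n_filaments))
            for _ in range(n_bundles)]

n_filaments = 10
ratios = []
for mean_filament_length in (1.0, 2.0, 4.0):
    mean, variance = bundle_length_moments(n_filaments, mean_filament_length)
    ratios.append(variance / mean ** 2)  # same value for every filament length
```

      Because the ratio of the variance to the squared mean depends only on N and not on the single-filament mean length, changing the single-filament mean length at a fixed number of filaments moves the bundle along a curve on which the variance grows quadratically with the mean.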

      "In Figure 4D, the variance of the filopodia lengths" Probably Figure 3D?

      Yes. Thank you. We fixed this.

      "The filopodia data seemingly has the same slope (...) but with variances higher than what is measured for other actin structures." This finding does not contradict the main statement of a nonlinear scaling of the variance with the mean length, right? I therefore find this discussion slightly peripheral and also confusing. Also, what is the reason to assume that EM might get the actual length of filopodia wrong by a factor of 2 to 3?

      The issue with filopodia is that the way the lengths are measured is by the extent to which the structure as a whole protrudes from the cell. This leaves unresolved the lengths of the actual filaments in the structure, and we suspect that they are longer as they extend into the cytoplasm. This would contribute to the shift off the common curve in the direction that is observed (larger variance associated with smaller average length). We have no way to justify that this would lead to a 2-3 factor other than that would be enough to collapse the data onto the common curve. Clearly more careful experiments are needed to resolve the issue. We added some clarifying remarks to this effect into the discussion.

      Eq.(14) What is Z?

      Thanks for pointing out this omission. Z = L/<L> and we have added that in the formula where Z appears.

      LIST OF CHANGES

      Here we summarize the changes we made to the manuscript and the Supplementary material in response to the reviewers.

      (1) Fixed typo: Figure 1 legend had two parts labelled D, which have been changed into a D and a C. The explanation of panel C has been added.

      (2) Fixed typo: The incorrect call to Figure 4D is now corrected to Figure 3D.

      (3) In the Supplementary material we made more precise our estimate of the number of filaments. The wording “From this we can estimate the number of filaments. We find, with a confidence interval of…” we have changed to “From this we can estimate the number of filaments to be between 80 and 400 which compares favourably to the typical number of filaments in the different actin structures that were analyzed.”

      (4) In the Methods section we added the number of measured filament lengths in the different data sets used in the analysis.

      (5) We made better (higher resolution) versions of all the Figures.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The factors that create and maintain diversity in host-associated microbiomes remain poorly understood. A better understanding of these factors will help in the efforts to leverage the adaptive potential of the microbiome to help solve pressing problems in health and agriculture.

      Experimental evolution provides a promising path forward as we can track the causes and consequences in the emergence of novel variants, but experimental evolution remains underutilized in host-microbiome interactions. Here, Gracia-Alvira utilizes a long-term experimental evolution study in Drosophila simulans under hot and cold temperature regimes to identify strain-level variation in an important fly bacterium, Lactiplantibacillus plantarum. They identify three strains of L. plantarum, which are most prevalent in their respective three temperature regimes, suggesting that these are locally adapted bacteria. Then, using a combination of genomics, in vitro, and in vivo, Gracia-Alvira et al attempt to understand the factors that led to the differentiation of the hot and cold L. plantarum and their impacts on the fly host.

      Strengths:

      This is an excellent use of experimental evolution to track the emergence of novelty in the microbiome. The genomic analyses are all solid and appropriate for the data sets. It is especially striking that the comparisons with the other, independent experimental evolution studies in different labs (and across continents between Portugal and South Africa) show a consistent response to temperature. Many have disregarded the microbiome as it is something that is too sensitive to seemingly innocuous variables (particularly in the fly microbiome), such that we cannot find generalities. However, this finding highlights the potential for experimental evolution to uncover these dynamics. The question of how strains emerge and are maintained is timely and is one of the key open questions in host-microbiome evolution currently.

      Weaknesses:

      (1) The framing in the title and throughout the discussion about "subspecies competition" does not match the data that was collected. The subspecies competition requires actually tracking the competitive outcomes between the hot, cold, and unevolved L. plantarum. In the in vivo work, I can see that mixes of the strains were made, but they did not track whether the cold strain outcompeted the hot strain in vivo under cold conditions, for example.

      We thank the reviewer for the honest concern and take this opportunity to defend our claim of "subspecies competition" used across the manuscript. As the reviewer states, subspecies competition requires tracking the competitive outcomes between the three clades, and this is what we did by sampling and sequencing across ten years of experimental evolution (Figures 4 and S3). For this reason, we point out that the subspecies competition assessment comes from the direct observation of changes in relative abundance across the time series, and not from the follow-up experiments in vivo or in vitro.

      While Figure 4 is suggestive that there is ongoing competition in the hot temperature regime, this is not necessarily shown in the cold, which is dominated by the C clade. It could also be that the bacteria cannot survive in the flies at the different temperatures. The growth curve assays hint that the bacteria can grow, but the plate reader couldn't actually maintain the 18 °C temperature (line 455). So all of this evidence is very indirect and insufficient to say that strain competition is driving these patterns.

      We thank the reviewer for the alternative hypothesis that could explain the observed subspecies dynamic. We rule out that dominance of clade C in the cold occurs because the other two clades cannot grow in this regime based on three pieces of evidence:

      (1) In the time series, clades H and U decrease, but never disappear (Figures 4 and S3), even showing some peaks of abundance in specific replicate populations (Figure S3).

      (2) We isolated individuals belonging to clade H in the cold-evolved populations, as shown in Figure 2. This is direct evidence that clade H persists in the cold-evolved populations, although at low abundance.

      (3) We did grow the three taxa in fly food petri dishes incubated at both temperature regimes, observing growth in all cases.

      We will include the food growth experiment in the revised manuscript as further supporting evidence for growth in both regimes.

      (2) The in vivo results are interesting in that there appears to be a fitness cost of clade C, but the explanation is underdeveloped. I say under-developed because in Figure 4, the cold L. plantarum remains much higher throughout adaptation to the hot temperature regime than the hot L. plantarum in the cold regime. The hot L. plantarum is low abundance throughout the cold regime. I felt like this observation was not explained, but it seems relevant to understanding the strain dynamics.

      We acknowledge that a strong fitness cost of clade C is observed in axenic D. melanogaster. In the native host, D. simulans, with reduced microbiome, we observed delayed development that could even be an advantage depending on the situation, as pointed out by reviewer 3 in the recommendations.

      Even if we assume that flies colonized with clade C are less fit in the experimental evolution, another caveat is whether the flies can actively select for the L. plantarum clade. Under this assumption, a clade that imposes a fitness cost on the fly (clade C) should be selected against over time because the flies colonized by this clade will have fewer offspring, or develop later than the rest. Alternatively, as the microbiome is shared among all the individuals in the population, the host might not be able to “purge” the pernicious clade, and L. plantarum dynamics might be controlled solely by the relative fitness between clades in the given experimental treatment. We will discuss this hypothesis in the revision as a way to explain the relationship between the abundance of each clade and the effect on the host.

      I will also note that this is not the first time that L. plantarum or other Lactobacillus have been shown to exert fitness costs to Drosophila. Gould, PNAS, 2018, shows that both Lactobacillus plantarum and Lactobacillus brevis in mono-association have lower fitness (measured through Leslie matrix projections using lifespan and fecundity) than axenic flies. Many studies of wild Drosophila fail to find Lactobacillus, or it is low abundance (e.g., Chandler, PLoS Genetics, 2014; Wang, Environmental Microbiology Reports, 2018; Henry & Ayroles, Molecular Ecology, 2022; Gale, AEM, 2025). This might help provide useful context for the in vivo results.

      We thank the reviewer for the references. These observations will be compared to our phenotypic results and discussed in the revised version of the manuscript.

      (3) The data in Figure 4 are compelling to focus on the L. plantarum variants. However, I can see from the methods that the competitive mapping included only other strains of Wolbachia.

      We appreciate the thorough reading of the methods by the reviewer. The competitive mapping comprised two steps: first, we discarded the reads that mapped to Drosophila, Wolbachia and additional potential contaminants from sequencing facilities (human, dog...). This step leaves the reads originating from the whole external microbiome of the flies, including L. plantarum. The second competitive mapping step recruits the reads that map to any clade of L. plantarum.

      It is not clear how other members of the microbiome changed in response to the temperature regimes. As I note in point #2, given that Lactobacillus is often rare, it is not clear what the rest of the microbiome looks like over the course of adaptation. Indeed, it seems like Mazzucco & Schlotterer, PRSB, 2021 did a broader analysis of the microbiome and found that Acetobacter is by far the most common bacterium (I think this data is also part of the data shown here?). Expanding on why or why not in this context is important and will improve this study, particularly if the focus is on connecting these evolutionary dynamics to ecological competition to explain the emergence of strain diversity.

      We acknowledge that the rest of the Drosophila microbiome is not addressed in this study, as we wanted to focus the storyline around the intraspecific dynamics found in L. plantarum. We consider that a complete characterization of the whole Drosophila microbiome would unnecessarily elongate the paper and thus we treat it as a constant biotic factor.

      We must point out that our dataset is not the one reported by Mazzucco & Schlötterer, which was done in D. melanogaster, rather than D. simulans. Nevertheless, both experiments share the same infrastructure, temperature regimes and fly maintenance.

      We will include a list of taxa that were isolated from the populations and report L. plantarum prevalence and abundance across the experiment, in order to provide the readership with context on the microbiome beyond L. plantarum.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gracia-Alvira et al. investigated how environmental temperature affects competition among members of the microbiome, with a focus on intraspecific diversity, using the Drosophila model. Notably, the authors identified three clades of Lactiplantibacillus plantarum from a natural population of Drosophila simulans collected in Florida. They tracked the dynamics of these three bacterial clades under two temperature conditions over the course of more than ten years. Using comparative genomics and phylogeny, they showed that these three bacterial clades likely adapted to their host independently in a temperature-specific manner. Further, by combining in vitro culture and in vivo mono-association assays, they demonstrated the functional divergence of these three bacterial clades phenotypically, including their growth dynamics and effects on host fitness. Lastly, they performed pathway analysis and speculated on key genomic variance supporting such functional divergence.

      Strengths:

      The laboratory evolutionary experiment in response to cold or hot environmental temperature is impressive, given its more than ten years of experimental time period. This collection of archived microbiome samples paired with the fly host data can be a valuable resource for the field.

      Weaknesses:

      The laboratory evolutionary experiment can be limited due to its artificial experimental setup. For example, wild flies rely on a more diverse set of food sources and are constantly exposed to new bacterial inoculations, whereas under laboratory conditions, flies live in a more restricted ecosystem. In addition, environmental temperatures differ among different locations, but they also involve seasonal changes within the same region. This manuscript can be strengthened with further discussions that elaborate on these limitations.

      As the reviewer has correctly noted, our experimental setting is not exempt from limitations. Lab-reared flies are fed a defined standard diet. Furthermore, although the system is not completely closed to bacterial migration, such migration is limited because replicate populations are not allowed to mix during the maintenance of the flies. For this reason, we consider our laboratory setting a compromise between observing wild populations, which undergo all biotic and abiotic stresses but cannot be manipulated, and evolving the bacteria in the absence of the host, or in gnotobiotic hosts, in which biotic interactions are not fully considered. We will expand on this in the new version of the manuscript.

      Moreover, the extent of host effects involved in these experiments remains ambiguous, because it is unclear whether these Lactiplantibacillus plantarum mostly reside within fly guts or on Drosophila medium. The laboratory evolutionary experiment possibly favored better colonizers on Drosophila medium under either cold or hot temperatures, which subsequently can saturate fly guts. As fully dissociating these variables can be experimentally tedious, the authors may want to comment more on these aspects in the discussion. Or they may want to consider some measurements. For example, measuring the growth rate of these bacteria on Drosophila medium under different temperatures, in addition to the current MRS culture experiments, or measuring the portion of the Lactiplantibacillus on Drosophila medium versus these stably colonizing fly guts.

      The reviewer's point was briefly addressed in the Results chapter: "Phenotypic differences in liquid culture".

      Reviewer #3 (Public review):

      Summary:

      The study presents an analysis of 297 pangenomes derived from 20 populations of Drosophila simulans, at 19 time points for fast-reproducing individuals in a hot environment, or at 10 time points for slow-reproducing individuals in a cold environment, over a period of more than 10 years. The authors select a particular microbial component of the pangenomes and study the dynamics of Lactiplantibacillus plantarum strains in two environments. They discover that the revealed operational taxonomic units could be divided into three phylogenetic clades, which have their own genomic and genetic features, different adaptive capabilities that depend on the environment, and have a distinct impact on the fitness of the host.

      Strengths:

      The authors prove that bacterial microbiome components are sensitive to the environment and could rapidly (years) be fixed in eukaryotic populations. This study establishes a tractable model that potentially enables the study of variability of the physiological influence of distinct strains of an important commensal species, Lactiplantibacillus plantarum, on the Drosophila host. It is clearly shown that this single species consists of several phylogenetically and functionally diverse strains. The authors did not limit their interest to their own model, but rather they have integrated a comparative approach by analysing phylogenetic relationships among 92 described L. plantarum strains.

      Overall, the study is novel and delivers important discoveries of a longitudinal, well-replicated experiment, generating a substantial amount of genomic data. It highlights an important dimension of research that environmental selection operates at the subspecies level.

      Weaknesses:

      Even though the authors show only one particular example by conducting their longitudinal experiment, they honestly acknowledge failures important for interpretation of the biological significance of the results (gnotobiotic mono-association experiments were done with D. melanogaster, but not D. simulans) and therefore they state limitations of their conclusions (weaker effects in the non-axenic flies are due to the presence of other taxa or to higher-order interactions with other members of the microbiome). These interactions could significantly affect bacterial growth, metabolism, and physiological influence on the host.

      We agree with the reviewer that the use of gnotobiotic animals is a limitation, as by "tuning" the flies' microbiome we are modifying the interactions between members, which can potentially change the phenotypic outcome. Nevertheless, we use it as a complementary approach, rather than the only inference in our study.

      The authors exploit the results of their experiment to speculate about a wide range of evolutionary phenomena, like within-species competition, ecological adaptation and evolution of the host, fitness advantage of bacteria to the host, the benefits of parasitism or mutualism, the domestication of the microbiome, etc. At the end, they conclude that their study "highlights that even subspecies diversity plays a key role in adaptation to environmental temperature". However, the potential mechanisms of such adaptation are barely discussed, so that the focus of the study shifts from the temperature-induced changes in microbial population structures toward metabolism-related adaptations of clade representatives that enable them to diversify their carbon and nitrogen sources. The role of the temperature factor remains elusive.

      We acknowledge that our study does not fully resolve the mechanism by which a different clade ends up dominating each temperature regime. The MRS liquid experiment was an attempt to answer whether differences in optimal growth temperature could explain the temperature-specific abundance of the two clades. Our experiments showed, however, that this was not the case. Beyond this point, it is hard to disentangle the role of the temperature, as it could also act indirectly on the bacteria, for example, through the host or the food.

      A second observation in our time series was that a third clade, U, was unfit in both regimes despite starting the experiment in high abundance. For this reason we also studied what made this clade less fit. Based on our analyses, we propose that the decrease of clade U was driven by the shift to a laboratory diet, shared by all experimental populations.

      In addition to that, the paper has a clearly minimalistic experimental approach to address functional properties of the revealed L. plantarum strains, so that their own fitness, or their relationship with the Drosophila host, is characterised superficially. Therefore, the authors' discourse can be speculative rather than factual (especially when the authors use the expression "likely" to share their guesses in the "Results" section). Nevertheless, these minor drawbacks do not undermine the novelty of the discovered phenotypes and the importance of their further investigation.

      We consider the reviewer's concern and will tone down the phrasing when reporting our findings in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about this manuscript and offer a few minor comments below that may help to further strengthen the study.

      We appreciate the reviewer’s positive assessment of our work and suggestions for improvement.

      (1) Page 4

      PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Figure 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not in other fully-engaged PIC structures.

      Thanks for clarifying. We note that some structures of TFIIH alone also see the long helix. Accordingly, we modified this section to read:

      “In many TFIIH and PIC structures the linker is not visible, presumably due to flexibility. However, when it is seen (Abril-Garrido et al., 2023; Greber et al., 2019), the linker emerges from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits…”

      (2) Page 8

      Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as the free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function of the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3.

      We are not experts on NER, but in reviews of the field this appears to be a widely held assumption. A 2008 paper from the Egly lab (Coin et al., DOI 10.1016/j.molcel.2008.04.024) is usually cited, which shows that the interaction between XPD (metazoan Rad3) and XPA is likely incompatible with XPD-MAT1 interaction. In addition to the Yu 2023 review, we now also cite a more recent publication that more extensively reviews the models for core TFIIH interactions (van Sluis et al., 2025). We looked at the multiple recently published structures of various TC-NER and GG-NER intermediate complexes, and none of them show the CAK module or even the Tfb3/Mat1 N-term, even though those proteins were typically included during assembly. We also consulted with our colleagues Johannes Walter and Lucas Farnung, who are studying various TC-NER intermediates biochemically and structurally. Although the CAK module is included in their assembly reactions, it is not visible in their cryoEM structures. They tell us that the presence of CAK would be compatible with early TC-NER intermediates, but is predicted to overlap with later interactions of XPD with the TC-NER factor STK19 (see Mevissen et al., Cell 2024). To be conservative, we modified the sentence to say “Recent structures … suggest” rather than “show”.

      Because the yeast strains used in Figure 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      We agree that our experiment only shows that the connection between Tfb3 N- and C-term domains is not necessary for NER. The individual domains might still be able to function independently. Accordingly, we changed the heading of that section from “Disconnected core TFIIH does not cause an NER defect” to “Split Tfb3 does not cause an NER defect.” This more closely matches the figure legend title.

      (3) Page 11

      Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      That is true. But please note that this sentence was meant to describe movement of the kinase module AFTER release from Mediator (see previous sentence). Re-reading the passage, we realized the confusion arises because we propose multiple possible pathways in that paragraph. In the first half, we suggest the capture of the kinase module by Mediator might trigger the conformation changes in the linker. In the second half (where it says “Alternatively….”) we suggest the Mediator-CAK interaction could instead come first, and the release of this contact could free the CAK module to move around. We have modified the paragraph to make it clear these are two distinct models.

      Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at Ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Weaknesses:

      Additional data that should be easily obtainable and analysis of existing data would enable an additional test of the models presented and extract additional mechanistic insights.

      We thank the reviewer for the positive assessment and address their specific suggestions below.

      Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module and of Ser5 phosphorylation on the CTD of Pol II is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled, and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      We appreciate that the reviewer finds that our main conclusions are convincing.

      Weaknesses:

      (1) The work is limited in scope and does not provide any major insights into the mechanism of transcription. One indication of this limitation is that in the Discussion, published structural and functional results on transcription are used to support the interpretations of the results here more than current results inform previous models or findings.

      The story we present here is pretty simple, so in that sense we agree it is limited. However, we believe the findings do have mechanistic implications. That the Tfb3/Mat1 tether not only targets kinase activity to the 5’ end, but also somehow limits it from acting downstream seems significant. As for the Discussion, in our papers we always attempt to tie in our results and models with as much of the relevant published literature as possible. We believe this is more interesting, useful, and convincing than simply summarizing the Results section.

      (2) The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3, is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript.

      Our original motivation for the experiment in Figure 1 was to develop a system where we could plug different kinases into the CTD-proximal position. This didn’t work, so it is true that this negative result is somewhat unconnected to the rest of the paper. We chose to include it because it produced the unexpected observation that the Tfb3 C-term domain was not essential for viability, contradicting an earlier report. As for the suggested control of fusing Kin28, please see our reply to the editor’s comments below.

      (3) Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. This idea is supported by a single example from the literature (T. brucei). A more thorough evolutionary analysis could have tested this idea more rigorously.

      Please see our full reply to Point 5 in the editor’s comments. In short, T. brucei was the only primitive eukaryote for which we found an actual biochemical analysis of TFIIH. However, we now cite some papers reporting protein sequence comparisons for organisms not having a consensus CTD, which lend further support to our idea that fusion of a CDK to TFIIH co-evolved with the CTD very early in eukaryotic evolution.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Suggestions for Improvement:

      (1) Analyze existing Pol II ChIP-seq data to determine whether reduced TSS-proximal vs. gene-body occupancy observed with the split Tfb3 alleles reflects initiation defects, and whether different gene classes (high vs. low expression, stress-induced genes) show differential effects of splitting Tfb3.

      Thanks for the suggestion. The new analysis is included as Supplemental Figure S6. Several factors indicate an initiation defect rather than an elongation defect (either elongation processivity or elongation rate). First, the shape of the RNApII occupancy trace is flat in all mutants, arguing against a processivity defect, which would have led to a downward slope due to RNApII progressively dropping off from the gene. Because this effect is best seen on long genes (more than 2 kb), we generated metagene profiles on long, well-expressed genes only, which led to the same conclusion (see Sup Fig 6A). Second, the mutants lead to decreased RNApII occupancy, arguing against a strong decrease in elongation rate, which, if anything, would have led to an increase in RNApII during early transcription. While we cannot completely exclude the possibility of a mild decrease in elongation rate, such an effect doesn’t fit the patterns we observe. The overall decrease of RNApII occupancy is instead a strong indication of a decrease in early steps (PIC assembly or initiation).

      As requested, we looked at potential differences between gene classes in two ways. First, we generated RNApII metagenes on RNApII occupancy quintiles (Q1-Q5). As shown in Sup Fig 6B, RNApII occupancy is similarly decreased in all mutants for all quintiles, demonstrating that the effect of Tfb3 splitting on transcription is not linked to expression level. Second, we generated RNApII occupancy metagenes for TFIID-regulated genes and coactivator redundant (CR) genes. This classification from the Hahn lab (doi:10.7554/eLife.50109) is very similar to the one developed by the Pugh lab (doi:10.1016/s1097-2765(04)00087-5). TFIID-regulated genes are enriched for housekeeping genes and are typically devoid of a TATA box, while the CR genes tend to be highly regulated and to contain a TATA box. As shown in Sup Fig 6C, the effect of the Tfb3 split mutants is similar on both gene classes.

      (2) Determine whether Kin28 abundance in whole cell extracts is reduced by splitting Tfb3, as a factor in reducing its occupancies at gene promoters.

      We actually did test for Kin28 and Ccl1 levels in the extracts when we did the IP experiment shown in Fig 3. We ran the extracts next to the precipitated factors. Unfortunately, as can be seen on the bottom blot, our antibodies were not strong enough to detect either Kin28 or Ccl1 in extracts, even with WT Tfb3. Although we don’t include this inconclusive result in the final paper, we show it in Author response image 1 (note that extracts are labeled as “IgG input”).

      Author response image 1.

      (3) Include the key positive control construct of replacing the C-term of Tfb3 with Kin28 in the experiments of Figure 1.

      We elected not to do this experiment for several reasons. As reviewer 3 points out, this kinase fusion experiment turned out to be somewhat disconnected from the rest of the paper. Even though it didn’t work, we included it in the paper because the results led us to the realization that the Tfb3 C-term was actually not fully essential for viability as reported, which in turn led us to the idea of splitting Tfb3. Structural studies (https://doi.org/10.1126/sciadv.abd4420, https://doi.org/10.1073/pnas.2009627117, https://doi.org/10.7554/eLife.44771) show that, in addition to providing linkage to the core module, the C-term of Tfb3 induces a conformation change in Kin28/Cdk7 necessary for full kinase activity (which is likely why the strains without C-term are just barely viable). If we were to pursue why the fusions didn’t work, we could tether Kin28 directly to the Tfb3 linker (and may try this in the future), but then would need to also express the C-term separately for its activating function. Even then, this would be an imperfect control for the fusion experiments in Figure 1. Because we were trying to best mimic Kin28 being tethered via the accessory subunit Tfb3/Mat1, in the Figure 1 experiment we did not directly attach the kinases to Tfb3. For Ctk1/Cdk12, we fused the Tfb3 linker to the Ctk3 accessory subunit (analogous to Tfb3), and for Bur1/Cdk9, we fused to the cyclin subunit Bur2 (there is no known third subunit in this complex). The one exception was Mpk1, which has no partner subunits and is not a CDK. There are many reasons why this high-risk protein fusion experiment may not have worked, but we feel it’s not that useful to pursue it in this paper.

      (4) Provide direct evidence for the claimed dominant negative effect of the N-term-Linker construct by extending results in Figure 2C to compare growth of WT TFB3 cells expressing this construct vs. vector alone.

      We thank the reviewers for this suggestion. We tested this by transforming high copy plasmids expressing the different Tfb3 truncations into cells expressing the WT Tfb3. We did not see a clear dominant negative effect (some colonies were small, but many looked normal). Accordingly, in the absence of a reproducible effect, we removed this claim from the paper. In Fig 2C, the WT plasmid was transformed into cells already expressing the truncation on a high copy plasmid (the opposite order of our new experiment). It’s possible that phenotypes vary depending on which plasmid was there first (2 micron plasmids have variable copy number and can compete with each other for replication and passage during cell division). In any case, in the face of ambiguous results we no longer claim a dominant negative effect of the N-term-Linker protein. This was a minor side-point of the paper and does not affect any of our other conclusions.

      (5) Expand the evolutionary analysis to provide evidence beyond the case of T. brucei that the Tfb3-mediated connection between core and kinase modules is an evolutionary addition to the ancestral state.

      We note that the two papers we cited for the lack of a CAK module in T. brucei reached that conclusion based on purification of its TFIIH complex. We were unable to find similar biochemical studies in other primitive eukaryotes. Another way to expand the evolutionary comparison would be through sequence homology searches. We attempted to do this using various tools available at NCBI and EMBL. These show that Tfb3/Mat1 is found extensively throughout eukaryotes. Unfortunately, because the NTD of Tfb3 is a RING domain, homology searches in primitive eukaryotes yield a number of weak matches in the zinc binding motif, but no way of knowing if any of these are related to TFIIH. Similarly, searches with Cdk7/Kin28 or Cyclin H/Ccl1 pull up all CDKs and cyclins, with roughly equal statistical similarity to the yeast kinase/cyclin. Someone with more experience with evolutionary analysis would likely have better luck, but our efforts were inconclusive. However, we did find two papers from Guo and Stiller (2004 and 2005) that analyzed genome sequences available at the time and reached the conclusion that both the consensus CTD and the CAK module are absent in the evolutionary branch of primitive eukaryotes that contains T. brucei and Giardia lamblia. We also found papers identifying a putative Mat1/Tfb3 in Plasmodium falciparum, although this protein was not yet shown to be associated with TFIIH. We now cite these papers in the discussion of our evolutionary hypothesis.

      (6) Include Western blot analysis of the Tfb3 chimeras and truncations analyzed in Figures 1-2 to determine if poor expression contributes to any of the poor-growth phenotypes.

      The western blot of the Tfb3 fusions used in Figure 1 is shown in Sup Fig 1. The Tfb3 truncations are shown in the Input panel of Fig 3A (although some of these are TAP fusions, the growth phenotypes did not change with TAP-tagging). In general, all the fusions and truncations are detectable but possibly reduced relative to WT Tfb3. Note that the anti-Tfb3 antibody is a polyclonal made against recombinant Tfb3, and we don’t know that the reactive epitopes are distributed equally throughout the protein, so it’s difficult to be confident about relative quantitation with partial Tfb3 proteins.

      (7) Provide direct evidence that the N-terminal Tfb3 segment interacts exclusively with the core TFIIH module and not Kin28, analogous to the opposite results shown in Figure 3B and 4A-B for the C-terminal domain.

      This could be interesting, but we elected not to do this experiment due to time and manpower limitations. Since the N-term is unambiguously essential for viability, we can assume it retains at least some interactions with core TFIIH (unless the N-term has some other essential function that hasn’t been discovered).

      (8) Confirm that the Ser5P phosphorylation levels given by the different Tfb3-TAP immune complexes are all much higher than the background level observed with control complexes prepared with extracts expressing WT, untagged Tfb3.

      We should have done this control in Sup Fig 2B, especially since we did pull down the beads from the untagged strain as shown in panel A. We haven’t seen appreciable kinase activity when we’ve done this control in the past, so we feel confident the signals seen are not background. Therefore, we elected not to repeat this experiment.

      (9) Conduct an in vitro reconstitution comparing the activity of free kinase module and intact TFIIH on elongating RNA polymerase II in directing promoter-localized vs. downstream Ser5P accumulation.

      This would be a nice experiment, but it would require a substantial amount of work that is beyond our resources at this time.

      (10) Revise the text to better emphasize any novel mechanistic insights afforded by the work and address all other minor comments/criticisms.

      Done, as addressed in all the other comment replies.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors suggest that their results support model 3, in which intact TFIIH restrains kinase activity outside the PIC. Directly testing this model would be a significant addition and would strengthen the proposed mechanism. An in vitro reconstitution comparing the activity of the free kinase module and intact TFIIH on elongating RNA polymerase II (or, at a minimum, purified Pol II) would directly test the mechanism underlying downstream Ser5P accumulation.

      Sup Fig 2 addresses this point to some extent, since the TAP pull-down of full-length Tfb3 precipitates at least some intact TFIIH, whereas the split C-term TAP constructs do not (as shown in Fig 4). However, this is not a very quantitative assay and we agree with the reviewer that a careful reconstitution, especially in the context of real transcription, would be far better. Unfortunately, this is currently beyond our capabilities. However, in the Discussion we do cite some published data arguing that association of the core TFIIH does have some inhibitory effect on the kinase module. First, in our 2002 MCB paper (Keogh et al., see Fig 7) using a GST-CTD kinase assay, we found that free kinase module (called TFIIK there) was strongly active even with a non-phosphorylatable mutation in the activating T-loop. In contrast, the same mutation inactivated CTD kinase activity in the intact TFIIH. Similarly, the Taatjes lab (Rimel et al., Genes Dev. 2020) found that free CAK was active on multiple substrates that were not phosphorylated by the full TFIIH complex.

      (2) Experiments from Carl Wu's laboratory (Nguyen et al., 2021) showed that there is a significant amount of apparently free Kin28 as well as free TFIIH in cells. Please reference and comment on this when discussing the model suggesting that TFIIH is mostly sequestered at promoters.

      Good point. We added this to the discussion where we discuss the arguments against a sequestering model.

      (3) The existing ChIP-seq data could be analyzed more thoroughly to extract additional mechanistic insights. Specifically: (i) quantify TSS-proximal vs. gene body Pol II to determine if reduced occupancy reflects initiation defects (ii) analyze whether gene classes (high vs. low expression, stress-induced genes) show differential effects.

      Thanks for the suggestion. We did this and show the results as a new Supplemental Figure 6. No differences were found. Please see our response to the Editor’s comment #1 for a fuller description.
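
      For readers unfamiliar with the kind of quantification requested in point (i), the comparison of TSS-proximal and gene-body signal can be sketched as a simple ratio. This is only an illustration and not the analysis behind Supplemental Figure 6: the per-base input format, the function name, and the 300 bp proximal window are all assumptions made for the example.

```python
from statistics import mean


def proximal_to_body_ratio(coverage, proximal_bp=300):
    """Ratio of mean TSS-proximal signal to mean gene-body signal.

    coverage:    per-base ChIP signal for one gene, index 0 at the TSS and
                 oriented 5' to 3' (hypothetical input format).
    proximal_bp: size of the TSS-proximal window (assumed value, not taken
                 from the manuscript).
    """
    if len(coverage) <= proximal_bp:
        raise ValueError("gene is shorter than the proximal window")
    proximal = mean(coverage[:proximal_bp])
    body = mean(coverage[proximal_bp:])
    if body == 0:
        # No gene-body signal: the ratio is undefined or unbounded.
        return float("inf") if proximal > 0 else float("nan")
    return proximal / body


# Toy gene: 300 bp of proximal signal at 4.0, 700 bp of body signal at 2.0.
toy_coverage = [4.0] * 300 + [2.0] * 700
toy_ratio = proximal_to_body_ratio(toy_coverage)
```

      A ratio that rises in a mutant relative to wild type would point toward reduced elongation into the gene body, whereas a uniform drop in both windows would be more consistent with an initiation defect.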

      (4) The complete loss of Kin28 ChIP signal in mutant strains (Figure 5B) could reflect kinase mislocalization or reduced protein abundance. Figure 3B examines TAP-purified material but does not address total cellular protein levels. Examining whole-cell extracts for Kin28 and Ccl1 in all strains would strengthen the interpretation of the ChIP results.

      As described in our response to Point 2 in the Editor’s comments section, we did do this control. Unfortunately, the Kin28 and Ccl1 antibodies were not strong enough to detect these proteins in extracts before precipitation.

      Reviewer #3 (Recommendations for the authors):

      (1) The experiment of Figure 1 should be repeated with a Tfb3-Kin28 positive control or dropped from the manuscript.

      This could be an interesting experiment, but please see our response to Editor comment #3 for why we decided to keep the figure as is.

      (2) Figure 2C legend doesn't mention linker C-term low copy construct.

      Thanks for catching that error. It is now fixed.

      (3) The claim that the N-term linker has a dominant negative effect (Figure 2C) requires direct comparison (growth on the same plate) of TFB3+ cells with and without expression of the N-term linker.

      As detailed in the response to the Editor’s comment #4, we did this test. The results did not support a dominant negative phenotype, so we removed this claim. Thanks for helping us avoid a mistake.

      (4) Page 7, "Supplementary Fig. S4A, B, promoters in green boxes" should read "Supplementary Fig. S5A, B, promoters in green boxes".

      Thanks for catching that error. It is now fixed.

      (5) Readers might be concerned that the ChIP-seq signal observed in Figure 5 and S5 could reflect an artifactual signal over highly transcribed regions. The different distributions of Rpb1, Ser5p, and Ser2p argue against this. This might be worth mentioning in the text.

      Thanks for raising this issue. “Hyper-ChIPpable” genes can be a problem in metagene analysis. We now include the analysis suggested by Reviewer 2 where we separately look at genes with different transcription frequencies. Seeing the same relative patterns regardless of expression level makes us confident that the results are not artifactual.

      (6) p. 12, "the Tfb3 the linker"; "In contrast, The N-term linker"; "suggest" should be "suggests"

      We appreciate the reviewer’s careful reading of the manuscript and have corrected these typos.

  2. May 2026
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

      We would like to express our sincere gratitude for your thoughtful and constructive comments on our manuscript. We carefully considered each comment from the two reviewers and revised the manuscript accordingly. Below, we provide a point-by-point response to each comment.

      Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      Thank you for your insightful comment. We fully agree that the specific patterns of epigenetic dysregulation in LUAD need to be explored. We believe that future investigations using clinical specimens and animal models to map histone acetylation patterns and DNA methylation profiles will be crucial for identifying novel biomarkers and therapeutic targets unique to LUAD.

      (2) For some methods, more detailed information is needed.

      This is a valid point. We agree that additional details regarding the methods are necessary for clarity and reproducibility. We have expanded these method details in the revised manuscript.

      (3) There are grammar issues in the text that need to be fixed.

      We apologize for the grammatical errors. In the revised manuscript, we have carefully checked the grammar and made corrections.

      (4) Some text in the figures is not labeled well.

      We appreciate the reviewer's comment. We have added the missing labels to the revised figures.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Thank you for this excellent suggestion. In the revised manuscript, we have added immunological experiments and validation using pathological tissue sections from lung adenocarcinoma patients. In addition, we have elaborated on the limitations of our study in the Discussion section and provided reasonable explanations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract includes a lot of abbreviations, which makes it difficult to follow. For example, "IFN" is not defined. And "HRR" is defined but used only once in the abstract. This issue also appears in other parts, such as "OAK" on page 5, line 114; "DFS" on page 15, line 398; and "DSBs" on page 20, line 558. Please try to avoid unnecessary abbreviations.

      Thank you for highlighting this. We have revised the manuscript to minimize the use of abbreviations. Specifically, we have now defined all necessary abbreviations upon first mention (including 'IFN') and have removed or spelled out those used infrequently to ensure the text flows more smoothly for the reader.

      (2) Page 5, line 129, what data type is used in this part analysis?

      We apologize for the omission. This analysis used whole exome sequencing data, which is now stated in the revised manuscript.

      Materials and methods, page 6, lines 131-132: “The raw reads (fastq) of whole exome sequencing were pre-processed and trimmed with fastp (Version: 0.23.4) based on default parameters.”

      (3) Page 6, line 138, Add citation for ANNOVAR.

      Thank you for your suggestion. We have added a citation for ANNOVAR in the revised manuscript.

      (4) Page 8, line 211, what cutoff is used to define the significant markers?

      Thank you for your insightful comment. We have now provided the cutoff used to define significant markers.

      Materials and methods, page 8, lines 213-215: “Differential expression genes for specific clusters were identified using the “FindMarkers” function, with a threshold of |avg_log2FC| ≥ 0.5 and adjusted P-value ≤ 0.01.”
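
      The stated cutoff can be illustrated with a short filter. This is a minimal sketch rather than the authors' pipeline: the column names `avg_log2FC` and `p_val_adj` are assumed to match a typical differential-expression table, and the gene names and values below are placeholders invented for the example.

```python
def is_significant_marker(avg_log2fc, p_val_adj,
                          min_abs_log2fc=0.5, max_p_adj=0.01):
    """Apply the quoted thresholds: |avg_log2FC| >= 0.5 and adjusted P <= 0.01."""
    return abs(avg_log2fc) >= min_abs_log2fc and p_val_adj <= max_p_adj


def filter_markers(rows):
    """Keep only the rows of a differential-expression table that pass both cutoffs."""
    return [
        row for row in rows
        if is_significant_marker(row["avg_log2FC"], row["p_val_adj"])
    ]


# Placeholder rows (hypothetical genes and values, for illustration only).
example_rows = [
    {"gene": "GENE_A", "avg_log2FC": 0.8, "p_val_adj": 0.001},   # passes
    {"gene": "GENE_B", "avg_log2FC": -0.6, "p_val_adj": 0.01},   # passes (boundary)
    {"gene": "GENE_C", "avg_log2FC": 0.4, "p_val_adj": 0.0001},  # effect too small
    {"gene": "GENE_D", "avg_log2FC": 1.2, "p_val_adj": 0.05},    # not significant
]
significant_genes = [row["gene"] for row in filter_markers(example_rows)]
```

      Because the threshold is applied to the absolute fold change, both upregulated and downregulated genes are retained.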

      (5) Page 11, line 276, HEK293T is not a lung cancer cell line. It would be better to label the details of this cell line.

      Thank you for your correction. We have now clarified HEK293T in the text by stating: 'human embryonic kidney cell line HEK293T'.

      Materials and methods, page 11, lines 277-278: “The human lung cancer cell line A549 (#SCSP-503) and the human embryonic kidney cell line HEK293T (#SCSP-502) were purchased from the Type Culture Collection of the Chinese Academy of Sciences, China.”

      (6) Page 16, line 415, what samples and how many individuals were used for the exome sequencing?

      We agree that specifying the sample set is crucial. The exome sequencing was conducted on 2 individuals (four samples). The samples used were tumor tissues (2 samples) and matched blood (2 samples). This information has been clarified in the revised manuscript.

      Results section, page 16, lines 415-416: “Exome sequencing was performed on four samples from two individuals: two tumor tissues and two matched blood samples.”

      (7) Page 17, line 468, Replace "Differently" with "In contrast" (more appropriate for scientific writing).

      Thank you for pointing this out. We agree that "In contrast" is more appropriate for scientific writing. Accordingly, we have replaced "Differently" with "In contrast" in this sentence (Results section, page 18, line 483).

      (8) Page 18, line 489, what is HMG?

      Thank you for pointing this out. HMG stands for High Mobility Group. We have clarified this by writing out the full term upon first mention in the manuscript (Results section, page 19, line 503).

      (9) Page 19, line 527, check the grammar for this sentence.

      We appreciate your careful reading. We have carefully rephrased this sentence to ensure clarity and grammatical accuracy.

      Results section, page 20, line 540: “Based on pseudotime order, we divided trajectories into 10 bins and analyze the activity changes of related features.”
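
      The binning step described in this sentence can be sketched as follows. The manuscript excerpt does not say whether the 10 bins are of equal width or equal cell count; the sketch assumes equal-width bins over the observed pseudotime range, and all function names are hypothetical.

```python
def assign_pseudotime_bins(pseudotime, n_bins=10):
    """Assign each cell to one of n_bins equal-width bins (assumed scheme).

    Returns a list of bin indices from 0 (earliest) to n_bins - 1 (latest).
    """
    lo, hi = min(pseudotime), max(pseudotime)
    if hi == lo:
        # All cells share one pseudotime value; place them in the first bin.
        return [0] * len(pseudotime)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((t - lo) / width), n_bins - 1) for t in pseudotime]


def mean_activity_per_bin(pseudotime, activity, n_bins=10):
    """Average a per-cell feature activity within each pseudotime bin.

    Bins that contain no cells are reported as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, a in zip(assign_pseudotime_bins(pseudotime, n_bins), activity):
        sums[b] += a
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy example: four cells with pseudotime on a 0-10 scale.
toy_bins = assign_pseudotime_bins([0.0, 2.5, 2.6, 10.0])
toy_means = mean_activity_per_bin([0.0, 2.5, 2.6, 10.0], [1.0, 2.0, 4.0, 5.0])
```

      Equal-count (quantile) bins would be a reasonable alternative when cells are unevenly distributed along the trajectory.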

      (10) Page 20, line 541-546. It would be better to split this long sentence into smaller ones.

      Thank you for your insightful comment. We have revised the text, splitting the long sentence into smaller ones for better clarity.

      Results section, page 20, lines 554-559: “MHC class I and II molecules showed increased activity in late pseudotime in BRCA1- and BRCA2-mutant cells, respectively (Fig. 4G-I). This pattern was also reflected in the cell density analysis (Fig. 4J). Furthermore, CD8<sup>+</sup> Tcm and Th1 signatures exhibited higher activity in late pseudotime in BRCA1- and BRCA2-mutant cells, respectively (Fig. S5F-G). These findings suggest a differential association with CD8<sup>+</sup> versus CD4<sup>+</sup> T cell engagement.”

      (11) Page 20, line 550, remove "." after "of".

      Thank you for catching this. We have removed it (Results section, Page 21, line 563).

      (12) Page 22, line 592, what is "LME"?

      Thank you for pointing this out. "LME" was indeed redundant in the original manuscript, so we have removed it in the revised version (Results section, Page 22, lines 607-609).

      (13) Page 24, line 674, Replace "suggest" with "suggested"?

      We apologize for our negligence. In the revised manuscript, we have replaced "suggest" with "suggested" (Results section, Page 25, lines 691-693).

      (14) Page 35, Figure 1I, Use "B cells" instead of "B".

      Thank you for your detailed review. We have changed the label to "B cells" in Figure 1I.

      (15) Page 36, Figure 2H, the statistics and p-value are needed to show.

      Thank you for your suggestion. We have added the statistical analysis for Figure 2H, and the p-values are indicated in the revised figure.

      Special thanks to you for your kind comments.

      Reviewer #2 (Recommendations for the authors):

      Major:

      (1) Line 44. In the Introduction section, a brief description of the prevalence of HRD or BRCA1/2 mutations in lung cancer patients should be included to highlight the significance of the study.

      This is an excellent suggestion. We revised the Introduction section (page 3, lines 61-64) to include a brief overview of the prevalence of BRCA1/2 mutations specifically in lung cancer patients. We believe this addition will strengthen the background for readers.

      Introduction section, page 3, lines 61-64: “Among the key genetic mutations that drive LUAD, BRCA1 and BRCA2 mutations (with prevalence rates of approximately 4% and 5%, respectively) have been increasingly implicated in the pathogenesis and progression of lung cancer [9, 13].”

      (2) Line 302-355. There are relatively serious grammatical issues, and substantial revision of the text is recommended.

      We acknowledge the grammatical issues in the original text. We have now carefully revised the Materials and methods section of the manuscript (pages 11-14, lines 277-358) to correct these issues and improve the overall readability. We believe the revised version is significantly improved.

      (3) Line 375. The Results section lacks detailed information on the specific BRCA1/BRCA2 mutations and data explaining how these mutations lead to functional alterations of BRCA1/2.

      Thank you for your insightful comment. In the revised manuscript, we have added the amino acid changes caused by the specific BRCA1/BRCA2 mutation sites and expanded the text to discuss the predicted and known pathogenic mechanisms of these variants (Results section, page 16, lines 420-433).

      Results section, page 16, lines 420-433: “Exome sequencing data show that these two types of tumor tissues harbor somatic nonsynonymous single nucleotide variants (SNV) in BRCA2 (p.N372H) and BRCA1 (p.E991G, p.S1566G, p.K1136R, p.P824L, and p.Y809H), respectively (Table S1). The BRCA2 p.N372H variant lies within the BRC3 or BRC4 motifs critical for RAD51 binding. It may alter binding affinity, impair high-fidelity homologous recombination repair, and promote genomic instability [39-41]. In BRCA1, mutations are distributed across two key functional domains: the Coiled-Coil domain (e.g., p.E991G, p.Y809H, p.P824L) and the BRCT domain (e.g., p.K1136R, p.S1566G). Coiled-Coil mutations disrupt BRCA1-PALB2-BRCA2 complex assembly, impairing localization to DNA damage sites and subsequent RAD51 recruitment; BRCT domain mutations compromise phospho-protein recognition and G2/M checkpoint control, leading to defective DNA damage response and unchecked proliferation of damaged cells [42-44]. Together, these defects promote the accumulation of genomic scars and chromosomal instability.”

      (4) Line 492-498. Changes in genes associated with BRCA1 and BRCA2 mutations should be validated by immunofluorescence.

      Thank you for your insightful comment. Immunofluorescence would provide valuable orthogonal validation of the protein-level consequences of these mutations. To address this, we obtained pathological tissue sections from patients carrying BRCA1/2 mutations and performed immunofluorescence staining for S100A10, a risk gene associated with BRCA1 mutations. We found that S100A10 was upregulated in BRCA1-mutated tumor tissue compared to adjacent non-cancerous tissue.

      Results section, page 24, lines 673-675: “Immunofluorescence experiments on patient tissue sections revealed that S100A10 was upregulated in BRCA1-mutated tumor tissue relative to adjacent non-cancerous tissue (Fig. S11D-E).”

      (5) Line 538. Although both BRCA1 and BRCA2 deficiencies impair DNA damage repair, BRCA1, but not BRCA2, activates the cGAS-STING pathway. This is a particularly interesting observation and should be validated by immunofluorescence experiments.

      Thank you for highlighting this observation. To address this, we conducted immunofluorescence experiments to quantify STING, a key protein of the cGAS-STING pathway, in BRCA1- and BRCA2-deficient tissues to confirm this phenotype. We have included these results in the revised manuscript.

      Results section, page 21, lines 578-584: “Furthermore, our results revealed that BRCA1-mutant tumors showed higher activity of cGAS-STING signaling and STING mediated induction of host immune responses compared to BRCA2-mutant tumors (Fig. 5G and Fig. S6F). Also, cGAS-STING signaling genes, including cGAS, STING1, and downstream factors STAT1 and CCL5, were upregulated in BRCA1-mutant tumor cells (Fig. 5H). This observation was validated through immunofluorescence staining experiments on patient tumor tissue sections (Fig. 5I-J).”

      (6) Line 599. "CD8+ Trm cells were more abundant in BRCA1-mutant sample, whereas CD4+ Trm cells were higher in BRCA2-mutant sample". This part is also recommended to be validated using immunofluorescence or more rigorous flow cytometry analyses.

      We sincerely appreciate this insightful suggestion. To address this, we performed immunofluorescence staining to quantify the abundance of CD8<sup>+</sup> and CD4<sup>+</sup> Trm cells in BRCA1- and BRCA2-mutant tissues. We have included these results in the revised manuscript.

      Results section, page 22, lines 614-617: We identified two tissue-resident memory T cell (Trm) subsets, CD8<sup>+</sup> Trm and CD4<sup>+</sup> Trm, both predominantly derived from tumor tissues (Fig. 6B). “Interestingly, our analysis revealed that CD8<sup>+</sup> Trm cells were more abundant in BRCA1-mutant tumor, whereas CD4<sup>+</sup> Trm cells were more abundant in BRCA2-mutant tumor (Fig. 6B-D, Fig. S7D, and Fig. S8A-B).”

      (7) Line 643-676. The authors identified four risk genes associated with BRCA1 mutations-S100A10, LDHA, MYL12A, and GAPDH; however, MYL12A was not validated in the subsequent in vitro experiments. The authors state that "S100A10 can promote cancer metastasis by recruiting MDSC cells, and increased LDHA activity contributes to tumor immune escape." However, because immune cells were not included in the in vitro assays, these results instead suggest that these genes may directly suppress tumor cell proliferation.

      We thank the reviewer for this insightful observation. Our intention was not to suggest that the reduction in proliferation observed in our in vitro assays was caused by the disruption of immune cell recruitment or immune escape pathways. As the reviewer correctly points out, those mechanisms are irrelevant in a system lacking immune cells. Our results showing that "Knockdown of S100A10, LDHA, and GAPDH reduced LUAD cell proliferation in vitro (Fig. 7D-E)" strongly suggest a direct, cell-autonomous role for these genes in regulating LUAD cell growth. For MYL12A, an existing study has shown that BRCA1 transcriptionally regulates this gene, which is involved in breast tumorigenesis (PMID: 12032322). In view of the characteristics of MYL12A in lung cancer, we will conduct in-depth in vitro and in vivo validation experiments in future studies.

      (8) Line 677. The authors should emphasize the limitations arising from the small sample size and the lack of in vivo validation models in the Discussion section.

      Thank you for highlighting these important limitations. We agree that the small sample size and the lack of in vivo validation are significant limitations of the current study. We have explicitly addressed these points in the Discussion section (page 27, lines 740-750) to ensure the interpretation of our data is appropriately qualified and to provide transparency regarding the scope of our conclusions.

      Discussion section, page 27, lines 740-750: “Although we included both tumor tissues and matched paracancerous and blood samples, the sample size remains modest, which may limit the statistical power and generalizability of our findings. Therefore, our results should be interpreted as preliminary, and further studies with larger, independent cohorts are required to validate these observations. Single-cell RNA-seq and TCR-seq analyses in this study provide high-resolution insights into the cellular and clonal dynamics of the TME, the functional validation of key mechanisms remains largely correlative. While our in vitro experiments provide valuable mechanistic insight, the lack of in vivo validation, which cannot fully recapitulate the complex TME. Future studies utilizing murine models or patient-derived organoids are essential to establish causal relationships and elucidate the underlying molecular pathways.”

      Minor:

      (1) Line 163: cell/μl should be corrected to cells/μL.

      Thank you for catching this. We have corrected it in the revised manuscript (Methods section, page 7, line 165).

      (2) Line 388: Please clarify how the HRD score, tumor mutation burden, and neoantigen load were calculated.

      We thank the reviewer for this request for clarification. In the revised manuscript, we have expanded the Methods section (page 5, lines 117-121) to provide a detailed description of how these metrics were calculated. The HRD score was calculated as the unweighted sum of loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale state transitions (LST). Tumor mutation burden (TMB) was defined as the total number of somatic nonsynonymous mutations per megabase of the exome captured by the sequencing panel. Neoantigen load was predicted by NetMHCpan using the patient's HLA typing and the identified somatic mutations. The data for these three indicators were all obtained from a previous study (PMID: 29628290). We believe these additions provide the necessary transparency and reproducibility for our study.

      Methods section, page 5, lines 117-121: The HRD score was determined by summing specific genomic alterations, including loss of heterozygosity (LOH), large-scale state transitions (LST), and telomeric allelic imbalances (TAI). “Tumor mutation burden (TMB) was defined as the total number of somatic nonsynonymous mutations per megabase of the exome captured by the sequencing panel. Neoantigen load was predicted by NetMHCpan using the patient's HLA typing and the identified somatic mutations.”
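
      The two arithmetic definitions above can be written out explicitly. This is a sketch of the stated formulas only, with hypothetical function names and made-up input values. It does not reproduce how the component scores were derived, which the response attributes to a previous study.

```python
def hrd_score(loh, tai, lst):
    """HRD score as the unweighted sum of the LOH, TAI, and LST component scores."""
    return loh + tai + lst


def tumor_mutation_burden(n_nonsynonymous, captured_bases):
    """Somatic nonsynonymous mutations per megabase of captured exome.

    n_nonsynonymous: count of somatic nonsynonymous mutations.
    captured_bases:  size of the captured region in base pairs.
    """
    if captured_bases <= 0:
        raise ValueError("captured_bases must be positive")
    return n_nonsynonymous / (captured_bases / 1_000_000)


# Made-up inputs, for illustration only.
example_hrd = hrd_score(loh=10, tai=12, lst=20)
example_tmb = tumor_mutation_burden(n_nonsynonymous=150, captured_bases=30_000_000)
```

      Because TMB is normalized by the captured region, values are only comparable between samples when the capture size reflects the panel or exome kit actually used.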

      (3) Line 421: BRCA12 should be corrected to BRCA2.

      Thank you for your detailed review. We have revised it.

      (4) The order of Figures 7D and 7E should be reversed.

      Thank you for your insightful comment. According to your suggestion, we reversed the order of Figures 7D and 7E in the revised manuscript.

      Special thanks to you for your kind comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the emerging role of fungal pathogens in colorectal cancer and provides mechanistic insights into how Candida albicans may influence tumor-promoting pathways. While the work is potentially impactful and the experiments are carefully executed, the strength of evidence is limited by reliance on in vitro models, small patient sample size, and the absence of in vivo validation, which reduces the translational significance of the findings.

      Strengths:

      (1) Comprehensive mechanistic dissection of intracellular signaling pathways.

      (2) Broad use of pharmacological inhibitors and cell line models.

      (3) Inclusion of patient-derived organoids, which increases relevance to human disease.

      (4) Focus on an emerging and underexplored aspect of the tumor microenvironment, namely fungal pathogens.

      Weaknesses:

      (1) Clinical association data are inconsistent and based on very small sample numbers.

      We thank the reviewer for this important comment. We examined four colorectal tumors (two early-stage and two late-stage) and observed Candida albicans in the two late-stage samples but not in the early-stage ones. This result is consistent with the large-scale TCGA data, in which Candida albicans is mainly detectable in late-stage colorectal tumors (Fig. 1c), and suggests that Candida albicans contributes to colorectal cancer progression, which is the main research direction of this study.

      (2) No in vivo validation, which limits the translational significance.

      We appreciate the reviewer’s concern regarding the lack of in vivo validation. While we recognize the value of in vivo models, our current institutional biosafety protocols and animal facility designations do not support the handling of pathogenic microorganisms like Candida albicans in live infection models. Consequently, these experiments were beyond the immediate technical scope of this study. To validate the findings from cell lines, we instead performed Candida albicans infection experiments using organoids derived from colorectal cancer patients (Fig. 7). We have revised the Discussion section to acknowledge this limitation and clarify that the current work serves as a mechanistic study based on in vitro and ex vivo systems. We have also incorporated references to recent studies demonstrating the in vivo effects of C. albicans in tumor models, which support the biological relevance of our findings.

      (3) Species- and cell type-specificity claims are not well supported by the presented controls.

      We thank the reviewer for this insightful comment. We agree that our current dataset does not warrant definitive conclusions regarding species- or cell type-specificity. Accordingly, we have tempered our claims throughout the manuscript, describing the observed effects as context-dependent across different epithelial models. Specifically, we observed differential responses among the cell lines and epithelial systems evaluated, suggesting variability rather than strict specificity. Furthermore, the Discussion has been expanded to address potential underlying factors, such as variations in EGFR expression levels and other signaling determinants. We have also added a dedicated section to acknowledge this limitation and emphasize the need for future systematic investigations using a more diverse array of fungal species and cell models.

      (4) Reliance on colorectal cancer cell lines alone makes it difficult to judge whether findings are specific or general epithelial responses.

      We appreciate the reviewer’s thoughtful concern. Although most of our mechanistic experiments were performed in colorectal cancer cell lines, we also evaluated our findings across a broader range of epithelial models, including normal human colon-derived organoids and the breast epithelial cancer line MCF7 (Fig. 8). Neither model exhibited HIF-1α activation upon C. albicans exposure, supporting the view that the hypoxia response we observed might not be universal. Interestingly, the observed response in non-colorectal epithelial cancer lines (e.g., HCC1937, NUGC-3) suggests that this mechanism is not strictly confined to CRC. Based on these observations, we propose that the specificity is likely related to EGFR levels but may involve additional epithelial determinants, which we aim to investigate in future work.

      Reviewer #2 (Public review):

      The authors in this manuscript studied the role of Candida albicans in Colorectal cancer progression. The authors have undertaken a thorough investigation and used several methods to investigate the role of Candida albicans in Colorectal cancer progression. The topic is highly relevant, given the increasing burden of colon cancer globally and the urgent need for innovative treatment options. However, there are some inconsistencies in the figures and some missing details in the figures, including:

      (1) The authors should clearly explain in the results section which patient samples are shown in Figure 1B.

      We thank the reviewer for pointing out this omission. We apologize for the lack of clarity in the initial submission. The patient samples shown in Figure 1B are from patients with stage III CRC. We have revised the manuscript to explicitly state this information in the legend for Figure 1B to ensure better clarity for the reader.

      (2) What do a, ab, b, b written above the bars in Figure 1F represent? Maybe authors should consider removing them, because they create confusion. Also, there is no explanation for those letters in the figure legend.

      We thank the reviewer for this helpful comment. The letters above the bars represent statistical groupings from post-hoc multiple-comparison tests (a standard convention used after ANOVA or similar analyses): bars sharing the same letter are not significantly different, whereas bars with no letter in common are significantly different. We chose this letter-based system over asterisks to avoid the visual clutter and potential confusion that often arise from numerous pairwise comparisons; therefore, we will retain the letter-based grouping. In the revised manuscript, we have explicitly defined this notation in the figure legend for ease of interpretation.
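
      The reading rule for this letter notation can be made concrete with a small helper. This sketch only illustrates the convention using the labels the reviewer quotes (a, ab, b, b); the function name and bar names are hypothetical, and no result from Figure 1F is implied.

```python
def significantly_different(label_a, label_b):
    """Interpret letter groupings: two bars differ only if they share no letter."""
    return not (set(label_a) & set(label_b))


# Labels as quoted by the reviewer, attached to hypothetical bar names.
labels = {"bar1": "a", "bar2": "ab", "bar3": "b", "bar4": "b"}

bar1_vs_bar2 = significantly_different(labels["bar1"], labels["bar2"])  # share "a"
bar1_vs_bar3 = significantly_different(labels["bar1"], labels["bar3"])  # no shared letter
bar2_vs_bar3 = significantly_different(labels["bar2"], labels["bar3"])  # share "b"
```

      Under this rule, a bar labeled "ab" is not significantly different from either an "a" bar or a "b" bar, even though those two bars differ from each other.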

      (3) The authors should submit all the raw images of Western blot with appropriate labels to indicate the bands of protein of interest along with molecular weight markers.

      We appreciate the reviewer’s request for raw data. We have now included the raw images of the Western blots in the supplementary materials, with clear annotations of the bands corresponding to the proteins of interest as well as molecular weight markers.

      (4) The authors should do the quantification of data in Figure 2d and include it in the figure.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have quantified the subcellular localization of HIF-1α in PBS-treated versus C. albicans–infected cells shown in Figure 2d. The quantification results are shown in the following figure and provided in Supplementary Figure 3c.

      (5) In Figure 2h, the authors should indicate if the quantification represents VEGF expression after 6h or 12h of C. albicans co-culture with cells.

      We thank the reviewer for pointing this out. We have updated Figure 2h to specify that the quantification represents VEGF expression after 12 hours of co-culture with Candida albicans.

      (6) In Figure 2i, quantification of VEGF should be done and data from three independent experiments should be submitted. The authors should also mention the time point.

      We thank the reviewer for this helpful comment. In the revised manuscript, we have quantified VEGFA fluorescence intensity based on three independent experiments (the other two replicates are shown in Author response image 1). The corresponding time point (12 hours of co-culture) is now clearly indicated in the figure legend.

      Author response image 1.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Some of the statements regarding Candida albicans and CRC progression in Figure 1 may be overstated (since the association with stage/survival may be cross-confounded). That is, analyses of survival ought to be stage-adjusted.

      We thank the editor for this important comment. We agree that the association between C. albicans and patient survival may be influenced by tumor stage as a confounding factor. In the revised manuscript, we have moderated our statements regarding the clinical associations and clarified the limitations of these analyses, now presenting these findings as correlative observations rather than causal relationships. We have also noted in the Discussion that stage-adjusted analyses would be required to more rigorously assess the independent contribution of C. albicans to patient outcomes.

      (2) Fan et al. (citation 26) is incorrectly referenced. The paper states that Bacteroides fragilis does not affect Candida albicans colonization. Instead, Bacteroides thetaiotaomicron was shown to reduce C. albicans colonization, but B. fragilis was used in the current study as a control.

      We thank the editor for pointing out this error, and we have corrected the citation accordingly. As noted, the referenced study indicates that Bacteroides thetaiotaomicron, rather than Bacteroides fragilis, reduces C. albicans colonization. We selected B. fragilis as a control in this study because it is a prevalent gut commensal and has been previously implicated in colorectal cancer progression. Although prior reports suggest that B. fragilis does not significantly affect C. albicans growth, we observed that co-culture with B. fragilis led to a noticeable inhibition of C. albicans growth under our experimental conditions. This discrepancy may reflect differences in experimental settings. We believe these findings provide additional context for the complex interactions between gut microbiota and fungal pathogens.

      (3) The link between hypoxia signaling is interesting, but for the most part, these experiments were largely done in normoxic conditions, while the colon is generally hypoxic. So I would have encouraged the authors to consider testing the effect of C. albicans presence/absence under low-oxygen conditions, which may be more physiologically relevant.

      We thank the editor for this insightful suggestion. We fully agree that evaluating the effects of C. albicans under hypoxic or anaerobic conditions would be highly relevant to the physiological tumor microenvironment. When we attempted to assess the impact of C. albicans on cell migration under hypoxic conditions, we observed that tumor cells exhibited markedly accelerated migration and proliferation, resulting in near-complete wound closure within 24 hours in the control groups. This limited our ability to reliably detect differences between conditions using standard migration assays. We agree that in vivo models may provide a more physiologically relevant context to address this question, and we will pursue this direction in future studies when appropriate experimental conditions become available.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1 inconsistencies: In Figure 1C, there is no significant difference in C. albicans detection between stage II and stage III CRC patients. In fact, more patients in stage II appear positive, which is inconsistent with Figures 1A and 1B. For Figures 1A and 1B, the sample size (n=2) is too low to support meaningful conclusions. Please also clarify which stage is represented in Figure 1B.

      We thank the reviewer for this important comment. In the revised manuscript, we have clarified the sample information and explicitly stated that the samples shown in Figure 1b are derived from stage III CRC patients. We have also moderated our conclusions and described these findings as exploratory observations. Regarding the apparent inconsistency between Figure 1C and Figures 1a-b, we consider that this discrepancy may be partly due to the small number of clinical samples analyzed in our study. In addition, the TCGA-based analysis relies on transcriptomic data, whereas our analysis is based on immunohistochemistry (IHC). These methodological differences may also contribute to the observed variation.

      (2) Weak link between clinical and in vitro data: The transition from clinical samples to CRC cell line models feels tenuous. While C. albicans may induce hypoxia signaling, it is unclear whether this is specific to CRC cells or could occur in other epithelial cell types. Some broader testing would help strengthen this link.

      We thank the reviewer for this insightful comment. We agree that reinforcing the bridge between clinical observations and in vitro mechanistic findings, as well as clarifying cell type specificity, is important for a comprehensive study. In the revised manuscript, we have clarified that the clinical data provide correlative evidence, while the mechanistic insights are derived from controlled in vitro systems. To address the issue of cell type specificity, we have included additional analyses across multiple epithelial cell models (Figure 8). These results indicate that the response to C. albicans is not restricted to colorectal cancer cells but varies across different epithelial contexts.

      (3) Lack of in vivo validation: The mechanistic findings would be substantially strengthened by in vivo data, e.g., murine CRC models. Without this, the translational impact is limited.

      We appreciate the reviewer’s concern regarding the lack of in vivo validation. While we recognize the value of animal models, our current institutional biosafety protocols and facility designations do not support the handling of pathogenic microorganisms like Candida albicans in live infection models. Consequently, these experiments were beyond the immediate technical scope of this study and would be better performed in future studies to validate the mechanisms.

      (4) Figure 8B interpretation: The authors conclude that C. albicans shows the strongest effect on c-Myc and c-Jun activation. However, from the presented blots, the differences compared to other fungi are not obvious. The claim should be toned down or quantified more rigorously.

      We thank the reviewer for this important comment. We agree that the differences in c-Myc and c-Jun activation among fungal species are not sufficiently pronounced to support a strong comparative claim. In the revised manuscript, we have moderated the corresponding statements to avoid overinterpretation.

      (5) Cell type specificity: Since the title emphasizes CRC specificity, the cell line comparison shown in Figure 8 should be moved earlier in the results. This would clarify from the start whether the described mechanisms are CRC-specific or more generalizable.

      We thank the reviewer for this insightful suggestion. We agree that earlier presentation of cell type comparisons would help clarify the scope of the observed effects. We have revised the Results section accordingly: “To evaluate the cell type specificity of this response, we further analyzed additional epithelial cell models, as shown in Figure 8”.

      In this study, we initially identified the effects of C. albicans in colorectal cancer (CRC) cells and therefore focused on establishing the underlying mechanisms in this context. Subsequently, we extended our analysis to additional epithelial cell types to evaluate whether these effects are shared or context-dependent. We believe that this stepwise organization, from detailed mechanistic investigation in CRC cells to broader comparison across cell types, provides a logical and coherent flow for the reader. In the revised manuscript, we have further clarified this rationale in the text to improve readability and interpretation.

      (6) It would be good to use a negative fungi control instead of a PBS control for most of the experiment.

      We thank the reviewer for this valuable suggestion. We agree that a negative fungal control would further strengthen the conclusions. Unfortunately, we were unable to incorporate additional controls during the revision. However, we believe that our comparative analysis across multiple fungal species (Figure 8) partially addresses this issue by demonstrating differential signaling responses. Future studies will incorporate appropriate negative fungal controls to further validate the specificity of these effects.

      (7) It is surprising that the Dectin-1 inhibitor shows a smaller effect compared with the TLR2 inhibitor. This result warrants further explanation, as Dectin-1 is a well-known receptor for C. albicans β-glucans. The authors should clarify whether this difference reflects cell type-specific expression (e.g., low Dectin-1 in CRC cells), ligand accessibility, or pathway redundancy, and discuss how it aligns with existing literature.

      We thank the reviewer for this insightful comment. We agree that the relatively modest effect of Dectin-1 inhibition compared to TLR2 inhibition warrants further consideration. In the revised manuscript, we have expanded the Discussion to address this observation. We propose several possible explanations. First, the expression level of Dectin-1 is relatively low in these epithelial cancer cells, which would limit its functional contribution. Second, differences in ligand accessibility, particularly in the context of fungal cell wall architecture, may influence receptor engagement. Finally, redundancy and cross-talk among pattern recognition receptor pathways may compensate for Dectin-1 inhibition. These observations highlight the context-dependent nature of host–fungal interactions.

      Reviewer #2 (Recommendations for the authors):

      All my comments that need to be addressed are given above and below:

      (1) What do a and b represent in Figure 2f? They should be removed or clearly explained in the figure legend, as they are creating confusion for the audience.

      We thank the reviewer for this comment. The letters indicate statistical groupings from post hoc multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (2) In the figure legend of S3a, the authors mentioned only the Caco2 cell line, whereas in the figure, there are two more cell lines, HCT116 and SW48. The authors should revise the figure legend.

      We thank the reviewer for this comment. We have addressed this point and made the necessary corrections in the revised manuscript.

      (3) The scale bar information is missing for Figure S3b. It should be included.

      We thank the reviewer for this comment. The same scale bar was applied across all images in this panel. We have clarified this in the figure legend.

      (4) In Figure 2e, the HIF-1α level in the Caco2 cells at 24 hr time point is a lot higher compared to the level at the 12-hour time point after C. albicans infection. But in the WB quantification in Figure 2f, the level of HIF-1α is not higher when compared to 12hr. Although it is relative data based on control, authors should check this calculation again for any errors.

      We thank the reviewer for carefully examining the data. We have re-verified the quantification and confirmed that the values represent relative protein levels normalized to the corresponding controls at each time point.

      Because samples from different time points were processed and analyzed separately, direct comparison of absolute protein levels across time points is not appropriate. Therefore, relative quantification within each time point provides a more accurate and representative assessment of HIF-1α changes.
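      To illustrate how normalization within each time point can diverge from the visual impression of raw band intensity, here is a small Python sketch. All intensity values are invented for illustration and are not data from the study; the sketch assumes each time point was run on its own blot with its own control lane.

```python
# Hypothetical densitometry readings (arbitrary units), one blot per time point.
blots = {
    "12h": {"control": 100.0, "infected": 250.0},
    "24h": {"control": 400.0, "infected": 800.0},
}

# Relative level = infected signal divided by the control from the same blot.
relative = {
    time_point: lanes["infected"] / lanes["control"]
    for time_point, lanes in blots.items()
}

for time_point, fold in relative.items():
    raw = blots[time_point]["infected"]
    print(f"{time_point}: raw infected = {raw:.0f}, fold change vs control = {fold:.2f}")
```

      In this made-up case, the 24-hour infected band is brighter in absolute terms, yet its fold change over the matched control is lower than at 12 hours, because the control signal on that blot is also higher.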

      (5) Line 125-127: This sentence should be rephrased.

      We thank the reviewer for this comment. We have revised the corresponding section to improve clarity.

      (6) PHD-mediated ubiquitination is the primary mechanism regulating HIF-1α protein stabilization. The authors should add an appropriate reference here.

      We thank the reviewer for this suggestion. An appropriate reference has been added in the revised manuscript to support this statement.

      (7) The authors claim that they observed that although the total level of HIF-1α increased, the ratio of its ubiquitinated form to total HIF-1α decreased. The authors should clearly indicate in the figure which protein band from the WB image was used for quantification from Figure S3c, which resulted in the graph presented in Figure S3d.

      We thank the reviewer for this suggestion. We have revised the figure legend to improve clarity.

      (8) In Figure 3a, there are some faint grey color lines. These graphs should be reformatted.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (9) What do a and b in the bar graphs shown in Figure 3d,e; S4d,e,f represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (10) What do a,b,c in the bar graphs shown in Figure 4c,d,h represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (11) There are some faint grey lines in the bar graphs shown in Figure 4g. These lines should be removed.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (12) Grey line below HIF-1α in the graph shown in Figure h should be removed.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (13) The authors wrote - notably, despite treatment with AG1478, the levels of HIF-1α and c-MYC in C.albicans-infected cells remained significantly elevated compared to the uninfected control group (Figure 4b). There is no quantification for c-MYC. Statistics for HIF-1α quantification are missing. These should be added.

      We thank the reviewer for this comment. We have quantified HIF-1α levels, and the results are presented in Figure 4d, including statistical analysis.

      (14) There is no data for knockdown of MYD88, Dectin-1, and SYK as mentioned in the text lines 222-224. The authors should explain this discrepancy.

      We thank the reviewer for this important comment. MYD88, Dectin-1, and SYK are expressed at relatively low levels in HCT116 cells, and our preliminary qPCR analyses indicated that it would be technically challenging to achieve reliable and quantifiable knockdown of these targets. Nevertheless, previous studies have reported that Dectin-1 can be present on the surface of epithelial cells, suggesting that it may still contribute to fungal recognition even at low expression levels. Therefore, given the technical constraints of gene knockdown in this specific context, we reasoned that pharmacological inhibition would provide a more robust approach to suppress this pathway.

      (15) In line 227 in the results section it should be Figure S5c-e instead of Figure S5b-e. Figure S5b results do not match the results that are being explained here.

      We thank the reviewer for this comment. We have corrected the typos in the revised manuscript.

      (16) What do a,b,c in the bar graphs shown in Figure 5 a,b,i represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (17) Was the experiment in Figure 5e done in triplicate? If not, it should be done in triplicate and quantified. The scale bar information is missing for IF images shown in Figure 5e. It should be added.

      We thank the reviewer for this comment. The experiments were independently repeated three times, and the quantification shown in Figure 5g represents the combined results from these biological replicates. The same scale bar was applied across all images in this panel. We have clarified this in the figure legend.

      (18) Lines 273-274 in the results section: Als3 and Hwp1 are known to be involved in the adhesion of C. albicans to epithelial cells, while Ece1 encodes the virulence factor candidalysin. References should be added.

      We thank the reviewer for this suggestion. We have added a reference in the revised manuscript to support this statement.

      (19) What do a and b in the bar graphs shown in Figures 6 f,h,r represent? Since these letters are confusing and are present in several figures, they should be either deleted or clearly explained in the figure legends or text.

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (20) What do a,b, and c in the bar graphs shown in Figure S8 b represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (21) Scale bar should be added in Figure S9.

      We thank the reviewer for this helpful comment. We have addressed this point and made the necessary corrections in the revised manuscript.

      (22) What do a and b, in the bar graphs shown in Figure S11 represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from post hoc multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (23) Were the organoids used in this paper characterized? If yes, how? Also, it should be mentioned in the appropriate section in the manuscript.

      The organoids were not characterized in this study; they were cultured from patient samples according to our previously published protocol (He et al., Cell Stem Cell 2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors analyze connectome data from Drosophila and compare the physical wiring with functional connectivity estimated from calcium imaging data. They quantify structure-function relationships as a correlation of the two connectivity modalities. They report correlations roughly comparable to what has been described in the literature on sc/fc relationships in mammalian connectome data at the meso-scale. They then repeat their analysis, focusing on segregated versus unsegregated synapses. They derive separate connectomes using one or the other class of synapse. They show differential contributions to the sc/fc relationships by segregated versus unsegregated synapses.

      Strengths:

      There is nice synthesis of multimodal imaging data (Ca and EM data from flies and meso-scale data from human and marmoset).

      Thank you very much for your comments.

      Weaknesses:

      (1) The paper is written in an unusual way. The introduction intermingles results with background, making it hard to figure out what precisely is being tested.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) There are also major methodological gaps. Though the mammalian connectomes are used as a point of reference, no descriptions of their origins or processing are included.

      The reanalysis of marmoset data is presented in Ext. Data Figure. However, as pointed out by other reviewers, the data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      (3) A major weakness stems from the actual calculation of the sc/fc correlation. In general, SC is sparse. In the case of the EM connectomes, it is *exceptionally* sparse (most neural elements are not connected to one another). The authors calculated sc/fc coupling by correlating the off-diagonal elements of sc (the logarithm of its edge weights) and fc matrices with one another. The logarithmic transformation yields a value of infinity for all zero entries. The authors simply impute these elements with 0. This makes no sense and, depending on whether these zero elements are distributed systematically versus uniformly random, could either inflate or deflate the sc/fc correlations. Care must be taken here.

      Thank you for pointing this out. As you mentioned, the SC matrix becomes increasingly sparse as the number of ROIs increases (Ext. Data Fig.2-2b). In contrast, the FC matrix may contain non-zero values even when there are no direct connections between ROIs (indirect connections). We investigated this issue. To handle the same problem, Honey et al. (2009) [6] resampled the elements of the SC matrix in rank order using a Gaussian distribution and calculated the FC-SC correlation between this resampled SC and FC.

      Ext. Data Fig.2-2a shows a comparison between resampled SC (Honey et al.’s method) and log-scaled SC (our method). Up to 200 ROIs, the proportion of SC matrix elements that are zero is less than 10% (Ext. Data Fig.2-2b), so few elements require zero replacement after the log transform. In this situation, Gaussian resampling tends to increase the correlation coefficient (Ext. Data Fig.2-2a). On the other hand, with 10,000 ROIs, where sparsity is extremely high, the proportion of SC matrix elements that are zero exceeds 70%. In this situation, these zero elements (70-80% of the matrix) are randomly assigned values from the lower end of the Gaussian distribution, which lowers the correlation coefficient (Ext. Data Fig.2-2a, c, d). For these reasons, we believe that log-scaled SC is less biased than resampling with a Gaussian distribution, and we conclude that using log-scaled SC as is in this paper is reasonable. Log-scaled SC has also been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. To show that we have considered this issue, Ext. Data Fig.2-2 has been added to the manuscript.
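      The two treatments of zero entries discussed above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the code used in the study or by Honey et al.: the toy matrices, the helper names, the unit-variance Gaussian, and the random tie-breaking among equal SC values are all assumptions.

```python
import math
import random

def off_diagonal(matrix):
    """Flatten a square matrix, skipping the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]

def log_scale(values):
    """Log-transform edge weights; zero weights (log undefined) are set to 0.
    A weight of exactly 1 also maps to 0 under this rule."""
    return [math.log(v) if v > 0 else 0.0 for v in values]

def gaussian_rank_resample(values, seed=0):
    """Replace values with sorted Gaussian draws so that rank order is kept.
    Ties, such as the many zeros, are broken at random (an assumption)."""
    rng = random.Random(seed)
    draws = sorted(rng.gauss(0.0, 1.0) for _ in values)
    order = sorted(range(len(values)), key=lambda i: (values[i], rng.random()))
    resampled = [0.0] * len(values)
    for rank, idx in enumerate(order):
        resampled[idx] = draws[rank]
    return resampled

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy 3-ROI matrices (invented values): SC holds synapse counts, FC correlations.
sc = [[0, 10, 0], [10, 0, 100], [0, 100, 0]]
fc = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.6], [0.1, 0.6, 1.0]]

sc_vals, fc_vals = off_diagonal(sc), off_diagonal(fc)
zero_fraction = sum(v == 0 for v in sc_vals) / len(sc_vals)
r_log = pearson(log_scale(sc_vals), fc_vals)
r_gauss = pearson(gaussian_rank_resample(sc_vals), fc_vals)
print(f"zero fraction = {zero_fraction:.2f}, r(log SC) = {r_log:.3f}, r(Gaussian SC) = {r_gauss:.3f}")
```

      Because every zero is mapped to the same value in the log-scaled version but to a random low Gaussian draw in the resampled version, the gap between the two correlations is expected to grow with the zero fraction, which is the behaviour the response describes for high ROI counts.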

      (4) Further, in constructing the segregated versus unsegregated connectomes, they use absolute thresholds for collecting synapses. It is unclear, however, whether similar numbers of synapses were included in both matrices. If the number is different, that might explain the differential relationship with fc; one matrix has more non-zero entries (and as noted earlier, those zero entries are problematic).

      Author response image 1.

      a, Sparsity rate histogram of the SC matrix with cPPSSI (0-0.1) and the subsampled null SC matrices, corresponding to Fig. 4e. The red line indicates the sparsity rate of the SC matrix with cPPSSI (0-0.1). b, Sparsity rate histogram of the SC matrix with cPPSSI (0.9-1) and the subsampled null SC matrices, corresponding to Fig. 4f. c, Sparsity rate histogram of the SC matrix with reciprocal synapses (≤2 μm) and the subsampled null SC matrices, corresponding to Fig. 4i.

      Thank you for pointing this out. The number of synaptic connections in the SC matrix differs greatly between the matrices extracted with cPPSSI (0-0.1) and cPPSSI (0.9-1) (Fig. 4e, f). However, when 99 null SC matrices were generated for each and compared with the cPPSSI-extracted matrices, the FC-SC correlation of the extracted matrices was significantly higher or lower than that of the nulls. Because the sparsity rates of the original null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the sparsity rates of the extracted SCs (red lines) fall within the distribution of the null-generated matrices. This figure was added as Ext. Data Fig.4-5, and the main text was also revised. The sparsity rates are 0.52 for cPPSSI (0-0.1) and 0.123 for cPPSSI (0.9-1). Since both cases involve comparisons with null SC matrices that have closely matched sparsity rates, we believe comparison using log-scaled SC is appropriate.
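      A comparison against 99 null matrices of matched sparsity can be sketched as below. The exact test used in the study is not stated here, so the rank-based empirical p-value is an assumption, as are the helper names and all null values; only the observed sparsity rate of 0.52 is taken from the text.

```python
def within_null_range(observed, null_values):
    """Check that the observed value lies inside the range spanned by the nulls."""
    return min(null_values) <= observed <= max(null_values)

def empirical_p(observed, null_values, higher=True):
    """One-sided empirical p-value: share of values at least as extreme as the
    observed one, counting the observed value itself."""
    if higher:
        extreme = sum(v >= observed for v in null_values)
    else:
        extreme = sum(v <= observed for v in null_values)
    return (extreme + 1) / (len(null_values) + 1)

# Invented null distributions, 99 matrices each.
null_sparsity = [0.50 + 0.0005 * i for i in range(99)]
null_correlation = [i / 1000 for i in range(99)]

observed_sparsity = 0.52      # reported for cPPSSI (0-0.1)
observed_correlation = 0.2    # hypothetical FC-SC correlation

print(within_null_range(observed_sparsity, null_sparsity))
print(empirical_p(observed_correlation, null_correlation))
```

      With 99 nulls, the smallest attainable one-sided p-value under this scheme is 1/100, reached when the observed correlation exceeds every null value.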

      (5) There was also considerable text (in the results) describing the processing of the Ca data. In this section, the authors frequently refer to some pipelines as "better" or "worse" (more or less effective). But it is not clear what measures they adopted to assess the effectiveness of a pipeline.

      The detailed registration flow for the Ca data is described in “Preprocessing of D. melanogaster calcium imaging data” in the Materials and Methods section (Ext. Data Fig. 1-1a). We then investigated the optimal nuisance factor removal methods and smoothing size, using both correlation analysis (FC-SC correlation) and ROC curve analysis (FC-SC detection). Since signals are assumed to be transmitted between regions along SC, we treated SC as the ground truth and considered a pipeline with higher FC-SC correlation and higher FC-SC detection performance to be better. We have updated the Results section to include this point.
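      The detection criterion can be illustrated with a minimal Python sketch in which each ROI pair is labelled by whether a structural connection exists and scored by its FC value; the area under the ROC curve then summarizes how well FC recovers SC. The binarization rule (SC > 0), the use of raw FC as the score, and the toy values are assumptions, not the study's exact procedure.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores (equivalent to the Mann-Whitney U statistic); ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy off-diagonal values for six ROI pairs (invented).
sc_weights = [10, 0, 10, 100, 0, 100]
fc_values = [0.3, 0.1, 0.3, 0.6, 0.1, 0.6]

# Ground truth: a pair is "connected" if its SC weight is non-zero.
connected = [w > 0 for w in sc_weights]
auc = roc_auc(connected, fc_values)
print(f"FC-SC detection AUC = {auc:.2f}")
```

      An AUC of 0.5 would mean FC carries no information about which pairs are structurally connected, so under the stated criterion a pipeline is preferred when it moves this value closer to 1.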

      Reviewer #2 (Public review):

      Summary:

      Okuno et al. investigate the structure-function relationship in the fruit fly Drosophila melanogaster. To do so, they combine published data from two recent synapse-level connectomes ("hemibrain" and "FlyWire") with a dataset comprising functional whole-brain calcium imaging and behavioural data. First, they investigate the applicability of fMRI pre-processing techniques on data from calcium imaging. They then cross-correlate this pre-processed functional data with structural data extracted from the connectomes, including a comparison to humans. The authors proceed to compare the two connectomes and find significant differences, which they attribute to differences in the accuracy of the synapse detections. Next, they present a novel algorithm to quantify whether neurons are segregated (pre- and postsynapses are spatially separate) or unsegregated (pre- and postsynapses are mixed). Using this approach, they find that unsegregated neurons may contribute more to function than segregated neurons. Applying a general linear model to the functional dataset suggests that activity in two brain areas (Wedge and AVLP) is suppressed during walking. The authors identify a GABAergic neuron in the connectome that could be responsible for this effect and suggest it may provide feedback to the fly's "compass" in the central complex.

      Strengths:

      The study tackles a relevant question in connectomics by exploring the relationship between structural and functional connectivity in the Drosophila brain. The authors apply a range of established and adapted analytical methods, including fMRI-style preprocessing and a novel synaptic segregation index. The effort to integrate multiple datasets and to compare across species reflects a broad and methodical approach.

      Thank you very much for your comments.

      Weaknesses:

      The manuscript would benefit from a clearer overarching narrative to unify the various analyses, which currently appear somewhat disjointed. While the technical methods are extensive, the writing is often convoluted and lacks crucial details, making it difficult to follow the logic and interpret key findings. Additionally, the conclusions are relatively incremental and lack a compelling conceptual advance, limiting the overall impact of the work.

      (1) The introduction currently contains a number of findings and conclusions that would be better placed in the results and discussion to clearly delineate past findings from new results and speculations.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) The narrative would benefit greatly from some clear statements along the lines of "we wanted to find out X, therefore we did Y".

      Thank you for pointing this out. In many biology papers, the problem is clear, but as you say, this paper starts by comparing the fine-scale SC and FC of flies, which makes the problem unclear and the results appear sporadic. We have revised the structure of the introduction.

      (3) More concise terminology would be helpful. For example, the connectomes are currently referred to as either "hemibrain", "FlyEM", "whole-brain", or "FlyWire".

      Thank you for pointing this out. We revised the manuscript to separate "hemibrain" and "whole-brain" from "connectome." "hemibrain" and "whole-brain" retain their original meanings.

      (4) The abstract claims "a new, more robust method to quantify the degree of pre- and post-synaptic segregation". However, the study fails to provide evidence that this method is indeed more robust than existing methods.

      We apologize that this information was not included in the main figures or the Results section. It was presented in the Methods section and Ext. Data Fig. 4-1i, j. We have moved the related text from the Methods to the Results section.

      (5) The authors define unsegregated neurons as having mixed pre- and postsynapses in the same space. However, this ignores the neurons' topology: a neuron can exhibit a clearly defined dendrite with (mostly) postsynapses and a clearly defined axon with (mostly) presynapses, which then occupy the same space. This is different from genuinely unsegregated neurons with no distinct dendritic and axonal compartments, such as CT1.

      Thank you for pointing this out. Regarding this point, we think it is difficult to discuss the neuron’s topology in this paper. We defined PPSSI and demonstrated only that unsegregated neurons with mixed pre- and post-synapses are scattered throughout the brain (Ext. Data Fig. 4-2e). Further research is needed to determine the relationship with morphology in individual neurons.

      One possibility is that inhibitory, non-spiking unsegregated neurons, such as the CT1 amacrine cell [24, 27, 28] or interneurons in the antennal lobe [29], are widely used throughout the brain (WAGN is also a candidate). Grimes et al. [34] wrote, “The retina is a beautiful example of a neural network that optimizes signal processing capacity while minimizing cellular cost.” To maintain the signal dynamic range, A17 amacrine cells must balance the number of processing units against wiring costs. If one unit equaled one cell, an enormous number of cell bodies would be required, reducing the number of processing units per volume and increasing the energy cost during development. To avoid this, they proposed arranging units capable of parallel processing within a single cell, thereby maximizing the number of processing units per volume while limiting wiring costs.

      Signal bursts might also occur in the central nervous system (CNS), in which case CNS neurons also require dynamic range adjustment. The concept of optimizing processing units per volume is highly compelling and is thought to apply not only to the retina but throughout the entire brain.

      (6) It is not entirely clear where the marmoset dataset originates from. Was it generated for this study? If not, why is there a note in the Ethics Declaration?

      The marmoset data were reported in [10] and were not generated for this study. We therefore removed the Ethics Declaration.

      (7) On the differences between hemibrain and FlyWire: What is the "18.8 million post-synapses" for FlyWire referring to? The (thresholded) FlyWire synapse table has 130M connections (=postsynapses). Subsetting that synapse cloud to the hemibrain volume still gives ~47M synapses. Further subsetting to only connections between proofread neurons inside the hemibrain volume gives 19.4M - perhaps the authors did something like that? Similarly, the hemibrain synapse table contains 64M postsynapses. Do the 21M "FlyEM" post-synapses refer to proofread neurons only? If the authors indeed used only (post-)synapses from proofread neurons, they need to make that explicit in results and methods, and account for differences in reconstruction status when making any comparisons. For example, the mushroom body in the hemibrain got a lot more attention than in FlyWire, which would explain the differences reported here. For that reason, connection weights are often expressed as, e.g., a fraction of the target's inputs instead of the total number of synapses when comparing connectivity across connectomic datasets. Furthermore, in Figure 3b, it looks like the FlyWire synapse cloud was not trimmed to the exact hemibrain boundaries: for example, the trimmed FlyWire synapse cloud seems to extend further into the optic lobes than the hemibrain volume does.

      Thank you for pointing this out. FlyEM connectome data version 1.2 was downloaded and used as described in Data Availability. This data is provided in the format defined by https://neuprint.janelia.org/public/neuprintuserguide.pdf, and we extracted neurons and synapses from it.

      The full segmentation comprises 28M segments, including 99,644 Traced (proofread) neurons. The dataset also contains 73M synapses (each pre- or post-synaptic site counted individually), 87M records in synapseSets, and 128M records in synapseSet-to-synapse. Extracting only the post-synapses between Traced neurons gave a total of 21.4M (i.e., connections from Traced neurons to other body fragments, such as Orphans, were excluded).

      The FlyWire dataset (v783) was downloaded from the FlyWire Codex and Zenodo. This dataset contains 139,255 proofread neurons and 54.5M synapses (pre/post pairs), as described in Dorkenwald et al. [13], with 18.8M post-synapses in the regions corresponding to the hemibrain primary ROIs. We have updated the Results and Methods sections in light of your comment.

      In Fig. 3b, these images were created using a mask that extended the boundaries of the hemibrain primary ROIs, making the boundaries unclear. Therefore, we corrected the images in Fig. 3b by adjusting the mask so that the boundaries were properly aligned.
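
      The restriction to post-synapses between Traced neurons described in this response can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the record layout and the names `pre_body`, `post_body`, and `postsynapses_between_traced` are hypothetical and do not correspond to the neuPrint or FlyWire schemas.

```python
def postsynapses_between_traced(connections, traced_ids):
    """Keep only connections whose pre- and post-synaptic bodies are both Traced.

    `connections` is a list of dicts with hypothetical keys "pre_body" and
    "post_body"; `traced_ids` is any iterable of Traced body IDs.
    """
    traced = set(traced_ids)
    return [
        c for c in connections
        if c["pre_body"] in traced and c["post_body"] in traced
    ]


# Toy records (made up for illustration only).
connections = [
    {"pre_body": 1, "post_body": 2},
    {"pre_body": 1, "post_body": 99},  # 99 stands in for an Orphan fragment
    {"pre_body": 3, "post_body": 1},
]
kept = postsynapses_between_traced(connections, traced_ids=[1, 2, 3])
```

      In this toy input, the connection onto body 99 is dropped because that body is not in the Traced set.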

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have a higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution.

      Thank you very much for your comments.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric.

      Thank you for pointing this out. We believe that the FC in C. elegans uses cell body dynamics, which is different from the synaptic population dynamics in a region of fly calcium imaging or fMRI data (BOLD [Blood Oxygenation Level Dependent] signal). The BOLD signal in a region is thought to correspond to the neurovascular coupling of synaptic population dynamics. Furthermore, compartmentalization of a neuron has been observed in C. elegans (Hendricks et al., 2012)*, showing different dynamics across neuron compartments. Thus, the dynamics of the cell body and the dynamics of the synaptic population in other regions are different in C. elegans. We speculate that there is some relationship between FC-SC between regions, because the FC-SC correlation in the fly brain reached r=0.87 with 20 ROIs (Fig. 2d). We believe that this result is different from the cell body dynamics in C. elegans.

      *Hendricks et al., “Compartmentalized calcium dynamics in a C. elegans interneuron encode head movement,” Nature 487, 99-103 (2012)

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. However, here the authors do not perform any analyses with neurotransmitter information.

      A comparison between FC-SC correlation and neurotransmitter type is presented in the Results section. We examined the ratios of neurotransmitter input (Ext. Data Fig. 3-2a) and output (Fig. 3f) in each region and, for each neurotransmitter, the relationship between this ratio and the FC-SC correlation. This revealed significant correlations for acetylcholine (r=0.39, p=0.0013) and GABA (r=-0.25, p=0.046) (Fig. 3g). That is, the higher the percentage of excitatory connections, the higher the FC-SC correlation; conversely, the higher the percentage of inhibitory connections, the lower the FC-SC correlation.

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods-the fact that similar processing steps can be used on time-series data does not make them surprisingly similar, and does not in my view, constitute evidence of "similar design concepts."

      Thank you for pointing this out. As you say, fiber bundles of DTI and EM connectome are completely different. Nevertheless, the fact remains that the FC-SC correlation is high in both the fly and human brains. As mentioned above, both regional signal from calcium imaging and BOLD signal from fMRI are based on synaptic population dynamics. It was estimated that 43% of the energy consumption in the gray matter is due to synaptic activity of neurons (Harris et al., 2012), and the BOLD signal fluctuates greatly due to this activity. Furthermore, synaptic activity is thought to be much faster than the activity of microglia and astrocytes, so the FC of fMRI is thought to mainly capture the regional correlation of synaptic activity. In other words, in both flies and humans, although the size is different, the pre-synaptic activity in one region and the pre-synaptic activity in another region via neural fibers are being compared in a common manner in the form of FC-SC.

      In addition, non-spiking unsegregated neurons exist in mammals as well, such as the amacrine cells of the retina [34], and even pyramidal cells in the neocortex show local mixtures of pre- and post-synapses (Ext. Data Fig. 1-2). If a functional unit is realized by a local compartment within a neuron, as proposed in [34], the fly will be a powerful model organism for investigating such units, and its functional “design concept” may also be relevant to mammals.

      Harris et al., “The Energetics of CNS White Matter,” J. Neurosci., 2012, 32 (1) 356-371

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      Thank you for pointing this out. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections to address your comment.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec* et al, PNAS 2024).

      As you mentioned, functional analysis has limitations in spatial resolution. In particular, the resolution along the Z axis is 4 μm, which is 1,000 times lower than the resolution of electron microscopy data. This makes it difficult to match synaptic activity precisely to a synapse in the structural data. Furthermore, spatial smoothing is applied to functional images to absorb inter-individual variability, so group analyses can provide only blurred results. These are limitations of the methods used in fMRI analysis. Despite these limitations, we applied GLM analysis to walking behavior and observed a clear region of inactivity. This region roughly corresponds to the synaptic cloud of a neuron named WAGN (Fig. 5b and c). This neuron also connects to WPNb and ANs in the connectome data, suggesting that it may be related to walking behavior. This result is merely a screening reference; further biological experiments are needed to pursue this topic.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We should emphasize that the reviewers encouraged revision and resubmission. If the reviewers' comments were to be addressed in full in a revision to strengthen the evidence, this would significantly increase the impact of the findings and the relevance of the work to the fly neuroscience community and to the connectomics field more broadly.

      Thank you very much for your comments.

      Major Issues:

      (1) Structural correlation and functional correlation measure very different aspects of network data, yet a simple correlation between the off-diagonal elements of the two is used. It would be expected that this would not be directly proportional, and it's not clear why this would be a sensible measure. The authors need a better solution for dealing with the zero entries in the SC matrix. Replacing the infinities with zeros and then running the linear regression to get an SC/FC relationship is not appropriate. Even with a better metric, given that both intuition and other studies have shown a weak correlation between FC and SC, using FC-SC correlation as a quality descriptor for other properties is not proper. Furthermore, the authors don't account for neurotransmitter identity in the structural data, which would have strong implications for the relationships between FC and SC.

      Thank you for pointing this out. To investigate this issue we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate is low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the FC-SC relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and Gaussian resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Log-scaled SC has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. When zero replacement is undesirable, using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown). It may be possible to compare various methods, but this is outside the scope of this study and requires further research.

      The C. elegans studies cited by Reviewer #3 showed a weak correlation between FC and SC. However, C. elegans neurons do not fire and exhibit different calcium fluctuations depending on the region (Hendricks et al., 2012). This suggests that the cell body and the various synaptic terminal regions have different FCs, which is consistent with the objective of our study (neuronal compartmentalization). If a functional unit is locally composed of multiple neurons and synapses, SC and FC from that region are expected to show a strong relationship. Larger regions would include multiple functional units, and a relationship between SC and FC would also be found, which is consistent with the results of our study. The C. elegans studies compared the FC of the cell body (one region) with the SC of the whole cell (not the same region), so the two measures do not correspond.

      (2) Synaptic segregation on neurons can be topologically present even if pre- and post-synaptic synapses are present in similar regions of space, as an axon branch and dendrite branch can overlap in space but remain distinct along the arbor. The authors emphasize a region-based definition that does not reflect cellular anatomy. Moreover, the authors do not make an argument for their claim of better robustness of their new synaptic segregation measures.

      Author response image 2.

      Distance calculation for DBSCAN. a, Example synapse pair (black dots) used for the distance calculation. The red line shows the straight-line distance, and the green line shows the morphology-based distance. DBSCAN places the two synapses in the same cluster when using the straight-line distance, but in different clusters when using the morphology-based distance.

      Thank you for pointing this out. We changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy. We updated all related figures (Fig. 4, Ext. Data Figs. 4-1, 4-2, 4-3, and 4-4, and Fig. 5h), as well as the related text in the Results and Methods sections.

      (3) Reviewers found the overall structure of the paper is difficult to follow, with sections appearing disjoint and the aims of different sections not well described. This extended to the paper organization as well, with the introduction not clearly setting up the questions and being distinct from the results. The manuscript would benefit from a clearer overarching narrative to unify the various analyses.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (4) Similarly, there are several descriptions of data and analysis that are unclear or lacking, including the source of the marmoset data and how the FlyWire synapse was subsampled.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      We have updated the Results and Methods sections regarding the extraction of "traced" neurons and synapses in FlyEM connectome data, and the extraction of post-synapses in hemibrain primary ROIs in FlyWire connectome data.

      (5) Comparisons between FlyWire and Hemibrain have shown many similarities and some clear examples of inter-individual variability. There was concern that technical decisions with handling FlyWire synapse sampling were responsible for some of the differences observed between the datasets.

      In response to Reviewer #2's question, we answered that both FlyEM and FlyWire use proofread neurons and their connecting synapses. We also updated Fig. 3b and the Results and Methods sections.

      Reviewer #1 (Recommendations for the authors):

      The paper is written in an unusual way. It would be helpful if the introduction read more like a standard introduction. Describe the relevant background that the reader needs to understand the results that come later. Frame the experiments in terms of a question or hypothesis. Results should be relegated to the results section (or, if you like, a final paragraph that summarizes the findings). They should not be intermingled throughout the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      The authors must be more attentive in terms of how they construct the segregated/unsegregated connectomes. I suggest exploring various thresholds/bins, but also considering proportionality thresholds that match the number of synapses.

      Thank you for pointing this out. As pointed out by other reviewers, we changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy.
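
      The difference between the straight-line and morphology-based distances can be illustrated with a small sketch. It assumes, for illustration only, that the neuron is available as a skeleton graph (`nodes` mapping a node ID to 3D coordinates, `edges` as node-ID pairs) and that each synapse is attached to its nearest skeleton node; these names and the nearest-node simplification are assumptions, not the authors' implementation.

```python
import heapq
import math


def path_lengths(nodes, edges, source):
    """Dijkstra over the skeleton graph; each edge is weighted by its Euclidean length."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        w = math.dist(nodes[a], nodes[b])
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def nearest_node(nodes, point):
    """ID of the skeleton node closest to a synapse position (a simplification)."""
    return min(nodes, key=lambda n: math.dist(nodes[n], point))


def morphology_distance(nodes, edges, syn_a, syn_b):
    """Path length along the skeleton between the nodes nearest to two synapses."""
    a = nearest_node(nodes, syn_a)
    b = nearest_node(nodes, syn_b)
    return path_lengths(nodes, edges, a).get(b, math.inf)


# Toy U-shaped arbor: two branch tips lie close in space but far apart along the arbor.
nodes = {0: (0, 0, 0), 1: (0, 10, 0), 2: (1, 10, 0), 3: (1, 0, 0)}
edges = [(0, 1), (1, 2), (2, 3)]
syn_a, syn_b = (0, 0, 0), (1, 0, 0)

straight = math.dist(syn_a, syn_b)                       # distance through space
along = morphology_distance(nodes, edges, syn_a, syn_b)  # distance along the arbor
```

      For this toy arbor, the two synapses are 1 unit apart in space but 21 units apart along the skeleton. A matrix of such pairwise distances could then be passed to a DBSCAN implementation that accepts precomputed distances (for example, scikit-learn's `DBSCAN` with `metric="precomputed"`).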

      We also considered the sparsity rates of the SC matrices. Because the sparsity rates of the null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices shown in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the extracted SCs fit within the null-generated matrices. This figure was added as Ext. Data Fig. 4-5, and the main text was also revised.

      The authors need a better solution for dealing with the zero entries in the sc matrix. Replacing the infinities with zeros and then running the linear regression to get an sc/fc relationship is not appropriate.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. It may be possible to compare various methods, but this is outside the scope of this study and requires further research.
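
      The log-scaled SC approach with zero replacement discussed in this response can be sketched as follows. This is a minimal illustration under stated assumptions: base-10 logarithm, Pearson correlation over off-diagonal elements, and toy matrices made up for the example; the function names are hypothetical and this is not the authors' code.

```python
import math


def log_scaled_sc(sc):
    """log10 of each synapse count; zero counts (log = -inf) are replaced by 0."""
    return [[math.log10(v) if v > 0 else 0.0 for v in row] for row in sc]


def off_diagonal(m):
    """Flatten a square matrix, skipping the diagonal (self-connections)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fc_sc_correlation(fc, sc):
    """Correlate off-diagonal FC values with off-diagonal log-scaled SC values."""
    return pearson(off_diagonal(fc), off_diagonal(log_scaled_sc(sc)))


# Toy 3-ROI matrices (made up for illustration only).
sc = [[0, 100, 0], [100, 0, 10], [0, 10, 0]]                 # synapse counts
fc = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.4], [0.1, 0.4, 1.0]]     # functional correlations
r = fc_sc_correlation(fc, sc)
```

      Under this convention, a count of 0 and a count of 1 both map to 0, which is one reason the sparsity rate of the SC matrix affects how faithfully the log-scaled values represent the FC-SC relationship.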

      It would be useful to include a description of where the human/marmoset datasets came from. It would be useful to describe the processing of those datasets and whether they're comparable to how the fly data was processed.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      The pre-processing of fly calcium imaging data is described in the Methods section. Unfortunately, this processing method is not comparable to that used in humans/marmosets as it was highly customized.

      The authors report sc/fc correlations for the human/marmoset datasets based on single papers. However, in the human case, especially, the strength of sc/fc correlations is highly variable. Not just based on number/size of parcels, but based on amount of data, processing pipeline, single-subject versus group averaged (incidentally, single-subject sc/fc is much lower than group-averaged, which has big implications for this study, where the fly datasets are, in essence, N=1 studies).

      Yes, there are numerous FC-SC correlation studies. We consider Honey et al. (2009) [6] a highly representative one. It reported r = 0.39 to 0.48 for individual participants with 998 ROIs and r = 0.36 for the group average, which increased to r = 0.53 when absent or inconsistent structural connections were excluded. Thus, single-subject correlations may not be much lower than group-averaged ones. Because the fly SC comes from a single individual (N=1), the FC-SC correlation within the same individual cannot be calculated. We think further research will be necessary.

      Reviewer #2 (Recommendations for the authors):

      Abstract:

      Please introduce the term "ROI"

      Thank you for pointing this out. We have revised the Abstract.

      Introduction:

      (1) On a general note: the introduction reads like an extended abstract (i.e., a mix of results and discussion).

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) Line 43: Does this mean FC-SC correlation is higher in flies but not significantly so? Please clarify.

      We performed a Mann-Whitney U test, and the difference was not significant (p = 0.2667).

      (3) Line 51: The "confidence" score does not indicate the degree of synaptic detection.

      The neuPrint user guide (https://neuprint.janelia.org/public/neuprintuserguide.pdf) states: “confidence - The certainty that an annotated synapse is correct and valid.” Since “degree of synaptic detection” may be difficult to understand, we changed it to “certainty of an annotated synapse.”

      (4) Line 59-61: This statement needs refining: post-synapses do not "receive" neurotransmitters, action potentials aren't conducted along nerve fibres.

      We changed “receive” to “sense.” About “action potentials,” we changed “conduct an action potential” to “graded potentials”, and removed “along nerve fibers.”

      (5) Line 61: calcium activity as detected via GCaMP correlates with (electric) neuronal activity - please cite relevant GCaMP literature here.

      We added F. Helmchen and J. Waters, "Ca2+ imaging in the mammalian brain in vivo," Eur J Pharmacol., vol. 447, pp. 119-129, 2002.

      (6) Line 76: "interconnected" is rather vague; just say "many Drosophila neurons are reciprocally connected".

      Thank you for pointing this out. Lin et al. (2024) performed a motif analysis and found many reciprocal, three-node, and rich-club connections. However, the Introduction was updated and this sentence was removed.

      (7) Line 77: comparing unsegregated vs reciprocal synapses is overly simplistic; these are separate features of the same object - i.e., a synapse can be reciprocal and at the same time be segregated in the presynaptic neuron but unsegregated in the postsynaptic neuron.

      Thank you for pointing this out. As you say, the relationship is complicated. In this paper, we are concerned with the degree of segregation of pre- and post-synapses, and we are looking at the segregation within a neuron. In this case, nearby reciprocal synapses (<=2 μm) are included in unsegregated synapses. We have made a correction to the sentence.

      (8) Line 79: I don't understand how we get from unsegregated synapses to local activity.

      Retinal amacrine cells have extensive unsegregated synapses, which provide local feedback inhibition of burst inputs [34]. We changed the text around these descriptions.

      (9) Line 80: What does "more essential function" mean?

      We removed this sentence.

      (10) Line 85: "as shown earlier": Is this based on results in this study or prior work? See also the general above note on mixing results/discussion into the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (11) Line 85-87: I don't understand how the applicability of certain fMRI analysis methods in turn means that functional activity is locally compartmentalized. Did you mean to say something along the lines of "we applied common fMRI methods which showed functional activity is locally compartmentalized"?

      These sentences discuss the commonality between the fMRI (BOLD) signal and the calcium signal, both of which represent presynaptic population dynamics within a local region (voxel). Furthermore, unsegregated synapses are widespread throughout the fly brain (Ext. Data Fig. 4-2) and can also be observed in human pyramidal cells (Ext. Data Fig. 1-2). Unsegregated synapses suggest local compartment activity [33, 34, 39, 40] and contribute more to functional activity (Fig. 4b). Therefore, the similar trend in FC-SC correlation (Fig. 2d) between humans and flies suggests that both species exhibit localized compartmental activity via unsegregated synapses throughout the entire brain.

      Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      (12) Line 87: Please provide a reference for "common among various species".

      Thank you for pointing this out. Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      Results:

      (1) Line 91-92:

      (a) Please explain where the calcium data came from, how it was generated, etc.

      We added the data source and a reference (Brezovec et al. [14]).

      (b) Please clarify: what registration method?

      This is not simple. Please see the Methods section and Ext. Data Fig.1-1. This is also indicated in the text.

      (c) "calcium image" → "calcium image data"?

      We changed “calcium image” to “calcium imaging data”.

      (d) What is the "FDA template"?

      This is a brain template created by Brezovec et al. [14]. JRC2018 is a well-known brain template, but it was created by immunostaining postmortem brains and did not fit well with calcium imaging data from living flies. Therefore, we used the FDA template.

      (2) Line 93: Please introduce the term "ROI".

      We added “(Region of Interest)” in Line 38.

      (3) Line 94: Ito et al., Neuron (2014) "A systematic nomenclature for the insect brain" is a better reference for Drosophila neuropils; for the hemibrain, the ROIs were generated to match that original atlas

      Thank you for pointing this out. We added a reference.

      (4) Line 95/96: It is unclear what was used as the basis for the k-means/distance-based clustering

      This was because we wanted to investigate whether nuisance factor removal methods are robust, even for such diverse types of ROI. We added this point to the text.

      (5) Line 120ff: I'm not sure how the total number of ROIs is relevant for comparing flies and humans, given (a) the huge difference in brain size and (b) the difference in resolution of the functional data.

      Indeed, the fly brain and the human neocortex are completely different. We are investigating whether there are commonalities between them using a metric called FC-SC correlation. As described in our answer for (11), both the fMRI (BOLD signal) and calcium signal represent presynaptic population dynamics within a local region (voxel). FC represents the synchronization of synaptic activity between regions, and SC represents the structural connectivity of neurons. Both flies and humans showed high SC-FC correlation and showed similar trends (Fig. 2d), so we believe it would be interesting to investigate this phenomenon.

      (6) Line 123: "by contrast" is misleading here since, as you say, there isn't really a difference.

      We changed “by contrast” to “and.”

      (7) Line 141: I'm somewhat worried that the differences between FlyWire and hemibrain synapse counts are due to the issues mentioned above.

      Thank you for the comment, but we are not sure what “the issues mentioned above” refers to.

      (8) Line 148: There is no evidence that any differences in synapse are due to the resolution or anisotropy (as suggested in the introduction).

      We apologize that we don’t have direct evidence for it. We changed this to the sentence “This may be caused by differences in detection accuracy resulting from the resolution of EM scanning, but not to inter-individual variability.”

      (9) Line 155: References "39,45" have no brackets.

      These are not referencing numbers, but brain regions of Brodmann area 39 and 45.

      (10) Line 155-157: I don't think we can infer the composition of brain areas in humans based on a tenuous correlation in flies; this is highly speculative and really should be in the discussion.

      In humans, there are areas with strong and weak FC-SC correlations [8], which may be due to the E-I (Excitatory-Inhibitory) balance of connections. We investigated this possibility by comparing the correlation between neurotransmitters and FC-SC correlations in the fly brain. We slightly changed this sentence.

      (11) Line 159: I find the first 2-3 sentences in this paragraph confusing. Are you saying that you did all these things in the prior results sections, or that you wanted to look at X and therefore you did Y? Maybe there is an issue with the tense here?

      We changed the sentences around this description.

      (12) Line 161: "whole-brain" = FlyWire?

      We changed “whole-brain” to “FlyWire”.

      (13) Line 163: Please explain the "PPSSI" acronym.

      This is now explained on Line 75.

      (14) Line 165: The description of how the cPPSSI was calculated is hard to follow. For example, what's the "fraction of synapse number".

      We changed our sentences around this description to be clearer. The cPPSSI is the degree of segregation within a cluster and is also assigned to each synapse. The PPSSI is then the average of the cPPSSI values of all synapses in a neuron.

      (15) Line 166: Is there a difference between "cPPSSI" and "PPSSI"?

      Yes, there is. Please see our answer for (14).

      (16) Line 167: "The result showed a histogram resembling a normal distribution" → I suggest running a normality test.

      Thank you for pointing this out. We ran a Lilliefors test, which gave p=0.001, indicating a significant departure from a normal distribution. Because there are numerous values with PPSSI=1, the histogram cannot be judged to follow a normal distribution. We therefore changed this description.
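      As an illustration of the kind of check described here, the following is a minimal sketch of a Lilliefors-type normality test using only the Python standard library. It is not the analysis code used for the manuscript: the p-value is approximated by Monte Carlo simulation rather than taken from published tables, and the data are hypothetical values (a bell-shaped bulk plus a spike at 1) chosen only to mimic the situation described above.

```python
import math
import random
import statistics


def lilliefors_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
        gap = max(gap, (i + 1) / n - cdf, cdf - i / n)
    return gap


def lilliefors_test(values, n_sim=500, seed=0):
    """Return (statistic, Monte Carlo p-value) under the null hypothesis of normality."""
    rng = random.Random(seed)
    observed = lilliefors_statistic(values)
    n = len(values)
    exceed = 0
    for _ in range(n_sim):
        # The statistic is location/scale invariant, so N(0, 1) samples suffice.
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if lilliefors_statistic(simulated) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Hypothetical data (an assumption for illustration, not the study's PPSSI values).
rng = random.Random(1)
bulk = [min(max(rng.gauss(0.5, 0.15), 0.0), 1.0) for _ in range(300)]
spiked = bulk + [1.0] * 60  # add a pile-up of values exactly at 1

d_bulk, p_bulk = lilliefors_test(bulk)
d_spiked, p_spiked = lilliefors_test(spiked)
print(f"bulk only:   D={d_bulk:.3f}, p={p_bulk:.3f}")
print(f"with spike:  D={d_spiked:.3f}, p={p_spiked:.3f}")
```

      In this toy setting the pile-up at 1 is enough to make the sample depart clearly from normality, which is the qualitative behaviour described in the response.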

      (17) Line 173: I am somewhat worried about a selection bias in your correlation of segregated vs unsegregated synapses. First, it seems like only a small fraction of neurons are in the 0-0.1 and 0.9-1 PPSSI range. I would suggest running a proper correlation between PPSSI and FC-SC correlation instead of looking at just the two extremes. Second, your examples for segregated neurons (APL + CT1) are large neurons that densely innervate spatially close and functionally very similar neuropils. If the sample of unsegregated neurons consists mainly of these large interneurons, I'm not at all surprised that they contributed strongly to FC-SC correlation.

      Thank you for pointing this out. For this work we investigated synapses (not neurons): we extracted those with a cPPSSI of 0-0.1 and 0.9-1 and performed a rank test with the FC-SC correlation of randomly sub-sampled synapses. We aimed to demonstrate that unsegregated synapses, in particular, strongly contribute to the FC-SC correlation, and we hope to investigate overall trends in a future study.

      (18) Line 185: I don't think the function of reciprocal synapses is "considered to be clear". There are examples of feedback inhibition through reciprocal synapses, in particular in the visual system, but that does not mean that this is true across the board.

      We changed “considered to be clear” to “considered to be clearer than unsegregated synapses.” Of course, the function of reciprocal synapses is not known across the whole brain, but we think it has been studied more thoroughly than that of unsegregated synapses.

      (19) Line 188 / Figure 4h: that figure panel does not appear to show transmitter pairs.

      Figure 4h (FlyWire) showed transmitter pairs. Ext. Data Fig.4-1g did not, because FlyEM does not have transmitter information.

      (20) Line 192: Please clarify "functionally common".

      We changed our sentences to clarify this.

      (21) Line 199: "ventral nerve code" → "ventral nerve cord".

      We fixed this typo.

      (22) Line 201: I don't think you can use "conversely" here.

      We changed “Conversely” to “Moreover.”

      (23) Line 201: How certain are you that the WAGN neuron is the only candidate? Also, it would be nice to provide the neuron IDs so that people can identify them in the connectome.

      Thank you for pointing this out. We added Root ID: 720575940644632087 in the text. Actually, we found several GABA neuron candidates, such as 720575940637611365, 720575940644632087, 720575940613552947, 720575940640333109 and 720575940612264817. We investigated whether ER1(L) was present in these downstream connections and found that 720575940644632087 had the strongest connection with the largest number of synapses, so we adopted this.

      (24) Line 207: When you say "the left WAGN was strongly connected", are those connections not also present for the right WAGN?

      There is a right WAGN (Root ID: 720575940624377224), but it does not have strong interconnections with WPNb tier 2/3 (left) neurons. For the right WAGN, there are few inputs from WPNb tier 2/3 (left). We added “(left)” in the text.

      (25) Line 212: I don't think you can use "however" here.

      We removed “however.”

      (26) Line 214: "well unsegregated" → "very unsegregated"?

      This sentence was removed, because we recalculated Fig. 5h.

      Ethics Declaration:

      It seems the marmoset data were reported on in [10], so why is there a reference to the generation of the dataset?

      Yes, marmoset data were reported in [10], so we removed the Ethics Declaration.

      Reviewer #3 (Recommendations for the authors):

      (1) In my opinion, the title and framing of this manuscript dramatically overstate the results presented here. Also, the results presented in the different figures in this manuscript seem disjointed and are not very related to each other.

      Thank you for pointing this out. We have rewritten our manuscript slightly to address this. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections.

      (2) There are multiple ways to compute structural correlation matrices-the methods the authors implemented should be discussed in greater detail in the manuscript.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach, used in Honey et al. (2009) [6], and the log-scaled SC approach, used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in fewer zero replacements. Therefore, log-scaled SC is likely to more accurately represent the relationship in our study. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the Gaussian distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. The log-scaled SC approach has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. It may be possible to compare various methods in depth, but this is outside the scope of this study and requires further research.
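      To make the idea of a log-scaled SC concrete, here is a small sketch that correlates a toy FC vector with raw and with log-scaled SC values. The transform `log10(1 + count)` is an assumption made for illustration (the exact scaling is not stated in this response), and the numbers are invented; the point is only that heavy-tailed synapse counts can dominate a Pearson correlation unless they are compressed.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_scale(counts):
    """Assumed transform: log10(1 + count), which keeps zero entries at zero."""
    return [math.log10(1.0 + c) for c in counts]


# Hypothetical ROI-pair values: synapse counts span several orders of magnitude.
sc_counts = [0, 1, 10, 100, 1000, 10000]
fc_values = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]

r_raw = pearson(sc_counts, fc_values)
r_log = pearson(log_scale(sc_counts), fc_values)
print(f"raw SC vs FC:        r={r_raw:.3f}")
print(f"log-scaled SC vs FC: r={r_log:.3f}")
```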

      (3) The use of the FC-SC detection score defined by the authors should be discussed and justified more extensively in the text.

      Thank you for pointing this out. This has already been discussed in [10]. We defined our own “FC-SC detection score,” but we consider the overall approach to be well established in the literature. For example, Stafford et al. (2014) carried out FC-SC detection for 168 mouse cortical regions, and obtained 78.26% sensitivity and 81.69% specificity for the top 1% of SC. Hori et al. (2020) also investigated FC-SC detection for 55 cortical regions of the marmoset brain left hemisphere, achieving an AUC of 0.72. We think FC-SC detection is an index that evaluates the relationship between FC and SC from a different angle than FC-SC correlation and is worthwhile.

      Hori et al., (2020). Comparison of resting-state functional connectivity in marmosets with tracer-based cellular connectivity. NeuroImage, 204, 116241.

      Stafford et al., (2014). Large-scale topology and the default mode network in the mouse connectome. Proc. Natl. Acad. Sci. U.S.A., 111(52), 18745-18750.
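      The detection framing in response (3) can be sketched as follows. This is a generic illustration rather than the authors' “FC-SC detection score”, whose exact definition is given in their manuscript: FC values are used as a score to detect the strongest structural connections, and the result is summarized by sensitivity, specificity, and a rank-based AUC. All data, the 30% cutoff, and the FC threshold are hypothetical.

```python
def top_fraction_labels(sc_weights, fraction):
    """Mark the strongest `fraction` of structural connections as positives."""
    k = max(1, round(len(sc_weights) * fraction))
    cutoff = sorted(sc_weights, reverse=True)[k - 1]
    return [w >= cutoff for w in sc_weights]


def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when calling score >= threshold a detection."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ROI pairs: structural weights and the matching FC values.
sc = [0, 0, 1, 2, 5, 8, 40, 90, 120, 300]
fc = [0.05, 0.10, 0.02, 0.20, 0.15, 0.30, 0.58, 0.60, 0.55, 0.80]

labels = top_fraction_labels(sc, 0.3)  # top 30% of SC, chosen only for the toy data
auc = roc_auc(fc, labels)
sens, spec = sensitivity_specificity(fc, labels, threshold=0.5)
print(f"AUC={auc:.3f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```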

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) They start by incubating LFA-1 with iRBCs and show by flow analysis that a substantial population of these iRBCs binds to the LFA-1 (Figure 1C). They do conduct the control with uninfected RBCs, but put this in the supplementary material. As this is a critical control, I think that it should be moved to Figure 1C as it is essential to allow interpretation of the iRBC data. The authors also do not state which strain of P. falciparum they used (line 144). This is critical information as different strains have different variant surface antigens and should be included. With these changes, this data seems convincing.

      We thank the reviewer for this important suggestion. We agree that the uninfected RBC (uRBC) control is critical for interpreting the specificity of LFA-1 αI-Fc binding. In the revised manuscript, we have ensured that these control data are clearly presented and appropriately referenced in the main text; however, we have retained them in the Supplementary Information (Supplementary Figure S1) to maintain clarity and avoid overcrowding Figure 1, while still ensuring their visibility and accessibility to the reader. Importantly, these data demonstrate negligible binding of LFA-1 αI-Fc to uRBCs compared to iRBCs, supporting specificity. We have explicitly stated the parasite strain used (Plasmodium falciparum 3D7) in the Methods section (line 475).

      (2) They next incubated LFA-1 with the iRBCs, cross-linked and conducted a pulldown, identifying GP130 as a binding partner. Using cross-linkers is a dangerous strategy as it risks non-specific cross-linking. Did they try without cross-linking and find an interaction?

      We agree that cross-linking can introduce potential artefacts. To mitigate this, we included hIgG control pulldown experiments performed under identical conditions. Proteins identified in the control eluate were excluded as background (summarized in Supplementary Table S1). Importantly, PfGBP-130 was the only protein specifically enriched in the LFA-1 αI-Fc pulldown across all three biological replicates (Fig. 2A, Venn Diagram). While cross-linking was used to stabilize transient interactions, consistent enrichment of PfGBP-130 across the three biological replicates precludes any concerns of non-specificity.

      (3) They raised antibodies to PfGBP and showed IFA, which reveals that these antibodies stain iRBCs (Figure 2Ciii). This experiment lacks a critical control of uninfected RBCs, which needs to be included to show that the staining is specific. Without this, it is not possible to conclude that there is iRBC-specific staining with PfGBP.

      The question pertains to Fig. 2Biii. The IFA images include both infected and neighboring uninfected erythrocytes within the same field. No PfGBP-130 staining is observed in uninfected cells. PfGARP staining, performed specifically to verify parasite-infected cells and surface localisation, shows complete concordance with PfGBP-130 staining. This unequivocally shows that the antibodies raised specifically recognise only infected RBCs.

      (4) They then conduct a pulldown using LFA-Fc, which does show GP130 only in the presence of the LFA-Fc, but not when empty beads are used. This is convincing. BLI measurements are also used to study this interaction (Figure 2Ci). The BLI data is presented in such a way that any association phase is obscured by the y-axis, which makes it impossible to know whether there is binding here. I think that the data needs to be shown with some baseline before the addition of the ligand so that the association can be seen. The data is also a bit messy with a downward drift and the curves showing different shapes, for example, with the 1.0uM curve seeming to have a different association rate. Also, is this n=1? I think that this data needs to be repeated and replicated. This is the only data which shows a direct interaction between LFA1 and GBP, as the pulldowns are done with lysates, which might contain bridging components. I think that it is important to repeat the BLI or use additional biophysical methods to assess binding, to obtain more convincing data.

      We sincerely thank the reviewer for highlighting this important concern regarding the BLI data presentation and interpretation. We would like to clarify that the baseline signal prior to ligand addition was subtracted during data processing; therefore, the plotted curves represent the net response following ligand association. However, we agree that this may have obscured the visualization of the association phase. Accordingly, in the revised manuscript, we have re-plotted the data with adjusted y-axis scaling to better capture the association kinetics. In addition, to ensure robustness and reproducibility, the BLI experiments were performed in multiple independent replicates (n ≥ 3) using independently purified protein batches. The original figure showed a representative dataset; we have now included averaged sensorgrams along with standard deviation in the calculated KD values [KD = (1.7 ± 0.22) × 10⁻⁸ M] (Figure 2C (i)). These revisions provide a clearer and more accurate representation of the binding interaction.

      (5) The authors next do some modelling of the putative complex. This is done by homology modelling and docking, which is not the most up-to-date method and is over-interpreted. Personally, I would remove this data as I did not find it convincing, and it is not important for the story. If the authors wish to include it, then I think that they should validate the modelling by mutagenesis to show that the residues which the models indicate might bind are involved in the interaction.

      We thank the reviewer for this thoughtful comment regarding the modelling analysis. We agree that computational docking and homology-based modelling have inherent limitations and should not be over-interpreted. In our study, these analyses were included strictly as supporting evidence to provide a structural framework for the PfGBP-LFA-1 interaction, while the primary conclusions are based on direct biochemical and functional validation, including pull-down, BLI measurements, receptor knockdown, and cellular inhibition assays. Importantly, the use of docking approaches such as ClusPro, followed by interface analysis and MD simulations, is a widely accepted and routinely used strategy to generate testable hypotheses for protein-protein interactions, particularly when experimental structures are unavailable (e.g., Comeau et al., 2004; Weng et al., 2019). We believe that the current modelling serves as a useful complementary analysis that is consistent with, and supportive of, the experimentally validated interactions.

      (6) They next made GP130 and tested the binding of this to THP-1 cells, which are often used as a model for macrophages. They observe greater binding of PfGBP-Fc to these cells when compared with hIgG and show that LFA-1 siRNA reduces this binding. I was a little confused about how the flow plots related to the graph in the bottom right corner of Figure 3Bii. In the flow plots, hIgG control shows 12.8% of cells in the gated region, while the unstained cells has 5.63%, but the MFI data shows a decrease in binding for hIgG vs unstained cells. How is this consistent? Also, the siRNA reduces the number of cells in the gated region from 66.6% to 25.9%, which is still substantially more that 5.63% in the unstained control. This also doesn't seem quite consistent with the MFI data. Could the authors explain this? Also, perhaps an additional experiment would be to add soluble LFA-1 into this assay as an additional control to determine whether this blocks PfGBP binding to the THP-1 cells? It could be that there are additional mechanisms of binding which indicate why the siRNA has a partial effect. The same is true for the NK cell experiments in Figure 3Ci, in which the siRNA has a partial effect. The authors also test binding to HEK, HepG2 and 'stem' cells and claim' only background levels of binding', but in each case, there is more binding to these cells by PfGBP-Fc than by hIgG, albeit less than in THP-1 and NK cells. Why have the authors decided that these increases are not significant? All in all, these experiments do indicate a role for the GBP-LFA1 interaction in the binding of immune cells to iRBCs, but perhaps not as absolutely as is suggested.

      We thank the reviewer for this insightful comment. The apparent discrepancy arises because the flow plots depict the percentage of cells within a defined positive gate, whereas the graphs quantify mean fluorescence intensity (MFI) across the entire population. We have revised the figure legend to state this. Regarding the partial reduction in binding upon LFA-1 (CD11a) knockdown, we agree that this indicates LFA-1 is a major but not exclusive contributor, which is biologically plausible given incomplete siRNA depletion and the known avidity-dependent nature of integrin interactions. Importantly, our conclusion is supported by multiple orthogonal approaches (αI-domain binding, LC-MS/MS identification, BLI, docking, receptor knockdown, and functional blockade). We also appreciate the suggestion of soluble LFA-1 competition, which we acknowledge as an important future experiment. Finally, we have revised the text regarding HEK293T, HepG2, and stem cells to reflect that PfGBP-Fc binding is minimal but not absent, consistent with low/non-expression of LFA-1 in non-immune cells. Overall, we have moderated our claims to state that PfGBP-LFA-1 interaction is a dominant and functionally relevant mechanism, while not excluding additional low-affinity or accessory interactions.

      Figure legend change: Representative flow plots depict the percentage of cells within a predefined positive gate, whereas the accompanying summary graph quantifies fluorescence intensity across the analyzed population. These two metrics report distinct properties of the distribution and are therefore not expected to be numerically identical.
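      A toy calculation shows how the two metrics can move in opposite directions. The intensities and the gate below are invented for illustration and are not taken from the flow data in Figure 3; the sketch only demonstrates that a sample with more cells inside the positive gate can still have a lower mean intensity if its remaining cells are dimmer.

```python
def percent_in_gate(intensities, gate):
    """Percentage of events whose intensity is at or above the gate threshold."""
    inside = sum(1 for value in intensities if value >= gate)
    return 100.0 * inside / len(intensities)


def mean_fluorescence(intensities):
    """Mean fluorescence intensity over the whole analyzed population."""
    return sum(intensities) / len(intensities)


GATE = 1000.0  # hypothetical positive-gate threshold (arbitrary units)

# Sample A: more events above the gate, but the remaining cells are very dim.
sample_a = [1100.0] * 13 + [100.0] * 87
# Sample B: fewer events above the gate, but the remaining cells sit just below it.
sample_b = [1100.0] * 5 + [600.0] * 95

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(
        f"sample {name}: {percent_in_gate(sample, GATE):.1f}% in gate, "
        f"MFI={mean_fluorescence(sample):.0f}"
    )
```

      Here sample A has 13% of events in the gate against 5% for sample B, yet its mean intensity is lower, because the gate statistic ignores everything below the threshold.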

      (7) The authors next produce CHO cells with PfGBP on the surface. These cells bind to LFA-1 specifically. When these cells were incubated with primary NK cells, they did see increases in activation markers, which were reduced by the addition of anti-CD11a, suggesting these to be specific. They also conduct the same experiment with anti-GBP with iRBCs, but this is in a different figure. It would be easier for the reader if Figure 5B were in the same figure as Figure 4B, as it is related data using the same method. I found this data convincing, showing that the LFA1:GBP interaction does contribute to immune cell recognition and activation.

      We thank the reviewer for this positive assessment and helpful suggestion regarding figure organization. We agree that the CHO-PfGBP and iRBC-based NK cell activation assays represent conceptually related experiments that both address LFA-1-PfGBP dependent activation using similar readouts. We have retained separate panels to distinguish the reductionist CHO-based system from the physiologically relevant iRBC context. We believe that the combined evidence from both systems strengthens the conclusion that PfGBP-LFA-1 interaction is a key contributor to NK cell recognition and activation.

      (8) The authors next conduct an experiment in which they assess parasite growth in the presence of NK cells and in the presence of anti-GBP. They use Hoechst staining as a measure of parasite growth and claim that NK cells reduce the number of parasites, but that anti-GBP abolishes this effect (Figure 5A). I found this experiment very unconvincing as there are small effects and no demonstration of significance. More commonly used approaches to study parasite growth are lactate dehydrogenase GIA assays or calcein-AM labelling. I did not find this experiment convincing and would either remove or supplement with additional data using a more robust assay, with repeats and tests of statistical significance.

      We respectfully disagree that the assay should be removed, because flow-cytometric quantification of P. falciparum parasitemia using DNA dyes such as Hoechst is a widely used, accepted, and high-throughput approach for measuring infected erythrocytes and parasite growth, with clear separation of infected from uninfected RBCs and good reproducibility across malaria studies (Dent et al., 2009; Jang et al., 2014). Importantly, closely related immune-cell killing experiments in the malaria field have used the same general strategy, co-culture with effector cells followed by flow-cytometric enumeration of parasitemia to infer parasite control, including the seminal NK-cell study by Chen et al., 2014, which our assay design follows conceptually, and later work showing reduced parasitemia after co-incubation with cytotoxic lymphocytes measured by nucleic-acid dye flow cytometry. We therefore believe the experiment is methodologically valid and directly relevant to the biological question, namely whether disrupting PfGBP-LFA-1 engagement alters NK-cell-mediated restriction of parasite expansion.

      Reviewer #2 (Public review):

      (1) PfGBP-130 is proposed to be a membrane protein based on a single predicted transmembrane domain. Figures 2b and 3a show ribbon schematics with this TM domain at residues 51-68, in agreement with TM prediction algorithms such as TMHMM 2.0 and Phobius. However, this predicted TM is upstream of the PEXEL motif (residues 84-88, sequence RILAE), a conserved sequence for parasite protein export to host cytosol that is proteolytically processed at its 4th residue. Thus, residues 1-87 are removed from PfGBP-130 prior to export, yielding a mature protein without predicted TMs. Prior studies have determined that the mature PfGBP-130 lacks TMs and is retained as a soluble protein in host cell cytosol (PMID: 19055692, 35420481). Thus, the authors' model of PfGBP-130 as a surface-exposed membrane protein conflicts with both computational analysis of the mature protein and these prior reporter studies. An important simple experiment would be to evaluate PfGBP-130 membrane association in immunoblots using the authors' PfGBP-130 antibody after hypotonic lysis (PMID: 19055692) and after alkaline extraction (e.g. 100 mM Na2CO3, pH 11 as frequently used, PMID: 33393463). If the prior studies and computational analyses are correct, the protein will be predominantly in the soluble and/or alkaline supernatant fractions.

      We thank the reviewer for this important observation regarding PfGBP-130 topology and export. We agree that the presence of a PEXEL motif supports proteolytic processing and that the mature protein may lack a classical transmembrane domain. However, consistent with our model of surface accessibility, we would like to clarify that in an independent proteomic study performed in our laboratory on the membrane-enriched fraction of Plasmodium falciparum-infected erythrocytes, PfGBP-130 was reproducibly identified by LC-MS/MS among membrane-associated proteins (data not shown; can be provided upon request). These findings support the conclusion that, irrespective of the absence of a canonical transmembrane domain, PfGBP-130 is associated with the iRBC membrane compartment, likely via peripheral or protein-complex–mediated interactions, as described for several exported Plasmodium proteins.

      (2) Many findings rely on the specificity of antibodies generated against PfGBP-130 or NK cell receptors. Although the authors have included key controls (use of isotype control antibodies, lack of anti-PfGBP-130 binding to uninfected cells), cross-reactivity between P. falciparum antigens is well-recognized and could significantly undermine the interpretation of experiments (PMID: 2654292 and 1730474 provide key examples of antigens recognized by antibodies raised against other proteins). For example, the surface localization in IFA experiments (Figure 2B(iii)) could reflect anti-PfGBP-130 binding to an unrelated parasite surface antigen, a possibility not addressed by any of the authors’ controls. As another example, the iRBC lysate immunoblot using this antibody in Fig. 2B(iv) suggests a MW of 95 kDa, which corresponds to the unprocessed pre-protein before export; cleavage in the PEXEL motif yields a processed mature protein of 85 kDa, which should be readily resolved from the pre-protein in immunoblots (PMID: 19055692). A better immunoblot using immature infected cell stages might show both the pre-protein and the mature protein as a doublet band.

      We thank the reviewer for raising this important concern regarding antibody specificity. We agree that cross-reactivity among P. falciparum antigens is a known issue and have taken multiple steps to ensure specificity in our study. First, the anti-PfGBP-130 antibodies were generated against a defined recombinant fragment and show no detectable binding to uninfected RBCs and no signal in hIgG control immunoprecipitates, supporting specificity. Importantly, in our LC-MS/MS analysis of LFA-1 αI-domain pull-downs, PfGBP-130 was specifically enriched and consistently identified across replicates, independently validating the target recognized by the antibody. Furthermore, the same antibody detects a single dominant band in both iRBC lysates and αI pull-down fractions, arguing against widespread cross-reactivity. Regarding the apparent molecular weight (~95 kDa), we agree that this likely corresponds to the precursor form, and that a processed form (~85 kDa) may not be well resolved under our current conditions.

      (3) PfGBP-130 is not essential for in vitro cultivation (PMID: 18614010 and MIS of 1.0 in the piggyBac mutagenesis screen as tabulated on plasmodb.org, indicating a highly dispensable gene). The authors should use the knockout line as a control in their IFA localization experiments to address antibody specificity. More fundamentally, their model predicts that NK cells should not recognize or kill infected cells from the knockout line when compared to their untransfected parent. Such results with the knockout line would compellingly support the authors' model without reliance on antibodies that may cross-react with other parasite antigens. PMID: 18614010 reported that the PfGBP-130 knockout exhibited increased membrane rigidity, suggesting an intracellular scaffolding protein rather than a surface localization and use as a ligand for LFA-1 interaction and NK cell-mediated killing.

      We agree that a PfGBP-130 knockout line would provide a powerful genetic validation of both antibody specificity and the proposed functional role of PfGBP-130 in NK cell recognition. At present, such experiments were not included in this study, and we acknowledge this as an important limitation. However, we would like to emphasize that our conclusion does not rely on antibody-based localization alone; rather, it is supported by multiple orthogonal approaches, including LFA-1 αI-domain pull-down coupled to LC-MS/MS, biophysical interaction analysis, receptor knockdown, and functional blocking assays. In addition, in one of our previous proteomic analyses of the membrane-enriched fraction of infected erythrocytes, PfGBP-130 was identified among the proteins present in the membrane fraction, supporting its association with the iRBC membrane compartment despite lacking a classical mature transmembrane domain.

      (4) PfGBP-130 non-essentiality raises the question of why the gene would be retained if it triggers NK cell-mediated killing of infected cells in vivo. Presumably, this killing would pose strong selective pressure against retention of PfGBP-130. Some speculation is warranted to support the model.

      We thank the reviewer for this thoughtful evolutionary question. We agree that if PfGBP-130 enhances NK-cell recognition, its retention likely reflects a context-dependent fitness trade-off rather than a simple benefit or cost. This situation is not unusual in P. falciparum: several exported or surface-associated proteins are retained despite being immunogenic because they also provide advantages in other settings, such as erythrocyte remodeling, cytoadhesion, niche adaptation, immune modulation, or transmission. The clearest precedent is the PfEMP1/var system, in which highly immunogenic surface antigens are nevertheless strongly maintained because they mediate sequestration and in vivo fitness, while antigenic variation limits continuous immune exposure (Chew et al., 2022). Similarly, other variant surface antigens such as STEVOR and RIFIN are retained despite immune recognition because they contribute to erythrocyte binding, antigenic diversity, and immune evasion or modulation (Niang et al., 2009; Sakoguchi et al., 2025). More broadly, many P. falciparum genes that appear dispensable in standard in vitro culture are nevertheless preserved because culture does not recapitulate the selective pressures present in vivo, including splenic clearance, endothelial interactions, immune attack, and within-host competition.

      Reviewer #3 (Public review):

      (1) Anti-GBP130 antibodies are used in the cellular assays to block the interaction between GBP130 and LFA1. They should therefore also block interactions between GBP130 and LFA1 recombinant proteins in the biolayer interferometry experiment. Do the authors have data to show this? Similarly, the anti-CD11a antibodies used to block the interaction in the cellular assays should also block the in vitro interaction between recombinant LFA1 and GBP130.

      We thank the reviewer for this insightful suggestion. We agree that demonstrating antibody-mediated inhibition of the recombinant PfGBP-LFA-1 interaction would provide an additional orthogonal validation of the interface. While such blocking experiments were not included in the original BLI dataset, our current study already establishes the specificity of this interaction through multiple independent approaches, including αI-domain pull-down and LC-MS/MS identification, BLI-derived high-affinity binding (KD ~10⁻⁸ M), structural docking, receptor knockdown, and antibody-mediated inhibition in cellular systems. We note that antibody-mediated blocking in a purified biophysical system is not always directly comparable to cellular assays, as epitope accessibility, orientation on biosensor surfaces, and conformational states of integrins (which are known to undergo activation-dependent structural changes) can influence inhibition efficiency. Nonetheless, we fully agree that this represents an important validation experiment.

      (2) The structural modelling analysis of the predicted complex between GBP130 and LFA1 (Figure 2cii) predicts that the majority of the important GBP130 interface residues are located in the region D509-N607. However, the authors present BLI data for the GBP130-LFA1 interaction, which used the N-terminal fragment of GBP (residues 69-270), which does not include the GBP130 residues predicted to be important for the formation of the complex between the two proteins. Could the authors provide an explanation for how an interaction was observed with the GBP130-N fragment, which does not contain the residues predicted to be important for interacting with LFA1?

      We thank the reviewer for this important observation. We agree that the structural model predicts a major interaction interface within the D509-N607 region of PfGBP-130; however, this does not preclude the existence of additional or auxiliary binding determinants within the N-terminal region used in our BLI assays (aa 69-270). PfGBP-130 is a multi-domain, repeat-containing protein, and such proteins frequently exhibit distributed or multivalent interaction interfaces, where individual regions can independently engage binding partners with lower affinity while the full-length protein achieves higher avidity through cooperative interactions. In our study, the BLI data using the N-terminal fragment demonstrate that this region is sufficient to mediate direct interaction with the LFA-1 αI domain, whereas the structural model based on full-length predictions likely captures a dominant or higher-affinity interface in the C-terminal region. Importantly, the interaction is supported by multiple orthogonal datasets, including pull-down/LC-MS/MS, cellular binding assays, and functional inhibition, indicating that the observed binding is not an artefact of fragment choice.

      Author response image 1.

      To further examine this, we performed docking and binding energy analyses comparing the full-length PfGBP-130-LFA-1 complex with the N-terminal domain-LFA-1 complex. Using the PRODIGY server, the predicted binding affinity for the full-length complex was -9.8 kcal/mol, whereas the N-terminal domain complex exhibited a still favorable binding energy of -5.6 kcal/mol. Similarly, HawkDock (v2) analysis yielded binding energies of -22.2 kcal/mol for the full-length complex and -14.1 kcal/mol for the domain-only complex. While reduced relative to the full-length protein, these values remain well within the range of stable protein-protein interactions, supporting the ability of the N-terminal region to independently contribute to binding. These energy calculations take into account all non-covalent interactions. For clarity, hydrogen bonds have been specifically highlighted in the figure to mark the key interaction interface.
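      As a rough aid to reading the PRODIGY-style free energies quoted above, a predicted binding free energy can be converted into an approximate dissociation constant with the standard relation ΔG = RT ln(Kd). The sketch below is illustrative only: it assumes 25 °C (298.15 K) and a 1 M standard state, and the function name is hypothetical rather than part of PRODIGY or HawkDock. The HawkDock values are MM/GBSA scores, which are generally used for relative ranking rather than as absolute free energies, so they are not converted here.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def kd_from_delta_g(delta_g_kcal_per_mol: float, temp_kelvin: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol) into a dissociation constant (M).

    Uses delta_G = R * T * ln(Kd), with a 1 M standard state assumed.
    A more negative delta_G therefore gives a smaller (tighter) Kd.
    """
    if temp_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temp_kelvin))


if __name__ == "__main__":
    # Predicted free energies quoted in the response (PRODIGY values only).
    for label, delta_g in (("full-length", -9.8), ("N-terminal domain", -5.6)):
        print(f"{label}: dG = {delta_g} kcal/mol -> Kd ~ {kd_from_delta_g(delta_g):.2e} M")
```

      Under these assumptions, each additional -1.36 kcal/mol or so corresponds to roughly a tenfold tighter predicted Kd, which is one way to compare the two predicted complexes on a common scale.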

      (3) There is no section in the materials and methods describing how the BLI was performed; this should be added. The highest concentration of GBP130 used in the interaction measurements is 1.4 µM, almost 100x the measured Kd (0.015 µM) for the GBP130-LFA1 interaction. At these high concentrations of GBP130, I would expect to start seeing saturation of binding, but the interferometry curves show that saturation is not close to being reached. This strongly suggests that the binding of GBP130 to LFA1 is non-specific.

      We thank the reviewer for raising these important technical points. We have included a detailed description of the biolayer interferometry (BLI) methodology in the Materials and Methods section of the manuscript. Regarding the concern about lack of saturation at higher analyte concentrations, we respectfully disagree that this necessarily indicates non-specific binding. In BLI assays, incomplete saturation can arise from several well-recognized factors, including suboptimal orientation or partial inaccessibility of immobilized ligand on the biosensor, mass transport limitations, or heterogeneous binding populations, which are particularly relevant for integrins such as LFA-1, whose αI domain exists in multiple conformational states with distinct affinities. Importantly, the interaction exhibits clear concentration-dependent association and dissociation kinetics that fit a 1:1 binding model with a KD in the nanomolar range, which is inconsistent with non-specific interactions that typically show poor fitting and minimal dissociation. Furthermore, the specificity of the PfGBP-LFA-1 interaction is supported by multiple independent lines of evidence in our study, including selective enrichment in αI-domain pull-downs, absence in IgG controls, reduction upon CD11a knockdown, and functional inhibition by blocking antibodies in cellular assays. We have now clarified these points in the revised manuscript and tempered the interpretation to acknowledge potential experimental constraints of BLI while maintaining that the cumulative data strongly support a specific interaction.
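      The saturation argument in this exchange rests on simple arithmetic. For an idealized 1:1 interaction at equilibrium, the fraction of immobilized ligand occupied is C / (C + KD). The sketch below evaluates that expression for the concentrations quoted in the comment. It is a minimal illustration under the assumption of a single homogeneous binding site at equilibrium; the function name is hypothetical, and real BLI association phases may not reach equilibrium for the reasons listed in the response.

```python
def fractional_occupancy(conc_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of sites occupied for an ideal 1:1 binding model.

    theta = C / (C + KD), where C is the analyte concentration and KD the
    dissociation constant, both in the same units (molar here).
    """
    if conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return conc_molar / (conc_molar + kd_molar)


if __name__ == "__main__":
    kd = 0.015e-6          # measured KD quoted by the reviewer (0.015 µM)
    top_conc = 1.4e-6      # highest analyte concentration quoted (1.4 µM)
    for conc in (kd, 10 * kd, top_conc):
        theta = fractional_occupancy(conc, kd)
        print(f"C = {conc:.3e} M -> occupancy = {theta:.3f}")
```

      In this idealized model, half of the sites are occupied when C equals KD, and occupancy approaches 1 as C rises far above KD, which is the expectation behind the reviewer's question.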

      Minor points:

      (1) For the pulldown experiments, can the authors confirm that cross-linking was also performed for the protein A beads + hIgG control?

      Yes, DTSSP cross-linking was performed identically in the protein A beads + hIgG control arm. This is consistent with the control design described in the manuscript.

      (2) If the recombinant CD11a I subdomain used as a probe is correctly folded and functional, it should bind ICAM1. Do the authors have this data?

      We agree that ICAM-1 binding is an important functional validation for the recombinant CD11a αI probe (Hogg et al., 1998). The isolated αI domain of LFA-1 is well established as the principal ICAM-1-binding module, and soluble αI-domain reagents have previously been shown to bind/block ICAM-1 interactions. We did not include this control in the current version.

      (3) Were the authors able to perform the reciprocal pull-down, using pfGBP130-N-Fc to pull down LFA1 from cell surfaces?

      We did not perform a reciprocal pull-down with PfGBP130-N-Fc and native cell-surface LFA-1 in the present study; we agree this would be a useful orthogonal experiment.

      (4) After identifying GBP130 as a co-purifying protein in the LFA-1 pull-down experiments, the authors select an N-terminal fragment of GBP130 to recombinantly express and use. How did the authors narrow down which region of GBP130 interacted with LFA-1?

      The N-terminal PfGBP130 fragment (aa 69-270) was selected empirically as a tractable, soluble recombinant segment containing a defined repeat-containing extracellular region, rather than because we had already mapped the full LFA-1-binding interface. We agree with the reviewer that our structural model suggests that additional residues, including a likely dominant interface outside this fragment, may contribute to the full interaction, and we have clarified that the N-terminal fragment should be interpreted as a minimal binding-competent region, not necessarily the sole binding site.

      (5) As erythrocytes age, their surface undergoes biochemical changes, most notably a drop in levels of sialylation, decreasing the net repulsive negative charge, and they generally become more adherent. Can the authors exclude the possibility that, rather than binding to a parasite-derived ligand, LFA alpha 1 is instead binding to a marker of older erythrocytes? In the data presented, increased binding of LFA alpha 1 is observed as parasites progress through the life cycle, but the host erythrocytes will be ageing during parasite replication, which could account for the increased levels of LFA alpha 1 binding. To rule out this explanation, data from LFA alpha 1 staining of age-matched uninfected erythrocytes could be provided.

      We agree that erythrocyte aging can alter surface sialylation and adhesiveness, and loss of sialic acid is known to reduce erythrocyte surface charge and increase adhesiveness. However, our data argue against aging alone explaining the signal, because LFA-1 αI-Fc binding was compared with uninfected RBC controls and the interaction led to enrichment of a parasite-derived ligand, PfGBP130, in pull-down/MS analyses.

      (6) Figure 3b(i) Surface staining of THP1 cells was performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. But no accompanying staining data using an anti-LFA1 antibody are shown, so it is not possible to determine whether staining profiles with GBP-130 Fc match staining profiles with anti-LFA1 antibodies. This is important to show what proportion of LFA1-positive cells can recognise parasite-derived GBP-130 Fc.

      (7) Figure 3c(i) Surface staining of peripheral NK cells is performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. Here, as well, there are no staining data using an anti-LFA1 antibody. This would allow a comparison between cell population LFA1 staining with an anti-LFA1 antibody and cell population LFA1 staining with GBP-130 Fc. The two staining profiles should be similar as both probes bind the same surface marker. However, it appears this might not be the case because the staining data using GBP-130 Fc show that only a minor proportion of NK cells (~20%) stain positive, but the majority of peripheral NK cells usually express CD11a, as it is a key adhesion molecule in the formation of immune synapses with target cells. This suggests that GBP-130 can only bind to a subset of NK cells, and if it is binding LFA1, then it can only play a role in mediating the formation of an immune synapse with this subpopulation of NK cells. Could the authors include a comment in the manuscript making clear that the GBP-130 only assists a small proportion of NK cells in adhering to parasite-infected erythrocytes? Are there any reasonable hypotheses as to why GBP-130 was only able to stain a small subpopulation of LFA1-expressing NK cells?

      For minor comments 6 and 7:

      We agree that parallel staining with anti-CD11a would help relate PfGBP130-Fc binding to total LFA-1-positive THP-1 and NK-cell populations. Importantly, LFA-1 expression and ligand binding competence are not equivalent, because integrin binding depends strongly on activation/conformation and avidity state; in NK cells, only a subset can display LFA-1 in a partially activated conformation at baseline despite broader CD11a expression. Thus, a smaller PfGBP130-Fc-positive subset than the total CD11a-positive population is biologically plausible and does not imply inconsistency.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, the authors investigate LFP responses to methionine in the olfactory system of the Xenopus tadpole. They show that this response is local to the glomerular layer, arises ipsilaterally, and is blocked by pharmacological blockade of AMPA and NMDA receptors, with little modulation during blockade of GABA-A receptors. They then show that this response is transiently enlarged following transection of the contralateral olfactory nerve, but not the optic lobe nerve. Measurement of ROS, a marker of inflammation, was not affected by contralateral nerve transection, and LFP expansion was not affected by pharmacological blockade of ROS production. Imaging biased towards presynaptic terminals suggests that the enlargement of the LFP has a presynaptic component. A D2 antagonist increases the LFP size and variability in intact tadpoles, while a GABA-B antagonist does not. On this basis, the authors conclude that the increase driven by contralateral nerve transection is due to DA signaling.

      Overall, I found the array of techniques and approaches applied in this study to be creatively and effectively employed. However, several of the conclusions made in the Discussion are too strong, given the evidence presented. For example, the authors state that "The observed potentiation was not related to inflammatory mediators associated to injury, because it was caused by a release of the inhibition made by D2 dopamine receptor present in OSN axon terminals." This statement is too strong - the authors have shown that D2 receptors are sufficient to cause an increase in LFP, but not that they are required for the potentiation evoked by nerve transection. The right experiment here would be to get rid of the D2 receptors prior to transection and show that the potentiation is now abolished. In addition, the authors have not shown any data localizing D2 receptors to OSN axon terminals.

      Similarly, the authors state, "the onset of LFP changes detected in glomeruli is determined by glutamate release from OSNs." Again, the authors have shown that blockade of AMPA/NMDA receptors decreases the LFP, and that uncaging of glutamate can evoke small negative deflections, but not that the intact signal arises from glutamate release from OSNs. The conclusions about the in vivo contribution of this contralateral pathway are also rather speculative. Acute silencing of one hemisphere would likely provide more insight into the moment-to-moment contributions of bilateral signals to those recorded in one hemisphere.

      We thank the reviewer for their positive evaluation of our manuscript. We agree that new experimental evidence is needed to support the discussion and conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is a creative and careful study, but I felt that the conclusions in the Discussion were too strong. I think these could either be toned down or additional experiments could be done to support the idea that D2 receptors are required for the nerve transection-evoked potentiation, that the source of glutamatergic input is OSNs, and that contralateral interactions are mediated by DA. In particular, I think anatomical stains showing which neurons are carrying the DA signal and whether there is any potentiation of DA release after nerve transection would greatly strengthen the conclusions.

      This new version of the manuscript contains two new figures: 6 and 9.

      New figure 6 addresses the suggestion of this reviewer and provides anatomical evidence for the distribution of dopaminergic neurons in the olfactory bulb of X. tropicalis tadpoles using a tyrosine hydroxylase antibody (mouse monoclonal, Immunostar cat. no. 22941, 1:250; RRID:AB_57226). We identified a discrete neuronal population present in the border between the mitral cell layer and the glomerular layer that resembles the type1 TH+ population described in adult frogs (Boyd and Delaney 2002). TH+ neurons send their processes to innervate olfactory glomeruli and we provide evidence that they contact the GFP lateral glomerulus labelled in Dre.mxn1:GFP X. tropicalis tadpoles (Fig. 6C). These results reinforce a modulatory role for dopamine on glomerular neurotransmission. Materials & methods (lines 152-167), results (lines 393-399) and discussion (lines 550-563) have been modified accordingly.

      Figure 9 provides new evidence on the interhemispheric connections involved in the potentiation of glomerular responses. We first demonstrate that dorsolateral pallial neurons participate in the processing of olfactory information based on the general consideration that the lateral pallium is an olfactory cortex. We confirmed this possibility by stimulating the olfactory epithelium and recording ipsilateral calcium transients in pallial neurons of tubb2b:GCaMP6s tadpoles. We next injured the dorsolateral pallium and 24-48 h afterwards we recorded odor-evoked responses in the GFP-labelled glomerulus located contralaterally. We observed a ~70% potentiation of responses, which was comparable to the ~75% potentiation obtained by olfactory nerve transection. These results illustrate the involvement of pallial neurons in the control of glomerular output, likely by modifying the activity of TH+ neurons. The results (lines 473-506) and discussion (lines 569-576) now include these new findings.

      Does the contribution of DA signalling change across development? I think this would be helpful to interpret the results and relatively straightforward to do: apply raclopride at different developmental stages and measure how much potentiation occurs at each stage.

      This is indeed an interesting point, but conducting a comprehensive study of dopamine release throughout development would require a substantial amount of work and delay the publication of this paper. To perform these experiments, we should first implement new technical approaches, such as successfully injuring young tadpoles or recording from late premetamorphic stages. We believe that the proposed experiments could define a new line of arguments rather than complement the present work. Nonetheless, we acknowledge the suggestion of this reviewer.

      In this new version, we provide strong evidence for dopamine release in the glomerular layer, and a key question that arises is the nature of TH+ neurons. Recent findings obtained in mice show that there are five different types of dopaminergic interneurons present in the olfactory bulb (Kosaka, Pignatelli, and Kosaka 2020), and important functional differences exist between axon-bearing and anaxonic neurons (Dorrego-Rivas et al. 2025). This evidence suggests a key role for development. A completely new study based on transgenic X. tropicalis displaying labeled TH+ neurons could bring together development, anatomy, and physiology to gain an understanding of how dopaminergic signaling shapes glomerular function.

      In addition, there are several places where showing additional raw data in the figures and carefully quantifying variability would be helpful. For example, in Figure 3B, the authors should show equivalent raw traces from intact and transected tadpoles. In Figure 5D, it would be helpful to show raw traces for LFP equivalent to what is shown for presynaptic imaging in Figure 5E. In Figures 6E-F, it would be helpful to show raw traces.

      Thank you for this suggestion. The examples have been added to the figure panels.

      I found the last experiment with photobleaching somewhat inconclusive, and I am not sure what it adds to the study as presently written. Line 418: Please quantify how many OSNs remained. Line 423: What is the hypothesis for the source of variability?

      The goal of this experiment is to investigate the participation of chemotopy in the potentiation induced by contralateral injury. The elimination of 30-50% of topographically related OSNs did not alter contralateral glomerular responses. This evidence suggests that chemotopy was not relevant to the gain of function observed; however, we cannot completely rule out a certain topographical contribution, as it was not possible to completely silence all inputs of the studied glomerulus. We now link these findings to the likely innervation of several glomeruli by TH+ neurons, which suggests the absence of a one-to-one glomerulus relationship. LFP amplitudes and their variance are now illustrated in box plots to highlight the absence of significant differences (lines 457-471).

      An increase in the variance among the recordings obtained is a consistent empirical observation. Although it is a hallmark of the potentiation recorded, we cannot provide a mechanistic explanation. Considering that neurotransmitter release from OSN axon terminals is normally inhibited by dopamine, we hypothesize that disinhibition drives an increase in release probability, leading to larger variations in glutamate release. Such variations could be reflected in the amplitude of LFP negativities.

      It would be helpful to include a measurement of LFP over time so we have some idea of how stable the odor delivery is.

      The amplitude of LFP responses was stable for >30 min. Figure 3B shows recordings obtained during 30 min and new Figure 7F over 42 min. We believe that these examples illustrate that the amplitude, as well as kinetics of the responses obtained were consistent over the period studied.

      Line 227: Small upward deflection - could this be an electrical artifact? Can you run the stimulus delivery with no odor (say, with water) to see if you get the same signal?

      We do not know the precise source of this upward deflection. It is not an electrical artifact related to stimulation, which is sometimes evident (Fig 7A, methionine application). When present, it occurs after the activation of OSNs. One possibility is that the deflection originates in the layer of nerve fibers, reflecting some aspect related to the conduction of APs and the relative position of the electrode. Interestingly, some recordings of LFP responses at the level of glomeruli carried out in rats also show a positive deflection (see Figs. 1B, 2A, 3B in Lecoq, Tiret, and Charpak 2009), suggesting that it is an intrinsic characteristic of this type of recording.

      Line 237-239: I wasn't clear from the text whether this was a variation due to development, to transection, or natural variability.

      We now indicate that the relationship reflects normal development (lines 261-264).

      Line 521: N-type VGCCs: can these be targeted with pharmacology to strengthen the argument?

      We acknowledge this suggestion but we have not carried out these experiments as we believe that the interpretation could be complex due to the high density of synapses present in glomeruli and the likely involvement of other types of VGCCs in neurotransmitter release.

      Small issues:

      (1) Line 190-196: Some of this could potentially be moved to the Discussion section.

      These are some arguments to defend the validity of our experimental approach to record the response of the lateral glomerulus labeled by GFP. If we move them to the discussion, the information related to the spatial extent of our recordings would be split between results and discussion. We believe that the current format of the paper allows the discussion to focus on the interpretation of the results obtained.

      (2) Line 268: exponential recover phase.

      Thanks. Corrected.

      (3) Line 278: affected to -> arises from

      Thanks. Corrected.

      (4) Line 282: affect to -> can affect.

      Thanks. Corrected.

      (5) Line 403: 2Phatal technique: Please state briefly what this is

      It is now indicated: two-photon chemical apoptotic targeted ablation (2Phatal).

      NOTE:

      During the revision of this manuscript we realized that Figures 3C and 4B indicated mean±SD. The panels have been amended to show mean±s.e.m.

      References

      Boyd, J. D., and K. R. Delaney. 2002. "Tyrosine hydroxylase-immunoreactive interneurons in the olfactory bulb of the frogs Rana pipiens and Xenopus laevis." J Comp Neurol 454 (1):42-57. doi: 10.1002/cne.10428.

      Dorrego-Rivas, A., D. J. Byrne, Y. Liu, M. Cheah, C. Arslan, M. Lipovsek, M. C. Ford, and M. S. Grubb. 2025. "Strikingly different neurotransmitter release strategies in dopaminergic subclasses." Elife 14. doi: 10.7554/eLife.105271.

      Kosaka, T., A. Pignatelli, and K. Kosaka. 2020. "Heterogeneity of tyrosine hydroxylase expressing neurons in the main olfactory bulb of the mouse." Neurosci Res 157:15-33. doi: 10.1016/j.neures.2019.10.004.

      Lecoq, J., P. Tiret, and S. Charpak. 2009. "Peripheral adaptation codes for high odor concentration in glomeruli." J Neurosci 29 (10):3067-72. doi: 10.1523/JNEUROSCI.6187-08.2009.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      An ongoing controversy in the field of learning and memory is the specific neural mechanism that maintains long-term memory (LTM). A prominent hypothesis proposed by Sacktor and Fenton and their colleagues is that LTM is maintained by the ongoing activity of the atypical PKC isoform PKMζ. Early evidence in support of this hypothesis came from experiments showing that an inhibitory peptide, ZIP, whose activity was purported to be specific for PKMζ, blocked late-phase hippocampal LTP (L-LTP) and LTM. However, in 2013, two articles reported that LTM was normal in PKMζ knockout mice and that ZIP erased LTM in the knockout mice, indicating that ZIP lacked specificity for PKMζ. In response, Sacktor and Fenton and colleagues reported in 2016 that in PKMζ null mice, there is an increase in the expression of PKCι/λ, a related isoform of atypical PKC, and this increased expression can compensate for PKMζ; their data indicated that the upregulation of PKCι/λ mediates L-LTP and LTM in the PKMζ knockouts. In the present article, the authors provide additional support for this idea. They replicate the finding of an upregulation of PKCι/λ expression in the hippocampus of PKMζ knockout mice; in addition, they show that the expression of several other PKC isoforms is upregulated in the knockouts. They find that down-regulation of PKCι/λ expression in the hippocampus using the Cre-LoxP technology (the 2016 paper merely used an inhibitor to block the activity of PKCι/λ) blocks L-LTP. Finally, the authors demonstrate that, although LTM is preserved in the single PKMζ knockout mouse, it is eliminated in the PKMζ/PKCι/λ double knockout mouse.

      Strengths:

      The experiments appear to have been carefully executed, the results reliable, and the paper well-written. Overall, the article provides significant additional support for the idea that the activity of PKMζ is critical for the maintenance of hippocampal L-LTP and LTM. The article uses genetic methods, rather than simply pharmacological ones, to demonstrate that when PKMζ is genetically deleted, PKCι/λ compensates for the missing PKCζ.

      Weaknesses:

      The paper sets up what I believe is probably a false dichotomy between a structural explanation - a change in the number of synaptic connections among neurons - and the persistent kinase activity explanation for memory maintenance. Why are these two explanations necessarily antithetical? It is possible that an increase in synaptic connections and the ongoing activity of PKMζ both contribute substantially to memory maintenance. The authors certainly don't provide any evidence that the number of synapses in the hippocampus remains unchanged after the induction of L-LTP or LTM. Indeed, I see no reason why persistent PKMζ activity could not be a mechanism for the maintenance of an enhanced number of synaptic connections following the induction of LTP/LTM. To the best of my knowledge, this possibility has not yet been explored. Consequently, I don't see why the present results would lead one to favor a biochemical explanation over a structural one for memory maintenance. Given the significant experimental evidence that LTM involves persistent structural changes in neurons, both explanations are equally plausible at present.

      As requested, we eliminated the discussion of a dichotomy between structural and biochemical mechanisms of long-term memory in the Abstract and Introduction. We now briefly address the relationship between the two hypotheses, which are not mutually exclusive, in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      The authors are attempting to advance understanding of the role of unconventional PKCs, PKM𝛇 and PKC𝜄/𝝀, in maintenance of late-phase LTP. Their results help to clarify the interplay between "structural" and "biochemical/enzymatic" mechanisms of LTP and learning in the hippocampus.

      Strengths:

      A strength is the use of conditional knock-outs of PKM𝛇 and PKC𝜄/𝝀 to assess the role of these two enzymes in maintaining long-term potentiation and in compensating for each other when one of them is conditionally knocked out in the adult.

      Weaknesses:

      The paper is extremely difficult to read because the abstract does not clearly state the advances made over earlier studies by the use of conditional KO mutation. For example, in line nine of the abstract, the authors state, "Here, we found PKC𝜄/𝝀 persists in LTP and long-term memory when PKM𝛇 is genetically deleted." This is confusing because it sounds as though the experiments have repeated earlier published experiments in which the gene encoding PKM𝛇 is deleted in the embryo. The authors are not clear throughout the manuscript that they are using conditional KO of the two enzymes in the adult animal, rather than deletion of the gene. The term "genetically deleted" does not mean "conditionally deleted in the adult." The final sentences of the abstract are: "Whereas deleting PKM𝛇 and PKC𝜄/𝝀 individually induces compensation, deleting both aPKCs abolishes hippocampal late-LTP. Hippocampal 𝜄/𝝀-𝛇-double-knockout eliminates spatial long-term memory but not short-term memory. Thus, in the absence of PKM𝛇, a second persistent biochemical process compensates to maintain late-LTP and long-term memory." These sentences do not convey a clear logical conclusion. The Discussion does a better job of stating the importance of the experiments.

      We have clarified the genotypes of the mice in the abstract and throughout the text.

      Reviewer #3 (Public review):

      Summary:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether atypical protein kinase C is required for the maintenance of late-long-term synaptic potentiation (L-LTP) and long-term memory (LTM). The authors confirm previous findings that persistent activity of PKMζ is required for hippocampal L-LTP and spatial memory. They demonstrate that genetically deleting PKCι/λ and PKMζ individually induces compensatory upregulation, whereas deleting both atypical PKCs abolishes hippocampal L-LTP and spatial long-term memory. The study uses an elegant combination of immunoblots, electrophysiology, and behavioral assays. The use of Cre-recombinase to target specific hippocampal regions and neurons adds to the rigor of the findings.

      Strengths:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether PKMζ is required for the maintenance of L-LTP and LTM. The study demonstrates that PKCι/λ, which was previously shown to be critical for the initial generation of the early phase of LTP and short-term memory, becomes persistently active in L-LTP and LTM in a PKMζ knock-out model, compensating for the loss of PKMζ. Furthermore, when the compensation mechanisms are eliminated by simultaneous deletion of both PKMζ and PKCι/λ, maintenance of LTP and long-term spatial memory, but not of short-term memory, is diminished. The strength of this study is that the authors used a double-knockout strategy to directly address the controversy concerning the roles of PKMζ in memory formation. By showing that PKCι/λ compensates when PKMζ is deleted, the authors provided a compelling explanation for previous contradictory findings.

      Weaknesses:

      (1) The authors should provide the numerical values for all data.

      (2) It appears that blind procedures were only used for the behavioral experiments. Some explanation is warranted.

      (3) The description of the immunoblotting procedures lacks sufficient detail. The authors state that immunoblots were stained with multiple antisera to visualize multiple PKCs on the same immunoblot. To conserve antisera, the immunoblots were cut to isolate the relevant proteins based on molecular weight. Isoforms with similar molecular weights were either stained with antisera of different species or on separate blots. Despite this explanation, it is unclear how immunoblotting was performed in practice. For example, in Figure 1B, the authors compared the changes of four conventional PKC isoforms. Because all four antibodies are mouse monoclonal antibodies recognizing proteins of similar molecular weights, each probing should presumably have its own actin loading controls. However, these controls are missing from the figure. Some clarification is warranted.

      (4) The statement in the legend to Figure 4B, that the increases of maximum avoidance time from pretraining to trial 1 are not different, indicates both groups of mice successfully established short-term memory, which is not correct. The analysis only reveals that there is no difference between the two groups. No differences could be due to both groups learning the same, as the authors suggest, or alternatively to no learning in either group.

      (5) The labeling on some of the illustrations (e.g., Figure 2B) is unreadable.

      (6) In Figure 4B, only the single statistical comparison between "pretraining" and "1 trial" is shown. The other comparisons described in the legend should also be illustrated.

      (7) There is no documentation to support the statement that "The prevailing textbook mechanism for how memory is retained asserts that stable structural changes at synapses, the result of initial protein synthesis and growth, sustain memory without the need for ongoing biochemical activity dedicated to storing information" or for the statement in the Discussion that the structural model of memory storage is the standard account.

      (1) Numerical data used in statistical analyses are now provided for LTP experiments in Figure 4 figure supplement 1. Numerical values for all other experiments are presented in the figures.

      (2) Blind procedures were performed for all experiments except for LTP experiments that involved the transfection of eGFP as control, as the eGFP could be detected visually in the hippocampal slice by the experimenter. This is now clarified in the Statistics section of the Methods.

      (3) The description of immunoblotting was clarified in the Methods, and actin loading controls presented for all immunoblots in Figure 1 and Figure 1 figure supplements 1 and 2.

      (4) Short-term memory (Figure 5B) is now determined by 2 methods. First, we show that for both groups the times to enter the shock zone increase in the first training trial, as compared to the pretraining session with the shock off. The increases are not different between the groups. Second, we show increases of the maximal avoidance time from pretraining to trial 1 for both groups are significant, and that the increases are not different. These data show that short-term memory was present in both groups and not measurably different between the groups.

      (5) The fonts of the figure labels were enlarged.

      (6) The comparisons between pretraining and training trial 1 and between training trials 1 and 3 for the two groups are now shown in Figure 5B.

      (7) We abbreviated our discussion of the structural model, which is now presented at the end of the Discussion (as per Reviewer 1), and removed the comment that it is the prevailing view, stating instead that the hypothesis is “widely held.”

      Additional points: As requested, the timing of tamoxifen injections and tissue collection for immunohistochemistry is clarified in the protocol schematic of a new Figure 2A and Figure 2A legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study examines the evolution of virulence and antibiotic resistance in Staphylococcus aureus under multiple selection pressures. The evidence presented is convincing, with rigorous data that characterizes the outcomes of the evolution experiments. However, the manuscript's primary weakness is in its presentation, as claims about the causal relationship between genotypes and phenotypes are based on correlational evidence. The manuscript needs to be revised to address these limitations, clarify the implications of the experimental design, and adjust the overall narrative to better reflect the nature of the findings.

      Thank you for your feedback. Here, we summarize the major changes made in the revised manuscript:

      (1) We did not test causality between mutations and phenotypes in our study. We were intentional about not using causal wording (“mutation X caused/led to/resulted in phenotype Y”), and only discussed these results using the terms “correlation” and “association”, and only when they were statistically significant. We understand that some readers may view these terms as being equivalent to “causation”, thus in the revision, we have modified our wording as suggested (please see below for specific lines).

      (2) We agree that experimental evolution in nematodes is not a direct simulation of evolution in humans. The goal of our study was first and foremost, a test of how multiple selective pressures can shape pathogen evolution. This point was presented in the first paragraph, the second to last paragraph of the Introduction (which included our hypotheses), and the last paragraph of the manuscript. References to humans and other mammalian systems were intended to point out similarities between our findings and what had already been found in S. aureus outside the lab. Despite differences between mammals and nematodes, several parallels arose at both the phenotypic and genomic levels, which is interesting from an evolutionary standpoint. We understand that more experiments and tests would be needed before we can make claims about the selective pressures acting on S. aureus outside the lab. We presented some information in the context of humans because a large part of the literature on S. aureus is on its role as a major bacterial pathogen; we did not want to neglect this aspect of its natural life history.

      In the revised manuscript, we are more explicit in stating these points, as well as tempering some language regarding human infection, and removing some references to humans. Please see below for specific lines as well as justification for specific references to humans/mammalian systems.

      (3) We have included additional details on the experimental design below. We hope this is sufficiently clarifying.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate how methicillin-resistant (MRSA) and sensitive (MSSA) Staphylococcus aureus adapt to a new host (C. elegans) in the presence or absence of a low dose of the antibiotic oxacillin. Using an "Evolve and Resequence" design with 48 independently evolving populations, they track changes in virulence, antibiotic resistance, and other fitness-related traits over 12 passages. Their key finding is that selection from both the host and the antibiotic together, rather than either pressure alone, results in the evolution of the most virulent pathogens. Genomically, they find that this adaptation repeatedly involves mutations in a small number of key regulatory genes, most notably codY, agr, and saeRS.

      Strengths:

      The main advantage of the research lies in its strong and thoroughly replicated experimental framework, enabling significant conclusions to be drawn based on the concept of parallel evolution. The study successfully integrates various phenotypic assays (virulence, growth, hemolysis, biofilm formation) with whole-genome sequencing, offering an extensive perspective on the adaptive landscape. The identification of certain regulatory genes as common targets of selection across distinct lineages is an important result that indicates a level of predictability in how pathogens adapt.

      Thank you very much.

      Weaknesses:

      (1) The main limitation of the paper is that its findings on the function of specific genes are based on correlation, not cause-and-effect evidence. While the parallel evolution evidence is strong, the authors have not yet performed the definitive tests (i.e., reconstruction of ancestral genes) to ensure that the mutations identified in isolation are enough to account for the virulence or resistance changes observed. This makes the conclusions more like firm hypotheses, not confirmed facts.

      We have replaced instances of “association” and “correlation” with wording similar to that suggested where applicable, including:

      L 342 – 344: “The loss of SCCmec and ACME was more often identified in populations exhibiting an increase in total growth from the ancestor outside the host…”

      L 371 – 375: “Mutations in three genes were regularly identified in populations exhibiting significant increases in virulence from the ancestor: codY, gdpP, and pbpA. Mutations in agr in general were not associated with changes in overall virulence, but MSSA populations harboring mutations in this gene were more likely to exhibit greater virulence compared to MRSA populations (Wilcoxon rank sum exact test P = 0.045).”

      L 377: “Mutations in specific genes were often found in populations able to hemolyze red blood cells…”

      L 379 – 381: “There were also significant differences between the mutations regularly identified in oxacillin-resistant populations evolved from the MSSA ancestor...”

      L 384 – 385: “By contrast, mutations in agr were often in populations exhibiting loss of hemolytic activity, consistent with previous findings...”

      L 409 – 410: “Mutations that arose during experimental evolution are regularly found in strains associated with human systemic infections.”

      We have also stated that ancestral reconstruction is needed:

      L 553 – 555: “Future experiments may include introducing these mutations into the ancestral background to directly link the mutations in these genes to evolved virulence.”
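
      For readers unfamiliar with the Wilcoxon rank sum exact test quoted above (L 371 – 375), the following is a minimal sketch of how an exact two-sided P value can be obtained by enumerating every possible group assignment. The function names and the example scores are hypothetical placeholders, not data from the study, and this is not necessarily the implementation used in the manuscript.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_sum_exact(group_a, group_b):
    """Return (rank sum of group_a, exact two-sided P value).

    The P value is the fraction of all possible group assignments whose rank
    sum lies at least as far from its null expectation as the observed one.
    Enumeration is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(pooled) + 1) / 2
    observed_deviation = abs(observed - expected)

    total = 0
    as_extreme = 0
    for subset in combinations(range(len(pooled)), n_a):
        total += 1
        rank_sum = sum(ranks[i] for i in subset)
        if abs(rank_sum - expected) >= observed_deviation - 1e-9:
            as_extreme += 1
    return observed, as_extreme / total


# Hypothetical virulence scores for two small groups (placeholders only).
example_group_a = [0.62, 0.71, 0.80, 0.85]
example_group_b = [0.40, 0.55, 0.58, 0.66]
w, p = rank_sum_exact(example_group_a, example_group_b)
print(f"rank sum = {w}, exact two-sided P = {p:.3f}")
```

      Because the test uses only ranks, it makes no assumption about the shape of the trait distribution, which suits the small number of populations per treatment.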

      (2) In some instances, the claims in the text are not fully supported by the visual data from the figures or are reported with vagueness. For example, the display of phenotypic clusters in the PCA (Figure 6A) and the sweeping generalization about the effect of antibiotics on the mutation rates (Figure S5) can be more precise and nuanced. Such small deviations dilute the overall argument somewhat and must be corrected.

      In reference to Fig. 6A, we have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Line 442

      In reference to Fig. S5, we conducted statistics to include both MRSA and MSSA populations and examined the effect of oxacillin on the number of mutations. While oxacillin had a significant effect on the number of mutations, we agree with the reviewer that this may be driven by the MRSA populations and have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the results of an evolution experiment where Staphylococcus aureus was experimentally evolved via sequential exposure to an antibiotic followed by passaging through C. elegans hosts. Because infecting C. elegans via ingestion results in lysis of gut cells and an immune response upon infection, the S. aureus were exposed separately across generations to antibiotic stress and host immune stress. Interestingly, the dual selection pressure of antibiotic exposure and adaptation to a nematode host resulted in increased virulence of S. aureus towards C. elegans.

      Strengths:

      The data presented provide strong evidence that in S. aureus, traits involved in adaptation to a novel host and those involved in antibiotic resistance evolution are not traded off. On the contrary, they seem to be correlated, with strains adapted to antibiotics having higher virulence towards the novel host. As increased virulence is also associated with higher rates of haemolysis, these virulence increases are likely to reflect virulence levels in vertebrate hosts.

      Weaknesses:

      Right now, the results are presented in the context of human infections being treated with antibiotics, which, in my opinion, is inappropriate. This is because

      (1) exposure to the host and antibiotics was sequential, not simultaneous, and thus does not reflect the treatment of infection, and

      (2) because the site of infection is different in C. elegans and human hosts.

      We have removed the two sentences referencing site of infection:

      Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      For our rationale for discussing humans in general, please see below.

      Nevertheless, the results are of interest; I just think the interpretation and framing should be adjusted.

      Thank you very much.

      Reviewer #3 (Public review):

      Summary:

      Su et al. sought to understand how the opportunistic pathogen Staphylococcus aureus responds to multiple selection pressures during infection. Specifically, the authors were interested in how the host environment and antibiotic exposure impact the evolution of both virulence and antibiotic resistance in S. aureus. To accomplish this, the authors performed an evolution experiment where S. aureus was fed to Caenorhabditis elegans as a model system to study the host environment and then either subjected to the antibiotic oxacillin or not. Additionally, the authors investigated the difference in evolution between an antibiotic-resistant strain, MRSA, and an isogenic susceptible strain, MSSA. They found that MRSA strains evolved in both antibiotic and host conditions became more virulent, and that strains evolved outside these conditions lost virulence. Looking at the strains evolved in just antibiotic conditions, the authors found that S. aureus maintained its ability to lyse blood cells. Mutations in codY, gdpP, and pbpA were found to be associated with increased virulence. Additionally, these mutations identified in these experiments were found in S. aureus strains isolated from human infections.

      Strengths:

      The data are well-presented, thorough, and are an important addition to the understanding of how certain pathogens might adapt to different selective pressures in complex environments.

      Thank you very much.

      Weaknesses:

      There are a few clarifications that could be made to better understand and contextualize the results. Primarily, when comparing the number of mutations and selection across conditions in an evolution experiment, information about population sizes is important to be able to calculate the mutation supply and number of generations throughout the experiment. These calculations can be difficult in vivo, but since several steps in the methodology require plating and regrowth, those population sizes could be determined. There was also no mention of how the authors controlled the inoculation density of bacteria introduced to each host. This would need to be known to calculate the generation time within the host. These caveats should be addressed in the manuscript.

      While the population sizes within hosts and generation time could be determined, we would need to conduct additional experiments (e.g., infecting nematodes with S. aureus, then crushing, plating, and counting colony forming units across time intervals) in order to obtain measurements for pathogen growth in hosts across time. For experimental evolution, we crushed a set number of dead nematodes (30) and all bacteria that were released were allowed to grow in liquid media before an aliquot (25%) was used to seed the next passage. Picking and crushing nematodes across 48 populations for one time point was an arduous task. The additional steps of picking, crushing, and plating nematodes across multiple time intervals while experimental evolution was being performed would not have been logistically feasible.

      In terms of the inoculation density of bacteria, all nematodes were placed on abundant lawns of S. aureus. Nematodes were exposed to full lawns the entire infection step; bacteria remained in abundance. While we do not know the exact inoculum each individual nematode was exposed to, we know that they ingested the bacteria because of the high mortality rate. Furthermore, we followed the same procedure for every replicate across every host-associated treatment. Host individuals within and across passages were also genetically identical to one another. Altogether, these factors allowed for more consistency across the experiment, such that relative inoculum size should be similar across individual hosts. Please refer to the evolution experiment diagram (Author response image 1) for more details.

      Ultimately, while knowing the absolute population size, inoculum size, and generation time within the host is interesting, the number of rounds of selection (the number of times each population was exposed to the selective pressures) is also important in addressing our major question. Every treatment, which started out from one ancestral clone (MRSA or MSSA), was exposed to the same number of bouts of selection (passages), yet we see significant divergence in terms of traits and mutations. Future directions would certainly involve determining the number of steps (e.g., number of generations within hosts) required to reach these end points, but not knowing exactly how many steps were required does not detract from addressing the larger question of determining how pathogens respond to multiple selective pressures.
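
      As a rough illustration of the reviewer's point about generation counts, the liquid regrowth step can be bounded with simple arithmetic. The sketch below assumes, hypothetically, that each 25% aliquot regrows to the density of the culture it was drawn from; the function name and that assumption are illustrative only. The calculation says nothing about generations within hosts, which were not measured.

```python
import math


def generations_per_regrowth(transfer_fraction):
    """Doublings needed for a culture to return to its pre-transfer density.

    ASSUMPTION (not stated in the source): the population regrows to the same
    density it had before the aliquot was taken.
    """
    if not 0 < transfer_fraction <= 1:
        raise ValueError("transfer_fraction must be in (0, 1]")
    return math.log2(1 / transfer_fraction)


ALIQUOT_FRACTION = 0.25  # 25% aliquot used to seed the next passage
N_PASSAGES = 12          # number of passages in the evolution experiment

per_passage = generations_per_regrowth(ALIQUOT_FRACTION)
total_regrowth_generations = per_passage * N_PASSAGES

print(f"doublings per regrowth step: {per_passage}")
print(f"doublings across {N_PASSAGES} passages (regrowth only): {total_regrowth_generations}")
```

      Under that assumption the regrowth step contributes two doublings per passage, so any larger generation count would have to come from growth on lawns or within hosts.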

      Another concern is the number of generations the populations of S. aureus spent either with relaxed selection in rich media or under antibiotic pressure in between the host exposure periods. It is probable then that the majority of mutations were selected for in these intervening periods between host infection. Again, a more detailed understanding of population sizes would contribute to the understanding of which phase of the experiment contributed to the mutation profile observed.

      We conducted every step of the evolution experiment on the same timeline. For example, all replicates across treatments were grown in liquid media at the same time (see Author response image 1). All populations were exposed to the same selective pressures at this step of the experiment. We can then compare populations that were subsequently exposed to hosts against those that were not. Populations passaged without a host served as the control. Mutations unique to host-exposed populations would more likely contribute to the traits of interest than mutations shared between the host-exposed and no-host treatments. Similar comparisons could be made with the oxacillin-exposed and no-oxacillin populations.
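
      The comparison between host-exposed and no-host populations described above amounts to a set difference. A minimal sketch, using hypothetical treatment labels and placeholder gene names (geneA, geneB, and so on) rather than results from the study:

```python
def treatment_specific(mutated_genes_by_treatment, focal, control):
    """Split the genes mutated in the focal treatment into those absent from
    the control treatment and those shared with it."""
    focal_genes = set(mutated_genes_by_treatment[focal])
    control_genes = set(mutated_genes_by_treatment[control])
    return {
        "focal_only": focal_genes - control_genes,
        "shared": focal_genes & control_genes,
    }


# Placeholder data: which genes carried mutations in each treatment.
example_mutations = {
    "+host": ["geneA", "geneB", "geneC"],
    "-host": ["geneB", "geneD"],
}

split = treatment_specific(example_mutations, focal="+host", control="-host")
print("unique to +host:", sorted(split["focal_only"]))
print("shared with -host:", sorted(split["shared"]))
```

      Genes in the shared set are candidates for adaptation to the common steps of the protocol, such as growth in liquid media, rather than to the host itself.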

      In general, the only differences between treatments would be driven by the treatments themselves. Given that we are interested in treatment-level effects, any differences in population size or generation time between treatments could contribute to the treatment effects we observe, and thus were not something we aimed to hold uniform across our experiment.

      Author response image 1.

      Schematic of procedural steps involved in one passage of S. aureus through nematodes (+host -ox) compared to without nematodes (-host -ox).

      Recommendations for the authors:

      Reviewing Editor Comments:

      We encourage you to address all other comments raised by the reviewers; however, the review team has identified the following points as the most critical and fundamental to improve your manuscript:

      (i) Reframing the narrative: You will need to adjust the narrative so that the study is presented as a "proof of principle" rather than a direct simulation of a human infection.

      While we referenced human infection, we believe the study had been presented as a proof of principle. Examples include:

      (1) We discussed the gap of knowledge in the first paragraph: “It is unclear how virulence evolves in the face of more than one selective pressure and whether this trait is constrained or facilitated by antibiotic resistance.” Lines 86 – 88

      (2) In the second to last paragraph in the Introduction, we presented the main hypotheses: “Adaptation may require resources to be expended toward either virulence or antibiotic resistance, leading to a trade-off between these traits (Ferenci, 2016). Alternatively, weaker selection from sub-MIC antibiotics may interact synergistically with hosts and facilitate the evolution or maintenance of high virulence and antibiotic resistance.” Lines 176 – 179

      (3) The last paragraph concluded with “Our findings ultimately emphasize the importance of considering the host context in the evolution of antibiotic resistance. Integrating multiple traits, such as virulence, antibiotic resistance, and fitness may be critical in identifying the factors that facilitate host shifts and persistence of drug-resistant pathogens.” Lines 613 – 616

      These paragraphs, which set up the context for our work, did not primarily discuss human infections.

      In the revised manuscript, we have further tempered language regarding human infection:

      L 169 - 172: “Experimentally evolving S. aureus in C. elegans thus allows us to track the early stages of virulence and antibiotic resistance evolution in novel host populations with the potential to identify conserved genomic regions underlying evolved traits.”

      L 595 – 596: “Additional direct tests are needed to evaluate the role of these mutations in adaptation of S. aureus to different infection sites.”

      L 610 – 611: “Pathogen evolution in a tractable invertebrate animal model yielded phenotypes and genotypes similar to those identified in mammalian hosts, highlighting the utility of evolution experiments to identify potential ecological and genetic mechanisms that may give rise to pathogen traits conserved across systems.”

      And removed some references to humans:

      In the Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      In the Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      Otherwise, our rationale for referencing humans/mammalian systems in our Introduction includes:

      Setting the context of our study system: we discussed humans and clinical significance when we first introduced S. aureus (lines 132 – 151) and experimental evolution (lines 153 – 172). Much of what is known about S. aureus outside the lab is when it is interacting with humans, thus we wove in relevant information that has been discovered in other organisms.

      Hemolysis: This ability is important for S. aureus virulence toward C. elegans (Sifri et al., 2003).

      S. aureus genomic database: we intended to leverage this large-scale database of genomes isolated from S. aureus outside the lab to compare patterns emerging from experimental evolution to those in existing isolates. Due to its relevance as a major bacterial pathogen, most of the isolates happen to be from clinical settings.

      (ii) Adjusting the causal language: You will need to soften the language so that correlational claims do not appear to be causal.

      We have adjusted language as noted above.

      (iii) Clarifying methodological aspects: You will need to provide more details on the methodology, such as population sizes, and clarify the implications of these in the conclusions of the work.

      We have provided additional explanation of methodology and the role of control (no host) treatments above.

      Reviewer #1 (Recommendations for the authors):

      The paper is robust, and the study is of great significance. Tackling the subsequent issues would greatly enhance the paper and elucidate its findings.

      Major Recommendations:

      (1) Revising Causal Language: The main flaw of the manuscript lies in its presentation of correlational data as if it were causal. We highly suggest a thorough review of the text to soften causal language when connecting genotypes to phenotypes. The absence of ancestral reconstruction should be recognized as a constraint. Assertions ought to be presented as robust, evidence-based hypotheses. For instance, rather than saying a mutation "associated with significant increases in virulence," you might say "was regularly identified in groups that developed increased virulence, strongly suggesting this gene's role in the adaptation." This will more precisely clarify the contribution of the work.

      We have softened language and stated that ancestral reconstruction is needed as noted above.

      (2) Expand on Parallel Mutations: The examination of parallel evolution in Figure 4A is intriguing but would be notably stronger with additional details. I suggest including an additional supplementary figure or table detailing the specific non-synonymous mutations identified in the highly parallel genes (e.g., codY, agr, gdpP). It is essential for the reader to understand whether parallel evolution is happening at the gene level (different mutations in a single gene) or at the nucleotide level (the precise same mutation appearing again). Kindly specify if any of these mutations were nonsense mutations, as this suggests that the loss-of-function is advantageous.

      The full table of mutations is in figshare (10.6084/m9.figshare.28745558). We have added a Supplemental Table (Table S2) containing mutations in genes occurring in more than two populations. Many of these mutations were not the same, indicating parallel evolution at the gene level (lines 315 – 317).
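
      The distinction the reviewer draws between gene-level and nucleotide-level parallelism can be sketched as a grouping of mutation records. The record layout, the function name, the two-population threshold, and the example records below are all hypothetical and do not reflect the format of Table S2 or the figshare table.

```python
from collections import defaultdict


def classify_parallelism(mutations):
    """Classify each gene by the kind of parallel evolution it shows.

    mutations: iterable of (population, gene, position, alternate_base).
    A gene is 'nucleotide-level' if the identical mutation occurs in two or
    more populations, 'gene-level' if two or more populations carry only
    different mutations in it, and 'not parallel' otherwise.
    """
    populations_by_site = defaultdict(lambda: defaultdict(set))
    for population, gene, position, alt in mutations:
        populations_by_site[gene][(position, alt)].add(population)

    result = {}
    for gene, sites in populations_by_site.items():
        all_populations = set().union(*sites.values())
        if len(all_populations) < 2:
            result[gene] = "not parallel"
        elif any(len(pops) >= 2 for pops in sites.values()):
            result[gene] = "nucleotide-level"
        else:
            result[gene] = "gene-level"
    return result


# Placeholder records: (population, gene, position, alternate base).
example_records = [
    ("P1", "geneA", 100, "T"),
    ("P2", "geneA", 250, "G"),
    ("P1", "geneB", 40, "A"),
    ("P3", "geneB", 40, "A"),
    ("P2", "geneC", 7, "C"),
]

for gene, label in sorted(classify_parallelism(example_records).items()):
    print(gene, label)
```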

      Minor Recommendations for Clarity and Accuracy:

      (1) Introduction:

      Lines 176-177: Please add a citation for the statement describing the function of the SCCmec cassette, as this is established knowledge.

      Done.

      (2) Results:

      Section Title (Line 254): The title "Host and sub-MIC antibiotic promoted growth..." is imprecise. Figure 3B shows that it is the combination of these factors that promotes growth in MRSA, while oxacillin alone is detrimental. Please revise the title to reflect this synergistic effect.

      “Synergistically” has been added to the title: “Host and sub-MIC antibiotic synergistically promoted growth of MRSA…” Lines 269 – 270

      Lines 261-263: The description of Figure 3B is incomplete. The text should explicitly state that the -host+ox treatment resulted in the lowest growth for MRSA, which provides a critical contrast and suggests a fitness cost.

      We have added “By contrast, exposure to sub-MIC oxacillin alone yielded the lowest growth, suggesting a fitness cost.” Lines 277 – 278

      Line 294: The claim that "Sub-MIC oxacillin selection also resulted in more mutations" is a generalization not supported for the MSSA genotype, according to Figure S5. Please revise this sentence to specify that this effect was observed in the MRSA populations.

      We have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Lines 419-421: The claim that the +host+ox populations in Figure 6A "formed a distinct cluster" is an overstatement, as there is visible overlap with one other treatment (e.g., host-ox). Please revise this to more accurately describe the visual data (e.g., "clustered together, largely separating...").

      We have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Lines 442 – 443

      Lines 422-424: The interpretation of the MRSA PCA (Figure 6A) focuses on the correlation between virulence and sub-MIC growth. However, the correlation between "biofilm production" and "growth without oxacillin" appears visually stronger. Please address this correlation as well for a more complete interpretation.

      We have added “For MRSA populations, biofilm production and growth without oxacillin also appeared to be positively correlated.” Lines 447 – 448

      (3) Discussion:

      Lines 469-470: The statement that "exposure to oxacillin resulted in pathogens causing the greatest host mortality" is imprecise. The data in Figure 2A show that it is the combination of host and oxacillin. Please revise this for accuracy and add a direct citation to Figure 2A here.

      We have added clarification: “Nonetheless, we observed differing evolutionary trajectories, where exposure to oxacillin in host-associated treatments resulted in pathogens causing the greatest host mortality.” Lines 496 – 498

      Reviewer #2 (Recommendations for the authors):

      After reviewing the paper and reading the previous reviews from PLoS Biology, my biggest criticism of the paper is the way the story is told. In principle, the results are interesting and relevant, but the analogy to human infection and immune system/ antibiotic treatment strategies does not fit entirely with the experimental design or the results. I think the motivation needs to be reframed. In the study, antibiotic exposure is purely environmental, i.e., not in the host. How does environmental antibiotic use affect in vivo evolution, as this is not tested? As previous reviewers have pointed out, S. aureus is not an enteric pathogen in humans but most often causes skin infections. Furthermore, much of the results and discussion is focused on haemolysis of red blood cells, a cell type that C. elegans does not have. What the paper does present, on the other hand, and something that is interesting and novel, is a test in a model system of how a bacterial pathogen evolves to competing selection pressures. I might have hypothesised a priori that these competing pressures result in trade-offs, something which there is no evidence of, even though growth rate does not appear to be negatively impacted as a consequence of selection for drug resistance and virulence together. Instead, many traits are correlated and seemingly at the mechanistic level. This is cool and is a proof of principle, even if the system does not completely mirror reality, and I think the story should be told as such.

      We agree entirely with the reviewer that testing how pathogens respond to multiple selective pressures and the resulting lack of trade-offs are significant and interesting. We presented this question (lines 86 – 88) and our hypothesis about such trade-off in the Introduction (lines 176 – 179). As stated above, we had framed our paper to highlight these points and have removed references to antibiotic concentrations in treated humans.

      We measured and discussed hemolysis because it is important for virulence toward C. elegans (lines 195 – 197) (Sifri et al., 2003). We believe our manuscript contained a reasonable discussion of this trait. For example, three panels of the main figures presented the main hemolysis results (Figures 2B, 2C, and 2D), whereas 23 other panels did not involve hemolysis at all. In the Discussion, hemolysis took up half of the shortest paragraph (lines 509 – 519) and an additional sentence (lines 589 – 591), out of seven total paragraphs.

      Specific comments:

      (1) L137-138. Can S. aureus really survive for long periods of time outside of the host? Can you clarify this statement? Do you mean it is an opportunistic pathogen and can also replicate in the environment?

      S. aureus can form biofilms and persist for weeks on inert surfaces (Kramer et al., 2024; Tran et al., 2023), indicating that it may replicate in non-host environments. We have included the phrase “opportunistic pathogen” to clarify (line 145).

      (2) L187 - to ascertain

      Corrected.

      (3) Figure 2B - there seems to be a benefit of haemolysis activity to oxacillin resistance, perhaps a crossover in mechanism? In MSSA, without a host, it goes to complete fixation, whereas it is completely lost when antibiotics aren't present. I know this is discussed later, but I would appreciate a more detailed hypothesis of why this could be.

      Antibiotics have been found to induce expression of virulence traits, such as in the case of oxacillin and hemolysis. Thus, it is reasonable that exposure to oxacillin during evolution would maintain MSSA’s hemolytic ability. We hypothesize that the loss of hemolysis in the absence of oxacillin may be due to the cost of hemolysis expression: without a stimulant (oxacillin), hemolysis may not be expressed as often and may be subject to deleterious mutations. Alternatively, the stress that cells were under may have favored virulence in some way, rather than the direct action of the antibiotic.

      (4) L225-228 - As C. elegans do not have red blood cells, why would we expect this? Do you see increased lysis of C. elegans gut cells? Or could it be due to iron accumulation as you are growing the staph on BHI?

      We measured and correlated nematode mortality with hemolytic ability because hemolysis had been found to be involved in virulence toward C. elegans (Sifri et al., 2003). The hemolysis phenotype is a surrogate for S. aureus virulence gene expression.

      (5) Figure 3A - There seems to be a growth cost of evolving oxacillin resistance in the absence of a host. Why might this be?

      MRSA populations exposed to oxacillin without a host during evolution visually exhibited the lowest growth rate. While this is an interesting question, the result was not statistically significant, so we cannot speculate in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Some claims in the introduction are either uncited or not correctly stated. The second sentence has a claim about the interplay between antibiotic resistance and virulence with no citation listed. Additionally, there is a claim about S. aureus "evading detection" by attacking the host's immune cells. That is by definition not avoiding detection. Perhaps phrasing it as resisting host immune function would make it clearer.

      We have added a citation (lines 80 – 81) and clarified our wording: “Once inside the host, S. aureus resists host immune function by hindering or lysing immune cells.” Lines 140 – 141

      (2) Once in the introduction and in the discussion, the authors referred to S. aureus as a novel pathogen for C. elegans, I do not think enough is known to make this statement.

      This S. aureus strain is novel because it was isolated from humans, so at least in its recent evolutionary past, it has not interacted with C. elegans. Furthermore, we used a C. elegans isolate (N2) that had been frozen and maintained in the lab on E. coli, and had not been exposed to other microbes in its recent evolutionary past. Finally, S. aureus has not been found to be a native pathogen of C. elegans in nature (Ekroth et al., 2021).

      (3) Key suggestion: Change Figure 1C to reflect the design better. So you could have the +OXA before the host and then have an arrow looping back again to show the cycle of each step. So a figure that would have something like: MRSA > +OXA > +host>+OXA --> MRSA .

      We have updated the figure as suggested.

      (4) Suggest changing "greatest" on line 191, section header to greater.

      Done.

      (5) Line 258: Rich media can still provide selective pressures that are difficult to quantify - fast growth, cofactor and other nutrient limitations due to that fast growth

      We have adjusted our wording: “Importantly, rich media reduced the risk of introducing additional selective pressures than those being tested.” Lines 273 – 274

      (6) Why were intergenic mutations routinely ignored? These can often be very important phenotypically.

      We had focused on genes because there was a sufficient number of genes to discuss, but we have added a Supplemental Table (Table S2) containing all mutations (including intergenic and synonymous) appearing in more than 2 populations. We have also added information regarding mecA, an accessory gene, highlighting the role non-core genes may have in shaping bacterial evolution:

      “Despite evolving in similar environments, MRSA and MSSA populations differing only in the presence of an intact accessory gene (mecA)—proceeded on divergent evolutionary paths…” Lines 66 – 68

      “Carriage of Staphylococcal cassette chromosome mec (SCCmec), which encodes mecA, an accessory gene that provides resistance…” Lines 187 – 188

      “As MRSA and MSSA only differed in the presence of an intact mecA gene at the start of the experiment, accessory genes may play important roles in shaping bacterial evolution (Jackson et al., 2011).” Lines 472 – 474

      (7) Line 294: more mutations than what?

      We have clarified the sentence: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence…” Lines 310 – 311

      (8) Lines 295-297: wording is pretty confusing. It seems that the discussion is about increased mutation rates, possibly due to hypermutators resulting from mutL or recA mutations, but this isn't well-thought out and much is implied here. Furthermore, see the above comment about comparing mutations across conditions - it's hard to make inferences of mutation rates without knowing the mutation supply as a result of varying population sizes across conditions and through the experiment.

      We have clarified the sentence: “…there were only two mutations in DNA and mismatch repair genes (mutL and recA), suggesting repair genes were not the sole mechanism involved.” Lines 313 – 314

      Because all populations evolved from one ancestral clone (either MRSA or MSSA), all mutations that are found at the end of the experiment would have arisen de novo from that ancestor. Since all populations experienced the same number of passages/rounds of selection, we determined mutation rate by counting the number of mutations that were found at the last passage for each replicate population. Populations that acquired significantly more mutations had a higher mutation rate in terms of # of mutations/# of selection rounds.
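
      The per-round measure described above (number of mutations at the last passage divided by the number of selection rounds) can be illustrated with a minimal sketch. The mutation counts, condition names, and number of rounds below are hypothetical placeholders chosen for illustration; they are not data from the study.

```python
from statistics import mean


def mutations_per_round(mutation_count: int, selection_rounds: int) -> float:
    """Mutations detected at the final passage divided by the number of selection rounds."""
    if selection_rounds <= 0:
        raise ValueError("selection_rounds must be positive")
    return mutation_count / selection_rounds


# Hypothetical values for illustration only (not taken from the manuscript).
SELECTION_ROUNDS = 10
final_passage_counts = {
    "with_oxacillin": [12, 15, 9],     # one count per replicate population
    "without_oxacillin": [4, 6, 5],
}

# Average the per-round measure over the replicate populations of each condition.
mean_rates = {
    condition: mean(mutations_per_round(c, SELECTION_ROUNDS) for c in counts)
    for condition, counts in final_passage_counts.items()
}

for condition, rate in mean_rates.items():
    print(f"{condition}: {rate:.2f} mutations per selection round")
```

      Because every population shares the same denominator, this measure ranks populations exactly as the raw mutation counts do. It does not correct for differences in population size or mutation supply between conditions, which is the caveat raised in the reviewer's comment.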

      (9) Line 486: typo "Mutations genes".

      Corrected.

      (10) Line 487: "antibiotics may allow" is awkward; suggest changing to more precise language, possibly relating to pleiotropy if that is what was meant here.

      We had intended to mean “adaptation [to antibiotics] may allow”. We have clarified: “Mutations in genes involved in resistance to antibiotics were found more often in populations with increased virulence, suggesting that antibiotic adaptation may also favor evolution of virulence.” Lines 514 – 516

      REFERENCES

      Ekroth AKE, Gerth M, Stevens EJ, Ford SA, King KC. 2021. Host genotype and genetic diversity shape the evolution of a novel bacterial infection. ISME Journal 15:2146–2157. DOI: https://doi.org/10.1038/s41396-021-00911-3, PMID: 33603148

      Kramer A, Lexow F, Bludau A, Köster AM, Misailovski M, Seifert U, Eggers M, Rutala W, Dancer SJ, Scheithauer S. 2024. How long do bacteria, fungi, protozoa, and viruses retain their replication capacity on inanimate surfaces? A systematic review examining environmental resilience versus healthcare-associated infection risk by “fomite-borne risk assessment.” Clinical Microbiology Reviews. PMID: 39388143

      Sifri CD, Begun J, Ausubel FM, Calderwood SB. 2003. Caenorhabditis elegans as a model host for Staphylococcus aureus pathogenesis. Infection and Immunity 71:2208–2217. DOI: https://doi.org/10.1128/IAI.71.4.2208-2217.2003, PMID: 12654843

      Tran NN, Morrisette T, Jorgensen SCJ, Orench-Benvenutti JM, Kebriaei R. 2023. Current therapies and challenges for the treatment of Staphylococcus aureus biofilm-related infections. Pharmacotherapy 43:816–832. DOI: https://doi.org/10.1002/phar.2806, PMID: 37133439

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies using ribosome/phage display generated against the Bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis-deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions, as well as chromosome segregation defects. The authors use a clever approach by making chimeras between Bacillus and S. pneumoniae Smc to narrow down to specific regions within the Bacillus Smc coiled-coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies are challenging and not essential for this study. Nonetheless, it would be informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not trivial, even for a smaller selection of sybodies. We have now incorporated ELISA data as new Table S1, which shows that most sybodies support clear binding to Smc-ScpAB. Curiously, while (only) some sybodies show a clear preference for ATP-bound or unbound Smc, this is not a strong predictor of the strength of phenotype observed in vivo. We have also attempted to characterize the binding of Smc to sybodies by other methods including pull-downs, cross-linking, and by biophysical methods (GCI). However, we prefer to not include these data as the outcomes are not clear due to inconsistencies in the behaviour of purified sybodies.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled-coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out have not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface that we have confidently identified by the mapping using chimeric proteins. Indeed, many ways of binding are possible, including disruption of ScpAB interaction. However, since the mapped binding sites are located on the SMC coiled coils, the latter scenario seems unlikely and would be an indirect consequence of altered coiled coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with gfp and performed live cell imaging. This shows that sybodies without strong phenotypes are similarly expressed at least at low inducer concentration. Moreover, many sybodies localize as foci in the cell presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We have included example data in the revised version of the manuscript as Figure S4 and Figure S5. Notably, a sybody (Sb007) with a weak growth phenotype shows focal localization at low inducer concentration and high expression levels when fully induced, comparable to sybodies with strong phenotypes. Altogether, this suggests that the lack of phenotype is not due to absence of sybody expression or localization.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis-deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis-deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis-deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple, Smc conformations. Consistent with this idea, the phenotypes of sybody expression are similar to those of the null mutant rather than the ATP hydrolysis-defective EQ mutant, which displays even more severe growth phenotypes in B. subtilis. To highlight this point, we have added the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state.”

      “ELISA data revealed that nearly all clones bind purified Smc-ScpAB (Table 1). However, the ELISA signals of only few Sybodies showed clear dependence on the presence or absence of ATP and DNA (Table S1).”

      Significance:

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function and, in parallel, demonstrated the potential of synthetic sybodies/designed binders for inhibiting protein activity.

      Reviewer #2 (Public review):

      Summary:

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      Significance:

      The authors present a new method for trapping bacterial Smcs in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and shown that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Reviewer #3 (Public review):

      Summary:

      Gosselin et al. use the sybody technology to study effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of the single domain antibodies, using the "transition state" mutant Smc. They identify 14 such mutant sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single, catalytically trapped form of SMC. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it is biassed towards sybodies that bind to Smc in a quite special way, it seems. Using wild type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      The reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on this point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this is not expected to restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). Notably, the effect size of ATP/DNA on ELISA signals was not a strong predictor of the strength of phenotypes observed in vivo. The text has been revised accordingly. See line 84 and line 92.

      We are thus quite confident, based on prior work (and on the now included ELISA data), that the Smc ATPase mutation did not strongly bias the selection in one way or another. The surprising bias towards coiled coil binding sites likely has other explanations, as these sites likely form a preferred epitope recognized by sybodies from the loop library.

      Line 105: Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis. This could be tested using Western blotting - I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results of two of the three libraries but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then sybodies are likely ineffective in inactivating Smc in B. subtilis, with the notable exceptions of the sybodies that we have isolated and characterized in this manuscript. We have added this notion to the manuscript.

      Fig. 2B: it is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed. I believe the authors should state, though, that this is not a replication block, but failure to segregate origins.

      We agree that this is an important point. We have added the following statement to clarify this point: “These elongated cells are known to harbour expanded nucleoids, consistent with delayed oriC separation rather than delayed DNA replication”

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised why the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts of testing direct binding with mixed outcomes and decided to not include those results in the light of the stronger and more relevant in vivo mapping. However, we have added ELISA results (new Table S1) that support a direct interaction.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, rather indirect experiments. This leads to the point "Revealing Smc arm dynamics through synthetic binders" in the discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state "More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells." This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation; at least, this is not clearly shown.

      We agree and have simplified the statement by removing “stabilize” and “transient”.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are less well represented in the SMC literature (when compared to the heads and hinge domains, for example), likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Significance:

      The work describes the generation and use of single-binder antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      Strengths:

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      Weaknesses:

      The study primarily presents descriptive observations and includes limited quantitative analyses or genetic modifications. Molecular mechanisms are typically interrogated through the use of pharmacological inhibitors rather than genetic approaches. Furthermore, the precise semantic distinction between JAIL and JBL requires additional clarification, as current evidence suggests their biological relevance may substantially overlap.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small-molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations. We have expanded our discussion on the distinction between JAIL and JBL and hope that this will clarify why – in our opinion – these terms should be used differentially in different cell biological contexts (see below and lines 348-374 in the manuscript).

      Reviewer #2 (Public review):

      Summary:

      In Maggi et al., the authors investigated the mechanisms that regulate the dynamics of a specialized junctional structure called junction-based lamellipodia (JBL), which they have previously identified during multicellular vascular tube formation in the zebrafish. They identified the Arp2/3 complex to dynamically localize at expanding JBLs and showed that the chemical inhibition of Arp2/3 activity slowed junctional elongation. The authors therefore concluded that actin polymerization at JBLs pushes the distal junction forward to expand the JBL. They further revealed the accumulation of Myl9a/Myl9b (marker for MLC) at the junctional pole, at interjunctional regions, suggesting that contractile activity drives the merging of proximal and distal junctions. Indeed, chemical inhibition of ROCK activity decreased junctional mergence. With these new findings, the authors added new molecular and cellular details into the previously proposed clutch mechanism by proposing that Arp2/3-dependent actin polymerization provides pushing forces while actomyosin contractility drives the merging of proximal and distal junctions, explaining the oscillatory protrusive nature of JBLs.

      Strengths:

      The authors provide detailed analyses of endothelial cell-cell dynamics through time-lapse imaging of junctional and cytoskeletal components at subcellular resolution. The use of zebrafish as an animal model system is invaluable in identifying novel mechanisms that explain the organizing principles of how blood vessels are formed. The data is well presented, and the manuscript is easy to read.

      Weaknesses:

      While the data generally support the conclusions reached, some aspects can be strengthened. For the untrained eye, it is unclear where the proximal and distal junctions are in some images, and so it is difficult to follow their dynamics (especially in experiments where Cdh5 is used as the junctional marker). Images would benefit from clear annotation of the two junctions. All perturbation experiments were done using chemical inhibitors; this can be further supported by genetic perturbations.

      We have added annotations to several figures and paid particular attention to the proximal and distal junctions.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small-molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations.

      Reviewer #3 (Public review):

      The paper by Maggi et al. builds on earlier work by the team (Paatero et al., 2018) on oriented junction-based lamellipodia (JBL). They validate the role of JBLs in guiding endothelial cell rearrangements and utilise high-resolution time-lapse imaging of novel transgenic strains to visualise the formation of distal junctions and their subsequent fusion with proximal junctions. Through functional analyses of Arp2/3 and actomyosin contractility, the study identifies JBLs as localized mechanical hubs, where protrusive forces drive distal junction formation, and actomyosin contractility brings together the distal and proximal junctions. This forward movement provides a unique directionality which would contribute to proper lumen formation, EC orientation, and vessel stability during these early stages of vessel development.

      Time-lapse live imaging of VEC, ZO-1, and actin reveals that VEC and ZO-1 are initially deposited at the distal junction, while actin primarily localizes to the region between the proximal and distal sites. Using a photoconvertible Cdh5-mClav2 transgenic line, the origin of the VEC aggregates was examined. This convincingly shows that VE-cadherin was derived from pools outside the proximal junctions. However, in addition to de novo VEC derived from within the photoconverted cell, could some VEC also be contributed by the neighbouring endothelial cell to which the JBL is connected?

      Yes, the green (non-converted) VE-cadherin can indeed originate from either of the two cells. The main point we want to make, based on our observations, is that the red (converted) VE-cadherin from the proximal junction (as defined by the ROI) does not contribute to the distal junction.

      As seen for JAILs in cultured ECs, the study reveals that Arp2/3 is enhanced when JBLs form by live imaging of Arpc1b-Venus in conjunction with ZO-1 and actin. Therefore, Arp2/3 likely contributes to the initial formation of the distal junction in the lamellipodium.

      Inhibiting Arp2/3 with CK666 prevents JBL formation, and filopodia form instead of lamellipodia. This loss of JBLs leads to impaired EC rearrangements.

      Is the effect of CK666 treatment reversible? Since only a short (30 min) treatment is used, the overall effect on the embryo would be minimal, and thus washing out CK666 might lead to JBL formation and normalized rearrangements, which would further support the role of Arp2/3.

      We have performed washout experiments and find that the ectopic filopodia disappear when the inhibitor is removed. This experiment is shown in supplementary Figure 3 and supplementary Movies 12 and 13.

      From the images in Figure 4d it appears that ZO-1 levels are increased in the ring after CK666 treatment. Has this been investigated, and could this overall stabilization of adhesion proteins further prevent elongation of the ring?

      This is an interesting thought and we have taken a closer look. There is quite a bit of sample-to-sample variation in the ZO-1 signal. The quantification (Author response image 1) indicates that there is no increase in the CK666-treated embryos on average.

      Author response image 1.

      To explore how the distal and proximal junctions merge, spatiotemporal imaging of Myl9 and VEC is conducted. It indicates that Myl9 is localized at the interjunctional fusion site prior to fusion. This suggests pulling forces are at play to merge the junctions, and indeed Y-27632 treatment reduces or blocks the merging of these junctions.

      For this experiment, a truncated version of VEC was used, which lacks the cytoplasmic domain. Why have the authors chosen to image this line, since lacking the cytoplasmic domain could also impair the efficiency of tension on VEC at both junction sites? This is as described in the discussion (lines 328-332).

      This line was used because it labels the entire JBL protrusion more clearly. We have also included an example using the VE-cad-Venus line (supplementary Figure 4b), which shows a Myl-Cherry pattern consistent with the other examples.

      Since the time-lapse movies involve high-speed imaging of rather small structures, it is understandable that these are difficult to interpret. Adding labels to indicate certain structures or proteins at essential timepoints in the movies would help the readers understand these.

      We have added annotations and labels to all movies. We have also improved annotations in several figures (i.e. Figs. 1, 2, 5, 6 and 7).

      Recommendations for the authors:

      Reviewing Editor Comments:

      Overall, the reviewers are supportive of the manuscript but identify a number of areas where the clarity of the presented data could be improved, and further quantification could be provided to strengthen your conclusions. We would encourage you to address these minor concerns as best you can and to consider the recommendations of all three reviewers when deciding how to revise your manuscript.

      Reviewer #1 (Recommendations for the authors):

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      JBL are described as oscillating membrane protrusions emerging at endothelial junctions, operating in a ratchet-like manner to mediate convergent cell movements. This ratchet mechanism allows endothelial cells to approach each other, thereby aligning and joining local luminal segments into a continuous vascular structure. The study employs in vivo high-resolution time-lapse imaging, a technically demanding method that captures spatiotemporal dynamics of cytoskeletal and adhesion complexes during JBL activity with unprecedented detail.

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      An intriguing observation is that a novel junction arises at the distal pole of a JBL. This distal junction is formed from a pool of VE-cadherin that is spatially redistributed from regions outside the initial JBL domain. The distal junction then merges with the proximal junction through a process dependent on actomyosin contractility, as was judged by Myl9 recruitment.

      The alternation between pushing forces (Arp2/3-dependent JBL protrusion) and pulling forces (actomyosin-driven junction fusion) defines JBL as a bidirectional mechanical module. Inhibition of actomyosin prevents merging of proximal and distal junctions, thereby stalling lumen continuity. This two-phase system, actin-based extension followed by actomyosin-mediated constriction, ensures both elongation and maturation of endothelial arrangements, ultimately securing vascular patency.

      This manuscript represents a robust and thoughtfully executed study that advances our understanding of lumen formation during vascular development. The overarching conclusions are well substantiated, and the results section provides a clear and detailed exposition of the key findings. I appreciate the explanatory movie at the end. Nevertheless, I offer several remarks for further improvement:

      (1) The fluorescent images presented are visually compelling, yet lack quantitative analysis in the initial figure. Although quantification is included in Figure 3, it is advisable to incorporate this analysis into Figure 1 as well. Early presentation of quantification will help the reader to appreciate the impact and significance of the findings from the outset.

      We appreciate the reviewer’s suggestion and have now added line graphs to measure the spatiotemporal intensities of the Utrophin and ZO-1 reporters in Figure 1b. These measurements demonstrate the sequence of F-actin protrusion and subsequent junctional movement. In Figure 1a, we have added a double-headed arrow which shows the overall movement of the junction towards the dorsal side of the forming DLAV.

      (2) For the fluorescence images, further quantitative analysis of membrane overlap, either in terms of width or pixel overlap, would enhance the rigor of the study. Temporal quantification of overlap may provide valuable insights into the stability and reproducibility of the process across experimental replicates.

      JBL are quite heterogeneous with respect to size, shape and dynamics, which makes quantification of membrane overlap (JBL size) across experimental replicates difficult. We have published some quantifications of JBL orientation and oscillation in our previous paper (Paatero et al., 2018, Nat. Commun., Figures 1 and 2), which are in agreement with our current study.

      (3) When referencing the role of Arp2/3, the authors employ an ArpC1b transgenic fish. The results section should thus specifically address the involvement of ArpC1b, rather than generalizing to Arp2/3. In the discussion, it would be appropriate to speculate on the potential involvement of the complete Arp2/3 complex. Notably, the use of CK is acknowledged as a broadly accepted inhibitor of actin polymerization.

      As ArpC1b is a subunit of an active Arp2/3 complex (Padrick et al., 2011), we have used ArpC1b-Venus as a readout for Arp2/3 localization. The construct has been validated before in cell culture (Law et al., 2021) as well as in zebrafish (Malchow et al., 2024), and the spatiotemporal distribution of the reporter has been shown to be consistent with that of the Arp2/3 complex. We state this in the results section (lines 173-178) and subsequently use the term Arp2/3 to facilitate reading of the text. In the corresponding figure legends, we maintain the term ArpC1b. CK666 interferes with the dimerization of Arp2 and Arp3 subunits and thus prevents activity of the Arp2/3 complex.

      (4) The discussion regarding JAIL versus JBL involvement remains challenging to interpret. If JAIL structures arise from the loss of cell-cell contacts, both JAIL and JBL resemble membrane protrusions and are likely governed by similar molecular mechanisms, predominantly actin polymerization and Arp2/3 activity, with probable contribution from Rac1 signaling. The precise semantic distinction between JAIL and JBL warrants further clarification, as their biological relevance may be overlapping.

      We agree with the reviewer. Below we outline the reasons why lamellipodial protrusions that emanate from cell-cell junctions should not be indiscriminately called JAIL, but that JAIL and JBL constitute different cellular activities acting in different tissue contexts. We have modified the text in the Discussion (lines 348-374).

      (1) JAIL have originally been described in cell culture experiments (Abu-Taha et al., 2014). According to this and subsequent papers by the same group, local dissolution of endothelial adherens junctions (i.e. downregulation of VE-cadherin) triggers the formation of lamellipodia-like structures. These protrusions eventually retract, followed by the reestablishment of EC junctions.

      (2) In our in vivo studies, we observed lamellipodial protrusions during endothelial cell rearrangements, and we call these structures JBL (Paatero et al., 2018). While JBL appear very similar to JAIL in general (i.e. regulation by Arp2/3 and its localization), we also observe two critical differences. For one, JBL form while maintaining the original (proximal) junction. Moreover, a distal junction is formed at the front edge of the JBL, leading to a “double junction” configuration. In our current manuscript, we have examined the role of actomyosin contractility and find that it correlates with and is required for the merging of proximal and distal junctions during JBL cycles. These observations indicate that the proximal and distal junctions are essential components of JBL function during endothelial cell elongation and rearrangements. These salient and distinct features prompted us to adopt the term junction-based lamellipodia (JBL), in order to differentiate them from JAIL.

      (3) We would like to argue that JAIL and JBL represent similar but distinct lamellipodia-like protrusions. JAILs are associated with the maintenance of endothelial integrity, and control permeability and trans-endothelial cell migration, as has been suggested by several publications (Cao et al., 2017; Kipcke et al., 2025; Seebach et al., 2021; Taha et al., 2014). In contrast, JBL drive cell rearrangements by step-wise elongation of cell junctions, leading to convergent cell movements.

      (4) Although JAIL have also been implicated in endothelial cell migration (Cao and Schnittler, 2019; Cao et al., 2017; Seebach et al., 2021), neither junctional patterns nor junctional dynamics have been analyzed in this context. We therefore propose that JAIL and JBL are actin-based protrusions forming at endothelial cell-cell junctions, but act in different contexts to provide cell motility (JBL) or endothelial integrity (JAIL).

      (5) Some of the quantification plots, specifically in figures 5d and 6c, do not display significant differences or distribution patterns. It would be beneficial to revise these graphs to clearly represent statistical significance and underlying data distributions.

      Because of the spatiotemporal heterogeneity, it is difficult to perform statistical quantifications across samples. In Figure 5c/d, we have imaged and analyzed myl9-EGFP in a mosaic situation, in which only one of the interacting cells expresses high levels of myl9-EGFP. This is a rare situation and we managed to image only this example. Nevertheless, it is consistent with our other expression data of myl9-reporters and also with our previous photoconversion experiments using photoconvertible UCHD (Paatero et al., 2018, Figure 4), which show that actin-rich JBL form at the front end of the endothelial cell in the direction of junction elongation. In Figure 5d, we have quantified the average intensity of the GFP signal within the region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that the signal is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous, and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.
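
      The ROI summary described for Figure 5d (average intensity plus the standard deviation between pixel intensities within the ROI) can be illustrated with a minimal sketch. The image values, the rectangular ROI, and the choice of the population standard deviation are assumptions made for illustration only; the response does not specify them, and the function name `roi_stats` is hypothetical.

```python
from statistics import mean, pstdev

def roi_stats(image, rows, cols):
    """Return (mean, standard deviation) of pixel intensities in a rectangular ROI.

    image: 2D list of intensities.
    rows, cols: (start, stop) half-open index ranges delimiting the ROI.
    The population standard deviation is used here as an assumption.
    """
    pixels = [image[r][c] for r in range(*rows) for c in range(*cols)]
    return mean(pixels), pstdev(pixels)

# Hypothetical 4x4 image with a bright 2x2 patch standing in for a junctional pole.
image = [
    [10, 12, 11, 9],
    [10, 40, 44, 9],
    [11, 42, 46, 10],
    [9, 10, 11, 10],
]
roi_mean, roi_sd = roi_stats(image, rows=(1, 3), cols=(1, 3))
```

      Note that a standard deviation computed this way describes pixel-to-pixel variability within one ROI of one sample, not variability between embryos.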

      (6) The observation of myosin recruitment does not inherently imply a concomitant increase in actomyosin contractile activity. The inclusion of phospho-MLC staining would considerably strengthen the evidence for enhanced actomyosin activity.

      This is a good suggestion and we have extensively tried different anti-P-Myl antibodies (and protocols), but did not get them to work reliably on zebrafish embryos. We therefore rely on published work that has established the correlation between the recruitment of myosin light chain and increased actomyosin tension (Fernandez-Gonzalez et al., 2009; Munjal et al., 2015).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1a is not described/mentioned in the Results.

      We have corrected this (lines 102-108). We have also added measurements to better present the different dynamics of F-actin (UCHD) and ZO1 within the JBL and the relative endothelial cell movements (see Figure 1b), as suggested by reviewer #1.

      (2) In Figure 3a, the authors claim that Arp2/3 is deposited at the distal side of the junction ring. While it is clear where the proximal junction is (ZO1-rich), the distal junction is less so (hardly any ZO1). It is therefore difficult to agree based on this time-lapse imaging that Arpc1b-Venus is at the distal junction. Can the authors please include panels showing merged channels and annotate where the proximal and distal junctions are?

      The activation of the Arp2/3 complex and the formation of the distal junction are sequential events. We see that ArpC1b oscillates with an accumulation at the onset and during JBL protrusion. In contrast, the distal junction is formed when the protrusive activity has been stopped. One caveat of the analysis shown in Figure 3a is that our ZO1 reporters label the distal junction only very weakly – this is in particular the case for the ZO1-tdTomato knock-in. The distal junction is better visible in VE-cadherin and UCHD reporters, as shown in Figures 5 to 7.

      (3) In Figures 3b and c, it is also difficult to distinguish proximal and distal junctions in these images. Please mark the boundaries in the image panels (Figure 3b) and indicate on the x-axis where the proximal and distal junctions are (Figure 3c).

      In Figure 3b, we show ArpC1b-Venus and mRuby-UCHD side-by-side. This Figure demonstrates that the Arp2/3 complex maintains its position at the front of the JBL during the protrusive phase (always distal to the UCHD signal). The imaging is done at very short intervals (1/30sec), which makes it difficult to follow entire oscillations due to photo-bleaching of the ArpC1b reporter.

      (4) The treatment of CK666 resulted in perturbed localization of Arpc1b-Venus. Therefore, the inhibition of junctional elongation can also be explained by the mislocalization of Arp2/3, rather than the inhibition of Arp2/3 activity at the junctions. Can the authors discuss this or perform another experiment that is more specific to manipulating Arp2/3 activity?

      CK666 is a well-established inhibitor of Arp2/3. Structural and functional analyses have shown that CK666 interferes with the interaction between Arp2 and Arp3, thereby preventing the activation of the complex (Hetrick et al., 2013; Padrick et al., 2011). We therefore conclude that the phenotypes we observe in CK666 treatment are due to Arp2/3 inhibition.

      It is possible that CK666 prevents ArpC1b binding to the Arp2/3 complex. However, published work suggests that ArpC1b can bind to Arp2/3 also in its inactive state (Chou et al., 2022). Thus, we can only speculate why we lose ArpC1b localization under CK666 treatment. We prefer not to do so.

      (5) In Figures 5d and 6c, is the quantification of Myl9 intensity of one cell only? If so, can the authors show the dynamics of average Myl9 intensity i) between forwarding and non-forwarding JBL poles and ii) as the proximal and distal junctions merge in several endothelial cells?

      Figure 5c/d depicts two interacting cells, expressing different levels of Myl9a-EGFP. This is a rare experimental situation and we managed to image only this example. We quantified the average signal at both poles of the junctional ring within a region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI. The analysis has been done on immunofluorescent images, therefore a dynamic analysis over time is not possible.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that the signal is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous, and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.

      (6) Figure 5. The 'f' in the figure legend should be 'e' since there is no panel 'f'.

      We have corrected this.

      (7) Figure 7. As the boundaries for proximal and distal junctions are not always clear, especially when Cdh5 appears as clusters, how do you determine where the two junctions are in order to measure the interjunctional space? Please offer a clearer explanation in the Methods.

      We have added the following in the M&M. “Junctional merging tracking. Speed of junctional merge was evaluated by monitoring isolated junctional rings during DLAV formation. Inhibitor treatment: Y-27632 (75 μM) or DMSO (1%) was applied 30 min before mounting. The same concentrations of chemicals were applied to the low-melting-point agarose mounting medium and the E3 medium on top of it before imaging. The junctions were imaged for 10-15 min on an Olympus SpinSR spinning disc microscope. Distances were measured using Fiji software. In each frame, the interjunctional distance was defined as the maximum distance between the proximal and distal junctions. A line was manually drawn between the proximal and distal junctions in Fiji, and its length was recorded. The same proximal and distal junction landmarks were used consistently across all time points.”
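
      The measurement described in this M&M passage can be sketched as follows. This is only an illustration: the landmark coordinates, pixel size, and frame interval are hypothetical, and deriving the merge speed from a least-squares fit of distance against time is an assumption, since the quoted text states only that per-frame line lengths were recorded in Fiji.

```python
import math

def interjunctional_distance(proximal, distal, um_per_pixel):
    """Length (µm) of the straight line between two landmarks given in pixels."""
    return math.dist(proximal, distal) * um_per_pixel

def merge_speed(distances_um, frame_interval_s):
    """Least-squares slope of distance vs. time, sign-flipped so that
    a closing gap gives a positive speed (µm/s)."""
    times = [i * frame_interval_s for i in range(len(distances_um))]
    t_mean = sum(times) / len(times)
    d_mean = sum(distances_um) / len(distances_um)
    num = sum((t - t_mean) * (d - d_mean) for t, d in zip(times, distances_um))
    den = sum((t - t_mean) ** 2 for t in times)
    return -num / den

# Hypothetical landmarks (x, y) in pixels, one pair per frame.
proximal = [(10, 10), (10, 10), (10, 10), (10, 10)]
distal = [(10, 50), (10, 40), (10, 30), (10, 20)]
UM_PER_PIXEL = 0.1        # assumed pixel size
FRAME_INTERVAL_S = 60.0   # assumed time between frames

distances = [
    interjunctional_distance(p, d, UM_PER_PIXEL)
    for p, d in zip(proximal, distal)
]
speed = merge_speed(distances, FRAME_INTERVAL_S)
```

      Using the same landmark pair in every frame, as the passage specifies, keeps the distance series comparable across time points; a stalled merge, as reported under Y-27632, would appear as a slope close to zero.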

      (8) One would think that upon the inhibition of junctional mergence (by ROCK inhibition), actin polymerization would persist to push the distal junction forward to elongate the JBL. However, there is instead a decrease in junctional elongation (Figure 7b). Can the authors speculate why? Additionally, junction elongation can probably be achieved by continuous "pushing" of the distal junction alone (through actin polymerization). Can the authors speculate why there is a need/what is the benefit of merging proximal and distal junctions for junction elongation?

      These are all very interesting questions, but they are quite complex and would require extensive and speculative answers, which is outside the scope of this study. Nevertheless, here are a few quick thoughts on these issues.

      (1) When endothelial cells elongate, they have to overcome tensile forces at the junctions (generated by the subjunctional actomyosin belt). JBL are providing a tractive and deforming force, which overcomes the tensile force and thus promotes junctional elongation.

      (2) The distal junction then provides an anchor to which the actin cytoskeleton can attach. The space between the proximal and distal junctions becomes a compartment of local actomyosin contraction, which provides the force for the ratchet to move the proximal junction forward → junctional mergence.

      (3) Thus, it is not the protrusion (pushing) itself that elongates the cell but the elongation of the junction (driven by actomyosin contraction)!

      (4) The maintenance of the proximal junction is most likely needed to ensure endothelial integrity during the JBL cycles.

      (5) How the frequency and the size of JBLs are regulated is not known. One possible player is an internal clock mechanism (e.g. a feedback loop via small GTPases (such as Rac) → Arp2/3 regulation). Another possibility is that JBL size is limited by the JBL sweeping up basally localized VE-cadherin (in cis-configuration). Increasing cell-cell adhesion (by VE-cad trans-interactions between the JBL and the underlying cell) would eventually stop the protrusion. It is also possible that a cell-autonomously controlled mechanism of F-actin polymerization (actin pulses) is involved in the regulation of the JBL cycle length.

      (9) The animation showing the molecular mechanism of JBL function during endothelial junction elongation (Video 25) is very helpful in understanding the dynamic coupling between junctional proteins, actomyosin cytoskeleton, and junction remodelling. However, I wonder why there are no Myosin II proteins binding to the actin bundles during the merging of proximal and distal junctions (between 0:25 and 0:28), since this is one of the main findings by the authors in this study.

      Since we show two JBL cycles, we want to spread the information over both of them.

      References:

      Cao, J. and Schnittler, H. (2019). Putting VE-cadherin into JAIL for junction remodeling. J. Cell Sci. 132.

      Cao, J., Ehling, M., März, S., Seebach, J., Tarbashevich, K., Sixta, T., Pitulescu, M. E., Werner, A. C., Flach, B., Montanez, E., et al. (2017). Polarized actin and VE-cadherin dynamics regulate junctional remodelling and cell migration during sprouting angiogenesis. Nat. Commun. 8, 1–20.

      Chou, S. Z., Chatterjee, M. and Pollard, T. D. (2022). Mechanism of actin filament branch formation by Arp2/3 complex revealed by a high-resolution cryo-EM structure of the branch junction. Proc. Natl. Acad. Sci. U. S. A. 119, e2206722119.

      Fernandez-Gonzalez, R., Simoes, S. de M., Röper, J. C., Eaton, S. and Zallen, J. A. (2009). Myosin II Dynamics Are Regulated by Tension in Intercalating Cells. Dev. Cell 17, 736–743.

      Hetrick, B., Han, M. S., Helgeson, L. A. and Nolen, B. J. (2013). Small molecules CK-666 and CK-869 inhibit actin-related protein 2/3 complex by blocking an activating conformational change. Chem. Biol. 20, 701–712.

      Kipcke, J. P., Odenthal-Schnittler, M., Aldirawi, M., Franz, J., Bojovic, V., Seebach, J. and Schnittler, H. (2025). TNF-α induces VE-cadherin-dependent gap/JAIL cycling through an intermediate state essential for neutrophil transmigration. Front. Immunol. 16.

      Law, A. L., Jalal, S., Pallett, T., Mosis, F., Guni, A., Brayford, S., Yolland, L., Marcotti, S., Levitt, J. A., Poland, S. P., et al. (2021). Nance-Horan Syndrome-like 1 protein negatively regulates Scar/WAVE-Arp2/3 activity and inhibits lamellipodia stability and cell migration. Nat. Commun. 12, 5687.

      Malchow, J., Eberlein, J., Li, W., Hogan, B. M., Okuda, K. S. and Helker, C. S. M. (2024). Neural progenitor-derived Apelin controls tip cell behavior and vascular patterning. Sci. Adv. 10, 1174.

      Munjal, A., Philippe, J. M., Munro, E. and Lecuit, T. (2015). A self-organized biomechanical network drives shape changes during tissue morphogenesis. Nature 524, 351–355.

      Paatero, I., Sauteur, L., Lee, M., Lagendijk, A. K., Heutschi, D., Wiesner, C., Guzmán, C., Bieli, D., Hogan, B. M., Affolter, M., et al. (2018). Junction-based lamellipodia drive endothelial cell rearrangements in vivo via a VE-cadherin-F-actin based oscillatory cell-cell interaction. Nat. Commun. 9.

      Padrick, S. B., Doolittle, L. K., Brautigam, C. A., King, D. S. and Rosen, M. K. (2011). Arp2/3 complex is bound and activated by two WASP proteins. Proc. Natl. Acad. Sci. U. S. A. 108, E472–E479.

      Sauteur, L., Krudewig, A., Herwig, L., Ehrenfeuchter, N., Lenard, A., Affolter, M. and Belting, H. G. (2014). Cdh5/VE-cadherin promotes endothelial cell interface elongation via cortical actin polymerization during angiogenic sprouting. Cell Rep. 9, 504–513.

      Seebach, J., Klusmeier, N. and Schnittler, H. (2021). Autoregulatory “Multitasking” at Endothelial Cell Junctions by Junction-Associated Intermittent Lamellipodia Controls Barrier Properties. Front. Physiol. 11.

      Taha, A. A., Taha, M., Seebach, J. and Schnittler, H. J. (2014). ARP2/3-mediated junction-associated lamellipodia control VE-cadherin-based cell junction dynamics and maintain monolayer integrity. Mol. Biol. Cell 25, 245–256.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper, Shimizu and Baron describe the signaling potential of cancer gain-of-function Notch alleles using Drosophila Notch transfected into S2 cells. These cells do not express Notch, the ligand Dl, or Dx, all of which are transfected. With this simple cellular system, the authors have previously shown that it is possible to measure Notch signaling levels by using a reporter for the 3 main types of signaling outputs: basal signaling, ligand-induced signaling, and ligand-independent signaling regulated by Deltex. The authors proceed to test 22 cancer mutations for the above-mentioned 3 outputs. The mutations considered cluster in the negative regulatory region (NRR), which is composed of 3 LNR repeats wrapping around the HD domain. This arrangement shields the S2 cleavage site that starts the activation reaction.

      The main findings are:

      (1) Figure 1: the cell system can recapture ectopic activation of 3 existing Drosophila alleles validated in vivo.

      (2) Figure 2: Some of the HD mutants do show ectopic activation that is not induced by Dl or Dx, arguing that these mutations fully expose the S2 site. Some of the HD mutants do not show ectopic activation in this system, a fact that is suggested to be related to retention in the secretory pathway.

      (3) Figure 3: Some of the LNR mutants do show ectopic activation that is induced by Dl or Dx, arguing that these might partially expose the S2 site.

      (4) Figures 4-6: 3 sites on the surface of LNR3 that are involved in receptor heterodimerization, if mutated to A, are found to cause ectopic activation that is induced by Dl or Dx. This is not due to changes in their dimerization ability, and these mutants are found to be expressed at a higher level than WT, possibly due to decreased levels of protein degradation.

      Strengths and Weaknesses:

      The paper is very clearly written, and the experiments are robust, complete, and controlled. It is somewhat limited in scope, considering that Figure 1 and 5 could be supplementary data (setup of the system and negative data). However, the comparative approach and the controlled and well-known system allow the extraction of meaningful information in a field that has struggled to find specific anticancer approaches. In this sense, the authors contribute limited but highly valuable information.

      Reviewer #2 (Public review):

      Summary:

      This ambitious study introduced 22 mutations corresponding to amino acid substitution mutations known to induce cancer in human Notch1, located within the Negative Regulatory Region, into the Drosophila Notch gene. It comprehensively examined their effects on activity, intracellular transport, protein levels, and stability. The results revealed that the impact of amino acid substitutions within the Negative Regulatory Region can be grouped based on their location, differing between the Heterodimerization Domain and the Lin12/Notch Repeat. These findings provide important insights into elucidating the mechanisms by which amino acid substitution mutations in human Notch1 cause leukemia and cancer.

      Strengths:

      In this study, the authors successfully measured the activity of amino acid-substituted Notch with high precision by effectively leveraging the advantages of their previously established experimental system. Furthermore, they clearly demonstrated ligand-dependent and Deltex-dependent properties.

      Weaknesses:

      Amino acid substitution mutations exhibit interesting effects depending on their position, so interest naturally turns to the mechanisms generating these differences. Unfortunately, however, elucidating these mechanisms will require considerable time in the future. Therefore, it is reasonable to conclude that questions regarding the mechanism fall outside the scope of this paper.

      We thank the editors and reviewers for their initial reviews and constructive suggestions. We have revised the manuscript with some additional data contained in two additional supplementary figures and by the inclusion of additional text.

      Reviewer #3 (Public review):

      While this is indeed an exciting set of observations, the work is entirely cell-line-based, which is the primary reason enthusiasm for the study is dampened. The analysis is confined to Drosophila S2 cells, which may not fully recapitulate tissue or organism-level regulatory complexity observed in vivo. Some Drosophila HD domain mutants accumulate in the secretory pathway and do not phenocopy human T-ALL mutations, possibly due to limitations on physiological inputs that S2 cells cannot account for, or species-specific differences such as the absence of S1 cleavage.

      Thus, the findings may not translate directly to understanding Notch 1 function in mammalian cancer models. While the manuscript highlights mechanistic variety, the functional significance of these mutations for hematopoietic malignancies or developmental contexts in live animals remains untested. Overall, the work does not yet provide evidence for altered Notch signaling that is physiologically relevant.

      S2 cells are a standard cell culture model that has been extensively used for analysing Notch signalling mechanisms and, by and large, is found to recapitulate the mechanisms of Notch activation and its regulation in vivo. However, we agree that it will be desirable in future work to build on our current findings by generating Notch mutants in vivo in Drosophila, as the in vivo context may introduce additional nuances in the behaviour of the mutants. This can be done by overexpressing cDNA constructs in particular tissues, or more physiologically by generating endogenous gene mutations using CRISPR/Cas9-based gene editing. However, the likely outcome of the latter approach is embryo lethality due to constitutive over-activation during development. Therefore, methods of genetic manipulation need to be applied which allow the final activating mutant form to be generated in somatic clones. We feel that this would be a considerable amount of additional work and is out of scope for the current study, but we look forward to developing this approach in future work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (a) Table 1: Explain the rationale for mapping non-conserved residues between human and fly Notch; consider adding an alignment or supplementary figure.

      We have added a new Supplementary figure S2 showing an alignment of Notch sequences from different species to indicate the degree of conservation at the sites chosen for our mutagenesis study. Some locations were highly conserved and some locations less so. Both conserved and non-conserved residues were included to examine how structural perturbations at equivalent positions affect signalling activity, independent of sequence conservation. In addition to the new supplementary figure, we have changed the text in the Table 1 legend to clarify.

      (b) Add or discuss data connecting LNR and HD mutant expression levels with stability and degradation mechanisms.

      We have added text in the results section referring to Fig. 6A/B regarding the varying Notch protein levels between the different mutants. With regard to the slower degradation kinetics of certain LNR-C mutants in Fig. 6E/F, we have also added a new supplementary figure S3, which shows that mutants from the LNR/HD interface do not behave similarly to the LNR-C mutants with respect to their degradation kinetics.

      (c) Some mutants, especially those retained in the secretory pathway, are insufficiently characterized. The mechanism underlying their differential trafficking and stability remains underexplored.

      We have added some extra text to the discussion section which explores the issue of secretory pathway retention of HD mutants in Drosophila cells further.

      (2) Figure Legends:

      (a) Figure 1A - Explain the ribbon vs. space-filling representation and color coding; include a definition of the Heterodimerization Domain.

      We have added extra text to the Figure 1A legend.

      (b) Figure 2E - Clarify mutant selection; if possible, include additional examples for consistency.

      We added extra text regarding the selection of mutants for study to the legend of Figure 2.

      (c) Figure 3-4 - Explain logic for alanine substitutions; discuss difference at residue 1570 (P vs. A).

      We added the following text to the Results section: “Y1532 and Y1535 are not mutated in human cancers and therefore could not be assessed through patient-derived variants. Alanine substitution provides a controlled way to probe their contribution to NRR integrity and activation sensitivity by selectively removing their side-chain interactions while preserving overall fold.” We also added extra text in the Discussion section regarding the differences in the outcomes of the 1570 to A and P mutations.

      (d) Figure 4 - Improve resolution and legibility.

      We have replaced figure 4.

      (e) Figure 6C - Correct residue numbering (1563, 1566).

      Thank you for spotting this. This has been corrected.

      (f) Figure 6F - Include control where protein levels do not increase.

      A new supplementary figure S3 has been added, which includes this control data.

      (3) Contextual and Conceptual Framing:

      (a) Incorporate the limitations of the S2 system, and delineate which mechanistic insights are likely conserved versus those that may be species- or context-specific.

      We have incorporated text to discuss S2 cell limitations.

      (b) The study does not test functional consequences in hematopoietic or developmental contexts. Expand the discussion to emphasize how these cell-based findings could inform future in vivo studies or mammalian cancer modeling.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript offers valuable structural and mechanistic insights into the structure and assembly of the Type II internal ribosome entry site (IRES) from encephalomyocarditis virus (EMCV) and the translation initiation complex, revealing a direct interaction between the IRES and the 40S ribosomal subunit. While a solid cryo-EM method was used, enhancing the overall resolution or adding complementary biochemical data would further improve the clarity and impact of this study. This manuscript will attract researchers in cap-independent translation, host-pathogen interactions, and virology.

      We thank the editorial team for a favourable assessment and for mentioning our work as ‘valuable’. In the following sections, we have addressed the weaknesses and recommendations pointed out by the Reviewers and hope for an improvement in the description of this work.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      We thank Reviewer 1 for appreciating our efforts and finding structural insights about the Type 2 IRES-based initiation presented in this study as novel.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs - 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy supported by structural superposition with the 80S ribosome and sequence similarity between the I domain of the IRES and the h38 region of 28S rRNA (Fig. 4) is well-justified.

      We agree that the low resolution of the map has compromised the data interpretation at the molecular level, and we thank the reviewer for appreciating our findings at this resolution. Due to the low resolution, we have reported findings for stretches or regions such as the domain I loops and stems, rather than individual nucleotides.

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      We agree that this study lacks experiments other than cryo-EM that probe the importance of regions such as the GNRA and RAAA loops. Multiple previous studies have reported on the importance of the GNRA and RAAA loops, and we have cited them in the manuscript. The essentiality of the RAAA loop for type 2 IRESs was demonstrated in an earlier report (López de Quinto and Martínez-Salas, 1997; cited in the manuscript). Further, the conservation of this loop across the type 2 IRES family adds to its importance (Manuscript Figure 6B). This loop and its flanking G-C stem are similar to h38 of 28S rRNA, and it appears that the RAAA loop adopts a mimicry mechanism to interact with the 40S ribosomal protein uS19, highlighting its importance for interaction with 40S. Experiments destabilising the G-C stem also compromise IRES activity, as shown for the FMDV IRES (Fernández et al 2011). Previous studies of mutations in the GNRA or GCGA loop of the EMCV IRES have shown a deficiency in IRES activity (Roberts and Belsham, 1997; Robertson et al 1999), suggesting the importance of these regions in viral IRES biology, and these reports are cited in the manuscript. Beyond the EMCV IRES, mutation of the GUAA (representative of GNRA) loop of the FMDV IRES also caused a significant reduction in IRES activity (López de Quinto and Martínez-Salas, 1997). In this work, we observe that the GCGA loop interacts with tRNA<sub>i</sub> in the EMCV IRES-48S PIC, which points to the importance of this loop. Moreover, incubation of the FMDV IRES with 40S ribosomes has shown a decrease in SHAPE reactivity in the domain 3 apex (nucleotides 170-200) (Lozano et al 2018), which corresponds to the EMCV IRES domain I apex.

      However, to address this concern in the revised manuscript, we mutated these loops and performed a luciferase assay (Supplementary figure 4A). The results showed decreased IRES activity (Pg 10) and correlated with previous reports demonstrating the importance of these regions for overall IRES activity.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      (a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      We have included the image of the rejected 2D classes (Author response image 1). We acknowledge the Reviewer’s query about the large number of micrographs and the relatively small number of particles in the final reconstruction. Since the total number of micrographs (22000) is the sum of multiple datasets, prepared and collected at different times, the distribution of particles per micrograph was not uniform across sessions and ranged from good to poor. Among these, around 8000 micrographs have a poor particle number and distribution. As a result, the number of particles per micrograph is heterogeneous across the compiled dataset, and only 237054 ribosomal particles were obtained after multiple rounds of 2D and 3D classification. Further, the final reconstruction was performed using particles obtained after masked classification for IRES and ternary complex density. Only the particles that show the best density for both the IRES and the ternary complex are used for this map. Another set of particles, which have only a portion of the IRES and no density for the ternary complex, forms a second map, and a third map corresponds to an empty 40S.
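
      The particle yield described above can be put into rough numbers. The following is only an illustrative sketch using the figures quoted in this response (22000 micrographs, around 8000 of them poor, 237054 retained ribosomal particles). The second average assumes, purely for illustration, that the poor micrographs contributed no retained particles, which the response does not state.

```python
# Figures quoted in the response; the per-micrograph averages are illustrative only.
total_micrographs = 22_000
poor_micrographs = 8_000      # "around 8000" micrographs with poor particle distribution
retained_particles = 237_054  # ribosomal particles after 2D and 3D classification


def particles_per_micrograph(particles: int, micrographs: int) -> float:
    """Average number of retained particles per micrograph."""
    return particles / micrographs


# Average over the whole compiled dataset.
avg_all = particles_per_micrograph(retained_particles, total_micrographs)

# Assumption (not from the response): poor micrographs contributed nothing.
avg_excluding_poor = particles_per_micrograph(
    retained_particles, total_micrographs - poor_micrographs
)

print(f"all micrographs:      {avg_all:.1f} particles/micrograph")
print(f"excluding poor ones:  {avg_excluding_poor:.1f} particles/micrograph")
```

      Either way, the average is on the order of 10 to 20 retained ribosomal particles per micrograph, which is consistent with the explanation that a large micrograph count did not translate into a large final particle set.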

      We thank the reviewer for the suggestions to improve the quality of the maps further. As suggested, we started reprocessing the data. However, during this process the common computational cluster that we were using for data processing had to be physically relocated, and after the relocation we faced technical issues in accessing it and continuing with the processing. Several attempts to resolve the issue with the help of the IT team failed, and we lost 3-4 months without any progress. Therefore, we used Relion on our in-house workstation to process the data files from the start, as our in-house computational resources are not equipped to run cryoSPARC jobs on large datasets because of memory limitations.

      We reprocessed the datasets in Relion5 and performed Bayesian polishing for reference-based, per-particle correction of beam-induced motion. After this step, we used cryoSPARC to merge the particles and tried classifying the good ribosome particles using focused masked classification, as shown in Supplementary Figure 1.1. However, this processing did not improve the resolution, as Map B (containing 40S, tRNA, IRES) had an overall resolution of 4.8 Å (Author response image 2). Therefore, we would like to report the same maps as given in the initial submission.

      We estimated that redoing the entire processing with cryoSPARC on the common computational cluster would take another 3-4 months or more, and we do not anticipate a substantial improvement in the extra density.

      Author response image 1.

      The selected and rejected 2D classes from the initial round of classification, and the final selected 2D classes, which were subjected to ab initio reconstruction to obtain the good ribosome particles.

      Author response image 2.

      Reprocessing of the entire dataset using Relion5 for polishing of selected particles, followed by 3D classification and refinements in cryoSPARC.

      (b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      We thank the reviewer for appreciating the modelling of the domain I apex in the cryo-EM density. We tried to predict the full tertiary structure of the IRES using AlphaFold3; however, inclusion of the full-length sequence (nucleotides 280-905) gave models of extremely low confidence (Author response image 3), and a few domains do not conform to the secondary structure of the EMCV IRES as reported in Duke et al 1992.

      Author response image 3.

      Prediction of tertiary structure of EMCV IRES (280-905 nucleotides) and zoomed features for each domain present in the IRES. The predicted aligned error plot for the RNA structure is shown.

      We then predicted the tertiary structure of each EMCV IRES domain individually, independently of the other IRES domains, using AlphaFold3. As a result, the confidence scores improved, and the tertiary structures also correlated with the experimentally determined EMCV IRES secondary structure (Duke et al 1992; Maloney and Joseph, 2024). Although an overall tertiary structure of the EMCV IRES is lacking, recent studies have solved the structures of EMCV IRES domains in complex with their respective binding proteins. We superimposed the independently predicted tertiary structures of domains D, E, and F on the NMR ensemble of IRES domains D to F with PTB1 (Dorn et al 2023), and the predicted domains fit the experimental model. Similarly, we used the cryo-EM structure of domain J-K-eIF4G-eIF4A (Imai et al 2023) and found a close fit with the predicted structures. The analysis highlighted that the domain I apex is the best fit for the extra density with respect to architecture and fitting. This analysis is now added to the revised manuscript in Supplementary figure 3.2.

      Furthermore, 3D structural models of FMDV IRES domains 2, 3, and 4 (corresponding to EMCV IRES domains H, I, and J-K) were predicted from SHAPE reactivity values using the RNAComposer server (Figure 3, Lozano et al 2018). The predicted architecture of the domain 3 apex (FMDV IRES) coincides with our domain I apex model (EMCV IRES).

      (c) Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

      We have discussed the possibility that other IRESs, such as type 1 and type 5, might use strategies similar to that of the EMCV IRES to assemble the 48S PIC, given the similarity in motif sequence and position across the viral IRESs. Like the EMCV IRES, the type 1 IRES (Poliovirus, Coxsackie virus, etc.) also harbours the GNRA loop, preceded by a C-rich loop, in its longest domain, which is known for long-range RNA-RNA interactions. The segment harbouring the GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The Aichi viral IRES harbours a GNRA loop in its longest domain, that is, domain J. Deletion of the GNRA loop compromised IRES activity; however, substitution mutations in this region either elevated IRES activity or left it unaltered (Yu et al 2011). We have hypothesized that these IRESs might use the GNRA motifs in their longest domain (domain IV in type 1, and domain J in Aichi virus, type 5) based on the similarity in location and architecture to the EMCV IRES, where GNRA is present in the longest domain (I) and preceded by a C-rich loop, and where it can potentially mediate long-range interactions with tRNA<sub>i</sub>, as all these IRESs require the eIF2-ternary complex for the formation of the 48S PIC. In addition, as in the EMCV IRES, the GNRA motif-containing domain of type 1 and type 5 IRESs is placed before the eIF4G-binding domain. Thus, we suggest that these IRESs may adopt a similar strategy to interact with tRNA<sub>i</sub> during the formation of the 48S PIC. During the revision of this work, a preprint reported the structure of the polioviral IRES-48S PIC, in which the domain IV apex (similar to the domain I apex in the EMCV IRES) interacts with uS13 and uS19, and the GNRA loop directly interacts with tRNA<sub>i</sub> during start codon recognition (Velazquez et al 2025). We hypothesize that the Aichi viral IRES might likewise use this motif to mediate long-range interactions with tRNA<sub>i</sub>, similar to type 1 and type 2 IRESs.

      Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      We thank Reviewer 2 for the positive and encouraging comments on our work, and for appreciating our ‘innovative’ approach of using an IRES-associated factor to assemble and pull down the IRES-bound ribosomal complex.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      We understand the concerns raised by the reviewer related to the resolution of the EMCV IRES-48S PIC map. We refrained from commenting on individual nucleotides or molecular interactions in the manuscript. Instead, we discuss loops, RNA stretches, or motifs that could be inferred with more confidence in the IRES density, as shown in Figure 4. The EMCV IRES can directly interact with the 40S ribosome using its domains H and I (Chamond et al 2014); however, the details of this interaction were unknown. We observe that the CAAA loop of the domain I apex interacts with the 40S ribosome, based on the placement of a portion of domain I in the cryo-EM map. This is also reflected in the SHAPE data (Chamond et al 2014, Supplementary figures 2 and 8), where a decrease in reactivity is evident in the presence of the 40S ribosome. In addition, incubation of the EMCV IRES with rabbit reticulocyte lysate (RRL) offered protection to domain I apex regions, which included the CAAA loop (Maloney and Joseph, 2024, Figure 4b).

      Furthermore, this decrease in SHAPE reactivity is also evident for the FMDV IRES domain 3 apex (similar to domain I in the EMCV IRES) in the presence of the 40S ribosome (Lozano et al 2018). Thus, these studies are consistent with the placement of the IRES model in the cryo-EM map. Moreover, we performed the structural analysis mentioned above, which showed that the domain I apex is the best fit for the extra density with respect to architecture and fitting (Supplementary figure 3.2).

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      We thank the reviewer for bringing up this point, which we missed mentioning in the initial submission. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is more intense than that at AUG-826. Further, AUG-834 lies in a strong Kozak context, whereas AUG-826 has a poor Kozak context, and the AUG-826 codon is not in-frame with AUG-834. Therefore, the synthesis of the polypeptide requires AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that AUG-834 is more likely than AUG-826 to be placed at the P site. We have mentioned in the revised manuscript that we had not mutated AUG-826 (Pg 8).
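
      The reading-frame statement above follows from simple arithmetic. The short sketch below illustrates the check, under the assumption that 826 and 834 both denote the position of the first nucleotide (A) of each AUG in the same numbering scheme. The function name is hypothetical and used only for illustration.

```python
def in_same_frame(pos_a: int, pos_b: int) -> bool:
    """Two codon start positions share a reading frame when their
    offset is a multiple of three nucleotides."""
    return (pos_b - pos_a) % 3 == 0


upstream_aug = 826  # assumed: position of the A of the upstream AUG
main_aug = 834      # assumed: position of the A of the authentic start codon

offset = main_aug - upstream_aug
print(f"offset = {offset} nt, offset mod 3 = {offset % 3}")
print("same reading frame:", in_same_frame(upstream_aug, main_aug))
```

      The offset of 8 nucleotides leaves a remainder of 2 when divided by 3, so a ribosome initiating at AUG-826 would read a different frame from one initiating at AUG-834.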

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES-without any supporting biochemical data-is not warranted by the data.

      We thank the reviewer for finding our major claims warranted. Because of the low resolution, we have reported findings for stretches or regions such as the domain I loops and stems, rather than individual nucleotides. The interaction of the domain I apical region with uS13, uS19, and tRNA<sub>i</sub> is also observed in the high-resolution structure of the reconstituted EMCV IRES-48S PIC that was reported in a preprint while our work was under peer review (Bhattacharjee et al 2025). Thus, the reconstituted EMCV IRES-48S PIC (Bhattacharjee et al 2025) also supports our assignment of domain I and its conserved loops as interacting with the ribosome and tRNA<sub>i</sub>.

      Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

      We thank Reviewer 3 for acknowledging the technical challenges in this study and for finding our study biologically significant. We understand the concerns related to the low resolution and the need for complementary biochemical validation of the observations and interpretations reported in the manuscript. We tried to improve the resolution, but the improvement was not sufficient to resolve the IRES at the nucleotide level. Independently, another group reported the same findings at a higher resolution while our work was under peer review (Bhattacharjee et al 2025), which corroborates our structural data on the EMCV IRES and its interaction with the ribosome and tRNA<sub>i</sub> at the 48S PIC stage. Further, in the revised manuscript we also present biochemical validation for the GNRA and RAAA loops of the EMCV IRES. We mutated these loops and performed a luciferase assay (Supplementary figure 4A). The results showed decreased IRES activity (Pg 10) and correlated with previous reports (Roberts and Belsham, 1997; López de Quinto and Martínez-Salas, 1997; Robertson et al 1999) demonstrating the importance of these regions for overall IRES activity.

      Reviewing Editor Comments:

      The reviewers' comments are appended. While the reviewers acknowledge the complexity associated with this system, they also raised concerns about the modeling of RNA and registering its sequence in low-resolution maps. We believe that the strength of evidence and overall impact of your study can be elevated by providing higher-resolution cryo-EM data or complementary biochemical studies and addressing reviewers' concerns.

      Reviewer #2 (Recommendations for the authors):

      (1) Science:

      Have the authors tried a focused refinement (local refinement in cryoSPARC) using a generous mask that encloses the head and the IRES but excludes the ternary complex and the body of the 40S? This can be done with all the particles in map B (~55K) and has the possibility of improving the resolution of domain I which can be subsequently used to build a better model of the IRES. See the middle right panel, light yellow colored mask in Figure 1A in PMID 37659578 for the type of mask being suggested.

      We performed another round of 2D classification to eliminate any residual junk in the ~55k particle set corresponding to Map B. After classification, 49439 particles were selected and refined using non-uniform refinement to obtain Map B11. The overall resolution of Map B11 was 4.6 Å. Thereafter, we made a mask around the 40S head-IRES-tRNA region on Map B11 and subjected the class to local refinement. The overall local resolution in the masked region improved to 4.5 Å (Author response image 4).

      Author response image 4.

      Data processing: Map B particles were 2D classified, and residual junk was removed as rejected particles. The selected particles were refined using non-uniform refinement to obtain Map B11, and a focused mask enclosing the head-tRNA-IRES region was then used for local refinement of this region to yield Map B111.

      We estimated the local resolution across the focused region in Map B111 and compared this with that of Map B (Author response image 5). The local refinement shows minor improvement in the local resolution in this region, and is not sufficient to resolve the IRES density at the level of nucleotides.

      Author response image 5.

      Comparison of local resolution across the head-IRES-tRNA region in Map B1 (as reported in the manuscript) and Map B111.

      (2) Presentation:

      (a) Please use the previously established convention of naming the domains: "domain I", "domain H", etc, instead of "I domain" or "J-K domain" while describing parts of the IRES.

      We have made the changes as per the established convention.

      (b) Figure 2B reports a 6.9 A distance vs. 7 A in the text. Please use ~ or approximately to keep numbers consistent.

      We have used the ~ symbol to indicate the approximate distance.

      (c) References missing on page 15 when referring to "previously determined HCV and CrPV structures".

      We have added the references (Pg 12).

      (d) Please edit the text for typos and sentence structure.

      The typos and sentence structure were corrected wherever necessary.

      (e) Some phrases and sentences (e.g. last few sentences of the first paragraph in the discussion) could be rewritten for clarity.

      Previous sentence- “The domain I of EMCV IRES is similar to domain IV of polioviral IRES (or other type 1 IRESs such as Coxsackie viral IRES) in terms of length, secondary structure, and conserved motifs (GNRA, C-rich) positioning (Fig. 6C), therefore, anticipating a similar interaction with tRNA<sub>i</sub>, highlighting a sequestering tendency by competing with cellular mRNAs.”

      Rephrased sentence- “Like EMCV IRES, the type 1 IRES (Poliovirus, Coxsackie virus, etc.) also harbours the GNRA loop, preceded by a C-rich loop at its longest domain, known for long-range RNA-RNA interactions. The segment harbouring GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The domain I of EMCV IRES is similar to domain IV of polioviral IRES or other type 1 IRESs in terms of length, secondary structure, and conserved motifs (GNRA, C-rich) positioning (Fig. 6C). Therefore, we anticipate a similar interaction of domain IV (in type 1 IRES class) with tRNA<sub>i</sub>. Also, this interaction of IRES with tRNA<sub>i</sub> could be a strategy by which these IRESs can sequester the tRNA<sub>i</sub> pool in the cell, rendering them unavailable for capped cellular mRNAs.”

      Reviewer #3 (Recommendations for the authors):

      (1) For the revision process, the authors provided three atomic models alongside their corresponding cryo-EM density maps, including a 48S complex in closed conformation. Given this conformation, it is reasonable to interpret the structure as representing a post-start codon recognition state (late-stage initiation). However, this reviewer finds that the local resolution within the mRNA channel is insufficient to support the atomic model building as presented. The density does not allow for an unambiguous assignment of nucleotides in this region; the authors should either improve the local resolution or remove the modeled mRNA from the structure.

      We understand the concern of the Reviewer. Although the mRNA density in the channel is poor, we modelled the mRNA with AUG-834 at the P site because of the known biology of the EMCV IRES. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is more intense than that at AUG-826. Further, AUG-834 lies in a strong Kozak context, whereas AUG-826 has a poor Kozak context, and the AUG-826 codon is not in-frame with AUG-834. Therefore, the synthesis of the polypeptide requires AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that it is very likely that AUG-834 is placed at the P site.

      (2) As noted by the authors, the start codon in the EMCV IRES is positioned within a strong Kozak sequence. The nucleotide at position -3 is known to interact with eIF2α, yet, in the current model, A831 is positioned such that physical contact with eIF2α would be structurally impossible. This discrepancy raises concerns about the accuracy of the modeled eIF2α, which, like other regions of the structure, is not clearly supported by the cryo-EM density. The authors should revise the atomic model of eIF2α to ensure it is consistent with the experimental map and established molecular interactions.

      In our analysis of the EMCV IRES-48S PIC, we could observe eIF2α and eIF2γ in Maps B and B1. However, the local resolution was too low to model the entire protein with side chains (Supplementary figure 1.2A), so we used rigid-body fitting of eIF2α and eIF2γ (Author response image 6). From the model, we could trace the backbone of Arg55 but could not resolve its side chain. Similarly, the mRNA in the channel was modelled based on the placement of AUG-834 at the P site for the EMCV IRES, which enabled us to model the flanking residues, rather than on nucleotide-level resolution. We anticipate that a higher-resolution structure will be able to capture the interaction of eIF2α with the mRNA nucleotide at position -3, and we have therefore refrained from commenting on this interaction in the manuscript. In the revised manuscript, we have removed the side chains of eIF2α and eIF2γ and kept only the Cα backbone. The map-model statistics of Map B1 are updated in Table 1.

      Author response image 6.

      (left) Fitting of eIF2α model in the map. (right) Fitting of Cα backbone of eIF2α and mRNA in the map.

      (3) The authors observed additional density interacting with ribosomal proteins uS19 and uS13, and tRNA, which they tentatively assign to domain I of the IRES. Although the local resolution in this region does not allow an unambiguous assignment, the interpretation is reasonable. However, further structural and functional validation is necessary to support this assignment. The authors should improve the local resolution, either by performing focused refinement or by increasing the number of particles used in the reconstruction.

      The assignment of the extra density to domain I of the IRES was based on the architecture of the density. This density allows no other IRES domain to fit in this region (Supplementary figure 3.2). We tried to improve the local resolution using focused refinement, but the resolution was insufficient to resolve the IRES at the nucleotide level. Please see the above-mentioned comments in this regard on Pg 12.

      (4) Figure 5 shows a slight shift in the position of the ternary complex. Is the observed tRNA conformation compatible with the structural rearrangements required for 60S subunit joining?

      During the transition of 48S PIC to 80S elongation-competent complex, there are major changes in the conformation of tRNA<sub>i</sub>, due to the joining of eIF5B, and release of eIF2 (Petrychenko et al 2024). This joining event of eIF5B positions the tRNA<sub>i</sub> elbow and acceptor stem towards the 40S body to aid 60S ribosomal subunit joining (Petrychenko et al 2024). However, in the context of EMCV IRES-48S PIC, we observed that the position of tRNA<sub>i</sub> elbow and acceptor stem is towards the 40S head, and away from the body. On superimposing the human 48S PIC structure (before 60S joining), 48S-5 (PDB Id- 8PJ5- Petrychenko et al 2024), we note that tRNA<sub>i</sub> in EMCV IRES-48S PIC is away from the canonical tRNA<sub>i</sub> position (in contact with eIF5B). Therefore, we anticipate a change in tRNA<sub>i</sub> conformation during eIF5B joining and eIF2 release. This hypothesis is consistent with the fact that the IRES interacting with the tRNA<sub>i</sub> elbow needs to be displaced from its position to facilitate the interaction of tRNA<sub>i</sub> with eIF5B. Moreover, this rearrangement would also aid in 60S joining and prevent any clash with the IRES domain I. We have added this in Results section 5 and Figure 5D.

      (5) In the discussion section, the authors state: "eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation, so we do not rule out the possibility that EMCV IRES may dislodge eIF3 from its position on the solvent surface as observed in the case of HCV IRES (Hashem et al, 2013)." This statement is highly speculative. Is there any experimental or structural evidence to support this proposed mechanism in the context of EMCV IRES?

      Previous biochemical reports on the eIF3-eIF4G interaction suggested that eIF4G residues from 1011-1104 interact with eIF3 (Villa et al 2013). In the context of EMCV IRES, this region of eIF4G is not required to form 48S PIC on the IRES, suggesting the eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation. However, the recent structure of the human canonical 48S PIC has shown that the eIF4G-HEAT1 domain can interact with eIF3 subunits c, h, and l, and that eIF4G-bound eIF4A can interact with 40S ribosomal protein eS7, thus mediating the interaction between eIF4-bound mRNA and the 43S PIC (Brito Querido et al 2024), but the known eIF3-binding region in eIF4G was not captured in the map. Although the canonical eIF3-eIF4G interaction is essential in the case of cap-dependent initiation, this interaction could be dispensable for 48S PIC formation on EMCV IRES. In the case of HCV IRES-mediated initiation, eIF3 is displaced from its canonical position, which facilitates the binding of HCV IRES to the 40S ribosomal subunit (Hashem et al 2013). We did not see any density corresponding to eIF3 in the obtained maps. Further, we used focused classification with a mask on the canonical eIF3 position; however, we still do not see any density corresponding to eIF3 in the EMCV IRES-48S PIC complex. Therefore, we hypothesized the possibility that eIF3 might be dislodged from its canonical binding site on the 40S ribosomal subunit. However, as per the recent independent report on EMCV IRES-48S PIC, eIF3 is present in the complex (Bhattacharjee et al 2025).

      Hence, we have rephrased the existing sentence- “However, eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation, so we do not rule out the possibility that EMCV IRES may dislodge eIF3 from its position on the solvent surface as observed in case of HCV IRES (Hashem et al 2013).”

      Rephrased sentence- “However, the canonical eIF3-eIF4G interaction (Villa et al 2013) is dispensable for EMCV IRES-48S PIC formation (Lomakin et al 2000; Sweeney et al 2014), and we do not see any density for eIF3 even after focused classification. However, as per the recent independent report on reconstituted EMCV IRES-48S PIC, eIF3 is present in the complex at the canonical position (Bhattacharjee et al 2025). This position of eIF3 further highlights the possibility that eIF4G-eIF4A proteins are also placed similarly to the canonical eIF3-eIF4G-eIF4A position (Brito Querido et al 2024) in context to EMCV IRES-48S PIC. Thus, placing eIF4G-domain J-K close to ES6 of 40S ribosome, which coincides with the previous hydroxyl radical cleavage assay (Yu et al 2011).”

      (6) eIF4A has been shown to directly interact with eIF3 and facilitate recruitment of the 43S PIC. Is the interaction of the J-K domain with eIF4G/eIF4A compatible with the known eIF4A-eIF3 interaction within the 43S PIC? In other words, during EMCV IRES-mediated initiation, could the eIF4A-eIF3 interaction functionally substitute for the eIF4G-eIF3 interaction?

      Reports on EMCV IRES-mediated translation initiation have shown eIF4G as an essential component of 48S PIC formation (Pestova et al 1996; Lomakin et al 2000; Kolupaeva et al 2003; Sweeney et al 2014), where eIF4G directly interacts with domain J-K of IRES and eIF4A, thus enabling loading of eIF4A on the IRES. In our study, the cryo-EM map of EMCV IRES-48S PIC lacks density for eIF3 and eIF4 proteins, and locating eIF4F is challenging due to the inherent flexibility associated with the complex. Previous studies on EMCV IRES-48S PIC have mapped the location of eIF4G close to ES6 towards the platform side of the body and eIF3 using the hydroxyl radical cleavage assay (Yu et al 2011). The human 48S initiation complex structures have shown a similar location for eIF4G, which is at the mRNA exit site, contacting eIF3 (Brito Querido et al 2020; Brito Querido et al 2024). On overlapping the 18S rRNA of EMCV IRES-48S PIC to that of the human 48S PIC in closed conformation (PDB Id- 8OZ0), and further superimposing the J-K-St- eIF4G- eIF4A (PDB Id- 8HUJ) on human 48S PIC (PDB Id- 8OZ0) with respect to HEAT1 of eIF4G, the domain J-K becomes positioned at the subunit face of 40S body, close to ES6 (Author response image 7). This correlates with the previously reported position for eIF4G with respect to EMCV IRES-48S PIC (Yu et al 2011). The predicted model shows no clashes with the canonical eIF4A-eIF3/ eIF4G-eIF4A-eIF3 interaction, or with the domain J-K-eIF4G-eIF4A model. This highlights a possibly compatible interaction axis among eIF3-eIF4G-eIF4A-domain J-K of IRES.

      Author response image 7.

      (upper left) Location of eIF4G-eIF4A in canonical human 48S PIC (PDB Id- 8OZ0). (upper right) Superimposition of 18S rRNA from human 48S and EMCV IRES 48S. (lower left) Superimposition of Human Closed 48S PIC structure (PDB Id- 8OZ0) on EMCV IRES-48S PIC model and placement of EMCV IRES- J-K domain-HEAT1-eIF4A structure (PDB Id- 8HUJ) with respect to eIF4G-HEAT1 domain. (lower right) Predicting location of eIF3 and eIF4 proteins in EMCV IRES-48S PIC.

      (7) Assuming that the additional density near the ternary complex corresponds to Domain I of the IRES and that the codon in the P site represents the EMCV AUG start codon, what is the authors' mechanistic model for EMCV IRES-mediated initiation? Specifically, how is the mRNA positioned or inserted into the 40S mRNA channel in the absence of canonical scanning? As it stands, the discussion does not sufficiently address this key aspect of the EMCV initiation mechanism.

      The EMCV IRES start codon (A-834) is directly placed in the P site (Pestova et al 1996), and the captured complex harboured the initiator tRNA in P<sub>IN</sub> state with AUG at the P site. This start codon is preceded by domains J-K-L, where the J-K domain interacts with eIF4 proteins via eIF4G1-HEAT1 domain, and L domain is 20 residues upstream of the AUG and known to interact with eIF4B (Pestova et al 1996; de Quinto et al 2001). Based on the position and binding partners for these domains, the domain L could be placed at the mRNA exit site, preceded by domain J-K, which could be placed close to eIF4G-eIF4A position on EMCV IRES 48S PIC, near expansion segment 6 (ES6). The domain J-K can interact with eIF4G, localized close to the left foot or ES6 as per previous biochemical experiments (Yu et al 2011). This suggests that the position of eIF4G and eIF4A could be the same as in cap-dependent initiation, where they can interact with eIF3 core subunits as well as the IRES domain J-K, and that the predicted path of mRNA from the exit site can follow the path of mRNA in the human closed 48S PIC (PDB Id- 8OZ0), where it interacts with the eIF3 core.

      Examining the path of RNA in the channel from G-825 (exit site) to C-785 (domain J-K), we found the shortest distance is ~173 Å. This gap could be bridged by a single-stranded stretch of 40 nucleotides. However, the presence of domain L (stem loop, residues 782 to 810) might hinder the placement of A-834 in the P site (Author response image 8). We anticipate two ways to accommodate the start codon at the P site. Either the domain L stem loop is resolved, which is an energetically expensive process (free energy of the thermodynamic ensemble is -11.12 kcal/mol, predicted using RNAfold), or the orientation or conformation of domain J-K changes such that the start codon is directly placed at the P site without resolving domain L.

      Author response image 8.

      (left) The shortest distance between the last fitted residue- 825th of EMCV IRES to 785th of J-K domain of IRES (keeping eIF4G position same as that of PDB Id- 8OZ0) is 173 Å. (right) Tracing the path of mRNA (red) upstream of AUG coming out of the exit site of 40S ribosome and the possible position of eIF4G on EMCV IRES-48S PIC. Addition of nucleotides between C-785 and G-825 would fill the gap. The route of predicted mRNA from the exit channel is based on the mRNA (green) exiting the channel (PDB Id- 8OZ0).

      The domain I is followed by domain J-K, close to the left foot of the 40S ribosomal subunit as per previous biochemical experiments (Yu et al 2011). However, the minimum distance connecting the I domain at the 601st nucleotide to the 682nd nucleotide of domain J-K (at the predicted location) is ~300 Å, which would be difficult to span with the 80 nucleotides (from 601 to 682) if they are present as a double-helical strand. We suppose there could be instances of J-K domain repositioning in the EMCV IRES-48S PIC such that the I domain apical region can contact the 40S head and simultaneously place the start codon at the P site (Author response image 9).

      Author response image 9.

      Rotated views of EMCV IRES domains- I apical part in contact with 40S head and tRNAi and predicted location of J-K domain in contact with eIF4G, close to the left foot of 40S (predicted from PDB Id- 8OZ0). The minimum distance connecting 601st nucleotide in I domain to 682nd nucleotide in J-K domain is 295.5 Å.
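
      As a rough plausibility check on the two linker distances quoted above (~173 Å across 40 nucleotides, and 295.5 Å across about 80 nucleotides), the spans can be compared with simple per-residue length estimates. The sketch below is illustrative only. The rise values (about 5.9 Å per nucleotide for a fully extended single strand, about 2.8 Å per base pair for an A-form duplex) are textbook approximations assumed here, not values taken from the maps or the manuscript, and the function names are hypothetical.

```python
# Assumed textbook approximations (NOT taken from the maps or the manuscript).
SS_RISE_A = 5.9      # Å per nucleotide, fully extended single-stranded RNA
A_FORM_RISE_A = 2.8  # Å per base pair, A-form RNA duplex


def required_rise(distance_a, n_residues):
    """Average length each residue must cover to bridge the distance."""
    return distance_a / n_residues


def max_span(n_residues, rise_a):
    """Longest end-to-end distance reachable at the given rise per residue."""
    return n_residues * rise_a


# Case 1: G-825 to C-785, 40 nucleotides bridging ~173 Å as a single strand.
need_ss = required_rise(173.0, 40)
reach_ss = max_span(40, SS_RISE_A)

# Case 2: nucleotides 601 to 682, about 80 nucleotides bridging 295.5 Å.
# Even the most generous reading (80 stacked base-pair steps) is tested here.
reach_duplex = max_span(80, A_FORM_RISE_A)

print(f"case 1: need {need_ss:.2f} Å/nt; extended strand reaches {reach_ss:.0f} Å")
print(f"case 2: A-form helix reaches {reach_duplex:.0f} Å of 295.5 Å")
```

      Under these assumptions, the 40-nucleotide bridge is feasible only if the stretch is largely unstructured, whereas the 80-nucleotide segment cannot cover 295.5 Å in helical form, which is in line with the repositioning of domain J-K proposed above.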

      We lack any details on the other IRES domains, such as domain I lower stem, domain J-K, or L; therefore, we refrained from commenting on these in our manuscript.

      (8) Supplementary Figure 1 is missing labels for the RNA ladders.

      The size of the DNA ladder used is mentioned.

      References:

      Bhattacharjee S, Abaeva IS, Brown ZP, Arhab Y, Fallah H, Hellen CUT, Frank J, Pestova TV. The mechanism of ribosomal recruitment during translation initiation on Type 2 IRESs. bioRxiv [Preprint]. 2025 Jun 11:2025.06.11.659010. doi: 10.1101/2025.06.11.659010. PMID: 40568087; PMCID: PMC12191231.

      Brito Querido J, Sokabe M, Díaz-López I, Gordiyenko Y, Fraser CS, Ramakrishnan V. The structure of a human translation initiation complex reveals two independent roles for the helicase eIF4A. Nat Struct Mol Biol. 2024 Mar;31(3):455-464. doi: 10.1038/s41594-023-01196-0. Epub 2024 Jan 29. PMID: 38287194; PMCID: PMC10948362.

      Brito Querido J, Sokabe M, Kraatz S, Gordiyenko Y, Skehel JM, Fraser CS, Ramakrishnan V. Structure of a human 48S translational initiation complex. Science. 2020 Sep 4;369(6508):1220-1227. doi: 10.1126/science.aba4904. PMID: 32883864; PMCID: PMC7116333.

      Chamond N, Deforges J, Ulryck N, Sargueil B. 40S recruitment in the absence of eIF4G/4A by EMCV IRES refines the model for translation initiation on the archetype of Type II IRESs. Nucleic Acids Res. 2014;42(16):10373-84. doi: 10.1093/nar/gku720. Epub 2014 Aug 26. PMID: 25159618; PMCID: PMC4176346.

      Dorn G, Gmeiner C, de Vries T, Dedic E, Novakovic M, Damberger FF, Maris C, Finol E, Sarnowski CP, Kohlbrecher J, Welsh TJ, Bolisetty S, Mezzenga R, Aebersold R, Leitner A, Yulikov M, Jeschke G, Allain FH. Integrative solution structure of PTBP1-IRES complex reveals strong compaction and ordering with residual conformational flexibility. Nat Commun. 2023 Oct 13;14(1):6429. doi: 10.1038/s41467-023-42012-z. PMID: 37833274; PMCID: PMC10576089.

      Duke GM, Hoffman MA, Palmenberg AC. Sequence and structural elements that contribute to efficient encephalomyocarditis virus RNA translation. J Virol. 1992 Mar;66(3):1602-9. doi: 10.1128/JVI.66.3.1602-1609.1992. PMID: 1310768; PMCID: PMC240893.

      Fernández N, Fernandez-Miragall O, Ramajo J, García-Sacristán A, Bellora N, Eyras E, Briones C, Martínez-Salas E. Structural basis for the biological relevance of the invariant apical stem in IRES-mediated translation. Nucleic Acids Res. 2011 Oct;39(19):8572-85. doi: 10.1093/nar/gkr560. Epub 2011 Jul 8. PMID: 21742761; PMCID: PMC3201876.

      Hashem Y, des Georges A, Dhote V, Langlois R, Liao HY, Grassucci RA, Pestova TV, Hellen CU, Frank J. Hepatitis-C-virus-like internal ribosome entry sites displace eIF3 to gain access to the 40S subunit. Nature. 2013 Nov 28;503(7477):539-43. doi: 10.1038/nature12658. Epub 2013 Nov 3. PMID: 24185006; PMCID: PMC4106463.

      Imai S, Suzuki H, Fujiyoshi Y, Shimada I. Dynamically regulated two-site interaction of viral RNA to capture host translation initiation factor. Nat Commun. 2023 Aug 28;14(1):4977. doi: 10.1038/s41467-023-40582-6. PMID: 37640715; PMCID: PMC10462655.

      Kim H, Kim K, Kwon T, Kim DW, Kim SS, Kim YJ. Secondary structure conservation of the stem-loop IV sub-domain of internal ribosomal entry sites in human rhinovirus clinical isolates. Int J Infect Dis. 2015 Dec;41:21-8. doi: 10.1016/j.ijid.2015.10.015. Epub 2015 Oct 27. PMID: 26518063.

      Lomakin IB, Hellen CU, Pestova TV. Physical association of eukaryotic initiation factor 4G (eIF4G) with eIF4A strongly enhances binding of eIF4G to the internal ribosomal entry site of encephalomyocarditis virus and is required for internal initiation of translation. Mol Cell Biol. 2000 Aug;20(16):6019-29. doi: 10.1128/mcb.20.16.6019-6029.2000. PMID: 10913184; PMCID: PMC86078.

      López de Quinto S, Martínez-Salas E. Conserved structural motifs located in distal loops of aphthovirus internal ribosome entry site domain 3 are required for internal initiation of translation. J Virol. 1997 May;71(5):4171-5. doi: 10.1128/JVI.71.5.4171-4175.1997. PMID: 9094703; PMCID: PMC191578.

      Lozano G, Francisco-Velilla R, Martinez-Salas E. Ribosome-dependent conformational flexibility changes and RNA dynamics of IRES domains revealed by differential SHAPE. Sci Rep. 2018 Apr 3;8(1):5545. doi: 10.1038/s41598-018-23845-x. PMID: 29615727; PMCID: PMC5882922.

      Maloney A, Joseph S. Validating the EMCV IRES Secondary Structure with Structure-Function Analysis. Biochemistry. 2024 Jan 2;63(1):107-115. doi: 10.1021/acs.biochem.3c00579. Epub 2023 Dec 11. PMID: 38081770; PMCID: PMC10896073.

      Pestova TV, Hellen CU, Shatsky IN. Canonical eukaryotic initiation factors determine initiation of translation by internal ribosomal entry. Mol Cell Biol. 1996 Dec;16(12):6859-69. doi: 10.1128/MCB.16.12.6859. PMID: 8943341; PMCID: PMC231689.

      Petrychenko V, Yi SH, Liedtke D, Peng BZ, Rodnina MV, Fischer N. Structural basis for translational control by the human 48S initiation complex. Nat Struct Mol Biol. 2024 Sep 17. doi: 10.1038/s41594-024-01378-4. Epub ahead of print. PMID: 39289545.

      Roberts LO, Belsham GJ. Complementation of defective picornavirus internal ribosome entry site (IRES) elements by the coexpression of fragments of the IRES. Virology. 1997 Jan 6;227(1):53-62. doi: 10.1006/viro.1996.8312. PMID: 9007058.

      Robertson ME, Seamons RA, Belsham GJ. A selection system for functional internal ribosome entry site (IRES) elements: analysis of the requirement for a conserved GNRA tetraloop in the encephalomyocarditis virus IRES. RNA. 1999 Sep;5(9):1167-79. doi: 10.1017/s1355838299990301. PMID: 10496218; PMCID: PMC1369840.

      Sweeney TR, Abaeva IS, Pestova TV, Hellen CU. The mechanism of translation initiation on Type 1 picornavirus IRESs. EMBO J. 2014 Jan 7;33(1):76-92. doi: 10.1002/embj.201386124. Epub 2013 Dec 15. PMID: 24357634; PMCID: PMC3990684.

      Velazquez MA, Nuthalapati SS, Hankinson J, Fominykh K, Lulla V, Sweeney TR, Hill CH. Structural and mechanistic insights into translation initiation on the enterovirus Type 1 IRES. bioRxiv [Preprint]. 2025 Oct 3: 2025.10.04.680434. doi: 10.1101/2025.10.04.680434.

      Yu Y, Sweeney TR, Kafasla P, Jackson RJ, Pestova TV, Hellen CU. The mechanism of translation initiation on Aichivirus RNA mediated by a novel type of picornavirus IRES. EMBO J. 2011 Aug 26;30(21):4423-36. doi: 10.1038/emboj.2011.306. PMID: 21873976; PMCID: PMC3230369.

    1. Author response:

      We thank the editors and reviewers for their careful consideration of our manuscript and for their constructive feedback, which we will address in detail in our revised version. We value that Reviewer #1 considered that “data they compiled and submitted to public databases is a valuable resource for the community.” We are also encouraged by Reviewer #2’s statement that “The data set is very nice, and the annotations are extremely rigorous and more in-depth than other datasets that include these tissues, since these investigators have enriched significantly for this tissue of interest. Their use of PAGA to identify potential developmental relationships within the data is rigorous. I also would like to specifically point out how incredibly gorgeous the microscopy of the lmx1bb phenotype is in Figure 7. Wow.” We were likewise encouraged by Reviewer #3’s comments that “The computational analysis is thorough, and the findings are clearly described. In situ hybridization provides corroboration of cell identities in many cases. This resource atlas will be of particular interest for studies of inner ear morphogenesis.”

      We spent significant time and effort considering and addressing the reviewers’ public criticisms.

      Below we address the criticisms of the reviewers’ Public Reviews individually.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell type/state-specific ones.

      Major comments:

      (1) It would be very useful if the cluster numbers in the Excel files also had the associated cell type annotations as a second column (at least for the ones that are known). E.g., in Supplemental Table 2, the text states which clusters represent which neuromast and ear cell type, but these are not mentioned in the Excel table.

      Thank you for the suggestion; we will include additional annotations in the revised version.

      (2) Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell-type/state-specific ones.

      We recognize the need to evaluate potential new markers and will include a heat map of markers and clusters to assess cell-type/state specificity in the revised version.

      (3) Uploading the data to gEAR (https://umgear.org/dataset_explorer.html), a web-based, publicly available ear database, would further increase the usefulness of this study to the broader community.

      We appreciate the suggestion to upload to gEAR and will upload to the database in the near future.

      Method:

      The authors should provide the details about how many cells were sequenced for each ear developmental stage, how many cells were present per cluster (page 8), and how many cells were present in each subcluster of ear and lateral line clusters (page 10).

      We will add cell numbers for each cluster in the revised version as an additional column in the supplemental tables.

      Reviewer #2 (Public review):

      Weaknesses:

      A missed opportunity is that the authors describe creating an additional scRNAseq dataset from lmx1bb mutants, but do not show any comparative scRNAseq analyses that would identify broader sets of differentially expressed genes. It seems almost as if a key element of the study was removed at the last minute, and as a result, the discussion of changes in epcam expression in lmx1bb mutants in Figure 7 seems somewhat tacked onto the end of the study and not motivated by the analyses presented in the manuscript.

      Overall, I do not think this study requires any major revisions to be appropriate and useful to the community. This study would be potentially stronger with a more formal analysis of what gene expression changes occurred in otic tissue in lmx1bb mutants, but it is also useful without this. I did have a couple of minor suggestions for the presentation of some aspects that would have made it easier for me as a reader.

      We will include analysis of the lmx1bb mutant data in the revised version and value the suggestions for improved presentation. We will work on improving presentation of the mutant data, including a UMAP with the WT cells in one color and the mutant cells in another color.

      Reviewer #3 (Public review):

      Weaknesses:

      The manuscript is incomplete. Important details that would allow replicable analysis are not provided, with notebooks not available on the referenced GitHub site, and additional files are missing.

      Python notebooks will be added shortly, and files for mapping inDrops data will be provided at the GitHub site.

      The authors make a detailed description of hair cells and supporting cells that are consistent with previous findings (Figures 2 and 3). By contrast, the analysis of distinct cell types that have not been previously well characterized in zebrafish is somewhat incomplete. Markers are described for cells forming the semicircular canals, including ccn1l1 (Figure 4). The authors report an intriguing pattern of its expression before overt bud formation; however, they provide no detailed expression analysis to support this assertion.

      The authors also identify new markers for subsets of periotic mesenchyme (Figure 6). These include epyc and otos, which mark distinct populations within the mammalian inner ear - cochlea supporting cells, spiral limbus, and ligament, respectively. Identification of the equivalent of the spiral ligament would be of particular interest. However, the expression analysis is not of sufficient resolution to identify which cell types these represent in the zebrafish inner ear.

      Thank you for your input regarding the analysis of the periotic mesenchyme. In the revised version, we will attempt to improve resolution of different populations, first by comparing epyc and otos expression by HCR. It is unclear how to correlate any patterns with structures that have yet to evolve, but we will look for similarities and differences to studies performed in mice (PMID: 37720106).

      Differences in gene expression are reported for lmx1bb mutants. However, none of the single-cell data for mutants is provided, and the table (S8) of differential gene expression is missing. Significantly more detail would be needed to interpret these findings.

      We will include analysis of the lmx1bb mutant data in the revised version and value the suggestions for improved presentation.

    1. Author response:

      We thank the Reviewing Editor and reviewers for their thoughtful and constructive evaluation of our manuscript, Programmed Delayed Splicing: A Mechanism for Timed Inflammatory Gene Expression. We are encouraged that the reviewers found the study valuable and the experimental design strong for the core findings. We appreciate the reviewers’ careful attention to the limits of inference in several parts of the manuscript, and will address these points in a revised version. We especially want to acknowledge that this paper has benefited from the abiding interest in splicing regulation by the editors and reviewers who have meticulously improved nearly every aspect of this multifaceted work in its present state.

      Our planned revisions will focus on five areas. First, we will more carefully evaluate and discuss the extent to which the hybrid-capture strategy may impose position-dependent constraints on apparent splicing behavior, particularly across 5′ and 3′ introns. Second, we will clarify the use of the term “bottleneck introns,” distinguishing descriptive use in the main text from the ranked subsets used in downstream analyses. Third, we will revise the framing of the reporter assays to make explicit that these measure steady-state reporter output and do not, on their own, resolve all downstream kinetic consequences of delayed splicing. Fourth, we will clarify the interpretation of the actinomycin D experiments as providing estimates of intron excision behavior under transcriptional arrest rather than a complete time-resolved model of splicing during TNF induction. Fifth, we will substantially revise the scope and stated limitations of the deep learning-aided interpretations of data in this work.

      Reviewer #1

      We thank Reviewer #1 for the positive assessment of the hybrid-capture strategy, the splice-site reporter experiments, and the potential value of the neural-network-based analysis. We appreciate the reviewer’s view that these approaches help extend a well-established system for studying temporal gene expression in TNF-stimulated macrophages. We address the main concerns raised in the public review below.

      (1) While evidence is provided that these introns are distinct from previously published splicing kinetics studies, “bottleneck” introns are not adequately placed in context for assessment of how they are similar or different.

      We appreciate this point and agree that the current manuscript does not yet place these introns in sufficiently clear context relative to prior literature. Our study builds on foundational work describing regulated changes in splicing kinetics, widespread intron retention, and detained introns as biologically meaningful modes of gene regulation, including transcript-specific regulation of splicing in response to stress (Pleiss, Mol Cell., 2007), widespread functional intron retention in mammals (Braunschweig, Genome Res., 2014), and the definition of detained introns as a distinct class of post-transcriptionally spliced introns (Boutz, Genes Dev., 2015). In revision, we will expand the comparison to previously described classes of delayed or retained introns and clarify more explicitly how the introns studied here are defined in the setting of inducible inflammatory transcripts and their temporal resolution over the course of stimulation. We will also revise the relevant Results and Discussion text so that the distinction is made directly in the manuscript rather than relying on inference from the broader presentation.

      (2) Splicing reporters are a good approach, but the complexities of post-transcriptional gene expression regulation are not adequately addressed.

      We agree that the interpretive limits of the reporter assays should be stated more clearly and consistently. In revision, we will revise the presentation of the minigene experiments to make explicit that these are steady-state reporter assays and therefore do not, on their own, resolve all downstream kinetic consequences of delayed splicing in the endogenous context. At the same time, we believe the assay remains informative because it provides a controlled system in which the contribution of splice donor sequence can be tested directly in matched reporter constructs. In that sense, the reporter experiments are valuable as a reductionist test of whether weak donor sequences are sufficient to alter reporter output, even if they do not fully recapitulate the broader endogenous post-transcriptional environment. We will emphasize that these data support an association between weak donor sites and altered reporter output, while moderating any broader mechanistic claims that extend beyond what the assay directly measures.

      (3) Deep learning models are a potentially powerful tool for identifying novel regulatory sequences; however, their use here is underdeveloped.

      We appreciate this concern and agree that the deep-learning section should be revised substantially. In a revised manuscript, we will clarify the training setup, the definition of the slow-intron subsets used in downstream analyses, and the interpretation of the attribution and motif analyses. Alongside, we believe the assay remains informative because it provides a controlled system in which the contribution of splice donor sequence can be tested directly in matched reporter constructs. In that respect, the reporter experiments are valuable as a reductionist test of whether weak donor sequences are sufficient to alter reporter output, even if they do not fully recapitulate the broader endogenous post-transcriptional environment. We will revise the framing of these results so that they are presented more explicitly as identifying candidate sequence features associated with delayed splicing, rather than as direct evidence of specific causal regulatory mechanisms.

      Reviewer #2

      We thank Reviewer #2 for the thoughtful and detailed comments, and for recognizing the strengths of the measurement strategy and the clarity of the manuscript. We appreciate the reviewer’s view that the study will be of interest to a broad audience, and we agree that several conclusions will be strengthened by additional analysis and clearer explanation. We address the main concerns raised in the public review below.

      (1) Concern regarding possible bias of the hybrid-capture strategy toward introns closer to the 3′ end, and whether 5′ introns should be treated separately in some analyses.

      We thank the reviewer for this careful and important point. We agree that this is a potential limitation of the approach and that it should be addressed more explicitly in the manuscript. Our assay begins with poly(A)-selected RNA and then enriches transcripts of interest through terminal-exon capture, so the molecules analyzed are completed, polyadenylated transcripts rather than nascent partial transcripts. This feature is important for reducing ambiguity arising from incomplete transcription, particularly in the chromatin-associated fraction. At the same time, we agree that for introns near the 5′ end, the assay may have limited power to distinguish very rapid splicing from moderately rapid splicing if excision is largely complete by the time the transcript is fully synthesized and polyadenylated.

      In revision, we will address this concern directly in two ways. First, we will revise the Results and Discussion to clarify that the assay provides a population-level measure of splice completion in completed transcripts and that interpretation is strongest for introns whose excision is not already fully resolved before transcript completion. Second, we will more systematically evaluate whether apparent slow splicing covaries with transcript position, distance from the 3′ end, and intron length, and we will perform sensitivity analyses with and without the most 5′ introns to determine which conclusions are robust to these positional constraints. We will also examine transcript coverage patterns in greater detail to better assess the extent to which library construction and cDNA generation may contribute to apparent positional bias. Our preliminary inspection suggests that transcript position is not the sole determinant of the observed heterogeneity, but we agree that a more explicit treatment of this issue is warranted in the revised manuscript.
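
      One way such a positional sensitivity analysis might be set up is sketched below. The intron records, the field names (`index`, `dist_to_3p`, `unspliced`) and all values are hypothetical placeholders, not data from the study, and Spearman rank correlation is an assumed choice of statistic rather than a method stated in the response.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def position_correlation(introns, exclude_first=False):
    """Correlate distance from the 3' end with the unspliced fraction,
    optionally dropping the most 5' intron (index == 1) of each transcript."""
    rows = [r for r in introns if not (exclude_first and r["index"] == 1)]
    return spearman([r["dist_to_3p"] for r in rows],
                    [r["unspliced"] for r in rows])


# Hypothetical toy records: index 1 is the most 5' intron of its transcript.
toy_introns = [
    {"index": 1, "dist_to_3p": 9000, "unspliced": 0.02},
    {"index": 2, "dist_to_3p": 6000, "unspliced": 0.30},
    {"index": 3, "dist_to_3p": 2500, "unspliced": 0.10},
    {"index": 1, "dist_to_3p": 7000, "unspliced": 0.05},
    {"index": 2, "dist_to_3p": 3000, "unspliced": 0.45},
    {"index": 3, "dist_to_3p": 800, "unspliced": 0.20},
]

rho_all = position_correlation(toy_introns)
rho_no_first = position_correlation(toy_introns, exclude_first=True)
print(f"all introns: rho={rho_all:.2f}; without most 5' introns: rho={rho_no_first:.2f}")
```

      A conclusion would count as robust to positional constraints in this framing if the sign and rough magnitude of the association are similar with and without the most 5′ introns.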

      (2) Request for more detailed discussion of alternative library-construction choices.

      We appreciate this suggestion and agree that the revised manuscript would benefit from a fuller discussion of the strengths and limitations of the current enrichment strategy. We chose poly(A) selection followed by terminal-exon capture because this design enriches completed transcripts of interest and reduces ambiguity from nascent partial transcripts, which is particularly important in the chromatin-associated fraction. This approach also provides greater read depth over the selected inflammatory transcripts, enabling more informative intron-level comparisons within the targeted dataset. In revision, we will clarify this rationale more explicitly in the manuscript. We will also discuss the tradeoffs of this design relative to alternative exon-targeting strategies and how those alternatives might provide different, but complementary, views of splicing kinetics.

      (3) Questions regarding biological replicates, error bars, and statistical analysis in Figure 1C and other plots.

      We agree that the replicate structure and intended interpretation of these plots should be clarified more explicitly. In revision, we will revise the figure legends and Methods to distinguish panels that display a single bulk RNA-seq time course (for example, Figure 1C) from panels that summarize distributions across many introns (for example, Figure 2 and Supplementary Figure 6). We will also add statistical comparisons where they are most appropriate and informative, such as in sequence-feature comparisons like Supplementary Figure 4C, while making clear that some CoSI panels are intended as descriptive summaries of intron-level heterogeneity rather than replicate-based inferential plots.

      (4) Concern that intron half-lives may be time-dependent during TNF induction, and that the logic of the actinomycin D measurements is therefore unclear.

      We appreciate this point and agree that the manuscript should distinguish more clearly between two related but non-identical quantities: the CoSI trajectories observed during ongoing TNF induction, and the interruption-based half-life estimates derived from actinomycin D treatment. The actinomycin D experiments were performed using multiple post-treatment timepoints, but they were designed to estimate intron excision behavior after transcriptional arrest under a defined set of conditions, rather than to measure whether an individual intron’s effective splicing rate changes across all phases of the TNF response. We agree that these estimates should therefore be interpreted as constrained measurements under the assay conditions used, rather than as a complete time-resolved model of splicing kinetics during induction. In revision, we will clarify this point in the Results, Methods, and Discussion, and we will more explicitly acknowledge that effective splicing behavior could vary across the induction time course.
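
      As a rough illustration of what an interruption-based half-life estimate involves (a minimal sketch, not the analysis used in the study), the following Python snippet assumes that the unspliced fraction of an intron decays with first-order kinetics after transcriptional arrest and fits a log-linear model to post-treatment timepoints. The function name `estimate_half_life`, the timepoints, and the synthetic fractions are hypothetical and chosen only for the example.

```python
import math


def estimate_half_life(times_min, unspliced_fraction):
    """Fit ln(fraction) = ln(f0) - k * t by ordinary least squares.

    Assumes first-order decay of the unspliced fraction after
    transcriptional arrest (an illustrative assumption).
    Returns (k, half_life) with half_life = ln(2) / k.
    """
    if len(times_min) != len(unspliced_fraction) or len(times_min) < 2:
        raise ValueError("need at least two matched timepoints")
    if any(f <= 0 for f in unspliced_fraction):
        raise ValueError("fractions must be positive to take logarithms")

    logs = [math.log(f) for f in unspliced_fraction]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n

    # Least-squares slope of log(fraction) against time.
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    if sxx == 0:
        raise ValueError("timepoints must not all be identical")

    k = -sxy / sxx  # decay rate constant (per minute)
    if k <= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return k, math.log(2) / k


# Synthetic example: fractions generated with a true half-life of 10 minutes.
times = [0, 5, 10, 20]
fractions = [0.8 * 0.5 ** (t / 10) for t in times]
rate, half_life = estimate_half_life(times, fractions)
print(f"k = {rate:.4f} per min, half-life = {half_life:.2f} min")
```

      Because the fit uses a single rate constant, it summarizes excision under the assay conditions only; a rate that changes over the induction time course would appear as curvature in the log-transformed data rather than being captured by this estimate.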

      (5) Concern that the interpretation of Supplementary Figure 6 is unclear, particularly why delayed splicing in non-immediate groups appears to peak later rather than at the earliest time points.

      We appreciate this point and agree that the current presentation of Supplementary Figure 6 does not explain this behavior clearly enough. Our interpretation is not that delayed splicing is the sole determinant of early versus later induction classes. Rather, the earliest time points reflect a combination of transcriptional induction timing and RNA processing state. In this framework, the dip in CoSI shortly after stimulation reflects the appearance of newly induced, incompletely spliced transcripts, and the later kinetic groups appear to recover from this dip more slowly than the immediate-early group. Thus, the strongest signal of delayed splicing may become most apparent only after sufficient transcript accumulation, rather than necessarily at the very earliest time point. In revision, we will revise the text to make this logic clearer and will consider a more intuitive visualization of these group-specific CoSI trajectories.

      (6) Concern that the deep-learning setup does not make clear whether the model input and output are time-dependent.

      We appreciate this concern and agree that the current manuscript does not explain the model setup clearly enough. Briefly, we will clarify the role of the three TNF timepoints in model training, including the fact that these outputs were modeled jointly and that time itself was not provided as an explicit input to the model. We will also revise the Results and Methods so that the scope and interpretation of the resulting analyses are more explicit.

      Reviewer #3

      We thank Reviewer #3 for the positive assessment of the targeted capture design and of the overall interest of the findings, and for recognizing the improvements in the current version. We appreciate the reviewer’s view that the study is intriguing and that the manuscript has been strengthened in revision. We agree, however, that the manuscript should more clearly distinguish what is directly demonstrated from what remains mechanistically unresolved. We address the main concerns raised in the public review below.

      (1) The study still does not fully resolve the downstream consequences of delayed splicing, including whether bottleneck introns lead primarily to delayed production of mature transcripts, reduced productive transcript output, or some combination of the two.

      We agree with this assessment. The current data do not fully resolve whether delayed splicing primarily delays mature transcript production, reduces productive transcript output, or reflects some combination of the two. In revision, we will further moderate the framing of the downstream consequences of delayed splicing and will revise the Abstract, Results, and Discussion so that these possibilities are consistently presented as alternatives that the present data do not distinguish.

      (2) The minigene reporter assays measure a steady-state level of the transcript and do not provide direct insight into kinetics.

      We agree and will revise the manuscript to make this limitation explicit throughout. In particular, we will ensure that the reporter assays are described consistently as steady-state reporter assays that support an association between splice donor strength and altered reporter output, while avoiding stronger claims that they directly resolve endogenous splicing kinetics or downstream transcript fate.

      (3) Given that the detailed analyses were performed on a selected subset of inflammation-induced transcripts, a broader evolutionary interpretation should be restrained.

      We agree that the broader evolutionary and mechanistic framing should be more carefully defined. In revision, we will restrain these interpretations so that they remain closely aligned with the inflammation-focused and targeted-transcript scope of the current study, and we will moderate language that extends beyond what is directly supported by the present dataset.

      Closing Remarks

      We again thank the reviewers for their constructive comments. We believe that the planned revisions will strengthen the manuscript by clarifying the scope of the mechanistic conclusions, sharpening the interpretation of the experimental approaches, and more carefully defining the role of the computational analyses. We appreciate the opportunity to revise the work and to provide this provisional response to accompany the Reviewed Preprint.

    1. Author response:

      Reviewer #1

      The results showing that hh and vvl drive tracheal invagination independently of trh are reported in Figure 5 of (Matsuda et al. 2015 eLife 4:e09646).

      Reviewer #2

      Many images primarily show lateral views of whole embryos, which can make it difficult to fully assess some phenotypes; higher-magnification or sectional views would enhance clarity. There are also some minor inconsistencies in the description of invagination phenotypes, particularly regarding whether all trh+ cells remain in a 2D plane versus indications of partial invagination in hh vvl double mutants blocking apoptosis, which would benefit from further clarification.

      The data in our previous eLife publication (DOI: 10.7554/eLife.09646) (1) were mostly projection views. Therefore, it is hard to conclude whether the airway progenitors of hh vvl double mutants failed to invaginate or invaginated to form sacs. We will provide magnified views of progenitor invagination in hh vvl double mutants and describe the degree of their invagination phenotypes.

      Reviewer #1

      The results showing dpp requirement for trh maintenance are partially reported in Figure 6 of (Matsuda 2015 eLife 4:e09646).

      Reviewer #2

      Finally, some statements in the abstract, especially regarding the role of grn, are not directly supported by data in this study and could be better aligned with the scope of the presented results.

      trh-lacZ (1-eve-1) has been used as the earliest and strongest enhancer trap line to mark the airway primordia and the airway progenitors. Perdurance of beta-galactosidase protein makes it difficult to conclude whether the marker signals reflect an active transcriptional state of the trh locus. In our previous eLife publication, we showed that Trh protein and trh transcripts are not detectable in H99 grn hh vvl quadruple mutants and in grn hh vvl triple mutants (Figure 5H and Figure 5-figure supplement 2A of DOI: 10.7554/eLife.09646, respectively) (1), although trh-lacZ signals are detected in grn hh vvl triple mutants.

      Similarly, although we previously showed trh-lacZ expression in dpp mutant combinations, Figure 2 in the current manuscript shows that even strong trh-lacZ signals do not always correlate with trh transcripts in dpp mutants. Therefore, in the current manuscript we included the data on dpp-driven positive regulation of trh transcripts at later stages, since they have not been shown before.

      The assessments and advice of the Editors and the Reviewers are indispensable for improving the manuscript. We will address all of the Reviewers' comments (the weaknesses noted in the Public reviews and the major and minor issues in the Recommendations for the authors) both experimentally and in the text.

      Sincerely yours,

      Christos Samakovlis on behalf of all authors

      • (1) Matsuda, R., Hosono, C., Samakovlis, C. & Saigo, K. Multipotent versus differentiated cell fate selection in the developing Drosophila airways. eLife 4 (2015).
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zacharia and colleagues investigate the role of the C-terminus of IFT172 (IFT172c), a component of the IFT-B subcomplex. IFT172 is required for proper ciliary trafficking and mutations in its C-terminus are associated with skeletal ciliopathies. The authors begin by performing a pull-down to identify binding partners of His-tagged CrIFT172(968-C) in Chlamydomonas reinhardtii flagella. Interactions with three candidates (IFT140, IFT144, and a UBX-domain containing protein) are validated by AlphaFold Multimer with the IFT140 and IFT144 predictions in agreement with published cryo-ET structures of anterograde and retrograde IFT trains. They present a crystal structure of IFT172c and find that a part of the C-terminal domain of IFT172 resembles the fold of a non-canonical U-box domain. As U-box domains typically function to bind ubiquitin-loaded E2 enzymes, this discovery stimulates the authors to investigate the ubiquitin-binding and ubiquitination properties of IFT172c. Using in vitro ubiquitination assays with truncated IFT172c constructs, the authors demonstrate partial ubiquitination of IFT172c in the presence of the E2 enzyme UBCH5A. The authors also show a direct interaction of IFT172c with ubiquitin chains in vitro. Finally, the authors demonstrate that deletion of the U-box-like subdomain of IFT172 impairs ciliogenesis and TGFbeta signaling in RPE1 cells.

      However, some of the conclusions of this paper are only partially supported by the data, and presented analyses are potentially governed by in vitro artifacts. In particular, the data supporting autoubiquitination and ubiquitin-binding are inconclusive. Without further evidence supporting a ubiquitin-binding role for the C-terminus, the title is potentially misleading.

      Strengths:

      (1) The pull-down with IFT172 C-terminus from C. reinhardtii cilia lysates is well performed and provides valuable insights into its potential roles.

      (2) The crystal structure of the IFT172 C-terminus is of high quality.

      (3) The presented AlphaFold-multimer predictions of IFT172c:IFT140 and IFT172c:IFT144 are convincing and agree with experimental cryo-ET data.

      Weaknesses:

      (1) The crystal structure of HsIFT172c reveals a single globular domain formed by the last three TPR repeats and C-terminal residues of IFT172. However, the authors subdivide this globular domain into TPR, linker, and U-box-like regions that they treat as separate entities throughout the manuscript. This is potentially misleading as the U-box surface that is proposed to bind ubiquitin or E2 is not surface accessible but instead interacts with the TPR motifs. They justify this approach by speculating that the presented IFT172c structure represents an autoinhibited state and that the U-box-like domain can become accessible following phosphorylation. However, additional evidence supporting the proposed autoinhibited state and the potential accessibility of the U-box surface following phosphorylation is needed, as it is not tested or supported by the current data.

      We thank the reviewer for this comment. IFT172C contains a TPR region and a U-box-like region, which are admittedly tightly bound to each other. While it is possible that this region exists and functions as a single domain, we chose to classify these regions as two different domains for the following reasons.

      (1) The TPR and U-box-like regions belong to two different structural classes.

      (2) The TPR region is linked to the U-box-like region via a long linker, which seems poised to regulate the relative movement between these regions.

      (3) Many ciliopathy mutations map to the interface between the TPR region and the U-box-like region, hinting at a regulatory mechanism governed by this interface.

      That said, we agree that the proposed autoinhibited state and its potential relief by phosphorylation remains a hypothesis that requires experimental validation. We have revised the manuscript to present this more clearly as a speculative model rather than an established mechanism. We clearly acknowledge this limitation on pg. 16-17 of the revised discussion: ‘The IFT172 U-box domain appears to be in an auto-inhibited state in our crystal structure of HsIFT172C2 (Fig. 2E), potentially explaining the absence of a robust auto-ubiquitination activity in-vitro. This structural inhibition is reminiscent of the RING ubiquitin ligase CBL [59], where phosphorylation and substrate binding trigger a conformational change that activates ligase activity [59,75]. Intriguingly, the phosphosite database [76] lists four residues (T1533, S1549, T1689, Y1691) at the U-box/TPR interface as phosphorylation sites (Fig. S2D). Phosphorylation of these residues could potentially alleviate the auto-inhibited state, suggesting a possible regulatory mechanism. Furthermore, a 30-residue linker connects the U-box domain to the last TPR of IFT172, likely providing significant conformational flexibility (Fig. 2A-B). This flexibility may be functionally crucial for the U-box domain, allowing it to adopt different conformations as needed for its various roles. However, we note that the proposed autoinhibition model and its potential regulation by phosphorylation remain hypothetical and require future experimental validation.’

      (2) While in vitro ubiquitination of IFT172 has been demonstrated, in vivo evidence of this process is necessary to support its physiological relevance.

      We thank the reviewer for this important point. We agree that in vivo evidence of IFT172 ubiquitination would strengthen the physiological relevance of our findings. While our current study focuses on the in vitro characterization of this activity, we have revised the manuscript to more clearly state that demonstration of IFT172 ubiquitination activity in cells, including identification of bona fide substrates, is required to establish its physiological significance (p. 16). We consider this an important direction for future studies.

      (3) The authors describe IFT172 as being autoubiquitinated. However, the identified E2 enzymes UBCH5A and UBCH5B can both function in E3-independent ubiquitination (as pointed out by the authors) and mediate ubiquitin chain formation in an E3-independent manner in vitro (see ubiquitin chain ladder formation in Figure 3A). In addition, point mutation of known E3-binding sites in UBCH5A or TPR/U-box interface residues in IFT172 has no effect on the mono-ubiquitination of IFT172c1. Together, these data suggest that IFT172 is an E3-independent substrate of UBCH5A in vitro. The authors should state this possibility more clearly and avoid terminology such as "autoubiquitination" as it implies that IFT172 is an E3 ligase, which is misleading. Similarly, statements on page 10 and elsewhere are not supported by the data (e.g. "the low in vitro ubiquitination activity exhibited by IFT172" and "ubiquitin conjugation occurring on HsIFT172C1 in the presence of UBCH5A, possibly in coordination with the IFT172 U-box domain").

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in both the abstract and results/discussion parts of the revised version of the manuscript. We no longer refer to IFT172 as having auto-ubiquitination activity in the manuscript.

      (4) Related to the above point, the conclusion on page 11, that mono-ubiquitination of IFT172 is U-box-independent while polyubiquitination of IFT172 is U-box-dependent appears implausible. The authors should consider that UBCH5A is known to form free ubiquitin chains in vitro and structural rearrangements in F1715A/C1725R variants could render additional ubiquitination sites or the monoubiquitinated form of IFT172 inaccessible/unfavorable for further processing by UBCH5A.

      We agree, and the conclusion on pg. 11 has now been changed to: “Therefore, while mutations in the IFT172 U-box domain affect the formation of higher molecular weight ubiquitin conjugates, the prominent mono-ubiquitination of IFT172 is likely attributable to the E3-independent activity of UbcH5a, as this event is not impacted by these U-box mutations, rather than indicating an intrinsic auto-ubiquitination capacity of IFT172 itself.”

      (5) Identification of the specific ubiquitination site(s) within IFT172 would be valuable as it would allow targeted mutation to determine whether the ubiquitination of IFT172 is physiologically relevant. Ubiquitination of the C1 but not the C2 or C3 constructs suggests that the ubiquitination site is located in TPRs ranging from residues 969-1470. Could this region of TPR repeats (lacking the IFT172C3 part) suffice as a substrate for UBCH5A in ubiquitination assays?

      We thank the reviewer for raising this important point about ubiquitination site identification. While not included in our manuscript, we did perform mass spectrometry analysis of ubiquitination sites using wild-type IFT172 and several mutants (P1725A, C1727R, and F1715A). As shown in Author response image 1, we detected multiple ubiquitination sites across these constructs. The wild-type protein showed ubiquitination at positions K1022, K1237, K1271, and K1551, while the mutants displayed slightly different patterns of modification. However, we should note that the MS intensity signals for these ubiquitinated peptides were relatively low compared to unmodified peptides, making it difficult to draw strong conclusions about site specificity or physiological relevance.

      Author response image 1.

      Consistent with the reviewer's suggestion, all detected ubiquitination sites fall within the TPR-containing region (residues 1022-1551), which is present in the C1 construct but absent from C2 and C3, explaining the construct-dependent ubiquitination pattern. We did not test the TPR region alone as a UBCH5A substrate, but this would be an informative experiment for future studies.

      (6) The discrepancy between the molecular weight shifts observed in anti-ubiquitin Western blots and Coomassie-stained gels is noteworthy. The authors show the appearance of a mono-ubiquitinated protein of ~108 kDa in anti-ubiquitin Western blots. However, this molecular weight shift is not observed for total IFT172 in the corresponding Coomassie-stained gels (Figures 3B, D, F). Surprisingly, this MW shift is visible in an anti-His Western blot of a ubiquitination assay (Fig 3C). Together, this raises the concern that only a small fraction of IFT172 is being modified with ubiquitin. Quantification of the percentage of ubiquitinated IFT172 in the in vitro experiments could provide helpful context.

      We acknowledge that the ubiquitin conjugation of IFT172 in vitro is weak, as stated in the manuscript (p. 16). The discrepancy between anti-ubiquitin Western blots and Coomassie-stained gels is consistent with only a small fraction of IFT172 being modified, which is expected given that the reaction likely reflects E3-independent ubiquitination by UBCH5A rather than a robust enzymatic activity of IFT172 itself. The anti-His Western blot (Fig. 3C) is more sensitive than Coomassie staining, explaining why the shift is visible there but not on Coomassie. We have not performed formal quantification of the ubiquitinated fraction, but based on the Coomassie data, we estimate it to be a minor proportion of total IFT172, consistent with the toned-down conclusions in our revised manuscript. The identification of physiological substrates and in vivo validation will be important future directions to establish the biological relevance of these observations.

      (7) The authors propose that IFT172 binds ubiquitin and demonstrate that GST-tagged HsIFT172C2 or HsIFT172C3 can pull down tetra-ubiquitin chains. However, ubiquitin is known to be "sticky" and to have a tendency for weak, nonspecific interactions with exposed hydrophobic surfaces. Given that only a small proportion of the ubiquitin chains bind in the pull-down, specific point mutations that identify the ubiquitin-binding site are required to convincingly show the ubiquitin binding of IFT172.

      We appreciate the reviewer's point regarding the potential for non-specific ubiquitin interactions and the value of mutational analysis for confirming specificity. While further mutagenesis of the predicted ubiquitin-binding interface was not performed for this revision, we note that our data show comparable tetra-ubiquitin pull-down by both the larger HsIFT172C2 construct and, importantly, the isolated HsIFT172C3 U-box domain itself (Fig. 4D). This localization of binding to the smaller U-box domain, coupled with our AlphaFold model predicting a specific interface with ubiquitin (Fig. 4E-F) and the observation that a mutation elsewhere (D1605R, Fig. 4C) does not abrogate this binding, collectively suggests a degree of specificity. We have revised the manuscript to present these findings more cautiously and to acknowledge the need for future studies to definitively map the binding site. Specifically, we have toned down the conclusion in the section on pg. 12-13 of the revised manuscript, including a toned-down heading: “IFT172 U-box domain pulls down ubiquitin in vitro”.

      (8) The authors generated structure-guided mutations based on the predicted Ub-interface and on the TPR/U-box interface and used these for the ubiquitination assays in Fig 3. These same mutations could provide valuable insights into ubiquitin binding assays as they may disrupt or enhance ubiquitin binding (by relieving "autoinhibition"), respectively. Surprisingly, two of these sites are highlighted in the predicted ubiquitin-binding interface (F1715, I1688; Figure 4E) but not analyzed in the accompanying ubiquitin-binding assays in Figure 4.

      We thank the reviewer for emphasizing the importance of mutational analysis to confirm the specificity of ubiquitin binding and for specifically inquiring about residues like F1715 and I1688 at the predicted ubiquitin interface. We tested purified HsIFT172C1 constructs containing the F1715A mutation (along with P1725A and C1727R variants) in pull-down assays with GST-Ubiquitin, see Author response image 2.

      Author response image 2.

      However, these experiments did not reveal a conclusive difference in ubiquitin binding for any of the tested variants compared to wild-type IFT172. The I1688A mutant, unfortunately, yielded insoluble protein and could not be evaluated. It is conceivable that the F1715A mutation was not disruptive enough to significantly alter binding, and future studies with different substitutions might be more informative. Nevertheless, our observations that the isolated HsIFT172C3 U-box domain itself pulls down tetra-ubiquitin (Fig. 4D), that our AlphaFold model predicts a specific interface (Fig. 4E-F), and that a mutation elsewhere (D1605R, Fig. 4C) does not abrogate this binding, collectively suggest a degree of specificity. We have revised the manuscript to present these ubiquitin binding findings cautiously, acknowledging the need for further investigation to definitively map the binding site and its functional relevance.

      (9) If IFT172 is a ubiquitin-binding protein, it might be expected that the pull-down experiments in Figure S1 would identify ubiquitin, ubiquitinated proteins, or E2 enzymes. These were not observed, raising doubt that IFT172 is a ubiquitin-binding protein.

      We acknowledge that the absence of ubiquitin or ubiquitinated proteins in our pull-down/MS experiment (Fig. S1) could raise questions about the ubiquitin-binding capacity of IFT172. However, several technical factors likely explain this. First, IFT172 appears to bind ubiquitin with low affinity, as indicated by our in vitro pull-downs and the AF-predicted interface. Second, we used extensive washes to remove non-specific interactors, which would also remove weak but potentially genuine ubiquitin interactions. Third, we did not include ubiquitination-preserving reagents such as NEM in our pull-down buffers, exposing ubiquitinated proteins to DUB-mediated deubiquitination during the experiment. These factors combined would strongly select against the detection of ubiquitin-related interactors under our experimental conditions.

      (10) The cell-based experiments demonstrate that the U-box-like region is important for the stability of IFT172 but do not demonstrate that the effect on the TGFb pathway is due to the loss of ubiquitin-binding or ubiquitination activity of IFT172.

      We acknowledge that our current data cannot definitively distinguish whether the TGFβ pathway defects arise from reduced IFT172 protein stability or from specific loss of ubiquitin-related functions of the U-box domain. Our experiments demonstrate that the U-box region is required for both IFT172 stability and proper TGFβ signaling, but we agree that establishing a direct mechanistic link between ubiquitin-binding/conjugation and signaling would require additional experiments such as point mutations that selectively disrupt ubiquitin-related activity without affecting protein stability. We have revised the discussion (p. 18-19) to more clearly acknowledge this limitation. Addition to text: “However, we note that our current experiments cannot distinguish whether these signaling effects result specifically from loss of ubiquitin-related functions of the U-box domain or from the reduced levels of functional IFT172 protein in the heterozygous U-box deleted cells. Targeted point mutations that selectively disrupt ubiquitin binding without affecting protein stability would be required to resolve this question.”

      (11) The challenges in experimentally validating the interaction between IFT172 and the UBX-domain-containing protein are understandable. Alternative approaches, such as using single domains from the UBX protein, implementing solubilizing tags, or disrupting the predicted binding interface in Chlamydomonas flagella pull-downs, could be considered. In this context, the conclusion on page 7 that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a direct IFT172 interactor" is incorrect as a prediction of an interaction interface with AF-M does not validate a direct interaction per se.

      We agree with the reviewer that our AlphaFold-Multimer (AF-M) predictions alone do not constitute experimental validation of a direct interaction. We appreciate the reviewer's understanding of the technical challenges in validating this interaction experimentally. We have revised our text (p. 7) to state that "The uncharacterized UBX-domain-containing protein was predicted by AF-M as a potential direct IFT172 interactor" and discuss the AF-M predictions as computational evidence that suggests, but does not prove, a direct interaction.

      Reviewer #2 (Public review):

      Summary:

      Cilia are antenna-like extensions projecting from the surface of most vertebrate cells. Protein transport along the ciliary axoneme is enabled by motor protein complexes with multimeric so-called IFT-A and IFT-B complexes attached. While the components of these IFT complexes have been known for a while, precise interactions between different complex members, especially how IFT-A and IFT-B subcomplexes interact, are still not entirely clear. Likewise, the precise underlying molecular mechanism in human ciliopathies resulting from IFT dysfunction has remained elusive.

      Here, the authors investigated the structure and putative function of the to-date poorly characterised C-terminus of IFT-B complex member IFT172 using AlphaFold predictions, crystallography and biochemical analyses including proteomics analyses followed by mass spectrometry, pull-down assays, and TGFbeta signalling analyses using Chlamydomonas flagella and RPE cells. The authors hereby provide novel insights into the crystal structure of IFT172 and identify novel interaction sites between IFT172 and the IFT-A complex members IFT140/IFT144. They suggest a U-box-like domain within the IFT172 C-terminus could play a role in IFT172 auto-ubiquitination as well as for TGFbeta signalling regulation.

      As a number of disease-causing IFT172 sequence variants resulting in mammalian ciliopathy phenotypes have previously been identified in the IFT172 C-terminus, the authors also investigate the effects of such variants on auto-ubiquitination. This revealed no mutational effect on mono-ubiquitination, which the authors suggest could be independent of the U-box-like domain, but reduced overall IFT172 ubiquitination.

      Strengths:

      The manuscript is clear and well written and experimental data is of high quality. The findings provide novel insights into IFT172 function, IFT complex-A and B interactions, and they offer novel potential mechanisms that could contribute to the phenotypes associated with IFT172 C-terminal ciliopathy variants.

      Weaknesses:

      Some suggestions/questions are included in the comments to the authors below.

      Reviewer #3 (Public review):

      Summary:

      Zacharia et al. report on the molecular function of the C-terminal domain of the intraflagellar transport IFT-B complex component IFT172 by structure determination and biochemical in vitro and cell culture-based assays. The authors identify an IFT-A binding site that mediates a mutually exclusive interaction with two different IFT-A subunits, IFT144 and IFT140, consistent with interactions suggested in anterograde and retrograde IFT trains by previous cryo-electron tomography studies. Additionally, the authors identify a U-box-like domain that binds ubiquitin and conveys ubiquitin conjugation activity in the presence of the UbcH5a E2 enzyme in vitro. RPE1 cell lines that lack the U-box domain show a reduction in ciliation rate with shorter cilia, and heterozygous cells manifest TGF-beta signaling defects, suggesting an involvement of the U-box domain in cilium-dependent signaling.

      Strengths:

      (1) The structural analyses of the C-terminal domain of IFT172 combine crystallography with structure prediction using state-of-the-art algorithms, which gives high confidence in the presented protein structures. The structure-based predictions of protein interactions are validated by further biochemical experiments to assess the specific binding of the IFT172 C-terminal domains with other proteins.

      (2) The finding that the interactions of the IFT172 C-terminus with the IFT-A components IFT140 and IFT144 appear mutually exclusive confirms a suggested role in mediating the binding of IFT-B to IFT-A in anterograde and retrograde IFT trains, which is of very high scientific value.

      (3) The suggested molecular mechanism of IFT train coordination explains previous findings in Chlamydomonas IFT172 mutants, in particular an IFT172 mutant that appeared defective in retrograde IFT, as well as mutations identified in ciliopathy patients.

      (4) The identification of other IFT172 interactors by unbiased mass spectrometry-based proteomics is very exciting. Analysis of stoichiometries between IFT components suggests that these interactors could be part of IFT trains, either as cargos or additional components that may fulfill interesting functions in cilia and flagella.

      (5) The authors unexpectedly identify a U-box-like fold in the IFT172 C-terminus and thoroughly dissect it by sequence and mutational analyses to reveal unexpected ubiquitin binding and potential intrinsic ubiquitination activity.

      (6) The overall data quality is very high. The use of IFT172 proteins from different organisms suggests a conserved function.

      Weaknesses:

      (1) Interaction studies were carried out by pulldown experiments, which identified more IFT172 interaction partners. Whether these interactions can be seen in living cells remains to be elucidated in subsequent studies.

      We agree with the reviewer that validation of protein-protein interactions in living cells provides important physiological context. While our pulldown experiments have identified several promising interaction partners and the AF-M predictions provide computational support for these interactions, we acknowledge that demonstrating these interactions in vivo would strengthen our findings. However, we believe our current biochemical and structural analyses provide valuable insights into the molecular basis of IFT172's interactions, laying important groundwork for future cell-based studies.

      (2) The cell culture-based experiments in the IFT172 mutants are exciting and show that the U-box domain is important for protein stability and point towards involvement of the U-box domain in cellular signaling processes. However, the characterization of the generated cell lines falls behind the very rigorous analysis of other aspects of this work.

      We thank the reviewer for noting that the characterization of our cell lines could be more rigorous. In the revised version of the manuscript, we have addressed this by providing additional validation data for all four engineered RPE1 cell lines. First, we performed Sanger sequencing to confirm precise in-frame integration of the GFP tag at the targeted loci and to exclude unintended insertions or deletions (indels), both for the full-length IFT172-eGFP lines (Fig. S6) and for the IFT172∆U-box-eGFP lines (Fig. S7). Second, we performed anti-IFT172 immunoblotting on all four cell lines alongside parental RPE1 cells, confirming expression of both the full-length and U-box-truncated IFT172 proteins (Fig. S8). Notably, the immunoblot revealed reduced steady-state levels of the IFT172∆U-box protein compared to full-length IFT172, providing direct biochemical evidence that loss of the U-box domain compromises IFT172 protein stability, consistent with the ciliogenesis phenotype described in the main text. Together, these data verify the integrity of the edited loci at both the genomic and protein levels, and strengthen the validation of the cellular models used in this study.

      Overall, the authors succeeded in characterizing an understudied protein domain of the ciliary intraflagellar transport machinery and gained important molecular insights into its role in primary cilia biology, beyond IFT. By identifying an unexpected functional protein domain and novel interaction partners, the work makes an important contribution to our understanding of how ciliary processes might be regulated by ubiquitination on a molecular level. Based on this work, it will be important for future studies in the cilia community to consider direct ubiquitin binding by IFT complexes.

      Conceptually, the study highlights that protein transport complexes can exhibit additional intrinsic structural features for potential auto-regulatory processes. Moreover, the study adds to the functional diversity of small U-box and ubiquitin-binding domains, which will be of interest to a broader cell biology and structural biology audience.

      Additional comments:

      The authors investigate the consequences of the U-box deletion on ciliary TGF-beta signaling. While a cilium-dependent effect of TGF-beta signaling on the phosphorylation of SMAD2 has been demonstrated, the precise function of cilia in AKT signaling has not been fully established in the field. Therefore, the relevance of this finding is somewhat unclear. It may help to discuss relevant literature on the topic, such as Shim et al., PNAS, 2020.

      We appreciate the reviewer's comment highlighting that the role of primary cilia in AKT signaling is not as well established as for SMAD2/3. However, we note that a direct functional link between AKT signaling and ciliogenesis has been demonstrated, showing that AKT regulates ciliogenesis initiation through a Rab11-effector switch mechanism (Walia et al., 2019; PMID: 31204173, co-authored by the corresponding author of this study). Furthermore, Shim et al. (PMID: 33753495) demonstrated a cilia-dependent reciprocal activation of AKT1 and SMAD2/3. In the revised manuscript (p. 19, ref. 97), we have expanded the discussion to cite these studies and provide a clearer literature context for the cilia-AKT connection, while acknowledging that the precise mechanism by which the IFT172 U-box domain influences AKT activation requires further investigation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Points for the discussion:

      (1) The discussion should mention that IFT-A subunits IFT121, IFT122 and IFT144 share a similar domain organization to IFT172 (TPRs terminating in Zn-finger-like domains). Do the authors consider these as potential ubiquitin-binding proteins with E3 ligase activity? The possibility that these Zn-finger-like regions share a common origin, and function to stabilize the proteins or mediate IFT subunit interactions without a role in ubiquitin biology should be considered.

      We appreciate this important point. We agree that the shared domain architecture across IFT121, IFT122, IFT144, and IFT172 raises the question of whether these C-terminal domains primarily serve structural rather than ubiquitin-related roles. We have added a discussion paragraph (p. 16) acknowledging that a structural/stabilizing function is the more parsimonious explanation, while noting that whether IFT172's U-box-like domain has additionally acquired ubiquitin-related activity remains an open question.

      (2) From their modeling data, do the authors have an explanation for why a substitution as conservative as D1605E would cause disease?

      The D1605E substitution maps to the IFT172-IFT-A interaction interface (Fig. 1F). While this is a conservative change, D1605 is located at a tightly packed protein-protein interface where even the addition of a single methylene group (the difference between aspartate and glutamate) could introduce steric clashes with residues of IFT140 or IFT144, or alter the precise geometry of hydrogen bonds or salt bridges critical for the interaction. Unfortunately, this level of detail is beyond the resolution of AlphaFold models. However, the fact that this residue is positioned directly at the binding interface provides a plausible structural rationale for its pathogenicity.

      (3) The authors speculate that the L1615P mutation in the Chlamydomonas fla11 strain causes a faulty switch to retrograde IFT and this provides a molecular basis for the retrograde IFT phenotype. However, because the mutation is also within the IFT144 binding site, why is anterograde IFT also not affected?

      The fla11 L1615P mutation resides in helix αA, which participates in both IFT144 (anterograde) and IFT140 (retrograde) interactions. The predominantly retrograde phenotype can be rationalized by the fundamentally different structural roles of the IFT172 C-terminus in anterograde versus retrograde trains. In anterograde trains, the IFT172 C-terminus acts as a flexible tether in stoichiometric excess (2:1 IFT-B:IFT-A ratio), providing an avidity effect that likely compensates for reduced binding affinity caused by L1615P (Lacey et al., 2023). Additional lateral interactions between IFT-B subunits further stabilize the anterograde polymer independently of the IFT172-IFT144 link. In contrast, the retrograde train requires the IFT172 C-terminus to adopt a rigid, resolved conformation that is integral to the IFT-A dimeric interface, with no redundant lateral interactions to compensate (Lacey et al., 2024). The helix-breaking L1615P mutation would specifically disrupt this precise structural requirement, explaining the selective retrograde IFT defect in fla11. We have added this discussion to the revised manuscript (p. 16).

      Minor:

      (1) On page 5, the authors describe the fla11 phenotypes including accumulation of IFT particles at the tip and accumulation of ubiquitinated proteins in the cilium. Could the authors please expand on how this suggests that IFT172 could be involved in ciliary ubiquitination events and discuss an alternative scenario of impaired assembly of functional retrograde IFT in this strain leading to accumulation of ubiquitinated proteins?

      In the revised manuscript (p. 16), we have expanded the discussion of the fla11 phenotype to address this point. We now discuss how the distinct structural roles of the IFT172 C-terminus in anterograde versus retrograde trains explain the selective retrograde IFT defect in fla11, and explicitly note that the accumulation of ubiquitinated proteins in fla11 cilia may reflect impaired retrograde IFT-mediated clearance rather than a direct role of IFT172 in ciliary ubiquitination.

      (2) The authors should also expand on the literature of known UBX-IFT interactions in their manuscript (e.g. Raman et al. PMID 26389662).

      We have expanded the discussion of UBX-IFT interactions in the revised manuscript (p. 7) by citing the work of Raman et al. (PMID 26389662), who identified a direct interaction between the UBX-domain protein UBXN10 and IFT-B via CLUAP1/IFT38 for VCP-mediated regulation of IFT complex integrity. This provides important context for our identification of a UBX-domain protein as an IFT172 interactor.

      (3) On page 11, I1688 is incorrectly referred to as I688.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) The finding that the interaction with IFT140/144 is mutually exclusive is very interesting. Could you speculate on, or do you have any data regarding, the effects on the overall IFT-complex conformation and the downstream biological effects depending on which partner is bound?

      I am not a structural biologist, so this may be an irrelevant or impossible-to-answer question. Ref 46 has shown that the dynein-2 motor complex binds to the edge of IFT-B2 (for assembled trains): could the IFT172 C-terminus be involved here or somehow influence this interaction? In your mass spec data from Cr cilia using CrIFT172_968-C you don't mention pulling down dynein-2 components, so there doesn't seem to be a direct interaction, but could the IFT-B2 conformation depend on whether IFT172 has bound IFT140 or IFT144, and could this interaction hence influence dynein-2 binding?

      We thank the reviewer for this insightful question. Based on recent cryo-ET structures of anterograde and retrograde IFT trains (Lacey et al., 2023; 2024), the switch from IFT144 to IFT140 binding fundamentally changes IFT172's structural role. In anterograde trains, the IFT172 C-terminus acts as a flexible tether tolerating the 2:1 IFT-B:IFT-A stoichiometry and permitting long polymer formation. In retrograde trains, it adopts a rigid conformation integral to the IFT-A dimeric interface, driving the formation of discrete retrograde units with distinct architecture.

      Regarding Dynein-2: while IFT172 does not directly bind Dynein-2 (consistent with our MS data), the reviewer's intuition is correct that IFT172's binding partner influences Dynein-2 association. In anterograde trains, autoinhibited Dynein-2 binds a composite surface formed between adjacent IFT-B2 repeats. When IFT172 switches to IFT140 at the ciliary tip, the resulting train depolymerization destroys this composite binding site, releasing Dynein-2 from its cargo mode to function as an active retrograde motor. The IFT172 binding switch may thus indirectly act as a structural checkpoint for Dynein-2 activation.

      (2) The data provided regarding TGFbeta signalling effects in cells with heterozygous U-box-like domain deletions are interesting. While secondary effects of impaired ciliogenesis due to homozygous deletion of the U-box-like domain can complicate the analysis of cell signalling effects, it would still be interesting to check the effects of bi-allelic human IFT172 disease variants in this region as well (the human disease phenotype is recessive and human mutations are likely hypomorphic variants still allowing for ciliogenesis).

      Also, while there may be secondary effects, it would still be interesting to check homozygous U-box deleted cells as an aggravated effect would further support the data from the het cells.

      We agree that testing bi-allelic human disease variants would strengthen the physiological relevance of our findings. While generating knock-in RPE1 lines was beyond the scope of this revision, we have obtained preliminary data from patient-derived fibroblasts carrying bi-allelic IFT172 missense variants in the U-box region (NPH2161). TGF-β1 stimulation time courses in these fibroblasts show altered p-SMAD2 kinetics compared to control fibroblasts, consistent with the phenotype observed in our heterozygous U-box deleted RPE1 cells (see Author response image 3).

      Author response image 3.

      While these results are preliminary and require further replication, they support the involvement of the IFT172 U-box domain in TGF-β signaling regulation in a disease-relevant context. Regarding homozygous U-box deleted cells, the severe reduction in IFT172 protein levels and ciliogenesis defects (Fig. 5B,D) make it difficult to separate U-box-specific effects from secondary consequences of impaired cilia formation, as the reviewer notes. We consider this an important direction for future studies using targeted point mutations rather than domain deletions.

      (3) Figure 5 E-G: Overall, the effects upon TGFB1 addition are rather small compared to previously published data, e.g., Clement et al., Cell Reports 2013, where one of the authors is the senior author. Are RPE cells less responsive or do you have another theory? Did you check TGFB receptor levels to ensure the differences are not due to different levels of receptor expression? I feel it could be interesting to also check ciliary phospho-SMAD localisation by IF. In Clement et al., loss of IFT88 results in reduced phospho-SMAD2 levels; do you have any theory as to why these opposite effects compared to the IFT172 loss of function could occur?

      We thank the reviewer for this insightful comment. The Tg737orpk fibroblasts used in Clement et al. (2013), which harbor a hypomorphic mutation in IFT88, exhibit severely stunted cilia. This defect broadly disrupts cilium-dependent signaling pathways, including R-SMAD activation, and is therefore expected to produce more pronounced signaling phenotypes. In contrast, our study utilizes RPE-1 cells with structurally intact cilia, enabling us to investigate more specific alterations in ciliary signaling associated with IFT172 function rather than the global effects of cilia loss. Consequently, the more modest effects observed in our system are consistent with the less severe structural and functional perturbation. Both fibroblasts and RPE-1 cells are known to express TGF-β receptors and to respond robustly to TGF-β stimulation, making it unlikely that differences in receptor abundance alone account for the observed discrepancies. We also note that increasing evidence supports a role for the primary cilium in fine-tuning TGF-β signaling output by coordinating both canonical (R-SMAD-mediated) and non-canonical (e.g., AKT/ERK-mediated) pathways. Our data raise the possibility that loss of the IFT172 U-box domain, or reduced IFT172 levels, may differentially affect this balance, rather than simply attenuating signaling uniformly, as seen with more severe ciliary defects such as IFT88 disruption in Tg737orpk cells. We agree that the current dataset does not fully resolve the underlying mechanism. We therefore consider it an important direction for future work to examine, in greater detail, the localization and phosphorylation status of key canonical and non-canonical signaling components in the context of the primary cilium by IF analyses.

      (4) In the summary conclusion at the end of the discussion, the authors propose that IFT172 could directly influence the fate of ubiquitinated TGFB receptors. Do you have any data supporting the theory that TGFB ubiquitination is influenced by IFT172?

      We acknowledge that our current data are insufficient to establish a direct link between IFT172-dependent ubiquitination events and TGF-β receptor regulation. Accordingly, we have revised the Discussion (page 19) to remove our previous hypothesis proposing a role for IFT172 in modulating TGF-β receptor ubiquitination.

      While our experiments demonstrate that the U-box region is required for both IFT172 stability and proper TGF-β signaling, we agree that establishing a direct mechanistic connection between ubiquitin-related activity of IFT172 and signaling outcomes would require additional approaches such as targeted point mutations that selectively disrupt ubiquitin-binding or conjugation functions.

      Furthermore, we note that our current data do not allow us to distinguish whether the observed signaling phenotypes arise specifically from the loss of ubiquitin-related functions of the U-box domain or from reduced levels of functional IFT172 protein in the heterozygous U-box–deleted cells.

      (5) Wording:

      Abstract

      "IFT72..is associated with several disease variants causing ciliopathies". I would change this to "..and several disease-causing IFT172 variants have been identified in ciliopathy patients".

      Corrected.

      Introduction

      "Another cohort of patients with milder ciliopathy resembling BBS also presented with ...". I would reword this to "Another cohort of patients with phenotypically slightly different ciliopathy features resembling BBS also presented with ...". It's not necessarily less severe (they may die of cardiovascular complications in their early thirties for example due to metabolic syndrome, they are intellectually impaired, become blind...), but rather different.

      Changed according to the reviewer’s recommendations.

      Reviewer #3 (Recommendations for the authors):

      (1) Recommended modifications:

      (a) The RPE lines generated should be described better, i.e. sequencing information should be provided, or some kind of evidence that the lines are what they are supposed to be.

      As also noted above, we acknowledge that the characterization presented for the RPE cell lines was insufficient in the initial version of the manuscript. In the revised version, we have addressed this limitation by including detailed sequencing analyses to validate the modifications introduced. Specifically, we provide sequencing data confirming both the integration of the GFP tag and the successful deletion of the U-box domain in all four engineered RPE cell lines. These data verify the integrity of the edited loci and exclude the presence of unintended insertions or deletions at the targeted regions. The corresponding results are presented in Figures S6 and S7 of the revised manuscript, thereby strengthening the validation of the cellular models used in this study.

      (b) It would be more convincing if more than one clone of the RPE lines were presented, as this could rule out possible clonal effects.

      We acknowledge that only a single clone was characterized for each of the four genotypes (IFT172-FL homozygous, IFT172-FL heterozygous, IFT172∆U-box homozygous, IFT172∆U-box heterozygous), and we agree that independent clones would provide stronger protection against clonal artifacts. Generating and validating additional clones was not feasible within the scope of this revision. However, several features of our data mitigate this concern. First, the phenotypes scale with allele dosage: the homozygous ∆U-box line shows the strongest reduction in IFT172 protein level, ciliation, and cilium length, while the heterozygous line shows intermediate defects (Fig. 5B, D and Fig. S8). A clonal off-target effect would not be expected to produce this dose-dependent pattern across two independently isolated lines. Second, the reduced steady-state IFT172 level in the ∆U-box lines (Fig. S8) is consistent with our in vitro observation that the U-box/TPR interface is required for protein stability, providing an independent biochemical rationale for the cellular phenotype. Third, Sanger sequencing of all four lines confirmed precise in-frame integration with no indels at the targeted locus (Figs. S6, S7). We have added a sentence to the Discussion (p. 20) acknowledging that confirmation in additional independent clones remains an important goal for follow-up work.

      (c) Figure 5C: distribution of the GFP-tagged IFT172∆U-box protein could be quantified to support the statement.

      In the revised version of the manuscript, we have included additional quantification of GFP fluorescence across all four cell lines to support our conclusions regarding IFT172 ciliary localization. The corresponding data for each cell line are presented in Figure S5C–F.

      (d) The final sentences include quite bold statements about a general function of IFT172 in signal regulation. Yet, the evidence is the weakest part of the work. It is only shown in i) one cell line, ii) in one cell clone that is not extensively characterized, and iii) for one signaling pathway that is not the best-studied cilia signaling pathway. Therefore, I recommend a more moderate statement.

      Abstract last sentence has now been toned down and reads: Our findings suggest that IFT172, beyond its structural role in bridging IFT-A and IFT-B complexes within IFT trains, harbors a conserved U-box-like domain with potential involvement in ciliary ubiquitination processes and signaling, providing new insights into the molecular mechanisms underlying IFT172-related ciliopathies.

      (e) The order of the figures is not followed in the main text, which is distracting.

      The order of figures is now consecutive in the revised manuscript.

      (2) Questions and comments to consider:

      (a) It is unclear why tetra-ubiquitin chains have been used.

      We thank the reviewer for this question. Recent evidence suggests that ubiquitin chains, rather than monomeric ubiquitin, act as sorting and signaling cues at the primary cilium (Shinde et al., 2020). To probe the ubiquitin-binding activity of IFT172, we therefore used a tetrameric ubiquitin chain as a model substrate, which better reflects the multivalent nature and binding avidity expected for physiological polyubiquitin signals than a ubiquitin monomer. Specifically, we used a recombinantly expressed linear (Met1-linked) tetra-ubiquitin chain, generated as a genetically encoded fusion. Linear ubiquitin chains are well-established non-degradative signaling chains recognized by a dedicated class of ubiquitin-binding domains, making them a suitable probe for detecting ubiquitin-binding activity outside the canonical proteasomal pathway. In addition, monomeric ubiquitin (~8 kDa) is poorly retained during membrane transfer in Western blotting, which further precluded its reliable use as a probe in our pull-down assays. Together, these considerations motivated the use of tetrameric ubiquitin as a biologically and technically appropriate substrate for assessing IFT172's ubiquitin-binding activity.

      (b) Figure 4D: described in the text as "pulldown tetraubiquitin at comparable levels", which is not obvious from the figure presented, it appears reduced by at least 30%.

      We thank the reviewer for this observation. As described on page 10 of the manuscript and evident from Figure 4D, the purified GST–HsIFT172C3 construct underwent substantial proteolytic cleavage during purification. This degradation limited our ability to include amounts of intact GST–HsIFT172C3 comparable to those of the full-length GST–HsIFT172C2 construct in the pull-down assays. Importantly, when accounting for the reduced proportion of full-length GST–HsIFT172C3 present in the assay, the tetra-ubiquitin pull-down efficiencies of the two constructs are expected to be comparable. This is supported by the Coomassie staining shown in Figure 4D, which reflects the relative abundance of the intact protein species used in the experiment.
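
      The normalization argument above can be illustrated with a minimal sketch. The band intensities used here are hypothetical placeholders in arbitrary units, not measurements from Figure 4D, and the helper name `normalized_pulldown` is introduced only for illustration. The sketch assumes that pull-down efficiency can be approximated as the prey (tetra-ubiquitin) signal divided by the intact bait signal estimated from Coomassie staining.

```python
def normalized_pulldown(prey_signal: float, intact_bait_signal: float) -> float:
    """Return prey signal per unit of intact bait (both in arbitrary units)."""
    if intact_bait_signal <= 0:
        raise ValueError("intact bait signal must be positive")
    return prey_signal / intact_bait_signal


# Hypothetical densitometry values (arbitrary units, NOT taken from Figure 4D).
# C2: bait largely intact; C3: about 30% less intact bait after proteolysis.
c2 = normalized_pulldown(prey_signal=100.0, intact_bait_signal=100.0)
c3 = normalized_pulldown(prey_signal=70.0, intact_bait_signal=70.0)

# A raw prey signal that is 30% lower for C3 yields the same normalized
# efficiency once the lower amount of intact bait is taken into account.
ratio = c3 / c2
print(f"C3/C2 normalized pull-down ratio: {ratio:.2f}")
```

      Under these assumed values, an apparent 30% drop in raw tetra-ubiquitin signal corresponds to an unchanged efficiency per unit of intact bait. Whether this holds for the actual blot would require densitometry of both the prey and intact bait bands.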

      (c) With the proposed model, why would the fla11 mutant only affect retrograde IFT?

      We have revised the Discussion (p. 16) to provide a plausible explanation of why only retrograde IFT is affected in the fla11 mutant.

      (3) Minor copy-editing:

      (a) Page 3, first paragraph: led := leads.

      (b) Kinesin-2 and Dynein-2 should be hyphenated.

      (c) Page 4: wwp1 should be WWP1.

      (d) Bonafide should be italicized: bona fide.

      (e) Some abbreviations appear uncommon and therefore somewhat distracting: TGFB instead of TGF-beta, Cr in instances that refer specifically to the organism.

      (f) Imprecise lab jargon: "very C-terminal".

      (g) Lab jargon: "purified a C-terminal construct".

      (h) Lab jargon: "pull-downs".

      (i) Page 8: "DALI" only abbreviated.

      (j) Page 9: "Appearance ... were observed" should be "was".

      (k) Page 11: "I688" should be "I1688".

      (l) Page 12: "PDs" unclear.

      These minor points have been corrected.

      We have revised the text and figures to use the widely accepted nomenclature consistently, with TGF-β referring to the signaling pathway and TGF-β1 referring specifically to the ligand.

      We further revised the text to use “Chlamydomonas reinhardtii” when referring to the organism and “Cr” when referring to the protein.

      We have removed the informal phrases "very C-terminal" and "purified a C-terminal construct" from the revised manuscript. We have retained the term "pull-down," as this is well-established and widely used terminology in the biochemistry literature to describe the affinity-based co-isolation assays used here. PD has been replaced with pull-down.

      The grammatical error on page 9 ("Appearance ... were observed") has been corrected to "was observed".

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      EEG data provide evidence for enhanced neural responses to music compared to shuffled auditory sequences. These findings encourage further investigation of the proposed developmental trajectory of neural responses to music and their link to musical behavior in infants.

      Comments on revisions:

      I thank the authors for the considerable effort devoted to revising the manuscript and addressing the raised questions and comments. I particularly appreciate the additional analyses and the extended arguments included in the discussion. I believe that this paper represents a valuable contribution to the literature on music development.

      One remaining comment concerns the evoked response observed in the shuffled condition, which I still find intriguing. Considering that the auditory events in the shuffled condition display a clear rise time, particularly for those events that were selected based on being preceded and followed by longer periods of silence, one would expect to observe an evoked response emerging from baseline. However, this pattern is not evident in the presented curves. The authors may further examine and discuss the shape and characteristics of these response patterns.

      We thank the Reviewer for highlighting this intriguing aspect of our data. We entirely agree that from a purely bottom-up, acoustic perspective, one would expect a clear onset-locked evoked response, such as a P1/P2 complex in adults or its developmental equivalent, given the prominent acoustic rise times and the surrounding periods of silence (such as those accounted for in the control analyses).

      The fact that these responses are not present in the curves for the shuffled condition was striking to us as well. We interpret this severe attenuation not as a failure of sensory perception, but potentially as a consequence of higher-level cognitive modulation. Specifically, because the shuffled condition completely lacks structural regularities, the brain might be unable to build reliable temporal and/or melodic expectations. In the absence of a learnable structure, the auditory system likely down-weights the processing of these random sequences to conserve cognitive resources, leading participants to attentionally disengage.

      This phenomenon aligns with both developmental and adult models of auditory processing. For instance, the "Goldilocks effect" demonstrates that infants systematically withdraw attention from auditory sequences that are entirely unpredictable (Kidd et al., 2014). Similarly, adult auditory literature suggests that while predictable patterns automatically capture attention, random and unpredictable acoustic streams could be actively tuned out (Dayan et al., 2000; Esber & Haselgrove, 2011).

      Following the Reviewer’s helpful suggestion to further discuss the characteristics of these response patterns, we have expanded our description and interpretation of the shuffled condition curves in the revised manuscript. We added the following text to the Methods and Discussion to explicitly address the dampened shape of these responses:

      p. 9: “Importantly, and in line with the adults’ data, all infant groups exhibited enhanced P1 amplitudes in response to music compared to shuffled music. Actually, across all groups, shuffled music did not elicit clear ERPs as the ones elicited by music”.

      p. 20: “This process was markedly dampened or interrupted by shuffled music (Bianco et al., 2024, 2025; Lense et al., 2022), a finding that could be interpreted as evidence of disengagement from such highly unpredictable sequences (Dayan et al., 2000; Esber & Haselgrove, 2011; Kidd et al., 2014).”

      Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results filling an important gap about early infant sensitivity, detection, and differentiation of musical sounds. The addition of EEG recordings (specifically ERPs) in response to music presentations at 3 different infant ages in the first postnatal year is important, and the manipulation of the music stimuli into shuffled, high and low pitch to capture differences in brain response processing and spontaneous movements is interesting. Further, the movement analysis based on Quantity of Movements (QoM) and movement subdivision into 10 distinct Principal Movements (PMs) is novel and creative.

      Overall, results show that ERP responses to music occur earlier than QoM in early development, and that even at 12 months, motor responses to music remain coarse and not rhythmically aligned with the music tempo. This work increases our fundamental understanding of infants' early music perception in relation to auditory processing and motor response.

      Comments on revisions:

      The authors have addressed my questions in their revision. I have no other questions. Thanks again for the opportunity to read and evaluate this interesting work.

      We thank the Reviewer for their time, their positive evaluation of our revised manuscript, and their constructive feedback throughout the review process, which has greatly helped us to strengthen this paper.

      Reviewer #3 (Public review):

      Summary

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6 and 12 months of age and adults showed enhanced auditory responses to music compared with shuffled music. 6-month-olds also exhibited enhanced P1 response to high-pitch vs low-pitch stimuli, but not the other groups. Besides, whole body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.

      Strengths

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      Besides, the study uses state-of-the-art analyses. Detailed investigations were performed, as well as exploratory analyses in supplementary information. The discussion is rich in neurodevelopmental interpretations and comparisons with the literature. All steps are clearly detailed. The manuscript is very clear, well-written and pleasant to read. Figures are well-designed and informative. The authors' responses to previous reviews are also detailed and informative.

      Comments on revisions:

      The authors answered all my questions.

      Thank you very much for your positive evaluation and for taking the time to review our revisions. We deeply appreciate your insightful comments across the review rounds, which have helped us improve the clarity and rigor of our paper.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete due to concerns about the new HCR data.

      Thank you for this assessment. Our responses to all concerns raised by the reviewers, including those regarding the HCR data, follow below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Fig. 1). These tumors contain non-mutant cells - termed SGC for single-germ cells. 75% of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Fig. 2). The authors demonstrate that block in differentiation in SGC is a result of suppression of bam expression (Fig. 2). They present data suggesting that in 73% of SGCs BMP signaling is low (assessed by dad-lacZ) (Fig. 3) and proliferation is less in SGCs vs GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Fig. 4). They show data that bam-mutant cells secrete Dpp, but this data is not compelling (see below) (Fig. 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Fig. 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells

      (3) Appropriate use of quantification and statistics

      Thank you for your valuable comments, and we greatly appreciate them.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, or in some germaria, etc or in a few germaria.

      This concern was addressed in the rebuttal. The line number is 106, not line 103.

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      This concern was addressed in the rebuttal. However, these statements are not on lines 331-335 but instead start on line 339. Please be accurate about the line numbers cited in the rebuttal. They need to match the line numbers in the revised manuscript.

      We have rechecked the line numbers and confirmed that the mismatch arose from the Word-to-PDF conversion process on the eLife website. As this issue has recurred and reviewers’ file-format preferences are unknown to us, we have added a clarifying note at the beginning of each response letter: “Please note that the line numbers cited refer to the revised manuscript in the Microsoft Word format”.

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      The authors did not perform additional staining for GSC-enriched proteins such as Sex lethal and nanos.

      The 70-75% of SGCs that have low BMP signaling display the following characteristics: 1) dot-like spectrosomes, 2) positivity for Dad-lacZ, and 3) absence of bamP-GFP expression. This combination of traits is sufficient to classify them as GSC-like cells. Neither Sex lethal nor Nanos is expressed exclusively in GSCs (Chau et al., 2009; Li et al., 2009), rendering them unsuitable for distinguishing GSC-like from cystoblast-like cells.

      (4) All experiments except Fig. 1I (where a single germarium with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than figure 1) with hs-flp?

      In the rebuttal, the authors stated that they used nos>flp for all figures except for Fig. 1I. It would be more convincing for them to prove in Fig. 1 that there is no phenotypic difference between the two methods and then switch to the nos>FLP method for the rest of the paper.

      We appreciate this suggestion. These data are included in Figure 1-figure supplement 3 in the revised manuscript.

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens when they look at young females (e.g., 2-day-old)? I assume that the nos>flp is working in larval and pupal stages and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      The authors did not supply any data to prove that the clones were larger in 14-day-old flies than in younger flies. Additionally, the age of "younger" flies was not specified. Therefore, the authors did not satisfactorily answer my concern.

      We appreciate this critical comment. Figure 1J includes the SGC phenotype data from 1-, 7-, and 14-day-old flies. Both 1- and 7-day-old flies are younger flies in our analyses. The evidence that germline clones were larger in 14-day-old flies than in younger flies was provided in Figure 1-figure supplement 2 in the revised manuscript.

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact on the clonal analyses diagrammed in Fig. 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated so it is not possible to discern one vs two copies of GFP.

      In the rebuttal, the authors stated that they cannot differentiate one vs two copies of GFP. They used other clone labeling methods in Fig. 4 and 6. I think that the authors should make a statement in the manuscript that they cannot distinguish one vs two copies of GFP for the record.

      Thank you for this suggestion. Such a statement has been added in the revised manuscript (Lines 177-178).

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with dpp-lacZ enhancer trap in Fig 5A,B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries and yet LacZ is very faint in Fig. 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues including the ovary.

      The HCR FISH in Fig. 5 of the revised manuscript needs an explanation for how the mRNA puncta were quantified. Currently, there is no information in the methods. What is meant by relative dpp levels? I think that the authors should report in an unbiased manner the "number" of dpp or gbb puncta in TFs. For the germaria, I think that they should report the number of puncta of dpp or gbb divided by the total area in square pixels counted. Additionally, the background fluorescence is noticeably much higher in bamBG/delta86 germaria, which would (falsely) increase the relative intensity of dpp and gbb in bam mutants. Although I commend the authors for performing HCR FISH, these data are still not convincing to me.

      We appreciate these critical comments. Due to variable puncta sizes and frequent clustering in TF and cap cells (see Figure 5A, C), direct quantification of puncta number was unreliable. Therefore, we quantified mean fluorescence intensity instead, as described in the revised figure legend of Figure 5 (Lines 603-604). In Author response image 1A, B (modified from Figure 5A, C), magenta ovals indicate empty background fluorescence areas, which appear similar between w<sup>1118</sup> (wild-type control) and bam<sup>-/-</sup> germaria. In Author response image 1, the yellow oval outlines a neighboring germarium, not an empty area (see the DAPI channel).

      Author response image 1.

      In situ-HCR results of dpp and gbb in wild-type and bam mutant germaria. Magenta ovals indicate empty areas displaying only background fluorescence. In panel (B), the yellow oval outlines a neighboring germarium, not an empty area (see the DAPI channel below).
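
      The two readouts discussed under point (7), puncta count per unit area and mean fluorescence intensity, can be compared on a toy image. The following is only a minimal sketch under stated assumptions: the function names, the boolean masks, and the subtraction of an empty-area background are illustrative choices and are not taken from the manuscript or its methods.

```python
import numpy as np

def puncta_density(n_puncta: int, roi_mask: np.ndarray) -> float:
    """Number of puncta divided by the ROI area in square pixels."""
    area_px = int(roi_mask.sum())
    if area_px == 0:
        raise ValueError("ROI mask is empty")
    return n_puncta / area_px

def background_corrected_mean(image: np.ndarray,
                              roi_mask: np.ndarray,
                              background_mask: np.ndarray) -> float:
    """Mean ROI intensity minus mean intensity of an empty background area.

    Subtracting the background is an assumption made here for illustration;
    it is one way to keep a brighter background from inflating the readout.
    """
    roi_mean = image[roi_mask].mean()
    bg_mean = image[background_mask].mean()
    return float(roi_mean - bg_mean)

# Toy data (hypothetical): uniform background of 10 with a brighter 2x2 ROI.
image = np.full((4, 4), 10.0)
image[1:3, 1:3] = 30.0

roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True
background = ~roi  # everything outside the ROI

print(puncta_density(2, roi))                            # 2 puncta / 4 px = 0.5
print(background_corrected_mean(image, roi, background))  # 30 - 10 = 20.0
```

      In this sketch, a uniformly higher background shifts the raw ROI mean but leaves the corrected value unchanged, which is the concern behind comparing empty areas between genotypes.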

      (8) In Fig 6, the authors report results obtained with the bamBG allele. Do they obtain similar data with another bam allele (i.e., bamdelta86)?

      The authors did not try any experiments with the bamdelta86 allele, despite this allele being molecularly defined, whereas the bamBG allele is not.

      While we agree that repeating the experiments in Figure 6 with bam<sup>Δ86</sup> would be helpful, our mosaic analysis strategy for two genes on different chromosome arms is technically complex (see genotypes in Source data 1). Switching from bam<sup>BG</sup> to bam<sup>Δ86</sup> would necessitate extensive and time-consuming genetic recombination. Given that both alleles induce the SGC phenotype indistinguishably (Figure 1J), we believe that repeating these experiments with bam<sup>Δ86</sup> would not alter our key conclusion. We appreciate your understanding regarding this technical complexity.

      Reviewer #2 (Public review):

      In the current version, Zhang et al. have made substantial improvements to the manuscript. It is now easier to read, and the data are more solid compared with the previous version, supporting their conclusion that tumor GSCs secrete stemness factors (BMPs and Dpp) to suppress the differentiation of neighboring wild-type GSCs. This study should benefit a broad readership across developmental biology, germ cell biology, stem cell biology, and cancer biology.

      Thank you for your valuable comments, and we greatly appreciate them.

      However, the following suggestions may further improve the clarity and rigor of the research content:

      (1) Clarification of sample size (n).

      Each germarium can contain highly variable numbers of SGCs, sometimes reaching 50-100. When reporting "n" values, the authors are encouraged to also indicate the number of germaria analyzed. For example, in lines 126-128:

      "Notably, 74% of SGCs (n = 132) were GFP-negative, while the remaining 26% were GFP-positive (Figure 2B, C). This suggests that SGCs can be categorized into two distinct groups: those resembling GSCs (GSC-like) and those resembling cystoblasts (cystoblast-like)." Please clarify how many germaria were examined to obtain n = 132.

      We appreciate this comment. In 14-day-old fly ovaries, each germarium that met our criterion for quantifying the SGC phenotype contains approximately 1.5 SGCs (see Figure 1K). For the specific analysis of the “132” SGCs presented in Figure 2C, we did not record the number of germaria from which they originated.

      In addition, it is unclear whether the authors intend to suggest that the GFP-negative SGCs are GSC-like or cystoblast-like; this point should be clarified.

      Thank you for this suggestion. We intend to suggest that the bamP-GFP-negative SGCs are GSC-like; this information has been added in the revised manuscript (Line 129).

      (2) Improvement of Fig. 6 in situ hybridization images.

      The in situ hybridization images in Fig. 6 are not fully convincing. The control images, in particular, would benefit from higher resolution and enlarged views of the germarium region.

      Thank you for this valuable suggestion. The enlarged views of both the control and bam<sup>-/-</sup> germarium regions were included in Figure 5A, C in the revised manuscript.

      In panel C, abundant signals are also present outside the germarium, which may complicate interpretation and should be clarified or controlled for.

      In the right panel of Figure 5C, the abundant signals noted by the reviewer originate from neighboring germaria (see the DAPI channel), not from empty areas, which would be expected to show only background fluorescence. For more details, please refer to our response to Question (7) raised by Reviewer #1.

      Alternatively, the authors could strengthen the in situ analysis by using bam mutants or bam dpp / bam gbb double mutants as controls to better define signal specificity.

      We appreciate this comment. Homozygous dpp or gbb mutants are lethal, precluding the generation of dpp bam or gbb bam double-mutant flies. Additionally, the GFP signal was drastically reduced during our HCR processing, preventing mosaic clone analysis.

      Reviewer #3 (Public review):

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors generated by differentiation-arrested mutations (bam and bgcn) inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring wild-type germline stem cells.

      Strengths:

      The study uses a well-established in vivo model to address an important biological question concerning the interaction between germline tumor cells and wild-type (WT) germline stem cells in the Drosophila ovary. If the findings are substantiated, this study could provide valuable insights that are applicable to other stem cell systems.

      Thank you for your valuable comments, and we greatly appreciate them.

      Weaknesses:

      The authors have addressed some of my concerns in the revised submission. However, the data presented do not allow the authors to distinguish whether the failed differentiation of WT stem cells/germline cells results from "arrested differentiation due to the loss of the differentiation niche" or from "direct inhibition by tumor-derived expression of niche-associated molecules Dpp and Gbb".

      Blocking Dpp or Gbb secretion specifically from germline tumor cells promoted differentiation of neighboring wild-type germ cells (Figure 6). This indicates that BMP ligands secreted by germline tumors are required to inhibit this differentiation. However, we cannot rule out the possibility that disruption of the differentiation niche also contributes to the SGC phenotype, a point highlighted in the manuscript (Line 204).

      The critical supporting data, HCR in situ results, are not sufficiently convincing.

      Below, we provide a point-by-point reply addressing each of your specific recommendations.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      It is surprising that the authors failed to induce germline tumors at the adult stage, as this has been reported by many labs and would allow for time course analysis of the SGC phenotype. As a result, the data in this manuscript address only events occurring after the germline tumor formation (with clonal induction at larval stage) and focus on the already present "arrested wild-type germ cells", without providing insight into the process by which these arrested germ cells are formed.

      In our hands, inducing germline clones by the hs-FLP method at the adult stage was efficient in males but not in females, despite subjecting adult flies to intensive heat-shock at 37°C.

      The HCR in situ data exhibit a high background.

      Regarding the background issue, please see our response to Reviewer #1’s Question (7).

      First, the signal appears stronger in TF cells than in cap cells.

      As demonstrated by Li et al. (2016), dpp-lacZ (P4-lacZ) signals are also stronger in TF cells than in cap cells (see their Figure 4D').

      Second, both dpp and gbb are detected broadly in somatic cells including escort cells. These observations are inconsistent with published data.

      As shown in Figure 5A and C, dpp and gbb were detected broadly in somatic cells of bam<sup>-/-</sup> germaria, but not in those of w<sup>1118</sup> (wild-type) controls. To our knowledge, no previous study has reported the expression pattern of these ligands in a bam mutant background.

      To demonstrate the tumor-derived dpp and gbb, the HCR in situ analysis could be performed in the germarium with mosaic clones. If these niche-associated molecules are indeed expressed in tumor cells, the authors should observe a mosaic expression pattern of these molecules, with signal "ON" in tumor cells and "OFF" in neighbouring arrested germ cells.

      This is a great idea and was indeed our original approach. However, the GFP signal was drastically reduced during our HCR processing, ultimately precluding mosaic clone analysis.

      References

      Chau, J., Kulnane, L.S., and Salz, H.K. (2009). Sex-lethal facilitates the transition from germline stem cell to committed daughter cell in the Drosophila ovary. Genetics 182, 121-132.

      Li, X., Yang, F., Chen, H., Deng, B., Li, X., and Xi, R. (2016). Control of germline stem cell differentiation by Polycomb and Trithorax group genes in the niche microenvironment. Development 143, 3449-3458.

      Li, Y., Minor, N.T., Park, J.K., McKearin, D.M., and Maines, J.Z. (2009). Bam and Bgcn antagonize Nanos-dependent germ-line stem cell maintenance. Proc Natl Acad Sci U S A 106, 9304-9309.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This study presents a valuable theoretical exploration on the electrophysiological mechanisms of ionic currents via gap junctions in hippocampal CA1 pyramidal-cell models, and their potential contribution to local field potentials (LFPs) that is different from the contribution of chemical synapses. The biophysical argument regarding electric dipoles appears solid, but the evidence can be more convincing if their predictions are tested against experiments. A shortage of model validation and strictly comparable parameters used in the comparisons between chemical vs. junctional inputs makes the modeling approach incomplete; once strengthened, the finding can be of broad interest to electrophysiologists, who often make recordings from regions of neurons interconnected with gap junctions.

      We gratefully thank the editors and the reviewers for the time and effort in rigorously assessing our manuscript, for the constructive review process, for their enthusiastic responses to our study, and for the encouraging and thoughtful comments. We especially thank you for deeming our study to be a valuable exploration on the differential contributions of active dendritic gap junctions vs. chemical synapses to local field potentials. We thank you for your appreciation of the quantitative biophysical demonstration on the differences in electric dipoles that appear in extracellular potentials with gap junctions vs. chemical synapses.

      However, we are surprised by aspects of the assessment that resulted in deeming the approach incomplete, especially given the following with specific reference to the points raised:

      (1) Testing against experiments: With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      In addition, the complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      Together, we emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials.

      (2) Model validation: The model used in this study was adopted from a physiologically validated model from our laboratory (Roy & Narayanan, 2021). Please note that the original model was validated against several physiological measurements along the somatodendritic axis. We sincerely regret our oversight in not mentioning clearly that we have used an existing, thoroughly physiologically-validated model from our laboratory in this study.

      (3) Comparisons between chemical vs. junctional inputs: We had taken elaborate precautions in our experimental design to match the intracellular electrophysiological signatures with reference to synchronous as well as oscillatory inputs, irrespective of whether inputs arrived through gap junctions or chemical synapses. A new Supplementary Figure S3 has been added to address this concern raised by the reviewers.

      In the revised manuscript, we have addressed all the concerns raised by the reviewers in detail. We have provided point-by-point responses to reviewers’ helpful and constructive comments below. We thank the editors and the reviewers for this constructive review process, which helped us in improving our manuscript with specific reference to emphasizing the novelty of our approach and conclusions. The specific changes incorporated into the revised manuscript are detailed below.

      Reviewer #1 (Public review):

      This manuscript makes a significant contribution to the field by exploring the dichotomy between chemical synaptic and gap junctional contributions to extracellular potentials. While the study is comprehensive in its computational approach, adding experimental validation, network-level simulations, and expanded discussion on implications would elevate its impact further.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      Novelty and Scope

      The manuscript provides a detailed investigation into the contrasting extracellular field potential (EFP) signatures arising from chemical synapses and gap junctions, an underexplored area in neuroscience. It highlights the critical role of active dendritic processes in shaping EFPs, pushing forward our understanding of how electrical and chemical synapses contribute differently to extracellular signals.

      We thank you for the positive comments on the novelty of our approach and how our study addresses an underexplored area in neuroscience. The assumptions about the passive nature of dendritic structures had indeed resulted in an underestimation of the contributions of gap junctions to extracellular potentials. Once the realities of active structures are accounted for, the contributions of gap junctions increase by several orders of magnitude compared to passive structures (Fig. 1D).

      Methodological Rigor

      The use of morphologically and biophysically realistic computational models for CA1 pyramidal neurons ensures that the findings are grounded in physiological relevance. Systematic analysis of various factors, including the presence of sodium, leak, and HCN channels, offers a clear dissection of how transmembrane currents shape EFPs.

      We thank you for your encouraging comments on the experimental design and methodological rigor of our approach.

      Biological Relevance

      The findings emphasize the importance of incorporating gap junctional inputs in analyses of extracellular signals, which have traditionally focused on chemical synapses. The observed polarity differences and spectral characteristics provide novel insights into how neural computations may differ based on the mode of synaptic input.

      We thank you for your positive comments on the biological relevance of our approach. We also gratefully thank you for emphasizing the two striking novelties unveiling the dichotomy between gap junctions and chemical synapses in their contributions to field potentials: polarity differences and spectral characteristics.

      Clarity and Depth

      The manuscript is well-structured, with a logical progression from synchronous input analyses to asynchronous and rhythmic inputs, ensuring comprehensive coverage of the topic.

      We sincerely thank you for the positive comments on the structure and comprehensive coverage of our manuscript encompassing different types of inputs that neurons typically receive.

      Weaknesses and Areas for Improvement

      Generality and Validation

      The study focuses exclusively on CA1 pyramidal neurons. Expanding the analysis to other cell types, such as interneurons or glial cells, would enhance the generalizability of the findings. Experimental validation of the computational predictions is entirely absent. Empirical data correlating the modeled EFPs with actual recordings would strengthen the claims.

      We thank you for raising this important point. The prime novelty and the principal conclusion of this study is that gap junctional contributions to extracellular field potentials are orders of magnitude higher when the active nature of cellular compartments is accounted for. The lacuna in the literature has been consequent to the assumption that cellular compartments are passive, resulting in the dogma that gap junctional contributions to field potentials are negligible. Despite knowledge about active dendritic structures for decades now, this assumption has kept studies from understanding or even exploring the contributions of gap junctions to field potentials. The rationale behind the choice of a computational approach to address the lacuna was as follows:

      (1) The complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      (2) With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of gap junctional modulators are tabulated in Table 2 of Szarka et al. (2021). In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      We highlight the novelty of our approach and of the conclusions about differences in extracellular signatures associated with active-dendritic chemical synapses and gap junctions, against these experimental difficulties. We emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials. Our analyses clearly demonstrate that gap junctions do contribute to extracellular potentials if the active nature of the cellular compartments is explicitly accounted for (Fig. 1D). We also show theoretically well-grounded and mechanistically elucidated differences in polarity (Figs. 1–3) as well as in spectral signatures (Figs. 5–8) of extracellular potentials associated with gap junctional vs. chemical synaptic inputs. Together, our fundamental demonstration in this study is the critical need to account for the active nature of cellular compartments in studying gap junctional contributions to extracellular potentials, with CA1 pyramidal neuronal dendrites used as an exemplar.

      In the revised version of the manuscript, we have emphasized the motivations for the approach we took, highlighting the specific novelties both in methodological and conceptual aspects, finally emphasizing the need to account for other cell types and gap junctional contributions therein. Importantly, we have emphasized the non-specificities associated with gap-junctional blockers as the reason why experimental delineation of gap junctional vs. chemical synaptic contributions to LFP becomes tedious. We believe that these points underscore the need for the computational approach that we took to address this important question, apart from the novelties of the study.

      In response to your constructive comments, we have added the following to the revised version of the manuscript, in the Introduction section as motivation for the specific route we took:

      “Given the complexity arising from the concurrent activity of chemical synapses, gap junctions, and active dendritic conductances across multiple neuronal populations, experimentally isolating the contributions of individual components to extracellular potentials remains highly challenging. To address this limitation, we employed a computational modeling approach, which provides a quantitative framework for systematically dissecting the distinct roles of specific cellular and synaptic elements. This strategy is consistent with previous studies that have successfully used computational methods to elucidate the contributions of active dendritic mechanisms to LFPs (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to LFPs (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). In addition, experimentally isolating the contribution of gap junctions is complicated by non-specific effects of available pharmacological modulators targeting these connections (Behrens et al., 2011; Rouach et al., 2003). Most genetic knockouts of gap junctional proteins are either lethal or trigger functional compensatory mechanisms (Bedner et al., 2012; Lo, 1999), thereby rendering causal attribution of specific gap junctional contributions infeasible with currently available experimental approaches. Consequently, biophysically and morphologically detailed computational modeling provides a crucial means to evaluate the impact of individual neuronal components on extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).”

      We thank you for raising this point as this allowed us to expand on the specific motivations for the approach we took, and to present the specific novelties of our study to the analyses of extracellular field potentials. Thank you.

      Role of Active Dendritic Currents

      The paper emphasizes active dendritic currents, particularly the role of HCN channels in generating outward currents under certain conditions. However, further discussion of how this mechanism integrates into broader network dynamics is warranted.

      We thank you for this constructive suggestion. We agree that it is important to consider the implications for broader network dynamics of the outward HCN currents that are observed with synchronous inputs. In the revised manuscript, we have elaborated on the implications of the outward HCN current to network dynamics in detail. The following paragraph has been added to Discussion subsection on “Outward HCN currents regulate extracellular potentials”:

      “HCN channels play a critical role in shaping hippocampal network dynamics by modulating neuronal excitability, oscillatory behavior, and susceptibility to pathological states (Kessi et al., 2022; Magee, 1998; Mishra & Narayanan, 2025; Nolan et al., 2004). The outward-like properties of the HCN current we observed may have specific functional implications at different scales. At the cellular scale, the manifestation of outward current during action potentials or plateau potentials could contribute to afterhyperpolarization, thereby regulating firing properties. In cortical and hippocampal pyramidal neurons, most single-neuron processing occurs in their elaborate dendritic branches, where there is spatiotemporal summation of different synaptic potentials, plateau potentials, backpropagating action potentials, and dendritic spikes (Johnston & Narayanan, 2008; Major et al., 2013; Stuart & Spruston, 2015). Considering the heavy expression of HCN channels in the dendrites of hippocampal and cortical pyramidal neurons (Kole et al., 2006; Lorincz et al., 2002; Magee, 1998; Williams & Stuart, 2000), the backpropagating action potentials, plateau potentials, or dendritic spikes at dendritic locations could yield outward currents. These outward currents could act as a hyperpolarizing mechanism that suppresses spatiotemporal summation of the different dendritic potentials.

      At the network scale, such regulation of dendritic potentials and somatic firing could contribute to overall reduction in firing rates of different neurons in the network. For instance, as inhibitory neurons typically elicit action potentials at higher frequencies, somatic outward HCN currents would occur more frequently in inhibitory neurons that express HCN channels compared to excitatory neurons. However, the heavy expression of HCN channels in the dendrites and the higher prevalence of dendritic spikes and plateau potentials in dendrites (Basak & Narayanan, 2018; Larkum et al., 2022; Moore et al., 2017) imply that the impact on outward HCN currents might be higher. Thus, the presence of outward HCN currents would regulate the network balance of excitation and inhibition in an activity-dependent manner. Additionally, the outward component of the current through HCN channels could contribute to stabilization of network synchrony by promoting spike phase coherence and to modulation of spike-LFP phase relationships (Das et al., 2017; Ness et al., 2016, 2018; Seenivasan & Narayanan, 2020; Sinha & Narayanan, 2015, 2022).

      Together, the outward HCN current could play critical roles in regulating several cellular and network functions including spatiotemporal summation within single neurons, amplitude and phase of different oscillations, excitatory-inhibitory interactions, and rate and temporal coding involved in spatial navigation (Hussaini et al., 2011; Nolan et al., 2004; O'Keefe & Recce, 1993). In the context of brain rhythms, future investigations are needed to explore ripple-frequency oscillations, specifically to assess whether high-frequency network interactions are modulated by HCN outward currents. Importantly, future studies could specifically focus on delineating the prevalence and specific contributions of outward currents through HCN channels to single-neuron and network physiology.”

      We thank you for highlighting this point, as it allowed us to elaborate the broader roles of HCN channels to single-cell computation, network dynamics, and field potentials. Thank you.

      Analysis of Plasticity

      While the manuscript mentions plasticity in the discussion, there are no simulations that account for activity-dependent changes in synaptic or gap junctional properties. Including such analyses could significantly enhance the relevance of the findings.

      We thank you for this constructive suggestion. Please note that we have presented consistent results for both fewer and more gap junctions in our analyses (Figure 1 with 217 gap junctions and Supplementary Figure 1 with 99 gap junctions). Our fundamentally novel result that gap junctions onto active dendrites differentially shape LFPs therefore holds true irrespective of the relative density of gap junctions onto the neuron. These results demonstrate that the conclusions about gap junctional contributions to LFPs are invariant to plasticity in the number of gap junctions.

      We had only briefly mentioned plasticity in the Introduction to highlight the different modes of synaptic transmission and to emphasize that plasticity has been studied in both chemical synapses and gap junctions, playing a role in learning and adaptation. However, it seems that this wording inadvertently suggested that our study includes plasticity simulations. Therefore, we have removed that sentence from Introduction in the revised manuscript to ensure clarity.

      In the ‘Limitations of analyses and future studies’ section in Discussion, we suggested investigating the impact of plasticity mechanisms—specifically, activity-dependent plasticity of ion channels—on synaptic receptors vs. gap junctions and their effects on extracellular field potentials under various input conditions and plasticity combinations across different structures. We fully agree with the reviewer that such studies would offer valuable insights and further enhance the broader relevance of our findings. However, while our study implies this direction, it was not the primary focus of our investigation.

      In the revised manuscript, we have also expanded on intrinsic/synaptic plasticity and how they could contribute to LFPs (Sinha & Narayanan, 2015, 2022), while also pointing to simulations with different numbers of gap junctions in this context. The following specific changes have been incorporated into the revised manuscript:

      Discussion subsection “Limitations of analyses and future directions”

      “We demonstrated that the contribution of gap junctions to extracellular field potentials remains consistent regardless of the number of gap junctions. Specifically, we showed that the distinct positive LFP deflections persisted irrespective of their relative density on neurons (Fig. 1 with 217 gap junctions and Supplementary Fig. 1 with 99 gap junctions). Previous studies have quantitatively demonstrated that intrinsic and synaptic plasticity modulate hippocampal LFPs and phase coding (Sinha & Narayanan, 2015, 2022). Future analyses should also assess the impact of activity-dependent plasticity in ion channels (on dendrites, axonal initial segments, and other compartments), in synaptic receptors, and in gap junctions (Andersen et al., 2006; Coulon & Landisman, 2017; Johnston & Narayanan, 2008; Magee & Grienberger, 2020; Mishra & Narayanan, 2021; Neves et al., 2008; O'Brien, 2014; Pereda, 2014; Vaughn & Haas, 2022) on extracellular potentials with various kinds of gap junctional inputs and different combinations of plasticity in various structures. Interactions among different forms of plasticity and how co-dependent plasticity in different components alters extracellular field potentials could provide deeper insights about physiological changes during learning and pathological changes observed in different neurological disorders (Sinha & Narayanan, 2022).”

      We thank you for highlighting this as this allowed us to improve on the specific focus of the manuscript and the study. Thank you.

      Frequency-Dependent Effects

      The study demonstrates that gap junctional inputs suppress high-frequency EFP power due to membrane filtering. However, it could delve deeper into the implications of this for different brain rhythms, such as gamma or ripple oscillations.

      We sincerely thank you for these insightful comments, with which we fully agree. As it so happens, this manuscript forms the first part of a broader study where we explore the implications of gap junctions to ripple-frequency oscillations. The ripple oscillations part of the work was presented as a poster at the Society for Neuroscience (SfN) annual meeting 2024 (Sirmaur & Narayanan, 2024). There, we simulate a neuropil made of hundreds of morphologically realistic neurons to assess the role of different synaptic inputs (excitatory, inhibitory, and gap junctional) and active dendrites in ripple-frequency oscillations. We demonstrate there that the conclusions from single-neuron simulations in this current manuscript extend to a neuropil with several neurons, each receiving excitatory, inhibitory, and gap-junctional inputs, especially with reference to high-frequency oscillations. Our network-based analyses unveiled a dominant mediatory role of patterned inhibition in ripple generation, with recurrent excitations through chemical synapses and gap junctions, in conjunction with return-current contributions from active dendrites, playing regulatory roles in determining ripple characteristics (Sirmaur & Narayanan, 2024).

      Our principal goal in this study, therefore, was to lay the single-neuron foundation for network analyses of the impact of gap junctions on LFPs. We are preparing the network part of the study, with a strong focus on ripple-frequency oscillations, for submission for peer review separately. Please see the abstract of our poster presented at the Society for Neuroscience annual meeting 2024 on the topic here: https://tinyurl.com/57ehvsep.

      In the revised manuscript, we have mentioned the results from our SfN abstract with reference to network simulations and high-frequency oscillations, while also presenting discussions from other studies on the role of gap junctions in synchrony and LFP oscillations. The following has been added to the revised manuscript under the Discussion subsection “High-frequency LFP power was suppressed with gap junctional inputs”:

      “In this context, our analyses lay the foundation for network analyses of the impact of gap junctions on LFPs. The conclusions from the single-neuron simulations in this study extend to a neuropil with several neurons, each receiving synaptic and gap junctional inputs, especially with reference to high-frequency ripple oscillations (Sirmaur & Narayanan, 2024). A neuropil made of hundreds of morphologically realistic pyramidal neurons was used to assess the role of different synaptic inputs (excitatory, inhibitory, and gap junctional), with different patterns of stimulation and active dendritic contributions, in ripple-frequency oscillations. Network-based analyses have unveiled a dominant mediatory role of patterned inhibition in ripple generation, with recurrent excitations through chemical synapses and gap junctions, in conjunction with return-current contributions from active dendrites, playing modulatory roles in governing ripple characteristics (Sirmaur & Narayanan, 2024). Future studies could expand on these conclusions to explore the implications of frequency-dependent filtering (with reference to gap junctional coupling) on high-frequency extracellular oscillations.”

      We thank you for highlighting this point as it allowed us to expand on the implications for our analyses to brain rhythms, especially with reference to high-frequency oscillations. Thank you.
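
As a rough illustration of the membrane-filtering point raised in this comment, the sketch below evaluates the normalized voltage gain of a single passive RC compartment at three frequencies. It is only a generic low-pass example and is not the morphologically realistic, active-dendrite model used in the manuscript; the 20 ms time constant, the band frequencies, and the names `rc_gain`, `TAU`, and `bands` are assumptions made here for illustration.

```python
import numpy as np

def rc_gain(freq_hz, tau_s):
    """Normalized voltage gain |Z(f)|/R of a passive RC membrane compartment."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    return 1.0 / np.sqrt(1.0 + (2.0 * np.pi * freq_hz * tau_s) ** 2)

TAU = 0.02  # membrane time constant in seconds (assumed value, 20 ms)

# Representative frequencies for three rhythms (assumed, for illustration only).
bands = {"theta (8 Hz)": 8.0, "gamma (40 Hz)": 40.0, "ripple (150 Hz)": 150.0}
gains = {name: float(rc_gain(f, TAU)) for name, f in bands.items()}

for name, gain in gains.items():
    print(f"{name}: normalized gain = {gain:.3f}")
```

The gain falls monotonically with frequency, which is the qualitative behavior the comment refers to. How this attenuation maps onto extracellular power in a full cable model with active conductances is a separate question that this single-compartment sketch does not address.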

      Visualization

      Figures are dense and could benefit from more intuitive labeling and focused presentations. For example, isolating key differences between chemical and gap junctional inputs in distinct panels would improve clarity.

      We thank you for this constructive suggestion. We used a consistent visualization throughout, where we place the outcomes associated with chemical synapses and gap junctions in the same figure, adjacent to each other. We believe that this offers a visually intuitive distinction between the outcomes for chemical synapses and gap junctions, rather than placing them in different figures. Splitting them would place the outcomes in different figures and require turning pages or placing two different figures adjacent to each other for quantitative comparison. We respectfully request that we be allowed to retain this form of visualization in the figures. Thank you.

      Contextual Relevance

      The manuscript touches on how these findings relate to known physiological roles of gap junctions (e.g., in gamma rhythms) but does not explore this in depth. Stronger integration of the results into known neural network dynamics would enhance its impact.

      We sincerely appreciate your valuable suggestion and acknowledge the importance of integrating our results into established neural network dynamics, particularly their implications for gamma rhythms. We have addressed this aspect in the revised version of our manuscript. We have added this to the Discussion subsection on “High-frequency LFP power was suppressed with gap junctional inputs” of the revised manuscript:

      “In the context of oscillations and gap-junctional coupling, electrical synapses have been shown to regulate the emergence and stability of the network interactions underlying rhythms of different frequencies, especially gamma-frequency oscillations (Bocian et al., 2009; Buhl et al., 2003; Draguhn et al., 1998; Hormuzdi et al., 2001; Konopacki et al., 2004; LeBeau et al., 2003; Posluszny, 2014; Traub et al., 2003). Specifically, both genetic and pharmacological manipulations of gap junctions have been shown to disrupt gamma rhythms. Genetic deletion of connexin-36 impairs the gamma oscillations associated with awake, active behavioral states (Buhl et al., 2003; Hormuzdi et al., 2001). High-frequency oscillations in the hippocampus have been shown to be sensitive to pharmacological agents like carbenoxolone and octanol that are known to inhibit gap junctions. Carbenoxolone has been known to reduce the transient gamma-frequency oscillations while octanol abolishes the persistent gamma rhythm (Draguhn et al., 1998; Hormuzdi et al., 2001; Posluszny, 2014; Traub et al., 2003). In the context of our results, where we demonstrate that the relative contributions of gap-junctional coupling to high-frequency extracellular potentials are low (Figs. 6–7), how do gap junctions contribute to enhanced extracellular gamma oscillations in these circuits?

      It should be noted that in hippocampal circuits, gamma oscillations emerge predominantly due to interactions between inhibitory interneurons through GABAA receptors (Buzsaki & Wang, 2012; Colgin, 2016; Colgin & Moser, 2010; Wang, 2010; Wang & Buzsaki, 1996; Whittington et al., 1995). Thus, the presence of additional gap junctional coupling between these inhibitory neurons allows for tighter synchrony between these reciprocally inhibition-coupled neurons. In other words, the presence of gap junctions increases the probability of action potential generation in other neurons that are electrically coupled to them, together increasing the population of inhibitory neurons that elicit synchronous action potentials. When these synchronous action potentials act on the adjacent cells, both excitatory and inhibitory, the transmembrane GABAA receptor currents yield stronger gamma-frequency oscillations in the extracellular potentials (Draguhn et al., 1998; Hormuzdi et al., 2001; Posluszny, 2014; Traub et al., 2003). Thus, the stronger high-frequency oscillations observed in these scenarios are owing to the enhanced synchrony that is brought about by the gap-junctional coupling, which translates to stronger transmembrane inhibitory receptor currents.

      These observations also strongly emphasize the utility of the computational approach we took in this study towards discerning the specific roles of gap junctions. Gap junctional coupling has strong physiological roles in terms of enhancing synchronous activity across the neurons that it couples, and gap junctions are often expressed along with other receptors that connect the sets of neurons. Thus, the specific contributions of different neuronal components need to be studied with reference to how they contribute to physiological characteristics vs. their contributions to extracellular potentials. Computational modeling therefore offers an ideal route to understand the specific contributions of different neural-circuit components to extracellular field potentials and rhythms therein (Buzsaki et al., 2012; Einevoll et al., 2019; Einevoll et al., 2013; Sinha & Narayanan, 2022).”

      We thank you for highlighting this point as this allowed us to delineate the impact of gap junctions to regulating synchrony across connected neurons vs. modulating field potentials. Thank you.

      Reviewer #2 (Public review):

      This computational work examines whether the inputs that neurons receive through electrical synapses (gap junctions) have different signatures in the extracellular local field potential (LFP) compared to inputs via chemical synapses. The authors present the results of a series of model simulations where either electric or chemical synapses targeting a single hippocampal pyramidal neuron are activated in various spatio-temporal patterns, and the resulting LFP in the vicinity of the cell is calculated and analyzed. The authors find several notable qualitative differences between the LFP patterns evoked by gap junctions vs. chemical synapses. For some of these findings, the authors demonstrate convincingly that the observed differences are explained by the electric vs. chemical nature of the input, and these results likely generalize to other cell types. However, in other cases, it remains plausible (or even likely) that the differences are caused, at least partly, by other factors (such as different intracellular voltage responses due to, e.g., the unequal strengths of the inputs). Furthermore, it was not immediately clear to me how the results could be applied to analyze more realistic situations where neurons receive partially synchronized excitatory and inhibitory inputs via chemical and electric synapses.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      The main strength of the paper is that it draws attention to the fact that inputs to a neuron via gap junctions are expected to give rise to a different extracellular electric field compared to inputs via chemical synapses, even if the intracellular effects of the two types of input are similar. This is because, unlike chemical synaptic inputs, inputs via gap junctions are not directly associated with transmembrane currents. This is a general result that holds independent of many details such as the cell types or neurotransmitters involved.

      We gratefully thank you for the positive comments and the encouraging words about the novel contributions of our study. We are particularly thankful to you for your comment on the generality of our conclusions that hold for different cell types and neurotransmitters involved.
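
The point made in this comment, that gap junctional inputs are not directly associated with transmembrane currents, can be sketched with the standard point-source forward model, in which the extracellular potential is a distance-weighted sum of transmembrane currents only. The example below is a toy two-compartment illustration under stated assumptions and is not the authors' CA1 model: the conductivity, geometry, 1 nA input, and the 60/40 split of return currents are all hypothetical, as are the names `point_source_potential`, `phi_gap`, and `phi_chem`. It also assumes, as in the reviewer's idealized case, that the intracellular response and hence the return currents are identical for the two input types.

```python
import numpy as np

SIGMA = 0.3  # extracellular conductivity in S/m (assumed typical value)

def point_source_potential(currents, positions, electrode):
    """Potential (V) at `electrode` from point transmembrane currents (A, outward positive)."""
    currents = np.asarray(currents, dtype=float)
    positions = np.asarray(positions, dtype=float)
    distances = np.linalg.norm(positions - np.asarray(electrode, dtype=float), axis=1)
    return float(np.sum(currents / (4.0 * np.pi * SIGMA * distances)))

# Toy geometry in metres: a dendritic and a somatic compartment, electrode near the dendrite.
positions = [(0.0, 0.0, 0.0), (0.0, -100e-6, 0.0)]
electrode = (20e-6, 0.0, 0.0)

i_in = 1e-9                                   # input current (assumed, 1 nA)
returns = np.array([0.6, 0.4]) * i_in         # outward return currents (assumed split)

# Gap junction: the input arrives intracellularly, so only return currents cross the membrane.
phi_gap = point_source_potential(returns, positions, electrode)

# Chemical synapse: the same return currents plus an inward synaptic current at the dendrite.
chem = returns + np.array([-i_in, 0.0])
phi_chem = point_source_potential(chem, positions, electrode)

print(f"gap junction: {phi_gap * 1e6:+.2f} uV, chemical synapse: {phi_chem * 1e6:+.2f} uV")
```

In this idealized case the two potentials differ exactly by the contribution of the synaptic transmembrane current, and the electrode near the input site sees opposite polarities for the two input types. Real simulations differ because the intracellular responses, and therefore the return currents, need not be identical.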

      Another strength of the article is that the authors attempt to provide intuitive, non-technical explanations of most of their findings, which should make the paper readable also for non-expert audiences (including experimentalists).

      We sincerely thank you for the positive comments about the readability of the paper.

      Weaknesses

      The most problematic aspect of the paper relates to the methodology for comparing the effects of electric vs. chemical synaptic inputs on the LFP. The authors seem to suggest that the primary cause of all the differences seen in the various simulation experiments is the different nature of the input, and particularly the difference between the transmembrane current evoked by chemical synapses and the gap junctional current that does not involve the extracellular space. However, this is clearly an oversimplification: since no real attempt is made to quantitatively match the two conditions that are compared (e.g., regarding the strength and temporal profile of the inputs), the differences seen can be due to factors other than the electric vs. chemical nature of synapses. In fact, if inputs were identical in all parameters other than the transmembrane vs. directly injected nature of the current, the intracellular voltage responses and, consequently, the currents through voltage-gated and leak currents would also be the same, and the LFPs would differ exactly by the contribution of the transmembrane current evoked by the chemical synapse. This is evidently not the case for any of the simulated comparisons presented, and the differences in the membrane potential response are rather striking in several cases (e.g., in the case of random inputs, there is only one action potential with gap junctions, but multiple action potentials with chemical synapses). Consequently, it remains unclear which observed differences are fundamental in the sense that they are directly related to the electric vs. chemical nature of the input, and which differences can be attributed to other factors such as differences in the strength and pattern of the inputs (and the resulting difference in the neuronal electric response).

      We thank you for raising this important point. We would like to emphasize that our experimental design and analyses quantitatively account for the spatial distribution and temporal pattern of specific kinds of inputs that arrive through gap junctions and chemical synapses. We submit that our analyses quantitatively demonstrate that the fundamental difference between the gap junctional and chemical synaptic contributions to extracellular potentials is the absence of the direct transmembrane component from gap junctional inputs. We elucidate these points below:

      (1) Spatial distribution: The inputs were distributed randomly across the basal dendrites, irrespective of whether they were through gap junctions or chemical synapses. For both chemical synapses and gap junctions, the inputs were of the same nature: excitatory.

      (2) Different numbers of inputs: We have presented consistent results for both fewer and more gap junctions or chemical synapses in our analyses (see Figure 1 with 217 gap junctions or 245 chemical synapses and Supplementary Figure 2 with 99 gap junctions or 30 chemical synapses). Our fundamentally novel result that gap junctions onto active dendrites shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron.

      (3) Synchronous inputs (Figs. 1–3): For chemical synapses, the waveforms are in the shape of postsynaptic potentials. For gap junctional inputs, the waveforms are in the shape of postsynaptic potentials or dendritic spikes (to respect the active nature of inputs from the other cell). Here, the electrical response of the postsynaptic cell is identical irrespective of whether inputs arrive through gap junctions or chemical synapses: an action potential. We quantitatively matched the strengths such that the model generated a single action potential in response to synchronous inputs, irrespective of whether they arrived through chemical synaptic or gap junctional inputs. We mechanistically analyzed the contributions of different cellular components and show that the direct transmembrane current in chemical synapses is the distinguishing factor that determines the dichotomy between the contributions of gap junctions vs. chemical synapses to extracellular potentials (Figs. 2–3). In the revised manuscript, we have shown the intracellular responses to demonstrate that they are electrically matched (new Supplementary Figure 3).

      (4) Random inputs (Fig. 4): For random inputs, we did not account for the number of action potentials that arrived, as the only observation we made here was with reference to the biphasic nature of the extracellular potentials with gap junctional inputs in the “No Sodium” scenario. We note that in the “No Sodium” scenario, the time-domain amplitudes were comparable for the field potentials (Fig. 4B, Fig. 4D).

      (5) Rhythmic inputs (Figs. 5–8): For rhythmic inputs, please note that the intracellular and extracellular waveforms for every frequency are provided in supplementary figures S5–S11. It may be noted that the intracellular responses are comparable. In simulations for assessing spike-LFP comparison, we tuned the strengths to produce a single spike per cycle, ensuring fair comparison of LFPs with gap junctions vs. chemical synapses.

      Taken together, we demonstrate through explicit sets of simulations and analyses that the differences in LFPs were not driven by the strength or patterns of the inputs but rather by the differences in direct transmembrane currents, which are subsequently reflected in the LFPs. In the revised manuscript, we have emphasized these points in the Discussion section, apart from providing intracellular traces for cases where they were not provided before (new Supplementary Figure 3):

      Discussion subsection “Dominance of active dendritic currents with LFP associated with gap junctions”

      “Our analyses quantitatively demonstrate that the fundamental difference between the gap junctional and chemical synaptic contributions to extracellular potentials is the absence of the direct transmembrane component from gap junctional inputs. A multitude of factors suggests that the observed LFP differences result not from variations in input strength or patterns but rather from differences in direct transmembrane currents, which are subsequently reflected in the LFP signals.

      First, the inputs were distributed randomly across the basal dendrites, irrespective of whether they were through gap junctions or chemical synapses. For both chemical synapses and gap junctions, the inputs were exclusively excitatory in nature.

      Second, the results remained consistent regardless of the number of gap junctions or chemical synapses (Fig. 1 with 217 gap junctions or 245 chemical synapses and Supplementary Fig. 2 with 99 gap junctions or 30 chemical synapses). Our fundamentally novel result that gap junctions onto active dendrites shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron.

      Third, for synchronous chemical synaptic inputs, the waveforms resembled typical postsynaptic potentials, whereas for gap junctional inputs, the waveforms showed characteristics of postsynaptic potentials or dendritic spikes (accounting for the active nature of inputs from the potential presynaptic cells). The electrical response of the postsynaptic cell remains identical, producing an action potential regardless of whether inputs arrive via gap junctions or chemical synapses. We quantitatively matched the strengths such that the model generated a single action potential in response to synchronous inputs, irrespective of whether they arrived through chemical synaptic or gap junctional inputs. We mechanistically analyzed the contributions of different cellular components and show that the direct transmembrane current in chemical synapses is the distinguishing factor that determines the dichotomy between the contributions of gap junctions vs. chemical synapses to extracellular potentials (Figs. 2–3).

      Fourth, for random inputs, the models were not specifically tuned to generate a single action potential. Here, the inputs served as a proxy for asynchronous inputs arriving from other subregions at random times.

      Finally, the intracellular responses were comparable for chemical synaptic and gap junctional rhythmic inputs (Supplementary Figs. S5–S11). Here, the model was tuned to elicit a single spike per cycle in simulations evaluating spike-LFP interactions, ensuring a fair comparison between LFPs from gap junctional and chemical synaptic inputs.”

      We have added a new Supplementary Figure 3 to the revised manuscript and have referred to this figure in the Results subsection. We thank you for raising these points, as it allowed us to elaborate on the several novelties and implications of our methodology and conclusions. Thank you.

      Some of the explanations offered for the effects of cellular manipulations on the LFP appear to be incomplete. More specifically, the authors observed that blocking leak channels significantly changed the shape of the LFP response to synchronous synaptic inputs - but only when electric inputs were used, and when sodium channels were intact. The authors seemed to attribute this phenomenon to a direct effect of leak currents on the extracellular potential - however, this appears unlikely both because it does not explain why blocking the leak conductance had no effect in the other cases, and because the leak current is several orders of magnitude smaller than the spike-generating currents that make the largest contributions to the LFP. An indirect effect mediated by interactions of the leak current with some voltage-gated currents appears to be the most likely explanation, but identifying the exact mechanism would require further simulation experiments and/or a detailed analysis of intracellular currents and the membrane potential in time and space.

      We thank you for raising this important question. Leak channels were among the several contributors to the positive deflection observed in LFPs associated with gap junctions. This effect was present not only in gap junctional models with intact sodium conductance but also in the no-sodium model, where the amplitude of the positive deflection was reduced across other models as well (Fig. 2F, I). Furthermore, even in the absence of leak conductance, a small positive deflection was still observed (Fig. 2F), leading us to further investigate other transmembrane currents over time and across spatial locations, from the proximal to the distal dendritic ends relative to the soma (Fig. 3D). We had observed that the dominant contributor in the case of chemical synapses was the inward synaptic current (Fig. 3A), whereas for gap junctions, the primary contributors were leak conductance along with other outward currents, such as potassium and HCN currents (Fig. 3D). Together, the direct transmembrane component of chemical synapses provides a dominant contribution to extracellular potentials. This dominance translates to differences in the relative contributions of indirect currents (including leak currents) to extracellular potentials associated with chemical synaptic vs. gap junctional inputs. Our analyses of the exact ionic mechanisms (Fig. 3) demonstrate the involvement of several ion channels contributing to the indirect component in either scenario.

      In every simulation experiment in this study, inputs through electric synapses are modeled as intracellular current injections of pre-determined amplitude and time course based on the sampled dendritic voltage of potential synaptic partners. This is a major simplification that may have a significant impact on the results. First, the current through gap junctions depends on the voltage difference between the two connected cellular compartments and is thus sensitive to the membrane potential of the cell that is treated as the neuron "receiving" the input in this study (although, strictly speaking, there is no pre- or postsynaptic neuron in interactions mediated by gap junctions). This dependence on the membrane potential of the target neuron is completely missing here. A related second point is that gap junctions also change the apparent membrane resistance of the neurons they connect, effectively acting as additional shunting (or leak) conductance in the relevant compartments. This effect is completely missed by treating gap junctions as pure current sources.

      We thank you for raising this important point. We agree with the analyses presented by the reviewer on the importance of network simulations and bidirectional gap junctions that respect the voltages in both neurons. However, the complexities of LFP modeling preclude modeling of networks of morphologically realistic models with patterns of stimulations occurring across the dendritic tree. LFP modeling studies predominantly use “post-synaptic” currents to analyze the impact of different patterns of inputs arriving onto a neuron, even when chemical synaptic inputs are considered. Explicitly, individual neurons are separately simulated with different patterns of synaptic inputs, the transmembrane current at different locations is recorded, and the extracellular potential is then computed using line source approximation (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). Even in scenarios where a network is analyzed, a hybrid approach involving the outputs of a point-neuron-based network being coupled to an independent morphologically realistic neuronal model is employed (Hagen et al., 2016; Martinez-Canada et al., 2021; Mazzoni et al., 2015). Given the complexities associated with the computation of electrode potentials arising as a distance-weighted summation of several transmembrane currents, these simplifications become essential.
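
      The "distance-weighted summation of several transmembrane currents" can be illustrated with a minimal sketch. Note that this sketch uses the simpler point-source approximation, not the line source approximation used in the cited studies, and that the function name, the conductivity value, and the two-compartment toy geometry are illustrative assumptions rather than parameters of the model described here.

```python
import numpy as np

def point_source_potential(currents, positions, electrode, sigma=0.3):
    """Extracellular potential at one electrode (point-source approximation).

    currents  : (n_comp, n_t) transmembrane currents in nA, outward positive
    positions : (n_comp, 3) compartment midpoints in micrometres
    electrode : (3,) electrode location in micrometres
    sigma     : extracellular conductivity in S/m (assumed value)
    Returns an (n_t,) array of potentials in mV.
    """
    currents = np.asarray(currents, dtype=float)
    positions = np.asarray(positions, dtype=float)
    electrode = np.asarray(electrode, dtype=float)
    # Distance from each compartment to the electrode, converted to metres.
    dist_m = np.linalg.norm(positions - electrode, axis=1) * 1e-6
    # Weight of each compartment: 1 / (4 * pi * sigma * r), in volts per ampere.
    weights = 1.0 / (4.0 * np.pi * sigma * dist_m)
    # nA -> A, weighted sum over compartments, then V -> mV.
    return (weights @ (currents * 1e-9)) * 1e3

# Toy example: an inward (negative) current at the soma balanced by an
# outward return current on a dendrite 200 um away, sampled at 3 time points.
toy_currents = np.array([[-1.0, -1.0, -1.0],
                         [1.0, 1.0, 1.0]])
toy_positions = np.array([[0.0, 0.0, 0.0],
                          [0.0, 200.0, 0.0]])
phi = point_source_potential(toy_currents, toy_positions, [50.0, 0.0, 0.0])
print(phi)  # negative near the soma, where the inward current dominates
```

      In this sign convention, an inward current close to the electrode yields a negative deflection and an outward current yields a positive one, which is the relationship invoked throughout this response.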

      Our approach models gap junctional currents in a manner similar to how other models incorporate synaptic currents in LFP modeling (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). As gap junctions are typically implemented as resistors from the other neuronal compartment, we accounted for gap-junctional variability in our model by randomizing the scaling-factors and the exact waveforms that arrive through individual gap junctions at specific locations. Thus, the inputs were not pre-determined by “pre” neurons. Instead, the recorded voltages from potential synaptic partner neurons were randomized across locations and scaled using factors at the dendrites before being injected into the target neuron (Supplementary Fig. S1). While incorporating a network of interconnected neurons is indeed important, we utilized a biophysical, morphologically realistic CA1 neuron model with different sets of input patterns to model LFPs, which were derived from the total transmembrane currents across all compartments of the multi-compartmental neuron model. Given the complexity of this approach, adding further network-level interactions or pre-post connections would have been computationally demanding.
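
      The difference between a bidirectional, resistor-like gap junction and a pre-computed current injection can be sketched as follows. This is a hedged illustration only: both function names, the subtraction of an assumed resting potential, and all numerical values are hypothetical and are not taken from the implementation described above.

```python
import numpy as np

def ohmic_gap_current(v_partner, v_target, g_gap):
    """Resistor-like gap junction: I = g * (V_partner - V_target).

    Voltages in mV and conductance in uS give current in nA. The current
    depends on the target cell's own voltage and can reverse sign.
    """
    return g_gap * (np.asarray(v_partner, dtype=float) - np.asarray(v_target, dtype=float))

def injected_gap_current(v_partner, scale, v_rest=-65.0):
    """Pre-computed injection (hypothetical form): a recorded partner-cell
    voltage waveform, measured from an assumed resting potential and
    multiplied by a scaling factor. It ignores the target cell's voltage.
    """
    return scale * (np.asarray(v_partner, dtype=float) - v_rest)

# A partner-cell waveform that depolarizes from -65 mV to -40 mV and back.
v_partner = np.array([-65.0, -50.0, -40.0, -50.0, -65.0])

# While the target sits at rest, the two descriptions coincide...
print(ohmic_gap_current(v_partner, -65.0, g_gap=0.001))
print(injected_gap_current(v_partner, scale=0.001))

# ...but once the target depolarizes beyond the partner (e.g. during a
# spike), only the resistor-like current reverses direction.
print(ohmic_gap_current(v_partner, -20.0, g_gap=0.001))
```

      The sketch makes the scope of the simplification explicit: the two descriptions agree while the target cell remains near rest and diverge when the target voltage changes substantially.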

      In the revised manuscript, we have elaborated on the general methodology used in LFP modeling studies to introduce synaptic currents. We have emphasized that our study extends this approach to modeling gap junctional inputs, while also highlighting randomization of locations and the scaling process in assigning gap junctional synaptic strengths. The following paragraphs were specifically added to the revised version of the manuscript:

      Methods subsection “Chemical synaptic and gap junctional inputs: Characteristics and temporal dynamics”:

      “The complexities of LFP modeling preclude modeling of networks of morphologically realistic models with patterns of stimulations occurring across the dendritic tree. LFP modeling studies predominantly use post-synaptic currents to analyze the impact of different patterns of inputs arriving onto a neuron, even when chemical synaptic inputs are considered. Explicitly, individual neurons are separately simulated with different patterns of synaptic inputs, the transmembrane current at different locations is recorded, and the extracellular potential is then computed using line source approximation (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). Even in scenarios where a network is analyzed, a hybrid approach involving the outputs of a point-neuron-based network being coupled to an independent morphologically realistic neuronal model is employed (Hagen et al., 2016; Martinez-Canada et al., 2021; Mazzoni et al., 2015). Given the complexities associated with the computation of electrode potentials arising as a distance-weighted summation of several transmembrane currents, these simplifications become essential.”

      “Our approach models gap junctional currents in a manner similar to how other models incorporate synaptic currents in LFP modeling (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). As gap junctions are typically implemented as resistors from the other neuronal compartment, we accounted for gap-junctional variability in our model by randomizing the scaling-factors and the exact waveforms that arrive through individual gap junctions at specific locations from potential presynaptic sources.”

      We thank you for highlighting these points, as it allowed us to place our methodology in the specific context of the literature. Thank you.

      One prominent claim of the article that is emphasized even in the abstract is that HCN channels mediate an outward current in certain cases. Although this statement is technically correct, there are two reasons why I do not consider this a major finding of the paper. First, as the authors acknowledge, this is a trivial consequence of the relatively slow kinetics of HCN channels: when at least some of the channels are open, any input that is sufficiently fast and strong to take the membrane potential across the reversal potential of the channel will lead to the reversal of the polarity of the current. This effect is quite generic and well-known and is by no means specific to gap junctional inputs or even HCN channels. Second, and perhaps more importantly, the functional consequence of this reversed current through HCN channels is likely to be negligible. As clearly shown in Supplementary Figure S3, the HCN current becomes outward only for an extremely short time period during the action potential, which is also a period when several other currents are also active and likely dominant due to their much higher conductances. I also note that several of these relevant facts remain hidden in Figure 3, both because of its focus on peak values, and because of the radically different units on the vertical axes of the current plots.

      We thank you for raising this point and agree with you on every point. Please note that we do not assert that the outward HCN currents are exclusively associated with gap junctional inputs. Rather, our results show that synchronous inputs generate outward HCN currents in both chemical synapses (Fig. 3B; positive/outward HCN currents, except in the no sodium or leak model) and gap junctions (Fig. 3D; positive/outward HCN currents). We emphasized this in the case of gap junctions because, in the absence of inward synaptic currents, HCN (acting as outward currents with synchronous inputs) contributed to the positive deflection observed in the LFPs. While HCN would also contribute in the case of chemical synapses, its effect was negligible due to the presence of large inward synaptic currents. Since LFPs reflect the collective total transmembrane currents, the dominant contributors differ between these two scenarios, which we aimed to highlight. Since HCN exhibited outward currents in our synchronous input simulations, we have elaborated on this mechanism in the supplementary figure (Fig. S3). Our intention was not to emphasize this effect for only one synaptic mode but rather to highlight HCN's contribution to the positive deflection as one of the contributing factors.

      We agree that HCN currents are relatively small in magnitude; therefore, our conclusions were based on HCN being one of the several contributing factors. Leak conductance and other outward conductances, including HCN currents (Fig. 3D), collectively contribute to the positive deflections observed in the case of gap junctional synchronous inputs.
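
      The mechanism behind the transient outward HCN current can be sketched with a minimal single-gate model. This is an illustrative toy, not the channel model used in the study: the reversal potential, the fixed gating time constant, the activation curve, and the Gaussian spike-like voltage trace are all assumed values chosen only to show the sign reversal.

```python
import numpy as np

E_H = -30.0   # mV, HCN reversal potential (assumed typical value)
TAU_H = 50.0  # ms, gating time constant (assumed constant and slow)

def m_inf(v, v_half=-82.0, k=8.0):
    """Steady-state HCN activation; the gate opens with hyperpolarization."""
    return 1.0 / (1.0 + np.exp((v - v_half) / k))

def hcn_current(v_trace, dt, g_max=1.0):
    """Return I_h = g_max * m * (V - E_H) along a voltage trace.

    The gate m is integrated with forward Euler. Positive values are
    outward currents; units are arbitrary because g_max is arbitrary.
    """
    v_trace = np.asarray(v_trace, dtype=float)
    m = m_inf(v_trace[0])          # start from steady state at the first voltage
    i_h = np.empty_like(v_trace)
    for n, v in enumerate(v_trace):
        i_h[n] = g_max * m * (v - E_H)
        m += dt * (m_inf(v) - m) / TAU_H
    return i_h

# Spike-like excursion from -65 mV to +20 mV, about 1 ms wide, at t = 10 ms.
dt = 0.025
t = np.arange(0.0, 20.0, dt)
v = -65.0 + 85.0 * np.exp(-((t - 10.0) / 0.5) ** 2)
i_h = hcn_current(v, dt)

print("I_h at rest:", i_h[0])                 # negative, i.e. inward
print("peak I_h:", i_h.max())                 # positive, i.e. outward
print("fraction of time outward:", np.mean(i_h > 0))
```

      Because the gate barely changes during the brief excursion, the current simply follows the sign of (V - E_H): it is inward at rest and outward only while the membrane potential exceeds the reversal potential, which is consistent with the short outward window noted by the reviewer.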

      In the revised manuscript, we have provided the following clarifications in the Results subsection on “Synchronous inputs: Outward transmembrane currents from active dendrites contribute to positive deflection in extracellular potentials associated with gap junctional inputs”:

      “It is important to note that despite their relatively small magnitude, the outward HCN currents (Fig. 3D) substantially contribute to positive extracellular potential deflections associated with gap junctional inputs (Fig. 2), together with leak and other outward conductances.”

      “While outward HCN currents (Fig. 3B) are also expected to influence LFPs under chemical synaptic input, their impact was minimal due to the predominance of large inward synaptic currents (Fig. 3A). As LFPs reflect the summation of all transmembrane currents, the dominant contributors vary across different modes of synaptic transmission.”

      We thank you for emphasizing this point. This allowed us to expand on the specific roles of HCN channels and potential contributions of the outward nature of the HCN current. We have also expanded the Discussion subsection on “Outward HCN currents regulate extracellular potentials” to elaborate on this aspect as well. Thank you.

      Finally, I missed an appropriate validation of the neuronal model used, and also the characterization of the effects of the in silico manipulations used on the basic behavior of the model. As far as I understand, the model in its current form has not been used in other studies. If this is the case, it would be important to demonstrate convincingly through (preferably quantitative) comparisons with experimental data using different protocols that the model captures the physiological behavior of at least the relevant compartments (in this case, the dendrites and the soma) of hippocampal pyramidal neurons sufficiently well that the results of the modeling study are relevant to the real biological system. In addition, the correct interpretation of various manipulations of the model would be strongly facilitated by investigating and discussing how the physiological properties of the model neuron are affected by these alterations.

      We thank you for raising this important point. The CA1 pyramidal neuronal model used in this study is built with ion-channel models derived from biophysical and electrophysiological recordings from these cells. As mentioned in the Methods section “Dynamics and distribution of active channels” and Supplementary Table S1, models for individual channels, their gating kinetics, and channel distributions across the somatodendritic arbor (wherever known) are all derived from their physiological equivalents. Importantly, these values were derived from previously validated models from the laboratory, which contain these very ion channel models and the exact same morphology (Roy & Narayanan, 2021). Please compare Supplementary Table S1 with Table 1 from (Roy & Narayanan, 2021). Please note that this model was validated against several physiological measurements along the somatodendritic axis (Fig. 1 of (Roy & Narayanan, 2021)).

      In the revised manuscript, we have explicitly mentioned this while also mentioning the different physiological properties that were used for the validation process from (Roy & Narayanan, 2021):

      Methods subsection “Pyramidal neuron model”

      “All parameters and their corresponding values for the neuronal model were derived from previously validated models (Roy & Narayanan, 2021). These CA1 models were validated against several physiological measurements along the somato-dendritic axis (Roy & Narayanan, 2021).”

      “These channel distributions and the associated parametric values (Supplementary Table S1) were demonstrated to satisfy 22 different somato-dendritic measurements (Roy & Narayanan, 2021). Specifically, six physiological measurements (input resistance, maximal impedance amplitude, resonance frequency, resonance strength, total inductive phase, and back-propagating action potential) were validated against respective electrophysiological ranges at three somato-dendritic locations (soma, ~150 µm dendrite, and ~300 µm dendrite) each (6×3=18 measurements). In addition, action potential firing frequency at each of 100 pA, 150 pA, 200 pA, and 250 pA (4 measurements) was also matched in the model to fall within the respective ranges of corresponding electrophysiological measurements. The electrophysiological ranges of intrinsic measurements were derived from respective somato-dendritic recordings (Malik et al., 2016; Narayanan et al., 2010; Narayanan & Johnston, 2007, 2008; Spruston et al., 1995). Together, the CA1 pyramidal model neuron used in this study was tuned to match several electrophysiological characteristics and ion-channel distributions (Roy & Narayanan, 2021).”

      We thank you for pointing us to this slip in elaborating on how the model was validated. We have now rectified this. Thank you.

      References

      Andersen, P., Morris, R., Amaral, D., Bliss, T., & O'Keefe, J. (2006). The hippocampus book. Oxford University Press.

      Basak, R., & Narayanan, R. (2018). Spatially dispersed synapses yield sharply-tuned place cell responses through dendritic spike initiation. Journal of Physiology, 596(17), 4173-4205. https://doi.org/10.1113/JP275310

      Bedner, P., Steinhauser, C., & Theis, M. (2012). Functional redundancy and compensation among members of gap junction protein families? Biochim Biophys Acta, 1818(8), 1971-1984. https://doi.org/10.1016/j.bbamem.2011.10.016

      Behrens, C. J., Ul Haq, R., Liotta, A., Anderson, M. L., & Heinemann, U. (2011). Nonspecific effects of the gap junction blocker mefloquine on fast hippocampal network oscillations in the adult rat in vitro. Neuroscience, 192, 11-19. https://doi.org/10.1016/j.neuroscience.2011.07.015

      Bocian, R., Posluszny, A., Kowalczyk, T., Golebiewski, H., & Konopacki, J. (2009). The effect of carbenoxolone on hippocampal formation theta rhythm in rats: in vitro and in vivo approaches. Brain Res Bull, 78(6), 290-298. https://doi.org/10.1016/j.brainresbull.2008.10.005

      Buhl, D. L., Harris, K. D., Hormuzdi, S. G., Monyer, H., & Buzsaki, G. (2003). Selective impairment of hippocampal gamma oscillations in connexin-36 knock-out mouse in vivo. J Neurosci, 23(3), 1013-1018. https://doi.org/10.1523/jneurosci.23-03-01013.2003

      Buzsaki, G., Anastassiou, C. A., & Koch, C. (2012). The origin of extracellular fields and currents--EEG, ECoG, LFP and spikes. Nat Rev Neurosci, 13(6), 407-420. https://doi.org/10.1038/nrn3241

      Buzsaki, G., & Wang, X. J. (2012). Mechanisms of gamma oscillations. Annual Review of Neuroscience, 35, 203-225. https://doi.org/10.1146/annurev-neuro-062111-150444

      Colgin, L. L. (2016). Rhythms of the hippocampal network. Nat Rev Neurosci, 17(4), 239-249. https://doi.org/10.1038/nrn.2016.21

      Colgin, L. L., & Moser, E. I. (2010). Gamma oscillations in the hippocampus. Physiology (Bethesda), 25(5), 319-329. https://doi.org/10.1152/physiol.00021.2010

      Coulon, P., & Landisman, C. E. (2017). The Potential Role of Gap Junctional Plasticity in the Regulation of State. Neuron, 93(6), 1275-1295. https://doi.org/10.1016/j.neuron.2017.02.041

      Das, A., Rathour, R. K., & Narayanan, R. (2017). Strings on a Violin: Location Dependence of Frequency Tuning in Active Dendrites. Front Cell Neurosci, 11, 72. https://doi.org/10.3389/fncel.2017.00072

      Draguhn, A., Traub, R. D., Schmitz, D., & Jefferys, J. G. (1998). Electrical coupling underlies high-frequency oscillations in the hippocampus in vitro. Nature, 394(6689), 189-192. https://doi.org/10.1038/28184

      Einevoll, G. T., Destexhe, A., Diesmann, M., Grun, S., Jirsa, V., de Kamps, M., Migliore, M., Ness, T. V., Plesser, H. E., & Schurmann, F. (2019). The Scientific Case for Brain Simulations. Neuron, 102(4), 735-744. https://doi.org/10.1016/j.neuron.2019.03.027

      Einevoll, G. T., Kayser, C., Logothetis, N. K., & Panzeri, S. (2013). Modelling and analysis of local field potentials for studying the function of cortical circuits. Nat Rev Neurosci, 14(11), 770-785. https://doi.org/10.1038/nrn3599

      Gold, C., Henze, D. A., Koch, C., & Buzsaki, G. (2006). On the origin of the extracellular action potential waveform: A modeling study. J Neurophysiol, 95(5), 3113-3128. https://doi.org/10.1152/jn.00979.2005

      Hagen, E., Dahmen, D., Stavrinou, M. L., Linden, H., Tetzlaff, T., van Albada, S. J., Grun, S., Diesmann, M., & Einevoll, G. T. (2016). Hybrid Scheme for Modeling Local Field Potentials from Point-Neuron Networks. Cereb Cortex, 26(12), 4461-4496. https://doi.org/10.1093/cercor/bhw237

      Halnes, G., Ness, T. V., Næss, S., Hagen, E., Pettersen, K. H., & Einevoll, G. T. (2024). Electric Brain Signals: Foundations and Applications of Biophysical Modeling. Cambridge University Press. https://doi.org/10.1017/9781009039826

      Hormuzdi, S. G., Pais, I., LeBeau, F. E., Towers, S. K., Rozov, A., Buhl, E. H., Whittington, M. A., & Monyer, H. (2001). Impaired electrical signaling disrupts gamma frequency oscillations in connexin 36-deficient mice. Neuron, 31(3), 487-495. https://doi.org/10.1016/s0896-6273(01)00387-7

      Hussaini, S. A., Kempadoo, K. A., Thuault, S. J., Siegelbaum, S. A., & Kandel, E. R. (2011). Increased size and stability of CA1 and CA3 place fields in HCN1 knockout mice. Neuron, 72(4), 643-653. https://doi.org/10.1016/j.neuron.2011.09.007

      Johnston, D., & Narayanan, R. (2008). Active dendrites: colorful wings of the mysterious butterflies. Trends Neurosci, 31(6), 309-316. https://doi.org/10.1016/j.tins.2008.03.004

      Kessi, M., Peng, J., Duan, H., He, H., Chen, B., Xiong, J., Wang, Y., Yang, L., Wang, G., Kiprotich, K., Bamgbade, O. A., He, F., & Yin, F. (2022). The Contribution of HCN Channelopathies in Different Epileptic Syndromes, Mechanisms, Modulators, and Potential Treatment Targets: A Systematic Review. Front Mol Neurosci, 15, 807202. https://doi.org/10.3389/fnmol.2022.807202

      Kole, M. H., Hallermann, S., & Stuart, G. J. (2006). Single Ih channels in pyramidal neuron dendrites: properties, distribution, and impact on action potential output. J Neurosci, 26(6), 1677-1687. https://doi.org/10.1523/JNEUROSCI.3664-05.2006

      Konopacki, J., Kowalczyk, T., & Golebiewski, H. (2004). Electrical coupling underlies theta oscillations recorded in hippocampal formation slices. Brain Res, 1019(1-2), 270-274. https://doi.org/10.1016/j.brainres.2004.05.083

      Larkum, M. E., Wu, J., Duverdin, S. A., & Gidon, A. (2022). The Guide to Dendritic Spikes of the Mammalian Cortex In Vitro and In Vivo. Neuroscience, 489, 15-33. https://doi.org/10.1016/j.neuroscience.2022.02.009

      LeBeau, F. E., Traub, R. D., Monyer, H., Whittington, M. A., & Buhl, E. H. (2003). The role of electrical signaling via gap junctions in the generation of fast network oscillations. Brain Res Bull, 62(1), 3-13. https://doi.org/10.1016/j.brainresbull.2003.07.004

      Lo, C. W. (1999). Genes, gene knockouts, and mutations in the analysis of gap junctions. Dev Genet, 24(1-2), 1-4. https://doi.org/10.1002/(SICI)1520-6408(1999)24:1/2%3C1::AID-DVG1%3E3.0.CO;2-U

      Lorincz, A., Notomi, T., Tamas, G., Shigemoto, R., & Nusser, Z. (2002). Polarized and compartment-dependent distribution of HCN1 in pyramidal cell dendrites. Nat Neurosci, 5(11), 1185-1193. https://doi.org/10.1038/nn962

      Magee, J. C. (1998). Dendritic hyperpolarization-activated currents modify the integrative properties of hippocampal CA1 pyramidal neurons. J Neurosci, 18(19), 7613-7624. https://doi.org/10.1523/jneurosci.18-19-07613.1998

      Magee, J. C., & Grienberger, C. (2020). Synaptic Plasticity Forms and Functions. Annual Review of Neuroscience, 43, 95-117. https://doi.org/10.1146/annurev-neuro-090919-022842

      Major, G., Larkum, M. E., & Schiller, J. (2013). Active properties of neocortical pyramidal neuron dendrites. Annual Review of Neuroscience, 36, 1-24. https://doi.org/10.1146/annurev-neuro-062111-150343

      Malik, R., Dougherty, K. A., Parikh, K., Byrne, C., & Johnston, D. (2016). Mapping the electrophysiological and morphological properties of CA1 pyramidal neurons along the longitudinal hippocampal axis. Hippocampus, 26(3), 341-361. https://doi.org/10.1002/hipo.22526

      Martinez-Canada, P., Ness, T. V., Einevoll, G. T., Fellin, T., & Panzeri, S. (2021). Computation of the electroencephalogram (EEG) from network models of point neurons. PLoS Comput Biol, 17(4), e1008893. https://doi.org/10.1371/journal.pcbi.1008893

      Mazzoni, A., Linden, H., Cuntz, H., Lansner, A., Panzeri, S., & Einevoll, G. T. (2015). Computing the Local Field Potential (LFP) from Integrate-and-Fire Network Models. PLoS Comput Biol, 11(12), e1004584. https://doi.org/10.1371/journal.pcbi.1004584

      Mishra, P., & Narayanan, R. (2021). Stable continual learning through structured multiscale plasticity manifolds. Curr Opin Neurobiol, 70, 51-63. https://doi.org/10.1016/j.conb.2021.07.009

      Mishra, P., & Narayanan, R. (2025). The enigmatic HCN channels: A cellular neurophysiology perspective. Proteins, 93(1), 72-92. https://doi.org/10.1002/prot.26643

      Moore, J. J., Ravassard, P. M., Ho, D., Acharya, L., Kees, A. L., Vuong, C., & Mehta, M. R. (2017). Dynamics of cortical dendritic membrane potential and spikes in freely behaving rats. Science, 355(6331). https://doi.org/10.1126/science.aaj1497

      Narayanan, R., Dougherty, K. J., & Johnston, D. (2010). Calcium Store Depletion Induces Persistent Perisomatic Increases in the Functional Density of h Channels in Hippocampal Pyramidal Neurons. Neuron, 68(5), 921-935. https://doi.org/10.1016/j.neuron.2010.11.033

      Narayanan, R., & Johnston, D. (2007). Long-term potentiation in rat hippocampal neurons is accompanied by spatially widespread changes in intrinsic oscillatory dynamics and excitability. Neuron, 56(6), 1061-1075. https://doi.org/10.1016/j.neuron.2007.10.033

      Narayanan, R., & Johnston, D. (2008). The h channel mediates location dependence and plasticity of intrinsic phase response in rat hippocampal neurons. J Neurosci, 28(22), 5846-5860. https://doi.org/10.1523/JNEUROSCI.0835-08.2008

      Ness, T. V., Remme, M. W. H., & Einevoll, G. T. (2016). Active subthreshold dendritic conductances shape the local field potential. Journal of Physiology, 594(13), 3809-3825. https://doi.org/10.1113/JP272022

      Ness, T. V., Remme, M. W. H., & Einevoll, G. T. (2018). h-Type Membrane Current Shapes the Local Field Potential from Populations of Pyramidal Neurons. J Neurosci, 38(26), 6011-6024. https://doi.org/10.1523/jneurosci.3278-17.2018

      Neves, G., Cooke, S. F., & Bliss, T. V. (2008). Synaptic plasticity, memory and the hippocampus: a neural network approach to causality. Nat Rev Neurosci, 9(1), 65-75. https://doi.org/10.1038/nrn2303

      Nolan, M. F., Malleret, G., Dudman, J. T., Buhl, D. L., Santoro, B., Gibbs, E., Vronskaya, S., Buzsaki, G., Siegelbaum, S. A., Kandel, E. R., & Morozov, A. (2004). A behavioral role for dendritic integration: HCN1 channels constrain spatial memory and plasticity at inputs to distal dendrites of CA1 pyramidal neurons. Cell, 119(5), 719-732. https://doi.org/10.1016/j.cell.2004.11.020

      O'Brien, J. (2014). The ever-changing electrical synapse. Curr Opin Neurobiol, 29, 64-72. https://doi.org/10.1016/j.conb.2014.05.011

      O'Keefe, J., & Recce, M. L. (1993). Phase relationship between hippocampal place units and the EEG theta rhythm. Hippocampus, 3(3), 317-330. https://doi.org/10.1002/hipo.450030307

      Pereda, A. E. (2014). Electrical synapses and their functional interactions with chemical synapses. Nat Rev Neurosci, 15(4), 250-263. https://doi.org/10.1038/nrn3708

      Posluszny, A. (2014). The contribution of electrical synapses to field potential oscillations in the hippocampal formation. Front Neural Circuits, 8, 32. https://doi.org/10.3389/fncir.2014.00032

      Reimann, M. W., Anastassiou, C. A., Perin, R., Hill, S. L., Markram, H., & Koch, C. (2013). A biophysically detailed model of neocortical local field potentials predicts the critical role of active membrane currents. Neuron, 79(2), 375-390. https://doi.org/10.1016/j.neuron.2013.05.023

      Rouach, N., Segal, M., Koulakoff, A., Giaume, C., & Avignone, E. (2003). Carbenoxolone blockade of neuronal network activity in culture is not mediated by an action on gap junctions. Journal of Physiology, 553(Pt 3), 729-745. https://doi.org/10.1113/jphysiol.2003.053439

      Roy, A., & Narayanan, R. (2021). Spatial information transfer in hippocampal place cells depends on trial-to-trial variability, symmetry of place-field firing, and biophysical heterogeneities. Neural Netw, 142, 636-660. https://doi.org/10.1016/j.neunet.2021.07.026

      Schomburg, E. W., Anastassiou, C. A., Buzsaki, G., & Koch, C. (2012). The spiking component of oscillatory extracellular potentials in the rat hippocampus. J Neurosci, 32(34), 11798-11811. https://doi.org/10.1523/JNEUROSCI.0656-12.2012

      Seenivasan, P., & Narayanan, R. (2020). Efficient phase coding in hippocampal place cells. Physical Review Research, 2(3), 033393. https://doi.org/10.1103/PhysRevResearch.2.033393

      Sinha, M., & Narayanan, R. (2015). HCN channels enhance spike phase coherence and regulate the phase of spikes and LFPs in the theta-frequency range. Proc Natl Acad Sci U S A, 112(17), E2207-2216. https://doi.org/10.1073/pnas.1419017112

      Sinha, M., & Narayanan, R. (2022). Active Dendrites and Local Field Potentials: Biophysical Mechanisms and Computational Explorations. Neuroscience, 489, 111-142. https://doi.org/10.1016/j.neuroscience.2021.08.035

      Sirmaur, R., & Narayanan, R. (2024). Distinct extracellular signatures of chemical and electrical synapses impinging on active dendrites differentially contribute to ripple-frequency oscillations. Society for Neuroscience annual meeting, Chicago, USA.

      Spruston, N., Schiller, Y., Stuart, G., & Sakmann, B. (1995). Activity-dependent action potential invasion and calcium influx into hippocampal CA1 dendrites. Science, 268(5208), 297-300. https://doi.org/10.1126/science.7716524

      Stuart, G. J., & Spruston, N. (2015). Dendritic integration: 60 years of progress. Nat Neurosci, 18(12), 1713-1721. https://doi.org/10.1038/nn.4157

      Szarka, G., Balogh, M., Tengolics, A. J., Ganczer, A., Volgyi, B., & Kovacs-Oller, T. (2021). The role of gap junctions in cell death and neuromodulation in the retina. Neural Regen Res, 16(10), 1911-1920. https://doi.org/10.4103/1673-5374.308069

      Traub, R. D., Cunningham, M. O., Gloveli, T., LeBeau, F. E., Bibbig, A., Buhl, E. H., & Whittington, M. A. (2003). GABA-enhanced collective behavior in neuronal axons underlies persistent gamma-frequency oscillations. Proc Natl Acad Sci U S A, 100(19), 11047-11052. https://doi.org/10.1073/pnas.1934854100

      Vaughn, M. J., & Haas, J. S. (2022). On the Diverse Functions of Electrical Synapses. Front Cell Neurosci, 16, 910015. https://doi.org/10.3389/fncel.2022.910015

      Wang, X. J. (2010). Neurophysiological and computational principles of cortical rhythms in cognition. Physiol Rev, 90(3), 1195-1268. https://doi.org/10.1152/physrev.00035.2008

      Wang, X. J., & Buzsaki, G. (1996). Gamma oscillation by synaptic inhibition in a hippocampal interneuronal network model. J Neurosci, 16(20), 6402-6413. https://doi.org/10.1523/jneurosci.16-20-06402.1996

      Whittington, M. A., Traub, R. D., & Jefferys, J. G. (1995). Synchronized oscillations in interneuron networks driven by metabotropic glutamate receptor activation. Nature, 373(6515), 612-615. https://doi.org/10.1038/373612a0

      Williams, S. R., & Stuart, G. J. (2000). Site independence of EPSP time course is mediated by dendritic I(h) in neocortical pyramidal neurons. J Neurophysiol, 83(5), 3177-3182. https://doi.org/10.1152/jn.2000.83.5.3177

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors Hall et al. establish a purification method for snake venom metalloproteinases (SVMPs). By generating a generic approach to purify this divergent class of recombinant proteins, they enhance the field's accessibility to larger quantities of SVMPs with confirmed activity and, for some, characterized kinetics. In some cases, the recombinant protein displayed comparable substrate specificity and substrate recognition compared to the native enzyme, providing convincing evidence of the authors' successful recombinant expression strategy. Beyond describing their route towards protein purification, they further provide evidence for self-activation upon Zn2+ incubation. They further provide insights on how to design high-throughput screening (HTS) methods for drug discovery and outline future perspectives for the in-depth characterization of these enzyme classes to enable the development of novel biomedical applications.

      Strengths:

      The study is well presented and structured in a compelling way. The purification strategy results in highly pure protein products, well characterized by size exclusion chromatography and SDS-PAGE, and confirmed by mass spectrometry analysis. Further, a significant portion of the manuscript focuses on enzyme activity, thereby validating function. Particularly convincing is the comparability between recombinant and native enzymes; this is successfully exemplified by insulin B digestion. By testing the fluorogenic substrate, the authors provide evidence that their production method of recombinant protein can open up possibilities in HTS. Since their purification method can be applied to three structurally variable SVMP classes, this demonstrates the robust nature of the approach.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The universal applicability of the approach could be emphasized more clearly. The potential for this generic protocol for recombinant SVMP zymogen production to be adapted to other SVMPs is somewhat obscured by the detailed optimization steps. A general schematic overview would strengthen the manuscript, presented as a final model, to illustrate how this strategy can be extended to other targets with similar features. Such a schematic might, for example, outline the propeptide fusion design, including its tags, relevant optimizations during expression, lysis, purification (e.g., strategies for metal ion removal and maintenance of protease inactivity), as well as the controllable auto-activation.

      In the revised version of the manuscript, we moved the detailed description of the optimisation of SVMP expression, including mature SVMP expression, Marimastat addition, active site mutations and fusion of propeptides, into the supplement as supplementary text. We hope this improves the clarity and flow. As suggested, we now include a new figure outlining the SVMP production strategy and optimisation steps in the revised manuscript (new Figure S1).

      The product obtained from the purification protocol appears to be a heterogeneous mixture of self-activated and intact protein species. The protocol would benefit from improved control over the self-activation process. The Methods section does not indicate whether removal of residual metal ions was attempted during the purification, which could influence premature activation.

      We agree that improved control of self-activation would be desirable. However, this is difficult to achieve. Previous studies reported that (1) SVMP zymogens are processed within secretory cells of the venom gland (Portes-Junior et al., 2014), and (2) mature SVMPs accumulate in secretory vesicles during venom production (Carneiro et al., 2002). Accordingly, preventing the auto-processing of SVMP zymogens would require Zn<sup>2+</sup> depletion within the insect cells during production, which would result in cytotoxicity. We have included this information in the updated Discussion section of the revised manuscript.

      Additionally, it has not been discussed whether the shift to pH 8 in the purification process is necessary from the initial steps onwards, given that a lower pH would be expected to maintain enzyme latency.

      The shift to pH 8 is required for the affinity purification of the SVMP zymogens from the medium, involving the poly-histidine-tag and immobilized metal affinity chromatography (IMAC). At lower pH, the histidines would become protonated, preventing binding of the His-tag to the column. Thus, with the His-tag the shift to pH 7.5 or pH 8 is necessary.
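      The pH dependence described in this response can be illustrated with the Henderson-Hasselbalch relation. The sketch below assumes a pKa of about 6.0 for the histidine imidazole side chain, which is a textbook approximation and not a value taken from the manuscript; the effective pKa of residues within a His-tag depends on their local environment.

```python
def protonated_fraction(ph: float, pka: float = 6.0) -> float:
    """Fraction of imidazole side chains carrying a proton at a given pH.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is an assumed, approximate value.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Compare an acidic buffer with the pH 7.5-8 range used for IMAC.
for ph in (5.0, 6.0, 7.5, 8.0):
    print(f"pH {ph:.1f}: {protonated_fraction(ph):.1%} protonated")
```

      Under this assumption, roughly 1% of histidine side chains remain protonated at pH 8, compared with about 91% at pH 5, which is consistent with the loss of His-tag binding at lower pH.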

      The characterization of PIII activity using the fluorogenic peptide effectively links the project to its broader implications for drug design. However, the absence of comparable solutions for PI and PII classes limits the overall scope and impact of the finding.

      We agree that such assays would be extremely useful. However, the development of fluorescence based high-throughput assays to test for PI and PII SVMP activity is beyond the scope of this study. Here, our overarching objective is to report a broadly applicable production method for PI, PII and PIII SVMPs.

      Overall, the authors successfully purified active SVMP proteins of all three structurally diverse classes in high quality and provided convincing evidence throughout the manuscript to support their claims. The described method will be of use for a broader community working with self-activating and cytotoxic proteases.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      The aim of the study by Hall et al. was to establish a generic method for the production of Snake Venom Metalloproteases (SVMPs). These have been difficult to purify in the mg quantities required for mechanistic, biochemical, and structural studies.

      Strengths:

      The authors have successfully applied the MultiBac system and describe with a high level of detail the downstream purification methods applied to purify the SVMP PI, PII, and PIII. The paper carefully presents the non-successful approaches taken (such as expression of mature proteins, the use of protease inhibitors, prodomain segments, and co-expression of disulfide-isomerases) before establishing the construct and expression conditions required. The authors finally convincingly describe various activity assays to demonstrate the activity of the purified enzymes in a variety of established SVMP assays.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The manuscript suffers from a lack of bottoming out and stringent scientific procedures in the methodology and the characterization of the generated enzymes.

      As an example, a further characterization of the generated protein fragments in Figure 3 by intact mass spectrometry would have aided in accurate mass determination rather than relying on SEC elution volumes against a standard. Protein shape and charge can affect migration in SEC.

      We agree that intact MS would be useful to determine the mass of the produced SVMPs. In this manuscript, we performed SEC as a purification step, removing aggregates. Furthermore, SEC allowed determining if the SVMPs form monomers or dimers. MS characterisation of intact SVMPs (and their PTMs) is not trivial and beyond the scope of this manuscript (see below).

      Also, the analysis of N-linked glycosylation demonstrates some reactivity of PIII to PNGase F, but fails to conclude whether one or more sites are occupied, or whether other types of glycosylation are present. Again, intact mass experiments would have resolved such issues.

      We concur that glycosylation of SVMPs is an important question. However, analysing the glycosylation of the SVMPs is beyond the scope of this manuscript; it is actually a project on its own: Intact MS can indeed provide information on glycosylation but is not very precise. Unambiguous assignment of the number and occupancy of glycosylation sites is more challenging, especially for large, glycosylated proteins such as our PIII SVMP zymogen. In practice, confident mapping of glycosylation sites would require peptide-level mass spectrometry following enzymatic digestion (Trypsin and Multi-Enzymatic Limited Digestion, ideally). Sample preparation, method optimization, MS acquisition, and data analysis together would require a significant investment. Moreover, we do not have access to the native PIII SVMP from Echis carinatus sochureki venom - this is the main point of our manuscript: we describe a protocol to produce SVMPs which could not be purified from venom. Therefore, a comparison of the glycosylation of the recombinant SVMP and the native SVMP cannot be performed unfortunately (see below).

      The activity assays in Figure 4 are not performed consistently with kinetic assays and degradation assays performed for some, but not all, enzymes, and there is no Echis ocellatus comparison in Figure 4h.

      This is correct. The suggested control experiment is not possible for the PII SVMP and PIII SVMP because we cannot purify the native PII and PIII SVMPs from Echis venom. We have highlighted this information in the revised manuscript in the insulin B degradation section.

      Overall, whilst not affecting the main conclusion, this leaves the reader with an impression of preliminary data being presented. For consistency, application of the same assays to all enzymes (high-grade purified) would have provided the reader with a fuller picture.

      In the revised manuscript, we included new data showing the requested characterisations of all three SVMPs.

      We have included the respective assays in Figure 5 and Supplementary Figure S11. In the original manuscript, we had omitted these assays as the data show no enzymatic activity in the respective assays. Specifically, we show that (1) PII does not cause insulin B degradation (Fig. S11b), (2) that the PI and PII SVMPs do not degrade the fluorogenic peptide which is prototypic for PIII SVMPs and MMPs (Fig. S11a), (3) PI and PIII do not cause platelet aggregation because they lack the entire disintegrin domain (PI) or the RGD motif (PIII) (Fig. 5a), and (4) that the PI and PII SVMPs, like the PIII SVMP, are not pro-coagulant and do not cause blood clotting (Fig. 5d,5e and Fig. S11c). We also included this new information in the main text of our revised manuscript.

      Overall, the data presented demonstrate a very credible path for the production of active SVMPs for further downstream characterization. The generality of the approach to all SVMPs from different snakes remains to be demonstrated by the community, but if generally applicable, the method will enable numerous studies with the aim of either utilizing SVMPs as therapeutic agents or enabling the generation of specific anti-venom reagents, such as antibodies or small molecule inhibitors.

      Thank you.

      Reviewer #3 (Public review):

      Summary:

      The presented study describes the long journey towards the expression of members of the SVMP toxin family from snake venom, which are toxins of major importance in a snakebite scenario. As their functional analysis has in the past relied on challenging isolation, the toxins' heterologous expression offers a potential solution to some major obstacles hindering a better understanding of toxin pathophysiology. Through a series of laborious and elegantly crafted experiments, including the reporting of various failed attempts, the authors establish the expression of all three SVMP subtypes and prove their activity in bioassays. The expression is carried out as naturally occurring zymogens that autocleave upon exposure to zinc, which is a novel modus operandi for yielding fusion proteins and also sheds some new light on the potential mechanism that snakes use to activate enzymatic toxins from zymogenic preforms.

      Strengths:

      The manuscript draws from an extensive portfolio of well-reasoned and hypothesis-driven experiments that lead to a stepwise solution. The wet-lab data generated is outstanding, although not all experiments along this rocky road to victory were successful. A major strength of the paper is that, translationally speaking, it opens up novel routes for biodiscovery since a first reliable platform for expression of an understudied, yet potent toxin class is established. The discovered strategy to pursue expression as zymogens could see broad application in venom biotechnology, where several toxin types are pending successful expression. The work further provides better insights into how snake toxins are processed.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The manuscript contains several chapters reporting failed experiments, which makes it difficult to follow in places.

      Based on a similar comment from Reviewer 1, we have now moved the ‘failed’ experiments reporting on SVMP expression optimisation to the supplement as new supplementary text. We hope that the revisions have improved the clarity and overall readability of our manuscript.

      The reporting of experimental details, especially sample sizes and replicates, could be optimised.

      The number of replicates has now been added to the figure legends in the revised manuscript. Detailed experimental information is found in the revised Methods part.

      At the time of writing, it remains unclear whether the glycosylations detected on the PIII SVMP could have an impact on the bioactivities measured, which is a major aspect, and future follow-ups should clarify this.

      A detailed analysis of glycosylation of the PIII SVMP is beyond the scope of our manuscript (see above, response to Reviewer 2). Our manuscript describes a generic protocol to produce active SVMPs. Importantly, we cannot purify the native PIII SVMP from Echis carinatus sochureki venom. Therefore, it is not possible to compare our PIII SVMP with the native PIII SVMP.

      We agree that this is an important question, and we will aim in the future to perform such a comparison of a different insect cell-produced PIII with a native PIII SVMP that can be readily purified from venom.

      Finally, the work, albeit of critical importance, would benefit from a more down-to-earth evaluation of its findings, as various persistent obstacles still need to be overcome.

      We consider cytotoxicity to be the principal bottleneck in SVMP production. In this study, we present a strategy to overcome this bottleneck.

      Major comments to the manuscript:

      (1) Lines 148-149: "indicating that expressing inactivated SVMPs could be a viable, although inefficient, approach". I think this text serves a good purpose to express some thoughts on the nature of how the current draft is set up. It is quite established that various proteases cause extreme viability losses to their expression host (whether due to toxicity, but surely also because of metabolic burden), which is why their expression as inactive fusion proteins is the default strategy in all cases I have thus far seen. I believe that, especially in venom studies, this is of importance given the increased toxicity often targeting cellular integrity, and especially here, because Echis are known to feed on arthropods at younger life history stages, making it very likely that some venom components are especially active against insects and other invertebrates. With that in mind, I would argue that exploring their production in inactive form is the obvious strategy one would come up with and not really the conclusion of a series of (well-conducted and scientifically sound!) experiments. For me, the insight of inactive expression is largely confirmatory of what is established, unless I miss something in the authors' rationale. If yes, it would be important to clarify that in the online version.

      We agree that producing zymogens represents a straightforward strategy and, in hindsight, we wish we had tested it first; it would have saved us, and apparently many others, significant effort. However, realising this and implementing the approach took us considerable time and insight, as we describe in this manuscript. The alternative strategies we describe in the manuscript, in particular the use of inhibitors and active-site mutation, have been successfully applied for recombinant production of diverse enzymes before, including enzymes that are toxic to host cells.

      We have revised the manuscript as requested and moved the optimisation of SVMP expression to the Supplement. We hope this has improved the clarity and overall readability of the text and thus addresses the reviewer’s comment.

      (2) Line 173: Here, AlphaFold 3 was used, whereas in previous sections (e.g., line 153, line 210), it was AlphaFold 2. I suggest using one release across the manuscript.

      Thank you for bringing this to our attention. In the revised version of the manuscript, we clarified that all models were generated using AlphaFold 3.

      (3) Line 252-254: I fully agree, the PIII SVMP is glycosylated. Glycosylation is an important mediator of snake venom activity, and several works have described its importance in the field. This raises the question of which glycosylations have been introduced here in the SVMP, and whether these belong to those found in snakes. This is important as insects facilitate thousands of N- and O-glycosylations to modulate the activity of their proteome, of which many are specific to insects. If some of these were integrated into the SVMP, this could have an impact on downstream bioassays and also antigenicity (the surface would be somewhat different from natural toxins, causing different selection).

      We agree that glycosylation is important and warrants a follow-up in the future.

      However, most publications we found reported that de-glycosylation has a negative effect on stability and solubility of SVMPs, which is expected to have a knock-on effect on toxin activity (e.g. Andrade-Silva et al., 2025; DOI: 10.1021/acs.jproteome.5c00249). It will be difficult to separate the two effects from each other. We found only a few examples where SVMP glycosylation (sialylation and N-glycosylation) modulated proteolytic and haemorrhagic functions, including interaction with substrates such as fibrinogen (Schluga et al., 2024; https://doi.org/10.3390/toxins16110486; Chen et al., 2008; DOI: 10.1111/j.1742-4658.2008.06540.x; Nikai et al., 2000; DOI: 10.1006/abbi.2000.1795. PMID: 10871038). In our manuscript, we show that our PIII SVMP is very cytotoxic and highly active in casein, fibrinogen and ESO10 degradation assays, with a K<sub>M</sub> and k<sub>cat</sub>/K<sub>M</sub> comparing favourably with other SVMPs and MMPs. We are not aware of a specific substrate for this particular PIII SVMP that depends on a distinct glycosylation pattern. Recombinant production of SVMPs with a specific glycosylation pattern requirement would be a challenge in all commonly used expression systems (yeast, plant, insect cells and mammalian cells). In fact, insect cell expression systems could be advantageous in this respect because the Sf21 and High Five (Hi5) lepidopteran cell lines we utilised are well characterized for their ability to perform post-translational modifications on complex secreted proteins:

      (1) N-Glycan conservation: Both Sf21 and Hi5 cells typically produce N-glycans that are trimmed to a core 'paucimannose' structure (Man3GlcNAc2), often with an alpha1,6-fucosylation. While snakes can produce more complex, sialylated N-glycans, glycomic studies of native venoms (e.g., Bothrops venom) have demonstrated that high-mannose and paucimannose structures are also prevalent in native SVMPs. Therefore, the recombinant glycoforms produced in our system are not 'unnatural' in the snake venom context but rather represent a subset of the native glycan microheterogeneity.

      (2) Occupancy vs structure: The critical function of glycosylation in PIII SVMPs is often thought to be structural, facilitating correct folding and protecting the large metalloprotease and disintegrin-like domains from proteolytic degradation. Because Sf21 and Hi5 cells recognize the same N-glycosylation sequon (Asn-X-Ser/Thr) as reptilian cells, the site occupancy remains consistent with the native protein, preserving the overall topography of the toxin.

      (3) Activity and authentic self-processing: We acknowledge that insect-specific alpha1,3-fucosylation can occur in Hi5 cells and is potentially antigenic. As the recombinant SVMPs will be used for binder selections and for testing in silico designed binders, useful binders will be selected based on neutralising activity against venom toxins. Here, our assays focused on auto-activation and proteolytic activity, which is primarily driven by the catalytic Zn<sup>2+</sup>-site and the protein backbone.

      As stated above, analysis of glycosylation pattern of the PIII SVMP is a project on its own and beyond the scope of this manuscript.

      We have incorporated some of the above information into the discussion section of the revised manuscript to clarify that insect cell glycosylation does not recapitulate the full diversity of SVMP glycosylation observed in native venoms.
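      The sequon rule mentioned in point (2) above (Asn-X-Ser/Thr) can be sketched as a simple sequence scan. The snippet below is only an illustration: the exclusion of proline at position X is the standard textbook definition of the sequon and is not stated in the response, the function name is hypothetical, and the example sequence is made up and is not an SVMP. A predicted sequon also says nothing about whether the site is actually occupied.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Made-up peptide for illustration only (not a real toxin sequence).
example = "MKNGSAANPTLLNNTSQ"
print(find_sequons(example))
```

      Candidate sites found this way would still need experimental confirmation, for example by the peptide-level mass spectrometry described in the response to Reviewer 2.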

      (4) General comment for the bioassays: It would be good to specify the replicates again and report the data, including standard deviations.

      We included this information in the figure legends.

      Discussion:

      I think the data generated in the study is very valuable and will be instrumental for pushing the frontiers in SVMP research, but still I would like to see a bit of modesty in their discussion. As I have pointed out above, it is unclear which effect the glycosylations may have (i.e., are the glycosylations found reminiscent of natural ones?), despite their being functionally important. Also, yes, isolation of SVMPs is challenging, but the reality is that their expression is equally challenging, as evidenced by the heaps of presented negative data (with which I have no problems, I think reporting such is actually important). So far, the "generic" protocol has been used to express one member per structural class of Echis SVMP, but no evidence is provided that it would work equally well on other members from taxonomically more distant snakes (e.g., the PIII known from Naja oxiana). It is very likely, but at the time of writing, purely speculative.

      We have expressed additional PIII SVMPs from Echis and Daboia species and will report their production and characterisation in due course.

      Lastly, the reality is also that the expression in insect cells can only be carried out by highly specialized labs (even in the expression world, as most laboratories work with bacterial or fungal hosts), whereas the isolation can be attempted in most venom labs. That said, production in insect cells also has economic repercussions as it will be very challenging to generate yields that are economically viable versus other systems, which is pivotal because the authors talk about bioprospecting and the toxins used in snakebite agent research.

      We thank the reviewer for this perspective on the practicalities of protein expression. However, we respectfully disagree with the characterization of insect cell expression as an inaccessible or economically non-viable platform for toxin research. We offer the following points:

      (1) Prevalence and accessibility: Contrary to the suggestion that insect cell expression is restricted to highly specialized labs, the Baculovirus Expression Vector System (BEVS) has become a cornerstone of modern biologics production, structural biology and biochemistry. For instance, our MultiBac system (which is but one of several systems currently widely in use) is utilised by over 1,000 laboratories and institutions, academic and pharma/biotech, worldwide. The maturation of commercially available kits, automated platforms, and standardized protocols has moved this technology into the mainstream, making it a standard tool for any lab requiring high-quality eukaryotic proteins.

      (2) Biological necessity: Bacterial (E. coli) and fungal (P. pastoris) systems are widely accessible; however, they appear to be fundamentally incapable of producing functional SVMPs. SVMPs require complex disulfide-bond formation, intricate folding, and N-glycosylation for stability and solubility. Bacterial systems have been widely tried by us and others but typically result in very low expression or misfolded inclusion bodies. Of note, we had originally invested significant effort to adapt P. pastoris to the production of eukaryotic proteins we are interested in, without success, before moving on to the MultiBac system. The SVMPs that we analysed here are highly cytotoxic, which makes the baculovirus/insect cell system a logical choice given that the cells are no longer 'living' after infection with the baculovirus (but are more akin to membrane-enveloped bioreactors). Thus, one can make the argument that insect cells represent the most accessible middle ground that provides the folding apparatus and the post-translational modifications (PTMs) required for biological relevance, and it is possible to produce mg amounts of SVMP proteins per litre of cell culture, as reported here in our manuscript.

      (3) Economic viability and bioprospecting: Regarding the economic argument, we contend that viability in bioprospecting is defined by functional yield rather than simple volume. Producing large quantities of non-functional or misfolded protein in a cheaper system is economically inefficient. Furthermore, for snakebite research, the ability to produce specific, pure isoforms recombinantly without the contamination of other toxic venom components found in native isolations is essential for high-throughput screening and drug design.

      (4) Scalability: Historically, insect cell production was seen as expensive, but current bioreactor technology and reduction in consumables and media costs allow for significant scaling. Many therapeutic reagents (vaccines, viral vectors, protein biologics) are produced routinely in baculovirus/insect cells. For the purposes of bioprospecting and lead identification, the yields provided by our Hi5/Sf21 system are sufficient for rigorous downstream bioassays and structural characterization.

      Again, I believe the paper is highly important and excellently crafted, but I think especially the discussion should see some refinement to address the drawbacks and to evaluate the paper's findings with more modesty.

      Thank you. We included the discussion about glycosylation patterns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is not entirely clear to me if the final constructs are indeed "fusion-proteins" (line 172, 974), in the sense of chimeric proteins. From the current description, it appears that the prodomain is encoded in the same gene rather than fused as a separate domain. Thus, referring to these constructs as fusion proteins may overstate the degree of protein engineering involved in the study.

      This is correct. In the revised manuscript, ‘fusion protein’ is only used in the context of the propeptide SVMP fusion construct to avoid confusion.

      (2) Figure 2J: It is difficult to assess how much protein is secreted relative to the intracellular amounts. The blot is surely misleading, as the effective protein dilution differs substantially between intracellularly vs. extracellularly. Providing an estimate of the relative dilution of extracellular protein would help clarify the extent of secretion.

      We estimate that the SNP and SN fractions are at least 10-times more concentrated than the media fraction. The blot is analytical and not quantitative.

      (3) The manuscript appears to use both alphafold 2 and alphafold 3 for structural predictions. Clarification on the choice of the version and its impact on results would improve consistency.

      In the revised version of the manuscript, we clarify that all structural models were generated using AlphaFold 3.

      (4) Figure S3b and others: a clear description of the antibodies used in the Western blots would be appreciated (including in the methods).

      We included this information in the figure legends and a paragraph in the methods section for Western blots in the revised manuscript.

      (5) MTT cytotoxicity testing would be more convincing if done in a concentration-dependent manner.

      We repeated this assay using different concentrations of SVMPs and show the results as a new Figure 5f in the revised manuscript.

      (6) Figure S3c: It could be interesting to show the sequence coverage to get an impression of what part of the protein is there.

      We have included this information as Supplementary Figure S4d in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Overall, the study is presented in a step-by-step manner, and its conclusions are valid.

      (1) As suggested in the public review, further characterization of the purified material would be good, for example, by intact mass-spectroscopy to characterize the enzymes in further detail.

      Preliminary MALDI-MS analysis (performed in Loic Quinton’s laboratory) of our PIII SVMP revealed a broad and heterogeneous mass distribution, consistent with heterogeneity caused by the presence of multiple glycoforms (which is not unlike the microheterogeneity in native snake venom). However, owing to the inherent limitations of MALDI-MS for the analysis of glycoproteins, our data do not allow determination of the number of occupied N-glycosylation sites or the identification of additional types of glycosylation.

      Moreover, the relatively large molecular mass of these proteins (zymogen 70.2 kDa protein only, mature PIII 50.6 kDa protein only) makes analysis by electrospray ionisation mass spectrometry technically challenging.

      An MS-based deep analysis of the glycosylation patterns would therefore be a project on its own, and beyond the scope of the present manuscript.

      (2) The studies involving PII appear challenging due to low yields and stability of the enzyme and the mentioned self-degradation. Some studies, such as the casein-degradation, would benefit from working with a well-characterized batch of enzymes to ensure, it is not auto-degrading during the experiment.

      We believe that the finding that the PII SVMP degrades itself after incubation with Zn<sup>2+</sup> is an important observation. It is novel to the best of our knowledge. Moreover, the key message of our manuscript is that we can produce and characterise novel SVMPs that cannot be readily purified from venom (and thus are not well characterised).

      Besides, there are very few intact PII SVMPs in venom (e.g. Suntravat et al. BMC Molecular Biol 2016); the vast majority cleaves itself into a PI and a disintegrin.

      (3) Figure 4h. Degradation of insulin is only shown for recombinant PIII, not the native enzyme, and therefore doesn't convey any information with respect to how well they compare.

      We do not have any native PII and PIII SVMPs available for comparison with the recombinant SVMPs (in our manuscript we show expression of new, uncharacterised SVMPs). We included the PIII SVMP in the original manuscript to show that the enzyme is active and has a different specificity from the PI SVMP. In the revised manuscript, we also include the PII SVMP insulin B degradation assay in Supplementary Figure S11b.

      (4) Figure 5a. Inconsistent use of enzymes - data for PII is presented (both as mature protein and Zymogen) and compared to PIII, but not PI, as both zymogen and mature protein. The current data presentation is confusing and gives the idea of the manuscript assembled with figures produced during the exploratory phase of the study, and not from subsequent experiments systematically conducted for the purposes of clarity and completeness.

      In the revised manuscript, we have included the missing enzymatic characterisations in Figure 5 (panels a and e) and Supplementary Figure S11a-c. These data were not included initially because the respective enzymes are inactive in these assays.

      (5) The manuscript would benefit from editing to make it more concise. For an early-career reader, it is of interest and utility to follow the thought and experimental processes that led to the successful solution, but there is a risk of losing the reader's interest along the way by going through expression experiments that did not "work" in the typical sense of the word. To this reviewer, there is no added value in a full paragraph around co-expression with disulfide isomerase, as it did not improve the protein yield. A single sentence, "co-expression with PDI did not improve yields," with a reference to a supplemental figure would convey that message.

      We have moved the optimisation of SVMP expression to the Supplementary Information, which we hope has improved the clarity and flow of the main text.

      We note that the hypothesis that co-expression of protein disulfide isomerases (PDIs) enhances yields of functional SVMPs, given the high expression of PDIs in snake venom gland cells, is well established in the field. While we consider PDIs (and other chaperones) likely to play an important role in SVMP expression, we were unable to demonstrate this effect using the baculovirus-insect cell expression system and hypothesize that efficient insect and/or baculoviral PDIs are already present.

      (6) Similarly with N-linked glycosylation, the section needs a headline (line 241) and firming up of a sentence like "and possibly not all of the glycosylation..." which is vague and appears to state that it was not really of interest to pursue this further. My view is that either an experiment is done properly with a stated aim and purpose, interpreted, and then, based on whether the results are of interest to the main story or not, they are included. If N-linked glycosylation is to be included in the manuscript, it should be with a purpose (e.g., N-linked glycosylation affects enzyme activity). As it stands, the message is "there is some N-linked glycosylation" without further explanation, and this generates information without justifying the inclusion hereof.

      Please see our reply above regarding an in-depth characterisation of insect cell glycosylation of the recombinant PIII SVMP without access to the native enzyme for comparison. In our revised manuscript, we confirm that the PIII SVMP is glycosylated and that this at least partly accounts for the apparent discrepancy in molecular weight observed in SEC and SDS PAGE. We have modified the text to clarify the purpose of the PNGase deglycosylation experiment.

      (7) The manuscript, in its current form, appears to have been copied from a Thesis with very detailed step-by-step logic and description. While this is useful in a scholarly context, a scientific manuscript should be presented more compactly, assuming the readers know basic biochemistry.

      We trust that this Reviewer finds the revised version of our manuscript more compact and concise. 

      Reviewer #3 (Recommendations for the authors):

      (1) Material and Methods plus Figures:

      Please report the number of replicates per experiment and how data is presented (means/ medians/ standard deviation/ others), and add error bars to the plots where needed.

      In the revised manuscript we have included the number of repeats in the figure legends.

      (2) Abstract

      Line 4: I would not say that SVMPs are the most potent viper toxins. This place is probably taken by some of the highly neurotoxic PLA2, such as Crotoxin. Nevertheless, SVMPs are surely some of the most important toxins responsible for pathophysiological effects stemming from viper envenoming, but I would suggest rephrasing for accuracy.

      In the revised manuscript, we have modified this sentence.

      (3) Introduction

      Lines 27-31: I would like to see a reference supporting the existence of all SVMP types across vipers.

      We have included references supporting the existence of PI, PII and PIII SVMPs in viper venom. We also rewrote the sentence to state that “representatives of all three sub-classes are present in different viper venoms.” This clarifies that we do not say that all classes are present in all venoms.

      Lines 59-60: I am not sure if this should be considered such an important impediment. Essentially, many vipers yield double- to triple-digit mg amounts of crude venom per specimen from only a single milking.

      We have rewritten this text in the revised manuscript.

      Currently, it is not possible to purify any given SVMP of interest from venom; for E. ocellatus in particular, mixtures of SVMP isoforms are typically purified rather than individual enzymes (see also the introduction section of our manuscript, line 57ff). In addition, many SVMPs are not present in sufficient amounts in the venom. Here, we provide an approach to recombinantly produce any SVMP of interest, independent of its abundance in the venom.

      (4) Results

      Line 102: The fall armyworm's genus name is Spodoptera, not Spotoptera. Please correct the typo.

      Done. Apologies for our oversight.

      Line 311: Please provide the data at least as a supplement.

      In the revised manuscript, we have included this experiment in Supplementary Figure S6c.

      Line 432- 433: It would be useful to clarify whether the protein should have a pro-coagulant activity (or not).

      We have changed this sentence as follows in the revised manuscript: This shows that our recombinantly produced SVMPs have no pro-coagulant activity, which was unknown before.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The pathogenic mechanism of the E182STOP variant is unclear. The mutant protein does not appear to affect WT protein localization, arguing against a dominant-negative effect. Yet, overexpression of HSD17B7-E182* alone causes toxicity in zebrafish and mislocalizes cholesterol in HEI-OC1 cells, suggesting a gain-of-function or toxic effect. In addition, the variant mRNA is expressed at a low level, consistent with nonsense-mediated decay. This apparent complexity and inconsistency need clearer explanation.

      We appreciate the reviewer’s careful evaluation of this mechanistic complexity. Based on our combined molecular, cellular, and in vivo data, we propose that the pathogenic effect of the HSD17B7-E182* variant reflects a composite mechanism, rather than a classical dominant-negative effect.

      At the transcript level, the E182* variant introduces a premature termination codon and shows markedly reduced mRNA abundance, consistent with partial degradation by nonsense-mediated mRNA decay. This reduction is expected to decrease overall HSD17B7 dosage, contributing a loss-of-function component. Unlike wild-type HSD17B7, the truncated HSD17B7<sup>E182*</sup> mislocalizes cholesterol in HEI-OC1 cells, and its overexpression alone reduces hair cell MET function and the startle response in zebrafish embryos. We therefore propose that the truncated protein disturbs local cholesterol homeostasis, thereby exerting a toxic or ectopic gain-of-function effect.

      We have revised the manuscript to clarify the dual-mechanism model.

      (2) The link to human deafness is based on a single heterozygous patient with no syndromic features. Given that nearly all known cholesterol metabolism disorders are syndromic, this raises concerns about causality or specificity. The term "novel deafness gene" is premature without additional cases or segregation data.

      We thank the reviewer for this important point. We fully agree that, based on a single heterozygous case without segregation data, it is premature to designate HSD17B7 as a novel deafness gene. Therefore, we have revised the manuscript to describe HSD17B7 as a "candidate deafness gene".

      (3) The localization of HSD17B7 should be clarified better: In HEI-OC1 cells, HSD17B7 localizes to the ER, as expected. In mouse hair cells, the staining pattern is cytosolic and almost perfectly overlaps with the hair cell marker used, Myo7a. This needs to be discussed. Without KO tissue, HSD17B7 antibody specificity remains uncertain.

      We thank the reviewer for the constructive comments regarding HSD17B7 localization and antibody specificity.

      Regarding subcellular localization, the original Figure 1K was intended to demonstrate the expression of HSD17B7 in mouse hair cells. To address this concern, we performed additional immunostaining on dissected organ of Corti sections at P1, P4, and P7 using higher magnification. Using parvalbumin as a hair cell marker, HSD17B7 displayed a partially punctate intracellular pattern in hair cells (revised Figure 1K). This pattern is consistent with localization to membrane-associated compartments, including the endoplasmic reticulum, and agrees with the ER-associated localization observed in HEI-OC1 cells and zebrafish hair cells. In mature hair cells, ER-associated signals may appear cytosolic and overlap with general hair cell markers such as Myo7a.

      Regarding antibody specificity, although HSD17B7 knockout tissue was not available, we performed complementary validation experiments in HEI-OC1 cells. Cells were transfected with pCMV-Flag, pCMV-Flag-hHSD17B7WT, or pCMV-hHSD17B7WT-EGFP constructs and stained with anti-Flag, anti-EGFP, and anti-HSD17B7 antibodies. The HSD17B7 antibody signal showed strong co-localization with both FLAG- and EGFP-tagged HSD17B7 (revised Figure S1A and B), supporting its specificity.

      Reviewer #2 (Public review):

      (1) The statement that HSD17B7 is "highly" expressed in sensory hair cells in mice and zebrafish seems incorrect for zebrafish:

      (a) The data do not support the notion that HSD17B7 is "highly expressed" in zebrafish. Compared to other genes (TMC1, TMIE, and others), the HSD17B7 level of expression in neuromast hair cells is low (Figure 1F), and by extension (Figure 1C), also in all hair cells. This interpretation is in line with the weak detection of an mRNA signal by ISH (Figure 1G I"). On this note, the staining reported in I" does not seem to label the cytoplasm of neuromast hair cells. An antisense probe control, along with a positive control (such as TMC1 or another), is necessary to interpret the ISH signal in the neuromast.

      We thank the reviewer for this detailed evaluation and agree that the description of HSD17B7 expression in zebrafish hair cells requires clarification.

      To address this, we performed a quantitative comparison of average expression levels within neuromast hair cells using log-normalized single-cell RNA-seq data. This analysis shows that hsd17b7 is expressed at a level comparable to several known MET-associated genes (e.g., tmc1 and lhfpl5a) (revised Figure 1D). Regarding the pseudotime heatmap (Figure 1F), we now state that this analysis illustrates temporal expression dynamics within neuromast hair cell development.
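
      The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis pipeline used in the manuscript: the scale factor of 10,000, the function names, and the toy counts are all hypothetical, and the log-normalization shown is the common "counts per 10,000, then log1p" convention.

```python
import math
from statistics import mean

def log_normalize(counts, scale=10_000):
    """Log-normalize one cell's raw counts: log1p(count / total * scale).

    `scale` is an assumed scale factor; the response does not state one.
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * scale) for gene, c in counts.items()}

def mean_expression(cells, gene):
    """Average log-normalized expression of `gene` across a group of cells."""
    return mean(log_normalize(cell).get(gene, 0.0) for cell in cells)

# Hypothetical raw counts for three hair cells (not real data).
hair_cells = [
    {"hsd17b7": 2, "tmc1": 3, "other": 995},
    {"hsd17b7": 1, "tmc1": 1, "other": 498},
    {"hsd17b7": 0, "tmc1": 2, "other": 798},
]
for gene in ("hsd17b7", "tmc1"):
    print(gene, round(mean_expression(hair_cells, gene), 3))
```

      Averaging after normalization, rather than averaging raw counts, keeps cells with very different sequencing depths from dominating the comparison.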

      In addition, we have clarified the interpretation of the whole-mount in situ hybridization data by emphasizing that the signal indicates spatial enrichment rather than high transcript abundance.

      We have updated the figure panels, legends, and corresponding text in the Results section to reflect these changes.

      (b) However, this is correct for mouse cochlear hair cells, based on single-cell RNA-seq published databases and immunostaining performed in the study. However, the specificity of the anti-HSD17B7 antibody used in the study (in immunostaining and western blot) is not demonstrated. Additionally, it stains some supporting cells or nerve terminals. Was that expression expected?

      To assess antibody specificity, we performed validation experiments using distinct epitope tags. In HEI-OC1 cells transfected with pCMV-Flag-HSD17B7 or pCMV-HSD17B7-EGFP constructs, immunostaining with anti-HSD17B7 showed strong co-localization with both the FLAG and EGFP tags (revised Figure S1B). In addition, western blot analyses using the same constructs confirmed the specific detection of HSD17B7 protein (revised Figure S1B). These validation data have now been included as supplementary figures in the revised manuscript and provide independent supporting evidence for the specificity of the anti-HSD17B7 antibody.

      (2) A previous report showed that HSD17B7 is expressed in mouse vestibular hair cells by single-cell RNAseq and immunostaining in mice, but it is not cited: Spatiotemporal dynamics of inner ear sensory and non-sensory cells revealed by single-cell transcriptomics. Jan TA, Eltawil Y, Ling AH, Chen L, Ellwanger DC, Heller S, Cheng AG. Cell Rep. 2021 Jul 13;36(2):109358. doi: 10.1016/j.celrep.2021.109358.

      We have now cited this reference in the revised manuscript.

      (3) Overexpressed HSD17B7-EGFP C-terminal fusion in zebrafish hair cells shows a punctiform signal in the soma but apparently does not stain the hair bundles. One limitation is the consequence of the C-terminal EGFP fusion to HSD17B7 on its function, which is not discussed.

      We thank the reviewer for raising this important technical point. The apparent absence of an HSD17B7-EGFP signal in hair bundles is primarily due to the imaging strategy and the selection of representative images. In zebrafish hair cells, the EGFP signal within hair bundles is extremely strong. To better visualize the intracellular distribution of HSD17B7 within the hair cell soma, we selected representative confocal optical sections that were focused on the cell body rather than on the apical hair bundle plane. As a result, the hair bundle signal is not visible in the images shown.

      Importantly, we agree that C-terminal EGFP fusion may potentially influence protein localization or function. We have therefore revised the Discussion to discuss this limitation and to clarify that our central conclusions regarding HSD17B7 function are primarily supported by loss-of-function analyses, rescue experiments using untagged mRNA, and cholesterol perturbation phenotypes, rather than relying solely on EGFP-tagged overexpression constructs.

      (4) A mutant Zebrafish CRISPR was generated, leading to a truncation after the first 96 aa out of the 340 aa total. It is unclear why the gene editing was not done closer to the ATG. This allele may conserve some function, which is not discussed.

      Targeting regions close to the ATG is indeed a commonly used strategy for CRISPR-mediated gene disruption. In this study, sgRNA selection was guided by online CRISPR design tools (CRISPRscan), prioritizing predicted cutting efficiency and specificity. This strategy resulted in a frameshift mutation introducing a premature stop codon after amino acid 96 of the 340-aa Hsd17b7 protein.

      Importantly, this truncation removes most of the conserved catalytic core required for 17β-hydroxysteroid dehydrogenase activity, including key motifs involved in NAD(P)-binding and substrate recognition. Therefore, although the mutation does not occur immediately adjacent to the ATG, the resulting allele is predicted to lack enzymatic function. We have clarified this rationale and discussed the functional consequences of the truncation in the revised manuscript.

      (5) The hsd17b7 mutant allele has a slightly reduced number of genetically labeled hair cells (quantified as a 16% reduction, estimated at 1-2 HC of the 9 HC present per neuromast). On a note, it is unclear what criteria were used to select HC in the picture. Some Brn3C:mGFP positive cells are apparently not included in the quantifications (Figure 2F, Figure 5A).

      Upon re-evaluation, we recognized that the original figure annotations were not sufficiently clear and may have led to confusion regarding hair cell selection. In the original images, the absence of dashed outlines around some Brn3c:mGFP<sup>+</sup> cells may have been misinterpreted as their exclusion from analysis. To address this issue, we have revised Figures 2F and 5A by updating the annotations to ensure that all Brn3c:mGFP<sup>+</sup> hair cells within each neuromast are clearly visible and unambiguously included (revised Figures 2F and 6A). Corresponding figure legends have also been revised to clarify the criteria used for hair cell identification and quantification.

      (6) The authors used FM4-64 staining to evaluate the hair cell mechanotransduction activity indirectly. They found a 40% reduction in labeling intensity in the HCs of the lateral line neuromast. Because the reduction of hair cell number (16%) is inferior to the reduction of FM4-64 staining, the authors argue that it indicates that the defect is primarily affecting the mechanotransduction function rather than the number of HCs. This argument is insufficient. Indeed, a scenario could be that some HC cells died and have been eliminated, while others are also engaged in this path and no longer perform the MET function. The numbers would then match. If single-cell staining can be resolved, one could determine the FM4-64 intensity per cell. It would also be informative to evaluate the potential occurrence of cell death in this mutant. On another note, the current quantification of the FM4-64 fluorescence intensity and its normalization are not described in the methods. More importantly, an independent and more direct experimental assay is needed to confirm this point. For example, using a GCaMP6-T2A-RFP allele for Ca2+ imaging and signal normalization. 

      We have revised the FM4-64 quantification strategy. Instead of measuring fluorescence intensity at the neuromast level, FM4-64 uptake was re-quantified at the single hair cell level. Hair cells within each neuromast were identified based on mGFP labeling, and the mean FM4-64 fluorescence intensity was measured for each individual hair cell. The average FM4-64 intensity per hair cell was then calculated for each neuromast and used for group comparisons (revised Figures 2F, 6B, and 8B, Figure S5B). The updated quantification method, normalization procedure, and analysis pipeline have now been described in the revised Methods section.
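
      As a minimal sketch of the two-level aggregation described above (per hair cell first, then per neuromast), assuming each measurement is already a mean FM4-64 intensity for one mGFP-identified hair cell. The function name, identifiers, and values are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean

def per_neuromast_means(cells):
    """Average single-cell FM4-64 intensities within each neuromast.

    `cells` is an iterable of (neuromast_id, intensity) pairs, where each
    intensity is the mean FM4-64 signal of one hair cell.
    """
    by_neuromast = defaultdict(list)
    for neuromast_id, intensity in cells:
        by_neuromast[neuromast_id].append(intensity)
    # One value per neuromast, which is then the unit for group comparisons.
    return {nid: mean(values) for nid, values in by_neuromast.items()}

# Hypothetical measurements (arbitrary fluorescence units).
measurements = [
    ("nm1", 100.0), ("nm1", 120.0),
    ("nm2", 60.0), ("nm2", 80.0), ("nm2", 70.0),
]
print(per_neuromast_means(measurements))
```

      Because the per-neuromast value is an average over the hair cells that are present, it does not fall simply because a neuromast has fewer hair cells, which is the confound raised in the comment.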

      As supportive evidence, we further analyzed single-cell RNA-seq data from control and hsd17b7 mutant hair cells (revised Figure 3). This analysis revealed dysregulation of multiple genes involved in the MET machinery, including reduced expression of tip-link–associated components and altered expression of other MET-related genes. While these transcriptional changes do not constitute a direct functional assay, they are consistent with perturbation of MET-associated pathways and complement the FM4-64 findings.

      (7) The authors used an acoustic startle response to elicit a behavioral response from the larvae and evaluate the "auditory response". They found a significant decrease in the response (movement trajectory, swimming velocity, distance) in the hsd17b7 mutant. The authors conclude that this gene is crucial for the "auditory function in zebrafish".

      This is an overstatement:

      (a) First, this test is adequate as a screening tool to identify animals that have lost completely the behavioral response to this acoustic and vibrational stimulation, which also involves a motor response. However, additional tests are required to confirm an auditory origin of the defect, such as Auditory Evoked Potential recordings, or for the vestibular function, the Vestibulo-Ocular Reflex. 

      We thank the reviewer for highlighting the limitations in interpreting the acoustic startle assay. We have revised the manuscript to avoid overstatement and now describe the observed phenotype as a reduction in the behavioral response to acoustic and vibrational stimulation, rather than concluding a specific impairment of auditory function.

      (b) Secondly, the behavioral defects observed in the mutant compared to the control are significantly different, but the differences are slight, contained within the Standard Deviation (20% for velocity, 25% for distance). To this point, the Figure 2 B and C plots are misleading because their y-axis do not start at 0.

      We have corrected Figures 2B and 2C so that the y-axes start at zero, thereby providing a more transparent visualization of the behavioral differences. The figure legends have also been revised to clarify the presentation of the data.

      (8) Overexpression of HSD17B7 in cell line HEI-OC1 apparently "significantly increases" the intensity of cholesterol-related signal using a genetically encoded fluorescent sensor (D4H-mCherry). However, the description of this quantification (per cell or per surface area) and the normalization of the fluorescent signal are not provided. 

      The quantification of the D4H-mCherry signal in HEI-OC1 cells was performed at the single-cell level. Specifically, individual cells were segmented based on morphology, and the mean fluorescence intensity of D4H-mCherry per cell was measured. To account for variability in cell size and imaging conditions, fluorescence intensity was normalized to the background signal measured from cell-free regions in the same field of view. We have now clarified the quantification strategy and normalization procedure in the revised Methods and Results sections.
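
      The normalization described above might look like the following sketch. It assumes that "normalized to the background signal" means dividing each cell's mean intensity by the mean of the cell-free pixels in the same field; background subtraction would be an equally plausible reading, and the response does not say which was used. All names and values are hypothetical.

```python
from statistics import mean

def normalize_to_background(cell_pixels, background_pixels):
    """Return each cell's mean intensity relative to the field's background.

    `cell_pixels` maps a cell id to the pixel intensities inside that
    segmented cell; `background_pixels` holds intensities from cell-free
    regions of the same field of view. Ratio normalization is an assumption.
    """
    background = mean(background_pixels)
    if background <= 0:
        raise ValueError("background must be positive for ratio normalization")
    return {cell_id: mean(pixels) / background
            for cell_id, pixels in cell_pixels.items()}

# Hypothetical pixel values for two segmented cells and the background.
field = {"cell1": [10.0, 10.0], "cell2": [30.0, 30.0]}
print(normalize_to_background(field, [2.0, 2.0, 2.0]))
```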

      (9) When this experiment is conducted in vivo in zebrafish, a reduction in the "DH4 relative intensity" is detected (same issue with the absence of a detailed method description). However, as the difference is smaller than the standard deviation, this raises questions about the biological relevance of this result.

      We have now clarified the quantification strategy and normalization procedure in the revised Methods and Results sections.

      (10) The authors identified a deaf child as a carrier of a nonsense mutation in HSD17B7, which is predicted to terminate the HSD17B7 protein before the transmembrane domain. However, as no genetic linkage is possible, the causality is not demonstrated.

      We thank the reviewer for raising this important point. Unfortunately, we were unable to obtain the parents' genetic testing data to perform formal genetic and linkage analysis. To address this limitation, we have revised the manuscript to avoid causal overstatement and now describe the HSD17B7 E182* variant as a candidate pathogenic variant associated with hearing loss. Importantly, our functional analyses in zebrafish and cell-based systems demonstrate that the E182* truncation abolishes key biological activities of HSD17B7, including subcellular localization, cholesterol regulation, mechanotransduction-related activity, and behavioral responses. These convergent functional data provide biological support for the potential pathogenic relevance of this variant.

      (11) Previous results obtained from mouse HSD17B7-KO (citation below) are not described in sufficient detail. This is critical because, in this paper, the mouse loss-of-function of HSD17B7 is embryonically lethal, whereas no apparent phenotype was reported in heterozygotes, which are viable and fertile. Therefore, it seems unlikely that heterozygous mice exhibit hearing loss or vestibular defects; however, it would be essential to verify this to support the notion that the truncated allele found in one patient is causal.

      Hydroxysteroid (17beta) dehydrogenase 7 activity is essential for fetal de novo cholesterol synthesis and for neuroectodermal survival and cardiovascular differentiation in early mouse embryos. Jokela H, Rantakari P, Lamminen T, Strauss L, Ola R, Mutka AL, Gylling H, Miettinen T, Pakarinen P, Sainio K, Poutanen M. Endocrinology. 2010 Apr;151(4):1884-92. doi: 10.1210/en.2009-0928. Epub 2010 Feb 25.

      We thank the reviewer for raising this important point. We acknowledge that previous work has shown that complete loss of Hsd17b7 in mice is embryonically lethal, whereas heterozygous animals are viable and fertile (Jokela et al., 2010). Notably, this study primarily focused on embryonic development, cholesterol metabolism, and cardiovascular and neuroectodermal survival, and auditory or vestibular functions were not specifically examined. Therefore, subtle or sensory organ–specific phenotypes in heterozygous mice cannot be excluded.

      The human variant identified in this study (E182*) is a nonsense mutation predicted to truncate the HSD17B7 protein prior to the transmembrane and cytoplasmic domains. We therefore present it as a candidate loss-of-function variant, providing supportive human genetic evidence that is consistent with our functional analyses in zebrafish hair cells, rather than as definitive proof of causality. We have revised the manuscript to clarify these points and to acknowledge this limitation.

      (12) The authors used this truncated protein in their startle response and FM4-64 assays. First, they show that contrary to the WT version, this truncated form cannot rescue their phenotypes when overexpressed. Secondly, they tested whether this truncated protein could recapitulate the startle reflex and FM4-64 phenotypes of the mutant allele. At the homozygous level (not mentioned by the way), it can apparently do so to a lesser degree than the previous mutant. Again, the differences are within the Standard Deviation of the averages. The authors conclude that this mutation found in humans has a "negative effect" on hearing, which is again not supported by the data. 

      We thank the reviewer for this important comment. We agree that the overexpression strategy employed in this study does not fully replicate the endogenous heterozygous state observed in patients, and that the magnitude of the observed effects varies across samples. Accordingly, our experiments were not intended to demonstrate a definitive causal role of the HSD17B7<sup>E182*</sup> variant in hearing loss.

      Instead, the overexpression assays were designed to assess whether the truncated HSD17B7 protein displays abnormal cellular properties and whether its presence can interfere with processes relevant to hair cell function. Under these conditions, HSD17B7<sup>E182*</sup> exhibited aberrant subcellular localization, altered intracellular cholesterol distribution, and was associated with reduced FM4-64 uptake and changes in startle-associated behaviors, whereas the wild-type protein did not.

      We revised the manuscript to moderate our conclusions. Rather than claim that the E182* mutation has a definitive “negative effect on auditory function,” we now describe it as a functionally compromised allele that disrupts cholesterol distribution and MET-related activity under overexpression conditions, providing mechanistic support consistent with our zebrafish loss-of-function data and the identification of this variant in a patient with hearing loss. For context, the original "negative effect" statement was based on the observation that overexpression of the E182* variant in wild-type embryos compromised MET function and caused a startle response defect.

      (13) The authors looked at the distribution of HSD17B7 in a cell line. The WT version goes to the ER, while the truncated one forms aggregates. An interesting experiment consisted of co-expressing both constructs (Figure S6) to see whether the truncated version would mislocalize the WT version, which could be a mechanism for a dominant phenotype. However, this is not the case.

      We thank the reviewer for raising this important point regarding a potential dominant-negative mechanism. Consistent with the reviewer’s interpretation, we found that HSD17B7<sup>WT</sup> predominantly localizes to the endoplasmic reticulum, whereas the truncated HSD17B7<sup>E182*</sup> protein forms intracellular aggregates. Importantly, we further observed that the E182* mutation markedly reduces the stability of both HSD17B7 mRNA and protein, resulting in substantially decreased abundance of the truncated protein (Figure S6B–E). As a consequence, the cellular levels of HSD17B7<sup>E182*</sup> are abnormally low.

      Based on these findings, we consider it unlikely that the E182* variant exerts its effect through interference with the wild-type protein. Our results suggest that the heterozygous c.544G>T (p.E182*) variant contributes to auditory dysfunction through two potential pathogenic mechanisms: (1) haploinsufficiency caused by reduced HSD17B7 expression, and (2) functional impairment due to altered protein subcellular localization and cholesterol distribution.

      We have revised the Results and Discussion sections. Our conclusions now emphasize that the functional impact of this variant is attributable to decreased effective HSD17B7 dosage, consistent with the observed defects in cholesterol synthesis, MET-related activity, and auditory-associated phenotypes in our model.

      (14) Through mass spectrometry of HSD17B7 proteins in the cell line, they identified a protein involved in ER retention, RER1. By biochemistry and in a cell line, they show that truncated HSD17B7 prevents the interaction with RER1, which would explain the subcellular localization.

      Consistent with the reviewer’s interpretation, wild-type HSD17B7 interacts with RER1, a protein known to participate in ER retention, whereas this interaction is lost in the truncated HSD17B7 variant. We propose that RER1 is an interacting partner of HSD17B7, providing a mechanistic explanation for the protein's subcellular localization.

      (15) Information and specificity validation of the HSD17B7 antibody are not presented. It seems that it is the same used on mice by IF and on zebrafish by Western. If so, the antibody could be used on zebrafish by IF to localize the endogenous protein (not overexpression as done here). Secondly, the specificity of the antibody should be verified on the mutant allele. That would bring confidence that the staining on the mouse is likely specific.

      We thank the reviewer for raising this important point regarding antibody specificity and validation. Information on the HSD17B7 antibody and its validation has been provided in our response to comment 1, where we described the use of antibodies recognizing different epitopes and the experimental strategies employed to assess specificity (revised Figure S1A and B).

      Although the same antibody was used for Western blot analysis in zebrafish samples, its performance in immunofluorescence staining of zebrafish tissues was suboptimal, with relatively high background. For this reason, we did not rely on this antibody for endogenous Hsd17b7 localization in zebrafish by immunofluorescence and instead employed tagged constructs for subcellular localization analyses. This approach provides more reliable and interpretable localization information under the current experimental conditions.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Suggested revisions to help improve the study and the eLife Assessment:

      (1) FM4-64 uptake: Isolate the effect of hair cell loss and MET reduction.

      (2) Clarify the mechanistic model: Is the mutant protein pathogenic due to toxicity, lack of expression or function, or both? Come up with a clearer causal chain of events.

      (3) Mouse immunostaining: Validate the HSD17B7 antibody, and since mouse RNAseq data (gEAR database) suggest that HSD17B7 expression increases dramatically between P0-P5, show this developmental progression by immunostaining of the mouse organ of Corti at P0, P3, and P5.

      (4) The HSD17B7-E182* expression disrupts cholesterol (D4H staining) in OC1 cells. This should also be demonstrated in the mutant zebrafish.

      (5) Structural modeling of E182* is uninformative; half the protein is absent. This kind of analysis is better suited for missense variants. Suggest removing this analysis.

      We thank the Reviewing Editor for these constructive suggestions. The major points raised here substantially overlap with the concerns raised in the public reviews. In response, we have:

      (1) revised FM4-64 quantification and interpretation to better distinguish hair cell loss from MET impairment;

      (2) clarified the mechanistic model: the mutation decreases mRNA abundance and significantly reduces protein levels; moreover, expression of the p.E182* variant disrupts the interaction between HSD17B7 and the ER retention receptor RER1, leading to aberrant subcellular localization and altered cholesterol distribution, thereby exacerbating HC dysfunction;

      (3) provided additional validation of the HSD17B7 antibody using antibodies targeting distinct epitopes, and extended mouse organ of Corti immunostaining to postnatal stages P1, P4, and P7 to demonstrate the developmental upregulation of HSD17B7 expression;

      (4) added in vivo zebrafish experiments demonstrating that expression of HSD17B7<sup>E182*</sup> disrupts cholesterol distribution in hair cells, consistent with the effects observed in HEI-OC1 cells using D4H staining;

      (5) removed the structural modeling of the E182* variant.

      Recommendations for the authors:

      The recommendations from Reviewer #1 and Reviewer #2 were carefully considered and addressed. Most of these points overlap with the public reviews and the Reviewing Editor's comments and have been addressed through a revised mechanistic interpretation, additional clarifications in the Methods, more moderate claims regarding auditory function and human genetics, and the removal or revision of potentially misleading analyses. In addition, a number of minor issues were corrected, including missing or incorrect references, repetitive or unclear statements in the Introduction, insufficient methodological details, imprecise terminology, and typographical or formatting errors. Collectively, these revisions improve the clarity, rigor, and transparency of the study without altering its central conclusions.

    1. Author response:

      We thank the editors and reviewers for thoroughly reviewing our manuscript and offering thoughtful and constructive feedback. We appreciate the positive reception of our work and welcome the opportunity to address the lingering concerns. In the coming revisions, we will directly address the question of the miniprotein’s specificity and increase the precision of the language used to discuss our findings.

    1. Author response:

      eLife Assessment

      This study presents a valuable theoretical exploration on the electrophysiological mechanisms of ionic currents via gap junctions in hippocampal CA1 pyramidal-cell models, and their potential contribution to local field potentials (LFPs) that is different from the contribution of chemical synapses. The biophysical argument regarding electric dipoles appears solid, but the evidence can be more convincing if their predictions are tested against experiments. A shortage of model validation and strictly comparable parameters used in the comparisons between chemical vs. junctional inputs makes the modeling approach incomplete; once strengthened, the finding can be of broad interest to electrophysiologists, who often make recordings from regions of neurons interconnected with gap junctions.

      We gratefully thank the editors and the reviewers for the time and effort in rigorously assessing our manuscript, for the constructive review process, for their enthusiastic responses to our study, and for the encouraging and thoughtful comments. We especially thank you for deeming our study to be a valuable exploration on the differential contributions of active dendritic gap junctions vs. chemical synapses to local field potentials. We thank you for your appreciation of the quantitative biophysical demonstration on the differences in electric dipoles that appear in extracellular potentials with gap junctions vs. chemical synapses.

      However, we are surprised by aspects of the assessment that resulted in deeming the approach incomplete, especially given the following with specific reference to the points raised:

      (1) Testing against experiments: With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of gap junctional modulators are tabulated in Table 2 of Szarka et al. (2021), reproduced below. In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      In addition, the complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      Together, we emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials.

      (2) Model validation: The model used in this study was adopted from a physiologically validated model from our laboratory (Roy & Narayanan, 2021). Please note that the original model was validated against several physiological measurements along the somatodendritic axis. We sincerely regret our oversight in not mentioning clearly that we have used an existing, thoroughly physiologically-validated model from our laboratory in this study.

      (3) Comparisons between chemical vs. junctional inputs: We had taken elaborate precautions in our experimental design to match the intracellular electrophysiological signatures with reference to synchronous as well as oscillatory inputs, irrespective of whether inputs arrived through gap junctions or chemical synapses.

      In a revised manuscript, we will address all the concerns raised by the reviewers in detail. We have provided point-by-point responses to reviewers’ helpful and constructive comments below. We thank the editors and the reviewers for this constructive review process, which we believe will help us in improving our manuscript with specific reference to emphasizing the novelty of our approach and conclusions.

      Reviewer #1 (Public review):

      This manuscript makes a significant contribution to the field by exploring the dichotomy between chemical synaptic and gap junctional contributions to extracellular potentials. While the study is comprehensive in its computational approach, adding experimental validation, network-level simulations, and expanded discussion on implications would elevate its impact further.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      Novelty and Scope

      The manuscript provides a detailed investigation into the contrasting extracellular field potential (EFP) signatures arising from chemical synapses and gap junctions, an underexplored area in neuroscience. It highlights the critical role of active dendritic processes in shaping EFPs, pushing forward our understanding of how electrical and chemical synapses contribute differently to extracellular signals.

      We thank you for the positive comments on the novelty of our approach and how our study addresses an underexplored area in neuroscience. The assumptions about the passive nature of dendritic structures had indeed resulted in an underestimation of the contributions of gap junctions to extracellular potentials. Once the realities of active structures are accounted for, the contributions of gap junctions increase by several orders of magnitude compared to passive structures (Fig. 1D).

      Methodological Rigor

      The use of morphologically and biophysically realistic computational models for CA1 pyramidal neurons ensures that the findings are grounded in physiological relevance. Systematic analysis of various factors, including the presence of sodium, leak, and HCN channels, offers a clear dissection of how transmembrane currents shape EFPs.

      We thank you for your encouraging comments on the experimental design and methodological rigor of our approach.

      Biological Relevance

      The findings emphasize the importance of incorporating gap junctional inputs in analyses of extracellular signals, which have traditionally focused on chemical synapses. The observed polarity differences and spectral characteristics provide novel insights into how neural computations may differ based on the mode of synaptic input.

      We thank you for your positive comments on the biological relevance of our approach. We also gratefully thank you for emphasizing the two striking novelties unveiling the dichotomy between gap junctions and chemical synapses in their contributions to field potentials: polarity differences and spectral characteristics.

      Clarity and Depth

      The manuscript is well-structured, with a logical progression from synchronous input analyses to asynchronous and rhythmic inputs, ensuring comprehensive coverage of the topic.

      We sincerely thank you for the positive comments on the structure and comprehensive coverage of our manuscript encompassing different types of inputs that neurons typically receive.

      Weaknesses and Areas for Improvement

      Generality and Validation

      The study focuses exclusively on CA1 pyramidal neurons. Expanding the analysis to other cell types, such as interneurons or glial cells, would enhance the generalizability of the findings. Experimental validation of the computational predictions is entirely absent. Empirical data correlating the modeled EFPs with actual recordings would strengthen the claims.

      We thank you for raising this important point. The prime novelty and principal conclusion of this study is that gap junctional contributions to extracellular field potentials are orders of magnitude higher when the active nature of cellular compartments is accounted for. The lacuna in the literature has been consequent to the assumption that cellular compartments are passive, resulting in the dogma that gap junctional contributions to field potentials are negligible. Despite knowledge about active dendritic structures for decades now, this assumption has kept studies from understanding or even exploring the contributions of gap junctions to field potentials. The rationale behind the choice of a computational approach to address this lacuna was as follows:

      (1) The complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      (2) With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of gap junctional modulators are tabulated in Table 2 of Szarka et al. (2021). In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      We highlight the novelty of our approach and of the conclusions about differences in extracellular signatures associated with active-dendritic chemical synapses and gap junctions, against these experimental difficulties. We emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials. Our analyses clearly demonstrate that gap junctions do contribute to extracellular potentials if the active nature of the cellular compartments is explicitly accounted for (Fig. 1D). We also show theoretically well-grounded and mechanistically elucidated differences in polarity (Figs. 1–3) as well as in spectral signatures (Figs. 5–8) of extracellular potentials associated with gap junctional vs. chemical synaptic inputs. Together, our fundamental demonstration in this study is the critical need to account for the active nature of cellular compartments in studying gap junctional contributions to extracellular potentials, with CA1 pyramidal neuronal dendrites used as an exemplar.

      In a revised version of the manuscript, we will emphasize the motivations for the approach we took, highlighting the specific novelties both in methodological and conceptual aspects, finally emphasizing the need to account for other cell types and gap junctional contributions therein. Importantly, we will emphasize the non-specificities associated with gap-junctional blockers as the reason why experimental delineation of gap junctional vs. chemical synaptic contributions to LFP becomes tedious. We hope that these points will underscore the need for the computational approach that we took to address this important question, apart from the novelties of the manuscript.

      Role of Active Dendritic Currents

      The paper emphasizes active dendritic currents, particularly the role of HCN channels in generating outward currents under certain conditions. However, further discussion of how this mechanism integrates into broader network dynamics is warranted.

      We thank you for this constructive suggestion. We agree that it is important to consider the implications for broader network dynamics of the outward HCN currents that are observed with synchronous inputs. In a revised manuscript, we will elaborate on the implications of the outward HCN current to network dynamics in detail.

      Analysis of Plasticity

      While the manuscript mentions plasticity in the discussion, there are no simulations that account for activity-dependent changes in synaptic or gap junctional properties. Including such analyses could significantly enhance the relevance of the findings.

      We thank you for this constructive suggestion. Please note that we have presented consistent results for both fewer and more gap junctions in our analyses (Figure 1 with 217 gap junctions and Supplementary Figure 1 with 99 gap junctions). Our fundamentally novel result that gap junctions onto active dendrites differentially shape LFPs therefore holds true irrespective of the relative density of gap junctions onto the neuron. These results indicate that our conclusions about gap junctional contributions to LFPs are invariant to plasticity in the number of gap junctions.

      We had only briefly mentioned plasticity in the Introduction to highlight the different modes of synaptic transmission and to emphasize that plasticity has been studied in both chemical synapses and gap junctions, playing a role in learning and adaptation. However, if this wording inadvertently suggests that our study includes plasticity simulations, we will remove it from the Introduction in the updated manuscript to ensure clarity.

      In the ‘Limitations of analyses and future studies’ section in Discussion, we suggested investigating the impact of plasticity mechanisms—specifically, activity-dependent plasticity of ion channels—on synaptic receptors vs. gap junctions and their effects on extracellular field potentials under various input conditions and plasticity combinations across different structures. We fully agree with the reviewer that such studies would offer valuable insights and further enhance the broader relevance of our findings. However, while our study implies this direction, it was not the primary focus of our investigation.

      In the revised manuscript, we will expand on intrinsic/synaptic plasticity and how they could contribute to LFPs (Sinha & Narayanan, 2015, 2022), while also pointing to simulations with different numbers of gap junctions in this context.

      Frequency-Dependent Effects

      The study demonstrates that gap junctional inputs suppress high-frequency EFP power due to membrane filtering. However, it could delve deeper into the implications of this for different brain rhythms, such as gamma or ripple oscillations.

      We sincerely thank you for these insightful comments, with which we fully agree. As it happens, this manuscript forms the first part of a broader study in which we explore the implications of gap junctions for ripple-frequency oscillations. The ripple oscillations part of the work was presented as a poster at the Society for Neuroscience (SfN) annual meeting 2024 (Sirmaur & Narayanan, 2024). There, we simulate a neuropil made of hundreds of morphologically realistic neurons to assess the role of different synaptic inputs (excitatory, inhibitory, and gap junctional) and active dendrites in ripple-frequency oscillations. We demonstrate there that the conclusions from single-neuron simulations in this current manuscript extend to a neuropil with several neurons, each receiving excitatory, inhibitory, and gap junctional inputs, especially with reference to high-frequency oscillations. Our network-based analyses unveiled a dominant mediatory role of patterned inhibition in ripple generation, with recurrent excitations through chemical synapses and gap junctions in conjunction with return-current contributions from active dendrites playing regulatory roles in determining ripple characteristics (Sirmaur & Narayanan, 2024).

      Our principal goal in this study, therefore, was to lay the single-neuron foundation for network analyses of the impact of gap junctions on LFPs. We are preparing the network part of the study, with a strong focus on ripple-frequency oscillations, for submission for peer review separately.

      In a revised manuscript, we will mention the results from our SfN abstract with reference to network simulations and high-frequency oscillations, while also presenting discussions from other studies on the role of gap junctions in synchrony and LFP oscillations.
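
      As a rough illustration of the membrane filtering mentioned in the comment above, consider a single passive, isopotential compartment. This is a simplifying assumption made here for illustration only: it ignores the active conductances (e.g., HCN and sodium channels) that are central to this study, and the symbols $R_m$, $C_m$, and $\tau_m$ are introduced here and are not taken from the manuscript. Under that assumption, the magnitude of the voltage response to a sinusoidal current of frequency $f$ injected into the compartment falls off as:

```latex
\[
  |Z(f)| \;=\; \frac{R_m}{\sqrt{1 + \left(2\pi f\,\tau_m\right)^{2}}},
  \qquad \tau_m = R_m C_m ,
\]
% R_m: membrane resistance, C_m: membrane capacitance, \tau_m: membrane time constant.
% For f >> 1/(2\pi\tau_m), |Z(f)| is approximately 1/(2\pi f C_m), i.e. a first-order low-pass roll-off.
```

      In this simplified picture, current entering intracellularly through a gap junction is low-pass filtered by the membrane before it appears as a return transmembrane current, which is one way to see why high-frequency power could be attenuated. Active dendritic conductances can reshape this profile substantially.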

      Visualization

      Figures are dense and could benefit from more intuitive labeling and focused presentations. For example, isolating key differences between chemical and gap junctional inputs in distinct panels would improve clarity.

      We thank you for this constructive suggestion. In the revised manuscript, we will enhance the visualization of the figures to ensure a clearer and more intuitive distinction between chemical synapses and gap junctions.

      Contextual Relevance

      The manuscript touches on how these findings relate to known physiological roles of gap junctions (e.g., in gamma rhythms) but does not explore this in depth. Stronger integration of the results into known neural network dynamics would enhance its impact.

      We sincerely appreciate your valuable suggestion and acknowledge the importance of integrating our results into established neural network dynamics, particularly their implications for gamma rhythms. We will address this aspect more comprehensively in the revised version of our manuscript.

      Reviewer #2 (Public review):

      This computational work examines whether the inputs that neurons receive through electrical synapses (gap junctions) have different signatures in the extracellular local field potential (LFP) compared to inputs via chemical synapses. The authors present the results of a series of model simulations where either electric or chemical synapses targeting a single hippocampal pyramidal neuron are activated in various spatio-temporal patterns, and the resulting LFP in the vicinity of the cell is calculated and analyzed. The authors find several notable qualitative differences between the LFP patterns evoked by gap junctions vs. chemical synapses. For some of these findings, the authors demonstrate convincingly that the observed differences are explained by the electric vs. chemical nature of the input, and these results likely generalize to other cell types. However, in other cases, it remains plausible (or even likely) that the differences are caused, at least partly, by other factors (such as different intracellular voltage responses due to, e.g., the unequal strengths of the inputs). Furthermore, it was not immediately clear to me how the results could be applied to analyze more realistic situations where neurons receive partially synchronized excitatory and inhibitory inputs via chemical and electric synapses.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      The main strength of the paper is that it draws attention to the fact that inputs to a neuron via gap junctions are expected to give rise to a different extracellular electric field compared to inputs via chemical synapses, even if the intracellular effects of the two types of input are similar. This is because, unlike chemical synaptic inputs, inputs via gap junctions are not directly associated with transmembrane currents. This is a general result that holds independent of many details such as the cell types or neurotransmitters involved.

      We gratefully thank you for the positive comments and the encouraging words about the novel contributions of our study. We are particularly thankful to you for your comment on the generality of our conclusions that hold for different cell types and neurotransmitters involved.

      Another strength of the article is that the authors attempt to provide intuitive, non-technical explanations of most of their findings, which should make the paper readable also for non-expert audiences (including experimentalists).

      We sincerely thank you for the positive comments about the readability of the paper.

      Weaknesses

      The most problematic aspect of the paper relates to the methodology for comparing the effects of electric vs. chemical synaptic inputs on the LFP. The authors seem to suggest that the primary cause of all the differences seen in the various simulation experiments is the different nature of the input, and particularly the difference between the transmembrane current evoked by chemical synapses and the gap junctional current that does not involve the extracellular space. However, this is clearly an oversimplification: since no real attempt is made to quantitatively match the two conditions that are compared (e.g., regarding the strength and temporal profile of the inputs), the differences seen can be due to factors other than the electric vs. chemical nature of synapses. In fact, if inputs were identical in all parameters other than the transmembrane vs. directly injected nature of the current, the intracellular voltage responses and, consequently, the currents through voltage-gated and leak currents would also be the same, and the LFPs would differ exactly by the contribution of the transmembrane current evoked by the chemical synapse. This is evidently not the case for any of the simulated comparisons presented, and the differences in the membrane potential response are rather striking in several cases (e.g., in the case of random inputs, there is only one action potential with gap junctions, but multiple action potentials with chemical synapses). Consequently, it remains unclear which observed differences are fundamental in the sense that they are directly related to the electric vs. chemical nature of the input, and which differences can be attributed to other factors such as differences in the strength and pattern of the inputs (and the resulting difference in the neuronal electric response).

      We thank you for raising this important point. We would like to emphasize that our experimental design and analyses quantitatively account for the spatial distribution and temporal pattern of specific kinds of inputs that arrive through gap junctions and chemical synapses. We submit that our analyses quantitatively demonstrate that the fundamental difference between the gap junctional and chemical synaptic contributions to extracellular potentials is the absence of the direct transmembrane component from gap junctional inputs. We elucidate these points below:

      (1) Spatial distribution: The inputs were distributed randomly across the basal dendrites, irrespective of whether they were through gap junctions or chemical synapses. For both chemical synapses and gap junctions, the inputs were of the same nature: excitatory.

      (2) Different numbers of inputs: We have presented consistent results for both fewer and more gap junctions or chemical synapses in our analyses (see Figure 1 with 217 gap junctions or 245 chemical synapses and Supplementary Figure 2 with 99 gap junctions or 30 chemical synapses). Our fundamentally novel result that gap junctions onto active dendrites shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron.

      (3) Synchronous inputs (Figs. 1–3): For chemical synapses, the waveforms are in the shape of postsynaptic potentials. For gap junctional inputs, the waveforms are in the shape of postsynaptic potentials or dendritic spikes (to respect the active nature of inputs from the other cell). Here, the electrical response of the postsynaptic cell is identical irrespective of whether inputs arrive through gap junctions or chemical synapses: an action potential. We quantitatively matched the strengths such that the model generated a single action potential in response to synchronous inputs, irrespective of whether they arrived through chemical synapses or gap junctions. We mechanistically analyze the contributions of different cellular components and show that the direct transmembrane current in chemical synapses is the distinguishing factor that determines the dichotomy between the contributions of gap junctions vs. chemical synapses to extracellular potentials (Figs. 2–3). In a revised manuscript, we will show the intracellular responses to demonstrate that they are electrically matched.

      (4) Random inputs (Fig. 4): For random inputs, we did not account for the number of action potentials that arrived, as the only observation we made here was with reference to the biphasic nature of the extracellular potentials with gap junctional inputs in the “No Sodium” scenario. We note that in the “No Sodium” scenario, the time-domain amplitudes were comparable for the field potentials (Fig. 4B, Fig. 4D).

      (5) Rhythmic inputs (Figs. 5–8): For rhythmic inputs, please note that the intracellular and extracellular waveforms for every frequency are provided in Supplementary Figures S5–S11. It may be noted that the intracellular responses are comparable. In simulations for assessing spike-LFP comparison, we tuned the strengths to produce a single spike per cycle, ensuring fair comparison of LFPs with gap junctions vs. chemical synapses.

      Taken together, we demonstrate through explicit sets of simulations and analyses that the differences in LFPs were not driven by the strength or patterns of the inputs but rather by the differences in direct transmembrane currents, which are subsequently reflected in the LFPs. In a revised manuscript, we will add a section to emphasize these points apart from providing intracellular traces for cases where they are not provided.

      Some of the explanations offered for the effects of cellular manipulations on the LFP appear to be incomplete. More specifically, the authors observed that blocking leak channels significantly changed the shape of the LFP response to synchronous synaptic inputs - but only when electric inputs were used, and when sodium channels were intact. The authors seemed to attribute this phenomenon to a direct effect of leak currents on the extracellular potential - however, this appears unlikely both because it does not explain why blocking the leak conductance had no effect in the other cases, and because the leak current is several orders of magnitude smaller than the spike-generating currents that make the largest contributions to the LFP. An indirect effect mediated by interactions of the leak current with some voltage-gated currents appears to be the most likely explanation, but identifying the exact mechanism would require further simulation experiments and/or a detailed analysis of intracellular currents and the membrane potential in time and space.

      We thank you for raising this important question. Leak channels were among the several contributors to the positive deflection observed in LFPs associated with gap junctions. This effect was present not only in gap junctional models with intact sodium conductance but also in the no-sodium model, where the amplitude of the positive deflection was reduced across other models as well (Fig. 2F, I). Furthermore, even in the absence of leak conductance, a small positive deflection was still observed (Fig. 2F), leading us to further investigate other transmembrane currents over time and across spatial locations, from the proximal to the distal dendritic ends relative to the soma (Fig. 3D). We had observed that the dominant contributor in the case of chemical synapses was the inward synaptic current (Fig. 3A), whereas for gap junctions, the primary contributors were leak conductance along with other outward currents, such as potassium and HCN currents (Fig. 3D). Together, the direct transmembrane component of chemical synapses provides a dominant contribution to extracellular potentials. This dominance translates to differences in the relative contributions of indirect currents (including leak currents) to extracellular potentials associated with chemical synaptic vs. gap junctional inputs. Our analyses of the exact ionic mechanisms (Fig. 3) demonstrate the involvement of several ion channels contributing to the indirect component in either scenario.

      In every simulation experiment in this study, inputs through electric synapses are modeled as intracellular current injections of pre-determined amplitude and time course based on the sampled dendritic voltage of potential synaptic partners. This is a major simplification that may have a significant impact on the results. First, the current through gap junctions depends on the voltage difference between the two connected cellular compartments and is thus sensitive to the membrane potential of the cell that is treated as the neuron "receiving" the input in this study (although, strictly speaking, there is no pre- or postsynaptic neuron in interactions mediated by gap junctions). This dependence on the membrane potential of the target neuron is completely missing here. A related second point is that gap junctions also change the apparent membrane resistance of the neurons they connect, effectively acting as additional shunting (or leak) conductance in the relevant compartments. This effect is completely missed by treating gap junctions as pure current sources.

      We thank you for raising this important point. We agree with the analyses presented by the reviewer on the importance of network simulations and bidirectional gap junctions that respect the voltages in both neurons. However, the complexities of LFP modeling preclude modeling of networks of morphologically realistic models with patterns of stimulation occurring across the dendritic tree. LFP modeling studies predominantly use “post-synaptic” currents to analyze the impact of different patterns of inputs arriving onto a neuron, even when chemical synaptic inputs are considered. Explicitly, individual neurons are separately simulated with different patterns of synaptic inputs, the transmembrane current at different locations is recorded, and the extracellular potential is then computed using the line source approximation (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). Even in scenarios where a network is analyzed, a hybrid approach involving the outputs of a point-neuron-based network being coupled to an independent morphologically realistic neuronal model is employed (Hagen et al., 2016; Martinez-Canada et al., 2021; Mazzoni et al., 2015). Given the complexities associated with the computation of electrode potentials arising as a distance-weighted summation of several transmembrane currents, these simplifications become essential.
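
      The distance-weighted summation of transmembrane currents mentioned above can be sketched as follows. This is a minimal illustration and not the authors' code: it swaps in the simpler point-source approximation in place of the line source approximation named in the text, and the conductivity value, the function name, and the two-compartment example are assumptions chosen only for illustration.

```python
import math

SIGMA_S_PER_M = 0.3  # assumed extracellular conductivity (S/m), illustrative value


def extracellular_potential_uV(currents_nA, positions_um, electrode_um,
                               sigma=SIGMA_S_PER_M, min_dist_um=1.0):
    """Point-source estimate of the potential at one electrode, in microvolts.

    currents_nA  : transmembrane current of each compartment (outward positive)
    positions_um : (x, y, z) midpoint of each compartment
    electrode_um : (x, y, z) location of the recording site
    """
    total = 0.0
    for i_nA, pos in zip(currents_nA, positions_um):
        # Clamp the distance to avoid the singularity at r = 0.
        r_um = max(math.dist(pos, electrode_um), min_dist_um)
        # nA / (S/m * um) gives mV, hence the factor 1e3 to report microvolts.
        total += 1e3 * i_nA / (4.0 * math.pi * sigma * r_um)
    return total


# Hypothetical example: an inward (negative) synaptic current at a dendrite,
# balanced by an outward return current at the soma.
currents = [-1.0, 1.0]
positions = [(0.0, 100.0, 0.0), (0.0, 0.0, 0.0)]
near_dendrite = extracellular_potential_uV(currents, positions, (20.0, 100.0, 0.0))
print(f"potential near the dendritic sink: {near_dendrite:.2f} uV")
```

      In this sign convention an inward current produces a negative deflection near its location and an outward current a positive one, which is why removing the direct inward synaptic term changes the polarity of the nearby field.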

      Our approach models gap junctional currents in the same way that other models incorporate synaptic currents in LFP modeling (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). As gap junctions are typically implemented as resistors from the other neuronal compartment, we accounted for gap-junctional variability in our model by randomizing the scaling factors and the exact waveforms that arrive through individual gap junctions at specific locations. Thus, the inputs were not pre-determined by “pre” neurons. Instead, the recorded voltages from potential synaptic partner neurons were randomized across locations and scaled using factors at the dendrites before being injected into the target neuron (Supplementary Fig. S1). While incorporating a network of interconnected neurons is indeed important, we utilized a biophysically and morphologically realistic CA1 neuron model with different sets of input patterns to model LFPs, which were derived from the total transmembrane currents across all compartments of the multi-compartmental neuron model. Given the complexity of this approach, adding further network-level interactions or pre-post connections would have been computationally demanding.
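
      The difference between a resistor-like gap junction and a current derived from the partner waveform alone can be illustrated with a short sketch. It is not the authors' implementation: both function names, the reference voltage, and the scaling rule in the second function are hypothetical and serve only to show why the two descriptions diverge when the target compartment depolarizes.

```python
def ohmic_gap_junction_current_pA(g_gj_nS, v_partner_mV, v_target_mV):
    """Current entering the target compartment through an ohmic gap junction."""
    return g_gj_nS * (v_partner_mV - v_target_mV)  # nS * mV = pA


def prescaled_injection_pA(scale_nS, v_partner_mV, v_reference_mV=-65.0):
    """Current computed from the partner waveform alone (hypothetical scaling rule).

    The target voltage does not appear, so the current cannot shrink or reverse
    as the target compartment depolarizes.
    """
    return scale_nS * (v_partner_mV - v_reference_mV)


partner_mV = -20.0  # sampled dendritic voltage of a partner cell (illustrative)
for target_mV in (-65.0, -40.0, -20.0):
    ohmic = ohmic_gap_junction_current_pA(1.0, partner_mV, target_mV)
    fixed = prescaled_injection_pA(1.0, partner_mV)
    print(f"target {target_mV:6.1f} mV: ohmic {ohmic:5.1f} pA, prescaled {fixed:5.1f} pA")
```

      The two currents agree only while the target sits at the reference voltage; the ohmic current falls to zero when both compartments reach the same potential, which is the dependence on the target neuron that the reviewer points out.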

      In a revised manuscript, we will introduce the general methodology used in LFP modeling studies to introduce synaptic currents. We will emphasize that our study extends this approach to modeling gap junctional inputs, while also highlighting randomization of locations and the scaling process in assigning gap junctional synaptic strengths.

      One prominent claim of the article that is emphasized even in the abstract is that HCN channels mediate an outward current in certain cases. Although this statement is technically correct, there are two reasons why I do not consider this a major finding of the paper. First, as the authors acknowledge, this is a trivial consequence of the relatively slow kinetics of HCN channels: when at least some of the channels are open, any input that is sufficiently fast and strong to take the membrane potential across the reversal potential of the channel will lead to the reversal of the polarity of the current. This effect is quite generic and well-known and is by no means specific to gap junctional inputs or even HCN channels. Second, and perhaps more importantly, the functional consequence of this reversed current through HCN channels is likely to be negligible. As clearly shown in Supplementary Figure S3, the HCN current becomes outward only for an extremely short time period during the action potential, which is also a period when several other currents are also active and likely dominant due to their much higher conductances. I also note that several of these relevant facts remain hidden in Figure 3, both because of its focus on peak values, and because of the radically different units on the vertical axes of the current plots.

      We thank you for raising this point and agree with you on every point. Please note that we do not assert that the outward HCN currents are exclusively associated with gap junctional inputs. Rather, our results show that synchronous inputs generate outward HCN currents in both chemical synapses (Fig. 3B; positive/outward HCN currents, except in the no sodium or leak model) and gap junctions (Fig. 3D; positive/outward HCN currents). We emphasized this in the case of gap junctions because, in the absence of inward synaptic currents, HCN (acting as outward currents with synchronous inputs) contributed to the positive deflection observed in the LFPs. While HCN would also contribute in the case of chemical synapses, its effect was negligible due to the presence of large inward synaptic currents. Since LFPs reflect the collective total transmembrane currents, the dominant contributors differ between these two scenarios, which we aimed to highlight. Since HCN exhibited outward currents in our synchronous input simulations, we have elaborated on this mechanism in the supplementary figure (Fig. S3). Our intention was not to emphasize this effect for only one synaptic mode but rather to highlight HCN's contribution to the positive deflection as one of the contributing factors.

      We agree that HCN currents are relatively small in magnitude; therefore, our conclusions were based on HCN being one of the several contributing factors. Leak conductance and other outward conductances, including HCN currents (Fig. 3D), collectively contribute to the positive deflections observed in the case of gap junctional synchronous inputs.
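
      The polarity reversal of the HCN current discussed here follows from the ohmic form of the current when gating is too slow to track a fast depolarization. The sketch below assumes a reversal potential of -30 mV and arbitrary conductance and open-fraction values; these numbers and the function name are illustrative assumptions, not parameters taken from the manuscript.

```python
E_H_MV = -30.0  # assumed HCN reversal potential (mV), illustrative


def hcn_current_pA(g_max_nS, open_fraction, v_mV, e_h_mV=E_H_MV):
    """Ohmic HCN current in pA; negative is inward, positive is outward."""
    return g_max_nS * open_fraction * (v_mV - e_h_mV)


# The open fraction is held fixed because HCN gating is slow relative to a spike.
open_fraction = 0.2
at_rest = hcn_current_pA(5.0, open_fraction, -65.0)       # below the reversal potential
during_spike = hcn_current_pA(5.0, open_fraction, 20.0)   # above the reversal potential
print(f"rest: {at_rest:.1f} pA, spike peak: {during_spike:.1f} pA")
```

      With the channels still partly open, the sign of the current is set entirely by whether the membrane potential lies below or above the reversal potential, so the current is inward at rest and outward near the peak of a spike.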

      We will ensure that all of these points are appropriately addressed in a revised manuscript.

      Finally, I missed an appropriate validation of the neuronal model used, and also the characterization of the effects of the in silico manipulations used on the basic behavior of the model. As far as I understand, the model in its current form has not been used in other studies. If this is the case, it would be important to demonstrate convincingly through (preferably quantitative) comparisons with experimental data using different protocols that the model captures the physiological behavior of at least the relevant compartments (in this case, the dendrites and the soma) of hippocampal pyramidal neurons sufficiently well that the results of the modeling study are relevant to the real biological system. In addition, the correct interpretation of various manipulations of the model would be strongly facilitated by investigating and discussing how the physiological properties of the model neuron are affected by these alterations.

      We thank you for raising this important point. The CA1 pyramidal neuronal model used in this study is built with ion-channel models derived from biophysical and electrophysiological recordings from these cells. As mentioned in the Methods section “Dynamics and distribution of active channels” and Supplementary Table S1, models for individual channels, their gating kinetics, and channel distributions across the somatodendritic arbor (wherever known) are all derived from their physiological equivalents. Importantly, these values were derived from previously validated models from the laboratory, which contain these very ion channel models and the exact same morphology (Roy & Narayanan, 2021). Please compare Supplementary Table S1 with Table 1 of Roy & Narayanan (2021). Please note that this model was validated against several physiological measurements along the somatodendritic axis (Fig. 1 of Roy & Narayanan, 2021).

      In a revised manuscript, we will explicitly mention this while also mentioning the different physiological properties that were used for the validation process from (Roy & Narayanan, 2021). We sincerely regret not mentioning these details in the current version of our manuscript.

      We will fix these in a revised version of the manuscript.

      References

      Bedner, P., Steinhauser, C., & Theis, M. (2012). Functional redundancy and compensation among members of gap junction protein families? Biochim Biophys Acta, 1818(8), 1971-1984. https://doi.org/10.1016/j.bbamem.2011.10.016

      Behrens, C. J., Ul Haq, R., Liotta, A., Anderson, M. L., & Heinemann, U. (2011). Nonspecific effects of the gap junction blocker mefloquine on fast hippocampal network oscillations in the adult rat in vitro. Neuroscience, 192, 11-19. https://doi.org/10.1016/j.neuroscience.2011.07.015

      Buzsaki, G., Anastassiou, C. A., & Koch, C. (2012). The origin of extracellular fields and currents--EEG, ECoG, LFP and spikes. Nat Rev Neurosci, 13(6), 407-420. https://doi.org/10.1038/nrn3241

      Einevoll, G. T., Destexhe, A., Diesmann, M., Grun, S., Jirsa, V., de Kamps, M., Migliore, M., Ness, T. V., Plesser, H. E., & Schurmann, F. (2019). The Scientific Case for Brain Simulations. Neuron, 102(4), 735-744. https://doi.org/10.1016/j.neuron.2019.03.027

      Gold, C., Henze, D. A., Koch, C., & Buzsaki, G. (2006). On the origin of the extracellular action potential waveform: A modeling study. J Neurophysiol, 95(5), 3113-3128. https://doi.org/10.1152/jn.00979.2005

      Hagen, E., Dahmen, D., Stavrinou, M. L., Linden, H., Tetzlaff, T., van Albada, S. J., Grun, S., Diesmann, M., & Einevoll, G. T. (2016). Hybrid Scheme for Modeling Local Field Potentials from Point-Neuron Networks. Cereb Cortex, 26(12), 4461-4496. https://doi.org/10.1093/cercor/bhw237

      Halnes, G., Ness, T. V., Næss, S., Hagen, E., Pettersen, K. H., & Einevoll, G. T. (2024). Electric Brain Signals: Foundations and Applications of Biophysical Modeling. Cambridge University Press. https://doi.org/10.1017/9781009039826

      Lo, C. W. (1999). Genes, gene knockouts, and mutations in the analysis of gap junctions. Dev Genet, 24(1-2), 1-4. https://doi.org/10.1002/(SICI)1520-6408(1999)24:1/2%3C1::AID-DVG1%3E3.0.CO;2-U

      Martinez-Canada, P., Ness, T. V., Einevoll, G. T., Fellin, T., & Panzeri, S. (2021). Computation of the electroencephalogram (EEG) from network models of point neurons. PLoS Comput Biol, 17(4), e1008893. https://doi.org/10.1371/journal.pcbi.1008893

      Mazzoni, A., Linden, H., Cuntz, H., Lansner, A., Panzeri, S., & Einevoll, G. T. (2015). Computing the Local Field Potential (LFP) from Integrate-and-Fire Network Models. PLoS Comput Biol, 11(12), e1004584. https://doi.org/10.1371/journal.pcbi.1004584

      Ness, T. V., Remme, M. W. H., & Einevoll, G. T. (2018). h-Type Membrane Current Shapes the Local Field Potential from Populations of Pyramidal Neurons. J Neurosci, 38(26), 6011-6024. https://doi.org/10.1523/jneurosci.3278-17.2018

      Reimann, M. W., Anastassiou, C. A., Perin, R., Hill, S. L., Markram, H., & Koch, C. (2013). A biophysically detailed model of neocortical local field potentials predicts the critical role of active membrane currents. Neuron, 79(2), 375-390. https://doi.org/10.1016/j.neuron.2013.05.023

      Rouach, N., Segal, M., Koulakoff, A., Giaume, C., & Avignone, E. (2003). Carbenoxolone blockade of neuronal network activity in culture is not mediated by an action on gap junctions. Journal of Physiology, 553(Pt 3), 729-745. https://doi.org/10.1113/jphysiol.2003.053439

      Roy, A., & Narayanan, R. (2021). Spatial information transfer in hippocampal place cells depends on trial-to-trial variability, symmetry of place-field firing, and biophysical heterogeneities. Neural Netw, 142, 636-660. https://doi.org/10.1016/j.neunet.2021.07.026

      Schomburg, E. W., Anastassiou, C. A., Buzsaki, G., & Koch, C. (2012). The spiking component of oscillatory extracellular potentials in the rat hippocampus. J Neurosci, 32(34), 11798-11811. https://doi.org/10.1523/JNEUROSCI.0656-12.2012

      Sinha, M., & Narayanan, R. (2015). HCN channels enhance spike phase coherence and regulate the phase of spikes and LFPs in the theta-frequency range. Proc Natl Acad Sci U S A, 112(17), E2207-2216. https://doi.org/10.1073/pnas.1419017112

      Sinha, M., & Narayanan, R. (2022). Active Dendrites and Local Field Potentials: Biophysical Mechanisms and Computational Explorations. Neuroscience, 489, 111-142. https://doi.org/10.1016/j.neuroscience.2021.08.035

      Sirmaur, R., & Narayanan, R. (2024). Distinct extracellular signatures of chemical and electrical synapses impinging on active dendrites differentially contribute to ripple-frequency oscillations. Society for Neuroscience annual meeting (https://www.abstractsonline.com/pp8/?_gl=1*1bxo7m*_gcl_au*MTc5MTQ0NjE0NC4xNzI3MDcwOTMw*_ga*MTMxMTE5OTcyMy4xNzI3MDcwOTMx*_ga_T09K3Q2WDN*MTcyNzA3MDkzMS4xLjEuMTcyNzA3MDkzNy41NC4wLjA.#!/20433/presentation/13949), Chicago, USA.

      Szarka, G., Balogh, M., Tengolics, A. J., Ganczer, A., Volgyi, B., & Kovacs-Oller, T. (2021). The role of gap junctions in cell death and neuromodulation in the retina. Neural Regen Res, 16(10), 1911-1920. https://doi.org/10.4103/1673-5374.308069

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, in the presence of bicarbonate, which the authors' lab recently identified as the physiological ligand. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. This solid study provides important insight into the overall structure and suggests a possible bicarbonate binding site.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling are solid. Based on the structure, the authors identify a binding pocket that might accommodate bicarbonate. Although assignment of the binding pocket is speculative, extensive mutagenesis of residues in this pocket identifies several that are important to G-protein signaling. The structure shows some conformational differences with a previous structure of this protein determined in the absence of bicarbonate (PMC11217264). To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study an important contribution to the field. However, the current study provides novel and important circumstantial evidence for the bicarbonate binding site based on mutagenesis and functional assays.

      Weaknesses:

      Bicarbonate is a challenging ligand for structural and biochemical studies, and because of experimental limitations, this study does not elucidate the exact binding site. Higher resolution structures would be required for structural identification of bicarbonate. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. However, biochemical binding assays are challenging because the binding constant is weak, in the mM range.

      The authors appropriately acknowledge the limitations of these experimental approaches, and they build a solid circumstantial case for the bicarbonate binding pocket based on extensive mutagenesis and functional analysis. However, the study does fall short of establishing the bicarbonate binding site.

      We thank the reviewer for this thoughtful and constructive assessment of our revised manuscript. We are grateful for the recognition of the overall quality of the cryo-EM structure and the proposed mechanism of G-protein coupling, as well as for highlighting the importance of identifying bicarbonate as a physiological ligand for GPR30 and the contribution this work makes to the receptor signaling field. We also appreciate the reviewer’s careful and balanced discussion of the inherent challenges posed by bicarbonate as a low-affinity, small, negatively charged ligand, and we fully agree that, given current experimental limitations, our data provide circumstantial, rather than definitive, evidence for the binding site and that higher-resolution structures would be required for direct visualization. Importantly, we value the reviewer’s acknowledgement that we transparently describe these limitations and that our extensive mutagenesis and functional analyses nonetheless build a solid case for the proposed bicarbonate-binding pocket, which we believe will serve as a useful framework for future biochemical and structural investigation.

      Reviewer #1 (Recommendations for the authors):

      Overall, the authors do a good job responding to the previous review, with updated structures and experimental data. I have two comments on the current version:

      (1) When the authors compare their structure to a previously published structure of the same receptor, they say that the previous structure came out while the current manuscript was in revision (line 255). This is not correct. The previous manuscript was published May 14, 2024, and the current manuscript was received by eLife on May 20, 2024. This sentence should be corrected to "During the preparation of this manuscript..."

      We corrected the sentence accordingly (line 259).

      (2) Line 173: what other structures are the authors referring to? Citations should be included here.

      Is Line 193 correct? We added citations (line 190).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work on Nature Communications (PMID: 38413581). In the current body of work, they solved the cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.15 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCR and also identified 3 extracellular pockets created by ECLs (Pockets A-C). Based on the polarity, location, size, and charge of each pocket, the authors hypothesized that pocket A is a good candidate for the bicarbonate binding site. To identify the bicarbonate binding site, the authors performed an exhaustive mutant analysis of the hydrophilic residues in Pocket A and analyzed receptor reactivity via calcium assay. In addition, the human GPR30-G-protein complex model also enabled the authors to elucidate the G-protein coupling mechanism of this special class A GPCR, which plays a crucial role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication, the authors used cryo-EM coupled with mutagenesis and functional studies to elucidate bicarbonate-GPR30 interaction. This work provided atomic-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 3 extracellular pockets created by ECLs (Pockets A-C). The authors were able to filter out 2 of them and hypothesized that pocket A was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they mapped out amino acids that are critical for receptor reactivity.

      Weaknesses:

      When we see a reduction of a GPCR-mediated downstream signaling, several factors could potentially contribute to this observation: 1) a reduced total expression of this receptor due to the mutation (transcription and translation issue); 2) a reduced surface expression of this receptor due to the mutation (trafficking issue); and 3) a dysfunctional receptor that doesn't signal due to the mutation. In the current revision, based on the gating strategy, the surface expression of the HA-positive WT GPR30-expressing cells is only 10.6% of the total population, while the surface expression levels of the mutants range from 1.89% (P71A) to 64.4% (D111A). Combining this information with the functional readout in Figure 3F and G, as well as their previous work, the authors concluded that mutations at P71, E115, D125, Q138, C207, D210, and H307 would decrease bicarbonate responses. Among those sites, E115, Q138, and H307 were from their previous Nature Comm paper.

      Authors claim P71 and C207 make a structural-stability contribution, as their mutations result in a significant reduction in surface expression: P71A (1.89%) and C207A (2.71%). However, compared to 10.6% of the total population in the WT, (P71A is 17.8% of the WT, and C207A is 25.6% of the WT), this doesn't rule out the possibility that the mutated receptor is also dysfunctional: at 10 mM NaHCO3, RFU of WT is ~500, RFU of P71 and C207 are ~0.

      The authors also interpret "The D125ECL1A mutant has lost its activity but is located on the surface" and only mention "D125 is unlikely to be a bicarbonate binding site, and the mutational effect could be explained due to the decreased surface expression". Again, compared to 10.6% of the total population in the WT, D125A (3.94%) is 37.2% of the WT. At 10 mM NaHCO3, the RFU of the WT is ~500, the RFU of D125 is ~0. This doesn't rule out the possibility that the mutated receptor is also dysfunctional. It is not clear why D125A didn't make it to the surface.

      Other mutants that the authors didn't mention much in their text: D111A (64.4%, 607.5% of WT surface expression), E121A (50.4%, 475.5% of WT surface expression), R122 (41.0%, 386.8% of WT surface expression), N276A (38.9%, 367.0% of WT surface expression) and E218A (24.6%, 232.1% of WT surface expression) all have similar RFU as WT, although the surface expression is about 2-6 times more. On the other hand, Q215A (3.18%, 30% of WT surface expression) has similar RFU as WT, with only a third of the receptor on the surface.
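
      The "percent of WT" figures quoted in the paragraphs above follow from dividing each mutant's HA-positive fraction by the WT value of 10.6%. The short sketch below reproduces that arithmetic using only the percentages stated in the review; the variable names are illustrative.

```python
WT_SURFACE_PCT = 10.6  # HA-positive fraction of WT-expressing cells, as quoted in the review

# Surface expression (% of total population) for each mutant, as quoted in the review.
surface_pct = {
    "P71A": 1.89, "C207A": 2.71, "Q215A": 3.18, "D125A": 3.94,
    "E218A": 24.6, "N276A": 38.9, "R122": 41.0, "E121A": 50.4, "D111A": 64.4,
}

# Express each mutant relative to WT (WT = 100%).
relative_to_wt = {name: 100.0 * pct / WT_SURFACE_PCT for name, pct in surface_pct.items()}

for name, rel in sorted(relative_to_wt.items(), key=lambda kv: kv[1]):
    print(f"{name:6s} {rel:6.1f}% of WT")
```

      Ordering the mutants this way makes the reviewer's point easier to see: the relative surface expression spans more than a thirty-fold range, from about 18% to about 608% of WT.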

      Altogether, the wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

      We sincerely thank the reviewer for their careful reading and thoughtful evaluation of our manuscript on the cryo-EM structure of the bicarbonate receptor GPR30. We greatly appreciate the reviewer’s positive assessment of the overall significance of combining structural determination with extensive mutagenesis and functional assays to advance understanding of bicarbonate–GPR30 interactions and G-protein coupling, as well as their recognition that these atomic-level insights will be valuable for future mechanistic studies and drug-development efforts. We are also grateful for the reviewer’s constructive critique regarding the interpretation of reduced signaling in the context of variable surface expression across mutants, which highlights an important point about disentangling effects of expression/trafficking from intrinsic receptor dysfunction; these comments are highly insightful and will help us strengthen the clarity and rigor of our presentation and conclusions in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      In this revision, the authors have made a significant effort to improve and validate the structural observations, as well as address the comments in the previous submission. They updated the functional assays and evaluated the receptor function by measuring intracellular calcium mobilization, which is a more direct measurement for the downstream signaling of hGPR30-Gq signaling. They also used flow cytometry with an HA-antibody for a more direct measurement of the surface expression of the receptor, replacing their previous assay that normalized to the housekeeping gene Na-K-ATPase.

      I appreciate the effort the authors made to address the previous comments made by the reviewers. However, there are still some concerns about the current data.

      (1) The authors have addressed my previous comment on untangling the mixture of their previous and new data in the "insights into bicarbonate binding" section. They have made it clear that the importance of E115, Q138, and H307 in the receptor-bicarbonate interaction was shown in their Nature Communications paper.

      (2) The authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, or referring more to their previous work about the rationale to select the bicarbonate dose in their functional assay.

      (3) The authors have updated Figure 3.

      (4) The authors have updated Supplemental Figure 1 to show the full gel with molecular weight markers in the supplemental data to demonstrate the sample purity.

      (5) The authors have updated the predicted model using AF3.

      (6) The authors added E218A as suggested before.

      Some new suggestions for this R1:

      (1) The wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

      We acknowledge this limitation. The wide range of surface expression among cell lines, together with differences in assay modalities, may introduce variability that complicates direct quantitative comparisons and therefore only partially supports the structural observations. Future work using more standardized expression systems and matched functional readouts will be important to strengthen the structure–function linkage.

      (2) Line 101, "ICL1 and ECL1 contain short α helices": no α helix of ICL1 is shown in Figure 2C.

      We removed the word “ICL1” (line 98).

      (3) For the unsolved region of ECL2, could the author put a dashed line connecting ECL2 with TM4? In the current Figure 2B, it looks like ECL2 connects TM3 and TM5.

      According to the suggestion, we corrected Figure 2B.

      (4) I appreciate that the authors updated the predicted model with AF3, but they didn't make it clear why they compared their cryo-EM structure (bicarbonate-activated, G-protein-incorporated GPR30) with the predicted AF3 model (inactive GPR30).

      We wish to emphasize the value of experimental structures over predictions alone. The experimental structure resolves features that are independent of receptor activation, such as disulfide (SS) bonds.

      (5) I appreciate that the authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, but it was still not clear to me why they picked 11 mM in Figure 3G for the bar graph. Also, since a dose-response curve was made in Figure 3F, why not just calculate and report the EC50 of NaHCO3 for each mutant?

      Thank you for the comment. We have calculated the EC50 of the calcium response and assessed its correlation with the receptors’ cell surface expression. We chose 11 mM in Fig. 3G because our previous paper in Nature Communications showed that the EC50 value in the IPs assay was around 11 mM. However, the calcium response was more sensitive and gave a lower value than expected. Therefore, following your advice, we deleted the bar graph of the 11 mM responses, calculated the EC50 values, and plotted the correlations among cell surface expression, EC50, and maximum response (Figure 3F-I, Supplementary File 1). Moreover, we revised the explanation of this mutagenesis study (lines 139-154 and 217-230).
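
      As an illustration of how an EC50 can be read from a dose-response curve, the following is a minimal sketch and not the authors' analysis. It assumes a Hill model with zero baseline, uses a coarse grid search in place of a proper nonlinear fit, and the function names (`hill`, `fit_ec50`) and all concentrations and responses are hypothetical, synthetic values.

```python
import math


def hill(conc, ec50, n):
    """Fractional Hill response (between 0 and 1) at a positive concentration."""
    return conc ** n / (ec50 ** n + conc ** n)


def fit_ec50(concs, responses, slopes=(0.5, 1.0, 1.5, 2.0, 3.0), steps=400):
    """Grid-search fit of response = emax * hill(conc, ec50, n).

    Assumption: zero baseline and strictly positive concentrations.
    For each candidate (ec50, n), the best emax has a closed form
    (linear least squares), so only ec50 and n are searched.
    """
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    ec50_grid = [10 ** (lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    best = None
    for n in slopes:
        for ec50 in ec50_grid:
            f = [hill(c, ec50, n) for c in concs]
            emax = sum(y * v for y, v in zip(responses, f)) / sum(v * v for v in f)
            sse = sum((y - emax * v) ** 2 for y, v in zip(responses, f))
            if best is None or sse < best[0]:
                best = (sse, ec50, n, emax)
    _, ec50, n, emax = best
    return {"ec50": ec50, "hill": n, "emax": emax}


# Synthetic example (hypothetical values, in mM and arbitrary response units).
concs_mM = [0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
responses = [100.0 * hill(c, 5.0, 1.0) for c in concs_mM]
fit = fit_ec50(concs_mM, responses)
```

      Reporting the fitted EC50 and maximum response per mutant, rather than the response at a single dose, avoids tying the comparison to one arbitrarily chosen concentration.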

      (6) In the previous submission and comments, E218 was in close contact with bicarbonate in the previous Figure 4D (the bicarbonate is deleted in the new structure). I thank the authors for making an E218A mutant and performing the functional assay. As mentioned above, E218A (24.6%, 232.1% of WT surface expression) has a similar functional readout as WT. Doesn't this also indicate that E218A is partially broken, so you will need twice as much as WT to have the same downstream signal?

      Thank you for your comment. In our revised manuscript, we described the relationship between cell surface expression and EC50 and found that cell surface expression and the response to bicarbonate are not correlated, as you noted in your review comment (Figure 3F-I, Supplementary File 1). Several possibilities could explain this: GPR30 localization in specific spots on the plasma membrane (PM) might limit the response stoichiometry; GPR30 might also act intracellularly and blunt the increase in response expected from higher GPR30 expression on the PM; redundant GPR30 on the PM might be non-functional; or E218A might be less functional and require twice as much receptor as WT. We will examine the cell surface expression of GPR30 and its response to bicarbonate in a future study.
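
      The kind of correlation check described here can be sketched as follows. This is an illustrative example only: it uses a Spearman rank correlation (our assumption; the response does not state which coefficient was used), the helper names are hypothetical, and the numbers are invented placeholders rather than data from the manuscript.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-mutant values: surface expression (% of WT) and EC50 (mM).
surface_pct = [100.0, 24.6, 232.1, 60.0, 150.0]
ec50_mM = [4.0, 6.5, 4.2, 3.1, 7.0]
rho = spearman(surface_pct, ec50_mM)
```

      A rank-based coefficient is a reasonable default with a handful of mutants because it does not assume a linear relationship between expression and potency.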

      I would suggest that the authors in future studies consider using the Tet-on inducible cell lines, such as HEK293 Flp-In Trex. These cell lines will allow the authors to fine-tune the surface expression of their mutants to the same level with different doses of Tetracycline in their stable cell lines.

      We appreciate your advice. We’ll introduce Tet-on inducible cell lines for future research.

      Reviewer #3 (Public review):

      Summary

      GPR30 responds to bicarbonate and plays a role in regulating cellular pH and ion homeostasis. However, the molecular basis of bicarbonate recognition by GPR30 remains unresolved. This study reports the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate, revealing mechanistic insights into its G-protein coupling. Nonetheless, the study does not identify the bicarbonate-binding site within GPR30.

      Strengths

      The work provides strong structural evidence clarifying how GPR30 engages and couples with Gq.

      Weaknesses

      Several GPR30 mutants exhibited diminished responses to bicarbonate, but their expression levels were also reduced. As a result, the mechanism by which GPR30 recognizes bicarbonate remains uncertain, leaving this aspect of the study incomplete.

      We sincerely thank the reviewer for this thoughtful and balanced assessment of our manuscript, including the clear summary of the central advance and the constructive identification of remaining limitations. We particularly appreciate the recognition that our cryo-EM analysis provides strong structural evidence for how GPR30 engages and couples with Gq, and we agree that pinpointing the bicarbonate-binding site remains a critical open question. In the revised manuscript, we will make this point more explicit, clarify the interpretation of the mutagenesis results in light of reduced receptor expression for some variants, and further strengthen the presentation and discussion of what our current data do, and do not, allow us to conclude regarding bicarbonate recognition by GPR30.

      Reviewer #3 (Recommendations for the authors):

      The authors have removed the bicarbonate assignment from their model and have addressed all of my concerns. In this study, or in future work, it would be advisable for the authors to explore the use of bicarbonate mimetics with higher binding affinity to facilitate more definitive structural characterization.

      Thank you for this constructive suggestion. We agree that exploring bicarbonate mimetics with higher binding affinity would be an important next step to enable more definitive structural characterization of GPR30 and to strengthen mechanistic conclusions. In future work, we plan to pursue the identification and/or design of such mimetics, guided by the architecture and mutational landscape of the extracellular pocket described here, and to combine these ligands with optimized cryo-EM sample preparation and complementary functional assays to better stabilize and visualize the bound state.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of the long non-coding RNA Dreg1 in regulating Gata3 expression and ILC2 development. Using Dreg1-deficient mice, the authors show a selective loss of ILC2s but not T or NK cells, suggesting a lineage-specific requirement for Dreg1. By integrating public chromatin and TF-binding datasets, they propose a Tcf1-Dreg1-Gata3 regulatory axis. The topic is relevant for understanding epigenetic regulation of ILC differentiation.

      Strengths:

      (1) Clear in vivo evidence for a lineage-specific role of Dreg1.

      (2) Comprehensive integration of genomic datasets.

      (3) Cross-species comparison linking mouse and human regulatory regions.

      Weaknesses:

      (1) Mechanistic conclusions remain correlative, relying on public data.

      We agree that the mechanistic conclusions of our study are indeed correlative, and we mention this in the discussion. The primary contribution of the study is the discovery of Dreg1's necessity for ILC2 development via the new knockout mouse model. Re-analysing good-quality publicly available data on rare cell populations is an appropriate approach and in line with DORA guidelines for ethical research.

      (2) Lack of direct chromatin or transcriptional validation of Tcf1-mediated regulation.

      The most appropriate way to examine direct Tcf1 target genes in primary cells is to examine the association of Tcf1 binding with the changes that occur in Tcf1-bound genes after Tcf7 knockout. By analysing publicly available data on ILC progenitors, we did exactly this. We revealed that Tcf1 bound to Dreg1 and that Dreg1 was not expressed when Tcf1 was knocked out in ILC progenitors. In addition, we examined H3K27ac at the Dreg1 locus in the same ILC progenitors to demonstrate that Tcf1 appears to be important for decorating the Dreg1 gene with this histone modification. We believe that this analysis is sufficient to conclude that Tcf1 is required for the expression of Dreg1 in ILC progenitors.

      (3) Human enhancer function is not experimentally confirmed.

      We agree that the potential human enhancer of GATA3 we identified has not been confirmed in human ILCs. However, a previous study showed clear evidence that this region has GATA3 enhancer activity in human T cells. Therefore, while not specific to ILC2s, the region where the DREG1 homologues lie does indeed harbour enhancer activity.

      (4) Insufficient methodological detail and limited mechanistic discussion.

      We have now made the changes suggested by the reviewer to both the methods/figure legends and also the discussion.

      Reviewer #1 (Recommendations for the authors):

      The authors generated Dreg1-deficient mice and demonstrated that loss of this locus selectively reduces ILC2s but not T or NK cells, indicating a lineage-specific requirement for Dreg1 in ILC development. By analyzing publicly available chromatin accessibility and transcription factor-binding datasets, they link Dreg1 expression to Tcf1-dependent chromatin activation and extend their findings to human data by identifying a syntenic GATA3 enhancer that produces homologous Dreg lncRNAs in ILC2s. While the study addresses an interesting question, most of the mechanistic interpretations rely heavily on publicly available datasets rather than the authors' own functional evidence. To establish causality and reinforce the overall conclusions, I provide below some comments and suggestions for additional experiments and clarifications that would considerably strengthen the manuscript.

      (1) In Figure 3, the authors use public datasets to argue that Tcf1 regulates Dreg1 expression by modulating chromatin accessibility and H3K27ac at its locus. However, since these data are derived from heterogeneous external sources, the conclusions remain associative. To better support causality, the authors should generate matched datasets from their own sorted progenitor populations and perform CUT&Tag for Tcf1 and H3K27ac in wild-type and Tcf7 knockout progenitors to directly test whether Tcf1 binding establishes an active chromatin state at Dreg1. Also, complementing this with nascent RNA or pre-mRNA quantification would link chromatin activation to transcriptional output. These experiments are technically feasible in progenitors and would substantially strengthen the claim that Tcf1 directly drives Dreg1 activation during ILC development.

      We believe that utilising publicly available data sufficiently answers this question while also adhering to ethical considerations. The ILC populations used to produce the publicly available data were akin to those we examined in our analyses, and the data were of sufficient quality. Moreover, they give us access to data from Tcf1-deficient mice. Redoing large-scale chromatin profiling on rare cell types would require hundreds of mice to achieve sufficient cell numbers. Repeating this solely for “originality” contradicts the 3Rs principles (replacement, reduction, refinement) when high-quality public data already exist, and we feel it would require years of redundant work. In addition, we believe the fact that the data derive from heterogeneous external sources, yet align well, only strengthens our conclusions. We have now added mention of our use of publicly available data in the discussion.

      (2) In Figure 4, the authors provide correlative evidence from public datasets suggesting that the human region syntenic to the murine Dreg1 locus acts as a distal enhancer of GATA3 and gives rise to two ILC2-specific lncRNAs. To substantiate this claim, the authors should perform CUT&Tag for H3K27ac in human ILC2s to confirm enhancer activation and use 3C or HiChIP to demonstrate physical interaction with the GATA3 promoter. These experiments should be doable by fusing pooled ILC2 samples and would provide more direct evidence that this region actively regulates GATA3 expression.

      Assessing the activity of a distal enhancer region on its target gene in primary human cells is extremely difficult, owing to a number of technical and biological complications such as enhancer redundancy. This is why we chose to reanalyse an extensive enhancer deletion screen performed in human T cells by Chen et al., AJHG 2023. This analysis clearly showed that deletion of the region we identified as harbouring Dreg1 homologues affected GATA3 expression, thus confirming its enhancer activity. While we agree with the reviewer that specific profiling of human ILC populations for H3K27ac and 3D genome architecture would provide further correlative evidence, this would be a time-consuming and costly endeavour with human material, and ultimately the definitive proof in ILCs would require specific deletion of this region in ILC2s. We have mentioned this caveat in the discussion.

      (3) Several figure legends lack essential methodological details. Figure 1 should specify how NK and ILC populations were gated, including intermediate steps and markers used. The same applies to Supplementary Figure 1, and particularly to Supplementary Figure 2, where gating strategies for progenitors are shown but not explained. Figure 2 should also indicate that these analyses were performed in bone marrow. Clearer legends are crucial for interpreting and reproducing the data.

      We have made the suggested changes.

      (4) It is also unclear throughout the manuscript whether the authors performed any ATACseq experiments themselves or relied entirely on public datasets. This information should be stated explicitly in the main text and figure legends, not only in the Methods section. Similarly, the source of the ChIPseq or CUT&Run datasets should be clearly indicated alongside the relevant figures.

      We apologise for not making this clearer and have now stated explicitly in the text where the data were publicly available.

      (5) As the authors themselves suggest, performing experiments that selectively suppress Dreg1 transcription using antisense oligonucleotides or CRISPR interference at the Dreg1 promoter would provide more valuable mechanistic insights. Conducting these experiments in their own system would allow them to determine whether Dreg1 functions through its RNA product or as a DNA enhancer element, thereby strengthening the causal link between Dreg1 activity and Gata3 regulation.

      We agree with the reviewer; however, in our opinion this is beyond the scope of this manuscript. The strength of this manuscript lies in the findings from the novel Dreg1 knockout mouse strain. Future studies will focus on understanding how Dreg1 influences Gata3 expression.

      (6) The discussion would benefit from a clearer and more integrated explanation of how Dreg1 fits into the transcriptional network that controls ILC2 differentiation. The authors could elaborate on whether Dreg1 fine-tunes Gata3 expression or functions as part of a regulatory loop with Tcf1, and better explain how this mechanism might be conserved in humans. In addition, the authors should explicitly acknowledge the limitations of relying on publicly available datasets and emphasize the need for direct experimental validation to support their mechanistic interpretation.

      We have now made these suggested inclusions.

      Reviewer #2 (Public review):

      The authors investigate the role of the long non-coding RNA Dreg1 for the development, differentiation, or maintenance of group 2 ILC (ILC2). Dreg1 is encoded close to the Gata3 locus, a transcription factor implicated in the differentiation of T cells and ILC, and in particular of type 2 immune cells (i.e., Th2 cells and ILC2). The center of the paper is the generation of a Dreg1-deficient mouse. While Dreg1-/- mice did not show any profound ab T or gd T cell, ILC1, ILC3, and NK cell phenotypes, ILC2 frequencies were reduced in various organs tested (small intestine, lung, visceral adipose tissue). In the bone marrow, immature ILC2 or ILC2 progenitors were reduced, whereas a common ILC progenitor was overrepresented, suggesting a differentiation block. Using ATAC-seq, the authors find that the promoter of Dreg1 is open in early lymphoid progenitors, and the acquisition of chromatin accessibility downstream correlates with increased Dreg1 expression in ILC2 progenitors. Examining publicly available Tcf1 CUT&Run data, they find that Tcf1 was specifically bound to the accessible sites of the Dreg1 locus in early innate lymphoid progenitors. Finally, the syntenic region in the human genome contains two non-coding RNA genes with an expression pattern resembling mouse Dreg1.

      The topic of the manuscript is interesting. However, there are various limitations that are summarized below.

      (1) The authors generated a new mouse model. The strategy should be better described, including the genetic background of the initially microinjected material. How many generations was the targeted offspring backcrossed to C57BL/6J?

      The mice were backcrossed for at least 2 generations to C57BL/6. This information is now included in the methods section.

      (2) The data is obtained from mice in which the Dreg1 gene is deleted in all cells. A cell-intrinsic role of Dreg1 in ILC2 has not been demonstrated. It should be shown that Dreg1 is required in ILC2 and their progenitors.

      We now provide new mixed bone marrow irradiation chimera data that shows that the effect is intrinsic to Dreg1-deficient ILC2 cells (Figure 1F and Supplementary Figure 1E-G).

      (3) The data on how Dreg1 contributes to the differentiation and or maintenance of ILC2 is not addressed at a very definitive level. Does Dreg1 affect Gata3 expression, mRNA stability, or turnover in ILC2? Previous work of the authors indicated that knockdown of Dreg1 does not affect Gata3 expression (PMID: 32970351).

      We have indeed shown that Dreg1-deficient ILC2P have reduced levels of Gata3 (Figure 2H); however, we have not determined the exact mechanisms by which Dreg1 controls ILC2 development.

      (4) How Dreg1 exactly affects ILC2 differentiation remains unclear.

      We agree with the reviewer; however, this article is focused on the first description of the Dreg1 knockout mice and the surprisingly specific effect on ILC2 development.

      Reviewer #2 (Recommendations for the authors):

      (1) Relating to point 2 of public review:

      It should be shown that Dreg1 is required in ILC2 and their progenitors. Mixed bone marrow chimeras would be an adequate strategy.

      We have now done this and clearly showed that the effect is intrinsic to Dreg1-deficient ILC2s.

      (2) Relating to point 3 of public review:

      Minimally, Gata3 expression should be analyzed in ILC2, ILC2P, and the ILC progenitors by qRT-PCR and antibody stain.

      We have indeed shown reduced Gata3 levels by antibody stain in Figure 2H.

      (3) Relating to point 4 of public review:

      The manuscript would benefit from additional data studying ILC2 differentiation in (competitive) adoptive transfer experiments or using in vitro differentiation assays.

      We have performed the mixed bone marrow chimera experiments, which test the competitiveness of Dreg1-deficient bone marrow against wild-type control bone marrow. In this case, the WT ILC2s outcompeted the Dreg1-deficient ILC2s for the same niche.

    1. Author response:

      eLife Assessment

      This valuable study reports a spatiotemporal atlas of mouse placental development and explores the role of glycogen trophoblast cells in fetal viability. Solid data are presented to support the main conclusion. This work will be of great interest to developmental and reproductive biologists.

      We thank the editors for this positive and balanced assessment of our study. We are encouraged that the spatiotemporal mouse placental atlas and the functional analysis of glycogen trophoblast cells were considered valuable, and that the data were viewed as providing solid support for the main conclusions.

      In the revised manuscript, we will further clarify the scope of these conclusions, particularly regarding the contribution of GC-associated glycogen metabolism to fetal viability in the global Ano6 knockout model. We will also refine the wording where needed to ensure that the mechanistic interpretation accurately reflects the strength of the available evidence.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors combine single-nucleus RNA sequencing with spatial transcriptomics to generate a spatiotemporal atlas of mouse placental development and explore the role of glycogen trophoblast cells in fetal viability. The study integrates several computational approaches, including trajectory analysis, regulatory network inference, and spatial mapping, together with histology and glycogen measurements. Based on these analyses, the authors propose that glycogen trophoblast cells provide metabolic support that is important for maintaining placental function and fetal survival.

      One of the main strengths of the study is the quality and scope of the dataset. The integration of snRNA-seq with Stereo-seq spatial transcriptomics provides a detailed view of placental organization across regions and developmental stages. This type of combined spatial and transcriptional analysis is still relatively rare in placental biology and represents an important contribution to the field. The atlas itself will likely be a valuable resource for future studies.

      Another strength is the effort to connect transcriptional findings with tissue-level validation. The glycogen staining and biochemical measurements support the interpretation that glycogen trophoblast cells contribute to placental metabolic function. The spatial analyses identifying macrophage accumulation in the labyrinth region of mutant placentas are also interesting and illustrate how spatial approaches can reveal microenvironmental changes that are difficult to detect otherwise.

      The main limitation of the study is that the conclusion that glycogen cells are essential mediators of metabolic support for fetal viability remains partly indirect. The transcriptomic and spatial data strongly suggest a role for these cells, but it is still difficult to determine whether glycogen cell dysfunction is the primary cause of fetal lethality or a consequence of broader placental abnormalities. Clarifying this point would strengthen the central message of the paper.

      Similarly, the macrophage accumulation observed in the labyrinth appears consistent with a response to tissue stress or injury, but its relationship to glycogen cell function is not fully explained. A clearer discussion of whether this represents a primary mechanism or a secondary effect would improve the interpretation.

      Overall, this is a strong dataset and a useful spatial atlas of placental development. The study provides convincing descriptive insight into glycogen trophoblast biology, and with some clarification of the mechanistic conclusions, the manuscript will be even stronger.

      We thank the reviewer for this constructive assessment of our manuscript. We are pleased that the reviewer recognized the quality and scope of the dataset, particularly the integration of snRNA sequencing with Stereo-seq spatial transcriptomics to generate a spatiotemporal atlas of mouse placental development. We also appreciate the reviewer’s view that this atlas represents a valuable resource for the placental biology and developmental biology communities. We also appreciate the reviewer’s important point that the causal relationship between glycogen trophoblast cell dysfunction, placental metabolic impairment, and fetal viability should be presented with appropriate caution. In the revised manuscript, we will clarify that our data support a strong association between impaired glycogen trophoblast cell function, altered placental glycogen metabolism, and fetal lethality in the global Ano6 knockout model, but do not by themselves establish glycogen trophoblast dysfunction as the sole or primary cause of fetal loss. We will revise the relevant sections to avoid overstatement and to distinguish more clearly between direct experimental evidence, correlative spatial-transcriptomic observations, and mechanistic interpretation. Similarly, we agree that the macrophage accumulation observed in the labyrinth region is most appropriately interpreted as a spatially localized immune or tissue-stress response in the mutant placenta. In the revised manuscript, we will expand the discussion to clarify that, while this observation may reflect downstream consequences of placental dysfunction and altered tissue homeostasis, the current data do not establish macrophage accumulation as a primary mechanism linking glycogen trophoblast defects to fetal lethality. We will therefore frame this finding as an important microenvironmental alteration revealed by the spatial atlas, rather than as definitive evidence of a direct causal pathway.

      Reviewer #2 (Public review):

      This manuscript constructs a spatiotemporal transcriptomic atlas (STAMP) of the mouse placenta from E9.5-E18.5 by integrating Stereo-seq and snRNA-seq, and identifies two glycogen trophoblast cell (GC) subtypes (GC-1 and GC-2), a spatial transition from the junctional zone (JZ) to the decidua, and metabolic defects in Ano6-null placentas including GC persistence, glycogen accumulation, reduced glycogenolysis metabolites, and partial rescue by maternal glucose supplementation. The breadth of the dataset and the integration of atlas construction with PAS/TEM/LC-MS analyses are impressive, and the study has the potential to provide a valuable resource for the placental biology community.

      However, in its current form, the central claim that "GC-mediated metabolic support is essential/indispensable for fetal viability" is not sufficiently disentangled from the complex phenotype of a global Ano6 knockout model. In addition, the stage-level biological replication in the atlas and the claim of "single-cell resolution" require more careful presentation. Therefore, while the study is interesting and potentially impactful, substantial revisions are required, particularly to recalibrate the strength of the conclusions and causal interpretations.

      Major comments

      (1) The most significant concern is that the manuscript overinterprets the phenotype observed in a global Ano6 knockout as direct evidence that GC glycogen metabolism is essential for fetal viability. The authors themselves report multiple severe placental abnormalities in the knockout, including reduced placental size and weight, structural defects in the labyrinth, impaired vascularization, and accumulation of abnormal regions. Previous studies cited in the manuscript also indicate that Ano6 deficiency leads to defects in syncytiotrophoblast formation, impaired maternofetal exchange, and perinatal lethality.

      In this context, the current data support an association between GC metabolic defects and fetal lethality, but do not establish that GC glycogen metabolism is the primary causal driver. The conclusion should therefore be moderated (e.g., "contributes to" rather than "is essential for"), unless additional placenta-specific or GC-specific functional validation is provided.

      (2) Maternal glucose supplementation is an interesting functional experiment, but in its current form, it provides supportive rather than definitive mechanistic evidence. While survival improves (from ~3% to ~10%), the rescue remains partial. Moreover, the readouts are largely limited to metabolite restoration (glucose, G1P, G6P) in the placenta and fetal liver.

      To support a stronger causal claim, the authors should assess whether glucose supplementation also rescues: placental morphology (especially labyrinth structure), GC number and PAS staining, ultrastructural glycogen features (TEM), fetal growth and developmental outcomes.

      (3) The atlas is constructed from nine placentas across developmental stages, suggesting limited biological replication per stage. It remains unclear how robust the observed temporal trends are to litter effects, sex differences, or sectioning variability.

      Furthermore, the "single-cell resolution" is not directly measured but inferred via image segmentation and reference-based mapping (e.g., TACCO). This should be more explicitly stated, as it represents computational inference rather than direct single-cell measurement.

      The authors should:

      - clearly report biological replicates per stage (including litter and sex),

      - demonstrate reproducibility of key patterns across independent samples,

      - refine the wording to reflect segmentation- and reference-based single-cell inference.

      (4) The proposed developmental trajectory (JZ progenitor → GC precursor → GC-1 → GC-2) and the claim of GC migration from JZ to decidua are based on spatial distribution and computational trajectory analyses (Monocle, CytoTRACE).

      While this is a compelling model, it remains inferential. The language throughout the manuscript should be softened (e.g., "consistent with spatial transition" rather than "migration"). Ideally, additional experimental validation, such as stage-resolved RNAscope/immunostaining quantification or lineage tracing, would strengthen this claim.

      (5) The manuscript concludes that ANO6 deficiency leads to impaired glycogen utilization, based primarily on the observation that differentiation markers and glycogenolytic enzyme transcripts are unchanged.

      However, this demonstrates what is not altered rather than what is mechanistically responsible for the defect. A more direct mechanistic link is needed, such as changes in enzyme activity, altered intracellular localization, effects on ion homeostasis or membrane biology.

      (6) The statistical framework requires clarification. Several analyses use n = 4-8 placentas or "independent experiments," but it is unclear whether these represent independent litters or multiple samples from the same dam.

      Given the risk of pseudoreplication in placental studies, the authors should define whether n refers to placentas or litters, report the number of dams per genotype, and ensure appropriate statistical treatment (e.g., litter-based analysis or mixed-effects models).
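
      To make the litter-based option concrete, a minimal sketch is given below. It illustrates only the simpler of the two approaches named above (collapsing placentas to one mean per dam before comparing genotypes), not a mixed-effects model; the function names, dam identifiers, and measurements are hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance


def litter_means(records):
    """Collapse (genotype, dam_id, value) records to one mean per dam.

    Returns {genotype: [per-dam means]}, so that the dam (litter),
    not the individual placenta, is the unit of analysis.
    """
    by_dam = defaultdict(list)
    for genotype, dam_id, value in records:
        by_dam[(genotype, dam_id)].append(value)
    out = defaultdict(list)
    for (genotype, _dam_id), values in sorted(by_dam.items()):
        out[genotype].append(mean(values))
    return dict(out)


def welch_t(a, b):
    """Welch's t statistic for two groups with at least two values each."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


# Hypothetical placental glycogen measurements (arbitrary units).
records = [
    ("WT", "dam1", 10.0), ("WT", "dam1", 12.0), ("WT", "dam2", 14.0),
    ("KO", "dam3", 6.0), ("KO", "dam3", 8.0), ("KO", "dam4", 9.0),
]
per_dam = litter_means(records)
t_stat = welch_t(per_dam["WT"], per_dam["KO"])
```

      With this layout, n is the number of dams per genotype, which is the quantity the comment asks the authors to report.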

      We thank the Reviewer for the careful evaluation of our manuscript and for recognizing the breadth of the STAMP dataset and the value of integrating spatial transcriptomics, snRNA-seq, PAS, TEM and LC-MS analyses.

      We agree that the current manuscript overstates some mechanistic conclusions. In the revision, we will moderate the central claim and more clearly acknowledge that the global Ano6 knockout model has complex placental defects.

      Comment 1: Causality in the global Ano6 knockout model

      We agree that our current data do not prove that GC glycogen metabolism is the primary cause of fetal lethality in the global Ano6 knockout model. In the revised manuscript, we will avoid presenting GC dysfunction as the sole causal mechanism. We will replace stronger terms such as “essential” or “indispensable” with more measured wording such as “contributes to” or “supports.” We will frame impaired GC-associated glycogen metabolism as one important component of Ano6-null placental dysfunction.

      Comment 2: Maternal glucose supplementation

      We agree that maternal glucose supplementation provides supportive, but not definitive, mechanistic evidence. In the revision, we will describe the partial survival rescue more cautiously and will not use it as proof of GC-specific causality. Where possible, we will also assess whether glucose supplementation affects additional phenotypes, including fetal growth, placental morphology, GC abundance and PAS/glycogen readouts.

      Comment 3: Biological replication and single-cell resolution

      We agree that the replication structure and the wording of “single-cell resolution” need clarification. We will report the number of placentas, litters and available sex information for each stage. We will also revise the wording to make clear that the spatial single-cell annotation is based on image segmentation and snRNA-seq reference mapping, rather than direct single-cell measurement by Stereo-seq alone.

      Comment 4: GC trajectory and spatial transition

      We agree that the proposed GC trajectory and JZ-to-decidua transition remain inferential. We will soften the language throughout the manuscript, using terms such as “spatial transition,” “redistribution,” or “consistent with migration” rather than stating that migration has been directly proven.

      Comment 5: Mechanism of impaired glycogen utilization

      We agree that unchanged GC markers and glycogenolytic enzyme transcripts do not reveal the direct mechanism. In the revision, we will state more clearly that these data argue against gross GC differentiation defects or transcriptional loss of glycogenolytic enzymes, but that the direct mechanism may involve enzyme activity, localization, ion homeostasis or ANO6-dependent membrane biology.

      Comment 6: Statistical framework

      We agree that the statistical framework needs clearer reporting. We will define what each n represents, including placenta, section, litter, dam or independent experiment, and will revise the analysis or description where needed to minimize concerns about pseudoreplication.
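
      One common way to limit pseudoreplication is to average placentas within each litter and then treat the litter as the unit of replication. The sketch below illustrates only that idea: the records, the dam labels and the helper names `litter_means` and `summarize` are hypothetical and are not taken from the manuscript or its analysis code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (genotype, litter/dam id, per-placenta measurement).
records = [
    ("WT", "dam1", 10.0), ("WT", "dam1", 12.0), ("WT", "dam2", 11.0),
    ("KO", "dam3", 6.0), ("KO", "dam3", 8.0), ("KO", "dam4", 9.0),
]

def litter_means(rows):
    """Collapse placenta-level values to one mean per (genotype, litter)."""
    grouped = defaultdict(list)
    for genotype, litter, value in rows:
        grouped[(genotype, litter)].append(value)
    return {key: mean(values) for key, values in grouped.items()}

def summarize(rows):
    """Return {genotype: (number of litters, mean of litter means)}."""
    by_genotype = defaultdict(list)
    for (genotype, _litter), litter_mean in litter_means(rows).items():
        by_genotype[genotype].append(litter_mean)
    return {g: (len(ms), mean(ms)) for g, ms in by_genotype.items()}

summary = summarize(records)
print(summary)
```

      Here n counts litters rather than placentas, so several placentas from one dam cannot inflate the apparent sample size. A mixed-effects model with litter as a random effect would be an alternative that keeps the placenta-level data.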

      Overall, we appreciate these comments and will use them to make the revised manuscript more precise, transparent and appropriately cautious.

  3. Apr 2026
    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), designed to overcome severe focus drift, a major challenge in long-term time-lapse microscopy. Using this method, they address a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Through single-cell analysis, the authors uncover a multi-phase adaptation process with distinct growth deceleration dynamics, and show that bacterial cells adapt to cold shock in a largely uniform manner across the population. Overall, this work provides new insights into the bacterial cold shock response at the single-cell level, extending beyond what can be inferred from population-level measurements.

      Strengths:

      (1) The LUNA method shows improved performance compared to existing autofocusing systems, achieving nanoscale precision over a large focusing range. Its focusing speed is sufficient for the experiments presented, with potential for further improvement through faster motors and optimized control algorithms, suggesting broad applicability. Theoretical simulations and experimental validation together provide strong support for the method's robustness.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division during the acclimation phase following cold shock. Single-cell analyses across the full course of cold adaptation reveal features that are obscured in bulk-culture studies. Cells continue to grow and divide at reduced rates while maintaining cell size regulation, and exhibit a three-phase adaptation program with distinct growth dynamics. This response appears uniform across the population, with no evidence for bet-hedging. Overall, the experiments are well designed, and the analyses are solid and support the authors' conclusions.

      (3) The authors further propose a model describing how population-level optical density (OD) depends on cell dry mass density, volume, and concentration. Following cold shock, cells grow more slowly and exhibit smaller sizes, explaining the apparently unchanged OD. This model provides a valuable conceptual framework for interpreting OD-based growth measurements, a widely used method in microbiology, and will be of broad interest to the field.

      Weaknesses:

      No major weaknesses identified.

      Comments on revisions:

      The authors have thoroughly addressed all of my questions. I thank them for their clear clarifications and thoughtful revisions, and I greatly appreciate their efforts in improving the manuscript.

      We sincerely thank the reviewer for the encouraging comments and positive assessment. We greatly appreciate the reviewer’s constructive feedback during the review process, which helped us improve the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      Comments on revisions:

      The authors have addressed my comments in their response, but have chosen not to incorporate most of them into the manuscript. Readers may refer to the peer review section for further details.

      We thank the reviewer for these additional comments and careful suggestions, and we appreciate that the points raised are valuable for a broader discussion of the topic. In the revised manuscript, we have incorporated the comments most directly relevant to the scope and central conclusions of the study, and have clarified these points in the text where appropriate. Specifically, we have clarified several key issues, including the interpretation of the OD lag as a “combined effect,” the performance and application scope of LUNA, the alignment of cell-cycle progression after cold shock, and relevant methodological details.

      For the remaining contextual issues, we have kept the detailed discussion in the response to reviewers rather than expanding the manuscript extensively, so as to preserve the focus and readability of the main text. We hope that the revisions now better acknowledge the reviewer’s concerns while maintaining a concise presentation of the central findings.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37˚C to 14˚C, where the model agrees well with experimental data. However, in the 37˚C to 12˚C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      Thank you for this careful evaluation regarding the model generality. In the experiment with a temperature shift from 37°C to 12°C, the measured OD600 values were 0.243 at 0 hours and 0.242 at 5 hours. In comparison, our model-computed OD600 values were 0.243 at 0 hours and 0.271 at 5 hours. The absolute difference between the measured and computed values at 5 hours is therefore approximately 0.03 (0.271 versus 0.242).

      Given the typical experimental variability in OD600 measurements and the limited linear range of the OD-to-biomass approximation (generally considered reliable below ~0.5), this deviation is quantitatively modest. We appreciate your valuable feedback and are happy to provide further clarification if needed.

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      We appreciate your critical reading, which has helped us identify ambiguities in our terminology and strengthen the clarity of our work. Regarding the term “synchronization”, we would like to clarify that it refers to two different scenarios: (i) the synchrony in the timing of growth rate changes after cold shock. The cells initiate the slowdown in growth almost simultaneously, suggesting a highly coordinated, non-stochastic population-level response to cold shock; (ii) the synchrony in division cycle progression.

      In the sentence you referenced “cells encountering a relatively late CSR will accelerate divisions to maintain synchronization”, we intended to describe that cells maintain consistent progression of the division cycle after cold shock, meaning that after the same number of elapsed cycles, different cells are at a similar stage in their division timing (Figure 4f, 4g, Figure S14). The term “accelerate” refers to our observation that cells which complete a given cycle later than others tend to have shorter subsequent inter-division intervals, thereby “catching up” to maintain alignment in cycle number across the population. We acknowledge that using “synchronization” in this scenario may be ambiguous, and we will replace it with more precise phrasing “progression of division cycle” to accurately convey this finding.
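
      The "catching up" behavior described above amounts to a negative correlation between how late a cell completes one cycle and the length of its next inter-division interval. The sketch below shows how such a relationship could be checked. The numbers are made up for illustration (they are not data from Figure 4f, 4g or Figure S14), and `pearson_r` is a hypothetical helper written here rather than part of the authors' analysis.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

# Hypothetical values, in minutes.
# delay: completion time of cycle k relative to the population mean.
# next_interval: duration of cycle k+1 for the same cell.
delay = [-2.0, -1.0, 0.0, 1.0, 2.0]
next_interval = [63.0, 60.0, 60.0, 59.0, 58.0]

r = pearson_r(delay, next_interval)
print(f"r = {r:.3f}")  # a negative r is consistent with late cells catching up
```

      A negative coefficient on real lineage data would support the compensation described in the response, although it would not by itself identify the mechanism.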

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

      Thank you for your valuable comments and suggestions. In response, we have added more detailed descriptions in the Methods section of the revised version.

      The reviewer noted that the reported average focusing time (~0.6 s) lacks sufficient context, which may limit readers’ ability to assess its significance relative to existing autofocusing methods. We would like to clarify that the core innovation of this work lies in the proposed theoretical framework for autofocusing, which offers advantages over existing methods in terms of focusing precision and range. While focusing time is a practically relevant performance metric, it is presented here as an implementation-dependent parameter rather than a central theoretical contribution of this study. In our experimental setup, an average focusing time of 0.6 s proved sufficient for routine time-lapse imaging in microscopy, thereby demonstrating the practical usability of LUNA.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      Thank you for your insightful comment regarding the comparison of LUNA with other autofocus methods.

      In our study, we primarily compared LUNA with the Nikon PFS system (as shown in Video S1) because Nikon PFS is one of the most widely used commercial autofocus systems in single-cell time-lapse imaging, and its manufacturer provides well-defined performance parameters (e.g., focusing precision within 1/3 depth-of-focus, response time <0.7 s), which facilitates a quantitative comparison. For other commercial systems, such as Olympus ZDC, Zeiss Definite Focus, Leica AFC, and ASI CRISP, the publicly available specifications are often less clearly defined, or are measured under inconsistent conditions, making a direct head-to-head comparison challenging and potentially misleading. Additionally, in our preliminary experiments, we also tested an Olympus microscope and observed severe focus drift during slow cooling processes. From a physical perspective, LUNA is specifically designed to meet the demanding requirements of single-cell experiments, including a wide focusing range and high precision, while existing commercial systems may not physically achieve the combination of range and accuracy needed for such extreme conditions.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      We agree that such approaches would provide valuable mechanistic insights and further strengthen the validation of the model presented in this study. In the current work, our primary goal was to introduce the LUNA autofocusing method and demonstrate its capability to resolve the bacterial cold shock response at the single-cell level with unprecedented precision. As such, we focused on characterizing the wild-type physiological dynamics under cold shock, which already revealed several previously unreported phenomena. We acknowledge that the use of genetic mutants or chemical inhibitors targeting specific cold shock proteins or regulatory pathways would be a logical and powerful next step to dissect the underlying molecular mechanisms and test the causality of the observed growth dynamics. We plan to address this in future work by incorporating such perturbations to further test and refine the model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      The reviewer raises a pertinent question regarding whether the observed high degree of cell synchronization represents an intrinsic biological phenomenon or an artifact induced by the microfluidic environment.

      Over the past decade, microfluidic chips, including the specific design used in our work, have become a widely accepted and powerful tool in microbial physiology research. A broad consensus has emerged within the community that the microenvironment within these microchannels does not significantly perturb the natural physiological behavior of microorganisms (Dusny, C. & Grünberger, Curr Opin Biotechnol. 63, 26-33 (2020)). This understanding is also supported by the fact that key findings obtained with microfluidic single-cell technologies are reproducible by other methods. For example, the adder model of cell-size homeostasis in E. coli, first observed in microfluidic chips, has been repeatedly validated by different methods (Taheri-Araghi, S. et al. Curr. Biol. 25, 385-391 (2015)). Therefore, while we acknowledge the importance of considering environmental effects, we are confident that the synchronization we report reflects the genuine biological dynamics of E. coli cells.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      We thank the reviewer for this thoughtful suggestion. In designing our experiments, we aimed to study the bacterial cold shock response at the single-cell level. A key feature of this response is that it is typically triggered only when the temperature drops below a certain threshold within a short time duration. We therefore chose to lower the temperature from 37 °C to 14 °C as rapidly as possible. This approach allowed us to leverage the unique capabilities of LUNA while also providing an opportunity to explore this biological process in greater detail.

      We agree that investigating bacterial responses across intermediate temperatures would be highly informative for understanding how temperature changes affect cellular physiology. However, this direction addresses a distinct scientific question that lies beyond the scope of the current work. We fully acknowledge its value and intend to explore it in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major points:

      (1) To strengthen the generality of the conclusions regarding cold shock response, it would be helpful to include a similar single-cell analysis of growth and division (cell size and concentration) for the 37˚C to 12˚C temperature shift. In this case, the experimental acclimation lasts ~5 hours, whereas the model predicts ~2 hours (Fig. S16a). Examining whether the model still holds or whether additional factors (e.g., further reductions in cell size) contribute to the observed OD decrease would clarify this discrepancy.

      We thank the reviewer for this valuable suggestion. Our model for explaining the population-level OD dynamics during acclimation does not depend on single-cell time-lapse microscopy data. Instead, the single-cell inputs used for parameterization were obtained from flow cytometry measurements, which quantify population-wide single-cell distributions. Therefore, the model is not intrinsically restricted to a specific imaging-based experimental setup or to a particular temperature shift.

      Most of the quantitative analysis presented in the manuscript focuses on the 37°C to 14°C transition, where the model shows strong agreement with experimental OD measurements. We selected this condition because it provides high-quality, internally consistent datasets at both the single-cell and population levels. However, the modeling framework itself is mechanistic and parameter-based, rather than temperature-specific. In principle, it can be applied to other temperature shifts, provided that the corresponding single-cell growth and state-transition parameters are experimentally determined.

      Regarding the temperature shift from 37°C to 12°C, the model demonstrates good agreement with the experimental observation that acclimation lasts approximately 5 hours. The minor deviations in several data points during the acclimation period can be attributed to systematic errors in the measurement of cell concentration and volume, as illustrated in the lower panel of Figure S16a. We are open to extending our analysis to additional temperature shifts in future work to further validate the model’s generality.

      (2) Related to weakness #2, it would be helpful for the authors to clarify their definition of "synchronization" and to provide additional explanation or evidence supporting this claim. In particular, further discussion of the data in Fig. 4f, 4g, and S14 could help strengthen the proposed hypothesis.

      We thank the reviewer for this constructive suggestion. In our previous response (public review weakness #2), we clarified the definition of “synchronization” in the revised manuscript by explicitly distinguishing between two types of synchrony: (i) the synchrony in the timing of growth rate changes after cold shock, and (ii) the synchrony in division cycle progression. For the latter, we now use the more precise term “progression of division cycle” to avoid ambiguity. Furthermore, we have expanded the discussion of the data in Figures 4f, 4g, and S14 to better support the claim that cells actively maintain alignment in cycle progression. We hope these revisions address the reviewer’s concern and strengthen the evidence for our hypothesis.

      Minor points:

      (1) Line 78: "... and concluded that the OD lag is actually the outcome of the synergy of changes in bacterial concentration and volume, ..." The term synergy usually implies a combined effect greater than the sum of individual effects. Are the changes in bacterial concentration and volume synergistic here?

      We agree with your observation that the term "synergy" in scientific contexts typically implies an interaction effect that is greater than the sum of individual effects. In our original phrasing, we intended to convey that the observed OD lag is a result of the combined contributions from both changes in bacterial concentration and changes in cell volume, rather than being dominated by a single factor. We did not mean to imply a super-additive interaction between these two variables.

      We acknowledge that the relationship between bacterial concentration and cell volume can be complex and may even exhibit interdependence under certain conditions (e.g., under nutrient limitation at high OD). However, using "synergy" could indeed be misleading. To ensure terminological precision and avoid any potential misinterpretation, we will revise the text in the revised manuscript. We will replace "synergy" with a more neutral and accurate phrase "combined effect".

      (2) Figure 2d: Why does the focusing time increase even after temperature stabilizes following the downshift? Does focus drift depend not only on rapid cooling but also on the lower steady-state temperature? Additional explanation would be helpful.

      As noted in the Methods section ("Time-lapse imaging of bacteria under CS"), when the temperature was lowered, the objective lens heater was switched off, which slightly lengthened the focusing time. Before the temperature downshift, the objective heater maintained the objective at a temperature close to that of the sample (37°C), minimizing any thermal gradient between them. After the temperature decrease to 14°C, while the sample chamber was precisely controlled at the target low temperature, the objective lens, now without active heating, gradually equilibrated to ambient room temperature (approximately 22–25°C). This created a stable temperature mismatch between the relatively warmer objective and the colder sample. Such a temperature gradient can cause minor thermal expansion or contraction of the objective lens barrel, leading to a small but persistent shift in the focal plane. Consequently, the focusing time remained slightly elevated (∼0.6 s) compared to the 37°C condition (∼0.3 s), even after the sample temperature had stabilized. This offset reflects the steady-state thermal disequilibrium between the objective and the sample, rather than a transient cooling effect. We hope this explanation addresses the reviewer’s concern.

      (3) Line 234: "Reanalysis of the protein synthesis dynamics after CS revealed increase in CSPs synthesis (Figure 3e)." A citation is needed here. Additionally, the dataset referenced here was generated using a 37˚C to 10˚C cold shock.

      We thank the reviewer for the insightful comments and the careful reading of our manuscript. We have now added the appropriate citation in the main text (Zhang, Y. et al. Molecular Cell 70, 274–286 (2018)). The dataset used in this reanalysis was generated under a 37°C to 10°C cold shock, rather than 12°C, and we have clarified this in the Methods section to avoid any ambiguity.

      We would also like to clarify our rationale for using this published dataset in the present context. To our knowledge, no published dataset exists with comparable protein synthesis dynamics specifically at 12°C. Our intention here was to reference a well-characterized cold-shock dataset to support the qualitative point that CSP synthesis increases and ribosome synthesis decreases after cold shock. In cold shock studies, many qualitative conclusions are broadly consistent across low-temperature conditions (e.g., below ~15°C, and in some cases more broadly below ~20°C), including the observation that the ribosomal protein fraction is relatively insensitive to temperature change (Herendeen, S. L. et al. Journal of Bacteriology. 139, 185–194 (1979), Knapp, B. D. & Huang, K. C. Annual Review of Biophysics. 51, 499–526 (2022)). We appreciate the reviewer’s valuable feedback, which has helped us improve the clarity and accuracy of our work.

      (4) Figure 3f and 3g: How is growth rate defined here, and why do the elongation rate and growth rate yield different results? My understanding is that, during steady-state growth, cell elongation rate increases as cells progress through a single cell cycle prior to division, whereas G0 cells exhibit reduced elongation rate following cold shock. Is this correct? More explanation is also needed for "linear growth in growth mode" (Line 267).

      Thank you for this important comment. In our manuscript, we use:

      Elongation rate = dL/dt (the absolute rate of increase in cell length; y-axis in Figure 3f)

      Growth rate = (dL/dt)/L (i.e., λ, y-axis in Figure 3g; also referred to in some studies as the instantaneous growth rate)

      Because these are different quantities, they do not necessarily follow the same trend across the cell cycle. To clarify the logic behind our “growth mode” classification (also see Willis & Huang, Nat Rev Microbiol 2017):

      For a rod-shaped cell growing in length L,

      (1) Exponential growth means the elongation rate is proportional to cell size, i.e.,

      𝑑𝐿/𝑑𝑡 ∝ 𝐿

      or equivalently,

      (𝑑𝐿/𝑑𝑡)/𝐿 = constant

      (2) Linear growth means the elongation rate is constant throughout the cell cycle, i.e.,

      𝑑𝐿/𝑑𝑡 = constant

      which implies that

      (𝑑𝐿/𝑑𝑡)/𝐿

      decreases as the cell elongates.

      Based on these two basic cases, additional growth modes (e.g., super-exponential, sub-exponential, sub-linear) can also be defined, as illustrated in the Author response image 1.

      Author response image 1.

      With this definition, our interpretation of Figure 3f and 3g is as follows: before cold shock, cells are consistent with approximately exponential growth (red line in Figure 3g), whereas after cold shock, the G0 cells are better described as undergoing approximately linear growth (yellow line in Figure 3f).
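
      The distinction between the two growth modes can be checked numerically. The following is a minimal sketch, not the analysis code used for Figure 3: the initial length, time window, and rate constants are illustrative assumptions, and the helper name `rates` is hypothetical. It computes the elongation rate dL/dt and the growth rate (dL/dt)/L by finite differences for an exponentially growing and a linearly growing cell.

```python
import numpy as np

def rates(length, t):
    """Return elongation rate dL/dt and growth rate (dL/dt)/L via finite differences."""
    elongation = np.gradient(length, t)
    growth = elongation / length
    return elongation, growth

# Illustrative values only (not taken from the manuscript).
t = np.linspace(0.0, 30.0, 301)   # time in minutes
L0 = 2.0                          # initial cell length in micrometres

# Exponential mode: dL/dt is proportional to L, so (dL/dt)/L is constant.
lam = np.log(2) / 30.0
L_exp = L0 * np.exp(lam * t)

# Linear mode: dL/dt is constant, so (dL/dt)/L falls as the cell elongates.
k = L0 / 30.0
L_lin = L0 + k * t

elong_exp, growth_exp = rates(L_exp, t)
elong_lin, growth_lin = rates(L_lin, t)

print("exponential: growth rate range", growth_exp[1:-1].min(), growth_exp[1:-1].max())
print("linear: elongation rate range", elong_lin.min(), elong_lin.max())
print("linear: growth rate first/last", growth_lin[0], growth_lin[-1])
```

      In this toy setting, a flat growth-rate curve indicates exponential growth, whereas a flat elongation-rate curve with a falling growth-rate curve indicates linear growth.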

      (5) Figure S12: Why are the curves not continuous across GN, G0, G1, and G2?

      In this figure, we present two different metrics: elongation rate (𝑑𝐿/𝑑𝑡) in panel (a) and growth rate (𝜆 = (𝑑𝐿/𝑑𝑡)/𝐿) in panel (b). During bacterial division, the cell length approximately halves while the growth rate remains constant under steady-state conditions. As a result, elongation rate, which is proportional to the instantaneous length, also halves at each division event, leading to the observed discontinuities at the time points corresponding to divisions (GN, G0, G1, and G2). In contrast, growth rate is inherently continuous across divisions, as shown in panel (b), although minor apparent discontinuities may appear due to the finite temporal resolution of our measurements. We hope this explanation clarifies the figure.
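
      The origin of the discontinuities can be reproduced with a toy simulation. The sketch below assumes idealized steady-state exponential growth with a constant λ and perfectly symmetric division at twice the birth length. These are simplifying assumptions for illustration, not the measured lineage data, and all numerical values are hypothetical.

```python
import numpy as np

lam = np.log(2) / 30.0   # constant growth rate in 1/min (illustrative)
dt = 0.5                 # sampling interval in minutes (illustrative)
L_birth = 2.0            # birth length in micrometres (illustrative)

lengths = []
L = L_birth
for _ in range(180):
    lengths.append(L)
    L *= np.exp(lam * dt)            # exponential elongation over one time step
    if L >= 2 * L_birth - 1e-9:      # symmetric division once the length has doubled
        L /= 2.0
lengths = np.array(lengths)

elongation = lam * lengths           # dL/dt = lambda * L under exponential growth
growth = elongation / lengths        # (dL/dt)/L, unaffected by the halving of L

# Ratio of consecutive elongation-rate samples: close to 0.5 only at a division.
ratios = elongation[1:] / elongation[:-1]
n_jumps = int(np.sum(ratios < 0.6))
print("elongation-rate jumps:", n_jumps)
print("growth rate is constant:", np.allclose(growth, lam))
```

      The elongation rate drops by roughly half at each simulated division, while the growth rate stays flat throughout, matching the qualitative behaviour described for panels (a) and (b).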

      (6) Figure 4d: X-axis labels are missing.

      Thank you for your insightful comment. The six panels share identical axes in Figure 4d. To enhance the visual focus on the data trends across different generations, we intentionally displayed the X-axis label and numerical tick labels only on the first panel. The subsequent panels show only the tick marks without the numerical labels, as their scale is identical to that of the first panel.

      (7) Line 285 and Figure 4e: "The changes in λ are highly synchronized in time, with the exact time lag between any pair of ξ not exceeding 2 min ..." What is the definition of time lag?

      In our study, the term "time lag" refers to the absolute difference between the times at which the large, sudden drop in the λ curve occurs for any pair of ξ. Essentially, it quantifies how closely the dynamic changes in λ are aligned across different groups. A time lag of zero would indicate perfect synchrony, while a value within 2 minutes implies that the drops in λ for any pair of ξ occur nearly simultaneously.
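
      As an illustration of this definition, the sketch below computes the largest pairwise time lag from a set of λ(t) curves. It assumes that the "large sudden drop" can be located as the steepest single-step decrease in each curve. The detection rule actually used is not specified here, and the function names and synthetic curves are hypothetical.

```python
import numpy as np
from itertools import combinations

def drop_time(t, lam):
    """Time at which the steepest single-step decrease in lambda(t) is completed."""
    i = int(np.argmin(np.diff(lam)))
    return float(t[i + 1])

def max_pairwise_lag(t, curves):
    """Largest absolute difference in drop time between any pair of curves."""
    drops = {name: drop_time(t, lam) for name, lam in curves.items()}
    return max(abs(a - b) for a, b in combinations(drops.values(), 2))

# Synthetic step-like curves sampled every minute (illustrative only).
t = np.arange(0.0, 60.0, 1.0)

def step_curve(t_drop):
    return np.where(t < t_drop, 0.03, 0.005)

curves = {"xi_1": step_curve(20.0), "xi_2": step_curve(21.0), "xi_3": step_curve(20.0)}
lag = max_pairwise_lag(t, curves)
print("maximum pairwise time lag (min):", lag)
```

      With real data, the temporal resolution of the measurement sets a lower bound on the lag that can be resolved.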

      (8) Figure S14: Why can the elapsed cycles take negative values?

      In Figure S14, we plotted the centered values. Specifically, at each time point, we calculated the mean elapsed cycle number across all lineages and then subtracted this mean from each group’s value. The resulting values are presented in the figure as “Elapsed cycles (zero-centered)”. Thus, negative values are expected and meaningful: they represent lineages that are progressing more slowly than the average at that time point. This transformation helps to highlight the relative differences among groups over time, while removing the overall temporal trend (which is already shown in Figure 4g).
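
      The centering step amounts to subtracting the per-time-point mean, as in the following minimal sketch. The array values are made up for illustration and are not the data in Figure S14.

```python
import numpy as np

# Rows are time points, columns are lineage groups (hypothetical numbers).
elapsed = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.8, 1.2],
    [2.1, 1.7, 2.2],
])

# Subtract the mean across groups at each time point.
centered = elapsed - elapsed.mean(axis=1, keepdims=True)
print(centered)
```

      Each row of `centered` averages to zero, so any group below the mean at a given time point necessarily takes a negative value.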

      (9) Figure 5 legend: Fitting for the acclimation has a R2 of -0.263 (Pearson correlation coefficient -0.00). R^2 should not be negative, and it doesn't agree with the calculated Pearson correlation coefficient.

      Thank you for this important observation. Indeed, R<sup>2</sup> should normally fall within the range [0, 1]. This discrepancy arises because the fitting model used differs from the default linear regression, and we did not specify this in the original figure legend. In the revised manuscript, this has been corrected. The explanation why R<sup>2</sup> is negative here is as follows:

      The linear fit used is y = a·x (i.e., no intercept, forced through the origin). This is based on the physical principle that when OD is zero (no bacteria), the total bacterial mass must also be zero. For ordinary linear models with an intercept, R<sup>2</sup> ranges from 0 to 1. For no-intercept models, however, the value depends on how the total sum of squares (SS<sub>tot</sub>) is defined. When SS<sub>tot</sub> is computed about the mean of y, as is the usual default, R<sup>2</sup> becomes negative whenever the origin-constrained fit performs worse than the constant baseline given by the mean of y. Here, R<sup>2</sup> = -0.263 simply indicates that, for these specific data points, the origin-constrained linear fit does not outperform that constant baseline. Regarding the Pearson correlation: the near-zero coefficient (-0.00) suggests no significant linear trend between X and Y, which is consistent with the poor fit performance.
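
      How a negative R<sup>2</sup> can arise for a through-origin fit is easiest to see with a small numerical example. The sketch below uses made-up data with no linear trend and compares two conventions for the total sum of squares. Which convention produced the reported value is not stated here, so both are shown, and the function names are hypothetical.

```python
import numpy as np

def fit_through_origin(x, y):
    """Least-squares slope a for the model y = a * x (no intercept)."""
    return np.sum(x * y) / np.sum(x * x)

def r2_centered(y, y_hat):
    """R^2 with the total sum of squares taken about the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def r2_uncentered(y, y_hat):
    """R^2 with the total sum of squares taken about zero."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum(y ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: y is nearly constant and uncorrelated with x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 4.9, 4.9, 5.1])

a = fit_through_origin(x, y)
y_hat = a * x
pearson_r = np.corrcoef(x, y)[0, 1]

print("slope:", a)
print("R^2 (about the mean):", r2_centered(y, y_hat))
print("R^2 (about zero):", r2_uncentered(y, y_hat))
print("Pearson r:", pearson_r)
```

      With the mean-centered convention the value is negative because the line through the origin fits worse than a horizontal line at the mean of y. With the zero-referenced convention, a least-squares fit through the origin cannot score below zero. In both cases the Pearson coefficient is essentially zero, so a negative R<sup>2</sup> and a near-zero correlation are compatible.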

      (10) Language and typos: The manuscript contains grammatical errors and typos that require careful proofreading (one example: Line 56 "..., and reflection-based approaches ...").

      We thank the reviewer for the careful reading and for drawing our attention to the language and typographical issues in the manuscript. In the revised version, we will carefully proofread the entire text and correct any errors and inconsistencies, including the example pointed out in line 56.

      Reviewer #2 (Recommendations for the authors):

      (1) The LUNA section is extremely technical and advanced for most biologists - it might be useful to include a few sentences in simple language why LUNA helps solve the biology question.

      We thank the reviewer for the valuable suggestion. We have now added a concise, plain-language overview at the end of the LUNA section (Performance Analysis of LUNA):

      “In brief, LUNA locks the focal plane with nanometer-scale precision over an ultra-large range rapidly, ensuring stable focus during long-term imaging for reliable observation of fine subcellular structures and dynamics.”

      (2) The suggestions I included in the weakness section are not mandatory to perform, but will be helpful to at least discuss in the paper.

      We thank the reviewer for the thoughtful comment and for acknowledging that the suggestions in the weakness section are not mandatory. We have carefully considered each point raised and have provided detailed responses in the point-by-point reply. While we recognize the potential value of these suggestions for further expanding the study, we respectfully believe that incorporating them into the current manuscript would go beyond the intended scope of this work.

      Thanks

      Otherwise, great job with the paper!

      We are truly grateful to the reviewer for the encouraging feedback and appreciate the time and effort invested in improving our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Concerns persist regarding the interpretation of data and the validation of experiments. First, the presence of T cells, NKT cells, and neutrophils in both the control and METH-treated hippocampi suggests that blood contamination rather than immune cell infiltration is the cause. Since the authors claim that METH disrupts the blood-brain barrier, increasing the infiltration of these immune cells, identifying the source of these immune cells is critical.

      We sincerely appreciate the valuable suggestions you have provided. Your professional perspective impresses us. Based on your suggestion, we conducted a systematic review and in-depth analysis of the experimental process.

      As you have pointed out, we believe that the T cells, NKT cells, and neutrophils detected in the single-cell sequencing of the mouse hippocampus may have a blood-derived origin. However, this does not mean that the presence of these cell types in the control group is abnormal, because many published studies have also found these cells in the hippocampus of control mice. Nevertheless, clarifying the origin and location of these cells will help to strengthen the research hypothesis. Although there is currently no systematic discussion of the role of such cells in the field of methamphetamine neurotoxicity research, we believe that the relevant findings still have reference value for subsequent research in this field.

      Our response is based on the following description:

      (1) Insufficient perfusion during the extraction of the hippocampus may lead to a certain degree of blood contamination.

      Given that the single-cell sequencing technique employed in this study detects the mRNA of the entire cell, we aimed to keep the cells in an optimal physiological state and to minimize the stress response caused by experimental handling. We therefore perfused the anesthetized mice with cold PBS for approximately 3 min (this has been added to the Materials and Methods, Lines 165-166), completed the rapid dissection and collection of the mouse hippocampus on ice within 2 min, and immediately placed the tissue in an appropriate amount of tissue preservation solution for storage. The perfusion time or volume might have been insufficient, resulting in incomplete removal of the blood. The subsequent dissociation of the tissue samples was carried out entirely in preservation solution or PBS buffer, which to some extent reduced the potential interference of blood components with the experimental results. Additionally, T cells, NKT cells, and neutrophils in the perivascular spaces of the hippocampal capillaries might have remained, been successfully captured, and been reflected in the final sequencing data.

      (2) The presence of T cells, NKT cells, and neutrophils in the brain tissue of normal mice has been supported by existing literature. Moreover, several studies have specifically described the localization of these immune cell types within the brain parenchyma.

      Contemporary studies have completely changed the view of brain immunity from envisioning the brain as isolated and inaccessible to peripheral immune cells to an organ in close physical and functional communication with the immune system for its maintenance, function, and repair. Circulating immune cells reside in special niches in the brain’s borders, the choroid plexus, meninges, and perivascular spaces, from which they patrol and sense the brain in a remote manner [1].

      A large-scale mouse brain cell atlas study also reported that approximately 8% of non-neuronal cells are immune cells, including microglia, boundary-associated macrophages, lymphocytes, dendritic cells, and monocytes [2].

      Hang Yao et al. demonstrated through flow cytometry that neutrophils were present in the hippocampal tissues of both healthy control mice and depressed mice (Fig. 2H) [3]. Wei Su et al. identified through single-cell sequencing that dendritic cells, neutrophils, macrophages, T cells, and NKT cells were present in the brain tissues of non-transgenic (Non-Tg) control mice (Fig. 1a-b), and the localization of these cells was explicitly characterized as brain parenchyma in the study [4]. Tomomi M Yoshida et al. discovered through immunohistochemistry (IHC) and single-cell sequencing that a certain number of CD3+ and CD4+ T cells were present in the hippocampus and other regions of the brain, and they observed that these cells were located outside the blood vessels (Fig. 1a-c, g) [5].

      (3) Analysis of immune cells both within blood vessels and in the brain parenchyma contributes to elucidating the immune effects in the hippocampal microenvironment under chronic METH exposure, as well as their interactions with other cell types. At present, the understanding of the relationship between methamphetamine neurotoxicity and the immune system is still limited to resident cells of the central nervous system, such as microglia, astrocytes, and oligodendrocytes [6]. Adaptive immune cells and myeloid cells recruited from the circulation have also been implicated in brain development, function, and aging. Their depletion during developmental stages can disrupt critical neural processes, including glial cell maturation, neuronal activity, and myelinogenesis. However, the precise developmental stage at which lymphocyte infiltration into the central nervous system occurs remains to be elucidated [7].

      Our data indicate that during chronic METH abuse, T cells are more active and participate in the regulation of cytokines through complement signaling. At the same time, the frequency of cell communication between endothelial cells and epithelial cells is increased. Moreover, microglia upregulated the processes of cell chemotaxis and migration, as well as communication with immune cells such as T cells, which to some extent also suggests enhanced infiltration of T cells. However, we also recognize that the current conclusions regarding immune cell infiltration, which are based on sequencing data and literature reports, lack the support of experimental data. We are currently conducting morphological analysis using the same batch of brain tissue samples to further validate the relevant findings.

      Immunofluorescence staining and flow cytometry can be used to further determine the locations of these immune cells in the hippocampus. The classical routes through which peripheral immune cells enter the brain mainly include the BBB and the choroid plexus. In June 2025, Kim N. Green and colleagues published a study in Neuron, further revealing that during development and in inflammatory diseases, immune cells can also infiltrate the brain parenchyma through a newly identified route, the velum interpositum, thereby further confirming that these cells can migrate to the central nervous system under specific physiological or pathological conditions [8].

      (1) Castellani G, Croese T, Peralta Ramos JM, Schwartz M. Transforming the understanding of brain immunity. Science. 2023 Apr 7;380(6640):eabo7649. doi: 10.1126/science.abo7649.

      (2) Zhang M, Pan X, Jung W, Halpern AR, Eichhorn SW, Lei Z, Cohen L, Smith KA, Tasic B, Yao Z, Zeng H, Zhuang X. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature. 2023 Dec;624(7991):343-354. doi: 10.1038/s41586-023-06808-9.

      (3) Yao H, Jiang SY, Jiao YY, Zhou ZY, Zhu Z, Wang C, Zhang KZ, Ma TF, Hu G, Du RH, Lu M. Astrocyte-derived CCL5-mediated CCR5+ neutrophil infiltration drives depression pathogenesis. Sci Adv. 2025 May 23;11(21):eadt6632. doi: 10.1126/sciadv.adt6632.

      (4) Su W, Saravia J, Risch I, Rankin S, Guy C, Chapman NM, Shi H, Sun Y, Kc A, Li W, Huang H, Lim SA, Hu H, Wang Y, Liu D, Jiao Y, Chen PC, Soliman H, Yan KK, Zhang J, Vogel P, Liu X, Serrano GE, Beach TG, Yu J, Peng J, Chi H. CXCR6 orchestrates brain CD8+ T cell residency and limits mouse Alzheimer's disease pathology. Nat Immunol. 2023 Oct;24(10):1735-1747. doi: 10.1038/s41590-023-01604-z.

      (5) Yoshida TM, Nguyen M, Zhang L, Lu BY, Zhu B, Murray KN, Mineur YS, Zhang C, Xu D, Lin E, Luchsinger J, Bhatta S, Waizman DA, Coden ME, Ma Y, Israni-Winger K, Russo A, Wang H, Song W, Al Souz J, Zhao H, Craft JE, Picciotto MR, Grutzendler J, Distasio M, Palm NW, Hafler DA, Wang A. The subfornical organ is a nucleus for gut-derived T cells that regulate behaviour. Nature. 2025 Jul;643(8071):499-508. doi: 10.1038/s41586-025-09050-7.

      (6) Shi S, Sun Y, Zan G, Zhao M. The interaction between central and peripheral immune systems in methamphetamine use disorder: current status and future directions. J Neuroinflammation. 2025 Feb 15;22(1):40. doi: 10.1186/s12974-025-03372-z.

      (7) Castellani G, Croese T, Peralta Ramos JM, Schwartz M. Transforming the understanding of brain immunity. Science. 2023 Apr 7;380(6640):eabo7649. doi: 10.1126/science.abo7649.

      (8) Hohsfield LA, Kim SJ, Barahona RA, Henningfield CM, Mansour K, Vallejo KD, Tsourmas KI, Kwang NE, Ghorbanian Y, Angulo JAA, Gao P, Pachow C, Inlay MA, Walsh CM, Xu X, Lane TE, Green KN. Identification of the velum interpositum as a meningeal-CNS route for myeloid cell trafficking into the brain. Neuron. 2025 May 28:S0896-6273(25)00351-4. doi: 10.1016/j.neuron.2025.05.004.

      (2) Secondly, the pseudotime analysis, which suggests altered neural stem cell (NSC) differentiation, is not conclusively supported by the current data and requires further validation.

      We sincerely appreciate your valuable feedback, which we find highly relevant and constructive. It is important to acknowledge that the sequencing data presented in our study currently lacks experimental validation. Nevertheless, considering that existing research on the effects of METH on neural stem cell differentiation predominantly emphasizes observational phenomena and remains limited in terms of in vivo experimental evidence and mechanistic investigations, we aim to contribute our analytical findings as a reference for further scholarly exploration in this field.

      Our study utilized pseudotime analysis (powered by Monocle2) to reconstruct an "imaginary timeline" (pseudo-time) based on intercellular gene expression similarities, thereby modeling the dynamic state transitions of cells during continuous biological processes. Drawing upon single-cell RNA sequencing data captured as "snapshots" from hippocampal astrocytes, neural stem cells, and neuroblasts in mice four weeks after METH exposure, we applied computational algorithms to integrate the originally discrete cellular states into a continuous pseudo-time trajectory. This approach was employed to elucidate the differentiation stages of these cell populations, identify potential branching points in their developmental pathways, and uncover the key regulatory genes driving the differentiation process. Pseudotime analysis, as a computational approach grounded in mathematical modeling, yields inferences that are contingent upon the underlying assumptions of the algorithms employed. Consequently, experimental validation through methodologies such as time-series sampling and lineage tracing is essential to substantiate the derived biological interpretations. In light of the insufficiency of such empirical verification to date, our conclusions concerning alterations in the dynamic behavior of neural stem cell differentiation remain preliminary and require further experimental support.
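
      The idea of ordering snapshot cells along an inferred timeline can be illustrated with a deliberately simplified toy example. The sketch below is not Monocle2, which relies on reversed graph embedding (DDRTree) and can resolve branches. It only orders simulated cells along the first principal component of a synthetic expression matrix, and every number and variable name is a hypothetical choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 200 cells whose expression of 20 genes changes linearly with a hidden time.
n_cells, n_genes = 200, 20
true_time = rng.uniform(0.0, 1.0, size=n_cells)
loadings = rng.normal(0.0, 1.0, size=n_genes)
expression = np.outer(true_time, loadings) + rng.normal(0.0, 0.1, size=(n_cells, n_genes))

# Center each gene and project the cells onto the first principal component.
centered = expression - expression.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]

# Rescale to [0, 1]; the direction of the axis is arbitrary, as in any pseudotime method.
pseudotime = (pc1 - pc1.min()) / (pc1.max() - pc1.min())

corr = abs(np.corrcoef(pseudotime, true_time)[0, 1])
print("absolute correlation with the hidden time:", corr)
```

      Even in this idealized case the ordering is recovered only up to direction and scale, which is one reason why the root state and the biological interpretation of an inferred trajectory require independent experimental support.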

      In Figures 5C and 5F, we present the expression profiles of the four genes exhibiting the most statistically significant differences across the differentiation trajectory. In Figures 5B and 5E, we conducted GO and KEGG functional enrichment analyses on the genes that showed significant differential expression at different differentiation stages. While no studies within the current METH research domain have reported on the potential effects of these genes on neural stem cell differentiation, emerging evidence from related fields provides preliminary insights into their functional roles. For instance, the Flt1 gene (also known as VEGFR1), which encodes a vascular endothelial growth factor receptor, has been demonstrated to play a critical role in the conversion of Müller glial cells into neurons within the zebrafish retina [1] and serves as a critical regulator in promoting definitive neural stem cell survival [2]. This also supports the intricate interconnection between neurons, neural stem cells, and vascular cells identified in our cell communication analysis. The Hspb1 gene plays a significant role in ferroptosis and autophagy processes of nerve cells [3, 4] and may be closely related to the self-renewal ability of neural stem cells, while METH may impair neural stem cell function by disrupting autophagy, leading to reduced self-renewal capacity and altered differentiation potential [5]. Sox11, identified in the METH group, has been shown to play a critical role in early differentiation and neuronal growth, both during perinatal development and in adult neurogenesis [6]. The Fos gene plays a critical regulatory role in the differentiation of neural stem cells into neurons and in modulating neuronal functional activities [7]. Alterations in Ccl5 expression levels may indicate astrocyte-mediated inflammatory responses, which could represent one of the underlying mechanisms through which METH promotes the differentiation of neural stem cells into astrocytes.

      Thank you very much for your thoughtful questions and valuable suggestions. These suggestions have helped us gain a deeper understanding of the areas where we can improve, and have guided us toward more meaningful directions for future research.

      (1) Mitra S, Devi S, Lee MS, Jui J, Sahu A, Goldman D. Vegf signaling between Müller glia and vascular endothelial cells is regulated by immune cells and stimulates retina regeneration. Proc Natl Acad Sci U S A. 2022 Dec 13;119(50):e2211690119. doi: 10.1073/pnas.2211690119.

      (2) Wada T, Haigh JJ, Ema M, Hitoshi S, Chaddah R, Rossant J, Nagy A, van der Kooy D. Vascular endothelial growth factor directly inhibits primitive neural stem cell survival but promotes definitive neural stem cell survival. J Neurosci. 2006 Jun 21;26(25):6803-12. doi: 10.1523/JNEUROSCI.0526-06.2006.

      (3) Meng J, Fang J, Bao Y, Chen H, Hu X, Wang Z, Li M, Cheng Q, Dong Y, Yang X, Zou Y, Zhao D, Tang J, Zhang W, Chen C. The biphasic role of Hspb1 on ferroptotic cell death in Parkinson's disease. Theranostics. 2024 Aug 1;14(12):4643-4666. doi: 10.7150/thno.98457.

      (4) Sisto A, van Wermeskerken T, Pancher M, Gatto P, Asselbergh B, Assunção Carreira ÁS, De Winter V, Adami V, Provenzani A, Timmerman V. Autophagy induction by piplartine ameliorates axonal degeneration caused by mutant HSPB1 and HSPB8 in Charcot-Marie-Tooth type 2 neuropathies. Autophagy. 2025 May;21(5):1116-1143. doi: 10.1080/15548627.2024.2439649.

      (5) Gu C, Wang Z, Luo W, Ling H, Cui X, Deng T, Li K, Huang W, Xie Q, Tao B, Qi X, Peng X, Ding J, Qiu P. Impaired olfactory bulb neurogenesis mediated by Notch1 contributes to olfactory dysfunction in mice chronically exposed to methamphetamine. Cell Biol Toxicol. 2025 Feb 20;41(1):46. doi: 10.1007/s10565-025-10004-y.

      (6) Rasetto NB, Giacomini D, Berardino AA, Waichman TV, Beckel MS, Di Bella DJ, Brown J, Davies-Sala MG, Gerhardinger C, Lie DC, Arlotta P, Chernomoretz A, Schinder AF. Transcriptional dynamics orchestrating the development and integration of neurons born in the adult hippocampus. Sci Adv. 2024 Jul 19;10(29):eadp6039. doi: 10.1126/sciadv.adp6039.

      (7) Pagin M, Pernebrink M, Pitasi M, Malighetti F, Ngan CY, Ottolenghi S, Pavesi G, Cantù C, Nicolis SK. FOS Rescues Neuronal Differentiation of Sox2-Deleted Neural Stem Cells by Genome-Wide Regulation of Common SOX2 and AP1(FOS-JUN) Target Genes. Cells. 2021 Jul 12;10(7):1757. doi: 10.3390/cells10071757.

      Reviewer #2 (Public review):

      (1) Despite this potential novelty, the study has numerous weaknesses. Notably, single-cell RNA sequencing was unable to capture an adequate number of neuronal populations. Neurons accounted for only approximately 0.6% of the total nuclei, representing a significant underrepresentation compared to their actual physiological proportion. Given that the behavioral effects of METH are likely mediated by neuronal dysfunction, readers would reasonably expect to see transcriptional changes in neurons. The authors should explain why they were unable to capture a sufficient number of neurons and justify how this incomplete dataset can still provide meaningful scientific insights for researchers studying METH-induced hippocampal damage and behavioral alterations.

      Thank you sincerely for bringing this important issue to our attention.

      Firstly, this represents an unavoidable technical bottleneck. The single-cell sequencing (scRNA-seq) we performed detects mRNA at the whole-cell level, a process that requires cells with high structural integrity, robust viability, and minimal exposure to external stimuli. During the preparation of single-cell suspensions, mature neurons, owing to their highly differentiated state, morphological rigidity, and excessively long axons, often fail to maintain structural integrity. These cells typically die during the dissociation process and are subsequently excluded prior to sequencing. To retain a substantial amount of neuron-related data, an alternative technique, single-nucleus RNA sequencing (snRNA-seq), should be employed. This method does not require cell viability and focuses exclusively on the nuclei of individual cells, thereby capturing mRNA information solely from the nuclear compartment. Consequently, mRNA originating from the cytoplasm and organelles is not represented.

      Secondly, numerous studies have shown that the neurological damage caused by chronic exposure to methamphetamine exhibits a high degree of similarity in clinical manifestations and pathogenesis to neurodegenerative diseases (such as Alzheimer's disease, Parkinson's disease, etc.) [1-4].

      We fully acknowledge the central role of neurons in cognitive functions and the pathogenesis of cognitive disorders. However, despite decades of neuron-centric research that has yielded significant advancements, major challenges remain in elucidating disease origins, identifying early pathological events, and developing effective therapeutic strategies. For example, current models fail to adequately explain early disease events. Many pathological hallmarks of cognitive disorders, such as amyloid plaques, neurofibrillary tangles, and α-synuclein aggregation, emerge in the extracellular space long before overt neuronal loss or dysfunction occurs, and are increasingly recognized to be initiated or modulated by non-neuronal cells, including astrocytes and microglia [5]. Furthermore, the critical contribution of the neural microenvironment is often overlooked. Neuronal function and survival are highly dependent on this microenvironment, which is predominantly established and maintained by non-neuronal cell types such as astrocytes, oligodendrocytes, microglia, vascular endothelial cells, pericytes, and interstitial cells and matrix [6-10]. Additionally, systemic factors such as metabolic dysregulation, peripheral inflammation, and vascular pathology are closely associated with cognitive disorders. These factors often initially impact non-neuronal cells, particularly those forming the blood-brain barrier (e.g., endothelial cells) or mediating immune responses (e.g., microglia), before exerting downstream effects on neurons [11,12]. Finally, current therapeutic approaches targeting neurons face significant limitations, highlighting an urgent need for novel intervention strategies.

      During the development of chronic neurodegenerative diseases, although structural or functional abnormalities of neurons are the direct factors leading to clinical symptoms (such as cognitive decline), this process is often regulated by various auxiliary cell types such as glial cells, immune cells, and stromal cells, which together constitute a complex pathological network. It is worth noting that the chronic and persistent progression of the disease usually results from the failure of these auxiliary cells to effectively provide support and nutrition to neurons, and in some pathological states they even transform into effector cells that promote neuronal damage [13,14]. In recent years, a growing body of evidence has demonstrated that glial cells, immune cells, and stromal cells exert critical regulatory functions in the pathogenesis of neurodegenerative diseases. These cell types not only contribute to the maintenance of neural microenvironmental homeostasis during the early stages of disease progression but also display substantial functional heterogeneity in modulating inflammatory responses, synaptic plasticity, and the repair of neuronal injury, in linking genetic risks with environmental factors, and in the propagation of pathological proteins [15-19]. These findings indicate that non-neuronal cells have the potential to become key therapeutic targets in clinical interventions: (1) compared with neurons themselves, they are more accessible to drugs or biological agents (such as antibodies); (2) non-neuronal cells (especially glial cells) exhibit high plasticity and reactivity during the course of disease, providing a window of opportunity for intervening in their functional states (such as inhibiting harmful activation and promoting protective functions); (3) they can serve as early intervention targets before irreversible damage occurs to neurons, helping to prevent or delay disease progression; and (4) intervention methods aimed at these targets are diverse, including immunomodulation, anti-inflammatory, vascular protection, and metabolic regulation strategies, which are usually more feasible in practice than directly protecting fragile neurons.

      Early pharmacological studies have extensively characterized the neurotoxic effects of METH, including the induction of autophagy, apoptosis, oxidative stress, endoplasmic reticulum stress, and dopaminergic neurotoxicity [20]. However, therapeutic options and pharmacological interventions for METH abuse remain limited [21]. In recent years, increasing attention has been directed toward the impact of METH on non-neuronal cells. Research into mechanisms such as neuroinflammatory responses, blood-brain barrier disruption, and immune modulation is progressively contributing to a more comprehensive understanding of METH-induced neural injury [22-24]. Moreover, METH is a substance that induces widespread damage across multiple organ systems and diverse cell types throughout the body. Beyond its effects on neurons, various cell types exhibit distinct responses to METH exposure, which differ significantly depending on the duration of exposure. Our research dataset encompasses high-quality whole-cell mRNA sequencing data from multiple cell types within the hippocampus of mice subjected to chronic METH exposure, offering substantial data support and a robust foundation for in-depth investigation into the pathological mechanisms underlying METH-induced neurodamage.

      Thirdly, the selection of scRNA-seq was guided by our experimental objectives and prior research experience. Our earlier investigations have primarily centered on astrocytes, endothelial cells, and microglia. This single-cell sequencing study is intended to enhance our understanding of these neural support cells, comprehensively explore their underlying mechanisms and cellular interactions, and ultimately provide a solid foundation and reference for future research. However, our experience and infrastructure in the field of neuronal research remain relatively limited. To ensure the generation of high-quality data and to systematically advance the experimental objectives, we have prioritized the analysis of the neural microenvironment as the central focus of this study.

      Fourthly, the hippocampal region is a brain area with highly specialized and collaborative characteristics, which can be further divided into the ventral hippocampus, the dorsal hippocampus, and multiple subregions such as DG, CA1, CA2, and CA3. The neurons in these subregions exhibit strong heterogeneity, and the experimental methods we currently adopt are still unable to precisely distinguish the neurons in these different regions, which may to some extent affect the accuracy of data interpretation. To address the impact of neuronal heterogeneity, we believe that single-cell spatial transcriptomics technology can be adopted for in-depth research. However, due to the high cost of this technology, it is currently difficult to apply it in our research group.

      (1) Lappin JM. Rare but relevant: Methamphetamine and Parkinson's disease. Addiction. 2025 Apr;120(4):797-800. doi: 10.1111/add.16695. Epub 2024 Oct 22. PMID: 39434702.

      (2) Lappin JM, Darke S. Methamphetamine and heightened risk for early-onset stroke and Parkinson's disease: A review. Exp Neurol. 2021 Sep;343:113793. doi: 10.1016/j.expneurol.2021.113793. Epub 2021 Jun 21. PMID: 34166684.

      (3) Shukla M, Vincent B. The multi-faceted impact of methamphetamine on Alzheimer's disease: From a triggering role to a possible therapeutic use. Ageing Res Rev. 2020 Jul;60:101062. doi: 10.1016/j.arr.2020.101062.

      (4) Shrestha P, Katila N, Lee S, Seo JH, Jeong JH, Yook S. Methamphetamine induced neurotoxic diseases, molecular mechanism, and current treatment strategies. Biomed Pharmacother. 2022 Oct;154:113591. doi: 10.1016/j.biopha.2022.113591.

      (5) Gabitto MI, et al. Integrated multimodal cell atlas of Alzheimer's disease. Nat Neurosci. 2024 Dec;27(12):2366-2383. doi: 10.1038/s41593-024-01774-5.

      (6) Stogsdill JA, Harwell CC, Goldman SA. Astrocytes as master modulators of neural networks: Synaptic functions and disease-associated dysfunction of astrocytes. Ann N Y Acad Sci. 2023 Jul;1525(1):41-60. doi: 10.1111/nyas.15004.

      (7) Terreros-Roncal J, et al. Impact of neurodegenerative diseases on human adult hippocampal neurogenesis. Science. 2021 Nov 26;374(6571):1106-1113. doi: 10.1126/science.abl5163.

      (8) Zhu K, Fu Y, Zhao Y, Niu B, Lu H. Perineuronal nets: Role in normal brain physiology and aging, and pathology of various diseases. Ageing Res Rev. 2025 Jun;108:102756. doi: 10.1016/j.arr.2025.102756.

      (9) Depp C, Doman JL, Hingerl M, Xia J, Stevens B. Microglia transcriptional states and their functional significance: Context drives diversity. Immunity. 2025 May 13;58(5):1052-1067. doi: 10.1016/j.immuni.2025.04.009.

      (10) Sweeney MD, Zhao Z, Montagne A, Nelson AR, Zlokovic BV. Blood-Brain Barrier: From Physiology to Disease and Back. Physiol Rev. 2019 Jan 1;99(1):21-78. doi: 10.1152/physrev.00050.2017.

      (11) Nation DA, et al. Blood-brain barrier breakdown is an early biomarker of human cognitive dysfunction. Nat Med. 2019 Feb;25(2):270-276. doi: 10.1038/s41591-018-0297-y.

      (12) Montagne A, Zhao Z, Zlokovic BV. Alzheimer's disease: A matter of blood-brain barrier dysfunction? J Exp Med. 2017 Nov 6;214(11):3151-3169. doi: 10.1084/jem.20171406. Epub 2017 Oct 23.

      (13) Huang Q, Wang Y, Chen S, Liang F. Glycometabolic Reprogramming of Microglia in Neurodegenerative Diseases: Insights from Neuroinflammation. Aging Dis. 2024 May 7;15(3):1155-1175. doi: 10.14336/AD.2023.0807.

      (14) Shi FD, Yong VW. Neuroinflammation across neurological diseases. Science. 2025 Jun 19;388(6753):eadx0043. doi: 10.1126/science.adx0043.

      (15) Xu X, Mei B, Yang Y, Li J, Weng J, Yang Y, Zhu Q, Zhang H, Liu X. Astrocytes Lingering at a Crossroads: Neuroprotection and Neurodegeneration in Neurocognitive Dysfunction. Int J Biol Sci. 2025 Apr 28;21(7):3122-3143. doi: 10.7150/ijbs.109315.

      (16) Bedolla A, et al. Adult microglial TGFβ1 is required for microglia homeostasis via an autocrine mechanism to maintain cognitive function in mice. Nat Commun. 2024 Jun 21;15(1):5306. doi: 10.1038/s41467-024-49596-0.

      (17) Castellani G, Croese T, Peralta Ramos JM, Schwartz M. Transforming the understanding of brain immunity. Science. 2023 Apr 7;380(6640):eabo7649. doi: 10.1126/science.abo7649.

      (18) Chen YH, Jin SY, Yang JM, Gao TM. The Memory Orchestra: Contribution of Astrocytes. Neurosci Bull. 2023 Mar;39(3):409-424. doi: 10.1007/s12264-023-01024-x.

      (19) Deng Q, Wu C, Parker E, Liu TC, Duan R, Yang L. Microglia and Astrocytes in Alzheimer's Disease: Significance and Summary of Recent Advances. Aging Dis. 2024 Aug 1;15(4):1537-1564. doi: 10.14336/AD.2023.0907.

      (20) Jayanthi S, Daiwile AP, Cadet JL. Neurotoxicity of methamphetamine: Main effects and mechanisms. Exp Neurol. 2021 Oct;344:113795. doi: 10.1016/j.expneurol.2021.113795.

      (21) Paulus MP, Stewart JL. Neurobiology, Clinical Presentation, and Treatment of Methamphetamine Use Disorder: A Review. JAMA Psychiatry. 2020 Sep 1;77(9):959-966. doi: 10.1001/jamapsychiatry.2020.0246.

      (22) Shi S, Sun Y, Zan G, Zhao M. The interaction between central and peripheral immune systems in methamphetamine use disorder: current status and future directions. J Neuroinflammation. 2025 Feb 15;22(1):40. doi: 10.1186/s12974-025-03372-z.

      (23) Pang L, Wang Y. Overview of blood-brain barrier dysfunction in methamphetamine abuse. Biomed Pharmacother. 2023 May;161:114478. doi: 10.1016/j.biopha.2023.114478.

      (24) Shaerzadeh F, Streit WJ, Heysieattalab S, Khoshbouei H. Methamphetamine neurotoxicity, microglia, and neuroinflammation. J Neuroinflammation. 2018 Dec 12;15(1):341. doi: 10.1186/s12974-018-1385-0.

      (2) Another significant weakness of this study is the lack of a cohesive hypothesis or overarching conclusion regarding how METH impacts neural populations. The authors provide a largely descriptive account of transcriptional alterations across various cell types, but the manuscript lacks clear, biologically meaningful conclusions. This descriptive approach makes it difficult for readers to identify the key findings or take-home messages. To improve clarity and impact, the authors should focus on developing and presenting a few plausible hypotheses or mechanistic scenarios regarding METH-induced neurotoxicity, grounded in their scRNA-seq data. Including schematic figures to illustrate these hypotheses would also help readers better understand and interpret the study.

      We sincerely appreciate your valuable comments on our article. As you pointed out, the current research lacks experimental verification to further support our conclusions. To make the proposed mechanisms clearer and easier to follow, we have added several hypothetical diagrams (Figures 7, 8, and 9) to the discussion section, which present the biological mechanisms suggested by the data more intuitively. Relevant verification work is also underway, such as labeling specific cell types with marker proteins. Author response image 1 shows some of our unpublished preliminary results, whose trends are consistent with the conclusions of this article. However, because complete verification will take more time, and to ensure the rigor of the data, these results have not been included in the current manuscript. Finally, we would like to thank you again for your constructive suggestions.

      Author response image 1.

      (3) The final major weakness of this study is its poor readability. It appears that the authors did not adequately proofread the manuscript, as there are numerous typographical errors (e.g., line 333: trisulting; line 756: essencial), unsupported scientific claims lacking citations (e.g., lines 485, 503, 749-753), and grammatically incorrect sentences (e.g., lines 470-472, 540-543, 749-753). In addition, many paragraphs are unorganized and overly descriptive, which further hinders clarity. Some figures are also problematic - too small in size and overcrowded with text in fonts that are difficult to read. It is recommended that the authors carry out quality control. There are too many typographical and grammatical errors to list individually; the authors should carefully review and revise the entire manuscript to address all of these issues.

      We truly appreciate your thoughtful feedback and sincerely apologize for any inconvenience experienced by you and other readers.

      The text of this research manuscript was manually entered, which unfortunately resulted in some spelling and grammatical errors. In response, we have carefully revised the entire manuscript using word processing tools in the second version. Meanwhile, we have restructured and organized some lengthy paragraphs to enhance the clarity and readability of the content.

      Regarding the issue you raised about certain viewpoints lacking citation support, we have added the necessary references to those sections and reviewed the entire text to ensure all scientific claims are properly supported. 

      As for image clarity, we made sure the submitted images met the 600 dpi resolution requirement. However, we acknowledge that there were clarity issues in the final published version. We have since re-adjusted and re-uploaded the images to improve their quality.

      We are committed to continuously improving the manuscript and enhancing the overall quality of our academic presentation. Thank you sincerely for your kind attention to our work, your careful review, and the valuable suggestions you provided.

      Reviewer #3 (Public review):

      (1) While the bioinformatics analyses are extensive, the study is primarily descriptive at the molecular level. The absence of experimental validation, such as targeted mRNA/protein quantification and gene knockdown/overexpression to confirm the causal relationship between these identified genes and METH-induced cognitive deficits, is a notable limitation.

      We sincerely appreciate your valuable comments and suggestions. Indeed, there are still certain limitations in our manuscript in some aspects. It may not be able to systematically answer specific questions, and it is also difficult to fully clarify the functional roles of certain genes or specific cell types through experimental evidence.

      Although our manuscript still has certain limitations, we believe that the publication of this research is expected to provide new perspectives and theoretical support for the in-depth exploration of METH toxicity damage-related fields, thereby promoting the progress of research in this direction:

      (1) At present, the single-cell sequencing datasets on chronic damage caused by METH are still relatively limited, especially in terms of studies at the whole-cell level. Our dataset is expected to fill the research gap in this field to some extent, providing reference and support for subsequent related research.

      (2) During the sampling process of the sequencing experiment, we ensured high cell viability and sequencing quality. The experiment exhibited good reproducibility (each group consisted of 10 mice, and 2 mice from each group were selected to mix their hippocampal tissues into one sample), and the obtained data had high credibility.

      (3) The effects of METH have a wide distribution pattern across various organs and tissues. Through single-cell sequencing data, the common and differential expression patterns of related genes under different conditions can be systematically analyzed, which is helpful for future targeted knockout studies of these genes and provides a predictive basis for the evaluation of intervention measures, thereby enabling precise regulation of gene functions.

      (4) This is conducive to the orderly implementation of our subsequent research plans. Our subsequent research plan can be further developed based on a specific aspect of this study. We are indeed planning to do exactly that. During our earlier research on astrocytes, we discovered that astrocytes have two phenotypes (protective and inflammatory) in neuroinflammation. Given that astrocytes in the hippocampus show great variability depending on their location, the cells they come into contact with, and the stimuli they receive, we aim to investigate the changes in the function of astrocyte subpopulations in chronic METH-induced cognitive impairment. We focused on the role of the cAMP signaling pathway in the transformation of astrocyte phenotypes and attempted to link changes in astrocyte energy metabolism to their inflammatory phenotype. In addition, we found that endothelial cells can be easily distinguished into many subpopulations, which are related to their specific functions in immune responses, material transport, vascular growth regulation, energy metabolism, and other processes. We believe that single-cell technology can help us find the key mechanisms and intervention targets of chronic METH abuse-induced damage with greater precision.

      (2) While the discussion extensively covers the functional implications of specific molecular pathways and cell types, it would greatly benefit from a comparison of these findings with existing RNA sequencing data from other METH models in hippocampal tissue.

      We are very grateful for your professional suggestions, which have been of great help in improving the quality of our manuscript. We agree that comparing our findings with existing RNA sequencing data from other METH models in hippocampal tissue would strengthen the discussion. In response to your suggestion, we have reviewed the relevant literature and databases, and have asked the database administrators and original authors for permission to download and use the relevant data. However, because data integration still requires some time, we are not able to conduct a detailed analysis of these data in this revised version. For now, we can only discuss the conclusions reported by some of these authors.

      Palsamy Periyasamy et al. published a scRNA sequencing (live-cell) study on chronic METH exposure at almost the same time as us. They adopted a similar 4-week model of gradually escalating METH exposure and conducted sequencing analysis on glial cells in the cerebral cortex of mice [1]. The changes they observed in cortical astrocytes in the circadian rhythm, adherens junctions, Rap1 signaling pathway, and cAMP signaling pathway were similar to those we observed in hippocampal astrocytes (Discussion, Lines 892-897). Likewise, in oligodendrocytes, we observed an upregulation trend in key genes regulating the circadian rhythm, such as Per2, Per3, and Nr1d1 (Discussion, Lines 916-939), which is consistent with their findings. Nonetheless, we believe that the changes in oligodendrocytes related to metabolic regulation and the homeostasis of axonal function are more significant.

      Pingming Qiu et al. further confirmed the correlation between the NF-κB signaling pathway in hippocampal astrocytes under METH action and neuroinflammation, neuroinjury, and learning and memory impairments in mice by integrating the GEO dataset [2]. This conclusion is also consistent with the sequencing results and analysis conclusions we obtained (Results, Lines 473-476).

      Regarding the neuro-immune disorder caused by chronic METH exposure, our findings are consistent with those of Biao Wang et al. [3]. Both studies observed that METH exposure may involve immune cells (such as T cells and monocytes) and may be related to the regulation of the innate immune response and the homeostasis of myeloid cells. Through the identification and analysis of cell subtypes, we further found that these signals may be closely related to MHC-mediated interactions between microglia and other immune cells (Discussion, Lines 870-894).

      Currently, research results on METH remain scattered and unsystematic. Research models differ, and there are relatively few studies of chronic exposure or in vivo experiments. Closely related sequencing datasets are also scarce. We hope that this dataset can provide a comprehensive and detailed molecular map of the mouse hippocampus after chronic METH exposure (although, owing to technical limitations, mature neurons die during dissociation, so the corresponding data cannot be obtained). In addition, we hope to integrate single-cell sequencing data and spatial transcriptome data from the mouse hippocampus after chronic METH exposure, providing a reliable data foundation and theoretical support for subsequent research in this field.

      Finally, we would like to express our sincere gratitude for your valuable suggestions and support. Although we still need some time to further refine the manuscript based on your opinions, we sincerely hope that more readers will provide us with constructive feedback to promote the continuous improvement and deepening of this research.

      (1) Oladapo A, Deshetty UM, Callen S, Buch S, Periyasamy P. Single-Cell RNA-Seq Uncovers Robust Glial Cell Transcriptional Changes in Methamphetamine-Administered Mice. Int J Mol Sci. 2025 Jan 14;26(2):649. doi: 10.3390/ijms26020649.

      (2) Li K, Ling H, Wang X, Xie Q, Gu C, Luo W, Qiu P. The role of NF-κB signaling pathway in reactive astrocytes among neurodegeneration after methamphetamine exposure by integrated bioinformatics. Prog Neuropsychopharmacol Biol Psychiatry. 2024 Feb 8;129:110909. doi: 10.1016/j.pnpbp.2023.110909.

      (3) Wu L, Liu X, Jiang Q, Li M, Liang M, Wang S, Wang R, Su L, Ni T, Dong N, Zhu L, Guan F, Zhu J, Zhang W, Wu M, Chen Y, Chen T, Wang B. Methamphetamine-induced impairment of memory and fleeting neuroinflammation: Profiling mRNA changes in mouse hippocampus following short-term and long-term exposure. Neuropharmacology. 2024 Dec 15;261:110175. doi: 10.1016/j.neuropharm.2024.110175.

      (3) The conclusion that "prolonged METH use may progressively impair cognitive function" may not be uniformly supported by the behavioral data: Figures 1C and F (discrimination and preference indexes) exhibited that the 4-week test further declined in the METH group compared to the 2-week. In contrast, Figure 1E and H present a contradictory pattern.

      Thank you very much for pointing this out. Your observation is detailed and constructive. Our conclusion that "prolonged use of METH may progressively impair cognitive function" rests mainly on the discrimination index and preference index shown in Figures 1C and 1F. These two indices are usually calculated from the total time mice spend exploring the novel and familiar objects, and they are widely used as measures of cognitive function in the literature [1-3], which supports our conclusion. The exploration frequency data serve two purposes: they reflect the curiosity of mice towards novel objects, and they allow the average duration of each exploration to be calculated as "total exploration time / exploration frequency", which indicates learning interest and the degree of focus during exploration. We believe this is also informative about the effect of METH on learning. We acknowledge that Figure 1H shows no statistically significant difference in the exploration frequency of novel and familiar objects at the 4-week test. This might be because our tests allow mice to explore freely in a stress-free environment, and individual mice within a group differ considerably. However, the mean values still differ between the two groups. In addition, by the 4-week test the mice had already completed one NOR test at 2 weeks and may have retained memories of it. Moreover, we believe that long-term saline injections may affect the emotional state of control mice, because the injection does not bring them the pleasure that METH does.
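
      The measures discussed above can be sketched as a short calculation. This is only an illustration: the response does not state the exact formulas, so the discrimination and preference indices below follow commonly used conventions and are assumptions, as are the function name, argument names, and example values. Only the mean duration per exploration ("total exploration time / exploration frequency") is taken directly from the text.

```python
def nor_indices(t_novel, t_familiar, n_novel, n_familiar):
    """Summarize one novel object recognition (NOR) session.

    t_* are total exploration times in seconds; n_* are exploration counts.
    The two index formulas are common conventions, assumed here rather than
    taken from the manuscript.
    """
    total = t_novel + t_familiar
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return {
        # Assumed convention: (novel - familiar) / (novel + familiar)
        "discrimination_index": (t_novel - t_familiar) / total,
        # Assumed convention: novel / (novel + familiar)
        "preference_index": t_novel / total,
        # From the text: total exploration time / exploration frequency
        "mean_bout_novel": t_novel / n_novel if n_novel else float("nan"),
        "mean_bout_familiar": t_familiar / n_familiar if n_familiar else float("nan"),
    }


# Hypothetical values for a single mouse, for illustration only.
example = nor_indices(t_novel=30.0, t_familiar=10.0, n_novel=12, n_familiar=8)
print(example)
```

      Because the indices depend on exploration time and the frequency data only enter through the mean duration per exploration, the two kinds of measure can diverge, as the reviewer noted for Figures 1E and 1H.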

      (1) Riva M, Moriceau S, Morabito A, Dossi E, Sanchez-Bellot C, Azzam P, Navas-Olive A, Gal B, Dori F, Cid E, Ledonne F, David S, Trovero F, Bartolomucci M, Coppola E, Rebola N, Depaulis A, Rouach N, de la Prida LM, Oury F, Pierani A. Aberrant survival of hippocampal Cajal-Retzius cells leads to memory deficits, gamma rhythmopathies and susceptibility to seizures in adult mice. Nat Commun. 2023 Mar 18;14(1):1531. doi: 10.1038/s41467-023-37249-7.

      (2) Lu Y, Chen X, Liu X, Shi Y, Wei Z, Feng L, Jiang Q, Ye W, Sasaki T, Fukunaga K, Ji Y, Han F, Lu YM. Endothelial TFEB signaling-mediated autophagic disturbance initiates microglial activation and cognitive dysfunction. Autophagy. 2023 Jun;19(6):1803-1820. doi: 10.1080/15548627.2022.2162244.

      (3) Arroyo-García LE, Tendilla-Beltrán H, Vázquez-Roque RA, Jurado-Tapia EE, Díaz A, Aguilar-Alonso P, Brambila E, Monjaraz E, De La Cruz F, Rodríguez-Moreno A, Flores G. Amphetamine sensitization alters hippocampal neuronal morphology and memory and learning behaviors. Mol Psychiatry. 2021 Sep;26(9):4784-4794. doi: 10.1038/s41380-020-0809-2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Summary:

      This manuscript investigates whether newborns can use speaker identity to separate verbal memories, aiming to shed light on the earliest mechanisms of language learning and memory formation. The authors employ a well-designed experimental paradigm using functional near-infrared spectroscopy (fNIRS) to measure neural responses in newborns exposed to familiar and novel words, with careful counterbalancing and acoustic controls. Their main finding is that newborns show differential neural activation to novel versus familiar words, particularly when speaker identity changes, suggesting that even at birth, infants can use indexical cues to support memory.

      Strengths:

      Major strengths of the work include its innovative approach to a longstanding question in developmental science, the use of appropriate and state-of-the-art neuroimaging methods for this age group, and a thoughtful experimental design that attempts to control for order and acoustic confounds. The study addresses a significant gap in our understanding of how infants process and remember speech, and the data are presented transparently, with clear reporting of both significant and non-significant results.

      Weaknesses:

      However, there are notable weaknesses that limit the strength of the conclusions. The main recognition effect is restricted to a specific subgroup of participants and emerges only during a particular testing window, raising questions about the robustness and generalizability of the findings. The sample size, while typical for infant neuroimaging, is modest, and the statistical power is further reduced by missing data and group-dependent effects. Additionally, the claims regarding episodic memory and evolutionary implications are somewhat overstated, as the paradigm primarily demonstrates memory retention over a few minutes without evidence of the rich, contextually bound recall characteristic of fully developed episodic memory.

      Overall, the authors have achieved their primary aim of demonstrating that speaker identity can facilitate memory separation in newborns, providing valuable preliminary evidence for early indexical processing in language learning. The results are intriguing and likely to stimulate further research, but the limitations in effect robustness and theoretical interpretation mean that the findings should be viewed as an important step forward rather than a definitive answer. The methods and data will be of interest to researchers studying infant cognition, memory, and language, and the study highlights both the promise and the challenges of probing complex cognitive processes in the earliest stages of life.

      We thank the reviewer for their thoughtful and positive assessment of our work, and for giving us the opportunity to clarify points that may have been unclear in the original manuscript.

      First, considering that the recognition response was quite consistent in previous studies, we expected the effect to emerge within a specific testing window, in either the first or the second block, depending on task difficulty. Accordingly, our analytical approach was designed to reflect this expectation, which was subsequently confirmed by the results. Second, the main recognition effect is not restricted to a specific subgroup of participants. Recognition responses were observed in both groups in the left IFG and bilateral STG. The only group-specific modulation was found in the right IFG, where the effect was primarily driven by Group A. This suggests that activity in this specific region may be influenced by contextual factors such as the nature and amount of recently processed stimuli. We have clarified these points in the revised manuscript to avoid the impression that the core effect is limited to a subset of participants or not generalizable across studies. 

      Regarding the sample size, a formal calculation was initially attempted based on the effect size reported in a closely related ANOVA-based study (Benavides-Varela et al., 2011; Study 2: Word recognition after intervening melodies, main effect for the comparison same vs novel word [F(1,26) = 19.318; p < 0.0001; effect size f = .87]). However, entering this information into dedicated software (G*Power; α = 0.05; number of groups = 1; number of measurements = 2) yields an estimated sample size of N = 5 to 7 (depending on the desired power, range = 0.80-0.95). This sample size is unrealistically small and not representative of current research standards in the field. A proper formal power analysis for the LMM is otherwise hard to perform, as we lack information about the expected variance and random-effects structure. We therefore aligned our sample size with prior newborn studies using similar stimuli and experimental designs, and with fNIRS studies in newborns and infants (for recent meta-analyses see De Roever et al., 2018; Boek et al., 2023; Gemignani et al., 2023; which examined studies with mean N = 24; N range = 1-86 and sample sizes often including various conditions and groups). Note also that our design includes a within-subject comparison, and our analytical approach models subject-level variance and handles unbalanced datasets and missing data (which are common in infant studies), thereby improving statistical sensitivity. We have now explicitly clarified this choice in the Introduction.

      Finally, we revised the discussion to ensure that interpretations are aligned with our findings, by including a limitations section and a more explicit note regarding theories of memory.

      Episodic memory is a multifaceted construct that matures over time through the integration of the what–who-where–when information. The present study does not aim to demonstrate the presence of a fully developed episodic memory system at birth; rather, it shows that specific features of episodic-like processing (i.e., what–who) are already bound from the first days of life. Future studies may track the progressive integration of additional episodic-related components leading to a mature episodic memory system.

      Reviewer #1 (Recommendations for the authors):

      (1) I wonder why a control condition with same-speaker interference was not included. Adding such a control would allow you to directly test whether the observed effects are truly due to speaker changes, rather than other acoustic or procedural factors. If it is not feasible to add this condition, please discuss its absence explicitly and clarify how it impacts the interpretation of your findings.

      We thank the reviewer for raising the issue of a same-speaker interference control. A similar control has been tested previously using a closely related paradigm, showing that recognition does not persist when neonates hear another word produced by the same speaker during the retention period (Benavides-Varela et al., 2011). As noted in the manuscript, there were some methodological differences between that study and the current one. Most importantly, in the present study familiarization was reduced (from ten to five blocks) and the retention interval increased (from two to three minutes), making the current paradigm more demanding. We reasoned that, if newborns forgot the word in the prior (less challenging) study, they would also forget it here had a same-speaker interference control been implemented. With the current manipulation, despite the difficulty of the paradigm, the recognition response was observed. This pattern suggests that speaker change, rather than general procedural factors, is central to the observed effect. Given these prior findings and the ethical constraints of testing newborns, we believe that adding a new same-speaker control is not essential. We have now made this rationale more explicit in the manuscript (discussion section, limitations, p. 16), hoping that this clarification will make our methodological choices clearer.

      (2) It wasn't clear if Group A and Group B have the same number of infants, and whether they were randomly assigned. Please specify.

      Participants were initially assigned to Group A or Group B in a counterbalanced way to maintain comparable group sizes. Due to attrition and subsequent exclusion for various reasons (e.g., low signal quality, fussiness, technical issues), the final sample consisted of 17 infants in Group A and 15 infants in Group B. We have now specified this information in the revised manuscript (p. 20).

      (3) Please specify the exact number of fNIRS channels assigned to each region of interest (ROI), as it is currently difficult to map the channel numbers in Supplementary Table 2 to the optode montage shown in Figure 2. Additionally, report the percentage of usable channels after quality control.

      The inferior frontal gyrus left and right ROIs comprised 4 channels each, the superior temporal gyrus left and right ROIs 5 channels each, and the parietal lobe left and right ROIs 7 channels each. This information has been added to the methods section, along with the average number of channels contributing to each ROI after data rejection and the percentage of channels rejected throughout the recording (p. 23).

      (4) Also, a formal power analysis to justify your sample size would be helpful for evaluating the reliability of your findings and is increasingly expected in developmental neuroimaging research.

      Thanks for this suggestion. As stated in the public response, we agree that power analyses constitute an important component of methodological rigor in the field. In our case, a formal calculation was initially attempted based on the effect size reported in a closely related ANOVA-based study (Benavides-Varela et al., 2011; Study 2: Word recognition after intervening melodies, main effect for the comparison same vs novel word [F(1,26) = 19.318; p < 0.0001; effect size f = .87]).

      However, entering this information into dedicated software (G*Power; α = 0.05; power range = 0.80-0.95; number of groups = 1; number of measurements = 2) yields an estimated sample size of N = 5 to 7, which is unrealistically small and not representative of current research standards in the field. A proper formal power analysis for the LMM is otherwise hard to perform, as we lack information about the expected variance and random-effects structure. We therefore aligned our sample size with prior newborn studies using similar stimuli and experimental designs, and with fNIRS studies in newborns and infants (for recent meta-analyses see De Roever et al., 2018; Boek et al., 2023; Gemignani et al., 2023; which examined studies with mean N = 24; N range = 1-86 and sample sizes often including various conditions and groups). Note also that our design includes a within-subject comparison, and our analytical approach models subject-level variance and handles unbalanced datasets and missing data (which are common in infant studies), thereby improving statistical sensitivity.
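
      As a rough check on the effect size quoted above, Cohen's f can be recovered from a reported F statistic and its degrees of freedom through partial eta squared. The sketch below uses only the standard conversion formulas; the function name is hypothetical, and it does not reproduce the G*Power sample-size computation itself, which depends on further settings not stated in the response.

```python
import math


def cohens_f_from_F(F, df_effect, df_error):
    """Convert an ANOVA F statistic to partial eta squared and Cohen's f.

    Standard conversions:
      partial eta^2 = (F * df_effect) / (F * df_effect + df_error)
      f = sqrt(partial eta^2 / (1 - partial eta^2))
    """
    eta_p2 = (F * df_effect) / (F * df_effect + df_error)
    f = math.sqrt(eta_p2 / (1.0 - eta_p2))
    return f, eta_p2


# Values quoted in the response: F(1,26) = 19.318
f, eta_p2 = cohens_f_from_F(19.318, df_effect=1, df_error=26)
print(f"partial eta^2 = {eta_p2:.3f}, Cohen's f = {f:.3f}")
```

      Under these formulas the quoted F value gives f of roughly 0.86, close to the reported .87; the small difference may reflect rounding or a slightly different conversion in the original study.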

      (5) The manuscript references episodic memory explicitly in the abstract and introduction, emphasizing the role of speaker identity in enabling episodic-like memory from birth. However, this concept is not sufficiently addressed or delineated in the discussion. Episodic memory is generally understood as recalling events with contextual details, involving complex integrative processes that extend beyond simple recognition of auditory stimuli. Your paradigm demonstrates memory retention over a few minutes but does not provide strong evidence for the hallmark features of episodic memory, such as contextual binding or autobiographical recollection. Moreover, infant speech recognition and memory formation in early life are influenced by the immediacy and complexity of sensory input, which may not necessarily engage fully developed episodic systems. Clarifying these distinctions and making sure your interpretations and claims are consistent with them would enhance the conceptual clarity of the manuscript.

      We agree that episodic memory is a multifaceted construct that, in its mature form, entails the ability to retrieve past events with contextual detail, typically involving autobiographical recollection and the integration of what–who–where–when information (Tulving, 1993). Our study does not aim to demonstrate the presence of a fully developed episodic memory system at birth, nor do we claim that newborns’ performance satisfies all hallmark criteria of mature episodic memory.

      Here, we focused on sensitivity to speaker identity as a contextual dimension relevant to memory formation. In this narrower sense, both the patterns of activation and the localization of the response provide evidence for early source–content binding (i.e., what–who), which can be considered a foundational aspect of episodic-like processing. Building on this foundational step, future studies may track the gradual integration of additional aspects (where–when), ultimately leading to the maturation of a fully functional human episodic memory system.

      We have now clarified this point in the revised manuscript (p. 17).

      (6) Please add a dedicated limitations section. This should address the group-dependent nature of your main effects, the timing-specific recognition response, and any other methodological constraints that may impact the generalizability of your results.

      We thank the reviewer for this comment. We have done our best to set out the limitations of our study in the text (p. 16), specifically the reasons for the lack of a control condition and the effects of frequent changes in sleep state in newborns.

      (7) Consider revising sections where claims may be overstated, particularly regarding episodic memory and evolutionary implications.

      These sections have now been revised in the abstract and throughout the manuscript to ensure that interpretations remain proportionate to the data and consistent with current theoretical frameworks.

      Reviewer #2 (Public review):

      Summary:

      Previous studies by some of the same authors of the actual manuscript showed that healthy human newborns memorize recently learned nonsense words. They exposed neonates to a familiarization period (several minutes) when multiple repetitions of a bisyllabic word were presented, uttered by the same speaker. Then they exposed neonates to an "interference period" when newborns listened to music or the same speaker uttering a different pseudoword. Finally, neonates were exposed to a test period when infants hear the familiarized word again. Interestingly, when the interference was music, the recognition of the word remained. The word recognition of the word was measured by using the NIRS technique, which estimates the regional brain oxygenation at the scalp level. Specifically, the brain response to the word in the test was reduced, unveiling a familiarity effect, while an increase in regional brain oxygenation corresponds to the detection of a "new word" due to a novelty effect. In previous studies, music does not erase the memory traces for a word (familiarity effect), while a different word uttered by the same speaker does.

      The current study aims to explore whether and how word memory is interfered with by other speech properties, specifically changes in the speaker, given that young children can distinguish speakers by processing speech. The authors' main hypothesis anticipates that recognition of a new speaker would produce less interference with the familiarized word because neonates somehow "separate" the processing of the two words (the familiarized word uttered by one speaker, and the interfering word uttered by a different speaker), memorizing them as different auditory events.

      From my point of view, this hypothesis is interesting, since the results would contribute to estimating the role of the speaker in word learning and speech processing early in life.

      Strengths:

      (1) New data from neonates. Exploring neonates' cognitive abilities is a big challenge, and we need more data to enrich the knowledge of the early steps of language acquisition.

      (2) The study contributes new data showing the role of speaker (recognition) on word learning (word memory), a quite unexplored factor. The idea that neonates include speakers in speech processing is not new, but its role in word memory has not been evaluated before. The possible interpretation is that neonates integrate the process of the linguistic and communicative aspects of speech at this early age.

      (3) The study proposes a quite novel analytic approach. The new mixed models allow exploring the brain response considering an unbalanced design. More than the loss of data, which is frequent in infants' studies, the familiarization, interference and learning processes may take place at different moments of the experiment (e.g. related to changes in behavioural states along the experiment) or expressed in different regions (e.g. related to individual variations in optodes' locations and brain anatomy).

      Weaknesses:

      I did not find major weaknesses. However, I would like to have more discussion or explanation on the following points.

      (1) It would be fine to report the contribution of each infant to the analysis, i.e. how many good blocks, 1 to 5 in sequence 1 and 2, were provided by each infant.

      (2) Why did the factor "blocknumber" range from 0 to 4? The authors should explain what block zero means and why not 1 to 5.

      (3) I may suggest intending to integrate the changes in brain activity across the 3 phases. That is, whether changes in familiarization relate to changes in the test and interference phases. For instance, in Figure 2, the brain response distinguishes between same and novel words that occurred over IFG and STG in both hemispheres. However, in the right STG there was no initial increase in the brain response, and the response for the same was higher than the one for novels in the 5th block.

      (4) Similarly, it is quite amazing that the brain did not increase the activity with respect to the familiarization during the interference phase, mainly over the left hemisphere, even if both the word and speaker changed. Although the discussion considers these findings, an integrated discussion of the detection of novel words and the detection of a novel speaker over time may benefit from a greater integration of the results.

      Appraisal:

      The authors achieved their aims because the design and analytic approaches showed significant differences. The conclusions are based on these results. Specifically, the hypothesis that neonates would memorize words after interference, when interfered speech is pronounced by a different speaker, was supported by the data in blocks 2 and 5, and the potential mechanisms underlying these findings were discussed, such as separate processing for different speakers, likely related to the recognition of speaker identity.

      I think the discussion is well-structured, although I may suggest integrating the changes into the three phases of the study. Maybe comparing with other regions, not related to speech processing.

      Evaluating neonates is a challenge because their physiology is constantly changing. For instance, within 9 minutes, newborns may pass through different behavioral states and experience different physiological needs.

      We thank the reviewer for their constructive and positive appraisal of our work and for drawing attention to points that benefited from further clarification or discussion in the manuscript.

      In the following, we address each point in turn, using the numbering of the reviewer’s identified concerns.

      (1) In the Methods section (“Data Processing and Analysis”, p. 22), we have added detailed information about the number of data points contributed by each infant to the analyses.

      (2) The factor “blocknumber” ranged from 0 to 4 for statistical purposes, allowing Block 0 to serve as the reference (intercept) in the model. This coding facilitated the interpretation of parameter estimates. We now clarify this in the revised manuscript (p. 7).
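
      As a minimal illustration of why zero-based coding eases interpretation, the sketch below fits an ordinary least-squares line to made-up block-wise responses. This is a deliberate simplification of the mixed model used in the study, and the data values and the function name `fit_line` are assumptions introduced only for this example. With blocks coded 0–4, the intercept is the fitted response in the first block; with 1–5 coding, the intercept refers to a block that was never presented, while the slope is unchanged.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up responses for five consecutive blocks (illustration only).
response = [1.0, 0.8, 0.7, 0.5, 0.4]

a0, b0 = fit_line([0, 1, 2, 3, 4], response)  # zero-based block coding
a1, b1 = fit_line([1, 2, 3, 4, 5], response)  # one-based block coding

# Under one-based coding, the first block's fitted value needs the slope too.
fitted_first_block = a1 + b1 * 1
```

      Here `a0` can be read directly as the first-block estimate, whereas `a1` has to be combined with the slope before it describes any observed block.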

      (3) Thank you for this relevant suggestion. In the Discussion, we now explicitly discuss the relationship across phases. We also acknowledge that a thorough examination of these issues lies beyond the scope of the present study, as it will require future work based on multivariate and connectivity analyses.

      (4) We thank the reviewer for this comment. In the revised manuscript, we have expanded the Discussion to clarify the absence of a strong novelty response during interference. The discussion highlights how the temporal properties of the hemodynamic response and the functional demands of each phase jointly shape the observable fNIRS signal in newborns, with purely sensory novelty effects likely increasing with maturation.

      Finally, we agree that evaluating the transitions of sleeping states can further strengthen and clarify the results obtained in the present study. This has now been added as one of the limitations of this study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      The manuscript by K.H. Lee et al. presents Spyglass, a new open-source framework for building reproducible pipelines in systems neuroscience. The framework integrates the NWB (Neurodata Without Borders) data standard with the DataJoint relational database system to organize and manage analysis workflows. It enables the construction of complete pipelines, from raw data acquisition to final figures. The authors demonstrate their capabilities through examples, including spike sorting, LFP filtering, and sharpwave ripple (SWR) detection. Additionally, the framework supports interactive visualizations via integration with Figurl, a platform for sharing neuroscience figures online.

      Strengths:

      Reproducibility in data analysis remains a significant challenge within the neuroscience community, posing a barrier to scientific progress. While many journals now require authors to share their data and code upon publication, this alone does not ensure that the code will execute properly or reproduce the original results. Recognizing this gap, the authors aim to address the community's need for a robust tool to build reproducible pipelines in systems neuroscience.

      We appreciate the summary and the recognition of the key need for maximally reproducible scientific workflows.

      Weaknesses:

      The issues identified here may serve as a foundation for future development efforts.

      (1) User-friendliness:

      The primary concern is usability. The manuscript does not clearly define the intended user base within a modern systems neuroscience lab. Improving user experience and lowering the barrier to entry would significantly enhance the framework's potential for broad adoption. The authors provide an online example notebook and a local setup notebook. However, the local setup process is overly complex, with many restrictive steps that could discourage new users. A more streamlined and clearly documented onboarding process is essential. Additionally, the lack of Windows support represents a practical limitation, particularly if the goal is widespread adoption across diverse research environments.

      We agree that usability is critical, and we now clarify that Spyglass

      “… is designed to be used by everyone in a laboratory who works with the data, both as a general-purpose tool to enable the development of new analysis pipelines and a tool that allows those pipelines and associated results to be frozen and packaged to enable reproducibility…”

      To address the local setup issue, we have now created an interactive quick-start program that guides new users through the setup (scripts/install.py). It leads the user through a few prompts with sensible defaults, helps install the Spyglass dependencies, and creates the DataJoint configuration file. We also validate the configuration to make sure the setup was successful (scripts/validate.py). Combined, these should reduce the complexity and setup time for most users while allowing expert users to configure Spyglass as they need. We thank the reviewer for the suggestion.

      We also agree that the lack of support for Windows is a key issue, and that is something we plan to address in the coming years. We note that it may be possible to run Spyglass under the Windows Subsystem for Linux (WSL 2), which allows users to run Linux programs on a Windows machine without the need for a virtual machine or dual boot setup.

      (2) Dependency management and long-term sustainability:

      The framework depends on numerous external libraries and tools for data processing. This raises concerns about long-term maintainability, especially given the short lifespan of many academic software projects and the instability often associated with Python's backward compatibility. It would be helpful for the authors to clarify how flexible and modular the pipeline is, and whether it can remain functional if upstream dependencies become deprecated or change substantially.

      This is a very good point that reflects a broad challenge to maintainability and reproducibility. We now explicitly raise this point in our Limitations section, and note that

      “…even in cases where reproducing a result would require installing older versions of software, the results themselves remain accessible within NWB files referenced in Spyglass, ensuring that previous results can be built on even as packages evolve.”

      The merge table pattern also allows us to update (version) our pipelines as software changes. For example, we have already done so for changes in SpikeInterface versions for the version 1 pipeline for spike sorting. New and older versions of the pipeline (v0 and v1) are accessed through the merge table SpikeSortingOutput. This allows the user to have consistent results despite the version change.
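
      The idea of reaching results from several pipeline versions through one table can be sketched in plain Python. The sketch below is an analogy only and is not the Spyglass or DataJoint implementation; the class `MergeTable`, its methods, and the sample data are hypothetical names introduced for illustration.

```python
class MergeTable:
    """Single access point for results produced by several pipeline versions."""

    def __init__(self):
        self._sources = {}  # source name -> function that loads a result by key
        self._entries = {}  # merge_id -> (source name, key within that source)

    def register_source(self, name, loader):
        self._sources[name] = loader

    def insert(self, merge_id, source, key):
        if source not in self._sources:
            raise KeyError(f"unknown source: {source}")
        if merge_id in self._entries:
            raise ValueError(f"duplicate merge_id: {merge_id}")
        self._entries[merge_id] = (source, key)

    def fetch(self, merge_id):
        # Callers never need to know which version produced the entry.
        source, key = self._entries[merge_id]
        return self._sources[source](key)


# Hypothetical result stores for two pipeline versions (made-up spike times).
v0_results = {"sort_a": [0.10, 0.52]}
v1_results = {"sort_b": [0.11, 0.50, 0.93]}

output = MergeTable()
output.register_source("v0", v0_results.__getitem__)
output.register_source("v1", v1_results.__getitem__)
output.insert("m1", "v0", "sort_a")
output.insert("m2", "v1", "sort_b")

spike_counts = {merge_id: len(output.fetch(merge_id)) for merge_id in ("m1", "m2")}
```

      Downstream code written against `fetch` keeps working when a new version is registered, which is the property the response attributes to the merge table pattern.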

      (3) Extensibility for custom pipelines:

      A further limitation is the insufficient documentation regarding the creation of custom pipelines. It is unclear how a user could adapt Spyglass to implement their own analysis workflows, especially if these differ from the provided examples (e.g., spike sorting, LFP analysis that are very specific to the hippocampal field). A clearer explanation or example of how to extend the framework for unrelated or novel analyses would greatly improve its utility and encourage community contributions.

      Here we failed to provide the required links to the documentation. We now explicitly refer to the documentation on Custom Pipelines, which includes a link to a YouTube video walking users through the creation of such a pipeline:

      Specifically, Spyglass uses DataJoint syntax to define tables as Python classes (see online documentation on Custom Pipelines and this video for examples).

      (4) Flexibility vs. Standardization:

      The authors may benefit from more explicitly defining the intended role of the framework: is Spyglass designed as a flexible, general-purpose tool for developing custom data analysis pipelines, or is its primary goal to provide a standardized framework for freezing and preserving pipelines post-publication to ensure reproducibility? While both goals are valuable, attempting to fully support both may introduce unnecessary complexity and result in a tool that is not well-suited for either purpose. The manuscript briefly touches on this tradeoff in the introduction, and the latter-pipeline preservation-may be the more natural fit for the package. If so, this intended use should be clearly communicated in the documentation to help users understand its scope and strengths.

      We appreciate this point, and have now clarified in the beginning of the Results section that

      It is both a general-purpose tool to enable the development of new analysis pipelines and a tool that allows those pipelines and associated results to be frozen and packaged to enable reproducibility.

      In practice, our lab uses Spyglass to systematize analyses to enable rapid application across many datasets. Then, once a paper has been finalized, we can export the data and the code in a package that enables reproduction. Being able to do both things is, in our view, a key strength of Spyglass. More broadly, we feel it is critical that there be a clear path for users to take their analysis code and make it reproducible. That process normally involves a very substantial amount of work, and our goal was to reduce the burden on users and make this a straightforward extension of how analyses are carried out.

      Impact:

      This work represents a significant milestone in advancing reproducible data analysis pipelines in neuroscience. Beyond reproducibility, the integration of cloud-based execution and shareable, interactive figures has the potential to transform how scientific collaboration and data dissemination are conducted. The authors are at the forefront of this shift, contributing valuable tools that push the field toward more transparent and accessible research practices.

      We appreciate this positive assessment.

      Reviewer #1 (Recommendations for the authors):

      (1) "The authors write: ‘the relational database, a well-known data structure that uses tables to organize data.’ This phrasing may be misleading… It would be more accurate to describe them as ‘well-established’ rather than ‘well-known.’"

      We have made this change.

      (2) The statement "It makes it easy to apply the same analysis to multiple datasets, as users need to specify only the data and parameters for computation ("what") rather than the execution details ("how")." would benefit from further elaboration. Specifically, how does this approach compare in practice to using a simple configuration file (e.g., YAML or JSON) to manage parameters and execution logic? A comparison or example would help ground the claim.

      We agree one could in principle do something similar with configuration files, but this is a discipline that the user must impose on themselves, as configuration files in general have no constraint on how they are to be used. On the other hand, a system like Spyglass enforces the separation of data from parameters by design. We have now added a brief comment on this point in the Results:

      “It provides a structure to organize and systematize the analysis parameters, data, and outputs into different tables. This contrasts with user-generated configuration files where each user could adopt their own idiosyncratic approach to specifying parameters and data.”

      We also come back to this point in the Discussion:

      Other approaches do away with the relational database altogether. For example, DataLad uses version control tools such as git and git-annex to manage both code and data as files [39]. This enables the creation of a data analysis environment and decentralized data sharing. For building analysis pipelines, it may be combined with other tools for managing the sequential execution of scripts. For example, Snakemake [40] (and related projects such as Cobrawap [41]) allows users to gather and define the input, output, and the associated scripts to execute for each analysis step, thereby tracking the dependency between steps. But because these tools do not provide any formal structure for data analysis or parameter specification, they lack the advantages of the relational database that we discussed, such as being able to easily organize or search for the records of previous analyses based on specific parameters, efficient data sharing and access management for multiple users, and built-in data integrity checks based on constraints native to the database (e.g. primary keys).

      (3) The sentence ‘It enables easy access to multiple datasets via queries’ may overstate the benefit… clarify what specific advantages database queries offer.

      We agree that this is an important feature and we added the following as an example of the advantage of being able to query the database:

      It enables easy access to multiple datasets via queries (e.g. to find all datasets with recordings from a particular brain region or that used a particular behavioral paradigm)
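
      A query of this kind can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions: Spyglass itself is built on DataJoint rather than SQLite, and the table name `session`, its columns, the file names, and the helper `sessions_where` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session (nwb_file TEXT PRIMARY KEY, brain_region TEXT, task TEXT)"
)
conn.executemany(
    "INSERT INTO session VALUES (?, ?, ?)",
    [
        ("rat1_day1.nwb", "CA1", "w_track"),
        ("rat1_day2.nwb", "CA1", "open_field"),
        ("rat2_day1.nwb", "mPFC", "w_track"),
    ],
)


def sessions_where(conn, **restriction):
    """Return sorted file names whose columns equal the given values."""
    allowed = {"brain_region", "task"}
    unknown = set(restriction) - allowed
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    sql = "SELECT nwb_file FROM session"
    if restriction:
        # Column names come from the allow-list; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in restriction)
    sql += " ORDER BY nwb_file"
    return [row[0] for row in conn.execute(sql, tuple(restriction.values()))]


ca1_files = sessions_where(conn, brain_region="CA1")
ca1_w_track = sessions_where(conn, brain_region="CA1", task="w_track")
```

      The same restriction works unchanged however many sessions are stored, which is harder to guarantee when metadata lives in file names or per-user configuration files.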

      (4) ‘Specifically, Spyglass uses DataJoint syntax to define tables as Python classes’ lacks clarity… Expanding this explanation with a brief, concrete example would

      We agree that this sentence does not provide information on how to use DataJoint syntax to define a table. We carefully considered adding that syntax to the manuscript, but we are concerned that doing so here and in other places where syntax examples could be used would decrease the readability of the document. We also noted that other papers that present analysis frameworks typically provide much less information.

      Nevertheless, it is clear that users would benefit from a concrete example, and as we mentioned above, we have added a link to the documentation describing how to make custom schema and pipelines, as well as a YouTube video that we created to walk users through this process.

      (5) The authors write: "Selection tables associate parameter entries with data object entries." This terminology is confusing. From a naming perspective, it is not immediately obvious what a "selection table" is or how it differs from other components. Moreover, shouldn't parameter entries be associated with a specific pipeline rather than directly with data objects? Further clarification is needed.

      We appreciate that our terminology was not clear. The idea behind a selection table is that there are many data entries and many potential sets of parameters that can be used to analyze each of those entries. We have now revised this section of the text and added an explanatory paragraph:

      An analysis pipeline consists of sets of tables downstream of the Common tables. In each step in the analysis, the user populates one of four table types (Figure 2A):

      Data tables contain pointers to data objects in either the original NWB file or ones generated by an upstream analysis.

      Parameter tables contain a list of the parameters needed to fully specify the desired analysis.

      Selection tables allow users to select and pair a data entry and a parameter entry, defining the input to the Compute table.

      Compute tables execute the computations to carry out the analysis using the Data and Parameters specified in the Selection table entry. These results are then stored and can serve as Data for downstream analysis.

      This design has multiple features that we have found to be beneficial. First, Parameter tables store the full set of parameters needed to specify a given analysis. For example, a Parameter table entry for a firing rate analysis of a single neuron might specify the bin size and smoothing to be used for that analysis. Multiple such entries can be defined, allowing a user to select the most appropriate one for the question being addressed. Second, because Selection tables specify which Parameter table entry was used for a given analysis on the associated Data table entry, they provide the key information needed to know which parameters were used to generate the entry in the downstream Compute table. Third, it is simple to associate a given Data table entry with multiple Parameter table entries and then re-run the analysis on those pairs. This enables a user to understand how their choice of parameters impacts their results, something that is otherwise difficult to manage and track.
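
      The four table types and the firing-rate example above can be mimicked with Python's built-in `sqlite3` module. This is a sketch under stated assumptions rather than Spyglass code: Spyglass defines its tables through DataJoint, and the table names, columns, spike times, and the `populate` function below are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    -- Data table: one row per data object (spike times stored as JSON here).
    CREATE TABLE spikes (unit_id TEXT PRIMARY KEY, spike_times TEXT, duration_s REAL);
    -- Parameter table: named parameter sets.
    CREATE TABLE rate_params (params_name TEXT PRIMARY KEY, bin_size_s REAL);
    -- Selection table: pairs one data entry with one parameter entry.
    CREATE TABLE rate_selection (
        unit_id TEXT REFERENCES spikes(unit_id),
        params_name TEXT REFERENCES rate_params(params_name),
        PRIMARY KEY (unit_id, params_name)
    );
    -- Compute table: one result per selection entry.
    CREATE TABLE rate_result (
        unit_id TEXT, params_name TEXT, rates TEXT,
        PRIMARY KEY (unit_id, params_name),
        FOREIGN KEY (unit_id, params_name)
            REFERENCES rate_selection(unit_id, params_name)
    );
    """
)
conn.execute(
    "INSERT INTO spikes VALUES (?, ?, ?)",
    ("unit1", json.dumps([0.1, 0.4, 0.6, 1.2, 1.9]), 2.0),
)
conn.executemany("INSERT INTO rate_params VALUES (?, ?)", [("fine", 0.5), ("coarse", 1.0)])
conn.executemany(
    "INSERT INTO rate_selection VALUES (?, ?)", [("unit1", "fine"), ("unit1", "coarse")]
)


def populate(conn):
    """Compute binned firing rates for every selection entry without a result."""
    todo = conn.execute(
        """
        SELECT s.unit_id, s.params_name, d.spike_times, d.duration_s, p.bin_size_s
        FROM rate_selection AS s
        JOIN spikes AS d ON d.unit_id = s.unit_id
        JOIN rate_params AS p ON p.params_name = s.params_name
        LEFT JOIN rate_result AS r
            ON r.unit_id = s.unit_id AND r.params_name = s.params_name
        WHERE r.unit_id IS NULL
        """
    ).fetchall()
    for unit_id, params_name, spike_json, duration, bin_size in todo:
        n_bins = round(duration / bin_size)
        counts = [0] * n_bins
        for t in json.loads(spike_json):
            counts[min(int(t / bin_size), n_bins - 1)] += 1
        rates = [count / bin_size for count in counts]  # spikes per second
        conn.execute(
            "INSERT INTO rate_result VALUES (?, ?, ?)",
            (unit_id, params_name, json.dumps(rates)),
        )


populate(conn)
results = {
    name: json.loads(rates)
    for name, rates in conn.execute(
        "SELECT params_name, rates FROM rate_result WHERE unit_id = 'unit1'"
    )
}
```

      Because each result row is keyed by the same pair as its selection entry, the parameters behind any result can always be looked up, and pairing one data entry with two parameter sets yields two results that can be compared side by side.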

      (6) Including ‘fitting state-space models’ as a standard example may be misleading… Presenting it as a routine task might set unrealistic expectations.

      We agree and have changed “standard” to “a diverse range of”.

      (7) Figure 2 would benefit from clearer sequential logic. For example, the object ‘LFPSelection’ appears after a method call referencing it.

      We agree that the figure was not explained adequately. We now make it clear in the caption that the method call creates the entry in the LFPSelection table, and is thus upstream of the picture of the table entry that was created.

      (8) Example 3 would be strengthened by a comparison to SpikeInterface, a framework increasingly adopted by the community.

      Here we clearly did not explain the spike sorting pipeline sufficiently thoroughly. As we now clarify in the text:

      This pipeline uses SpikeInterface [19] to perform the operations critical for spike sorting, but also tracks all of the parameters used and provides a system for tracking multiple sorting curations.

      Thus, Spyglass takes advantage of the special purpose routines within SpikeInterface, but also provides an organizational framework for the outputs, and, equally critically, allows direct use of the outputs of sorting in downstream analyses with the ability to go back and know which sorting parameters were used for that analysis.

      (9) The authors state: "These are saved as Docker containers and optionally uploaded to DANDI." However, it is unclear how end users are expected to interact with these containers. Additional guidance or an example interaction would be valuable.

      We agree that this interaction was not described in the text, and we have now added the following to explain how a user might interact with these containers:

      ...This can be done by (i) hosting the database on the cloud and granting access to users outside the lab; or (ii) exporting and sharing parts of the database that were used by the project. Spyglass facilitates the second option by providing functions that automatically log the table entries and NWB files used for creating figures of a manuscript in a Python environment (Table 1, 05_Export). The dependencies of these entries are traced through the database to compile the complete set of raw, intermediate, and plotted NWB files and their corresponding database entries. These are stored in the `Export` table, which also generates a bash script to create SQL dumps of the identified database entries.

      To upload these files to DANDI, users must first register a new dandiset for their project and record their API key and dandiset ID. With this information, they can then use the method `DandiPath.compile_dandiset()` to automatically validate, organize, and upload all project files to the DANDI archive. Additionally, this process stores the archive information for each file in the `DandiPath` table, allowing `fetch_nwb` to automatically stream data from the DANDI cloud storage when not available locally.

      To create a shareable Docker image of the project, we provide a template repository, spyglass-export-docker. Users first download a local copy of this repo and copy the SQL dump file, environment YAML file, and figure-generating notebooks generated during Spyglass export into the appropriate folders. Running the provided Docker Compose scripts then generates two linked Docker containers: one running the reconstructed Spyglass SQL database, and a second connected to this database and running a JupyterHub with a Python environment matching that used when generating the figures. These can be readily shared with new users to give them immediate access to all steps of the analysis process and the corresponding data through DANDI streaming.
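
      The dependency tracing mentioned in the export description above amounts to collecting everything upstream of the entries used for a figure. The sketch below shows that idea as a plain graph traversal; it is not the Spyglass export code, and the function `trace_upstream` and the entry names in `parents` are hypothetical.

```python
def trace_upstream(entry, parents):
    """Return every entry that `entry` depends on, directly or indirectly."""
    seen = set()
    stack = [entry]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical dependency map: each entry lists the entries it was computed from.
parents = {
    "figure_5_panel": ["ripple_times", "decoded_position"],
    "ripple_times": ["lfp_band"],
    "lfp_band": ["lfp"],
    "lfp": ["raw_session.nwb"],
    "decoded_position": ["sorted_spikes", "raw_session.nwb"],
    "sorted_spikes": ["raw_session.nwb"],
    "unused_analysis": ["raw_session.nwb"],
}

export_set = trace_upstream("figure_5_panel", parents)
```

      Only entries reachable from the figure are collected, so analyses that did not contribute to it (here `unused_analysis`) stay out of the exported package.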

      (10) The phrase "not requiring a central location to track available files and providing a user-friendly Python API" is somewhat vague. Does this imply that multiple sources can exist for the same NWB file? How does the system handle potential version conflicts, such as when an NWB file is modified locally? A clearer explanation would help users understand the system's behavior in collaborative scenarios.

      This is an important point that we now explain in the manuscript:

      Critically, the downloaded files are never modified locally within Spyglass, and any attempt to access a modified file would result in a DataJoint error. This ensures that each user is working on the same underlying data even if they are at different sites.

      To provide interested readers with more details, we also now point them to the repo for more information:

      We point interested readers to the Kachery GitHub repo (https://github.com/magland/kachery) for further descriptions.
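
      One generic way to detect that a shared file has been changed locally is to compare its content hash with a digest recorded when the file was registered. The sketch below illustrates that general technique only; it does not describe how Spyglass, DataJoint, or Kachery perform the check, and `sha256_of`, `open_verified`, and `ModifiedFileError` are hypothetical names.

```python
import hashlib
import os
import tempfile


class ModifiedFileError(RuntimeError):
    """Raised when a file no longer matches its registered digest."""


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_verified(path, expected_digest):
    """Open a file for reading only if its contents match the registered digest."""
    if sha256_of(path) != expected_digest:
        raise ModifiedFileError(f"{path} differs from the registered version")
    return open(path, "rb")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.nwb")
    with open(path, "wb") as handle:
        handle.write(b"original contents")
    registered = sha256_of(path)  # digest stored alongside the file record

    with open_verified(path, registered) as handle:
        first_read = handle.read()

    with open(path, "ab") as handle:  # simulate a local edit
        handle.write(b" (edited)")
    try:
        open_verified(path, registered)
        rejected = False
    except ModifiedFileError:
        rejected = True
```

      Refusing to open a mismatched file, rather than silently reading it, is what keeps collaborators at different sites working from identical data.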

      (11) "The concept of a ‘kachery zone’ in Figure 4 is ambiguous. Is this storage local or in the cloud? If a third-party storage system is involved, it should be explicitly labeled and described in the diagram."

      We agree that the depiction of a Kachery zone in Figure 4 is hard to understand. For the reviewer’s reference, a Kachery zone defines a list of users who have permission to upload and download a particular set of files that have been linked to that zone. This is explained in the tutorials; to simplify the figure, we have replaced the Kachery zone with a remote computer.

      (12) If one of the manuscript's goals is to showcase the functionality of the pipeline, Figure 5 would be more informative if it also illustrated the workflow or steps involved in generating the displayed figures.

      We have added a supplementary figure (Supplementary Figure 1) related to Figure 5 that illustrates the main data workflow used in generating the figure. In addition, we note that the code for generating Figure 5 and the supplementary figure is included in the code repository for the paper (https://github.com/LorenFrankLab/spyglass-paper/).

      (13) In the conclusion, the authors write: "By contrast, Spyglass begins with a shared data format that includes the raw data and offers both transparent data management and reproducible analysis pipelines using a formal data structure." However, the tools discussed in the previous paragraph seem to offer similar capabilities. The real challenge in transparent data management often lies in the technical overhead associated with setting up and maintaining a database, particularly when collaborating across labs.

      Here we may not have explained the differences between Spyglass and these other approaches sufficiently clearly. The various tools mentioned in the paragraph above this one do not begin with a shared format nor do they include a formal data structure. That said, we agree that maintaining a database accessible across labs is a key challenge. We note here that we provide tutorials to ease this process, which are linked and described in the manuscript (e.g. Table 1).

      (14) Specifying a preferred IDE… may not be necessary. This recommendation could be made optional or omitted.

      We agree that it may not be necessary, but we have also noted that users come to Spyglass with a very wide range of expertise, and in our lab it has been helpful to specify the IDE.

      Reviewer #2 (Public review):

      Summary:

      This valuable paper presents Spyglass, a comprehensive software framework designed to address the critical challenges of reproducibility and data sharing in neuroscience.

      The authors have developed a robust ecosystem built on community standards such as NWB and DataJoint, and demonstrate its utility by applying it to datasets from two independent labs, successfully validating the framework's ability to reproduce and extend published findings. While the framework offers a powerful blueprint for modern, reproducible research, its immediate broad impact may be tempered by the significant upfront investment required for adoption and its current focus on electrophysiological data. Nevertheless, Spyglass stands as an important and practical contribution, providing a well-documented and thoughtfully designed path toward more transparent and collaborative science.

      Strengths:

      (1) Principled solution to a foundational challenge:

      The work offers a concrete and comprehensive framework for reproducibility in neuroscience, moving beyond abstract principles to provide an implemented, end-to-end ecosystem.

      (2) Pragmatic and robust architectural design:

      Features such as the "cyclic iteration" motif for spike-sorting curation and the "merge" motif for pipeline consolidation demonstrate deep, practical experience with neurophysiological analysis and address real-world challenges.

      (3) Cross-laboratory validation:

      The successful replication and extension of published hippocampal decoding findings across independent datasets strongly support the framework's utility and underscore its potential for enabling reproducible science.

      (4) Accessibility through documentation and demos:

      Extensive tutorials and the availability of a public demo environment lower some of the barriers to adoption.

      We appreciate the Reviewer’s recognition of these strengths.

      Weaknesses:

      (1) High barrier to adoption:

      The requirement to convert all data into NWB, maintain a relational database, and train users in structured workflows is a significant hurdle, particularly for smaller labs.

      We agree that this is a significant hurdle, but we also believe that it comes with many advantages. It is also increasingly easy to do given the many community-supported tools, regardless of a lab's resources. These points are discussed in detail in the “Why NWB?” section.

      We also note that, to our knowledge, there is no simpler alternative that provides the key features of Spyglass.

      (2) Limited tool integration:

      The current pipelines, while useful, still resemble proof-of-principle demonstrations.

      Closer integration with established analysis libraries such as Pynapple and others could broaden the toolkit and reduce duplication of effort.

      Here we clearly failed to explain that we have integrated other libraries, including Pynapple. We now make this clear in the Results section:

      Our goal was to take advantage of other open source packages, and we have therefore integrated support for Pynapple [21], a general purpose neural data analysis package. We also built our pipelines to take advantage of other community-developed, open-source packages, like GhostiPy [20], SpikeInterface [19], DeepLabCut [2] and Moseq [29].

      We also have added a specific reference to the relevant function call in the Practical use cases and extensions section:

      For example, the user can conveniently read specific data types from the NWB file by first ingesting it into Spyglass and accessing database tables with Spyglass functions (e.g. fetch_nwb) or even load those objects in a format compatible with Pynapple [21] (fetch_pynapple).

      Pynapple support is actually aided by our design choice of relying on NWB. Because Pynapple can load NWB files, the output of any analysis stored in an NWB file that Pynapple can read can be loaded as a Pynapple object. We have provided methods to do so.

      (3) Experimental metadata support:

      While NWB provides a solid foundation for storing neurophysiology data streams, it still lacks broad and standardized support for experimental metadata, including descriptions of conditions, subject details, and procedures, as well as links across datasets. This limitation constrains one of Spyglass's key promises: enabling reproducible, cross-laboratory science. The authors should clarify how Spyglass plans to address or mitigate this gap - for example, by adopting or contributing to metadata extensions, providing templates for experimental conditions, or integrating with complementary systems that manage metadata across datasets.

      This is an important point. First, NWB provides methods for creating new metadata extensions, and our laboratory has contributed to multiple such extensions and has adopted metadata extensions as they come to exist (for example, we are currently integrating the ndx-pose extension, which has broader support for pose estimation algorithms such as DLC and SLEAP, enabling us to capture relationships between body parts). These extensions, once incorporated into NWB, make it easy to create parallel Spyglass tables that read in the associated metadata. Second, we note that by storing the metadata from the NWB file in a database, Spyglass naturally supports searches across datasets where the metadata is the same (e.g. all the datasets from a given subject or using a given behavioral apparatus).

      That said, for these searches to be easy, the underlying NWB files need to use the same ontologies (naming systems). Creating shared naming systems within and across labs is very challenging, but even here having a database helps greatly, as it provides a way to find all the names used for a given field and to thereby make an effort to standardize them.
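
      To illustrate the point about databases and shared naming, the following is a minimal sketch that uses Python's built-in sqlite3 module rather than Spyglass or DataJoint. The table name `session`, its columns, the helper functions, and all example values are hypothetical and are not taken from the Spyglass schema. The sketch only shows how a database makes it straightforward to list every name in use for a field and, once names agree, to search across datasets.

```python
import sqlite3

# Columns that the helpers are allowed to query (hypothetical schema).
ALLOWED_FIELDS = {"subject", "apparatus"}


def build_demo_db():
    """Create an in-memory table of made-up session metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE session ("
        "nwb_file TEXT PRIMARY KEY, subject TEXT, apparatus TEXT)"
    )
    conn.executemany(
        "INSERT INTO session VALUES (?, ?, ?)",
        [
            ("a.nwb", "rat1", "W-track"),
            ("b.nwb", "rat1", "w_track"),  # same apparatus, different spelling
            ("c.nwb", "rat2", "W-track"),
            ("d.nwb", "rat2", "open field"),
        ],
    )
    return conn


def names_in_use(conn, field):
    """Return {name: number of sessions} for one metadata field."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    cur = conn.execute(
        f"SELECT {field}, COUNT(*) FROM session GROUP BY {field}"
    )
    return dict(cur.fetchall())


def standardize(conn, field, old, new):
    """Rename one variant spelling to the agreed name."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    conn.execute(f"UPDATE session SET {field} = ? WHERE {field} = ?", (new, old))


def sessions_with(conn, field, value):
    """Return the files whose metadata field equals the given value."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    cur = conn.execute(
        f"SELECT nwb_file FROM session WHERE {field} = ? ORDER BY nwb_file",
        (value,),
    )
    return [row[0] for row in cur.fetchall()]
```

      Listing the names first exposes the variant spelling; after it is standardized, a single query retrieves every session recorded on that apparatus.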

      Finally, while Spyglass aims to enable reproducibility, it will not be possible to solve all standardization issues of the field. We believe that Spyglass is an important step forward in standardization and reproducibility in that it encourages users to use the same data format and processing. To our knowledge, there is no software like it in the field of systems neuroscience. Limitations of the field and of current progress do not invalidate the contribution of Spyglass as a framework.

      We now mention all these issues in the Limitations section of the Discussion.

      (4) Cross-laboratory interoperability:

      While demonstrated across two datasets, the manuscript does not fully address how Spyglass will handle the diversity of metadata standards, acquisition systems, and lab-specific practices that remain major obstacles to reproducibility.

      We agree that the current version of Spyglass does not fully address this diversity. Nevertheless, we note that the NWB standard is increasingly widely adopted in our field, and that by building on this standard, it is much simpler to create structures that store relevant data across labs.

      (5) Visualization limitations:

      Beyond the export system and Figurl, NWB offers relatively few options for interactive data exploration. The ability to explore data flexibly and discover new phenomena remains limited, which constrains one of the potential strengths of standardized pipelines.

      We agree that there are many other tools, and we have considered additional integrations. We have chosen not to proceed in this direction because the various visualization tools are well constructed, and therefore already easy to use with data retrieved from Spyglass. Thus, users can choose to use Matplotlib, Seaborn, or any of many other visualization tools and apply those to data accessed through Spyglass without the need for more explicit integration.

      Spyglass is well-positioned to become a community framework for reproducible neuroscience workflows, with the potential to set new standards for transparency and data sharing. With expanded modality coverage, tighter integration of existing community tools, stronger solutions for cross-lab interoperability, and richer visualization capabilities, it could have a transformative impact on the field.

      We appreciate this summary and will continue to try to make Spyglass more powerful, generalizable, and accessible to the community.

      Reviewer #2 (Recommendations for the authors):

      (1) Documentation/User onboarding:

      While extensive documentation exists, new users may feel overwhelmed. A single Quickstart or "golden path" guide and a one-command validation script would substantially improve usability.

      As mentioned in the response to reviewer 1, we have added an interactive quickstart program to walk users through installation and setup (scripts/install.py) and validate the install (scripts/validate.py). This should greatly reduce the complexity of the set-up process and allow new users to use Spyglass quickly and confidently. We thank the reviewer for the suggestion.

      (2) Permission handling and multi-user scaling:

      Current ad hoc solutions (like cautious deletes) may not scale well in large collaborations. This should be acknowledged, but it is not a fatal weakness given the framework's early stage.

      This is a fair point and we now mention this when cautious delete is introduced in the Methods:

      Though this is not a formal permission-management system, it serves to prevent accidental deletions. We note that this system does incur additional overhead, and while that has not been an issue for us, it is possible that this would become problematic in use for much larger cross-laboratory collaborations.

      (3) Benchmarking and performance evaluation:

      "More systematic testing (e.g., reproducibility across independent users, computational efficiency) would be reassuring, but the lack of it does not invalidate the proof-of-principle demonstration. "

      We agree. So far at least two other labs have adopted this system and we are working with a consortium funded by the Simons Foundation to use Spyglass as a data sharing system across a larger number of labs.

      (4) Support for Cloud solution:

      To lower the barrier to adoption, the authors should consider cloud integration, such as preconfigured Docker/Cloud templates or hosted options, so end-users do not need to maintain databases and storage locally.

      We agree that cloud-based solutions could be a good option for some labs, although we note that the cost of cloud-based computing can be very high. There is also the burden of moving and storing the data to where it needs to be processed, which can be particularly time intensive with the large-scale data being generated by many laboratories.

      At the reviewer’s suggestion, we have added docker-compose support to lower the barrier to adoption. This includes:

      docker-compose.yml with health checks and persistent storage

      .env.example configuration template

      This allows one-command database setup: `docker compose up -d`

      (5) Integration of greater modalities:

      The authors should consider expanding support to other major data types, particularly calcium imaging, photometry, and other optical physiology data.

      We entirely agree that pipelines to ingest and process these datatypes would be very valuable, and we would welcome collaborations with experts and the general community to build these pipelines. We are, for example, working with a collaborating lab on a photometry pipeline. However, we only have so many people to build and maintain Spyglass, so we are limited by the capacity and expertise of our developers.

      (6) Integrate more community tools:

      Closer integration with community tools such as Pynapple, Neurosift, and SpikeInterface would broaden functionality and position Spyglass as a hub rather than a parallel ecosystem.

      As we mentioned in our responses to Reviewer 1, we entirely agree, and in fact we have already integrated Pynapple support into Spyglass. Because we store files in the NWB format and Pynapple supports NWB, it was easy for us to convert any data we have into the Pynapple format upon request, thus making it easily analyzable by the Pynapple package. Moreover, we use SpikeInterface for the SpikeSorting pipeline, and similarly provide pipelines built on other open source projects. As we now clarify in the text:

      Spyglass includes pipelines for a diverse range of analysis tasks in systems neuroscience, such as the analysis of LFP, spike sorting, video and position processing, and fitting state-space models for decoding neural data. Tutorials for all pipelines are available on the Spyglass documentation website (Table 1). Our goal was to take advantage of other open source packages, and we have therefore integrated support for Pynapple [21], a general purpose neural data analysis package. We also built our pipelines to take advantage of other community-developed, open-source packages, like GhostiPy [20], SpikeInterface [19], DeepLabCut [2] and Moseq [29].

      (7) Direct Dandi archive upload functionality:

      Scripts and tutorials for uploading data directly from Spyglass to DANDI, with validation of metadata completeness, would provide users with a direct pipeline from raw data to a public archive.

      The tutorials for DANDI upload are included as part of the export tutorial notebook (https://lorenfranklab.github.io/spyglass/latest/notebooks/05_Export/). We agree that this was not previously apparent from the manuscript and have now noted it in the manuscript table describing these notebooks.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors wanted to determine whether the set-19 gene, one of 38 SET-domain containing genes in C. elegans, has a clear function in vivo with respect to lysine methylation. The question is not only whether it can modify this histone tail residue, but also what the impact of a loss of this locus is on the inheritance of repressive chromatin states.

      Strengths:

      The authors clearly achieved their goal, and it is convincingly shown that SET-19 is indeed a somatic cell histone methyltransferase with a striking specificity for H3K23. There is recombinant protein work as well as quantitative in vivo mapping of histone marks and transcriptional changes, and the authors rule out some other hypotheses that have been in the literature. Overall, this provides a compelling argument that SET-19 is indeed the major somatic cell HMT for this residue. Interestingly, the phenotypes are rather minimal, consistent with redundancy in the physiological roles of histone methylation, and redundancy as well in HMT function. For the most part, the data are not over-interpreted. The genetic alleles used, assuming they are confirmed, were revealing and well-documented.

      Thanks very much for the positive comments on our work.

      The alleles used in this study were confirmed by PCR and Sanger sequencing, and the sequence information will be added in the revised manuscript.

      Weaknesses:

      The major weaknesses are easily fixed. They mainly reflect a slight overstatement of certain data (claiming insignificance, when it is not clear how that was determined) and claiming a bit too much about SET-32, which was independently claimed to be an H3K23 HMT. Clearly, the two SET domain enzymes are not redundant, nor is the claim that SET-32 has no role in H3K23 methylation completely convincing, especially in germline or embryonic conditions. Finally, the imaging is not of very high quality, nor are the images fully quantitated. These points can be easily remedied.

      Thanks very much for the comments.

      We agree that some interpretations in the original manuscript were too strong, particularly regarding the negative results and the role of SET-32. Our in vitro assays show that SET-32 exhibits H3K23me1 activity and, at higher SAM concentrations, activity toward H3K23me2/3. These findings indicate that SET-32 does have a role in H3K23 methylation. SET-32 is expressed in germ cells, oocytes, and embryos. It is quite likely that redundancy of H3K23 methyltransferase activity exists in these tissues. In the revised manuscript, we will tone down the interpretations and expand the Discussion section to include this possibility. We will also replace the relevant images with higher-quality versions and provide quantitative analyses for Figures 6a and 6b.

      Reviewer #2 (Public review):

      Summary:

      This manuscript identifies SET-19 as a somatic H3K23 methyltransferase in C. elegans, building on previous genetic evidence for a role of set-19 in H3K23me3 regulation. The authors combine quantitative mass spectrometry, western blotting, in vitro methyltransferase assays, ChIP-seq, and RNA-seq to show that loss of set-19 causes a strong reduction of H3K23me3, particularly in somatic tissues, and is associated with derepression of a subset of genes enriched for H3K23me3. They further conclude that SET-19 is dispensable for canonical feeding RNAi and for transgenerational or intergenerational inheritance of RNAi, distinguishing its function from other heterochromatin-associated methyltransferases such as SET-25, SET-32, and the H3K27 HMTs. Overall, the work adds an important piece to the H3K23 methylation pathway and tissue-specific chromatin regulation in C. elegans.

      Strengths:

      Very strong genetic and biochemical evidence for SET-19 as the major H3K23me3 HMT.

      The mass spectrometry and western blot data convincingly demonstrate a strong reduction of H3K23me3 in two independent set-19 alleles and rescue by GFP::SET-19, which is a major strength (Figure 1, including Figure 1f).

      The in vitro methyltransferase assays (Figure 2) showing robust H3K23me1/2/3 activity for SET-19 SET+CC and only modest H3K23me activity for SET-32, together with the SAM titration experiment in Figure 2C, are very informative and nicely support the conclusion that SET-19 is a high-activity H3K23 methyltransferase compared to SET-32.

      The ChIP-seq analysis is central to the conclusions that H3K23me3 is enriched on chromosome arms, co-localizes with H3K9me3/H3K27me3, and is strongly reduced in set-19 mutants.

      Thanks very much for the positive comments on our work.

      Weaknesses:

      (1) The global reduction of H3K23me3 in Figure 3b,c and Figure S4c is convincing, but the correlation analysis between H3K23me3 loss and mRNA changes in Figure 3g could be strengthened. Currently, the analysis appears to focus on broad categories; it would be helpful to provide:

      Representative genome browser tracks (e.g., exemplary gene coverage plots) for several genes that show clear H3K23me3 peaks in wild type, reduction in set-19, and concomitant upregulation of mRNA levels, and for a few genes that retain H3K23me3 and do not change expression. This would make the link between chromatin changes and transcriptional output more concrete.

      Thanks very much for the suggestion.

      To address this point, we will include representative genome browser tracks for selected genes in the revised manuscript. These examples will help better illustrate the relationship between H3K23me3 loss and mRNA expression changes.

      (2) In Figure S4C, the authors note a pronounced reduction of H3K23me3 mainly on chromosome arms, but in the current data, it appears that the impact might be arm-specific (i.e., stronger reduction in one arm than the other in a chromosome), with a notable pattern at the X chromosome tip where H3K23me3 seems increased. This is potentially interesting and should be briefly commented on in the Results or Discussion, for example, whether this reflects compensatory activity of another HMT, changes in chromatin organization, or could be a technical artifact.

      Thanks very much for bringing up this point.

      As shown in Figure S4C, the overall chromosomal distribution pattern of H3K23me3 is broadly similar between wild type and set-19 mutants, with pronounced enrichment over one chromosomal arm, whereas the center and the opposite arm show relatively lower signal. In set-19 mutants, this asymmetry becomes more pronounced, with a larger difference between the highly enriched arm and the lower-signal regions. This pattern is particularly evident on chromosomes I, II, V, and X. These observations suggest that the effect of set-19 loss on H3K23me3 is not uniform across chromosomal regions.

      Substantial H3K23me3 signal remains in specific regions in set-19 mutants, suggesting that additional enzyme(s) also contribute to H3K23 trimethylation. For example, SET-19 appears to function predominantly in somatic tissues, yet the ChIP-seq assays were performed using whole animals, including the germline. Alternatively, there might be compensatory activity of another HMT. In the revised manuscript, we will state these points more explicitly in the Results section and discuss the residual and locally increased H3K23me3 signals.

      (3) Figure 3d suggests that some actively expressed genes can also display relatively high H3K23me3 levels, which complicates a simple model of H3K23me3 as exclusively repressive. If feasible, a limited additional analysis stratifying genes by both H3K23me3 and H3K9me3/H3K27me3 status might clarify whether these highly expressed, H3K23me3 marked genes differ in other chromatin features.

      Thanks very much for the suggestion.

      To address this point, we will perform additional stratified analyses of H3K23me3-marked genes according to their H3K9me3 and/or H3K27me3 status. We will also compare highly and weakly expressed H3K23me3-marked genes to examine whether they differ in other chromatin features, including H3K9me3, H3K27me3, and, if feasible, H3K4me3 and H3K36me3.
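
      The planned stratification can be sketched as follows. This is only an illustration of the grouping logic in plain Python, under the assumption that each gene has already been assigned a present/absent call for each mark and a single expression value. The gene names, mark calls, and expression values are hypothetical toy records, not data from the study, and the function name `stratify` is ours.

```python
from collections import defaultdict
from statistics import median


def stratify(genes):
    """Group H3K23me3-marked genes by H3K9me3/H3K27me3 status.

    Returns {(has_H3K9me3, has_H3K27me3): {"n": ..., "median_expression": ...}}.
    Genes without H3K23me3 are skipped.
    """
    strata = defaultdict(list)
    for gene in genes:
        if not gene["H3K23me3"]:
            continue
        key = (gene["H3K9me3"], gene["H3K27me3"])
        strata[key].append(gene["expression"])
    return {
        key: {"n": len(values), "median_expression": median(values)}
        for key, values in strata.items()
    }


# Hypothetical toy records (arbitrary expression units).
toy_genes = [
    {"gene": "geneA", "H3K23me3": True, "H3K9me3": True, "H3K27me3": False, "expression": 1.0},
    {"gene": "geneB", "H3K23me3": True, "H3K9me3": True, "H3K27me3": False, "expression": 3.0},
    {"gene": "geneC", "H3K23me3": True, "H3K9me3": False, "H3K27me3": False, "expression": 40.0},
    {"gene": "geneD", "H3K23me3": False, "H3K9me3": True, "H3K27me3": True, "expression": 5.0},
    {"gene": "geneE", "H3K23me3": True, "H3K9me3": False, "H3K27me3": True, "expression": 2.0},
]

summary = stratify(toy_genes)
```

      Comparing the per-stratum summaries would indicate whether highly expressed H3K23me3-marked genes tend to lack the other repressive marks; a real analysis would also need a defined peak-calling threshold and a statistical test, neither of which is shown here.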

      (4) The authors argue that SET-19 primarily affects H3K23me3 and not other canonical repressive marks, based largely on mass spectrometry. It would significantly strengthen the mechanistic conclusions if the authors could assess H3K9me3 and H3K27me3 profiles in set-19 mutants, ideally by ChIP-seq or at least by focused ChIP-qPCR at a subset of loci that lose H3K23me3 and are derepressed at the RNA level. This would address whether H3K23me3 loss occurs independently of changes in other heterochromatin marks, or whether there is crosstalk.

      Thanks very much for the suggestions.

      As suggested, H3K9me3 and H3K27me3 ChIP-seq in wild-type and set-19 mutants will be performed. We will compare their genome-wide distributions and identify loci with significantly altered H3K9me3 and/or H3K27me3 enrichment. These analyses should help clarify whether H3K23me3 loss occurs largely independently of H3K9me3/H3K27me3 changes or reflects potential crosstalk among these repressive chromatin marks. In addition, we will examine H3K9me3 and H3K27me3 enrichment at genes showing both H3K23me3 loss and increased mRNA expression in set-19 mutants to assess whether derepression at these loci is accompanied by changes in other canonical repressive marks.

    1. Author response:

      [These author responses are to reviews from another journal.]

      Reviewer #1:

      This manuscript investigates the behaviour of a variety of clock proteins in cultured cells when epitope tagged and transiently expressed and tries to draw general implications for endogenous function of circadian clock proteins.

      Clock proteins are expressed at low levels in most cells, and so the clock interacting proteins (other kinases, phosphatases, ubiquitin-conjugated enzymes, etc.) are likewise probably at low abundance. Over-expression of one or two or even three components of a multicomponent system is going to produce odd and obscure non-physiological imbalances. The authors do not extend detailed study of these imbalances to more physiologic levels so the importance of their observations to clock function is not clear, and importantly, they are not tested in more biologically relevant models.

      To study the function of components within a system, the steady state must be perturbed in one way or another. This can be achieved through pharmacological treatment, mutagenesis, downregulation, or overexpression. Such interventions are inherently non-physiological, and the relevance of the resulting observations must therefore be carefully validated.

      In our study, the purpose of PER2 overexpression was to investigate its subcellular dynamics in the absence and presence of CRYs, specifically CRY1. This is far less trivial than it might appear at first glance, because our data clearly show that PER2 overexpression triggers, within 24 h, the accumulation of endogenous CRY1 (Fig. 1A), due to PER2-mediated stabilization of CRY1 (Fig. 4). PER2 overexpression also induces the accumulation of endogenous PER1, CK1, and BMAL1 (Fig. 2).

      This effect was not considered in previous studies, such as Yagita et al. (2002), in which PER2 subcellular localization was assessed at a single time point following transient transfection. Yagita et al. found roughly equal proportions of cells with PER2 exclusively in the nucleus, exclusively in the cytoplasm, or distributed between both compartments. Such extreme cell-to-cell variability cannot be explained solely by PER2’s shuttling dynamics, as that would imply synchronous export in one cell and synchronous import in another.

      Our time-resolved analysis of DOX-induced PER2 expression strongly suggests that the variability reported by Yagita et al. reflects a heterogeneous population of unsynchronized cells at different temporal stages along a trajectory from cytoplasmic PER2 (unbound) to nuclear PER2 fully saturated with CRYs (bound), owing to stabilization of endogenous CRYs. Similarly, Öllinger et al. (2014) analyzed PER2 nuclear export in cells constitutively expressing PER2-Dendra. Under such steady-state conditions, PER2-Dendra is already in complex with endogenous CRYs. The slow export rate and lack of dependence on additional CRY1 expression therefore likely reflect export of the complex, which is intrinsically slow.

      Thus, prior to our work, no data on the true shuttling dynamics of PER2 were available.

      Importantly, our results show not only that CRY1 promotes nuclear accumulation of PER2 (as reported by Öllinger et al.) but also that, conversely, PER2 promotes cytosolic accumulation of CRY1, depending on their expression ratio. Since CRY1 is predominantly nuclear and PER2 predominantly cytosolic, and because a PER2 dimer can bind one or two CRY1 molecules, our data suggest that the shuttling equilibrium depends on PER2 saturation state: a PER2 dimer bound to one CRY1 remains cytosolic, whereas a dimer bound to two CRY1 is nuclear.

      These observations are novel and have not been reported previously. They were only possible through time-resolved analysis of overexpressed proteins.

      A number of the findings are confirmatory rather than novel - the phosphorylation-regulated nuclear-cytoplasmic shuttling of CK1 and PER proteins is long known, and it's not clearly stated what is novel here. 

      We acknowledge prior work by Milne et al. (2001), who showed that kinase-dead CK1 is predominantly nuclear and that prolonged treatment with leptomycin B (16 h) enhances its nuclear localization. We cite this study at the beginning of the relevant paragraph. While we confirm these earlier observations, our work extends them in several important and novel ways:

      (1) Rapid dynamics of CK1 localization – We show that pharmacological inhibition of CK1 with PF670 induces rapid (within 1 h) depletion of CK1δ from the centrosome, accompanied by nuclear accumulation and elevated CK1δ levels. These kinetics have not previously been reported. We also show that proteasome inhibition with MG132 enhances centrosomal staining, indicating that centrosomal binding sites are not saturated. Together, the data show that CK1δ equilibrates rapidly between its binding partners.

      (2) Integration of localization with protein stability – We relate the known localization patterns of WT CK1 and the kinase-dead mutant K38R to CK1 degradation dynamics and further compare them to the tau-like kinase mutant CK1δ-R1178Q. This integration of subcellular localization data with turnover mechanisms provides new mechanistic insight.

      (3) Comprehensive regulatory model – In the revised manuscript, we now include a schematic summarizing how CK1δ is posttranslationally regulated via subcellular shuttling, nuclear degradation, and dynamic interactions with binding partners (Figure EV5C). To our knowledge, such a comprehensive view of CK1δ regulation, linking localization, stability, and partner association, has not been presented before.

      We believe these additions clearly distinguish our findings from prior reports and highlight the novel aspects of our study.

      The formation of PER and CRY and CK1 complexes likewise is well established. The finding that formation of multiprotein complexes stabilize otherwise unstable over-expressed proteins is interesting but not novel.

      We fully agree that the existence of PER–CRY–CK1 complexes is well established. It is also known that PER2 stabilizes CRY1 by occupying the FBXL3 binding site and that CRY1 promotes the nuclear accumulation of PER2. We do not present these established interactions as novel findings.

      Our novel contribution, as outlined above, is the discovery that the shuttling and subcellular localization of PER2 and CRY1 are mutually dependent on their expression ratio. Specifically, we show for the first time that the steady-state shuttling distribution of PER2 alone is cytosolic due to its rapid nuclear export, whereas CRY1 is predominantly nuclear (known). Given that CRY1 facilitates the nuclear import of PER2 (known) and that a PER2 dimer can bind either one or two CRY1 molecules, our data showing that cytoplasmic PER2-CRY1 foci contain less CRY1 than nuclear foci lead us to conclude that cytoplasmic PER2 complexes contain one CRY1 molecule, while nuclear complexes contain two.

      This model provides a mechanistic explanation for the distribution of PER2 between the cytosol and nucleus and for the relatively lower cytosolic CRY1 levels. Most importantly, we further show (for the first time) that CK1-mediated phosphorylation of PER2 displaces CRY1. This phosphorylation event would produce PER2 dimers with one or no CRY1 bound, promoting their export to the cytosol. We believe this represents a novel and potentially important mechanism for regulating circadian clock function.

      The results from many of the imaging assays are not quantitated, and the figures often show single cells. It's hard to draw statistical significance from these.

      The phenotypes we report here are the result of multiple technical and biological replicates (n > 3). Image and statistical analyses were performed where required. We show additional examples in the EVs.

      There are a number of phenomena seen whose physiological relevance is unclear. In figure 1, forced over-expression of CRY1 and PER2 leads to formation of nuclear foci. It is unlikely these foci form at non-overexpressed levels, and so the general interest and relevance is not high nor investigated. This reduces the impact of the finding.

      It has been shown that PERs and CRYs do not form thermodynamically stable, large (detectable) foci under physiological conditions, as we have stated in the manuscript. Whether these proteins have the propensity to form smaller, more dynamic structures of physiological relevance is an interesting question that could be explored elsewhere, but it is not relevant to our study. In our work, these foci are simply convenient markers for analyzing the interaction and subcellular (co)localization of clock proteins under investigation. In the revised version, we have kept the analysis of these foci and the discussion of their potential relevance to a minimum in order to avoid confusion and unnecessary discussions.

      The finding that CK1δ is kept in the dephosphorylated state by binding to PER has been established previously by Johnson and colleagues and should perhaps be mentioned (Qin JBR 2015; doi: 10.1177/0748730415582127).

      There is clearly a misunderstanding here. Qin et al.’s data show that, in a cell-free system, CK1ε phosphorylates PER2 and also autophosphorylates its C-terminal tail (autoradiograph, Fig. 1E).  

      However, because PER2 phosphorylation is carried out by CK1ε that is tightly anchored to PER2, there is competition between PER2 phosphorylation and tail autophosphorylation. As a result, the kinetics of tail phosphorylation are slower (Fig. 3B and quantification in C) than those observed with free CK1ε (as seen in the presence of the p53 substrate, Fig. 3A,C). We believe that this is also happening in the cell.

      Author response image 1.

      Our data, in contrast, address a different point. It has been known from the Virshup lab for decades that CK1δ/ε undergo futile cycles of (auto)phosphorylation and dephosphorylation, resulting in an active, dephosphorylated kinase in cells because cellular phosphatases are more efficient than CK1 autophosphorylation. We now show that CK1δ is also efficiently dephosphorylated when bound to PER2 (Fig. 3). Nevertheless, despite dephosphorylation of PER2-bound CK1δ, PER2 itself becomes hyperphosphorylated, indicating that cellular phosphatases act differently on these two substrates. To clarify this point, we inhibited phosphatases with calyculin A (CalA). Under these conditions, both PER2 and PER2-bound CK1δ became efficiently hyperphosphorylated (new Fig. 3).

      The degradation of kinase-active but not inactive CK1 is only shown here with 50-fold overexpressed protein so it's interesting, but the relevance to circadian biology is not made clear. The fact that over-expressed CK1 is degraded primarily in the nucleus is interesting, but needs further characterization - is this affected by the epitope tag? Is it true of endogenous CK1 or only over-expressed CK1? Is this not seen with e.g. other forms of CK1, e.g. lacking the C-terminus?

      The observation that unassembled kinase is rapidly degraded is most clearly demonstrated by overexpression experiments. However, Fig. 3 shows that overexpression of CRY1 and PER2 leads to the accumulation of elevated levels of endogenous CK1δ (untagged), indicating that endogenous kinase is likewise degraded in the absence of a stabilizing binding partner. In addition, we present data showing that overexpression of tagged CK1δ reduces the levels of endogenous, untagged CK1δ, further supporting the conclusion that unassembled endogenous CK1δ is unstable and subject to degradation.

      Further characterization of the CK1 degradation pathway is of considerable interest and could form the basis of a separate study, particularly to identify the components that mediate activity-dependent nuclear export and activity-dependent nuclear degradation. The Δ-tail kinase is expressed at very low levels, although interpretation is complicated by the possibility that this reflects pleiotropic effects.

      The final figure, showing that nuclear CK1 is the form responsible for shortening rhythms, is interesting. Is this because massive increases in nuclear CK1 alter PER, or BMAL/CLOCK, or proteasome activity?  

      Our data show that cells expressing either nuclear or cytosolic CK1 are viable, proliferate normally, and maintain a functional circadian clock. Therefore, overexpression of the kinase does not produce pleiotropic effects.

      To assume it's due to PER phosphorylation is in disagreement with the studies of Meng et al. Neuron 2008 DOI 10.1016/j.neuron.2008.01.019.

      The data are not in disagreement with Meng et al.; in fact, they align quite well. Meng et al. showed that CK1ε-tau shortens the circadian period, which we had also previously reported for CK1δ-tau-like (Marzoll et al., 2022). We now demonstrate that CK1δ-tau-like is enriched in the nucleus, contributing to its period-shortening phenotype. Furthermore, we show that active CK1δ (but not CK1δ-K38R) promotes cytoplasmic accumulation of PER:CRY complexes, consistent with PER2 degradation in the cytosol as described by Meng et al.

      Taken together, these findings suggest that PER proteins acquire their CK1 in the nucleus, and this interaction determines the circadian period length. Following a time delay—set by the kinetics of PER2 phosphorylation—PER2:CRY complexes are exported to the cytosol along with their bound CK1, where they are subsequently degraded.

      Reviewer #2:

      Interactions between the circadian clock proteins PER1/2 with CK1d/e and CRY1/2 influence each of their stability, subcellular localization, and activity, as countless studies over the last two decades have shown. However, many questions still remain, especially in light of newer models of the transcription-translation feedback loop (TTFL) in which the repression phase relies on two distinct mechanisms, a phosphorylation-dependent displacement of the transcription factor by CK1-PER-CRY complexes from DNA early in repression, and a CRY1-dependent sequestration of the transcription factor activation domain later in repression. In particular, questions remain about mechanisms triggering nuclear entry/export and activity of these proteins in the cytoplasm and nucleus.

      Here, the authors utilize a system of induced and/or transient overexpression of proteins with or without fluorophores to track subcellular localization, stability, and interactions. As the authors point out throughout the manuscript, the overexpression of these clock proteins often causes them to behave differently from the endogenous proteins. It looks as though the authors have done their best to account for these changes, and they have certainly been rigorous in pointing them out, but there is concern that some of the conclusions may be influenced by this overexpression. For example, the relevance of work related to the overexpression-dependent foci is unclear.

      Same answer as to Reviewer 1: It has been shown that PERs and CRYs do not form thermodynamically stable, large (detectable) foci under physiological conditions, as we have stated in the manuscript. Whether these proteins have the propensity to form smaller, more dynamic structures of physiological relevance is an interesting question that could be explored elsewhere, but it is not relevant to our study. In our work, these foci are simply convenient markers for analyzing the interaction and subcellular (co)localization of the clock proteins under investigation. In the revised version, we have kept the analysis of these foci and the discussion of their potential relevance to a minimum in order to avoid confusion.

      The findings that the stability of the kinase depends on localization, its intrinsic activity, and interaction with PER2 are interesting and important. Use of the CKBD deletion to show that CK1 stabilization depends on its anchoring interaction with PER2 is a nice touch. The authors bring up an excellent point that most of the potential phosphorylation sites on PER1 and PER2 have not been functionally characterized aside from the phosphoswitch mechanism. Their observation that CK1 eventually induces cytoplasmic localization of the CK1-PER-CRY1 complex and the release of CRY1 is intriguing. In particular, the finding that pretreatment of PER2 with CK1 in vitro blocked its ability to interact with CRY1 is very interesting. However, the absence of mechanistic data to explore this in more detail limits the impact of this conclusion. Using the system they have established here to identify the site(s) on PER2 and/or CRY1 that lead to this would help to solidify this work and increase its impact. Overall, there are some interesting findings here but the inclusion of some competing viewpoints and mechanistic data would strengthen the impact of the work.

      Major

      (1) The characterization of the tau-like CK1 mutant R178C as less active than the wild type enzyme is not entirely correct: it is less active on the FASP region as described, but it has increased activity on S478 in the phosphodegron that is independent of inhibition from the FASP region (Gallego et al. PNAS, 2007 and Philpott et al. eLife, 2020). It is still possible that some of the period shortening effects of the mutant could arise from enhanced nuclear accumulation, but the oversimplified description of the mutant as less active should be corrected.

      In the revised version, we discuss that the enhanced nuclear localization of the Tau-like kinase may contribute, at least in part, to period shortening, similar to how forced nuclear overexpression of wild-type kinase also shortens the period. We emphasize, however, that CK1 Tau is compromised in its priming-dependent activity, whereas its priming-independent activity is context-specific and enhanced toward the β-TrCP site.

      (2) One of the main conclusions from the paper, that CK1 induces cytoplasmic localization of the CK1-PER2-CRY1 complex and subsequent release of CRY1, would be strengthened significantly by identifying the phosphorylation site(s) responsible for the cytoplasmic localization of the complex and the release of CRY1. The system they have developed here seems ideal to identify these sites.

      We fully agree with the reviewer. We substituted the known phosphorylation sites in PER2 surrounding the CRY-binding domain, but this had no effect on the phosphorylation-dependent release of CRY1. Therefore, a more systematic analysis will be required, including the possibility that phosphorylations in CRY1 itself may contribute. To this end, we are generating PER2 and CRY1 variants in which all Ser/Thr residues are replaced by Ala. Using these constructs alongside the wild-type versions, we will systematically create hybrids by PCR in which specific regions containing phosphorylation sites are exchanged.

      Nevertheless, this will require considerable time and effort. We believe this investigation exceeds the scope of the present manuscript, and we will address it in future work.

      (3) The concept of delayed release of CRY1 presented here is an interesting one. It's unclear why the authors have also not incorporated prior findings (Ukai-Tadenuma et al. Cell, 2012, Koike et al. Science, 2012) that peak levels of CRY1 are expressed in a later phase than CRY2, PER1, and PER2. It seems like figure EV6 should reflect the observation that CRY2 is the predominant cryptochrome present during early repression (Koike et al. Science, 2012).

      The reviewer is absolutely right: the expression phases of CRY1, CRY2, PER1, and PER2 are important. I have recently discussed these issues in detail in a News & Views article in The EMBO Journal, commenting on a paper by Smyllie et al. In this News & Views article, I discuss that the presently available data suggest that CRY1 is always present throughout the circadian cycle and keeps circadian transcription partially repressed even at peak phases of expression. In the revised version, I refer to these publications, including those mentioned by the reviewer. However, I would like to keep the model presented in the supplementary figure as simple as possible and specifically focused on the work presented in this manuscript, rather than presenting a comprehensive conceptual model of the circadian clock.

      (4) The model presented in figure EV6 and described throughout the text shows that PER-CRY complexes interact with CK1 in the nucleus, and not in the cytoplasm prior to nuclear entry. Prior work on endogenous protein complexes has shown that CK1-PER-CRY complexes exist in the cytoplasm very early on in the repression phase (Aryal et al. Mol Cell, 2017; ref. 14 in the manuscript). Work by Sancar and colleagues (Cao et al. PNAS, 2020) also shows with endogenous proteins that CK1d has a circadian pattern of nuclear entry (or possibly retention) concomitant with PER2 that is dependent on the presence of PERs and CRYs. Together, these data seem to be inconsistent with your model.

      We think the data are not inconsistent. The recent Smyllie et al. paper in EMBO Journal shows that PER2 is present in both the cytosol and the nucleus at all times when it is expressed, but cytosolic PER2 is not saturated with CRY, which is more nuclear. Our data demonstrate that PER2 shuttles between the cytosol and the nucleus depending on its occupancy with CRYs (see schematic Fig. 1). Occupancy, in turn, depends on expression levels and binding affinities, including those of CRY2 and PER1. Consequently, PER2 complexes could shuttle continuously throughout the circadian cycle, either because they are not saturated with CRYs due to the balance between expression levels, freely available CRY, and binding affinity, or later in the cycle because CRYs are displaced by phosphorylation. If PER2 acquires casein kinase in the nucleus early in the cycle, it will shuttle out to the cytosol together with the bound CK1. We believe this does occur, but early in the circadian cycle the saturation of PER2 with casein kinase is likely to be very low due to the limited availability of CK1 in the nucleus. I am aware that not everyone will share this interpretation point by point, but discussing it at greater length and in more detail exceeds the scope of the present manuscript.

      Reviewer #3:

      This manuscript by Serrano and co-workers is a tight body of work that provides much needed insights into the regulation of clock proteins by CK1D, and into the regulation of CK1D itself. While the whole paper relies on artificial overexpression of chimeric/tagged proteins that may have significant differences in the function, the stability and subcellular distribution of the endogenous proteins they are supposed to model, this limitation has been clearly stated by the authors, and nevertheless their study still provides important insights.

      While the authors have specified which Ck1d isoform (Ck1d1) they are overexpressing in their model cell lines, they may wish to consider that the overexpression of one Ck1 homologue may affect the endogenous expression of the other homologues and their isoforms, e.g. Ck1d1 overexpression may cause an increase in Ck1d2 or Ck1e, which would in turn affect the conclusions.

      We show in revised Fig. 3 that overexpression of CK1δ1 reduces the expression of endogenous CK1δ1/2. This is consistent with our prediction that overexpressed and endogenous CK1 (including CK1ε) compete for the same stabilizing binding partners, leading to rapid degradation of unassembled kinases.

      Moreover, the antibody they used for endogenous Ck1d (which is ab85320, also mentioned as AF12G4 but that is the clone number, not the catalogue number) is discontinued and its specificity against Ck1d1, Ck1d2 or even the highly identical Ck1e, has not been clearly demonstrated. We know from Fig 3 that it can detect Ck1d1 but it would be great if the authors would provide additional evidence for the specificity of this antibody, for example by overexpressing Ck1d1/Ck1d2/Ck1e to see really which "endogenous" Ck1 we are seeing.

      Do the three bands seen, for example, in Fig 4A correspond to the different isoforms? This simple experiment would reinforce the conclusions.

      We show in the revised figure that the antibody recognizes CK1δ1 and CK1δ2, but not CK1ε. In U2OS cells, the antibody detects a single band (Figure); we do not know whether this represents predominantly one splice isoform or both, which are not resolved. However, this distinction is not relevant for our interpretation, because overexpression of tagged CK1δ1 reduces the expression of whichever endogenous kinase is present.

      There are no minor comments, as the figures, the figure legends and main text are all of good quality and ready for publication.

      Reviewers’ Responses to Point-by-Point Response to Peer Review 

      Referee #1:

      I appreciated the additional efforts by the authors to improve the manuscript. Unfortunately, the underlying approach of forced over-expression remains artifact-prone, and has been largely supplanted by readily available knockin and targeted mutagenesis methods. Over-expression may give clues, but I think more rigorous mechanistic validation is needed to make this compelling. I cannot support publication of this manuscript.

      Referee #2:

      In their response to reviewers, the authors make the valid point that the steady state of a system is usually perturbed to study it. In this study, they have used overexpression of the clock proteins PER2, CRY1 and CK1 to study their effects on subcellular dynamics and stability. In justifying this choice, they refer to several papers that similarly overexpressed at least one of these components, stating that their time-resolved approach brings novel insights. However, there is a missed opportunity here to translate any lessons learned from overexpression studies to a system where the proteins are expressed at physiological levels and stoichiometry.

      The authors reply to reviewer 1 stating that they conclude PER proteins acquire CK1 in the nucleus, but this does not account for other studies showing an apparent PER-CK1 complex in the cytoplasm during the early phases of repression and/or a pattern of PER-dependent nuclear entry of CK1 (Lee et al. 2001, Cell; Aryal et al. 2017 Mol Cell; Cao et al. 2021 PNAS). Given that all 3 of these studies were done with native expression levels, it seems incumbent upon the authors to demonstrate that their conclusions from the overexpression study are physiologically relevant by translating them in some way to a more native system. This also addresses a point made by reviewer 2, major concern 4 that was not satisfactorily addressed by the authors. Perhaps they could validate their hypothesis of PER shuttling and interactions with CK1 or CRY1 that alter this in a native system similar to Aryal or Cao et al. with the use of nuclear export inhibitors?

      The response to reviewer 2, major concern 1 is thoughtful and much appreciated. However, simplifying the effects of the tau mutation on CK1 as having a decreased rate on priming-dependent phosphorylation but not priming-independent is not quite true: the tau mutation also decreases the rate of priming-independent phosphorylation of S662 (in humans) (Philpott et al. 2020, eLife).

      Other papers appearing in this journal seem to all include at least one major new mechanistic insight. Although the authors do a diligent job in characterizing the overexpressed proteins in this system, some of their conclusions are at odds with prior studies of the system in more native conditions, so the potential impact of this work is unclear. To verify these conclusions or test new ones (ie, that CK1 disrupts PER-CRY1 interactions), they should use their insights to generate mutations or make perturbations in a native system and demonstrate that they still hold.

      Referee #3:

      The authors have adequately addressed the reviewers' comments, and it is my opinion that the manuscript is ready for publication. It is true, as previously mentioned by other reviewers, that the evidence presented relies on overexpression, which for the other reviewers seems to preclude publication. However, I find this to be too strict an opinion.

      If the authors had indeed provided evidence using CRISPR-Cas9-mediated genetic manipulation and tagging/mutating endogenous genes for all their experiments, thereby providing more physiological evidence of how clock proteins interact, they would probably have submitted their manuscript to an alternative journal with a higher impact.

      As it stands, it is my opinion that, considering the evidence and limitations of the study, this manuscript is a good match for the journal.

      Author Rebuttal:

      Apologies for the delayed reply regarding our manuscript. In the meantime, we have added several new experiments which address the comments of the reviewers and more. These are now included as Figures 1C, EV3, 4D, 6E, 6F, EV6D, and EV7.

      Figure 1C reinforces our observations from Figure 1B showing that induction of stably-integrated PER2 also results in accumulation of endogenous CRY1 at a timescale that is compatible with the gradual localization of overexpressed PER2 into the nucleus.

      Figure EV3 addresses several technical comments from Reviewers #3 and #1, respectively: Figure EV3A shows that our CK1δ antibody recognizes CK1δ1 and CK1δ2, but not CK1ε. Figures EV3B and C clearly show that overexpression of our transgenic CK1δ results in decreased endogenous CK1δ, which further demonstrates the rapid turnover of active kinase.

      Figure 4D addresses the comment from Reviewer #2. We clearly show that CK1δ is not kept in a dephosphorylated state by binding to PER. In addition to our direct comment on this point, Figure 4D shows that CK1δ, regardless of whether it is expressed alone or in complex with PER2, is phosphorylated to a similar extent when the cells are treated with the phosphatase inhibitor CalA. As indicated in our direct response, we are more interested in the observation that cellular phosphatases act differently on PER2 compared to CK1δ despite being in the same PER:CK1δ complex (as shown by the clear stabilization of overexpressed CK1δ by co-expression of PER2).

      Figures 6E, 6F, and EV6D demonstrate that our observations from overexpression systems are also observed in a more physiological context, addressing comments from Reviewers #1 and #2. Figure 6E shows that dephosphorylation of PER2 leads to its relocalization from the cytosol to the nucleus, while Figure 6F analyzes the subcellular localization of PER2 in the context of a functional circadian clock in U2OS cells. The latter demonstrates that PER2 is predominantly nuclear early in the circadian cycle, but redistributes to the cytosol at later time points. We included these experiments in response to the reviewer’s request for a more physiological context. Since we are not a mouse lab, this cell-based system represents the most physiological model we can provide. Figure 6F shows the dynamics of endogenous PER2 from DEX-synchronized cells. At early timepoints, PER2 is predominantly nuclear, likely due to the incorporation of CRY1 forming the PER:CRY complex. At later timepoints, PER2 is redistributed between the cytoplasm and nucleus due to PER2 phosphorylation. Importantly, these results are consistent with and recontextualize the results from Liu et al. (Xie et al., PNAS, 2023) showing that hypophosphorylated PER2 at early timepoints post-DEX is predominantly nuclear and that hyperphosphorylated PER2, which appears later post-DEX, is predominantly cytoplasmic.

      Finally, Figure EV7 provides a model of how the subcellular distribution of CK1δ affects its assembly into the PER:CRY complex, emphasizing how nuclear kinase enacts its role in the circadian clock.

      Response to Reviewers:

      We were disappointed by the categorical rejection of overexpression experiments. Without a specific discussion of why they would be inappropriate or not sufficient in the context of the work presented here, the blanket assertion that overexpression inevitably produces artifacts functions more as a rhetorical device than as a substantiated scientific argument. The fact that the term ‘physiological’ generally carries a positive connotation, whereas ‘overexpression’ is often perceived negatively, does not in itself justify the categorical rejection of experiments.

      While we appreciate that some reviewers may personally prefer alternative strategies, we believe that the suitability of any approach must be evaluated in light of the specific biological questions being addressed. I cannot see a single specific point in the reviewers’ responses indicating that any of our experiments yielded artificial results. It is true that targeted knock-in and mutagenesis methods are available; however, these approaches are simply not suited to the questions raised in this manuscript. We also fully agree that, whenever possible, insights from overexpression studies should be validated in systems with a functional clock where proteins are expressed at physiological levels. We did this using U2OS cells, and we note the compatibility of our results with those in the literature using endogenously-tagged constructs. We have cited several recent studies that have investigated the subcellular distribution and circadian dynamics of endogenous or endogenously-tagged clock proteins in mice (Cao et al, 2021; Smyllie et al, 2016, 2022, 2025) and U2OS cells (Öllinger et al, 2014; Gabriel et al, 2021; Xie et al, 2023). While we cannot substantially expand on these previous observations, we confirm them in the revised version by demonstrating the nuclear-to-cytoplasmic relocalization of PER2 in U2OS cells over the course of a circadian cycle. In addition, we show that this process is, in principle, reversible: when CK1 is inhibited with PF670, overexpressed hyperphosphorylated cytosolic PER2 becomes dephosphorylated and accumulates in the nucleus.

      Overall, we consider our approach not only complementary but also essential, as it enables us to address two key questions that would otherwise be difficult or even impossible to resolve:

      (1) Mutual impact of PER2 and CRY1 on subcellular dynamics and the role of PER2 phosphorylation

      Evidence from mouse liver (Cao et al, 2021), mouse SCN (Smyllie et al, 2022, 2025), and U2OS cells (Xie et al, 2023) indicates that a substantial fraction of PER2 remains cytoplasmic throughout its expression cycle, even in the presence of CRY1, which promotes PER’s nuclear import. The mechanisms underlying this cytoplasmic retention remain unclear, and no circadian function has yet been attributed to the cytosolic PER2 pool. Our study addresses how PER2 abundance, phosphorylation state, and stoichiometry relative to CRY1 govern their interaction and subcellular dynamics. This is physiologically relevant because PER1/2 and CRY1/2 proteins oscillate in expression and degradation out of phase, such that their concentrations, stoichiometry, and phosphorylation state vary systematically over the circadian cycle. Transient transfection and inducible overexpression combined with time-lapse microscopy are essential here, as they uniquely allow us to modulate protein ratios and CK1δ levels and to resolve their dynamics.

      Previous work established that CRY1 is nuclear and promotes PER2 nuclear accumulation (Smyllie et al, 2022). Our data extend this by showing that subcellular distribution is determined by the CRY1:PER2 ratio. While CRY1 alone is nuclear we show that PER2 alone is cytoplasmic due to rapid nuclear export. Mixed conditions reveal ratio-dependent shifts: at low CRY1-to-PER2 ratios, CRY1 relocalizes to the cytoplasm, whereas at high ratios, PER2 is retained in the nucleus. We explain this behavior by PER2 dimerization: dimers bound to two CRY1 molecules remain nuclear, while dimers bound to a single CRY1 localize to the cytosol. Such species can be expected to form in a physiological context depending on binding affinities and rhythmic expression levels and ratios across circadian time. Importantly, we show that CK1δ-mediated phosphorylation destabilizes PER2 and CRY1 interactions. From this, we infer that PER2 dimers with only a single bound CRY1 transiently form and accumulate in the cytosol, consistent with the lower CRY1-to-PER2 ratio we observe in the cytosol and that has also been reported in the SCN (Smyllie et al, 2025). With continued phosphorylation, PER2 dimers lose CRY1 altogether, while the released CRY1 accumulates in the nucleus. We suggest that this mechanism supports and extends the late repressive phase of the circadian cycle. Recent data show that hypophosphorylated PER2 is predominantly nuclear, whereas hyperphosphorylated PER2 is largely cytoplasmic in mouse liver (Cao et al, 2021; Xie et al, 2023), linking our data to a physiological context.

      Taken together, these findings suggest a mechanism whereby stoichiometry, subunit composition, and CK1δ phosphorylation determine PER:CRY complex composition and localization. Crucially, these complexes and their dynamic relocalization could only be observed using inducible overexpression; knock-in strategies at endogenous levels would not be able to capture such states.

      (2) Posttranslational regulation and subcellular homeostasis of CK1δ and impact on the clock

      Previous work has shown that nuclear export of CK1δ depends on its kinase activity (Milne et al, 2001). Here, we further demonstrate that unassembled CK1δ is subject to degradation, with nuclear turnover accelerated by its catalytic activity. Thus, when evaluating the impact of CK1δ mutants on the circadian clock, one must consider not only kinase activity but also protein stability and subcellular distribution. We find that CK1δ availability for PER2 differs between cytosol and nucleus. In particular, nuclear CK1δ is limiting, and its abundance directly determines circadian period length. This is significant because subcellular CK1δ availability and posttranslational regulation have not previously been examined or incorporated into circadian clock models, as the kinase has been assumed to be non-limiting given its constant expression throughout the circadian cycle. Complex formation between CK1δ and PER is a well-established determinant of circadian timing, with CK1δ overexpression known to shorten period length. Our data explain why: the binding equilibrium between CK1δ and PER must be finely tuned. Previous studies suggested that PER associates with CK1δ in the cytosol and enters the nucleus as a PER:CRY:CK1δ complex (Lee et al, 2001; Aryal et al, 2017). Our data suggest that nuclear PER is not saturated with CK1δ. This is because levels of free, active CK1δ in the nucleus are low, owing to its rapid export or degradation by the nuclear proteasome, which limits its availability for PER binding.

      Our overexpression studies support this mechanism. NES-tagged CK1δ overexpression does not alter circadian period length, because it fails to increase nuclear CK1δ levels: Each PER molecule can coimport only one kinase, a process already occurring in wild-type cells, and the few co-imported molecules rapidly equilibrate with the nuclear pool, where they are subject to export or degradation. In contrast, NLS-tagged CK1δ overexpression directly increases nuclear kinase abundance by antagonizing export, thereby enhancing PER binding and shortening circadian period. This multilayered regulation of CK1δ stability and localization and its consequences for PER2 availability would not have been revealed without targeted overexpression. Our findings therefore fill a key knowledge gap and remain fully consistent with previous studies (Lee et al, 2001; Aryal et al, 2017; Cao et al, 2021).

      Conclusion: In sum, our findings are novel and physiologically relevant, aligning with data from mouse liver and SCN. While studies at strictly endogenous protein levels are important and necessary, perturbation of steady state is a standard strategy to uncover and observe novel mechanisms. Endogenous-level experiments would demand technically unrealistic systems (for example, even the simplest case, analyzing the subcellular dynamics of PER2 alone, would require cells lacking PER1, CRY1/2, and CK1δ/ε). Moreover, adjustment of PER2-to-CRY1 ratios cannot be achieved with stably integrated genes and of course not at physiological expression levels. Thus, inducible overexpression is not merely practical but currently the most feasible approach to dissect these dynamics. We complement our findings with data from U2OS cells with a functional clock, showing that the availability of nuclear CK1δ directly determines circadian period length. Although specific aspects of our extended model require further experimental validation, no published evidence contradicts it to date. Mechanistic discussions of the circadian clock have so far focused primarily on PER protein degradation. Our model broadens this perspective by incorporating CK1δ homeostasis, PER:CRY complex composition, subcellular localization, and their regulation by phosphorylation. In doing so, it provides a detailed framework to be critically tested and refined in future studies.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figure 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which are very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration results in labeling a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of the approach, and we now discuss it in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine is not a marker that allows localization of hM4Di, as it is well known that mCitrine can be expressed in a Cre-independent manner in this mouse line. As suggested, we have removed the figure that showed mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which most cells have progressed beyond the DCX<sup>+</sup> stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to state more explicitly that this location is the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP<sup>+</sup> cells per mouse is shown in Figure 1F, which reports TRAP<sup>+</sup> cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Comments on revisions:

      I appreciate the authors' careful and thorough revisions. They have addressed all of my previous concerns satisfactorily, and the manuscript is now significantly strengthened. I have no further concerns.

      Reviewer #2 (Public review):

      In this study, the authors investigate how increasing cognitive demand shapes activity patterns in the dorsal dentate gyrus (DG). Using a touchscreen-based TUNL task combined with TRAP/c-Fos tagging, birth-dating of adult-born granule cells (abDGCs), and chemogenetic inhibition, they show that higher task demand increases mature granule cell (mGC) recruitment and enhances suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Functionally, mGC inhibition reduces overall activity and impairs performance without disrupting blade bias, whereas inhibition of ≤7-week-old abDGCs increases mGC activity, abolishes blade bias, and impairs discrimination under high-demand conditions. These findings suggest that effective pattern separation depends not only on overall DG activity levels but also on the spatial organization of recruited ensembles.

      The integration of touchscreen TUNL with temporally controlled activity tagging and birth-dated cohorts is technically strong. Quantification of SB-IB bias and radial/apical distributions adds anatomical precision beyond bulk activity measures. The comparison between mGC and abDGC inhibition is conceptually compelling and supports dissociable functional roles. Overall, the data convincingly demonstrate that increasing cognitive demand amplifies blade-biased DG recruitment and that mGCs and abDGCs differentially contribute to both behavioral performance and network organization.

      However, how abDGCs are integrated into the mGC network under high cognitive demand remains unresolved. Additional experiments are needed to clarify how abDGCs shape spatial recruitment patterns and whether they directly inhibit or indirectly regulate mGC activity to maintain high performance.

      Furthermore, the authors frame "high cognitive demand" as a multidimensional construct encompassing broad behavioral challenge. It would strengthen the work to delineate how local abDGC-mGC circuit interactions regulate specific task components in real time. This will require higher temporal resolution approaches, as TRAP and c-Fos labeling integrate activity over prolonged windows and primarily reflect sustained engagement rather than moment-to-moment computations.

      The central conclusion that dentate function depends on coordinated spatial recruitment rather than total activity magnitude is supported by the data, although mechanistic interpretations should be tempered given methodological limitations.

      Overall, this work advances models of adult neurogenesis by emphasizing a critical-period modulatory role of abDGCs in organizing DG network activity during high-demand discrimination. The combined behavioral and circuit-level framework is likely to be influential in the field.

      Reviewer #3 (Public review):

      This study examines the role of dentate gyrus neuronal populations, reflecting neurogenesis and anatomical location (suprapyramidal vs infrapyramidal blade), in a mnemonic discrimination task that taxes the pattern separation functions of the dentate. The authors measure dentate gyrus activity resulting from cognitive training and test whether adult neurogenesis is required for both the anatomical patterns of activity and performance in the cognitive task. The authors find that more cognitively challenging variants of the task evoked more dentate activity, but also distinct patterns of activity (more activity in the suprapyramidal blade, less in the infrapyramidal blade). Using chemogenetic approaches they silence mature vs immature dentate gyrus neurons and find that only mature neurons (either the general population or specifically mature adult-born neurons), and not immature adult-born neurons, are required for the difficult version of the task. Inhibition of mature adult-born neurons furthermore increased overall activity in the dentate and reduced the biased pattern of activity across the blades, consistent with evidence that adult-born neurons broadly regulate dentate gyrus activity.

      Comments on revisions:

      I appreciate the efforts the authors have taken to revise this manuscript. I have only minor concerns with this revised version of the manuscript:

      Methods state that significance is defined as P<0.05 but some results are interpreted as significant when P=0.05. Either the alpha value needs to change or the interpretation needs to change.

      We have corrected the statement in the Methods section to define statistical significance as P ≤ 0.05, which aligns with how significance was interpreted throughout the manuscript.
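      The difference between the two threshold definitions discussed above matters only at the boundary value. A minimal sketch, assuming a hypothetical helper `is_significant` (not part of the authors' analysis), shows how a result of exactly P = 0.05 is classified under a strict versus an inclusive rule:

```python
ALPHA = 0.05  # threshold stated in the Methods


def is_significant(p_value: float, inclusive: bool = True) -> bool:
    """Classify a p-value against ALPHA.

    inclusive=True  applies P <= ALPHA (the revised definition).
    inclusive=False applies P <  ALPHA (the original definition).
    This helper is illustrative only, not the authors' code.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return p_value <= ALPHA if inclusive else p_value < ALPHA


if __name__ == "__main__":
    # A result of exactly 0.05 changes classification between the two rules.
    print(is_significant(0.05, inclusive=True))
    print(is_significant(0.05, inclusive=False))
```

      Only results that sit exactly on the threshold are affected; any P value below 0.05 is classified identically under both rules.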

      I believe the statistical results for group and blade effects for the ANOVAs, in Figs 2,3 & 4, appear to be switched (blade should be significant, not group).

      We thank the reviewer for pointing out this mistake. We have corrected the reported statistical results for the group and blade effects in the manuscript accordingly.

      I appreciate that sometimes there is not a perfect overlap between immunohistochemical signals, but I continue to believe that the spatially non-overlapping TRAP and EdU signals in Fig 3 are caused by these 2 markers being in different cells. A Z-stack or orthogonal projection could verify/disprove this concern.

      We agree that limited overlap in single optical sections can raise the possibility that TRAP and EdU signals originate from different cells. However, based on our imaging conditions and inspection across focal planes, the signals are consistent with being present within the same cells, with partial spatial separation likely reflecting subcellular localization and/or sectioning effects.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript presents a compelling new in vitro system based on isogenic co-cultures of human iPSC-derived hepatocytes and macrophages, enabling the modelling of hepatic immune responses with unprecedented physiological relevance. The authors show that co-culture leads to enhanced maturation of hepatocytes and tissue-resident macrophage identity, which cannot be achieved through conditioned media alone. Using this system, they functionally validate immune-driven hepatotoxic responses to a panel of drugs and compare the system's predictive power to that of monocyte-derived macrophages. The results underscore the necessity of macrophage-hepatocyte crosstalk for accurate modelling of liver inflammation and drug toxicity in vitro.

      The manuscript is clearly written and addresses a key limitation in liver organoid systems: the lack of immune complexity and tissue-specific macrophage imprinting. Nevertheless, several conclusions would benefit from a more careful interpretation of the data, and some important controls or explanations are missing, particularly in the flow cytometry gating strategies, stress marker validation, and cluster interpretations.

      Strengths:

      (1) Novelty and Relevance: The study presents a highly innovative co-culture system based on isogenic human iPSCs, addressing an unmet need in modelling immune-mediated hepatotoxicity.

      (2) Mechanistic Insight: The reciprocal reprogramming between iHeps and iMacs, including induction of KC-specific pathways and hepatocyte maturation markers, is convincingly demonstrated.

      (3) Functional Readouts: The application of the model to detect IL-6 responses to hepatotoxic compounds enhances its translational relevance.

      Weaknesses:

      (1) Several key claims, particularly those derived from PCA plots and DEG analyses, are overinterpreted and require more conservative language or further validation.

      We agree that PCA does not allow maturation trajectories to be followed. We had stated as a hypothesis that the co-culture was promoting maturation, which we later validated by looking at the expression of key hepatocyte markers as well as by Pearson correlation comparison with fetal hepatocytes.

      (2) The purity of sorted hepatocytes and macrophages is not convincingly demonstrated; contamination across gates may confound transcriptomic readouts.

      We agree and have highlighted and addressed this limitation in our discussion. Unfortunately, it is a limitation of bulk sequencing that a small amount of contamination might be present. However, the TPM values of ALB in the iMacs, for example, are extremely low, especially when compared to the hepatocytes, indicating that the level of contamination is likely to be very low. Likewise, the expression of CSF1R in the co-cultured iHeps is also extremely low. This has been included in Supp Fig 1F and G.

      (3) Stress response genes and ER stress/apoptosis signatures are not properly assessed, despite being potentially activated in the system.

      This has been included in Supp Fig 2C, where we’ve included the expression of ATF4, CASP3 and CASP9. Although there’s a significant difference in ATF4 expression between Day 0 and Day 7 iHep only/Co-culture, there is no significant difference between the Day 7 iHep only and Day 7 iHep Co-culture. There are no significant differences in CASP3 and CASP9 expression across all the samples.

      (4) Some figure panels and legends lack statistical annotations, and microscopy validation of morphological changes is missing.

      Although we agree that the morphology changes would be interesting, we think that this question is unfortunately outside of the scope of our question. Although Kupffer cells are in direct contact with hepatocytes, they migrate from the liver parenchyma into the sinusoidal spaces where they primarily reside. We do not think that the morphology would add much to the paper, especially given that this is a 2D model as well.

      (5) The co-culture model with monocyte-derived macrophages is not fully characterised, making comparisons less informative.

      Although we agree that it would be interesting to look more closely at the monocyte-derived macrophage co-cultures as well, we think that this would be more suited to a future study as the transcriptomic analysis would likely include confounding effects of patient specific transcriptomic changes, and our primary focus was on developing an isogenic co-culture system.
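      The Pearson correlation comparison mentioned in the response to point (1) can be illustrated with a short sketch. It assumes two equal-length expression vectors over the same gene list; the function name `pearson` and the example values are hypothetical and are not taken from the study's data or code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Illustrative helper: x and y would be expression values (e.g. log TPM)
    for the same ordered gene list in two samples.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical expression values for four genes in two samples.
    co_cultured = [1.0, 2.0, 3.0, 4.0]
    reference = [2.0, 4.0, 6.0, 8.0]
    print(pearson(co_cultured, reference))
```

      A coefficient closer to 1 indicates that the two profiles rank and scale the genes similarly; it does not by itself establish functional maturation.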

      Reviewer #2 (Public review):

      Summary:

      This study builds on work by Glass and Guilliams showing that mouse Kupffer cells depend on the surrounding cells, including endothelium, hepatocytes, and stellate cells, for their identity. Herein, the authors extend the work to human systems. It nicely highlights why taking monocyte-derived macrophages and pretending they are Kupffer cells is simply misleading.

      Strengths:

      Many, including human cells, difficult culture assays, and important new data.

      Weaknesses:

      This reviewer identified minor queries only, rather than 'weaknesses' as such.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors establish a human in vitro liver model by co-culturing induced hepatocyte-like cells (iHEPs) with induced macrophages (iMACs). Through flow cytometry-based sorting of cell populations at days 3 and 7 of co-culture, followed by bulk RNA sequencing, they demonstrate that bidirectional interactions between these two cell types drive functional maturation. Specifically, the presence of iMACs accelerates the hepatic maturation program of iHEPs, while contact-dependent cues from iHEPs enhance the acquisition of Kupffer cell identity in iMACs, indicating that direct cell-cell interactions are critical for establishing tissue-resident macrophage characteristics.

      Functionally, the authors show that iMAC-derived Kupffer-like cells respond to pathological stimuli by producing interleukin-6 (IL-6), a hallmark cytokine of hepatic immune activation. When exposed to a panel of clinically relevant hepatotoxic drugs, the co-culture system exhibited concentration-dependent modulation of IL-6 secretion consistent with reported drug-induced liver injury (DILI) phenotypes. Notably, this response was absent when hepatocytes were co-cultured with monocyte-derived macrophages from peripheral blood, underscoring the liver-specific phenotype and functional relevance of the iMAC-derived Kupffer-like cells. Collectively, the study proposes this co-culture platform as a more physiologically relevant model for interrogating macrophage-hepatocyte crosstalk and assessing immune-mediated hepatotoxicity in vitro.

      Strengths:

      A major strength of this study lies in its systematic dissection of cell-cell interactions within the co-culture system. By isolating each cell type following co-culture and performing comprehensive transcriptomic analyses, the authors provide direct evidence of bidirectional crosstalk between iMACs and iHEPs. The comparison with single-culture controls is particularly valuable, as it clearly demonstrates how co-culture enhances functional maturation and lineage-specific gene expression in both cell types. This approach allows for a more mechanistic understanding of how hepatocyte-macrophage interactions contribute to the acquisition of tissue-specific phenotypes.

      Weaknesses:

      (1) Overreliance on bulk RNA-seq data:

      The primary evidence supporting cell maturation is derived from bulk RNA sequencing, which has inherent limitations in resolving heterogeneous cellular states and functional maturation. The conclusions regarding hepatocyte maturation are based largely on increased expression of a subset of CYP genes and decreased AFP levels - markers that, while suggestive, are insufficient on their own to substantiate functional maturation. Additional phenotypic or functional assays (e.g., metabolic activity, protein-level validation) would significantly strengthen these claims.

      We have added a discussion on the limitations of our study.

      (2) Insufficient characterization of input cell populations:

      The manuscript lacks adequate validation of the cellular identities prior to co-culture. Although the authors reference previously published protocols for generating iHEPs and iMACs, it remains unclear whether the cells used in this study faithfully retain expected lineage characteristics. For example, hepatocyte preparations should be characterized by flow cytometry for ALB and AFP expression, while iMACs should be assessed for canonical macrophage markers such as CD45, CD11b, and CD14 before co-culture. Without these baseline data, it is difficult to interpret the magnitude or significance of any co-culture-induced changes.

      We apologise for this oversight. Some of the markers were used in determining the purity of the iMacs before co-culture, and we did not include these plots for brevity. We have now added the purity plots in Supp Fig 2E, showing that the iMacs were more than 90% pure before co-culture. We acknowledge the concern about cross-contamination for bulk sequencing, and have added in Supp Fig 2G and H the expression of ALB in the iMac fraction, as well as the expression of CSF1R in the iHep fraction, showing minimal contamination with our gating strategy.

      (3) Quantitative assessment of IL-6 production is insufficient:

      The analysis of drug-induced IL-6 responses is based primarily on relative changes compared to control conditions. However, percentage changes alone are inadequate to capture the biological relevance of these responses. Absolute cytokine production levels - particularly in response to LPS stimulation - should be reported and directly compared to PBMC-derived macrophages to determine whether iMAC-derived Kupffer-like cells exhibit enhanced cytokine output. Moreover, the Methods section should clearly describe how ELISA results were normalized or corrected to account for potential differences in cell number, viability, or culture conditions.

      We apologise if this was unclear. The cytokine production from dosed cells was normalized based on the viability of cells measured from the same well.

      (4) Unclear mechanistic interpretation of IL-6 modulation:

      The observed changes in IL-6 production upon drug treatment cannot be interpreted solely as evidence of Kupffer cell-specific functionality. For instance, IL-6 suppression by NSAIDs such as diclofenac is well known to result from altered prostaglandin synthesis due to COX inhibition, while leflunomide's effects are linked to metabolite-induced modulation of immune cell proliferation and broader cytokine networks. These mechanisms are distinct from Kupffer cell identity and may not directly reflect liver-specific macrophage function. Consequently, changes in IL-6 secretion alone - particularly without additional mechanistic evidence or analysis of other cytokines - are insufficient to conclude that co-culture with hepatocytes drives the acquisition of bona fide Kupffer cell maturity.

      We fully agree with the reviewer and have highlighted this in our discussion.
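      The viability normalization described in the response to point (3) can be sketched as follows. The exact formula is not given in the response, so this is an assumption: the measured IL-6 concentration is divided by the fractional viability of the same well. The function name `normalize_il6` and the units are hypothetical.

```python
def normalize_il6(il6_pg_per_ml: float, viability_fraction: float) -> float:
    """Scale an IL-6 reading by the viability measured in the same well.

    Assumption (not confirmed by the source): normalization is a simple
    division by fractional viability, where 1.0 means fully viable.
    """
    if il6_pg_per_ml < 0:
        raise ValueError("IL-6 concentration cannot be negative")
    if not 0.0 < viability_fraction <= 1.0:
        raise ValueError("viability_fraction must lie in (0, 1]")
    return il6_pg_per_ml / viability_fraction


if __name__ == "__main__":
    # Hypothetical well: 100 pg/mL measured with half of the cells viable.
    print(normalize_il6(100.0, 0.5))
```

      Under this assumption, a well with fewer surviving cells is scaled up so that reduced cytokine output caused by cell loss is not mistaken for suppression of secretion.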

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) GSE ID for RNA-seq data has not been provided.

      This has been included.

      (2) Line 291: Can the authors specify what they mean by "state-of-the-art"?

      What we mean here is what others in the field have also recently described. We have rewritten this to be clearer.

      (3) Lines 299-300: check sentence for grammar mistakes.

      We have rewritten and clarified this.

      (4) Figure 1B: The PCA does not really allow for following maturation trajectories. Also, all samples (day 3 Co-iHep, day 7 Co-iHep, day 7 iHep) look as if they cluster more or less together. Therefore, the conclusion drawn in lines 303-305 does not hold. Why is day 3 iHep not also shown here?

      We agree that PCA does not allow maturation trajectories to be followed. We had stated as a hypothesis that the co-culture was promoting maturation, which we later validated by looking at the expression of key hepatocyte markers as well as by Pearson correlation comparison with fetal hepatocytes.

      (5) Can the authors show that the cells that they are sorting in the double negative gate are indeed hepatocytes? Typically, these cells are big in cell size; therefore, showing the FSC/SSC gate would also be important.

      We have added the FSC/SSC gate in supp fig. 1E to show that the populations have different sizes.

      (6) Can the authors provide microscopy pictures of iHeps, iMacs, and the co-cultured cells for the reader to appreciate whether the morphology of cells already changes during the co-culture experiments?

      Although we agree that the morphology changes would be interesting, we think that this question is unfortunately outside of the scope of our question. Although Kupffer cells are in direct contact with hepatocytes, they migrate from the liver parenchyma into the sinusoidal spaces where they primarily reside. We do not think that the morphology would add much to the paper, especially given that this is a 2D model as well.

      (7) Please show expression of apoptotic and ER stress genes comparing Day7 iHeps and Co-iHeps, since genes such as c-Fos and Ppp2r3b can also be associated with cellular stress.

      This has been included in Supp Fig 2C, where we’ve included the expression of ATF4, CASP3 and CASP9. Although there’s a significant difference in ATF4 expression between Day 0 and Day 7 iHep only/Co-culture, there is no significant difference between the Day 7 iHep only and Day 7 iHep Co-culture. There are no significant differences in CASP3 and CASP9 expression across all the samples.

      (8) In addition to the genes shown in Figure 1E, could the authors extract a longer gene list of maturing hepatocytes and display them all in bar graphs or heatmaps, or similar? E.g., Albumin expression is shown later, but why not show it already here?

      There are not many differences in the canonical hepatocyte markers, which is why we chose to show only the genes that differed. This can be seen in the later ALB expression plot, where there was no difference in ALB expression after 7 days of co-culture. Instead, we have included a new heatmap in Supp Fig 2B showing the top 40 genes that contribute to the similarity by Pearson correlation.

      (9) Along these lines, how do the authors ensure that they are culturing only hepatocytes and do not have a mixture of cells that may "dilute" the hepatocyte signature?

      Unfortunately, this is a limitation of our methodology, although the expression of key hepatic markers is routinely confirmed by qPCR to ensure that the majority of the cells are hepatocyte-like.

      (10) Lines 347-350: similar to the interpretation of the PCA for hepatocytes, this is a completely random interpretation. The expression of ALB in the co-cultured iMacs indicates that there are some hepatocytes that ended up in the macrophage gate.

      We agree and have highlighted and addressed this limitation in our discussion. Unfortunately, it is a limitation of bulk sequencing that a small amount of contamination might be present. However, the TPM values of ALB in the iMacs, for example, are extremely low, especially when compared to the hepatocytes, indicating that the level of contamination is likely to be very low. Likewise, the expression of CSF1R in the co-cultured iHeps is also extremely low. This has been included in Supp Fig 1F and G.

      (11) Figure 2D: Among the pathways shown, there are also stress pathways (acute phase response, HMGB1). Also for these cells, control of apoptotic and ER stress signatures is necessary.

      As mentioned, we have included some stress genes in Supp Fig 2C to address this.

      (12) Lines 385-386: Why would FCGRA3 indicate tissue residency? Is there literature to support this statement?

      CD16 is a marker often used to distinguish Kupffer cells from the surrounding cells, although it is also expressed by non-classical monocytes. We have clarified the text here (Lines 356-357).

      (13) Figure 3E: ALB and other genes were at the same or even lower levels expressed in D7 compared to D3. Why is that? Are the cells starting to de-differentiate after 7 days? Please discuss.

      This is a very interesting question that we were wondering about ourselves, although we do not have an answer yet. We hypothesized that this might be due to the activation of cell proliferation/developmental programmes as the cells are kept together for longer, as shown by the expression of morphogens like OSM and IGF-2 after co-culture. We have added some discussion of this (Lines 532-540).

      (14) Line 459: Word "in" is double

      We thank the reviewer for catching this; it has been corrected.

      (15) Figure 5: The findings are interesting, but the co-culture model remains somewhat unclear. Can the authors show, e.g., using qRT-PCR, how hepatocytes are developing in this culture system? If the development with monocyte-derived macrophages is altered, then one would expect that also the cellular response is different.

      We agree with the reviewer, but we think that this question would be better answered in a follow-up study. We were looking to answer if the addition of isogenic iMacs would change the drug response of iHeps, and were using the PBMC-derived macrophages here as a control. A more complete study taking into account the genetic background of the donor PBMC-derived macrophages would be much more informative, but sadly outside of the scope of our present study.

      (16) Lines 482-484: The authors talk about LPS-treated cultures and refer to Figure 4. However, there is no graph shown for LPS.

      We apologise for being unclear here, but the co-cultures were co-treated with LPS during the drug stimulation assays, as it had been shown that LPS increases the sensitivity of the liver toward hepatotoxic drugs. We have clarified this in the main text (Lines 435-437).

      Reviewer #2 (Recommendations for the authors):

      (1) It would be nice to add some protein production by the hepatocytes. For example, can they produce albumin or some other protein that can be measured? Perhaps I missed this.

      The protein expression of Albumin and Urea were assessed in the hepatocytes prior to co-culture in Supp Fig 1C; however we did not measure the protein level changes after co-culture as the co-culture would have a significant number of macrophages as well which we thought might affect the readout. Instead, after co-culture the primary analysis was done on the RNA levels of ALB and other cytochrome genes after sorting in Fig 3.

      (2) Was there an increase in hepatocyte number? Did one cell outgrow the other, or did they maintain numbers?

      The relative proportion of the iHeps remained the same, although we did see an expansion in the iMac population after 7 days by flow cytometry in Fig 1D.

      (3) What happens if the iMACs and the iHeps are grown in Costar chambers with pore sizes too small to allow for cell contact, but allowing supernatant to be continuously exposed to both cell types?

      We were primarily focused on the acquisition of a KC-like phenotype in the iMacs with regard to the question of direct contact, which was why we chose to use conditioned iHep media as part of the iMac experimental setup. However, it would be very interesting to see in a follow-up study if the converse is also true, and whether secreted factors from the iMacs alone would be sufficient to drive the changes we observed in the iHeps after co-culture.

      (4) The discussion could use a brief paragraph on some limitations and what could be added to the co-culture system. For example, could stellate cells and sinusoidal endothelium also impart KC identity? Would growing KCs on endothelium provide a more natural substratum?

      Once again, these are very interesting questions which are unfortunately outside of the scope of our study. However, we have included a short section discussing this in the paper, as we do think that it would be interesting to look at iMacs educated by hepatocyte vs stellate cells for example (Lines 530-536).

      (5) The axonal guidance pathway in early iMACs is interesting. A recent report in vivo showed that macrophages migrate from the liver parenchyma into the sinusoids in neonates when they are still immature. The process could be chemotaxis, or it could be repulsion by parenchyma. Numerous axonal guidance molecules are repulsive, pushing axons away (robo/slit, etc). The migration of Kupffer cells into sinusoids could be a repulsive rather than a chemoattractant pathway. Did the RNA seq data provide any interesting molecules in this regard?

      Reviewer #3 (Recommendations for the authors):

      This manuscript presents a conceptually well-designed approach to modeling hepatocyte-macrophage crosstalk in vitro. The authors develop a co-culture system aimed at recapitulating key aspects of Kupffer cell (KC) identity and hepatocyte maturation. The data convincingly show that macrophages acquire KC-like features under co-culture conditions. However, several major issues limit the strength of the conclusions, the depth of mechanistic insight, and the translational impact of the work.

      First, the study relies heavily on bulk RNA-seq data with minimal functional or protein-level validation - particularly for hepatocyte maturation. To substantiate claims of functional maturation, additional assays measuring albumin secretion, urea production, and CYP activity are essential. Furthermore, the omission of zonation-associated markers (e.g., GLUL, CPS1, CYP2E1) leaves a critical gap in assessing whether the iHEPs achieve physiologically relevant functional states.

      Second, statistical interpretation and reporting are inconsistent. Significant and non-significant findings are frequently conflated, which risks overinterpretation. For instance, the reported reduction in HNF4A expression is not statistically significant, and AFP expression is only significantly reduced in Day 7 co-iHEPs - yet these distinctions are not clearly stated.

      Third, although the authors emphasize the role of cell-cell contact in promoting KC identity, no experiments (e.g., transwell separation, adhesion-blocking assays) directly test this claim. As a result, the mechanistic basis for this conclusion remains speculative.

      Finally, while the data support enhanced macrophage differentiation toward a KC-like phenotype, the evidence that co-culture significantly promotes hepatocyte maturation is far less convincing and requires additional functional, mechanistic, and statistical validation before firm conclusions can be drawn.

      Minor comments:

      (1) Methodology: The choice of a 2.5:1 iHEP:iMAC ratio is not justified. This proportion does not reflect physiological hepatocyte-to-KC ratios in vivo and should be either rationalized or benchmarked against native liver composition.

      We admit that the ratio here is on the higher side of things, but it has been previously reported that there can be between 20 and 40 macrophages per 100 hepatocytes (1:5 to 1:2.5) in the adult mouse liver (Baratta et al., 2009), while admittedly in the developing mouse liver the ratio is closer to 1:4 (Lopez et al., 2011). We chose 1:2.5 as we anticipated that not all of the macrophages would be able to attach, and would thus be lost during media change, as evidenced by flow cytometry on Day 3 of the co-culture, where only 20% of the cells had clear CD45 and CD14 expression. We have clarified our methodology in the paper (Lines 141-143).
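
The conversion between "macrophages per 100 hepatocytes" and the "1:x" ratios quoted above is simple arithmetic and can be sketched as follows. This is only an illustration of the unit conversion; the function name `hepatocytes_per_macrophage` is a hypothetical helper, and the input values are the ones cited in the response (20 and 40 per 100 for adult mouse liver, and the equivalent of 1:4 for developing liver).

```python
from fractions import Fraction


def hepatocytes_per_macrophage(macrophages_per_100_hepatocytes: int) -> float:
    """Return x in a 1:x macrophage:hepatocyte ratio.

    Hypothetical helper for illustration: 20 macrophages per 100
    hepatocytes corresponds to 1 macrophage per 5 hepatocytes (1:5).
    """
    # Fraction keeps the division exact before converting to float.
    return float(Fraction(100, macrophages_per_100_hepatocytes))


# 20 and 40 per 100 are the adult mouse liver values quoted in the text;
# 25 per 100 is the equivalent of the 1:4 ratio quoted for developing liver.
for count in (20, 25, 40):
    x = hepatocytes_per_macrophage(count)
    print(f"{count} macrophages per 100 hepatocytes -> 1:{x:g}")
```

Read this way, the 2.5:1 iHEP:iMAC seeding ratio sits at the macrophage-rich end of the quoted range.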

      (2) Effect of iMAC on iHEP (Section 3.2, Supplementary Figure 1E):

      (2.1) The authors should explain why Day 3 co-cultured iHEPs show stronger transcriptomic similarity to primary hepatocytes than Day 7 cells. Possible biological mechanisms (e.g., transient paracrine signaling or temporal changes in maturation dynamics) should be discussed.

      We have added some discussion for this (Lines 309-311, 536-540).

      (2.2) The figure legend refers to "fetal hepatocytes," while the correlation map states "hepatocytes." This discrepancy must be clarified. Moreover, if fetal hepatocytes are used as the reference, and the goal is to assess maturation, comparisons to adult hepatocytes are necessary. 

      The comparison was done against fetal hepatocytes, and this has been clarified in the figure. We chose to use fetal hepatocytes here as it would be unfair to compare iPSC-derived cells that are less than 3 weeks old to adult human tissue, and any similarities or differences between the mono/co-cultures and the adult tissue might be due to the shifting transcriptomic landscape during development. However, we do recognise the nuanced nature of using “maturation” here, and what we mean is that the iPSC-derived cells become more similar to their in vivo counterparts.

      (2.3) Baseline characterization of both cell types before co-culture is insufficient. For iHEPs, flow cytometry data on ALB and AFP positivity rates should be presented, along with post-co-culture changes. For iMACs, marker expression (CD45, CD11b, CD14) should be shown before and after co-culture. The methods mention CD163, CX3CR1, and CD11b, but these data are absent from the results. Additionally, the gating strategy for cell sorting prior to bulk RNA-seq must be clearly described - including how potential cross-contamination of cell fractions (e.g., macrophages in the hepatocyte population) was excluded.

      We apologise for this oversight. Some of the markers were used in determining the purity of the iMacs before co-culture, and we did not end up including these plots for brevity. We have now added the purity plots in Supp Fig 2E, showing that the iMacs were more than 90% pure before co-culture. We acknowledge the concern about cross-contamination for bulk sequencing, and have added in Supp Fig 2G and H the expression of ALB in the iMac fraction, as well as the expression of CSF1R in the iHep fraction, showing minimal contamination with our gating strategy.

      (3) IGF2 Expression: The observed upregulation of IGF2, a fetal marker, contradicts the conclusion that co-culture promotes hepatocyte maturation. This inconsistency should be addressed, and possible explanations (e.g., transient fetal-like activation driven by macrophage-derived signals) discussed. The lack of statistical significance for this finding must also be explicitly noted.

      We thank the reviewer for pointing this out. The expression of IGF2 was actually significantly different when comparing the Day 0 Hepatocyte only and Day 7 Hepatocyte only to the Day 3 Co-cultured Hepatocytes, but the significance is lost with the Day 7 co-cultured Hepatocytes. One possible explanation, as the reviewer suggested, is that a transient program is activated upon co-culture and subsequently downregulated. We have updated the figure and text, and added some discussion to reflect this (Lines 309-311, 536-540).

      (4) Effect of iHEP on iMAC: The reported upregulation of KC-related genes is overstated. Changes in LYVE1 and ID1 are not statistically significant (Figure 2G), yet they are presented as meaningful. Clear separation of statistically significant results from non-significant trends is critical to avoid overinterpretation.

      We apologise for this, as it was never our intention to present these markers as significant, but rather we presented these markers because we thought that these markers would be of interest to the audience. We have clarified the text to reflect that these are trends and non-significant (Lines 367-369).

      (5) Mimicking In Vivo Clinical Responses:

      (5.1) The authors' conclusion that IL-6 responses are not recapitulated when iMACs are replaced by monocyte-derived macrophages (MoMs) is not fully supported by the data presented. In fact, the MoM co-cultures exhibit a noticeable trend toward increased IL-6 production (e.g., approximately 150% with LTG at 66.6 µM and 400 µM), suggesting that some degree of responsiveness is retained. To substantiate the claim that the observed cytokine modulation is unique to iKC-containing co-cultures, the authors should perform direct statistical comparisons of absolute IL-6 secretion levels between iKC and MoM co-cultures at each drug concentration. Such analyses are essential to determine whether the differences are statistically significant and biologically meaningful, and to clarify whether the observed effects truly reflect KC-specific functionality rather than general macrophage activation.

      (5.2) The effects of drug exposure on hepatocytes themselves are not addressed. It is important to evaluate whether the co-culture remains viable under treatment, whether it recovers after drug withdrawal, and whether there is evidence of cytotoxicity or irreversible phenotypic loss.

      (6) Interpretation of IL-6 Modulation and Model Specificity:

      The authors show that IL-6 secretion in their co-culture system varies in response to multiple hepatotoxic drugs and parallels some reported clinical trends - notably, a concentration-dependent decrease with diclofenac (DIC) and leflunomide (LFM). They further report that this pattern is not observed in hepatocyte-PBMC-derived macrophage co-cultures, and they conclude that iMAC/iKC-like cells are essential for capturing immune-mediated hepatotoxic responses. However, the data presented do not fully justify such a conclusion. Several key mechanistic issues weaken the interpretation:

      (6.1) Mechanistic ambiguity in the DIC response: The decrease in IL-6 following DIC exposure is most likely attributable to reduced prostaglandin E₂ (PGE₂) production via COX inhibition, which secondarily suppresses IL-6 signaling. This effect is a general pharmacological property of NSAIDs and is not necessarily reflective of Kupffer cell-specific pathways. Direct evidence - such as prostanoid quantification or PGE₂ rescue experiments - is required to establish that the observed effects are liver-specific rather than nonspecific NSAID responses.

      (6.2) Pharmacogenetic complexity in the LFM response: LFM-induced hepatotoxicity is highly variable and largely dependent on CYP2C9 polymorphisms, which determine conversion to the active metabolite teriflunomide. Because hepatotoxicity and the associated cytokine responses are not universal among patients, a simplified co-culture model lacking metabolic diversity cannot be assumed to faithfully reproduce patient-specific immune responses. The observed IL-6 suppression could arise from differences in metabolic activation, intracellular exposure, or indirect signaling changes rather than from intrinsic KC-specific mechanisms.

      These points significantly undermine the authors' claim that IL-6 modulation provides definitive evidence of model specificity or predictive value. At minimum, the manuscript should (i) explicitly acknowledge these mechanistic limitations, (ii) include supporting data such as prostanoid profiling, CYP2C9 modulation, or teriflunomide quantification, and (iii) temper its claims regarding the model's capacity to recapitulate immune-mediated hepatotoxicity. Without such evidence, the current interpretation risks overstating the functional significance and translational relevance of the co-culture system.

      We fully agree with the reviewer and have highlighted this in our discussion (Lines 540-551).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The analysis of neural morphology across Heliconiini butterfly species revealed brain area specific changes associated with new foraging behaviours. While the volume of the centre for learning and memory, the mushroom bodies, was known to vary widely across species, new, valuable results show conservation of the volume of a center for navigation, the central complex. The presented evidence is convincing for both volumetric conservation in the central complex and fine neuroanatomical differences associated with pollen feeding, delivered by experimental approaches that are applicable to other insect species. This work will be of interest to evolutionary biologists, entomologists, and neuroscientists.

      Many thanks for your assessment and time handling this manuscript. We value the constructive input of both reviewers and believe that the result is an improved publication.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors previously reported that Heliconius, one genus of the Heliconiini butterflies, evolved to be efficient foragers to feed pollen of specific plants and have massively expanded mushroom bodies. Using the same image dataset, the authors segmented the central complex and associated brain regions and found that the volume of the central complex relative to the rest of the brain is largely conserved across the Heliconiini butterflies. By performing immunostaining to label a specific subset of neurons, the authors found several potential sites of evolutionary divergence in the central complex neural circuits, including the number of GABAergic ellipsoid body ring neurons and the innervation patterns of Allatostatin A expressing neurons in the noduli. These neuroanatomical data will be helpful to guide future studies to understand the evolution of the neural circuits for vector-based navigation.

      We thank Reviewer 1 for the constructive feedback and criticism, which have strengthened this publication.

      Strengths:

      The authors used a sufficiently large scale of dataset from 307 individuals of 41 species of Heliconiini butterflies to solidify the quantitative conclusions and present new microscopy data for fine neuroanatomical comparison of the central complex.

      Weaknesses:

      (1) Although the figures display a concise summary of anatomical findings, it would be difficult for non-experts to learn from this manuscript to identify the same neuronal processes in the raw confocal stacks. It would be helpful to have instructive movies to show a step-by-step guide for identification of neurons of interest, segmentations, and 3D visualizations (rotation) for several examples, including ER neurons (to supplement texts in line 347-353) and Allatostatin A neurons.

      We approached this with the following logic:

      All 3D segmentations were animated, to illustrate how they are generated from raw imaging data. This means we are providing a video file for each major species group (Heliconius/outgroup-Heliconiini) for Figure 4 (general CX anatomy), Figure 7 (ER neuron projections), Figure S5 (ER neuron/bulb anatomy). This visual connection should help the reader relate 3D segmentations to image stacks. We have also added a reference to these videos in the relevant Figure captions.

      We also annotated image stacks, but did so selectively. We annotated key stacks of Figure 4 (general CX anatomy), Figure 7 (ER neuron projections), Figure S5 (ER neuron/bulb anatomy) and include a reference in figure caption to them.

      We refrained from annotating stacks of Figures 5, 6, 8 and S4. This is because we believe that the annotations we have performed in the figure panels will be sufficient for readers interested in the finer detail of these anatomies who are familiar with general CX anatomy.

      We believe that our approach will help the reader to gain a visual illustration of those parts of the manuscript which report key results and novel insights, such as ER neuronal variation, and that the data and figures collectively provide accessible information sufficient for this purpose.

      Text changes in Figure captions 4, 7 and S5: “See animated 3D segmentations and annotated stacks in file repository.”

      (2) Related to (1), it was difficult for me to assess if the data in Figure 7 support the authors' conclusions that ER neuron number increased in Heliconius melpomene. By my understanding, the resolution of this dataset isn't high enough to trace individual axons and therefore authors do not rule out that the portion of "ER ring neurons" in Heliconius may not innervate the ER, as stated in Line 635 "Importantly, we also found that some ER neurons bypass the ellipsoid body and give rise to dense branches within distinct layers in the fan-shaped body (ER-FB)". If they don't innervate the ellipsoid body, why are they named as "ER neurons"?

      Thanks for pointing to this. We believe this is primarily a nomenclature issue but have tried to clarify it in the text.

      Ultimately, the neurons from this group that project to the EB, forming the actual ring neurons, and those that project to the FB, whose function is so far unclear, emerge through the same lineage, DALv2 (as determined by Kandimalla et al 2023), and therefore have a common developmental origin (also noted by Homberg et al 2018). To acknowledge their common developmental origin and to simplify nomenclature, and thereby make the text easier for non-experts to follow, we specify which DALv2 progeny project to which areas, but refer to both adult neuron populations as “ER neurons”. We have changed the following text to state our definition explicitly, which we hope mitigates the understandable confusion.

      Lines 354-357: “Here, we refer to these neurons, as well as those neurons projecting to the fan-shaped body (GU neurons in [66]), as ER neurons due to their common developmental origin [45,66] and to simplify anatomical descriptions.”

      Lines 386-387: “Whether these ER neurons solely branch in the fan-shaped body, as shown for GU neurons elsewhere [66] or have additional side branches entering the ellipsoid body is not clear.”

      (3) Discussions around the lines 577-584 require the assumption that each ellipsoid body (EB) ring neuron typically arborises in a single microglomerulus to form a largely one-to-one connection with TuBu neurons within the bulb (BU), and therefore, the number of BU microglomeruli should provide an estimation of the number of ER neurons. Explain this key assumption or provide an alternative explanation.

      Thanks for this. We do not think that our hypothesis necessarily requires any specific assumptions regarding the ratio of microglomeruli to ER or TuBu neurons. Even in Drosophila the ratio of ER to MG is only approximately 1:1, as some microglomeruli seem to combine into one. In other species this relationship might be very different. Indeed, our data suggest that in outgroup-Heliconiini the ratio is 4.4 microglomeruli to 1 ER neuron, and in Heliconius it is 3.4. However, as these MG numbers are extrapolated and cannot be precisely counted, they may be too imprecise to support a definite conclusion, which is why we do not mention this in the text. Importantly, extrapolation in the current form is a valid additional way for us to describe overall bulb anatomy (next to bulb volume and average microglomerulus size).

      In any case, the inference we make here is that a conserved bulb anatomy in volume, MG numbers and size supports our assumption that the additional neurons in the ER neuron group/DALv2 progeny do not arborize in the bulb, but do so in the SMP/SLP region and in the fan-shaped body. We believe we have described this inference accurately in the current manuscript.

      An additional point, not mentioned in the manuscript, but emerging through lineage annotations of connectome data, is that some DALv2 progeny have been identified as MBONs as well as being GABA-ergic, which could potentially be the ER-FB neurons that we describe (Schlegel et al 2024 Nature). We refrain from mentioning this in the manuscript, as it is too speculative, but we thought the reviewer might be interested in this observation.
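
As a rough arithmetic sketch of how the two microglomerulus-to-ER-neuron ratios quoted above relate to each other, one can ask what ratio would be expected if microglomerulus number were unchanged while ER neuron number rose by the 35% reported for Figure 7H. This is an illustrative back-of-the-envelope calculation, not part of the reported analysis: it assumes identical microglomerulus counts in both groups, and the function name `ratio_after_neuron_increase` is hypothetical.

```python
def ratio_after_neuron_increase(baseline_ratio: float, fractional_increase: float) -> float:
    """Microglomeruli per ER neuron after the neuron count grows.

    Assumption (for illustration only): the number of microglomeruli stays
    constant, so the ratio scales inversely with the neuron count.
    """
    return baseline_ratio / (1.0 + fractional_increase)


outgroup_ratio = 4.4   # microglomeruli per ER neuron, outgroup-Heliconiini (quoted above)
er_increase = 0.35     # 35% more ER neurons in Heliconius (reported for Figure 7H)

predicted = ratio_after_neuron_increase(outgroup_ratio, er_increase)
print(f"predicted Heliconius ratio: {predicted:.2f}")  # prints 3.26
```

Under this simplifying assumption the predicted value (about 3.26) is close to the quoted Heliconius ratio of 3.4, though, as noted above, the extrapolated microglomerulus numbers are too imprecise to draw firm conclusions.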

      (4) The details of antibody information are missing in the Key resource table. Instead of citing papers, list the catalogue numbers and identifier for commercially available antibodies, and describe the antigen, and whether they are monoclonal or polyclonal. Are antigens conserved across species?

      We have now added substantial information to Table 2, including research resource identifiers (RRIDs) and antigen descriptions, as well as information about specificity and conservation. In the text itself, in line 757, we already provide publications that have illustrated conservation very extensively.

      We believe that with the additional information provided in Table 2, all necessary information is now provided.

      (5) I did not understand why authors assume that foraging to feed on pollens is a more difficult cognitive task than foraging to feed on nectar. Would it be possible that they are equally demanding tasks, but pollen feeding allows Heliconius to pass more proteins and nucleic acids to their offspring and therefore they can develop larger mushroom bodies?

      This is an excellent point. Our current understanding is that pollen feeding is a cognitively more demanding task because a) the density of pollen resources is lower than that of nectar resources, and b) the competition for pollen is higher (pollen is depleted quickly, and Heliconius compete with each other and with other taxa, including hummingbirds). There is therefore a benefit to high foraging efficiency, which favours the evolution of learning. This is likely reinforced by the long lives of Heliconius, which live up to a year compared to ~4 weeks for most outgroups, and by the temporal stability of major pollen resources, so that a memorised location provides a benefit over long periods of time (Young and Montgomery 2020 Proc B).

      We now refer to an additional publication (Young and Montgomery 2020 Proc B) in lines 103-104 for a fuller description of the ecology of pollen feeding, and in the current manuscript simply focus on the impact of mushroom body expansion on the CX.

      Reviewer #2 (Public review):

      Summary:

      In this study, Farnsworth et al. ask whether the previously established expansion of mushroom bodies in the pollen foraging Heliconius genus of Heliconiini butterflies co-evolved with adaptations in the central complex. Heliconius trap line foraging strategies to acquire pollen as a novel resource require advanced spatial memory mediated by larger mushroom bodies, but the authors show that related navigation circuits in the central complex are highly conserved across the Heliconiini tribe, with a few interesting exceptions. Using general immunohistochemical stains and 3D reconstruction, the authors compared volumes of central complex regions, and unlike the mushroom bodies, there was no evidence of expansion associated with pollen feeding. However, a second dataset of neuromodulator and neuropeptide antibody labeling reveals more subtle differences between pollen and non-pollen foragers and highlights sub-circuits that may mediate species-specific differences in behavior. Specifically, the authors found an expansion of GABAergic ER neurons projecting to the fan-shaped body in Heliconius, which may enhance their ability to path-integrate. They also found differences in Allatostatin A immunoreactivity, particularly increased expression in the noduli associated with pollen feeding. These differences warrant closer examination in future studies to determine their functional implication on navigation and foraging behaviors.

      We thank Reviewer 2 for the constructive and thorough review. We believe that addressing these criticisms has improved this publication.

      Strengths:

      The authors leveraged a large morphological data set from the Heliconiini to achieve excellent phylogenetic coverage across the tribe with 41 species represented. Their high-quality histology resolves anatomical details to the level of specific, identifiable tracts and cell body clusters. They revealed differences at a circuit level, which would not be obvious from a volumetric comparison. The discussion of these adaptations in the context of central complex models is useful for generating new hypotheses for future studies on the function of ER-FB neurons and the role of Allatostatin A modulation in navigation.

      The conclusions drawn in this paper are measured and supported by rigorous statistics and evidence from micrographs.

      Weaknesses:

      The majority of results in this study do not reveal adaptations in the central complex associated with pollen foraging. However, reporting conserved traits is useful and illustrates where developmental or functional constraints may be acting. The implied hypothesis in the introduction is that expansion of mushroom bodies in Heliconius co-evolved with central complex adaptations, so it may be helpful to set up the alternate hypotheses in the beginning.

      Thank you for this relevant comment. We have added to the text in lines 124-128, as follows:

      “Indeed, these circumstances permit us to test the hypotheses that modifications in the mushroom bodies either occurred in isolation from other integrative centres, or that they occurred in concert with specific changes in centres, such as the central complex. This provides insights into the functional flexibility of two interacting, integrative centres across evolutionary time.”

      In the main text, the authors describe differences in GABAergic neurons "across several species" but only one Heliconius and one outgroup species seem to be represented in the figures. ER numbers in Figure 7H are only compared for these two species. If this data is available for other species, it would strengthen the paper to add them to the analysis, since this was one of the most intriguing findings in the study. I would want to know if the increased ER number is a trend in Heliconius or specific to H. melpomene.

      This points to imprecise phrasing. We indeed have additional data in other species, but unfortunately not to an extent that would permit quantification of cell numbers, which is why we chose to put these data into the supplement, Fig. S4.

      We modified the text to point more directly at the additional data in Fig S4. Lines 362-368 now read:

      “…, we noticed a pronounced difference in a portion of projections leading into the fan-shaped body and a strong difference in signal inside layer III in our two focal species H. melpomene and D. iulia, as well as other representatives of the Heliconiini tribe (Figure S4A-B, Figure 7). To understand how these differences could have occurred, we quantified ER neuron numbers in our focal species, and identified a significant difference, reflecting a 35% increase in Heliconius (t = 4.221, P = 0.004; Figure 7H).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Add a detailed description about each of the tiff files that were deposited at https://doi.org/10.5281/zenodo.15304965. It was hard for me to relate these raw images with the Figure panels. For instance, "Melp_GAD_26-F_detailed_conc.tif" in the Figure 7 folder seems to be used to make Figure 7L and N, but that information is cryptic.

      We agree with the reviewer. We added further descriptions, and have created a detailed readme file which explains which original file refers to which figure. Together with the efforts for Reviewer 1’s first comment, we hope that this updated version of our repository is easier to understand.

      In addition, we corrected the image orientation in some of the supplied files, which was originally incorrect.

      (2) Add descriptions about the dataset for large-scale volumetric analysis. With the current methods and texts, it is hard to understand what kinds of staining and microscopes were used. I initially thought that they could be micro-CT data.

      We have made two improvements:

      We have added an additional readme file to explain the different datasets, and which datasets were used for each figure, to relate them to the original data deposited at zenodo.org (see your previous comment).

      We have added descriptions in several places in the manuscript file, i.e.

      Lines 133-135, now reading “To assess evidence of volumetric changes in the central complex and associated neuropils, we drew data from a large dataset of immunostained brains from 307 individuals of 41 species, …”

      Lines 144-149, now reading “We used a combination of phylogenetic comparative analysis across a large dataset of brains immunostained against the structural marker synapsin in 41 species and 307 individuals, and more targeted sampling of species that represent the behavioural and neuroanatomical diversity of Heliconiini for more fine-scale assessments of patterns of divergence in substructures of the CX with various antibodies (Figure 1A-B).”

      (3) Line 275: Non-expert readers would need an explanation about what the gamma lobe is.

      Agreed and added in line 273:

      “Some of the ventral projections seemed to directly originate from the γ lobe, a portion of the mushroom body, thus potentially labelling projections of mushroom body output neurons into the fan-shaped body (Figure 5a-c) [12,21].”

      (4) Figures 4 I-L are missing.

      We modified the figure caption accordingly, and address annotated differences more directly. This section now reads

      “G/H: Labelling reveals two distinguishable layers in the fan-shaped body while additional staining elsewhere reveals further detail (arrows in G/H-2/3). Thicker tract conflations indicate the columnar architecture determined through the four columnar neuron bundles (arrowheads in G/H-3). Labelling in the EB reveals two pronounced layers (arrows in G/H-1/2), while obvious columns could not be indicated. PB protocerebral bridge, FB fan-shaped body, EB ellipsoid body. A anterior, P posterior. Scale bars are 50 μm.”

      (5) In the current version of Figure 1B, AOTU is displayed with the mushroom body. The authors can emphasize its relation to the central complex by showing it on the right side of panels together with the central complex.

      Great suggestion. We have done this now. We have kept the AOTU at the scale of the MB, indicated by the different scale bars at the bottom of the figure, as we’re showing the CX at a slightly larger scale.

      (6) Figure 1C: What do the colors of the lines represent?

      We now changed these colours so that they correspond to the colours chosen in Figures 2 and S2 as well as in a previous publication of the lab, added an asterisk next to Heliconius aoede, and added text to the figure legend:

      “Colour indicates focal groups here and elsewhere [29]. The asterisk at the branch of H. aoede indicates a secondary loss of pollen feeding.”

      (7) Figures 2A and B: What does the size of the circles represent? I guess that small ones are individuals, and larger ones are species averages. Plots with only species averages would be easier to see. It is difficult to distinguish Heliconius and Heliconius aoede in these panels. It would be easier if Heliconius circles were outlined with thin black lines.

      Thanks for this. We wanted to keep both the averages and individual data points in one figure, so as not to overcrowd the manuscript with additional figures. We hope that the changes we made address the confusion sufficiently. We made the following modifications to Figures 2, S1 and S2:

      (1) Added text in the figure legend clarifying what solid and transparent circles indicate (“Solid data points indicate species averages, while opaque circles indicate individual data points.”)

      (2) Added, as suggested, additional contours, to all Heliconius data points, and added corresponding text to the legend (“Black contours indicate Heliconius sp. data points.”)

      (3) Changed opacity settings of individual data points.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 391 and Methods. It was unclear how the extrapolated microglomeruli numbers were calculated. Please clarify this in the methods.

      Agreed. We substantially modified the text to address this.

      Lines 392-396: “We generated high resolution images of the bulb to determine its size (Figure S5 C-F), and 3D segmented seven microglomeruli per individual with which we generated an extrapolated approximation of total microglomeruli number by dividing bulb volume with average microglomerulus volume. This was necessary as most microglomeruli were not discernible from each other (Figure S5 G-H).”

      Lines 862-873: “To segment the bulb, we created high resolution images and were particularly careful to only segment the area of the bulb that comprised large synapses/glomeruli, excluding parts of the LEa/IT projection. This was essential, because we relied on extrapolating the total number of microglomeruli from a subset of segmented microglomeruli and the total volume that contained microglomeruli, which means any section containing tracts and not glomerular structures would skew the estimated total number of microglomeruli. Extrapolation was necessary, as not all microglomeruli were visually discernible. We achieved an unskewed bulb volume by leaving out dense pieces of tubulin-positive tract material. We segmented seven microglomeruli per individual from the posterior section of the bulb, where they were most clearly visible, to get the most comparable impression across individuals and species. We then calculated average microglomerulus size and divided bulb volume by this value to determine an approximation of microglomeruli number.”
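
      The extrapolation described above can be illustrated with a minimal sketch. The function name and all numeric values are hypothetical and are not taken from the study; the only assumption carried over from the text is that the estimate is the bulb volume divided by the mean volume of the segmented microglomeruli.

```python
from statistics import mean


def estimate_microglomeruli_count(bulb_volume, sampled_volumes):
    """Approximate total microglomeruli as bulb volume / mean sampled volume.

    bulb_volume and sampled_volumes must share the same unit
    (for example cubic micrometres).
    """
    if not sampled_volumes:
        raise ValueError("need at least one segmented microglomerulus")
    return bulb_volume / mean(sampled_volumes)


# Hypothetical example: seven segmented microglomeruli from one individual.
sampled = [10.0, 12.0, 8.0, 11.0, 9.0, 10.0, 10.0]
estimate = estimate_microglomeruli_count(5000.0, sampled)
```

      Because the estimate scales directly with bulb volume, any tract material included in the segmented bulb would inflate the count, which is why the segmentation excludes it.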

      (2) Line 439. It would be helpful to add that Kaiser et al. studied honeybees.

      Agreed! Now reads in lines 443-444

      “Moreover, Kaiser et al. [75] identified Allatostatin A expression in three fan-shaped and two ellipsoid body layers in the honey bee brain, …”

      (3) Line 492. "outcome" should be "outcomes".

      We believe that this refers to original line 481. Corrected. Thank you.

      (4) Figure 3B. If there is significance to the colors and triangle directions, please include a key/legend.

      We have added:

      “Cell type depictions are examples with localisation inside each neuropil being purely visual (as well as their colour), while triangles indicate approximate output sites.”

      We also corrected the following issues that were noted during our revisions:

      line 587, wrong reference.

      We updated references 37 and 44, which are now respectively

      Hodge, E. A. et al. Modality-specific long-term memory enhancement in Heliconius butterflies. Philos Trans R Soc Lond B Biol Sci 380, 20240119 (2025).

      Hodge, E. A. et al. Conservation of sensory pathways implies a localised change in the mushroom bodies is associated with cognitive evolution in Heliconius butterflies. Evol qpag005 (2026) doi:10.1093/evolut/qpag005.

      Figure S5 had an error in panels C and D, where the pictures in C were actually for H. melpomene in D and the reverse; the other panels were correct. We have corrected this.

      In the data submitted on Zenodo: we corrected a few inconsistencies in channel colours and orientation in the .tiff files for Fig 6, 8 and S4.

      We added important bulb 3D segmentation files to the repository on Zenodo.

    1. Author response:

      We would like to express our sincere gratitude to the editors and the two reviewers for providing their constructive and valuable comments that will greatly guide us in improving the manuscript. We will revise the manuscript according to their critiques and suggestions. The existing code for this study, along with preliminary code developed in response to the review comments, has been made publicly available at https://github.com/cbaiming/miRTarDS. We now provide detailed responses to each reviewer below.

      Reviewer #1 (Public review):

      The author presents a new method for microRNA target prediction based on (1) a publicly available pretrained Sentence-BERT language model that the author fine-tunes using MeSH information and (2) downstream classification analysis for microRNA target prediction. In particular, the author's approach, named "miRTarDS", attempts to solve the microRNA target prediction problem by utilizing disease information (i.e., semantic similarity scores) from their language model. The author then compares the prediction performance with other sequence- and disease-based methods and attempts to show that miRTarDS is superior or at least comparable to existing methods. The author's general approach to this microRNA target prediction problem seems promising, but fails to demonstrate concrete computational evidence that miRTarDS outperforms other existing methods. The author's claim that disease information-based language models are sufficient is unfounded. The manuscript requires substantial rewriting and reorganization for readers with a strong background in biomedical research.

      We appreciate the reviewer’s careful examination of modeling, benchmarking, and interpretation, and we are particularly encouraged that they found the proposed method promising. We will make corresponding revisions to the manuscript based on the reviewer’s comments.

      A major issue related to the author's claim of computational advance of miRTarDS: The author does not introduce existing biomedical-specific language models, and does not compare them against miRTarDS's fine-tuned model. The performance of miRTarDS is largely dependent on the semantic embedding of disease terms. The author shows in Figure 5 that MeSH-based fine-tuning leads to a substantial improvement in MeSH-based correlation compared to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1" without sacrificing a large amount of BIOSSES-based correlation. However, the author does not compare the performance of MeSH- and BIOSSES-based correlation with existing language models such as ChatGPT, BioBERT, PubMedBERT, and more. Also, the substantial improvement in MeSH-based correlation is a mere indication that the MeSH-based fine-tuning strategy was reasonable and not that it's superior to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1".

      We thank the reviewer for the constructive suggestions regarding the benchmarking of language models. We acknowledge that the performance of miRTarDS largely depends on the semantic embeddings of disease terms. In the revision, we will therefore 1) conduct a literature review to introduce existing biomedical-specific language models, and 2) perform a side-by-side comparison between our fine-tuned model and these existing models, to evaluate the model’s capabilities more comprehensively.

      Another major issue is in the author's claim that disease-information from miRTarDS's language model is "sufficient" for accurate microRNA target prediction. Available microRNA targets with experimental evidence are largely biased for those with disease implications that have been reported in the biomedical literature. It's possible that their language model is biased by existing literature that has also been used to build microRNA target databases. Therefore, it is important that the author provides strong evidence that excludes the possibility of data leakage circularity. Similar concerns are prevalent across the manuscript, and so I highly recommend that the author reassess the evaluation frameworks and account for inflated performance, biased conclusions, and self-confirming results.

      We thank the reviewer for the comment. We recognize that existing experimentally validated microRNA targets may be biased toward those reported in biomedical literature as disease-related. To mitigate this bias, we used the K-Nearest Neighbors (KNN) method to extract predicted microRNA targets whose numbers of miRNA–disease and gene–disease entries closely match those of the experimentally validated microRNA targets. We then applied Positive-Unlabeled (PU) Learning to classify the two groups. PU Learning is designed for scenarios where only a subset of the training data is explicitly labeled as positive while the remaining data are unlabeled (the unlabeled set contains both potential positives and true negatives), which makes it well suited to the application context of this manuscript [1]. Preliminary results show that, after applying the new data extraction and classification approach, model performance drops to around F1=0.73 (the MISIM method also shows a decline, with F1 around 0.58; detailed code is available on GitHub). The specific reasons for this drop require further investigation.
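
      As a rough illustration of the matching step only, the sketch below pairs each validated interaction with its nearest unlabeled interaction in the space of (miRNA–disease count, gene–disease count). It is not the code from the authors' repository: the function name, the pair labels, the counts, and the choice of Euclidean distance without reuse are all assumptions made for the example, and the subsequent PU classifier is not shown.

```python
import math


def match_unlabeled(positives, unlabeled, k=1):
    """Pick, for each positive pair, the k nearest unlabeled pairs.

    Both arguments map a pair label to a feature tuple
    (n_mirna_disease_entries, n_gene_disease_entries).
    Each unlabeled pair is used at most once.
    """
    available = dict(unlabeled)
    matched = []
    for feat in positives.values():
        # Rank the remaining unlabeled pairs by Euclidean distance.
        nearest = sorted(
            available, key=lambda name: math.dist(feat, available[name])
        )[:k]
        for name in nearest:
            matched.append(name)
            del available[name]
    return matched


# Hypothetical labels and counts, for illustration only.
positives = {"miR-A|GENE1": (12, 30), "miR-B|GENE2": (3, 5)}
unlabeled = {
    "miR-C|GENE3": (11, 29),
    "miR-D|GENE4": (100, 200),
    "miR-E|GENE5": (4, 5),
}
matched = match_unlabeled(positives, unlabeled)
```

      Matching on entry counts is meant to keep a classifier from separating the two groups simply because validated pairs have more disease annotations.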

      Last but not least, the manuscript requires a deeper and careful description and computational encoding of microRNA biology. I'd advise the author to include an expert in microRNA biology to improve the quality of this manuscript. For example, the author uses the pre-miRNA notation and replaces the mature miRNA notation to maintain computational encoding consistency across databases. However, in the mature microRNA notation, the "-3p" or "-5p" suffix is critical, as the 3p and 5p mature microRNAs have different seed sequences and thus different mRNA targets. The 3p mature microRNA would most likely not target an mRNA targeted by the 5p mature microRNA.

      We thank the reviewer for the critique and suggestion. We fully agree with the reviewer that the distinction between the 3p and 5p mature strands is critical for determining mRNA targeting, as they possess distinct seed sequences. In our study, we relied on the miRNA–disease associations provided by the HMDD database, which annotates interactions at the pre-miRNA level: “… the enriched functions of each mature miRNA are aggregated to the corresponding miRNA precursor.” [2] Furthermore, existing literature suggests that the pre-miRNA level can be appropriate and informative for disease association analyses: “Compared with the mature miRNA method, the pre-miRNA method is more useful for studying disease association.” [3] We also find that, in some cases, both strands cooperate to regulate the same or complementary pathways [4]. We acknowledge the reviewer’s point as an important consideration for future revision. We plan to consult or collaborate with biologists to enhance the quality of the manuscript in biology.

      Reviewer #2 (Public review):

      This study introduces a novel knowledge-driven approach, miRTarDS, which enables microRNA-Target Interaction (MTI) prediction by leveraging the disease association degree between a miRNA and its target gene. The core hypothesis is that this single feature is sufficient to distinguish experimentally validated functional MTIs from computationally predicted MTIs in a binary classification setting. To quantify the disease association, the authors fine-tuned a Sentence-BERT (SBERT) model to generate embeddings of disease descriptions and compute their semantic similarity. Using only this disease association feature, miRTarDS achieved an F1 score of 0.88 on the test set.

      We thank the reviewers for their positive feedback, especially for their recognition of the novelty of this manuscript.

      Strengths:

      The primary strength is the innovative use of the disease association degree as an independent feature for MTI classification. In addition, this study successfully adapts and fine-tunes the Sentence-BERT (SBERT) model to quantify the semantic similarity between biomedical texts (disease descriptions). This approach establishes a critical pathway for integrating powerful language models and the vast growth in clinical/disease data into biochemical discovery, like MTI prediction.

      We would like to thank the reviewer again for their positive feedback. We appreciate their recognition of the novelty of our work, as well as their acknowledgment that the proposed method paves the way for integrating language models with clinical/disease data into biochemical discovery.

      Weaknesses:

      The main weakness lies in its definition of the ground-truth dataset, which serves as a foundation for methodological evaluation. The study defines the Negative Set as computationally predicted MTIs that lack experimental evidence. However, the absence of experimental validation does not equate to non-functionality. Similarly, the miRAW sets are classified by whether the target and miRNA could form a stable duplex structure according to RNA structure prediction. This definition is biologically irrelevant, as duplex stability does not fully encapsulate the complex in vivo binding of miRNAs within the AGO protein complex.

      We thank the reviewers for their constructive feedback. We have realized that treating predicted MTI as a negative class may pose some issues. Therefore, we have decided to adopt Positive Unlabeled (PU) Learning in subsequent updates. This classification method can be applied to datasets such as ours, which contain only positive classes and lack negative ones [1]. We used the miRAW dataset to enable a horizontal comparison of our method with traditional sequence-based prediction approaches. We acknowledge that miRAW may overlook some biological insights, and we plan to optimize the construction of test datasets in the future. Some preliminary explorations have already been conducted, and the relevant code is available on GitHub.

      Furthermore, we will make the following revisions: 1) clearly specify the version of miRBase and incorporate more miRNA-related databases; 2) conduct a further literature review on miRNA biological mechanisms to improve the biological quality of the manuscript; 3) perform a more comprehensive evaluation of the model’s performance; and 4) attempt to identify some representative MTIs that have been overlooked by existing prediction tools but can be predicted by our proposed method.

      References

      (1) Li, F., Dong, S., Leier, A., Han, M., Guo, X., Xu, J., ... & Song, J. (2022). Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Briefings in Bioinformatics, 23(1), bbab461.

      (2) Huang, Z., Shi, J., Gao, Y., Cui, C., Zhang, S., Li, J., ... & Cui, Q. (2019). HMDD v3.0: a database for experimentally supported human microRNA–disease associations. Nucleic Acids Research, 47(D1), D1013-D1017.

      (3) Wang, H., & Ho, C. (2023). The human pre-miRNA distance distribution for exploring disease association. International Journal of Molecular Sciences, 24(2), 1009.

      (4) Mitra, R., Adams, C. M., Jiang, W., Greenawalt, E., & Eischen, C. M. (2020). Pan-cancer analysis reveals cooperativity of both strands of microRNA that regulate tumorigenesis and patient survival. Nature Communications, 11(1), 968.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is an important paper that reports in vivo physiological abnormalities in the hippocampus of a rat model of traumatic brain injury (TBI). In this study, authors focused on changes in theta-gamma phase coupling and action potential entrainment to theta, phenomena hypothesized to be critical for cognition. While the authors provide solid evidence of deficits in both features post-TBI, the study would have been stronger with a more hypothesis-driven approach and consideration of alterations of the animal's behavioral state or sensorimotor deficits beyond memory processes.

      We would like to thank the reviewers for their comments on our manuscript. By incorporating their feedback, we were able to make our hypotheses more clear, expand our analyses to compare physiological processes across similar behavioral states, and address extra hippocampal input and potential sensorimotor confounds in our data.

      Specifically, we have added new data in Figure 5 showing how theta amplitude correlates with theta-gamma PAC and entrainment strength. We have also added supplementary Figure 1 demonstrating that there are no differences in exploration or movement velocity in injured animals compared to shams. Supplementary Figures 2, 3, and 4 were added to compare oscillatory power while animals were still, moving at a higher velocity, and following a broadband power shift correction respectively. We also added Supplementary Figure 7 demonstrating that there were no differences in firing rates between sham and injured animals while they were still or moving and Supplementary Figure 8 showing no changes in pyramidal cell bursting. Finally, we added Supplementary Figure 10 showing that there was no difference in velocity or distance traveled during testing in the MWM between sham and injured animals and that learning curves were similar across groups before sham/injury surgery. We believe that the addition of this data significantly improves our manuscript by more strongly controlling for the animal’s behavioral state in our analyses and provides strong evidence that significant sensory/motor deficits were not present in injured animals at this injury level and time point post injury. Below we address specific points raised by the reviewers.

      Reviewer #1 (Public review):

      Summary:

      This study investigated how traumatic brain injury affects oscillatory and single-unit hippocampal activity in awake-behaving rats.

      Strengths:

      The use of high-density laminar electrodes enabled precise localization of recording sites. To ensure an unbiased, rigorous approach, single-unit analysis was performed by a reviewer who was blind to experimental conditions. A proof of concept study was undertaken to characterize the pathology that resulted from the specific TBI model used in the main study. There was an effort to link abnormalities in hippocampal activity to memory disruption by running a cohort of rats on the Morris Water Maze task.

      Weaknesses:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion. The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported. It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments. There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units. Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group.

      In order to address these important concerns, we have made the following changes:

      (1) We have updated the results section to include more rationale for the recordings and analyses used to clarify our hypotheses. In addition, we hope that our extensive characterization will lay the groundwork to inform future studies investigating circuit-specific disruptions following TBI and neuromodulatory therapies.

      (2) The number of rats used for the spatial working memory experiment is reported in the text and figure legend.

      (3) We have added supplemental Table 2 to include the requested statistical information (t-statistic, degrees of freedom, and 1 vs 2-tailed analyses).

      (4) Unfortunately, we did not have adequate occupancy to robustly extract and compare place cell properties across groups and environments which obscured the rationale of our study design and limited us to more rudimentary analyses. While animals did actively explore the two environments, the relatively short recording time limited the spatial sampling of the two-dimensional environment. We were able to extract putative place cells and found some evidence that place cells in TBI rats had lower spatial information content than in shams (as has previously been described). However, we did not feel that place cell analyses were rigorous enough to include in this manuscript due to the limited spatial sampling. Future studies in the lab will assess how TBI affects place cell information content, stability, and phase precession with better occupancy.

      (5) We have added Supplemental Table 1 that includes the total number of units recorded for each animal.

      (6) The spatial working memory deficit we report in the MWM is not a novel finding in this model of TBI. However, we wanted to ensure that LFPI in our hands at this injury level reproduced this known deficit. Importantly, the swim speed and distance traveled during testing did not differ between groups, suggesting that differences were not due to motor deficits. Additionally, the learning curves before sham/LFPI surgery were the same across groups. These data have been added to the manuscript in Supplementary Figure 10. While we did not test animals in a version of the task where the platform was visibly marked, previous studies have demonstrated that sham and injured rats perform comparably in a version of the MWM where the platform is visible or when a constant start location is used. These citations have been added to the manuscript.

      Reviewer #1 (Recommendations for the authors):

      For a more rigorous way of analyzing changes in hippocampal firing patterns across environments, see Wills et al 2005 for example.

      Addressed in point 4 above.

      Spatial working memory tasks should always be compared with a control task to rule out confounding performance variables. Examples would be to use a variant of the MWM task that does not require the hippocampus such as using a visible escape platform.

      Addressed in point 6 above.

      Statistics are typically reported including a t-statistic and degrees of freedom, not just the p-value. In addition, the authors should indicate whether the t-test is one or two-tailed.

      Addressed in point 3 above.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate changes in theta-gamma phase amplitude coupling, and action potential entrainment to theta following traumatic brain injury (TBI). Both phenomena are widely hypothesized to be important for cognition, and the authors report deficits in both after TBI. The manuscript is well-written, the figures are well-constructed, and the authors' use of high-level analysis methods for TBI EEG data collected from awake, behaving animals is welcome.

      Major Comments:

      The animal n's are small (4 sham and 5 injured). In Figure 3, for instance, one wonders if panels D and E might have shown significant differences if more animals had been recorded.

      There are conflicting reports regarding the effect of LFPI on single cell firing rates. This is likely due to differential task demands and variations in LFPI severity across studies. We agree that the firing rates do appear to be trending; however, overall firing rate changes can be difficult to interpret. Because firing rates are influenced by behavior and brain state, we further separated firing rates into epochs when animals were moving or still and found similar trends that did not reach significance (data added in Supplementary Figure 7). We also assessed bursting in pyramidal cells to investigate whether potential changes in bursting influenced overall firing rates, and we found no differences between sham and injured animals across conditions (data added in Supplementary Figure 8). While the n’s are small when considered by animal, the number of units is actually fairly large, so if there were robust effects (as there were for the entrainment analyses), we would expect to see significant differences.

      The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

      This is an excellent point that has now been addressed with the addition of Supplementary Figure 4. We used a well-established method (Donoghue et al 2020) to flatten power spectra in order to compare specific frequency bands in the context of a broadband shift. After applying this correction, we show that theta power is still reduced in injured rats compared to shams. While there is no difference in gamma power between groups in the corrected power spectra, this result should be interpreted with caution, especially since there is not a large distinct peak in the gamma frequency range in the power spectrum of either sham or injured animals. However, if this is interpreted to mean that gamma power is not different between sham and injured animals, it makes the PAC data even more compelling. While there is clearly a broadband shift, this shift is limited to ~4-90Hz, a range that contains physiologically relevant frequencies associated with synaptic currents. Importantly, the power spectra of sham and injured animals converge at low (<4Hz) and high (>100Hz) frequencies. This suggests that slow oscillations, which could include delta and respiration-associated oscillations, are not affected by TBI (though sleep recordings would be needed to properly address this). High-frequency activity can include ripples and HFOs, which need to be separately extracted when comparing between groups due to their transient nature. However, overall spiking activity, including the depolarizing spike and the afterhyperpolarization, contributes significantly to power in the high frequency range. Because this general high-frequency power is not different between groups, it suggests that the limited range of the broadband power reduction still contains important physiological signals. This broadband shift may result from a global reduction in or desynchronization of synaptic input to CA1. The specific mechanisms behind this broadband shift and the consequences it has on coding information in the hippocampus are fascinating questions that we hope will be specifically investigated in future studies. This point is now addressed in the Discussion.
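
      To make the idea of comparing narrow bands after removing a broadband trend concrete, the sketch below fits a straight line to a power spectrum in log-log space and subtracts it. This is a deliberately simplified stand-in and not the fitting procedure of the cited method; the function name and the synthetic spectrum are assumptions made for the example.

```python
import math


def flatten_spectrum(freqs, power):
    """Remove a fitted 1/f-like trend from a power spectrum.

    Fits log10(power) = offset + slope * log10(freq) by ordinary least
    squares and returns (slope, offset, residual), where residual[i] is the
    log10 power left at freqs[i] after subtracting the fitted line.
    """
    x = [math.log10(f) for f in freqs]
    y = [math.log10(p) for p in power]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    offset = my - slope * mx
    residual = [yi - (offset + slope * xi) for xi, yi in zip(x, y)]
    return slope, offset, residual


# Synthetic spectrum: a 1/f^2 trend with a threefold bump at 8 Hz.
freqs = [float(f) for f in range(2, 41)]
power = [100.0 / f**2 * (3.0 if f == 8.0 else 1.0) for f in freqs]
slope, offset, residual = flatten_spectrum(freqs, power)
peak_freq = freqs[residual.index(max(residual))]
```

      Scaling the whole spectrum by a constant changes only the fitted offset, so band power measured on the residual is insensitive to a uniform broadband shift between groups. A plain least-squares line is biased by the peaks themselves, which is one reason dedicated spectral parameterization tools are preferable for real data.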

      Reviewer #2 (Recommendations for the authors):

      Minor Comments:

      Please define your reference waveform for theta - is it theta recorded on the channel containing the cell? Average theta for all electrodes in SP? SP + SO? Theta for the nominal "St. pyr." channel? Please define.

      For all entrainment analyses, entrainment was measured referenced to the theta oscillation recorded from st. pyr. on the specific shank where the unit was detected. We added clarification in the results and methods sections regarding this point.

      Similarly, even though the peak of the theta wave appears from the figures to be taken as 0 degrees, please explicitly state this in the text.

      This has been added to the results and methods.

      Did the authors check for any difference between interneurons in SP and interneurons in SO?

      This is an excellent suggestion that we had hoped to investigate as it could inform whether specific interneuron populations were affected. However, we did not record enough units in st. ori to make this comparison.

      On page 8, Figures 3E and 3F are incorrectly labeled 4E and 4F.

      This has been fixed.

      Figure 1, panel C: please add a numerical scale to the colored scale bar.

      This has been added.

      Figure 1, panel F: how was the significance between the frequency bands calculated?

      Statistics were done using a t-test at each frequency point with significance set at α=0.01 for multiple comparisons. This has been clarified in the figure legend and methods.

      Figure 3, panel A legend: Please add "Spike at 0 ms omitted for clarity.”

      This has been added.

      Figure 4, panel A, right side: please provide the MVL for this cell, so that readers have a benchmark for evaluating the MVL as a parameter. A sample poorly entrained cell, with MVL, would also be informative.

      We added the MVL for this cell. We were unable to add a poorly entrained cell without making the figure more confusing.
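
      For readers who want a benchmark for MVL values, the sketch below computes the mean resultant vector length of a set of spike phases. It assumes that MVL here denotes this standard circular statistic and that phases are expressed in radians with the theta peak at 0; the function name and the example phases are hypothetical, and this response does not specify how the authors extracted phases.

```python
import cmath
import math


def mean_vector_length(phases_rad):
    """Return (MVL, preferred phase) for spike phases given in radians.

    MVL is 1 when every spike occurs at the same phase and approaches 0
    when spikes are spread evenly around the cycle.
    """
    if not phases_rad:
        raise ValueError("no spike phases supplied")
    resultant = sum(cmath.exp(1j * p) for p in phases_rad) / len(phases_rad)
    return abs(resultant), cmath.phase(resultant)


# Hypothetical extremes: a perfectly locked unit and an evenly spread one.
mvl_locked, phase_locked = mean_vector_length([0.0] * 10)
mvl_uniform, _ = mean_vector_length([2 * math.pi * k / 8 for k in range(8)])
```

      Real units fall between these two extremes, and the preferred phase is only meaningful when the MVL is clearly above zero.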

      Raw data must be provided for the Morris Water Maze experiments described in Supplementary Figure 3.

      We added data showing no difference in the swim velocity or distance traveled between the sham and injured groups during memory testing as well as data showing that the two groups had similar learning curves during training before sham/injury surgery. See Supplementary Figure 10.

      Antibody 22C11 for APP has been shown to be non-specific when used for immunocytochemistry (it may be fine for Westerns). In addition, using a biotinylated secondary with an ABC kit for visualization risks contamination by post-injury changes in biotin. Reviewed in Xiong et al., 2023, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580020/.

      As is standard practice in neuropathology, negative controls were run for all of these experiments (identical preparations minus the primary antibody). No non-specific staining was present that could be misinterpreted as APP-positive axonal profiles in either sham or injured tissue. While beyond the scope of this response, there are many reasons the authors of the cited paper may have had non-specific staining, including a concentration 450X that of the one utilized here and the absence of an antigen-retrieval technique in their protocol.

      Tummala et al. used in vivo calcium-imaging after TBI and also investigated single-cell activity in familiar and novel environments, and when moving or still. The authors could consider discussing their work.

      We have added a citation for this paper.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors studied the effects of traumatic brain injury created by the LFPI procedure on CA1 at the network level. The major findings seem to be that TBI reduces theta and gamma power in CA1, reduces phase-amplitude coupling between the theta and gamma bands, and disrupts the gamma entrainment of interneurons. I think the authors have made some important discoveries that could help advance the understanding of TBI effects at the physiological level. However, more investigation into the relationship of the behavioral and brain states to the observed effects would help clarify the interpretations for the readers.

      Strengths:

      The authors in this study were able to combine behavioral verification of the TBI model with the laminar electrophysiological recordings of the CA1 region to bring forward network-level anomalies such as the temporal coordination of network-level oscillations as well as in the firing of the interneurons. Indeed, it seems that the findings may serve future studies to functionally better understand and/or refine the therapies for the TBI.

      Weaknesses:

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensorimotor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated to the entrainment of spikes with the theta rhythm and also in some cases with the gamma oscillations. So, it is important to disentangle the differences in behavioral variables and the observed effects. As an example, the authors' claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed results of reduced entrainment might, on one hand, be due to the decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or the unreliable oscillations of the LFP bands such as theta and gamma, rather than memory encoding and retrieval related disruption of spike synchrony to the rhythms, while on the other hand, they may simply be due to reduced excitability in the neurons particularly in the behavioral and brain state in which the effects were observed, rather than disrupted temporal code. Hence, further investigations into dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

      We appreciate the Reviewer’s insights into disentangling the complex interactions between power, entrainment, and excitability, and have attempted to dissociate these further in our analyses. Regarding the broad effects of TBI, we agree that TBI affects many brain regions outside of the hippocampus as well as white matter pathways containing axons from areas where pathology is not visible, which likely results in widespread changes to LFPs across regions and altered behavior. Here we report disrupted network activity in the hippocampus which is likely a consequence of numerous pathologies across multiple brain regions. In the discussion, we speculate that disrupted power and coupling comes from desynchronization of inputs (especially those from the mEC and MS) as well as changes to local circuits within the hippocampus which combine to disrupt temporal coding. While the disrupted processes we report in the hippocampus are implicated in computational processes thought to support learning and memory, we acknowledge that results from this study do not causally reveal a specific mechanism that is directly responsible for cognitive impairments. We have changed the language of the quoted sentence from the abstract to make our claim less causal as we agree that the direct effects of these results on cognition are difficult to quantify due to the fact that animals were not performing a spatial navigation task with measurable outcomes during recordings. We have also removed the graphical abstract as we believe it is an oversimplification of the results given new analyses.

      Regarding the possible contribution of sensory and motor deficits or differences in behavioral states to the observed changes, we agree that it is essential to consider potential sensorimotor deficits as well as the animal’s behavioral state when comparing oscillations and single unit activity in the hippocampus, especially since these phenomena have been extensively linked to movement velocity and exploration. To address this, we have added Supplementary Figure 1 showing that there are no differences in movement velocity or exploration time between sham and injured animals. Because animals were simply foraging during electrophysiological experiments we do not expect there to be any major additional behavioral differences that would influence oscillations or spiking once locomotion is controlled for, though differences in attention or arousal cannot be ruled out. Additionally, analyses throughout the manuscript are performed independently during periods when animals were moving or still. Data in Figures 1 and 2 also only include data from the familiar environment to rule out any effects of novelty on hippocampal oscillations. Supplementary Figures 2 and 3 were added to demonstrate that TBI-associated reductions in power were consistent when animals were still and when a higher threshold for movement (>20 cm/sec) was used. Finally, Supplementary Figure 10 was added showing no differences in swim velocity or distance traveled in the MWM between sham and injured animals, further suggesting that there are no significant sensorimotor deficits at this injury level and timepoint. Additionally, previous studies have demonstrated that sham and injured rats perform comparably in a version of the MWM where the platform is visible or when a constant start location is used, which provides further support that sensorimotor deficits are not responsible for memory deficits in this task (see above).

      Regarding the contribution of neuronal excitability to the reported changes, we agree that changes in the excitability of neurons could have a strong effect on entrainment. Importantly, we show that the disrupted oscillations recorded in the injured hippocampus do not coincide with significant changes in neuronal firing rates between sham and injured animals. We have added Supplementary Figure 7 demonstrating this holds true both when animals are still and when they are moving. Additionally, we have added Supplementary Figure 8 showing no differences in pyramidal cell bursting between sham and injured animals. While this suggests that there are not major changes in excitability, homeostatic plasticity mechanisms may impact firing rates and bursting, and the extent of these effects and their role on entrainment are unclear. This point was added to the Discussion.

      To address the effects of LFP power on entrainment strength, Figure 5 has been updated to show theta and gamma entrainment strength as well as theta-gamma PAC as a function of theta amplitude. We found that, during periods of comparable theta power, interneurons from sham and injured animals are similarly entrained to theta, but pyramidal cells from injured animals become significantly more entrained to theta than in shams. We address the potential implications of these results in the Discussion.

      Reviewer #3 (Recommendations for the authors):

      The authors have stated on page 7 and Figure 2E, "Taken together, injured rats show a decrease in the strength of theta-gamma PAC that is specific to st. pyr, and a shift in peak gamma amplitude to a later phase of theta in both st. pyr and st. rad". Is the shift in the peak position greater than expected by chance?

      We are unaware of a rigorous method that would allow us to compare this shift statistically. We have reported the observed shift and avoided calling the shift significant for that reason.

      The authors state on page 9 "cells (sham familiar=1.63±0.23 Hz, n=51, injured familiar=2.11±0.20 Hz, n=141, p=0.446; sham novel=1.84±0.18 Hz, n=55, injured novel=2.23±0.21 Hz, n=134, p=0.170; mean±SEM; ks-test; Fig 4E) between sham and injured groups, but a higher percentage of pyramidal cells were active (firing rate >0.1Hz) in both the familiar and novel environment in injured rats compared to shams (sham=74%, injured=87%, p=0.025, Fisher's exact test; Fig 4F)." Do the authors mean Figures 3E and 3F respectively in place of Figures 4E and 4F?

      This has been fixed.
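      The comparison of active-cell proportions quoted above relies on Fisher's exact test. A minimal sketch of the two-sided test for a 2x2 table follows. The cell counts behind the reported percentages are not given in this response, so the table used in the example is purely hypothetical, as is the function name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    tolerance = 1e-12
    return sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + tolerance)
    )


# Hypothetical counts: rows are groups, columns are (active, inactive) cells.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```

      The test conditions on the row and column totals, which makes it suitable when some cell counts are small and a chi-squared approximation would be unreliable.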

      Regarding the finding of similar firing rates and differences in the overlap of the neurons that were active in between injured and control animals, it is imperative to study the differences in behaviors of the animals. First of all, it seems appropriate to quantify and compare the immobility and mobile periods as well as the movement velocity of the animals in both groups. Then, it would be interesting to see if any behavioral variables correlate with the firing characteristics of the cells in both the sham and the injured animals. Since hippocampal cells have been known to have different levels of recruitment and firing rates according to different behavioral states such as movement velocity, some of the similarities or differences in neural findings might as well be attributed to the differences in behaviors in between the groups. However, some differences may be observed in the injured rats despite similar behavior and the LFP powers. In other words, studying the effects of injury during similar behavioral (e.g. firing rate as a function of movement velocity) and brain states (e.g. categorical effects of awake theta state, type two theta, and ripple states on firing rates and the entrainment) might help dissociate some effects that might only be due to difference in the behavior caused by the injury throughout the brain and might as well have less to do with specific injury induced local circuits level deficits in the hippocampus. The results in Figures 4, 5, and 6 reveal such interesting differences and hence, it becomes even more important to quantify and correlate behavioral states (movement velocity and theta/ripple) to the neuronal characteristics (LFP power, PAC, firing rates, and entrainment) presented in Figure 3.

      These are excellent points, and we have addressed them in the following ways:

      We added Supplementary Figure 1 demonstrating that there were no differences in movement velocity between sham and injured animals during electrophysiological recordings.

      Power and PAC analyses were done exclusively when the animal was moving to compare across similar behavioral states. Additionally, these analyses were constrained to recordings from the familiar environment to rule out any effects of novelty. Because animals were simply foraging during recordings we do not expect other behavioral factors besides movement velocity to play a major role in these processes. We have also added Supplementary Figures 2 and 3 which demonstrate that TBI-associated differences in oscillatory power follow similar trends when animals are still (Sup. Fig 2) or when a higher movement threshold (>20 cm/sec) is used (Sup. Fig 3). We also added Supplementary Figures 7 and 8 showing that there were no significant differences in firing rates or bursting while animals were still or while they were moving.

      The Discussion was expanded to discuss how TBI may disrupt circuits outside the hippocampus which may contribute to our findings. Additionally, we acknowledge the limitation that these recordings were not obtained while animals were doing a quantitatively measurable spatial navigation task which limits our ability to assess whether changes are truly behaviorally relevant.

      We have also updated Figure 5 to show entrainment across different levels of theta power.

      Elaborating on the abovementioned point, Figures 4B and 4E depict a finding that mean entrainment is reduced in injured animals during immobility. The following factors may contribute to the results:

      (1) Reduction in theta power during immobility (reduced attention and/or LFP profile due to brain-wide injury), which makes theta cycles unreliable, which can contribute to the results.

      (2) Changes in neural firing properties during immobility, such as reduced burst rates or firing rates during immobility.

      (3) As the authors claimed in the graphical abstract, there might be an actual disruption of temporal code associated with the memory encoding. It would be awesome if the temporal disruption could be investigated during the comparable theta power and behavioral states. This analysis would test whether there is an unconfounded disruption in the temporal code in the hippocampus due to the injury. In any case, it would be ideal to isolate the epochs during sleep in which animals were in theta state and exclude ripple states to make a definitive assessment of the aforementioned factors. These further investigations would also help the interpretations made by authors in the discussion section such as "This can disrupt type II theta which occurs when animals are not actively moving and exploring the environment. We found that single unit entrainment to theta was substantially decreased in injured rats when they were not moving, a phenomenon not seen in shams, which suggests a disruption in type II theta. This provides further evidence that cholinergic signaling may be dysfunctional following TBI."

      (1) While theta power is reduced in injured animals, it can still be reliably detected even at rest. We added Supplementary Figure 2 showing power spectra while animals were not moving, and a distinct peak can be seen in the theta frequency range. Additionally, clear peaks in entrainment can be seen in the theta frequency band in Fig 4B while animals were still. This suggests that theta can still be reliably detected in injured animals even when they are not moving. However, we agree that reduced attention or arousal could contribute to these changes, and this point has been added to the Discussion.

      (2) We added Supplementary Figures 7 and 8 showing no differences in firing rates or bursting parameters between groups during periods of immobility.

      (3) We updated Figure 5 which now shows entrainment strength as a function of theta amplitude. We found that the theta entrainment strength of both pyramidal cells and interneurons increased with increasing theta amplitudes. We address potential implications of these changes in the Discussion.

      On page 10 the authors state, "theta entrainment strength drastically increased when rats began moving in injured but not sham animals." It is unclear if the effect was confined to the periods when rats started movement. Also, it would be of interest to investigate whether movement epochs and velocity were affected in the periods when the effects were observed.

      This was not confined to the exact points when the rats started moving. We removed the word “began” for clarity. See point regarding velocity above.

      On page 12 the authors state, "On test day, injured rats had a lower memory score than shams (sham=114.8 ± 21.8, n=9; injured=51.5±6.8, n=14; p=0.020; mean ± SEM; Welch's t-test) indicating poor spatial memory (Sup Fig 3A)." The result is the validation of the TBI injury on a hippocampal-dependent Morris water maze task. However, it would be nice to see the quantification of the movement velocity in the water maze and the trajectory length in each group to further dissect whether animals were constrained in the movement and hence, they could not get to the platform or they forgot where it was located. Also, it would help to compare the rats' performance after sham or TBI surgeries to their performance during the training before the surgeries (assuming the data during the training periods were recorded as well).

      We have added Supplementary Figure 10 to include all of this information. Importantly, movement velocity and distance traveled were not different between groups on testing day, and the learning curves of both groups were the same before sham/injury surgery.
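      For readers who want to relate the memory-score summary statistics quoted above to the Welch's t-test that was reported, here is a minimal sketch that computes the Welch statistic and the Welch-Satterthwaite degrees of freedom from group means, SEMs, and sample sizes. It assumes the ± values are SEMs, as stated in the quoted sentence; the function name is hypothetical, and converting the statistic into a p-value requires the t distribution, which is omitted here.

```python
import math


def welch_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # The squared SEM is the estimated variance of each group mean (s^2 / n).
    var_mean1, var_mean2 = sem1 ** 2, sem2 ** 2
    t = (mean1 - mean2) / math.sqrt(var_mean1 + var_mean2)
    df = (var_mean1 + var_mean2) ** 2 / (
        var_mean1 ** 2 / (n1 - 1) + var_mean2 ** 2 / (n2 - 1)
    )
    return t, df


# Values quoted in the review: sham=114.8 ± 21.8 (n=9), injured=51.5 ± 6.8 (n=14).
t_stat, dof = welch_from_summary(114.8, 21.8, 9, 51.5, 6.8, 14)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

      With these inputs the statistic is roughly 2.8 on roughly 9.6 degrees of freedom. The degrees of freedom fall well below n1 + n2 - 2 because the two groups have very different variances, which is the situation Welch's test is designed for.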

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study utilises fNIRS to investigate the effects of undernutrition on functional connectivity patterns in infants from a rural population in Gambia. fNIRS resting-state data recording spanned ages 5 to 24 months, while growth measures were collected from birth to 24 months. Additionally, executive functioning tasks were administered at 3 or 5 years of age. The results show an increase in left and right frontal-middle and right frontal-posterior connections with age and, contrary to previous findings in high-income countries, a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months of age. Additionally, the study describes some connectivity patterns, including stronger frontal interhemispheric connectivity, which is associated with better cognitive flexibility at preschool age.

      Strengths:

      The study analyses longitudinal data from a large cohort (n = 204) of infants living in a rural area of Gambia. This already represents a large sample for most infant studies, and it is impressive, considering it was collected outside the lab in a population that is underrepresented in the literature. The research question regarding the effect of early nutritional deficiency on brain development is highly relevant and may highlight the importance of early interventions. The study may also encourage further research on different underrepresented infant populations (i.e., infants not residing in Western high-income countries) or in settings where fMRI is not feasible.

      The preprocessing and analysis steps are carefully described, which is very welcome in the fNIRS field, where well-defined standards for preprocessing and analysis are still lacking.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level and investigates how restricted growth influences connectivity patterns at 24 months, it does not explore the links between adverse situations and developmental trajectories for functional connectivity. Considering the longitudinal nature of the dataset, it would have been interesting to apply more sophisticated analytical tools to link undernutrition to specific developmental trajectories in functional connectivity. The authors mention that they lack the statistical power to separate infants into groups according to their growing profiles. However, I wonder if this aspect could not have been better explored using other modelling strategies and dimensional reduction techniques. I can think about methods such as partial least squares correlation, with age included as a numerical variable and measures of undernutrition.

      We agree with the reviewer that this complex and rich longitudinal dataset would benefit from more sophisticated analytical approaches to characterise developmental trajectories in functional connectivity and to more directly link them to measures of undernutrition. However, conducting such analyses would require substantial additional methodological development, model validation, and careful interpretation, which fall beyond the scope and timeline of the present manuscript. Our aim here was to provide a clear and robust characterisation of functional connectivity changes during the first two years of life and to examine associations with growth outcomes at a specific developmental stage, while ensuring methodological transparency and statistical reliability. Importantly, these more advanced trajectory-based analyses are currently being pursued in the final phase of the BRIGHT project (BRIGHT IMPACT), in collaboration with expert statisticians and data scientists. This ongoing work aims specifically to leverage the longitudinal richness of the dataset to model developmental trajectories and their associations with early-life adversity and nutritional factors. We therefore see the present study as an important foundation for these forthcoming analyses.

      Connectivity was assessed in 6 big ROIs. While the authors justify this choice to reduce variability due to head size and optodes placement, this also implies a significant reduction in spatial resolution. Individual digitalisation and co-registration of the optodes to the head model, followed by image reconstruction, could have provided better spatial resolution. This is not a weakness specific to this study but rather a limitation common to most fNIRS studies, which typically analyse data at the channel level since digitalisation and co-registration can be challenging, especially in complex setups like this. However, the BRIGHT project has demonstrated that it is possible and that differences in placement affect activation patterns, which become more localised when data is co-registered at the subject level (Collins-Jones et al., 2021). Could the co-registration of individual data have increased sensitivity, particularly given that longitudinal effects are being investigated?

      We agree with the reviewer that the fNIRS community should work toward more precise methods for spatial registration of optodes, not only at the group level but also at the subject level, in order to make more precise inferences about the locations of activations. However, we followed a very thorough offline procedure to model headgear placement based on each participant’s photographs, which we believe complements the coregistration work performed by Collins-Jones et al. (2021). As reported in the fNIRS data acquisition section, “Infants were excluded from further analysis if the band was excessively high over the front above the eyebrows” (line 409, methods section). Moreover, channel displacement was measured from the photos, and channels whose displacement was “equal or greater than 1.6 cm were renumbered, so that each channel was shifted either backward or forward one full channel location in space” (line 413, methods section). While these practices are thoroughly followed in the BRIGHT project, we are aware that they are not part of the standard procedure in many infant fNIRS studies. We hope that this work provides guidance for other researchers on how to coregister infant fNIRS data.

      Considering the spatial resolution of fNIRS, which is on the order of centimetres, and the thorough procedure combining fNIRS–MRI coregistration with channel displacement assessment based on photographs, we do not think that individual-level coregistration would have significantly increased the sensitivity of the results.

      I believe that a further discussion in the manuscript on the application of global signal regression and its effects could have been beneficial for future research and for readers to better understand the negative correlations described in the results. Since systemic physiological changes affect HbO/HbR concentrations, resulting in an overestimation of functional connectivity, regressing the global signal before connectivity computation is a common strategy in fNIRS and fMRI studies. However, the recommendation for this step remains controversial, likely depending on the case (Murphy & Fox, 2017). I understand that different reasons justify its application in the current study. In addition to systemic physiological changes originating from brain tissue, fNIRS recordings are contaminated by changes occurring in superficial layers (i.e., the scalp and skull). While having short-distance channels could have helped to quantify extracerebral changes, challenges exist in using them in infant populations, especially in a longitudinal study such as the one presented here. The optimal source-detector distance that minimises sensitivity to changes originating from the brain would increase with head size, and very young participants would require significantly shorter source-detector distances (Brigadoi & Cooper, 2015). Thus, having them would have been challenging. Under these circumstances (i.e., lack of short channels and external physiological measures), and considering that the amount the signal is affected by physiological noise (either coming from the brain or superficial tissue) might change through development, the choice of applying global signal regression is justified. Nevertheless, since the method introduces negative correlations in the data by forcing connectivity to average to zero, I believe a further discussion of these points would have enriched the interpretation of the results.

      We added a paragraph discussing the choice of using GSR in our pipeline in the discussion of the manuscript as follows: “Importantly, these results remained significant even without GSR, indicating that our findings are not solely driven by preprocessing choices. While the use of GSR in FC studies remains debated (Murphy & Fox, 2017), in the absence of short channels (which are difficult to use reliably with infants (Emberson et al., 2016)) and external physiological measures, applying GSR represented the most appropriate preprocessing option. Failure to correct for systemic physiological fluctuations can, in fact, lead to artificially elevated connectivity estimates in fNIRS data (Abdalmalak et al., 2022)” (line 250, discussion section).
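      To illustrate why global signal regression introduces negative correlations, as the reviewer notes, the following minimal sketch regresses the across-channel mean out of each channel. It is an illustrative assumption about how GSR is commonly implemented, not the authors' pipeline; the function name and the synthetic signals are hypothetical.

```python
import math


def global_signal_regression(channels):
    """Regress the across-channel mean out of each channel (equal-length lists)."""
    n_ch, n_t = len(channels), len(channels[0])
    # Demean each channel so the regression needs no intercept term.
    centered = []
    for ch in channels:
        mean = sum(ch) / n_t
        centered.append([x - mean for x in ch])
    # Global signal: mean across channels at each time point.
    g = [sum(ch[t] for ch in centered) / n_ch for t in range(n_t)]
    g_energy = sum(v * v for v in g)
    if g_energy == 0:
        return centered
    cleaned = []
    for ch in centered:
        # Least-squares weight of the global signal in this channel.
        beta = sum(x * v for x, v in zip(ch, g)) / g_energy
        cleaned.append([x - beta * v for x, v in zip(ch, g)])
    return cleaned


# Synthetic example: three channels sharing a slow drift plus channel-specific activity.
drift = [math.sin(0.1 * t) for t in range(200)]
channels = [
    [d + 0.3 * math.sin(0.7 * t) for t, d in enumerate(drift)],
    [d + 0.3 * math.cos(1.3 * t) for t, d in enumerate(drift)],
    [d + 0.3 * math.sin(2.1 * t + 1.0) for t, d in enumerate(drift)],
]
cleaned = global_signal_regression(channels)
# After GSR the channels sum to (numerically) zero at every time point.
max_abs_sum = max(abs(sum(ch[t] for ch in cleaned)) for t in range(200))
print(max_abs_sum)
```

      Because the cleaned channels sum to zero at every time point, the off-diagonal covariances must sum to the negative of the summed channel variances, so some channel pairs necessarily become anticorrelated regardless of the underlying physiology.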

      Reviewer #2 (Public review):

      Strengths:

      The article addresses a topic of significant importance, focusing on early life growth faltering in low-income countries, a key marker of undernutrition, and its impact on brain functional connectivity (FC) and cognitive development. The study's strengths include the laborious data collection process, as well as the rigorous data preprocessing methods employed to ensure high data quality. The use of cutting-edge preprocessing techniques further enhances the reliability and validity of the findings, making this a valuable contribution to the field of developmental neuroscience and global health.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      The study fails to fully leverage its longitudinal design to explore neurodevelopmental changes or trajectories, as highlighted by all three reviewers. The revised manuscript still primarily focuses on FC values at a single age stage (i.e., 24 months) rather than utilizing the longitudinal data to investigate how FC evolves over time or predicts cognitive development. Although the authors acknowledge that analyzing changes in FC (ΔFC) would reduce degrees of freedom (to ~30) and risk interpretability, they do not report or discuss these results, even as exploratory findings.

      As suggested, we added the table reporting the results of the associations between changes in functional connectivity (ΔFC) between 5 and 24 months and cognitive flexibility in the supplementary materials (Table SI3). We additionally explored the relationship between changes in growth and cognitive flexibility as suggested by Reviewer #3 and we reported these additional analyses in the text as follows: “We also explored whether changes in growth and changes in functional connectivity between 5 and 24 months were associated with cognitive flexibility at preschool age, but we did not find any significant association (Table SI3 and Table SI4).” (line 213, results section).

      Furthermore, the study lacks specificity in identifying which specific brain networks are affected by growth faltering, as the current exploratory analyses mainly provide an overall conclusion that infant brain network development is impacted without pinpointing the precise neural mechanisms or networks involved.

      We added this limitation in the discussion as follows: “While the impact of undernutrition on brain development has been documented in LMICs (46), herein, we provided empirical evidence that growth faltering specifically in infants younger than five months of age impacts observable development of functional brain networks in the second year of life. Future studies may be needed to pinpoint which specific brain networks are impacted” (line 279, discussion section).

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth, and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants has a different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility, in different ways between 4- and 5-year-old preschoolers, but results did not survive corrections for multiple comparisons.

      Strengths

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses

      Data analyses were constrained by the limited number of children with longitudinal data on NIRS functional connectivity. Nevertheless, considering more advanced statistical modelling approaches would be relevant to further explore neurodevelopmental trajectories as well as relationships with early growth and later cognitive development.

      While in this study we selected specific FC and outcome variables based on our hypothesis, the final phase of the BRIGHT project, known as BRIGHT IMPACT, aims to apply advanced statistical models to integrate a range of project variables into a single comprehensive analysis. We have acknowledged this in the discussion as follows: “Applying more advanced statistical modelling methods and structural equation modelling analyses may provide greater insight with further investigations in contexts of adversity and, in turn, establish which outcomes are predicted by FC” (line 309, discussion section).

      The abstract and end of the discussion should make it clearer that the associations between FC and cognitive flexibility are results that need to be confirmed, insofar as they did not survive correction for multiple comparisons.

      We have acknowledged this in the abstract as follows: “Our results highlight the measurable effects that poor growth in early infancy has on brain development and the possible subsequent impact on pre-school age cognitive development, underscoring the need for early life interventions throughout global settings of adversity”.

      We have acknowledged this in the discussion as follows: “While our results are consistent with previous studies, we acknowledge that the significant associations between early FC and later cognitive flexibility do not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample” (line 300, discussion section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1 B and C the authors should indicate that the results refer to HbO.

      We have added this specification to the figure caption, as suggested.

      (2) Figure SI2. Please indicate in the caption that these are the results when pre-processing did not include global signal regression.

      We have added this specification to the figure caption, as suggested.

      Reviewer #3 (Recommendations for the authors):

      (1) The sentence l529-531 ("To investigate whether FC early in life predicted...") should be more explicit as it is not clear which of the two variables is regressed by the other: is it the measure of cognitive flexibility that is regressed by FC, as the hypothesis suggests? Were other variables considered in the regression model? (For linear regression with only one "prediction" variable, the square root of the coefficient of determination R² is equal to the correlation between the two variables.)

      Yes, it is the measure of cognitive flexibility that is regressed by FC. We have rephrased it in the text as follows: “we regressed later cognitive flexibility against FC that showed a significant change across the first two years of life”. There were no other variables in the regression model.
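
      The reviewer's parenthetical remark about single-predictor regression can be checked numerically. The following is a minimal sketch, not the authors' analysis: the values in `fc` and `flex` are made up for illustration, and the function names are hypothetical. It shows that R² from a one-predictor least-squares fit equals the squared Pearson correlation, so the square root of R² recovers the magnitude of the correlation (the sign comes from the slope).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression_r2(x, y):
    """R^2 of the ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: predictor = FC value, outcome = cognitive flexibility score
fc = [0.10, 0.25, 0.05, 0.40, 0.30, 0.15]
flex = [3.0, 2.0, 3.5, 1.0, 2.5, 2.0]

r = pearson_r(fc, flex)
r2 = simple_regression_r2(fc, flex)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r2:.4f}")
```

      With more than one predictor this identity no longer holds, which is why the reviewer's question about additional variables in the model matters.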

      (2) A summary table of the statistical results for FC-cognitive flexibility associations should be included as for other analyses, in addition to Figure 3B.

      We added a table of the results for the association between FC and cognitive flexibility in the supplementary materials (Table SI2, page 10), matching the same colours of Table 2. We referenced the table in the text in the main manuscript (line 211, result section).

      (3) Figure 3B: The legend should specify that these results did not survive corrections for multiple comparisons.

      We have specified this in the legend of Figure 3 as suggested.

      (4) For the young pre-schooler group, it seems that the age is around 4 years (age mean +/- SD=47.96 +/- 2.77 months) and not 3 years as indicated at several places in the manuscript.

      We found only one instance in which we erroneously said that the younger preschoolers were around 3 years. We replaced “Gambian infants from BRIGHT were cross-sectionally assessed at the age of 3 or 5 years for cognitive flexibility” with “Gambian infants from BRIGHT were cross-sectionally assessed between the age of 3 and 5 years for cognitive flexibility” (line 489, method section).

      (5) The authors use the term "intra-hemispheric" connections for the ones within each of the 6 sections. This might be misleading since fronto-posterior connections are also intra-hemispheric ones. Specifying "short-range" or "within-section" connections might be clearer.

      As suggested by the reviewer, we replaced “intra-hemispheric” with “intra-hemispheric within section” where appropriate through the whole manuscript.

      (6) Abstract: what is the justification for using the term "optimal" for describing developmental trajectories of FC?

      The term “optimal” refers to knowledge about typical developmental trajectories, coming especially from fMRI studies, as mentioned in the introduction: “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). [...]. Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 93-106, introduction).

      (7) The confidence interval should be added in Figure SI3.

      As suggested, confidence intervals have been added in Figure SI3.

      (8) Other scatterplot examples of associations might be added as supplementary information.

      As suggested, we added several additional scatterplots to Figure SI3 (with confidence intervals as noted in the comment above) to show other associations between changes in growth and FC at 24 months.

      (9) Figure SI6: % in x-axis is still indicated.

      We apologize for the oversight; all the percentage signs have now been removed from the x-axis tick labels.

      (10) The authors might show the (even not significant) results of the associations between changes in growth and cognitive flexibility in supplementary information.

      As suggested, we added the table reporting the results of the associations between changes in growth (DWLZ) and cognitive flexibility in the supplementary materials (Table SI3). We additionally explored the relationship between changes in functional connectivity and cognitive flexibility as suggested by Reviewer #2 and we reported these additional analyses in the text as follows: “We also explored whether changes in growth and changes in functional connectivity between 5 and 24 months were associated with cognitive flexibility at preschool age, but we did not find any significant association (Table SI3 and Table SI4).” (line 213, results section).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Hoverflies are known for their sexually dimorphic visual systems and exquisite flight behaviors. This valuable study reports how two types of visual descending neurons differ between males and females in their motion- and speed-dependent responses, yet surprisingly, the behavior they control lacks any sexual dimorphism. The results convincingly support these findings, which will be of interest for studies of visuomotor transformations and network-level brain organization.

      This statement perfectly recapitulates our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hoverflies are known for a striking sexual dimorphism in eye morphology and early visual system physiology. Surprisingly, the male and female flight behaviors show only subtle differences. Nicholas et al. investigate the sensori-motor transformation of sexually dimorphic visual information to flight steering commands via descending neurons. The authors combined intra- and extracellular recordings, neuroanatomy, and behavioral analysis. They convincingly demonstrate that descending neurons show sexual dimorphisms - in particular at high optic flow velocities - while wing steering responses seem relatively monomorphic. The study highlights a very interesting discrepancy between neuronal and behavioral response properties.

      Thank you for this summary. Most of the statement perfectly recapitulates the main findings of our paper. However, we want to emphasize that some hoverfly flight behaviors are strongly sexually dimorphic, especially those related to courtship and mating. Indeed, only male hoverflies pursue targets at high speed, chase away territorial intruders, and pursue females for mating. However, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not sexually dimorphic. We have amended the Introduction and Discussion to make the difference between flight behaviors more clear. Please see lines 77 and 305 onwards.

      More specifically, the authors focused on two types of descending neurons that receive inputs from well-characterized wide-field sensitive tangential cells: OFS DN1, which receives inputs from so-called HS cells, and OFS DN2, which receives input from a set of VS cells. Their likely counterparts in Drosophila connect to the neck, wing, and haltere neuropils. The authors characterized the visual response properties of these two neuronal classes in both male and female hoverflies and identified several interesting differences. They then presented the same set of stimuli, tracked wing beat amplitude, and analyzed the sum and the difference of right and left wing beat amplitude as a readout of lift or thrust, and yaw turning, respectively. Behavioral responses showed little to no sexual dimorphism, despite the observed neuronal differences.

      Thank you for this very nice summary of our work. We want to clarify that LPTC input to DN1 and DN2 has not been shown directly in hoverflies using e.g. dye coupling, or dual recordings. Instead, the presumed HS and VS input is inferred from morphological and physiological DN evidence, and comparisons to similar data in Drosophila and blowflies. We have amended the Introduction to clarify this. Please see line 64 onwards. The rest of the paragraph perfectly recapitulates the main findings of our paper.
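
      The behavioral readout described in the reviewer's summary above (sum and difference of left and right wing beat amplitude) can be sketched as follows. This is an illustrative sketch only: the function name, the units (degrees), the sample values, and the sign convention (left minus right) are assumptions, not taken from the paper.

```python
def wba_readouts(left_wba, right_wba):
    """Return per-frame (sum, difference) of left and right wing beat amplitude.

    The sum serves as a proxy for lift or thrust; the difference serves as a
    proxy for yaw turning. The difference is computed as left minus right,
    which is an assumed convention.
    """
    if len(left_wba) != len(right_wba):
        raise ValueError("left and right traces must have the same length")
    wba_sum = [l + r for l, r in zip(left_wba, right_wba)]
    wba_diff = [l - r for l, r in zip(left_wba, right_wba)]
    return wba_sum, wba_diff

# Hypothetical amplitudes in degrees for three video frames
left = [60.0, 62.0, 65.0]
right = [60.0, 58.0, 55.0]
total, diff = wba_readouts(left, right)
print(total)  # unchanged sum: no net change in the lift/thrust proxy
print(diff)   # growing difference: increasing yaw proxy
```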

      Strengths:

      I find the question very interesting and the results both convincing and intriguing. A fundamental goal in neuroscience is to link neuronal responses and behavior. The current study highlights that the transformations - even at the level of descending neurons to motoneurons - are complex and less straightforward than one might expect.

      Thank you.

      Weaknesses:

      The authors investigated two types of descending neurons, but it was not clear to me how many other descending neurons are thought to be involved in wing steering responses to wide-field motion. I would suggest providing a more in-depth overview of what is known about hoverflies and Drosophila, since the conclusions drawn from the study would be different if these two types were the only descending neurons involved, as opposed to representing a subset of the neurons conveying visual information to the wing neuropil.

      This is a great point. There are around 1000 fly descending neurons identified in Drosophila, of which many could respond to widefield motion, without being specifically tuned to widefield motion. In Drosophila, at least 35 descending neuron types receive input in the part of the brain where the LPTC outputs are located, and at least 29 descending neuron types project to the wing motor neuropil. Thus, it is more than likely that other neurons project visual widefield motion information to the wing neuropil. Furthermore, we only measured wing beat amplitude (WBA) as seen in the horizontal plane, as we were filming from above. As such, other wing angle changes and rotations are not quantified. We have amended our Introduction (see line 53 onwards) and Discussion (see line 320 onwards) to address these important points.

      Both neuronal classes have counterparts in Drosophila that also innervate neck motor regions. The authors filled the hoverfly DNs in intracellular recordings to characterize their arborization in the ventral nerve cord. In my opinion, these anatomical data could be further exploited and discussed a bit more: is the innervation in hoverflies also consistent with connecting to the neck and haltere motor regions? Are there any obvious differences and similarities to the Drosophila neurons mentioned by the authors? If the arborization also supports a role in neck movements, the authors could discuss whether they would expect any sexual dimorphism in head movements.

      These are all great points. We did not see any clear arborizations to the frontal nerve (FN), where we would expect to find the neck motor neurons (NMNs). In addition, while we did see fine arborizations throughout the length of the thoracic ganglion, we saw no strong outputs projecting directly to the haltere nerve (HN). In the revised version of the MS we have modified figure 4 (morphological characterization) to show a magnification of the thoracic ganglion to clarify this.

      There are important differences between the morphology of DN1 and DN2 in hoverflies and DNHS1 and DNOVS2 in Drosophila, in terms of their projections in the thoracic ganglion. For example, in Drosophila DNOVS2, there are several fine branches along the length of the neuron in the thoracic ganglia. Similarly, we found fine branches in Eristalis tenax DN2; however, in addition, we found a wide branch projecting to the area of the thoracic ganglion where the prothoracic and pterothoracic nerves likely get their inputs, which we also found in Eristalis tenax OFS DN1 (Figure 4). This suggests that both neurons could contribute to controlling the wings and/or the forelegs (which is why we quantified the WBA). In Drosophila DNOVS1, there is a similar fat branch to the prothoracic and pterothoracic nerves. Furthermore, while Drosophila DNHS1 and DNOVS2 have different morphology, DN1 and DN2 in Eristalis looked similar. We have modified the Results section to make this clear, see line 193 onwards.

      In addition, our revised version of the MS includes an analysis of the movement of different body parts (the head angle, fore- and hindleg extension) to investigate this further and to look for sexual dimorphism. Unfortunately, this did not include the halteres, as they cannot be seen well in the videos. The new data can be seen in Figure 7.

      Reviewer #2 (Public review):

      Summary:

      Many fly species exhibit male-specific visual behaviors during courtship, while little is known about the circuit underlying the dimorphic visuomotor transformations. Nicholas et al focus on two types of visual descending neurons (DNs) in hoverflies, a species in which only males exhibit high-speed pursuit of conspecifics. They combined electrophysiology and behavior analysis to identify these DNs and characterize their response to a variety of visual stimuli in both male and female flies. The results show that the neurons in both sexes have similar receptive fields but exhibit speed-dependent dimorphic responses to different optic flow stimuli.

      This statement perfectly recapitulates the main findings of our paper. As mentioned above, while hoverfly flight behaviors related to courtship and mating are strongly sexually dimorphic, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not. We have amended the Introduction and Discussion to make the difference between flight behaviors more clear. Please see lines 77 and 305 onwards.

      Strengths:

      Hoverflies, though not a common model system, show very interesting dimorphic behaviors and provide a unique and valuable entry point to explore the brain organization behind sexual dimorphism. The findings here are not only interesting in their own right but will also likely inspire those working in other systems, particularly Drosophila.

      Thank you.

      The authors employed rigorous morphology, electrophysiology, and behavior methods to deliver a comprehensive characterization of the neurons in question. The precision of the measurements allowed for identifying a subtle and nuanced neuronal dimorphism and set a standard for future work in this area.

      Thank you.

      Weaknesses:

      Cell-typing using receptive field preferred directions (RFPDs): if I understood correctly, this classification method mostly relies on the LPDs near the center of the receptive field (median within the contour in Fig.1). I have two concerns here. First, this method is great if we are certain there are only two types of visual DNs as described in the manuscript. But how certain is this? Given the importance of vision in flight control, I would expect many DNs that transmit optic flow information to the motor center. I'd also like to point out that there are other lobula plate tangential cells (LPTCs) than HS and VS cells, which are much less studied and could potentially contribute to dimorphic behaviors.

      This is very true, and important. As mentioned above, in Drosophila there are 35 descending neuron types with inputs on the dorsal surface of the brain (labelled DNp1-35), suggesting that they could receive input from LPTCs. However, only 3 of these have been shown physiologically and morphologically to receive LPTC input, in blowflies and Drosophila (DNHS1, DNOVS1, DNOVS2). Note that in both blowflies and fruitflies DNOVS1 gives graded responses, and no action potentials, meaning that we would not be able to record from it using extracellular electrophysiology.

      We previously used clustering techniques to show that in Eristalis, we can reliably distinguish two types of optic flow sensitive DNs from extracellular electrophysiological data, based on a range of receptive field parameters, and we think that these correspond to DNHS1 and DNOVS2 in Drosophila (Nicholas et al, J Comp Physiol A, 2020, cited in paper). As mentioned above in response to Reviewer 1, this does not mean that there are no other neurons that could respond to widefield optic flow, and which might be involved in the WBA we recorded in the paper. However, the point of this paper was not to conclusively show that there are only two optic flow sensitive descending neurons. The point was to say that there are two quite distinct optic flow sensitive neurons that have similar receptive fields in males and females, while their velocity response functions differ between males and females.

      We have modified the Introduction (see lines 53 and 64 onwards) and Discussion to make these important points clear to the Reader, including a mention of the 45-60 LPTCs that exist in the lobula plate, and what their role might be.

      Second, this method feels somewhat impoverished given the richness of the data. The authors have nicely mapped out the directional tuning for almost the entire visual field. Instead of reducing this measurement to 2 values (center and direction), I was wondering if there is a better method to fully utilize the data at hand to get a better characterization of these DNs. As the authors are aware, local features alone can be ambiguous in characterizing optic flows. What's more, taking into account more global features can be useful for discovering potentially new cell types.

      This is a great point, and we did analyse other receptive field properties in this study (shown in previous supp fig 1). In addition, and as mentioned above, we have published a clustering analysis across receptive field properties of these neurons (Nicholas et al, J Comp Physiol A, 2020, cited in paper). The point that we attempted to make in this paper was that by using two strikingly simple metrics, we can reliably distinguish which of the two neuron types we are recording from simply based on azimuthal location and overall directional preference. This makes automated analysis very straightforward. Indeed, we now use this routinely to ID what neuron we are recording from computationally, rather than making a human-based assumption.

      However, we agree that this needs to be shown, and that further in-depth analysis was warranted. Therefore, we have provided additional receptive field analysis and clustering (see new supplementary figure 1) and associated text. We also want to highlight that all data is uploaded to Data Dryad for anyone interested in doing additional in-depth analyses.

      Line 131, it wasn't clear to me why full-screen stimuli were used for comparison here, instead of the full receptive field maps. Male flies exhibit sexual dimorphic behaviors only during courtship, which would suggest that small-sized visual stimuli (mimicking an intruder or female conspecific) would be better suited to elicit dimorphic neuronal responses. A similar comment applies to the later results as well. Based on the receptive field mapping in Figure 1, I'm under the impression that these 2 DN types are more suited to detect wide-field optic flows, those induced by self-motion as mentioned in the manuscript. The results are still very interesting, but it's good to make this point clear early on to help set appropriate expectations. Conversely, this would also suggest that there are other visual DN types that are responsible for the courtship-related sexually dimorphic behaviors.

      Thank you for mentioning these important points. Our reasoning for using full-screen stimuli for the analysis on line 131 was that since we used the small sinusoidal gratings for mapping the receptive fields, and to subsequently classify the neurons, it would be unfair to use the same data to investigate potential sexual dimorphism. I.e., we selected neurons that fulfilled certain criteria, and then we cannot rightfully use the same criteria to determine differences. This was not explicitly mentioned in the paper, so we have modified the text to make this clear to the Reader, see lines 142 onwards.

      However, in Supp Figure 2d/e we show that there are no striking receptive field differences between males and females in terms of receptive field center nor directional preference. In Supp Figure 2f we also show that there is no difference between male and female receptive field height and width. We have modified the text to draw the Reader’s attention to this figure, and also mention the additional analysis done in response to the comment above.

      As a side note, I personally expected at least DN1 to have a smaller receptive field in males, as the hoverfly HSN is strikingly sexually dimorphic (Nordström et al, Curr Biol 2008). However, while optic flow sensitive DNs do respond to small objects (see e.g. the J Comp Physiol paper mentioned above) we did not detect any obvious sexual dimorphism in receptive field properties. Indeed, we think that a different subset of DNs control parts of target pursuit behavior (target selective DNs (TSDNs)). This is now addressed in the modified version of the paper, see line 89-92.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think that the additional measurement of head turns in response to some of the stimuli that showed the strongest sexual dimorphism would be very interesting, but I fully acknowledge that this might be beyond the scope of the current paper or technically too challenging, requiring additional cameras and a whole new tracking software, etc.

      We have added an additional figure to the paper, with associated text, showing the response of the head, fore- and hindlegs to the same stimuli, as far as we could extract them with only one camera filming from above. The new data can be found in the new figure 7, and associated text.

      (2) Are the onset measurements for WBD comparable across flight manoeuvres, given that they are limited to a single projection plane?

      This is a great point, and we have now added this caveat in the text, see line 261-262.

      (3) Line 62 - typo: DNp15 not NDp15.

      Thank you, fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) Related to a comment earlier, in the Introduction, it is mentioned that there are 3 optic flow-sensitive DNs in Drosophila and blowfly. However, I don't see convincing evidence for this in the cited references, none of which have exclusively surveyed all the DNs.

      We have revised this to say that 3 neurons have been identified morphologically and physiologically, but that does not mean that there are no others. Please see line 60 onwards.

      (2) Line 142 and Supplementary Figure 3, this is stated in the next section, but I think it's better to make it clear that DN2 in females has a higher spontaneous rate before mentioning the starfield. Please also specify if the stationary starfield affects the DN2 rate at all in the female flies.

      Great points. We now describe the spontaneous rate before mentioning the responses to moving starfield stimuli, and highlight that there is no difference between no stimulus (pre-stimulation) and a stationary stimulus. Please see lines 155 onwards.

      (3) Line 34, 'redress' should be 'to address'.

      Thank you, fixed.

      (4) Line 59, a bit unclear to me what this sentence is trying to say. Also, I wouldn't say LPTCs are 'indirect' in the sensorimotor transformation -- it's a necessary link in this pathway, no?

      That was indeed a strange sentence. We have simplified it to the following: “LPTCs project to the inferior posterior slope[6], where they synapse with descending neurons[7,8]. In Drosophila at least 35 descending neuron types have their inputs in the posterior surface of the brain (named DNp1-35) [9].”

      (5) Figures:

      This is a formatting problem. The figure legends are separated from the figures, and there are no titles on the figures to indicate which one is which.

      We are sorry about this. We have added labels to the figures.

      Figure 1: What kind of geographic projections are these? The azimuth axis is not labeled.

      These stimuli were not perspective corrected, and therefore the RF maps simply reflect the visual monitor. We have clarified this in the figure legend, including mentioning that the axis label is the same for elevation and azimuth.

      Figure 2a: The error bars are not aligned to the angular axis.

      These have now been aligned.

      Supplement Figure 2b: I'm not sure why there are two measurements at each stimulus orientation. The bottom panel is confusing -- what do you mean by 'receptive field location'? And what does this red arrow/line mean in the bottom panel?

      Thank you for pointing this out. The figure was supposed to help the reader understand our transformations, so it’s great to know that it needed further explanation. To address this, we have added extra text and panel labels, please see lines 520 onwards.

      (6) Methods:

      Line 356: Maybe a picture or schematic drawing would be helpful to explain the setup. For instance, it's unclear what 32 degrees here refers to.

      This is a great suggestion, and a pictogram explaining the set-up can now be seen in Supplementary Fig. 6b.

      Line 404: What does it mean that 'spatially interpolate 10 times'?

      This sentence has been changed to “After subtracting the spontaneous rate, calculated for 0.8 s preceding stimulus onset (dotted line, inset, Fig. 1b, e), we interpolated the resulting local maximum responses to a ten-fold higher spatial resolution (colour coding, Fig. 1a, d).”
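
      The two steps in the revised sentence above (subtracting the spontaneous rate, then interpolating the response map to a finer grid) can be sketched as follows. The response does not state which interpolation method was used, so bilinear interpolation is an assumption here, as are the function names, the sample values, and the choice of an output grid with `(n - 1) * factor + 1` points per axis.

```python
def subtract_baseline(responses, spontaneous_rate):
    """Subtract the spontaneous rate from every location in a 2D response map."""
    return [[value - spontaneous_rate for value in row] for row in responses]

def upsample_bilinear(grid, factor=10):
    """Bilinearly interpolate a 2D grid (at least 2 x 2) to a finer resolution.

    Original sample points are preserved; `factor - 1` new points are placed
    between each pair of neighbouring samples along both axes.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = (rows - 1) * factor + 1
    out_cols = (cols - 1) * factor + 1
    out = []
    for i in range(out_rows):
        y = i / factor
        y0 = min(int(y), rows - 2)   # clamp so y0 + 1 stays inside the grid
        ty = y - y0
        new_row = []
        for j in range(out_cols):
            x = j / factor
            x0 = min(int(x), cols - 2)
            tx = x - x0
            top = grid[y0][x0] * (1 - tx) + grid[y0][x0 + 1] * tx
            bottom = grid[y0 + 1][x0] * (1 - tx) + grid[y0 + 1][x0 + 1] * tx
            new_row.append(top * (1 - ty) + bottom * ty)
        out.append(new_row)
    return out

# Hypothetical 2 x 2 map of local maximum responses (spikes/s)
raw_map = [[10.0, 30.0],
           [50.0, 70.0]]
evoked = subtract_baseline(raw_map, spontaneous_rate=10.0)
fine_map = upsample_bilinear(evoked, factor=10)
print(len(fine_map), len(fine_map[0]))
```

      Subtracting the baseline before interpolating keeps the 50% contour defined relative to the evoked response rather than the absolute firing rate.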

      Line 405: How to determine the center from the 50% contour?

      We have modified the Methods to explain how this was done, please see lines 478 onwards.

      Line 408: Please explain more explicitly how LPD and LMS are computed.

      We have modified the Methods to explain how this was done, please see lines 488 onwards.

      Line 418: Is reference 42 correct? I could be wrong, but this reference seems to be talking about target-selective DNs rather than optic flow-sensitive DNs?

      Yes, this reference is correct. In a supp figure to ref 42, we show data from optic flow sensitive neurons, but not their receptive fields. Thanks for checking.

      Line 426: Are the full-screen stimuli presented in 8 directions too? Do I understand correctly that the preferred direction vector for the full-screen stimuli is extracted from a cosine fit, which is slightly different from the 'receptive field preferred direction' in the receptive field mapping measurement, which is the median of all the 'local preferred direction' (which are from the cosine fit)?

      We have modified the text to make this clear, please see lines 519 onwards, as well as the receptive field analysis, please see lines 474 onwards.
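
      The cosine fit that the reviewer asks about can be illustrated with a short sketch. It assumes responses measured at equally spaced stimulus directions covering the full circle (for example 8 directions at 45° steps), in which case the least-squares fit of `offset + amplitude * cos(direction - preferred)` has a closed form. The function name and the synthetic data are hypothetical, and this is not necessarily the authors' fitting procedure.

```python
import math

def cosine_fit(directions_deg, responses):
    """Closed-form least-squares cosine fit for equally spaced directions.

    Returns (offset, amplitude, preferred_direction_deg). Valid only when the
    directions are equally spaced over 360 degrees and there are at least 3.
    """
    n = len(responses)
    offset = sum(responses) / n
    a = 2.0 / n * sum(r * math.cos(math.radians(d))
                      for d, r in zip(directions_deg, responses))
    b = 2.0 / n * sum(r * math.sin(math.radians(d))
                      for d, r in zip(directions_deg, responses))
    amplitude = math.hypot(a, b)
    preferred = math.degrees(math.atan2(b, a)) % 360.0
    return offset, amplitude, preferred

# Synthetic tuning curve: offset 20, amplitude 15, preferred direction 90 degrees
directions = [0, 45, 90, 135, 180, 225, 270, 315]
rates = [20.0 + 15.0 * math.cos(math.radians(d - 90)) for d in directions]

offset, amplitude, preferred = cosine_fit(directions, rates)
print(f"offset={offset:.2f}, amplitude={amplitude:.2f}, preferred={preferred:.1f} deg")
```

      Applied to a full-screen stimulus this yields a single preferred direction, whereas in the receptive field mapping the same fit is done per location and the local preferred directions are then summarized, as the reviewer describes.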

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study provides valuable insight into the role of actin protrusions in mediating early pre-endocytic steps of human papillomavirus entry at the cell surface. Using state-of-the-art microscopy in an immortalized keratinocyte model, the authors present mostly solid evidence that filopodia actively promote the transfer of heparan sulfate-coated virions from the extracellular matrix to the viral entry factor CD151. Remaining gaps in the mechanistic model could be further supported by including a more expansive analysis of the fixed microscopy samples and live cell imaging to distinguish virion transfer from direct binding.

      We thank the editorial team for the improved eLife assessment. Regarding the remaining gap, we agree that it is not clear why the large majority of the virions indeed are transferred and not directly binding virions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary. The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage has been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data needs to be provided.

      Comment of the authors: the above paragraph is copied from the very first review and describes the situation before revision.

      Note on revisions:

      The authors did an excellent job in their revision to include data from the effect of proteolytic priming on their observed virion transfer to the cell body. All other minor issues were addressed adequately.

      We are grateful that the referee acknowledges that we addressed all issues adequately.

      The work could be especially critical to understanding the process of in vivo infection. 

      We agree, and would like to point out that a similar comment was raised by the reviewing editor assigned to our original submission, John Schiller. For unknown reasons, he was no longer involved in the evaluation of the revision.

      Reviewer #2 (Public review):

      The study design involves infecting HaCaT cells (immortalised keratinocytes mimicking basal cells of a target tissue) and observing virus localization with and without actin polymerization inhibition by cytochalasin D (cytoD) to analyze virion transfer from the ECM to the cell via filopodial structures, using cellular proteins as markers.

      In the context of the model system, the authors stress in the revised version the importance of using HaCaT cells as a relevant 'polarized' cell model for infection. The term 'polarized' is used in the cell biological literature for epithelial cells to describe a strict apical vs. basolateral demarcation of the plasma membrane with an established diffusion barrier of the tight junction. However, HaCaT cells do not form tight junctions. In squamous epithelia, such barriers are only found in granular layers of the epithelium. The published work cited in support of their claims either does not refer to polarity or only in the context of other cells such as CaCo-2 cells.

      We thank the reviewer for this important clarification and fully agree. HaCaT cells do not form tight junctions and therefore do not fulfill the classical definition of polarized epithelial cells with a strict apical-basolateral diffusion barrier. In response to this comment, we have removed the term “polarized” in reference to HaCaT cells throughout the revised manuscript. Our intention was not to imply classical epithelial polarity, but rather to emphasize that HaCaT cells represent a functionally relevant keratinocyte model that recapitulates key early steps of HPV infection observed in vivo, particularly abundant ECM deposition enabling strong virion binding to the ECM.

      We now state on line 120: “PsVs that bind to the ECM at sites distal from the cell body are unable to establish direct contact with entry receptors, until the cell migrates onto them or they are transported along cell protrusions towards the cell body (Schelhaas et al., 2008; Smith et al., 2008). Both cell migration and protrusion transport depend on actin dynamics (Schaks et al., 2019). We aimed for blocking these active recruitment mechanisms in HaCaT cells, a cell line that is widely used as a cell culture model for HPV infection. They resemble primary keratinocytes in several key aspects: they are not virally transformed and produce large amounts of ECM, promoting interactions between viruses and ECM components and thereby facilitating infection (Bienkowska-Haba et al., 2018; Gilson et al., 2020). In addition, subconfluent HaCaT cells form filopodia and filopodial transport is used for the recruitment of ECM-bound virus particles to the cell body (Schelhaas et al., 2008, Smith et al., 2008). Together, these features make HaCaT cells a suitable model for studying active PsV recruitment from the ECM to the cell surface.”

      Overall, the matter of polarity would be important, if indeed the virus could only access cell-associated HSPGs as primary binding receptor, or the elusive secondary receptor via the ECM in the used model system (HaCaT cells), if they would locate exclusively basolaterally.

      We apologize for not having stressed enough that virions also bind directly to the upper cell membrane, which is not imaged. To make clear that HaCaT cells are still a suitable model for studying active recruitment, we worked on the following issues throughout the manuscript (this is an outline; for details see below):

      (1) We now discuss adequately that virions reach cell surface receptors either by passive diffusion or by active transport mechanisms, the latter involving actin dynamics (filopodial transport and cell migration), to which we refer in the revised manuscript as active recruitment.

      (2) We explain why the large majority of virions in the microscopic assay are actively recruited virions.

      (3) We explain the difference between biochemical infection assays, which do not differentiate between passive and active recruitment, and microscopic assays, which study the basal cell membrane and thereby primarily actively recruited virions.

      This is at least not the case for binding, as observed in several previous publications (just two examples: Becker et al, 2018, Smith et al., 2008). With only a rather weak attempt at experimental verification of their model system with regards to polarity of binding, the authors then go on to base their conclusions on this unverified assumption.

      We agree with the reviewer that strict epithelial polarity would only be relevant if HPV binding or receptor accessibility were confined to the basolateral membrane, which is not the case in HaCaT cells, as shown previously (e.g., Becker et al., 2018; Smith et al., 2008). However, our conclusions do not rely on strictly polarity-dependent binding.

      We added the following paragraphs clarifying that (i) in HaCaT cells PsVs also bind by passive diffusion to the upper cell membrane and that (ii) at the basal membrane the large majority of imaged PsVs is actively recruited.

      Line 332: “…, the lower PCC at 0 min/CytD suggests that without active recruitment less PsVs reach CD151. At 30 min after CytD, the PCC has reached the level of 0.1 as in the control, which is in line with the idea of fast recruitment as observed in Figure 4. To follow how the basal cell membrane is populated with PsVs over time, as additional analysis we determined the PsVs per µm<sup>2</sup> in ROIs placed in the cell body region. At 0 min, CytD reduces the PsV density to 19 - 33%, albeit the effect is not significant, and at 180 min/CytD the same PsV density as in the control is reached (Supplementary Figure 6A and B). Overall, under CytD there was a trend towards less PsVs present (Supplementary Figure 6A and B). Hence, both Figure 5C and Supplementary Figure 6A and B suggest that active virion transport is required to reach efficiently the basal membrane.”

      Line 447: “Throughout all experiments, we observe at 0 min/CytD only few PsVs at the basal membrane (Figure 1A, Supplementary Figure 6A and B; see also PCC at 0 min between PsVs an CD151 in Figure 5C), suggesting that in the absence of active recruitment the access to the basal membrane via passive diffusion is limited. We wondered, how many PsVs may bind to the cell membrane without a diffusion barrier? For this reason, we incubated EDTA detached HaCaT cells in suspension with PsVs for 1 h at 4 °C, followed by re-attachment for 1 h. Under these conditions, we find, despite of a shorter incubation time (1 h versus 5 h), a roughly 3-fold larger PsV density (1.7 PsVs/µm<sup>2</sup> (Supplementary Figure 6D)) than the highest density observed in the other experiments. However, it should be noted that values of the different experiments cannot be directly compared. Aside from the different treatments, another difference lies in the size of the imaged membrane. The re-attachment of cells is not complete after 1 h (compare size of adhered membranes in Supplementary Figure 6A and 1A), wherefore the membranes are likely strongly ruffled, which results in the underestimation of the membrane area. As a result, we overestimate the PsVs per µm<sup>2</sup> adhered membrane (please note that we cannot re-attach cells for longer times as we then lose PsVs due to endocytosis). In any case, the experiment suggests that PsVs bind more efficiently to membrane surface receptors without a diffusion barrier. We conclude that in our assay PsVs cannot readily bypass the active PsV recruitment by diffusing directly to the basal cell membrane, which is plausible, because to make this happen a 55 nm large PsV must diffuse through the narrow gap between glass-coverslip and adhered cell.”

      Line 538: “The analyzed PsVs hardly bind to the basal cell surface directly by diffusion (Supplementary Figure 6, compare PsV maxima density at 0 min/CytD in A and B to C). Therefore, the actin-driven virion transport would play a decisive role in HPV infection if cells would form a monolayer with a disruption at which ECM is present and that is approached by PsVs, a scenario similar to in vivo infection. In addition, cell migration could establish contact between PsVs and the cell surface.”

      Line 548: “…that can readily bind to the upper cell membrane. We are not aware of a PsV translocation mechanism from the upper to the basal membrane. Therefore, in our assay, PsVs bound to the upper membrane are not expected to show up at the basal membrane. Comparing 0 min of control and CytD (Supplementary Figure 6A and B), we find that compared to the control 19 - 33% of the PsVs reach the basal membrane in the absence of active transport, or in other words, most likely by passive diffusion. Actually, the range from 19 – 33% must be a strong overestimate as PsVs in the control are in transit and many actively recruited PsVs are already internalized during the 5 h incubation period. For this reason, we propose that most likely much less than 10% of the PsVs reach the basal membrane by diffusion. Moreover, in the absence of the diffusion barrier, the density of bound PsVs is strongly increased (Supplementary Figure 6D), showing indirectly that at the basal membrane the binding sites are difficult to access without active recruitment. Taken together, we propose the large majority of PsVs analyzed in our assay are ECM bound and actively recruited to the basal cell membrane.”

      This is one example of several in the manuscript, where claims for foundational premises, observations, and/or conclusions remain undocumented or not supported by experimental data.

      Another such example is the assumption of transfer of the virus from ECM to the tetraspanin CD151. Here, the conclusions are based on the poorly documented inability of the virus to bind to the cell body, which is in stark contrast to several previous publications, and raises questions.

      We hope with the above changes we made clear that virions can also directly bind to the cell body. We also added a paragraph discussing differences between biochemical and microscopic assays.

      Line 568: “In this scenario, sub-confluent HaCaT cells, or even better single HaCaT cells, would be an ideal model system for the microscopic study of these very early infection steps that involve ECM attachment and subsequent active recruitment, as supposed to occur during in vivo infection of basal keratinocytes after binding of virions to the basement membrane (Bienkowska-Haba et al., 2018; Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010). In contrast, in biochemical infection assays, virions diffusing to HSPGs on the cell surface, and by this bypassing active recruitment, are assayed together with the actively recruited virions. Should cells secrete little ECM and are grown to confluency, the passively binding virions are supposed to strongly dominate the infection rate in a biochemical infection assay.”

      There are a number of important additional issues with the manuscript:

      First, none of the inhibitors have been tested in their system for efficacy and specificity, but rely on published work in other cell types. This considerably weakens the confidence on the conclusion drawn by the authors.

      We use the inhibitors CytD, blebbistatin, leupeptin and furin inhibitor I. The references below are examples reporting the use of these inhibitors on HaCaT cells studied in the context of HPV infection.

      Furin inhibitor I:

      Cruz et al., Cleavage of the HPV16 Minor Capsid Protein L2 during Virion Morphogenesis Ablates the Requirement for Cellular Furin during De Novo Infection. Viruses, 2015; doi.org/10.3390/v7112910

      Cytochalasin D/Blebbistatin:

      Schelhaas et al., Human papillomavirus type 16 entry: retrograde cell surface transport along actin-rich protrusions. PLoS Pathog., 2008; doi.org/10.1371/journal.ppat.1000148

      Smith et al., Virus activated filopodia promote human papillomavirus type 31 uptake from the extracellular matrix. Virology, 2009; doi.org/10.1016/j.virol.2008.08.040

      Leupeptin/Furin inhibitor I:

      Cerqueira et al., Kallikrein-8 Proteolytically Processes Human Papillomaviruses in the Extracellular Space To Facilitate Entry into Host Cells. J. Virology, 2015; doi.org/10.1128/jvi.00234-15

      Moreover, the reversible inhibitory effect of CytD, the key inhibitor used in this study, on transport and infection is validated in this study. However, we now discuss these data more critically in the context of directly binding virions.

      Line 485: “Hence, the infection assay suggests that the treatment is largely reversible and only slightly harmful, if at all. However, the luciferase infection assay does not distinguish between actively recruited PsVs and PsVs that bind passively by diffusion to the upper membrane. The latter fraction likely dominates the total infection rate and should be less affected by CytD than the fraction of actively recruited PsVs. Therefore, if the infection pathway of a small fraction of actively recruited PsVs is irreversibly inhibited, we may not be able to detect this effect on the background of unaffected passively binding PsV.”

      Second, the authors aim to study transfer from ECM to the cell body and effects thereof. However, there are still substantial amounts of viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells.

      Regarding direct binding to the cell body, please see our detailed reply above.

      This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. This remains an issue despite the added Suppl. Fig. 1, where also only subcellular regions are being displayed. As a consequence, the data obtained from time point experiments are skewed and remain for the most part unconvincing, largely because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting the association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could be originating from cell-bound and ECM-transferred virions alike.

      We hope with the above explanations it is plausible that the imaged virions primarily reach the basal membrane by active recruitment.

      Third, the use of fixed images in a time course series also does not allow to understand the issue of a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin. Or, of cell spreading upon cytoD washout. The microscopic analysis uses an extension of a plasma membrane stain as marker for ECM bound virions, this may introduce a bias and skew the analysis.

      The referee is correct in pointing out that cell spreading after CytD wash off would affect our analysis, e.g. by increasing the overlap between PsVs and the cell body although no active recruitment via filopodial transport and cell migration occurs. An argument speaking against this possibility is the lack of increase in the PCC between PsVs and F-actin after CytD removal, if the protease inhibitor leupeptin was present (Figure 2B and D). Leupeptin prevents PsV/phalloidin overlap despite restored actin polymerization after washout of both inhibitors, suggesting that priming is required for increased PsV–actin association and is too slow to change PCC within 60 min. These results support that the observed overlap reflects active, priming-dependent recruitment rather than cell morphology changes.

      We state on line 252: “Moreover, the experiment suggests that without PsV priming the PCC between PsV-L1 and F-actin does not increase, for instance, due to cell spreading after CytD removal.”

      On line 494, we state “However, we assume that this is rather unlikely, as cell spreading would increase the PCC between PsVs and F-actin under a condition where PsVs are not-primed (and therefore not actively recruited) but cell spreading occurs, which is not the case in Figure 2B and D (CytD/leupeptin).”
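
      The argument above rests on how the PCC responds to overlap between two channels. The following is a minimal sketch, assuming that PCC denotes the Pearson correlation coefficient of pixel intensities; the function name and the toy arrays are hypothetical and are not data or code from the study. It only illustrates that bringing signal from one channel onto signal from the other raises the PCC, which is the effect that cell spreading would be expected to produce.

```python
from math import sqrt


def pearson_cc(channel_a, channel_b):
    """Pearson correlation coefficient of two equally sized 2D intensity arrays."""
    a = [v for row in channel_a for v in row]
    b = [v for row in channel_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("channels must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("PCC is undefined for a constant channel")
    return cov / sqrt(var_a * var_b)


# Hypothetical toy images (arbitrary intensity units).
psv = [[0, 0, 9, 0],
       [0, 0, 0, 0],
       [0, 8, 0, 0]]
# Second channel with signal away from the PsV signal ...
actin_separate = [[7, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 6]]
# ... and with signal at the same pixels as the PsV signal.
actin_overlap = [[0, 0, 7, 0],
                 [0, 0, 0, 0],
                 [0, 6, 0, 0]]

pcc_separate = pearson_cc(psv, actin_separate)
pcc_overlap = pearson_cc(psv, actin_overlap)
```

      In this toy case `pcc_overlap` is larger than `pcc_separate`; real images would additionally need ROI selection and background handling, which are not modelled here.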

      Fourth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established. For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original. But given the high density of objects on the plasma membrane, I am not convinced that doing the same by flipping only the plasma membrane will not also obtain similar numbers than the original.

      Regarding the association of PsVs with CD151 and HS, we corrected for random background with reference to a calibration line that describes the random background association in dependence of the density of objects. We now refer to this issue on line 343: “…, the fraction of PsVs closely associated with CD151 is around 10% (Figure 5D, control), after correction for random background association, for which we used a calibration line based on the same density of PsVs in flipped images (see Supplementary Figure 7).”

      In the legend of Supplementary Figure 7 we state: “…The fraction of closely associated PsVs (PsV-L1 maxima with a distance ≤ 80 nm to the next nearest CD151 maximum) in the Control of Figure 5 was analyzed on original and flipped images (for an example of a flipped image see Supplementary Figure 5A)…on flipped images, we often find values more than half of the values of the original images, demonstrating that many PsVs have a distance ≤ 80 nm to CD151 merely by chance, in the following referred to as background association…We take the altogether 24 fraction values obtained on flipped images (12 values from Control and CytD each), and plot the fraction of closely associated PsVs against the average CD151 maxima density in the respective images. As can be seen in (C), the fraction increases with the maxima density, as the chance of a distance ≤ 80 nm increases with the maxima density. The fitted linear regression line describes how the background association depends from the maxima density. As a result, the background association (y) can be calculated for any maxima density (x) with the equation y = 2.04 • x. The CytD/0 min condition may be overcorrected, if it includes many images with CD151 flipped onto peripheral PsVs that actually are distal to CD151 (for an example ROI see Supplementary Figure 5A). On the other hand, PsVs right at the cell border, where CD151 staining tends to be strong (Supplementary Figure 5A), after flipping have less CD151 than before, contributing to undercorrection.”

      When omitting the CytD/0 min values, we obtain essentially the same calibration line.
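
      The background correction described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the manuscript: it assumes the calibration line is constrained through the origin (the quoted equation y = 2.04 • x has no intercept) and that the correction is a simple subtraction of the predicted background from the observed fraction. The function names and all numbers in the example are hypothetical; units follow whatever is used in Supplementary Figure 7.

```python
def fit_line_through_origin(densities, fractions):
    """Least-squares slope of y = slope * x (no intercept term)."""
    sum_xx = sum(x * x for x in densities)
    if sum_xx == 0:
        raise ValueError("at least one non-zero density is required")
    return sum(x * y for x, y in zip(densities, fractions)) / sum_xx


def background_corrected_fraction(observed_fraction, maxima_density, slope):
    """Subtract the chance association predicted by the calibration line."""
    return observed_fraction - slope * maxima_density


# Hypothetical values from flipped images: CD151 maxima density (x) and
# fraction of closely associated PsVs (y). Not data from the study.
flipped_densities = [1.0, 2.0, 4.0]
flipped_fractions = [2.0, 4.0, 8.0]
slope = fit_line_through_origin(flipped_densities, flipped_fractions)

# Applying the slope quoted in the legend (2.04) to a hypothetical image
# with maxima density 3.0 and an observed association of 16.0.
corrected = background_corrected_fraction(16.0, 3.0, 2.04)
```

      Because the predicted background scales with the maxima density, images with denser CD151 staining receive a larger correction, which is the point raised by the reviewer about comparing ROIs of similar object density.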

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      There are further issues that are not pertaining to the study design that I find important.

      Fig.1

      There are few, if any, filopodia in untreated cells. It would be good to quantify their abundance to substantiate that resting HaCaT cells are indeed a good model for filopodial transport vs. membrane retraction / spreading.

      We see filopodia in untreated HaCaT cells (although quite variable in abundance, please see control cells in e.g. Figure 3 and 8 and Supplementary Figure 2).

      In HaCaT ECM the virus binds also to laminin-332 for a good part. Would this not also confound the analysis?

      We agree with the reviewer that in HaCaT-derived ECM, virus binding is not restricted to heparan sulfate (HS), and that laminin-332 represents an additional relevant binding partner. Indeed, viruses bound to laminin-332 may likewise be transported toward the cell body via laminin-binding integrins. We therefore consider laminin-332 to act as a parallel attachment factor alongside HS rather than as a mutually exclusive alternative.

      However, the primary aim of this study was not to comprehensively map all ECM binding partners, but to analyze the actin-dependent transport of ECM-bound virus particles. HS was chosen as a representative and well-characterized ECM marker for initial virus attachment. Importantly, inhibition of actin dynamics by cytochalasin D blocks this transport process downstream of initial binding. Thus, irrespective of whether the virus is initially bound to HS, laminin-332, or both, the readout reflects interference with the same actin-dependent transport mechanism.

      Consequently, the presence of laminin-332 binding does not confound our analysis, as the experimental outcome is determined by inhibition of transport rather than by the specific ECM attachment factor. Nonetheless, we acknowledge laminin-332 as an important parallel interaction partner. We had already mentioned it in the first version of the manuscript but removed the sentence during the last revision; it has now been added again. On line 593 we state: “Finally, not all PsVs bound to the ECM are expected to bind to HS but could also bind to laminin 332 (Culp et al., 2006).”

      Fig.2

      Would benefit from live cell analysis. There are considerable amounts of virions on the cell body, which partially contradicts statements from Fig. 1. The fast transfer to the cell body after cyto D washout is based on the assumption that filopodia formation and transport along them (and not membrane extension) occurs quickly. Is this reasonable? Does membrane extension and migration occur between 0 min and later time points?

      Regarding membrane extension after CytD removal, that in the analysis may be indistinguishable from active recruitment transfer, please see our reply above (no PCC increase between PsV-L1 and F-actin after CytD removal if leupeptin is employed). Regarding migration, we now included this possibility as an active recruitment mechanism that may occur in parallel to filopodial transport (please see our reply above).

      Fig.4

      How are the subcellular ROIs chosen? Is there not a bias by not studying a full cell?

      In Figure 4 we are specifically interested in the time course of PsV diminishment from the cell periphery. The ROIs are generated with reference to the membrane staining, using the cell body delineation as a starting point. For details about how ROIs are generated, please see legend of Figure 4 and materials and methods.

      Fig. 5/6

      The data needs a better analysis on correlation by using randomisation as explained above.

      Please see our reply above. The association between PsVs and CD151 or HS has been corrected using a calibration line based on the same density of objects.

      Fig. 8. Why does blebbistatin block the transport only partially? Previous work on actin retrograde flow suggests that in the absence of myosin II function the transport stops completely. Would this not be a concern when interpreting the cytoD data?

      Is the referee referring to Schelhaas et al., 2008 that we cite in the paper? In this paper, in HeLa cells blebbistatin reduced the directed particle motion by 82%, but not completely.

      Suppl. Fig. 1A, B: Intended to address the issue of viruses binding to the cell body, it unfortunately falls short. It would have been better to analyse complete cells rather than ROIs, or better even, a comprehensive analysis of cell islets (boundary cells vs. central cells, with cell body to cell periphery).

      This experiment addresses the increase in PsV density resulting from active recruitment. Outlining entire cells would include also PsVs close to the cell edge that have not been actively recruited.

      Regarding cell islets (we call them patches of confluent cells as islets may be confused with e.g. more structured Langerhans islets), there are hardly any PsVs at the basal membrane. We state on line 135: “Frequently, we observe patches of confluent cells which are common to HaCaT cells. Cells at the center of these patches are dismissed during imaging, because hardly any PsVs are bound to their basal membrane, indicating that PsVs do rather not reach this area by passive diffusion. Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. At these cells, we find more PsVs per cell than one would expect from the employed ≈ 50 viral genome equivalents (vge) per cell, indicating that PsVs are unequally distributed between the cells.”

      Is the difference between untreated and cytoD treated significant?

      We stated in the Figure legend that the difference is not significant (the exact p value is p = 0.089). We now have revised the Figure (previously Supplementary Figure 1A and B, now Supplementary Figure 6A and B), showing the PsV density at the basal membrane over time, also for the experiment shown in Figure 6. The now revised Figure (Supplementary Figure 6A and B) is discussed together with the re-attachment experiment (Supplementary Figure 6C and D), in order to compare the PsV accessibility to the cell membrane with and without diffusion barrier. Please see our reply above (paragraph starting at line 447).

    1. Author response:

      We are particularly encouraged by the consensus that our study provides a substantial resource and that the bioinformatic framework is biologically grounded and convincing, while appropriately noting that further experimental validation will be required. We fully agree with this point. As clarified in the revised manuscript, the lineage relationships we describe are inferred from integrative transcriptomic analyses and are intended to provide a mechanistic and conceptual framework rather than definitive proof of cellular origin. We have further strengthened the Discussion to explicitly acknowledge these limitations and outline future directions, including lineage tracing and functional validation studies.

      At the same time, we respectfully note that such experimental validation would require a substantial extension of this work and likely 2–3 years of additional studies, including development of appropriate model systems. We believe these efforts represent an important next phase of investigation rather than a revision-level addition to the current manuscript. Our primary goal here is to present a high-resolution human transcriptomic resource and a coherent framework that identifies biologically plausible epithelial intermediates linking normal fallopian tube hierarchy to malignant states.

      Given the reviewers’ positive evaluation and recognition of the value and rigor of the dataset and analyses, we respectfully request consideration to proceed with publication as an eLife Version of Record without further experimental revision. We believe that the timely dissemination of these findings will provide a useful resource for the field and help guide the experimental studies needed to test the hypotheses generated here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      In my view, the presentation of the data is in some cases not ideal. The phrasing of some conclusions (e.g., group-attacks and wolf-pack-hunting by the bacteria) is in my opinion too strong based on the herein provided data.

      We agree with your comment and have replaced the terms “Group-attacks” and “wolf-pack-hunting” by “attacks” throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2AB, please add the name of the statistical test and the number of replicates that the data is based on to the figure legend.

      We thank Reviewer #1 for highlighting the need for more detail. We have revised the manuscript accordingly. The captions of Figures 2, 3, 4 and S1 were revised to include the name of the statistical test and the number of replicates. Asterisks indicate significant differences in a multiple comparison test (one-way ANOVA with post hoc Tukey test): * P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001.

      (2) Figure 2C is this figure referred to in the text?

      We apologize for this oversight. Figure 2C was replaced by new figures 2C and 2D and the old figure 2C is now referenced in the manuscript as Fig 3B1.

      (3) Movie 1, could the movie please also be provided as .mp4? I suggest including individual images across time in the main figure so that readers do not rely on opening a supplementary file for this key finding of the study.

      In the revised manuscript, all the videos were converted to mp4 format and individual images across time were included in Figure 2C and 2D (Chronological snapshots of one attack) and in figure 3B1 (Chronological snapshots of the complete event), thereby improving the readability of the manuscript.

      (4) Figure 3A2 (text l. 355), I am afraid I do not find this figure.

      Fig. 3A2, which previously corresponded to Fig. 3B1, now corresponds to Fig. 2C and Fig. 2D. This has been corrected in the revised version of the manuscript.

      (5) Lines 356ff, I am afraid that I find it hard to follow what the authors refer to as the right cell or the left cell. I suggest either adding labels to the movies or providing individual images across multiple timepoints into the main figure that can be labelled and bring across the point.

      Arrows have been added to videos 3–5 to clearly indicate the cells referred to in the text and facilitate tracking across time.

      (6) In general, for all the microscopy, on how many cells have these phenomena been observed? What is n=x? Has this been quantified?

      We thank the reviewer for pointing this out.

      In the caption of Fig. 3, the sentence “(A) Percentage of motile A. pacificum ACT03. (B) A. pacificum ACT03 attacked by V. atlanticus LGP32 and (C) A. pacificum ACT03 lysis after 0, 15, 30, 45 and 60 min of interaction.” was replaced by “(A) Cumulative percentage of motile A. pacificum ACT03 cells. (B) Cumulative number of cells attacked by V. atlanticus LGP32 and (C) Cumulative cell lysis after 0, 15, 30, 45 and 60 minutes of interaction.” In the Fig. 3 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was also added.

      In Fig. 4 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added.

      In Fig. S1 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added.

      (7) Figure S1A, does this figure show means plus/minus standard deviation? If yes, please add this to the figure legends.

      In Fig. S1 caption, the sentence “Error bars represent the standard deviation of the mean of three independent experiments” was added.

      How do the authors explain the big variation in the test condition and not in the control?

      Regarding the higher variation observed in the test condition compared to the control, this may, on the one hand, reflect biological variability between independent batches of 60-h V. atlanticus cultures used to prepare the supernatants, and, on the other hand, a heterogeneity in the physiological status of independent algal batches (N = 3; 2 × 10^4 cells; see Materials and Methods, Co-culture assay), which may not be perfectly synchronized. In contrast, the control condition consists of A. pacificum cultures incubated in fresh medium without bacterial supernatant, for which algal motility is highly reproducible and thus shows very little variation.

      (8) Line 375, "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Movie 5) and was induced by the old-starved culture supernatant of V. atlanticus LGP32 (Fig. S1)." Is this reference to Figure S1 correct? S1 shows motility, doesn't it? I don't see how this data supports the statement made in this sentence.

      We apologize for this unclear message.

      "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Video 5) and was induced by the old-starved culture supernatant of V. atlanticus LGP32 (Fig. S1)." was replaced by "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Fig. 3C and 3C1)."

      And “We next tested whether this lytic effect was mediated by thermostable molecule(s) secreted by Vibrio.” was replaced by “We next tested whether this lytic effect was linked to Vibrio culture supernatant and mediated by thermostable molecule(s) secreted by Vibrio.”

      (9) Line 388ff, "Group attacks were observed on non-degraded A. pacificum ACT03 cells, but not on previously lysed cells." No reference to a figure is provided. I am afraid I don't see the data that this statement is based on.

      As it is impossible to show a lack of attack, we just clarified the basis of our experiment.

      “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 60-hour culture of V. atlanticus LGP32, which induced 25% lysis of A. pacificum ACT03 cells. Next, the corresponding V. atlanticus LGP32 cells were added. During exposure, attacks were observed only on undegraded A. pacificum ACT03 cells, but not on previously lysed cells” was replaced by “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 126-hour culture of V. atlanticus LGP32, which induced lysis of 70% of the A. pacificum ACT03 cells (Figures 3C and 3C1, arrow 2 and video 4). Next, cells of V. atlanticus LGP32 from a 60-hour culture, capable of attacking A. pacificum ACT03 cells (Fig. 3B), were added. For 1 hour of exposure, no attack was observed on the previously lysed algae.”

      (10) Figure 4a, Based on the labeling of the figure, in particular the x-axis, it is not fully clear to me what I am looking at.

      Figure 4A has been reworked and its legend modified. We hope that this graph is clearer now.

      (11) Line 428, did the authors consider complementing the pvuD deletion mutant and testing for gain of function when providing the gene in trans?

      We did not investigate pvuD in this study and did not construct a pvuD deletion mutant. We therefore assume that the recommendation refers to pvuB, which was the focus of our work. Unfortunately, we did not perform this experiment. However, several lines of evidence support the implication of PvuB and the vibrioferrin uptake system in this process: (i) the loss of attack behaviour is specific to the mutant in the vibrioferrin uptake pathway and (ii) our expression and proteomic data show a strong induction of vibrioferrin uptake components under starvation and iron-manipulated conditions, which correlate with the attack phenotype.

      (12) Use of the term "group attack" in parentheses in the text, but in the section header and title. Is there really sufficient actual data to say that this is a "group attack"? What exactly are the indications for this being a behaviour of a group?

      We agree with you. The terms “group attacks” and “wolf-pack hunting” were replaced by the more neutral term “attacks” throughout the manuscript.

      (13) Table S1 and S2, those tables give a nice overview. Do the authors provide the raw data based on which they make a claim on "+" and "-" in the individual categories? I would prefer to see the actual data or at least have the possibility to look into this.

      In the revised versions of Tables 1 and 2, we have improved the captions and clarified the meaning of each column in order to avoid any ambiguity between the results of this study and the bibliographic information.

      Specifically regarding Table 2:

      We do not present any visuals of the interaction between Vibrio and Alexandrium because these species all look alike. Regarding the other algae species tested in interaction with Vibrio, phenomena other than lysis or cell attack have been observed and are the subject of specific laboratory studies.

      (14) Line 456 "first study", line 40f "first evidence of a new mechanism". I suggest toning this down a bit and being clearer in the abstract about this being a working model that can be suggested based on individual bits of data.

      We thank Reviewer #1 for this helpful suggestion.

      In the summary:

      “This is the first evidence of a new mechanism that could to be involved in regulating Alexandrium spp. blooms and giving Vibrio a competitive advantage in obtaining nutrients from the environment.” was replaced by “The interaction model we propose here suggests that Vibrio could play a role in regulating the proliferation of Alexandrium spp., giving it a competitive advantage in obtaining nutrients from the environment.”

      In the discussion:

      “Considering predator as a free organism that feeds at the expense of another, this study is the first evidence of the capacity of some Vibrio to develop a predatory strategy against an alga. This behaviour differs from parasitism, because the survival of Vibrio is not exclusively dependent on algae in environment” was replaced by “Consider a predator as a free-living organism that kills its prey and feeds on it, this study provides data suggesting the ability of Vibrios to develop an original predator-like behaviour to kill and feed on algae.”

      (15) Line 469 "Overall, these observations show that V. atlanticus LGP32 is able of wolf-pack hunting behaviour." I see the similarities. I feel that the term "show" is a bit too strong here, or I suggest referring to "wolf-pack-like behaviour".

      The sentence “Overall, these observations show that V. atlanticus LGP32 is able of wolf-pack hunting attack behaviour” was replaced by “Overall, these observations suggest that V. atlanticus LGP32 can exhibit a predator-like behaviour”

      Reviewer #2 (Public review):

      As weaknesses, Reviewer #2 lists:

      (1) A lack of early, clear definitions for several important terms used in the paper, including 'predation', 'coordination' and 'coordinated action', 'group attack', and 'wolf-pack hunting', along with a corresponding lack of criteria for what evidence would warrant use of some of these labels. (For example, does mere simultaneity of attacks of an A. pacificum cell by many V. atlanticus cells constitute "coordination"? Or, as it seems to us, does coordination require some form of signalling between predator cells?)

      The term “Coordinate” was replaced by “simultaneous” throughout the manuscript.

      The terms “Group attack” and “wolf pack hunting” were replaced by “attack” throughout the manuscript.

      (2) Absence of controls for cell density in the test for starvation effects on predatory behaviour; unclear how the length of incubation affects the density of V. atlanticus cells.

      We thank the reviewer for pointing this out.

      A cell density experiment had already been performed (cf. Fig. 4A).

      The sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added to the captions of Fig. 3, Fig. 4 and Fig. S1.

      (3) Lack of clarity in some of the methodological descriptions

      The Methodology has been checked and some improvements have been made.

      Reviewer #2 (Recommendations for the authors):

      (A) Title

      (1) Could 'induces' be better than 'promotes'?

      We agree with Reviewer #2. The initial title, “Starvation of the bacterium Vibrio atlanticus promotes lightning group-attacks on the dinoflagellate Alexandrium pacificum”, was replaced by “Starvation of the bacterium Vibrio atlanticus induces simultaneous attacks on the dinoflagellate Alexandrium pacificum”.

      (B) Abstract

      (1) Perhaps define phycosphere in the abstract - many readers might not know this word.

      We have revised the abstract to define the term phycosphere and added the sentence “This occurs in the microenvironment surrounding phytoplankton cells, the phycosphere. An interface rich in nutrients and organic molecules exuded by the cell.”

      (2) Perhaps "on dinoflagellates".

      We thank Reviewer #2 for this suggestion. We have revised the abstract by replacing “on the dinoflagellates species” with “on dinoflagellates”.

      (3) Line 33 - The word 'prey' is used without a claim of predation having yet been made; only killing has been claimed so far.

      We agree and have replaced the word “prey” by “algae” in the abstract.

      (4) Line 34 - It is unclear whether the description refers to the 'attack stage' or to 'wolf-pack attack' in general. The sentence is written in such a way that it seems to refer to 'wolf-pack attack'. However, this would seem to be incorrect, with the description being specific to V. atlanticus.

      To avoid this ambiguity, we have removed the sentence “resembles the ‘wolf-pack attack’ strategy” from the abstract.

      (5) Line 35 - Should there be a 'consumption phase'?

      We agree with Reviewer #2; “degradation” was replaced by “consumption”.

      (6) If predation is claimed later in the manuscript (which it is), it should be explicitly claimed in the abstract.

      We thank Reviewer #2 for this helpful suggestion.

      We have revised the abstract. The sentence “Results showed that Vibrio atlanticus was able to coordinate lightning group attacks then kill the dinoflagellate Alexandrium pacificum ACT03” was replaced by “The results showed that Vibrio atlanticus was capable of attacking and killing the dinoflagellate Alexandrium pacificum ACT03”.

      (C) Main text

      (1) Line 54 - Perhaps "Among HAB-causing organisms...".

      We agree with the reviewer’s suggestion and have revised the wording.

      (2) Line 56 - "that, together with..., form the "Alexandrium tamarense" complex".

      We agree with the reviewer’s suggestion and have revised the sentence.

      (3) Line 57 - What this "complex" is and its significance should be explained.

      “Among them, Alexandrium pacificum is a flagellated eukaryotic unicellular organism that together with Alexandrium tamarense and Alexandrium fundyense form the "Alexandrium tamarense" complex (Hadjadji et al., 2020)” was replaced by

      “Among them, Alexandrium pacificum is a flagellated eukaryotic unicellular organism that together with Alexandrium tamarense and Alexandrium fundyense form the "Alexandrium tamarense" complex, responsible for paralytic shellfish poisoning worldwide (Hadjadji et al., 2020)”

      (4) Line 58 - What is a Rephy survey?

      We clarified this point: “by rephy survey” was replaced by “by the French phytoplankton observation and monitoring network (Rephy)”.

      (5) Line 59 - 'resulting in' instead of 'resulting of'.

      We agree with the reviewer and have replaced “resulting of” with “resulting in”.

      (6) Line 65 - It seems that ', influencing the time of appearance of blooms' would be more correct than the current phrasing. The current phrasing is unclear regarding the relation between species, tolerance range, and the time of appearance of blooms.

      To address this point, “Depending on the phytoplankton species, the tolerance range of physicochemical parameters is different and influences the time of appearance of blooms” was replaced by “Depending on the species of phytoplankton, tolerance to physicochemical parameters varies, which influences when blooms occur.”

      (7) Line 76 - Run-on sentence which should probably be split after the reference to Wang et al., 2020.

      We agree with the reviewer and have split the sentence.

      (8) Line 89 - What are these observations?

      This sentence was reformulated.

      “Based on observations from the natural environment showing a potent relationship between Vibrio and Alexandrium algae bloom events, this study aim to determine in vitro, the main factors implicated in this relationship” was replaced by “This study aims to describe observations made in the natural environment between Vibrio bacteria and Alexandrium algal blooms, and to determine in vitro the main factors involved in this relationship.”

      (9) Line 94 - This is the first clear reference to a predator-prey interaction, and it is stated as if it's established. Is it not a central goal of the study to demonstrate that predation is even happening?

      Based on the title and abstract, I would have expected the major claims of the paper highlighted in the abstract to be:

      (i) that predation of algae by bacteria occurs in this system,

      (ii) there is a social component of predation,

      (iii) claims about what induces this predatory behaviour.

      The summary has been amended accordingly, and the term “predation” has been removed, along with all sentences referring to it.

      (10) Line 99 - What does n.d. mean?

      This point was addressed in the revised version.

      (11) Line 97 section - specify qPCR.

      This point was clarified in the revised version.

      (12) Line 139 - Mentioning the oligonucleotides in this part of the methods seems out of place. Would this not fit better in the section on Gene expression analysis?

      This sentence was discarded from this paragraph.

      (13) Line 147 - Where did the co-cultured phytoplankton species come from?

      To address this point, a reference to Table 2 was added.

      (14) Line 149 - Is it known if the phytoplankton strains had all grown to the same density after 24 hours?

      The doubling time of dinoflagellates in laboratory culture is between 5 and 7 days. During the duration of the experiments, the dinoflagellate concentration did not change significantly.

      The sentence “(doubling time between 5 and 7 days)” was added.

      (15) Line 150 - Was the density of the Vibrio cultures at the different incubation times measured? Density might play an important role in predation, and so it would be important to control for density in these assays.

      The concentrations of live Vibrio in each individual culture were not actually measured. However, the role of Vibrio density in attacks was measured and is shown in Figure 4A and observed in Fig. 2B.

      (16) Line 153 - How long was the co-incubation?

      The incubation times were added in the revised version.

      (17) Line 158 - What is meant by "independent experiments", more exactly?

      To clarify this point, “Data are the means of three independent experiments” was replaced by “The data come from three independent experiments using independent phytoplankton cultures and independent bacterial cultures.”

      (18) Line 161 - Perhaps give the source information about the Vibrio strain at its first mention.

      A reference has been added in the revised preprint.

      (19) Line 163 - line 141 refers to multiple non-axenic species, whereas here "the algal strain" is referred to.

      And

      (20) Line 164 - language phrasing throughout the manuscript could use some polishing, e.g., "this means that additional bacteria...".

      To address this comment, “As the algal strain used in the study is not axenic, means that additional bacteria, other than the V. atlanticus LGP32, are potentially present in the experiments.” was replaced by “As the A. pacificum ACT03 strain (table 2) used in the study is not axenic, there is potential for bacteria other than V. atlanticus LGP32 to be present in the experiments.”

      (21) Line 208 - Why were both magnitude and p-value criteria used rather than just p-values?

      In the present proteomic approach, each experimental condition was measured six times, and the mean value was used to reduce random noise. We then selected differences large enough to matter biologically: this is a central criterion, and a change of at least 2-fold was required in order to focus exclusively on biologically relevant differences, which allowed us to control for effect size. The differences also had to be statistically significant: we applied a significance threshold of P < 0.01 to limit the chance that an observed difference arose from random variation. We considered that using both criteria makes the results meaningful and trustworthy, rather than reflecting a small or random fluctuation.
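
      As an illustration only, the two-criteria selection described in this response could be sketched as follows. The record layout, the function name `select_proteins`, and the example abundances are hypothetical, and the p-values are assumed to have been computed beforehand by whichever statistical test was used in the study (the test is not specified here).

```python
from math import log2
from statistics import mean

FOLD_CHANGE_MIN = 2.0  # at least a 2-fold change, as stated in the response
P_VALUE_MAX = 0.01     # significance threshold, as stated in the response


def select_proteins(records):
    """Return names of proteins that pass both the effect-size and p-value filters.

    Each record is a hypothetical dict with the keys "name", "control"
    (six replicate abundances), "treated" (six replicate abundances) and
    "p_value" (assumed to be precomputed by the study's statistical test).
    """
    selected = []
    for rec in records:
        # Average the six replicates of each condition to reduce random noise.
        ratio = mean(rec["treated"]) / mean(rec["control"])
        # A 2-fold change in either direction means |log2(ratio)| >= 1.
        big_enough = abs(log2(ratio)) >= log2(FOLD_CHANGE_MIN)
        significant = rec["p_value"] < P_VALUE_MAX
        if big_enough and significant:
            selected.append(rec["name"])
    return selected


# Made-up abundances, for illustration only.
example = [
    # Large and significant change: kept.
    {"name": "protA", "control": [10, 11, 9, 10, 10, 10],
     "treated": [40, 42, 38, 41, 39, 40], "p_value": 0.0001},
    # Significant but small change (1.2-fold): dropped.
    {"name": "protB", "control": [10, 11, 9, 10, 10, 10],
     "treated": [12, 11, 13, 12, 12, 12], "p_value": 0.004},
    # Large mean change but noisy and not significant: dropped.
    {"name": "protC", "control": [10, 30, 2, 10, 5, 3],
     "treated": [60, 5, 70, 10, 80, 15], "p_value": 0.2},
]

print(select_proteins(example))
```

      In this sketch a protein is retained only when both conditions hold, so a statistically significant but small change (protB) and a large but noisy change (protC) are both excluded.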

      (22) Line 270 - Were these three replicate experiments also "independent"; if yes, in what sense?

      “All experiments were conducted in triplicate” was replaced by “The experiments were performed using biological triplicates, each of which was analyzed in triplicate.”

      (23) Line 296 - Perhaps "the temperature-sensitivity (or resistance) of" rather than "the nature of".

      The modification was made in the new manuscript.

      (24) Line 307 - The sentence mentions only one influential period that was removed from the dataset, but the word 'whenever' suggests multiple occurrences.

      We agree; “whenever” was replaced by “because”.

      (25) Line 325 - line 327 - The rationale behind the first part of the following sentence isn't clear to me, and what is meant by the second part is also not clear.

      To clarify this point, “This result is consistent with the difficulty that Vibrio has in growing at temperatures below 20°C and with the complex interacting factors driving bloom dynamics (Laanaia et al., 2013)” was replaced by “This result is consistent with the difficulty Vibrio has in growing at temperatures below 20°C and with the many environmental factors that influence the dynamics of algae proliferation (Laanaia et al., 2013)."

      (26) Line 327 - line 328 - Hard to interpret; does this refer to living algal cells, or all algal cells, living and degraded?

      To improve clarity, “Interestingly, in spring 2015, the mean densities of all Alexandrium cells and of free-living Vibrio were positively correlated” was replaced by “Interestingly, in spring 2015, the mean densities of Alexandrium cells (living and degraded) and of free-living Vibrio were positively correlated”

      (27) Figure 2 - These results strongly point to predation, but why the Vibrio population would already be elevated in the co-culture treatment relative to the control immediately after inoculation (0 hrs) is not clear.

      The experiments were not conducted at the same time, and the first value on the graphs corresponds to the concentration of Vibrio determined after 1 hour of exposure/incubation and not at time 0. Figures 2A and 2B have been modified accordingly, and substantial changes have been made to the relevant section of the results.

      (28) Line 348 - There's no mention of Figure 2C in the main text, or of the statistical test associated with it in the Figure 2 legend.

      To address this comment, Figure 2C has now been cited in the main text, and the statistical analysis method has been added to the Figure 2 caption.

      (29) Line 352 - Text descriptions of videos are not easy to connect with the video content. Label the file names the same as how they are referred to in the text.

      We agree with you. The sentence “Epifluorescence microscopy observation of GFP-labelled V. atlanticus LGP32 (previously grown in Zobell medium) in interaction showed that A. pacificum ACT03 cells that had lost their motility were attacked individually by V. atlanticus LGP32 before being lysed (Fig, 2C and Video 1).” was rephrased and replaced by “Epifluorescence microscopy observation of GFP-labelled V. atlanticus LGP32 (previously grow in Zobell medium) in interaction showed that V. atlanticus LGP32 simultaneously attacks A. pacificum ACT03 cells (Fig, 2C and Video 1).”

      (30) Movie 1 could be cut to remove uninteresting footage at the start. What indicates lysis? Is the deformation of the cells an indication of lysis?

      To respond to this comment, Video 1 has been shortened and, in the caption, “degraded” was replaced by “lysed”.

      (31) Line 353 - Video could be zoomed in more on a few typical attacks to remove visual noise.

      A chronological overview of an attack has been added to Figure 2 corresponding to Figure 2D, and a chronological overview of the overall event has been added to Figure 3 corresponding to Figure 3B1.

      (32) Line 355 - There does not seem to be a Figure 3A2.

      To address this point, Fig. 2 and Fig. 3 have been revised for clarity (see above).

      (33) Figure 3 - Can the authors fully exclude an effect of bacterial density as distinct from an effect of growth/starvation phase? It would be helpful to determine bacterial viable population densities at 12, 36, 60, and 126 hrs of incubation in Zobell medium, and to control for density in testing for effects on algae.

      Information on Vibrio densities after incubation in Zobell medium for 12, 36, 60, and 126 hours has now been included in the results section “Attack of A. pacificum ACT03 is activated by V. atlanticus LGP32 starvation.”

      (34) Line 363 - It is unclear how the degradation of the flagella is apparent from movie 3. It would be helpful to have a comparison with healthy flagella.

      Alexandrium cells with intact flagella move so quickly that it is impossible for us to follow them and film their flagella with the tools at our disposal.

      For greater clarity, arrows have been added to videos 3, 4 and 5.

      (35) Line 364 - Sudden change from referring to the recording as 'video' instead of movie. What is meant by erratic swimming? The cell does not seem to move much.

      To address this comment, “Movie” was replaced by “Video” throughout the manuscript and “erratic swimming” was replaced by “irregular swimming”.

      (36) Line 365 - How did you observe the detachment of the flagellum?

      The detachment of the flagellum can be observed using a confocal microscope. This process was filmed and presented in Video 3. Arrows have been added to the video to clearly indicate the flagellum detachment.

      (37) Line 368 - Perhaps this is due to it not being clear regarding which movie is meant, but there is no clear attack visible in movie 4.

      To make this clearer, arrows have been added to the video 4 to indicate attached cells.

      And the sentence in the caption of the video 4 “Vibrio, filmed under a confocal microscope, attacks in groups one immobilized Alexandrium cell then moves on to attack — still as a group — another cell without touching the other whole cells, suggesting active communication between Vibrio cells” was rewritten and replaced by “This video, recorded under a confocal microscope, shows Vibrios simultaneously attacking a first immobilized Alexandrium cell, then moving on to attack a second cell without ever targeting the other cells present, suggesting active communication between the Vibrio bacteria.”

      (38) Line 369 - It seems the peak attack % was reached at 45 minutes, not 15-30 minutes.

      We apologize for the confusion. For clarity, in the caption of Fig. 3, the sentence “(A) Percentage of A. pacificum ACT03 motile cells. (B) cells attacked by V. atlanticus LGP32 and (C) cells lysis after 0, 15, 30, 45 and 60 min of interaction” was replaced by “(A) Cumulative percentage of motile A. pacificum ACT03 cells. (B) Cumulative number of cells attacked by V. atlanticus LGP32 and (C) Cumulative cell lysis after 0, 15, 30, 45 and 60 minutes of interaction.”

      (39) Line 382 - "clearly show role of nutrient limitation", see comment re controlling for any role of bacterial density.

      To address this point, information on Vibrio densities was added to the manuscript (cf. comment 33).

      (40) Line 385 - line 386 - Phrasing unclear.

      We have revised the text accordingly: “To this aim, A. pacificum ACT03 in exponential growth phase was first exposed for 30 min to supernatant from 60 hours starved V. atlanticus LGP32 Zobell media that induced 25% lysis of A. pacificum ACT03 cells and next to the corresponding V. atlanticus LGP32 cells. Group attacks were observed on non-degraded A. pacificum ACT03 cells, but not on lysed cells.” was replaced by “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 126-hour culture of V. atlanticus LGP32, which induced lysis of 70% of the A. pacificum ACT03 cells (Figures 3C and 3C1, arrow 2 and video 4). Next, cells of V. atlanticus LGP32 from a 60-hour culture, capable of attacking A. pacificum ACT03 cells (Fig. 3B), were added. For 1 hour of exposure, no attack was observed on the previously lysed algae.”

      (41) Line 413 - Is this the only pathway for quorum sensing in V. atlanticus?

      Indeed, the last two sentences of this paragraph are unclear.

      To address this point:

      “By targeted mutagenesis of key genes involved in QS pathways ΔluxM (HAI-1 production), ΔluxS (AI-2 production) and ΔluxR (high-density QS master regulator) did not lead to any change in the attack behaviour of V. atlanticus LGP32 (Fig. 4C).” was replaced by “Targeted mutagenesis of key genes involved in two of the three known QS pathways in vibrios (Fig. S3), ΔluxM (HAI-1 production), ΔluxS (AI-2 production), and ΔluxR (main high-density QS regulator), did not result in any changes in the attack behavior of V. atlanticus LGP32 (Fig. 4C).”

      And “Taken together these results showed that attack by V. atlanticus LGP32 is not link to QS.” was replaced by “Combined with the absence of overexpression of the CqsS gene (inducible by CAI-1) involved in the last known QS pathway in Vibrio (Fig. S3), these results indicated that the attack by V. atlanticus LGP32 is most likely unrelated to QS.”

      (42) The references to tropism aren't clear.

      You're right, there's no reason to use the term tropism here. We have removed it.

      (43) Line 439 - Why was H3BO4 used as a control for the addition of FeCl3?

      For clarity, the sentence “Boron being known to be a regulator or capable of being transported by vibrioferrin (Romano et al., 2013; Weerasinghe et al., 2013), we tested its potential involvement in the interaction but no effect was evidenced here.” was replaced by “Given that boron is known for its role in regulating a global bacterial cellular response to phytoplankton and to bind to vibrioferrin (Romano et al., 2013; Weerasinghe et al., 2013), we tested its potential involvement in simultaneous vibrio attacks. Compared to the Zobell control, no effect on the number of attacks was observed”

      (44) Line 441 - line 449 - Should explicitly say in text that no attacks were observed for any species other than the Alexandrium and Gymnodinium species.

      We agree and have explicitly stated in the text that no attacks were observed for any species other than Alexandrium and Gymnodinium.

      (45) Line 454 - line 455 - The last part of this sentence seems a strange statement, since

      (i) it has long been known that predatory bacteria can eat a wide range of eukaryotes, (ii) one of the cited papers (Perez et al.) actually highlights a case of bacterial predation on algae, and (iii) in the next paragraph the authors themselves highlight Streptomyces predation of algae.

      To make this clearer, “Among predators, predatory bacteria are found in a wide variety of environments, and like bacteriophages and predatory protists, they have been reported to prey exclusively on other bacteria” was replaced by “Among predators, predatory bacteria are found in a wide variety of environments and, like bacteriophages and predatory protists, feed primarily on other bacteria, although a few cases of predation on microbial eukaryotes have also been reported.”

      (46) Line 455 - Better to clarify the authors' definition of a predator at the start of the paper. The offered definition seems more like a definition of 'consumer' than 'predator', as the latter normally involves both the killing and consumption of other organisms, not just consumption with some kind of "expense".

      To address this comment:

      - “predator behaviour” was replaced by “predator-like behaviour”

      - and “Considering predator as a free organism that feeds at the expense of another, this study is the first evidence of the capacity of some Vibrio to develop a predatory strategy against an alga. This behaviour differs from parasitism, because the survival of Vibrio is not exclusively dependent on algae in environment” was replaced by “Consider a predator as a free-living organism that kills its prey and feeds on it, this study provides data suggesting the ability of Vibrios to develop an original predator-like behaviour to kill and feed on algae.”

      (47) Line 457 - Don't see the benefit of trying to distinguish from parasitism here, especially since parasitism can be facultative, whereas the authors' phrasing suggests that it is always obligate.

      You are right, this sentence has been deleted.

      (48) Line 463 - line 464 - The authors should clearly explain exactly what detailed aspects of Myxococcus and Lysobacter predation they think the "attack stage" of V. atlanticus resembles.

      Accordingly, “The second stage, the ‘attack stage’ corresponding to physical contact between Vibrio and Alexandrium resembles the ‘wolf-pack attack’ strategy described for Myxococcus xanthus and Lysobacter regardless of the prey species used, M. xanthus must be in close proximity to prey cells in order to induce their lysis and to benefit from their biomass (Martin, 2002; Perez et al., 2014)” was replaced by “The second stage, the ‘attack stage’ corresponding to the physical contact between Vibrios and Alexandrium, is similar to the strategy used by Myxococcus xanthus and Lysobacter. These bacteria must be in close proximity to their prey in order to cause lysis and utilize their biomass, regardless of the prey's species (Martin, 2002; Genovesi et al., 2013; Perez et al., 2016; Zhang et al., 2020)”

      (49) Line 466 - line 467 - The comparison to bacteria clustering around lysed cells is surprising since the authors show that V. atlanticus does not attack already lysed cells.

      The sentence was rephrased: “This phenomenon is comparable to that of bacteria clustering around lysed ciliate cells” was replaced by “Visually, this phenomenon resembles bacteria clustering around lysed ciliate cells.”

      (50) Line 469 - Missing is a statement of exactly what criteria constitute "wolf-pack hunting behaviour" and exactly how V. atlanticus meets those criteria.

      To address this point, “wolf-pack hunting behaviour” was replaced by “predator-like behaviour”.

      'Able of' should be corrected to 'Capable of'.

      We agree and have reworded the sentence.

      (51) Line 470 - Consider starting a new paragraph for the material on quorum sensing.

      Accordingly, we have separated the section concerning the QS pathway from the section concerning the iron pathway.

      (52) As part of their discussion on the role of iron uptake, can the authors comment on any relationship between starvation and iron uptake, and in particular the observations that, while general nutrient deprivation induces attacks, supplementation with a specific nutrient (iron) also induces attacks (Figure 4D)? Do bacteria starved for general growth substrates take up more iron than growing bacteria?

      To respond to this comment, “Future study could demonstrate further the role of vibrioferrin in group attack, by adding iron-saturated vibrioferrin to algae-Vibrio co-cultures.” was replaced by “Interestingly, if a general nutrient deficiency causes attacks, iron supplementation increases the number of attacks (Figure 4D), suggesting the importance of iron absorption in the attack behavior. Future studies should determine whether nutrient deficiency increases the iron absorption capacity of Vibrios and whether this plays a major role in the attack mechanism.”

      (53) Line 486 - Of what is boron known to be a regulator?

      To respond to this comment, “Given that boron is known for its regulatory properties and for being transportable by vibrioferrin” was replaced by “Given that boron is known for its role in regulating a global bacterial cellular response to phytoplankton and to bind to vibrioferrin”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work demonstrates that MORC2 undergoes phase separation (PS) in cells to form nuclear condensates, and the authors demonstrate convincingly the interactions responsible for this phase separation. Specifically, the authors make good use of crystallography and NMR to identify multiple protein: protein interactions and use EMSA to confirm protein: DNA interactions. These interactions work together to promote in vitro and in cell phase separation and boost ATPase activity by the catalytic domain of MORC2.

      However, the authors have very weak evidence supporting their potentially valuable claim that MORC2 PS is important for the appropriate gene regulatory role of MORC2 in cells. Exploring causal links between PS and function is an important need in the phase separation field, particularly as regards the role of condensates in gene regulation, and is a non-trivial matter. Any study with convincing data on this matter will be very important. For this reason, it is crucial to properly explore the alternative possibility that soluble complexes, existing in the same conditions as phase-separated condensates, are the functional species. It is also critical to keep in mind that, while a specific protein domain may be essential for PS, this does not mean its only important function pertains to PS.

      In this study, the authors do not sufficiently explore the role that soluble MORC2 complexes may play alongside MORC2 condensates. Neither do they include enough data to solidly show that domain deletion leads to phenotypes via a loss of phase separation per se, rather than the loss of phase separation being a microscopically visible result, not cause, of an underlying shift in protein function. For these reasons, the authors' conclusions regarding the functional role of MORC2 condensates are based on incomplete data. This also dampens the utility of this work as a whole, since the very nice work detailing the mechanism of MORC2 PS is not paired with strong data showing the importance of this observation.

      We thank the reviewer for this thoughtful and constructive critique. We agree that establishing a causal link between phase separation (PS) and biological function—particularly in transcriptional regulation—is a central and non-trivial challenge in the condensate field. We also appreciate the reviewer’s emphasis on two critical alternative interpretations: (i) that soluble MORC2 complexes, rather than condensates, may represent the primary functional species, and (ii) that loss of phase separation upon domain deletion could reflect a downstream consequence of altered protein function rather than its cause.

      To address these concerns, we have performed a series of new experiments specifically designed to decouple condensate formation from condensate dynamics, thereby allowing us to interrogate the functional relevance of MORC2 condensates more rigorously.

      First, to overcome the limitation that domain deletions may affect MORC2 function beyond phase separation, we introduced a micropeptide-based kill switch (KS) at the C terminus of MORC2. This strategy has recently emerged as a powerful approach to selectively reduce condensate dynamics without disrupting protein expression, folding, or domain architecture [1]. Importantly, unlike the CC3 or IDRa deletions, MORC2+KS robustly forms nuclear condensates but exhibits markedly reduced internal dynamics, as demonstrated by FRAP analyses showing minimal fluorescence recovery after photobleaching (Fig. 6a-c). This strategy therefore allows us to perturb condensate material properties independently of MORC2 domain integrity.

      Second, we systematically compared the transcriptional consequences of rescuing MORC2-knockout HeLa cells with MORC2FL, condensation-deficient mutants (ΔCC3 and ΔIDRa), and the dynamics-defective MORC2+KS (Fig. 6d). Despite being expressed at substantially higher levels than MORC2FL (Fig. 6e), all three mutants showed a striking and consistent failure to restore MORC2-dependent transcriptional regulation (Fig. 6f-h). This effect was particularly pronounced for transcriptionally repressed genes, including two sets of high-confidence MORC2 targets reported in prior studies (Fig. 6i and Fig. S10). These findings demonstrate that neither increased protein abundance nor the mere presence of condensate-like structures alone is sufficient to restore MORC2 function.

      Third, our data instead support a model in which both soluble MORC2 complexes and dynamic MORC2 condensates are required for full transcriptional regulation activity. While soluble MORC2 is likely involved in target recognition and complex assembly, our results indicate that proper condensate formation—and critically, condensate dynamics—are essential for effective transcriptional repression and activation. The inability of the MORC2+KS mutant to rescue transcriptional defects, despite intact condensate formation, points away from a model in which MORC2 condensates represent only microscopically visible byproducts of MORC2 activity.

      We believe these new data strengthen the manuscript by pairing the detailed mechanistic dissection of MORC2 phase separation with direct functional evidence, enhancing the conceptual impact and biological significance of the study.

      Strengths:

      Static light scattering and crystallography are nicely used to demonstrate the dimerization of MORC2FL and to discover the structure of the CC3 domain dimer, presumably responsible for the dimerization of MORC2FL (Figure 1).

      Extensive use of deletion mutants in multiple cell lines is used to identify regions of MORC2 that are important for forming condensates in the nucleus: the IBD, IDR, and CC3 domains are found to be essential for condensate formation, while the CW domain plays an unknown role in condensate morphology (Figure 3). The authors use NMR to further identify that the IBD domain seems to interact with the first third of the centrally located IDR, termed IDRa, but not with the latter two-thirds of the IDR domain (Figure 4). This leads them to propose that phase separation is the product of IBD:IDRa interaction, CC3 dimerization, and an unknown but important role for the CW domain.

      Based on the observation that removal of the NLS resulted in diffuse cytoplasmic localization, they hypothesized that DNA may play an important role in MORC2 PS. EMSA was used to demonstrate interaction between DNA and several MORC2 domains: CC1, CC2, IDR, and TCD-CC3-IBD. Further in vitro microscopy with purified MORC2 showed that DNA addition significantly reduces MORC2 saturation concentration (Figure 5).

      These assays convincingly demonstrate that MORC2 phase separates in cells, and identify the protein domains and interactions responsible for this phenomenon, with the notable caveat that the role of the CW domain here is left unexplored.

      We thank the reviewer for their positive and detailed assessment of the strengths of our study. Our understanding of the CW domain’s function remains preliminary. Although we observed that the CW domain can influence condensate size, the IDR, IBD, and CC3 domains constitute the core structural elements driving phase separation. Consequently, the CW domain was not a primary focus of the current study. Nonetheless, investigating its functional contributions represents an interesting avenue for future work.

      Weaknesses:

      Although the authors demonstrated phase separation of MORC2FL, their evidence that this plays a functional role in the cell is incomplete.

      Firstly, looking at differentially upregulated genes under MORC2FL overexpression, the authors acknowledge that only 10% are shared with differentially regulated genes identified in other MORC2FL overexpression studies (Figure 6c, d). No explanation is given for why this overlap is so low, making it difficult to trust conclusions from this data set.

      We thank the reviewer for raising this important concern. In response, we have improved the quality and robustness of our RNA-seq analysis by repeating the experiments with optimized sample handling and increased sequencing depth. Using this updated dataset, we identified a considerably higher overlap between MORC2-regulated genes in our study and those reported previously.

      Specifically, we observed 84 overlapping genes with the study by Nikole L. Fendler et al. [2], corresponding to approximately 32% of the MORC2-regulated genes reported in that work (Fig. 6i). In addition, we identified 102 overlapping genes with the dataset reported by Iva A. Tchasovnikarova et al. [3], representing approximately 22% of the genes identified in that study (Fig. S10b).

      We note that complete concordance with previous reports is not expected, given substantial differences in experimental design. For example, Fendler et al. employed a doxycycline-inducible MORC2 expression system [2], whereas our study relies on transient overexpression in MORC2-knockout HeLa cells. In contrast, Tchasovnikarova et al. compared transcriptomes between MORC2 knockout and wild-type cells [3], rather than MORC2 rescue conditions. Moreover, RNA-seq results are inherently influenced by cell line batch variability, sequencing depth, and analysis pipelines, all of which differ across studies.

      Taken together, we consider an overlap in the range of ~20–30% to be reasonable and biologically meaningful in the context of these experimental differences, and we believe that the revised RNA-seq data provide a more reliable foundation for our conclusions regarding MORC2-dependent transcriptional regulation.
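
      As a minimal sketch of how an overlap percentage of this kind can be computed, the snippet below intersects two gene lists and reports the shared count as a fraction of the reference list. The function name and the toy gene identifiers are illustrative assumptions, not part of the analysis pipeline used in the study. The sizes of the reference gene lists are not stated in this response; the reported values (84 genes at about 32%, 102 genes at about 22%) would imply reference lists of roughly 260 and 460 genes, respectively.

```python
def overlap_with_reference(ours, reference):
    """Return (n_shared, fraction_of_reference) for two collections of gene IDs."""
    ours, reference = set(ours), set(reference)
    if not reference:
        raise ValueError("reference gene set is empty")
    shared = ours & reference  # genes present in both lists
    return len(shared), len(shared) / len(reference)


# Toy gene IDs, purely illustrative (hypothetical, not from any dataset)
ours = {"G1", "G2", "G3", "G4", "G5"}
reference = {"G2", "G3", "G4", "G9"}

n_shared, frac = overlap_with_reference(ours, reference)
print(f"{n_shared} shared genes = {frac:.0%} of the reference set")
```

      Note that the percentage is expressed relative to the reference study, so the same number of shared genes gives a different percentage when the denominator is our own gene list.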

      Secondly, of the 21 genes shared in this study and in earlier studies, the authors note that the differential regulation is less pronounced when a phase-separation-deficient MORC2 mutant is overexpressed, rather than MORC2FL (Figure 6e). This is taken as evidence that phase separation is important for the proper function of MORC2. However, no consideration is made for the alternative possibility that the mutant, lacking the CC3 dimerization domain, may result in non-functional complexes involving MORC2, eliminating the need for a PS-centric conclusion. To take the overexpression data as solid evidence for a functional role of MORC2 PS, the authors would need to test the alternative, soluble complex hypothesis. Furthermore, there seems to be low replicate consistency for the MORC2 mutant condition (Figure S6a), with replicate 3 being markedly upregulated when compared to replicates 1 and 2.

      We thank the reviewer for raising these important concerns. In the revised manuscript, we have substantially strengthened both the experimental evidence and the data presentation to directly address the alternative “soluble complex” interpretation as well as the issue of replicate consistency. Specifically, we now provide data that clarify the functional impact of phase-separation-deficient MORC2 mutants and explicitly show replicate-level RNA-seq analyses. Fig. 6 and Fig. S10 support these improvements and enhance both the robustness and transparency of our transcriptional analyses. Collectively, these revisions directly address the reviewer’s concerns regarding the functional interpretation of MORC2 phase separation.

      Thirdly, the authors close by examining the in-cell PS capabilities and ATPase activity of several disease-associated mutants of MORC2 (Figure 7). However, the relevance of these mutants to the past 6 figures is unclear. None of these mutations is in regions identified as important for PS. Two of the mutations result in a higher percentage of the cell population being condensate-positive, but this is not seemingly connected to ATPase activity, as only one of these two mutants has increased ATPase activity. Figure 7 does not add any support to the main hypotheses in the paper, and nowhere in the paper do the authors investigate the protein regions where the mutations in Figure 7 are found.

      We thank the reviewer for raising this point regarding Fig. 7. At the current stage, the results for disease-associated mutations are primarily descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity [4], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent [4]. Our results further suggest that MORC2’s phase separation behavior is independent of both ATP and DNA binding affinity, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      We would also like to emphasize an additional observation that may help contextualize the relevance of N-terminal mutations. Although deletion of the MORC2 N-terminus does not prevent the remaining C-terminal region from forming nuclear condensates, these C-terminal condensates exhibit a marked loss of fluorescence recovery in FRAP assays (Fig. S11). This finding suggests that while the N-terminus is not strictly required for condensate assembly, it plays an important role in regulating condensate fluidity. Accordingly, disease-associated mutations distributed across the N-terminal region may influence MORC2 function by modulating condensate material properties rather than condensate formation per se. Based on this hypothesis, we evaluated the fluidity of condensates formed by the E236G and T424R mutants. FRAP measurements indicated substantially reduced fluorescence recovery in E236G, whereas T424R exerted minimal effects (Fig. 7e, f).

      Overall, our interpretation of the results in Fig. 7 is still at a preliminary stage. Nevertheless, the role of the MORC2 N-terminus in modulating condensate fluidity, together with the observed impairment caused by the E236G mutation, appears to be robust, although the underlying mechanism remains to be elucidated. We have incorporated additional discussion on this point and consider it an important direction for future study.

      Reviewer #1 (Recommendations for the authors):

      (1) Why does MORC2 overexpression lead to changes in gene regulation that are so different from past MORC2 overexpression studies? This is unsettling to me.

      (2) Likewise, why is replicate 3 for the MORC2ΔCC3 variant so different from replicates 1 and 2? Perhaps repeating this experiment would be helpful, both for showing better repeatability and perhaps as regards pulling out a stronger phenotype.

      We have repeated the experiments and obtained improved data quality.

      (3) A better explanation of the relevance of Figure 7 to the story of the rest of the paper, especially the phase-separation of MORC2, would be important to improving this paper.

      We thank the reviewer for this suggestion. We have performed additional experiments and expanded the discussion.

      (4) Are expression levels of mutant proteins in Figure 7 uniform between mutants? If not, is it possible that expression levels might account for the difference in condensate-positive cells between mutants?

      We cannot fully exclude the possibility that differences in expression levels contribute to the observed differences among mutants. Equal amounts of plasmid DNA were used for transfection across all conditions, but we did not directly quantify post-transfection protein expression levels by immunoblotting or similar approaches. Even if certain mutations do affect protein expression, it would be technically challenging to fully normalize expression levels across mutants.

      Importantly, we note that MORC2 does not form condensates in all transfected cells, even when EGFP fluorescence indicates robust expression levels that are comparable to, or even exceed, those observed in condensate-positive cells. This observation suggests that high expression alone is not sufficient to drive MORC2 phase separation in cells. Therefore, we do not favor the interpretation that the E236G and T424R mutations enhance MORC2 condensation simply by increasing MORC2 protein expression levels.

      Minor:

      (1) I would suggest considering using the term "dynamic" rather than "liquid-like", as FRAP is technically a measurement of the dynamicity of a protein within a volume, rather than a measurement of the actual fluidity of that volume.

      We thank the reviewer for this helpful suggestion. We agree that FRAP measurements primarily report protein mobility and condensate dynamics rather than the physical fluidity of the condensates. We have therefore revised the manuscript to replace “liquid-like” with “dynamic” where conclusions are based on FRAP analyses.

      (2) A further investigation of the role of the CW domain would be very interesting, since it clearly has a major role in condensate morphology. Perhaps CW confers important heterotypic interactions which contribute to compositional control of the MORC2 condensates, and thus function and morphology? However, due to the complexity of this specific question and the potentially marginal improvement offered by this paper, I do not think this is a critical addition.

      We thank the reviewer for this insightful suggestion. We have noted this possibility in the Discussion as an important avenue for future investigation.

      (3) Why is TCD not tested alone by EMSA for affinity to DNA in Figure 5?

      Our inference regarding the DNA-binding capacity of the TCD domain was based on comparative EMSA analyses. Specifically, we found that the TCD–CC3–IBD fragment was able to bind DNA, whereas the CC3–IBD fragment alone showed no detectable DNA binding. From this comparison, we inferred that the TCD domain is responsible for the observed DNA-binding activity.

      Because the TCD domain does not affect MORC2 condensate formation, it was not a central focus of the present study, which primarily aims to elucidate the mechanisms underlying MORC2 phase separation and its functional relevance. For this reason, we did not further test TCD alone by EMSA in Figure 5.

      Reviewer #2 (Public review):

      Summary:

      The study by Zhang et al. focuses on how phase separation of a chromatin-associated protein MORC2, could regulate gene expression. Their study shows that MORC2 forms dynamic nuclear condensates in cells. In vitro, MORC2 phase separation is driven by dimerization and multivalent interactions involving the C-terminal domain. A key finding is that the intrinsically disordered region (IDR) of MORC2 exhibits strong DNA binding. They report that DNA binding enhances MORC2's phase separation and its ATPase activity, offering new insights into how MORC2 contributes to chromatin organization and gene regulation. The authors try to correlate MORC2's condensate-forming ability with its gene silencing function, but this warrants additional controls and validation. Moreover, they investigate the effect of disease-linked mutations in the N-terminal domain of MORC2 on its ability to form cellular condensates, ATPase activity, and DNA-binding, though the findings appear inconclusive in the manuscript's current form.

      Thank you for your thorough and constructive review of our manuscript. In response to the concerns raised regarding the functional relevance of MORC2 condensate formation, we have redesigned and expanded the experiments presented in Fig. 6 and Fig. S6 to directly link MORC2’s condensate-forming capacity with its transcriptional regulatory function. These new experiments provide additional controls and validation, strengthening the causal relationship between MORC2 condensate dynamics and gene regulation.

      At the current stage, the results for disease-associated mutations are descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity [4], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent [4]. Our results further suggest that MORC2’s phase separation behavior is also independent of both ATP and DNA binding, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      Strengths:

      The authors determined a 3.1 Å resolution crystal structure of the dimeric coiled-coil 3 (CC3) domain of MORC2, revealing a hydrophobic interface that stabilizes dimer formation. They present extensive evidence that MORC2 undergoes liquid-liquid phase separation (LLPS) across multiple contexts, including in vitro, in cellulo, and in vivo. Through systematic cellular screening, they identified the C-terminal domain of MORC2 as a key driver of condensate formation. Biophysical and biochemical analyses further show that the IDR within the C-terminal domain interacts with the C-terminal end region (IBD) and also exhibits strong DNA-binding capacity, both of which promote MORC2 phase separation. Together, this study emphasizes that interactions mediated by multiple domains (CC3, IDR, and IBD) drive MORC2 phase separation. Finally, the authors quantified the effect of removing the CC3 on the upregulation and downregulation of target gene expression.

      We thank the reviewer for their appreciation of the key findings presented in this manuscript.

      Weaknesses:

      Though the findings appear compelling in isolation, the study lacks discussion on how its findings compare with previous studies. Particularly in the context of MORC2-DNA binding, there are previous studies extensively exploring MORC2-DNA binding (Tan, W., Park, J., Venugopal, H. et al. Nat Commun 2025), and its effect on ATPase activity (ref 22). The contradictory results in ref 22 about the impact of DNA-binding on ATPase activity, and ATPase activity on transcriptional repression, warrant proper discussion. The authors performed extensive in-cellulo screening for the investigation of domain contribution in MORC2 condensate formation, but the study does not consider/discuss the possibility of some indirect contributions from the complex cellular environment. Alternatively, the domain-specific contributions could be quantified in vitro by comparing phase diagrams for their variants. While the basis of this study is to investigate the mechanism of MORC2 condensate-mediated gene silencing, the findings in Figure 6 appear incomplete because the CC3 deletion not only affects phase separation of MORC2 but also dimerization. Furthermore, their investigation on disease-linked MORC2 mutations appears very preliminary and inconclusive because there are no obvious trends from the data. Overall, the discussion appears weak as it is missing references to previous studies and, most importantly, how their findings compare to others'.

      We thank the reviewer for their careful assessment of MORC2’s DNA-binding properties and its relationship with ATPase and transcriptional activities. We would like to offer the following clarifications to address these concerns, which will also be incorporated into the Discussion section of the revised manuscript.

      First, recent work by Tan et al. [5] similarly identified multiple DNA-binding sites in MORC2, consistent with our findings, though there are discrepancies in the precise binding regions. In particular, they reported that isolated CC1 and CC2 domains do not bind 60 bp dsDNA, which contrasts with our observations. We attribute this difference to the types of DNA used in the assays. In our study, we employed 601 DNA, a defined nucleosome-positioning sequence, which differs substantially from randomly designed short dsDNA. For instance, prior work by Christopher H. Douse et al. [54] also confirmed that MORC2’s CC1 domain can bind 601 DNA.

      Second, in the study by Fendler et al. [2], DNA binding was reported to reduce MORC2’s ATPase activity, an observation that appears inconsistent with the results presented in our Fig. 5j. A critical distinction between the two studies lies in the experimental systems used: Fendler et al. [2] employed truncated MORC2 constructs and 35 bp double-stranded DNA (dsDNA), whereas our experiments utilized full-length MORC2 and 601 bp DNA (a sequence with high nucleosome assembly potential). These differences, including the absence of potentially regulatory C-terminal regions in the truncated construct and the varying length and structural properties of the DNA substrates, introduce variables that substantially complicate direct comparison of ATPase activity outcomes.

      Separately, Douse et al. [4] demonstrated that the efficiency of HUSH complex-dependent epigenetic silencing decreases as MORC2’s ATP hydrolysis rate increases, implying an inverse relationship between ATPase activity and silencing function. Notably, our current work has not established a direct mechanistic link between MORC2 phase separation and its ATPase activity. Thus, we refrain from inferring that the effect of MORC2 phase separation on transcriptional repression is mediated through modulation of its ATPase function; this remains an important question to address in future studies.

      Finally, we have redesigned and expanded the experiments presented in Fig. 6 and Fig. S6 to directly link MORC2’s condensate-forming capacity with its transcriptional regulatory function.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Unaddressed discrepancies with the previous study:

      (a) Inadequate discussion of Reference 22 and apparent contradictions. Notably, Reference 22 provides evidence for reduced ATPase activity upon DNA binding, in contrast to the current study's observations. Moreover, Reference 22 demonstrates that ATP hydrolysis (ATPase activity) is inversely associated with MORC2-mediated gene silencing, whereas this study concludes that 'the silencing function of MORC2 requires its ATPase activity'. These apparent contradictions warrant a more thorough discussion to reconcile the differences, including potential mechanistic explanations and experimental context that could account for the discrepancies. Additionally, the authors should discuss potential reasons why Ref. 22 may not have observed phase separation during MORC2 biophysical analysis. For instance, in Ref. 22, SEC-MALS was performed at 2 mg/mL (~16 µM) MORC2 FL in the presence of 150 mM NaCl, conditions that could influence phase behavior based on the current manuscript's results. Addressing whether differences in protein construct, buffer composition, or experimental design might account for this discrepancy would strengthen the discussion.

      We thank the reviewer for pointing out the apparent discrepancies between our results and those reported in Ref. 22. We agree that these differences warrant explicit discussion, and we have revised the Discussion accordingly to clarify the experimental and conceptual distinctions between the two studies.

      First, regarding the effect of DNA binding on ATPase activity, Ref. 22 examined MORC2 ATPase activity under conditions where MORC2 does not undergo detectable phase separation, whereas our ATPase assays were performed under conditions in which MORC2 readily forms condensates in the presence of DNA. We therefore propose that the observed increase in ATPase activity in our study may reflect a distinct biochemical regime in which phase separation and/or high local protein concentration modulates enzymatic activity. Importantly, our data do not exclude the possibility that DNA binding per se can inhibit ATPase activity under non-condensing conditions, as reported in Ref. 22.

      Second, with respect to transcriptional repression, Ref. 22 reported an inverse correlation between ATP hydrolysis and MORC2-mediated silencing, whereas our study finds that ATPase activity is required for efficient repression. We suggest that these observations are not necessarily contradictory but may reflect different regulatory layers of MORC2 function. Specifically, ATP binding and hydrolysis may be required for MORC2 structural remodeling and chromatin engagement, while excessive or dysregulated ATP hydrolysis could impair stable silencing complexes, as suggested previously [4]. We now explicitly discuss this possibility in the revised manuscript.

      Finally, we appreciate the reviewer’s suggestion regarding the absence of phase separation in Ref. 22. Indeed, SEC-MALS experiments in Ref. 22 were conducted at ~16 µM MORC2 in the presence of 150 mM NaCl (the purification condition is 500 mM NaCl, 10% glycerol), conditions that, based on our phase diagrams, are close to or above the saturation concentration but also strongly influenced by ionic strength. This combination of factors explains why the UV peak from SEC-MALS is not indicative of a homogeneous sample [3].
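
      For readers comparing concentrations across studies, the conversion between mass and molar concentration quoted above (2 mg/mL, ~16 µM) can be sketched as follows. The molecular weight of 125 kDa is an assumption back-calculated from those two quoted values, not a figure taken from either study, and the function name is illustrative.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration (mg/mL) to micromolar, given the MW in kDa."""
    if mw_kda <= 0:
        raise ValueError("molecular weight must be positive")
    # mg/mL equals g/L; dividing by g/mol (kDa * 1000) gives mol/L; 1e6 converts to µM
    return conc_mg_per_ml / (mw_kda * 1000.0) * 1e6


# Assumed MW, back-calculated from "2 mg/mL (~16 µM)"; adjust for the actual construct
ASSUMED_MW_KDA = 125.0

print(round(mg_per_ml_to_micromolar(2.0, ASSUMED_MW_KDA), 1))
```

      Because the result scales inversely with molecular weight, a truncated construct at the same mg/mL corresponds to a higher molar concentration than the full-length protein.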

      (b) The DNA binding capacity of individual MORC2 domains was tested in Fig. 5. IDR appears to be the strongest DNA binder among others. Is this the effect of IDR being isolated from the rest of the protein? A recent paper (Tan, W., Park, J., Venugopal, H. et al. Nat Commun 2025) also investigated DNA binding capacity of different regions of MORC2 using hydrogen-deuterium exchange experiments and EMSA. Interestingly, it can be seen in Figure S9 that the DNA binding capacity of different regions changes when compared together to when in isolation (MORC2 1-603 vs 1-265; 1-495; 496-603). In line with the above, MORC2 IDR's interaction with DNA warrants additional investigation, taking the system as a whole to avoid misinterpretation arising from non-specific interactions.

      We appreciate the reviewer’s insightful comments regarding domain-specific DNA binding and the potential caveats of studying isolated regions. In Figure 5, our EMSA analyses show that the isolated IDR exhibits the strongest DNA-binding signal among the tested fragments. We agree that this observation may, at least in part, reflect the removal of structural or regulatory constraints imposed by the full-length protein.

      Consistent with the reviewer’s point, Tan et al. [5] demonstrated that DNA-binding behavior of MORC2 regions differs when analyzed in isolation versus in the context of larger constructs. We have now incorporated this comparison into the Discussion and explicitly note that DNA binding by the IDR should be interpreted as a contextual and potentially cooperative property rather than an autonomous function.

      Importantly, our conclusions do not rely on the IDR acting as an independent DNA-binding module in vivo. Rather, we propose that the IDR contributes to DNA engagement and phase behavior within the architectural framework of full-length MORC2. We now emphasize this limitation and highlight the need for future studies that probe DNA binding in the context of intact MORC2 or minimally perturbed constructs.

      (2) MORC2 DNA binding impacting phase separation and ATPase activity:

      While it is clear that MORC2: DNA interaction facilitates MORC2 phase separation, the impact on ATPase activity is not conclusive. First, they observe an opposite trend (compared to ref. 22) for DNA binding on MORC2's ATPase activity. Secondly, it is not clear if the increase in ATPase activity is mediated by DNA binding or phase separation. The ATPase activity was measured at 1 µM MORC2 protein concentration in the presence of DNA, where MORC2 appears to phase separate. To draw more definitive conclusions, additional controls are necessary. Specifically, a phase separation-deficient mutant (from this study) and a DNA-binding-deficient mutant (see ref. 22) should be included to disentangle the contributions of DNA binding and phase separation to ATPase activity. The choice of ATP-binding-deficient mutant N39A as a negative control seems inconclusive in this regard. Additionally, why is there an increase in ATP hydrolysis rate for the ATP-binding-deficient mutant in the presence of DNA, resulting in ATP hydrolysis rates similar to WT MORC2? This raises further questions about the underlying mechanism.

      We agree with the reviewer that disentangling the contributions of DNA binding and phase separation to ATPase activity is challenging and that our current data do not fully resolve this issue. As noted, ATPase assays were performed at protein concentrations (1 µM) where MORC2 undergoes DNA-induced phase separation, making it difficult to distinguish whether enhanced ATP hydrolysis arises directly from DNA binding or indirectly from condensate formation.

      We acknowledge that including additional mutants, such as phase separation-deficient or DNA-binding-deficient variants, would provide a more definitive mechanistic separation of these effects. However, generating and validating such mutants in a manner that preserves overall protein integrity is beyond the scope of the current study. Accordingly, we have revised the text to present our findings more cautiously and to frame the observed ATPase enhancement as a correlation rather than a causal mechanism.

      Regarding the ATP-binding–deficient N39A mutant, we agree that its behavior in the presence of DNA raises interesting mechanistic questions. We now explicitly note this unexpected observation and discuss possible explanations, including partial ATP binding, altered oligomeric states, or indirect effects mediated by condensate formation.

      (3) Dissecting the domain-specific contribution in MORC2 phase separation:

      (a) While in cellulo data indicate that the presence of IDR, NLS, CC3, and IBD is all essential for MORC2 condensate formation, it is not clear if this is the effect of the complex cellular environment or whether it is intrinsic for MORC2 phase separation ability. In lines 256-259, the authors suggest IDRa interaction with IBD may serve as a nucleation mechanism for LLPS. In other places, it has been mentioned that CC3 dimerization acts as a scaffold for condensate formation. It is not clear if all of these are essential for MORC2 phase separation, or one of them is essential while the other domain(s) facilitates the phase separation. Though Figure 3 provides a qualitative overview of the contribution of different regions in MORC2 phase separation in cellulo (influenced by the complex cellular environment and substrate interactions), the absolute domain contribution in phase separation would be better studied in vitro by quantitatively comparing phase diagrams (for example, c-sat vs temperature) of different domain deletion constructs.

      We thank the reviewer for highlighting the distinction between intrinsic phase separation propensity and cellular context-dependent effects. Our in cellulo screening was designed to identify regions required for condensate formation under physiological conditions, where chromatin, binding partners, and macromolecular crowding are present. We agree that this approach does not directly quantify the intrinsic phase separation contribution of individual domains.

      While CC3 dimerization, IDR–IBD interactions, and nuclear localization all contribute to condensate formation, our data do not imply that these elements are mechanistically equivalent. Rather, we propose that CC3 provides a structural scaffold, while IDR-mediated interactions lower the energetic barrier for condensation. We have revised the manuscript to clarify this hierarchical model and to avoid implying that all domains contribute equally or independently.

      We agree that quantitative in vitro phase diagrams would provide valuable insight into intrinsic domain contributions. Neither the MORC2ΔCC3-IBD (1–900) fragment nor the CC3-IBD (900–1032) fragment induces phase separation on its own, whereas the IDR mixed with the CC3–IBD fragment drives robust phase separation; additionally, phase separation is entirely abrogated in the absence of domain–domain interactions. Collectively, these observations indicate that phase separation is contingent on specific domain combinations and their interactions.

      (b) Similarly, for line 228-231: 'Notably, condensates formed exclusively in the nucleus and not in the cytoplasm of transfected HeLa cells, suggesting that chromatin-associated nuclear factors, such as DNA, may contribute to the nucleation or stabilization of MORC2 condensates.' This is an important observation made by the authors. Since MORC2 readily phase separates in vitro under physiological conditions, it is important to discuss why MORC2 does not make condensates in the cytoplasm (in the case of MORC2deltaNLS). In this regard, how does the concentration of overexpressed EGFP-MORC2 constructs compare with in vitro tested droplets of MORC2?

      We thank the reviewer for highlighting this important conceptual point. Although MORC2 readily undergoes phase separation in vitro under physiological buffer conditions, the absence of condensate formation in the cytoplasm of cells expressing MORC2ΔNLS underscores the importance of the nuclear environment in promoting MORC2 assembly.

      The cytoplasm differs fundamentally from the nucleus not only in overall molecular composition but also in the availability of high-valency scaffolds such as chromatin. We propose that chromatin-associated components, particularly DNA, provide a platform that locally concentrates MORC2 and increases its effective valency, thereby facilitating nucleation or stabilization of condensates in the nucleus. In contrast, the cytoplasm lacks such scaffolds, even when MORC2 is expressed at appreciable levels. In cultured cells, MORC2 is seldom observed in the cytoplasm. While specific experimental contexts may facilitate its cytoplasmic localization, such observations are rarely reported [6]. In transfection-based systems, MORC2 predominantly displays droplet-like behavior in the nucleus. Notably, in endogenous EGFP–MORC2 chimeric mice, we detected punctate MORC2 structures in the neuronal cytoplasm of the brain and spinal cord. The functional significance and biophysical state of cytoplasmic MORC2 remain largely unexplored.

      With respect to protein concentration, while EGFP-MORC2 is robustly expressed in cells, direct comparison between cellular expression levels and the protein concentrations used in vitro is inherently challenging. Importantly, in vitro phase separation is driven by bulk protein concentration under defined conditions, whereas in cells, effective local concentration and interaction valency are strongly shaped by spatial confinement and chromatin association. We have revised the manuscript text to emphasize this distinction and to avoid interpreting nuclear specificity as a purely concentration-dependent phenomenon.

      (c) Lines 227-228: '... CW domain restricts condensate overgrowth or fusion', this inference is based on CTDdeltaCW puncta being larger in size (Figure 3a). However, in Figure 4h MORC2deltaIDRb and MORC2deltaIDRc also result in larger puncta. Making a final conclusion that the CW domain restricts condensate overgrowth or fusion warrants additional investigation.

      We thank the reviewer for pointing out the limitation of our original conclusion. We agree that our inference, drawn from the enlarged puncta of CTDΔCW (Figure 3a), that condensate size regulation involves the CW domain was insufficiently rigorous.

      Re-analysis of existing data identifies clear phenotypic disparities between the mutants: MORC2ΔIDRb/ΔIDRc mutants show two distinct phenotypes (reduced puncta number with enlarged size, or unchanged puncta number with uniform enlargement), and their total puncta area per cell is comparable to the WT. By contrast, CTDΔCW mutants display markedly larger puncta relative to the WT. Based on this distinction, we have revised our conclusion to a more cautious formulation: "These observations suggest that the CW domain may participate in regulating initial nucleation size and the exact molecular mechanisms require further investigation."

      (4) MORC2 condensate-mediated gene silencing:

      This is one of the key investigations of this study where the authors evaluate the ability of MORC2 condensates to regulate gene silencing (transcriptional repression). The major concern here is that the authors are drawing their conclusion based on a CC3 domain deletion mutant of MORC2 and comparing it with wild-type MORC2. Notably, the CC3 domain is responsible for MORC2 dimerization, and as the authors quote, 'The dimeric assembly of CC3 is essential for maintaining the structural integrity of the protein', the absence of CC3 would have a direct impact on its function (such as ATPase activity). With these considerations, it is not clear whether the effect of CC3 domain deletion on gene regulation is an effect of no phase separation or a consequence of loss of function. This necessitates additional validation by including other controls, such as IBD domain deletion mutant, IDRa domain deletion mutant, where the phase separation is impeded without affecting dimerization.

      We appreciate the reviewer’s concern regarding the interpretation of CC3 deletion experiments. We agree that CC3 deletion affects both dimerization and phase separation, complicating attribution of gene regulatory effects solely to condensate formation. Our intention was not to claim that loss of repression arises exclusively from impaired phase separation, but rather to demonstrate that disrupting condensate-dynamic capacity correlates with impaired silencing.

      To directly address these concerns, we have performed a series of new experiments specifically designed to decouple condensate formation, condensate dynamics, and protein abundance, thereby allowing us to more rigorously interrogate the functional relevance of MORC2 condensates.

      First, to overcome the limitation of domain deletions, which may affect MORC2 function beyond phase separation, we introduced a micropeptide-based kill switch (KS) to the C terminus of MORC2. This strategy has recently emerged as a powerful approach to selectively reduce condensate dynamics without disrupting protein expression, folding, or domain architecture [1]. Importantly, unlike CC3 or IDRa deletions, MORC2+KS robustly forms nuclear condensates but exhibits markedly reduced internal dynamics, as demonstrated by FRAP analyses showing minimal fluorescence recovery after photobleaching (Fig. 6a-c). This strategy therefore allows us to perturb condensate material properties independently of MORC2 domain integrity.

      Second, we systematically compared the transcriptional consequences of rescuing MORC2-knockout HeLa cells with MORC2FL, condensation-deficient mutants (ΔCC3 and ΔIDRa), and the dynamics-defective MORC2+KS (Fig. 6d). Despite being expressed at substantially higher levels than MORC2FL (Fig. 6e), all three mutants showed a striking and consistent failure to restore MORC2-dependent transcriptional regulation (Fig. 6f-h). This effect was particularly pronounced for transcriptionally repressed genes, including two sets of high-confidence MORC2 targets reported in prior studies (Fig. 6i and Fig. S10). These findings demonstrate that neither increased protein abundance nor the mere presence of condensate-like structures alone is sufficient to restore MORC2 function.

      Third, our data instead support a model in which both soluble MORC2 complexes and dynamic MORC2 condensates are required for full transcriptional activity. While soluble MORC2 is likely involved in target recognition and complex assembly, our results indicate that proper condensate formation and, critically, condensate dynamics are essential for effective transcriptional repression and activation. The inability of the MORC2+KS mutant to rescue transcriptional defects, despite intact condensate formation, argues against a model in which MORC2 condensates represent only microscopically visible byproducts of MORC2 activity.

      We believe these new data strengthen the manuscript by pairing the detailed mechanistic dissection of MORC2 phase separation with direct functional evidence, enhancing the conceptual impact and biological significance of the study.

      (5) Uncertain impact of pathogenic MORC2 mutations:

      Line 356-365: While the statements such as "disease-associated mutations primarily affect enzymatic and phase behaviors rather than DNA affinity" and "these findings provide mechanistic insight into how specific mutations may contribute to distinct pathological outcomes" are conceptually compelling, the data presented in Figure 7b-d do not appear to fully support these conclusions. For many of the mutants, the differences from WT across key parameters (condensation, ATPase activity, and DNA binding) are either modest or statistically insignificant. As such, drawing a unified mechanistic conclusion from these datasets may overstate what the data actually support.

      We agree that the effects of disease-associated MORC2 mutations described in Fig. 7 are modest and, in some cases, statistically insignificant. Our intention was to document observable trends rather than to propose a unified mechanistic framework. We have revised the manuscript to temper these conclusions and to emphasize the descriptive nature of these data.

      (6) Important conceptual clarifications:

      (a) Intrinsically disordered regions (IDRs) are not synonymous with phase separation. As the authors show, it is a combination of IDR-mediated interactions and CC3 dimerization that contributes towards the phase separation of MORC2. While IDRs can act as scaffolds for multivalent weak interactions that may promote biomolecular condensate formation, many IDRs serve other roles (such as mediating transient interactions, signaling, or regulatory functions) without undergoing phase separation. Researchers should avoid generalizing the assumption that the mere presence of IDRs in a protein implies its ability for phase separation. In this regard, authors should consider restructuring some of their generalized statements: Line 87-88: 'Recent studies suggest that intrinsically disordered regions (IDRs) can drive liquid-liquid phase separation (LLPS)' and Line 159-161: 'we noticed a long unstructured region at its C-terminus (Fig. S1b), a characteristic often associated with proteins capable of phase separation'.

      We agree that IDRs are not synonymous with phase separation and have revised the Introduction to avoid generalized statements. The revised text now emphasizes that IDRs can contribute to phase separation in a context-dependent manner and act in concert with structured oligomerization domains such as CC3-IBD.

      (b) Liquid-liquid phase separation: I would suggest switching the phrase to just phase separation. The rationale is that the in vitro studies of MORC2 (FRAP, droplet imaging) do not show liquid-like behavior, but perhaps liquid-solid. The FRAP studies suggest liquid-like behavior for some of the constructs. Given the differences in viscoelastic properties across the in vitro and in cellulo studies, it is better to generalize to "phase separation". Movies for droplet fusion and FRAP, wherever applicable, would be much appreciated. As the nature of in vitro MORC2 droplets appears different than in cells, movie representations of the above would enable readers to better assess the viscoelastic nature of the droplets (whether liquid, gel, etc).

      We appreciate the reviewer’s insight regarding the viscoelastic properties of MORC2. Our experimental data indeed show a disparity in dynamics between the two environments: while in vitro MORC2-FL condensates exhibit relatively low internal mobility, the in cellulo MORC2-FL puncta display high dynamics, characterized by rapid internal recovery in FRAP assays and droplet fusion events (Fig. S2f).

      This contrast suggests that the intracellular microenvironment plays a critical role in regulating the material state of MORC2 condensates. Consequently, we have focused on providing in vivo fusion data, as we believe in vitro characterizations (such as fusion or FRAP under various artificial conditions) may not faithfully represent the physiological behavior of MORC2. We have revised the manuscript to use the more general term “phase separation” or “condensation” and have added a discussion on these limitations to avoid overinterpreting the material properties observed in vitro.

      (7) Methods:

      (a) Figure 6 S2b: If phase separation occurs at, say, 1.8 µM protein concentration, this indicates that the protein has reached its saturation concentration (c-sat). Beyond c-sat, any additional protein should partition into the dense phase, while the concentration of the dilute phase remains constant. However, in this figure, the dilute phase concentration appears to increase with increasing total protein concentration, which is inconsistent with expected phase separation behavior. As the methods section does not have any sub-section for the sedimentation assay, it becomes difficult to understand how this experiment was performed, whether there is any technical discrepancy in the way soluble and pellet fractions were handled and processed for loading onto the gels. This is also the case with Figure 3d.

      We thank the reviewer for carefully examining the sedimentation assay and for raising this important conceptual point. We agree that, for an ideal two-phase system at thermodynamic equilibrium, the concentration of the dilute phase is expected to remain constant once the saturation concentration (c-sat) is reached.
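
      The expected ideal behavior can be illustrated with a minimal sketch of the lever rule for a two-phase system at equilibrium. It assumes a fixed saturation concentration (the 1.8 µM value is the reviewer's hypothetical example) and a fixed dense-phase concentration; the function name and the dense-phase value of 2000 µM are illustrative assumptions, not measured quantities from this study.

```python
def two_phase_split(c_total, c_sat, c_dense):
    """Ideal two-phase partitioning (lever rule).

    Returns (dilute_phase_concentration, dense_phase_volume_fraction).
    All concentrations share one unit (e.g. µM). Hypothetical helper,
    not taken from the manuscript.
    """
    if not 0 < c_sat < c_dense:
        raise ValueError("require 0 < c_sat < c_dense")
    if not 0 <= c_total <= c_dense:
        raise ValueError("require 0 <= c_total <= c_dense")
    if c_total <= c_sat:
        # Below saturation: a single phase, no condensate.
        return c_total, 0.0
    # Above saturation: the dilute phase is pinned at c_sat and the
    # excess protein is accommodated by growing the dense-phase volume.
    phi = (c_total - c_sat) / (c_dense - c_sat)
    return c_sat, phi


if __name__ == "__main__":
    C_SAT = 1.8       # µM, reviewer's illustrative value
    C_DENSE = 2000.0  # µM, assumed for illustration only
    for c_total in (1.0, 1.8, 3.0, 5.0, 10.0):
        dilute, phi = two_phase_split(c_total, C_SAT, C_DENSE)
        print(f"total={c_total:5.1f}  dilute={dilute:4.2f}  dense fraction={phi:.5f}")
```

      In this idealized picture the dilute-phase concentration plateaus at c-sat for every total concentration above it, so a supernatant signal that keeps rising indicates either non-equilibrium behavior or incomplete separation of the two phases.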

      In our study, the sedimentation assay was used as an operational readout to assess concentration-dependent partitioning rather than to quantitatively define equilibrium phase boundaries. The assay involves centrifugation-based separation of supernatant and pellet fractions followed by SDS–PAGE analysis, and therefore does not necessarily report the equilibrium concentrations of coexisting dilute and dense phases. In particular, this approach can be influenced by incomplete physical separation of phases, kinetic trapping, and redistribution of material during handling, especially in systems where condensate maturation or internal reorganization occurs on longer timescales.

      Consequently, the apparent increase in the supernatant fraction with increasing total protein concentration likely stems from kinetic limitations and inherent technical constraints of the sedimentation assay, rather than a genuine deviation from classical phase separation behavior. These caveats are now explicitly clarified in the Methods section, with similar limitations of centrifugation-based assays for defining equilibrium phase behavior of biomolecular condensates reported previously.

      (b) Figure 4: The NMR comparisons appear to be primarily qualitative, lacking quantitative analyses such as chemical shift perturbation (CSP) and intensity ratio plots, which would offer deeper mechanistic insights. The NMR spectra detailing interactions among the IDR domains need to be quantified.

      We thank the reviewer for the suggestion. We have now performed quantitative CSP analyses for the NMR data shown in Fig. 4, and the corresponding CSP plots have been added to the revised manuscript (Fig. S7).

      As expected for interactions mediated by intrinsically disordered regions involved in phase separation, the observed CSPs are generally small. Notably, the CSP profile of IDRa closely matches that observed for the full-length IDR, whereas IDRb and IDRc show minimal perturbations. These results indicate that the interaction is primarily mediated by IDRa, with little contribution from the remaining regions.

      Peak intensities were also analyzed but did not reveal additional residue-specific trends. Together, the quantitative CSP data support our conclusion that the interaction is weak, dynamic, and region-specific, consistent with an IDR-driven, phase-separation-related mechanism. We have added the following statement to the Methods: CSPs were calculated in Hz at 600 MHz using the following equation:
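
      The equation itself is not reproduced in this response. Purely as an illustration of what a CSP expressed in Hz can look like, the sketch below assumes one common convention: the Euclidean norm of the ¹H and ¹⁵N shift differences after converting each from ppm to Hz at the corresponding Larmor frequency. The function name, the gyromagnetic ratio constant, and the choice of convention are assumptions and may differ from the formula used in the manuscript.

```python
import math

# Approximate |gamma(15N)| / gamma(1H); gives ~60.8 MHz for 15N at 600 MHz 1H.
GAMMA_RATIO_N_TO_H = 0.10136


def csp_hz(delta_h_ppm, delta_n_ppm, h_freq_mhz=600.0):
    """Combined 1H/15N chemical shift perturbation in Hz.

    Assumed convention (not confirmed by the source):
    sqrt((dH * nu_H)^2 + (dN * nu_N)^2), with shift differences in ppm
    and Larmor frequencies in MHz, so each product is in Hz.
    """
    n_freq_mhz = h_freq_mhz * GAMMA_RATIO_N_TO_H
    d_h_hz = delta_h_ppm * h_freq_mhz
    d_n_hz = delta_n_ppm * n_freq_mhz
    return math.hypot(d_h_hz, d_n_hz)


if __name__ == "__main__":
    # Hypothetical per-residue shift differences (ppm), not real data.
    print(round(csp_hz(0.010, 0.050), 2))
```

      Working in Hz rather than ppm places both nuclei on a common frequency scale without an empirical ¹⁵N weighting factor, which is one reason this form is sometimes preferred for small perturbations.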

      Minor comments:

      (1) Line 59-60: The Authors mention the HUSH-complex and then the MORC protein family, but do not discuss the relation between the two.

      We thank the reviewer for this comment. We have revised the Introduction to explicitly state that MORC2 may serve as a component of the HUSH complex and to clarify the functional relationship between MORC family proteins and HUSH-mediated transcriptional repression.

      (2) Line 74: 'Despite their structural similarities...', similarities between what all?

      We agree that this statement was ambiguous. We have revised the text to explicitly specify that the comparison refers to structural similarities among MORC family members.

      (3) Line 75: 'MORC-mediated repression remains...', this is the first time the word 'repression' is mentioned in the text and directly as an outstanding question.

      We have revised the Introduction to introduce the concept of transcriptional repression earlier and to provide appropriate context before posing it as an outstanding question.

      (4) The third paragraph does address issues in comments 1 and 3 to some extent, but the introduction needs some restructuring to provide a proper flow of information.

      We agree that the Introduction required restructuring. We have revised this section to improve logical flow, better integrate prior studies, and more clearly articulate the motivation and scope of the present work.

      (5) Line 83-85: How does the presence of IDRs suggest potential regulatory mechanisms?

      We have revised this sentence to clarify that IDRs may contribute to regulatory mechanisms by enabling multivalent and dynamic interactions, rather than implying that IDRs inherently confer regulatory function or phase separation capability.

      (6) Line 106-107: 'To determine whether MORC2 has N- and C-terminal dimerization interfaces similar to those...', reference 14 has already established that CC3 (denoted as CC4 in ref 14) is responsible for dimerization. Consider acknowledging their work in this regard?

      We thank the reviewer for this reminder. We have now explicitly acknowledged Ref. 14, which previously established the role of CC3 (denoted CC4 in that study) in MORC2 dimerization.

      (7) Lines 117-122: Are the authors comparing morphology from negative stain EM with AlphaFold predicted structure (Figure S1a and S1b)? If so, providing a zoomed-in inset from Figure S1a would be helpful.

      Yes, the comparison was intended to relate the negative-stain EM morphology to the AlphaFold-predicted architecture. We have added a zoomed-in inset in Fig. S1a to facilitate clearer comparison.

      (8) Line 152-153: '...even under varying physiological conditions', what are these varying conditions? Are the authors trying to point towards any of their specific results?

      We have revised this phrase to explicitly refer to variations in salt concentration and protein concentration tested in our in vitro assays.

      (9) Line 154-155: 'The dimeric assembly of CC3 is essential for maintaining the structural integrity of the protein', if it has been established, then please provide a reference.

      We thank the reviewer for this suggestion. For MORC family proteins, C-terminal coiled-coil–mediated dimerization is necessary for correct homodimer formation and functional stability (Xie et al., 2019, Cell Commun Signal. 17:160, Ref 14 in the revised manuscript).

      (10) Line 159-161: 'we noticed a long unstructured region at its C-terminus (Figure S1b), a characteristic often associated with proteins capable of phase separation25.', again authors are generalizing a statement which is, in most cases, context-dependent. For example, ref 25 mentions that unstructured regions or IDRs serve as a scaffold for multivalent interactions.

      We agree with the reviewer and have revised this sentence to avoid generalization. The revised text now emphasizes that IDRs may facilitate multivalent interactions in a context-dependent manner, rather than being intrinsically indicative of phase separation. Additionally, we have explicitly cited the mechanistic insight from Reference 25 that IDRs serve as scaffolds for multivalent interactions, to strengthen the logical link between the structural feature and its potential functional relevance.

      (11) Methods section for NMR (Line 665-667) mentions that nucleotides were added to a final concentration of 10 mM. There is no figure or section for MORC2 NMR with added nucleotides/DNA.

      We thank the reviewer for pointing this out. The nucleotide (ATP) addition was part of preliminary NMR trials and is not directly associated with the figures presented. We have deleted this in the Methods section to avoid confusion.

      (12) Line 285-294: Authors compare the effect of DNA binding on the phase separation of both MORC2FL and MORC2 CTDdeltaCW and conclude that DNA-induced condensation is primarily mediated through interactions with the IDR-NLS region. This appears not to be backed by proper control experiments. The authors do not show whether DNA binding mediates any phase separation for the isolated NTD or not? Similarly, what is the effect of DNA binding on MORC2 deltaIDR?

      We thank the reviewer for this insightful comment and agree that additional controls are essential for rigorously dissecting the contribution of DNA binding to MORC2 phase separation. Our interpretation that DNA-enhanced condensation is primarily mediated through the IDR–NLS region was based on comparative analyses of MORC2FL and MORC2 CTDΔCW, together with EMSA results demonstrating that DNA binding activity is conferred by the IDR–NLS–containing region. We acknowledge, however, that DNA binding alone is not sufficient to infer phase separation behavior.

      To address this point, we have performed additional analyses using the isolated NTD’ (residues 1–536) and MORC2 ΔIDR–NLS mutants (Fig. S6). The isolated NTD’ exhibited detectable DNA binding [4] but did not undergo DNA-induced condensation under conditions in which MORC2FL or MORC2 CTDΔCW (residues 537–1032) readily formed condensates, indicating that DNA binding by itself is insufficient to drive phase separation. In parallel, MORC2 ΔIDR–NLS mutants showed severely compromised solubility and stability in vitro, which limited their quantitative characterization in phase separation assays. Nevertheless, under the conditions tested, these mutants did not display DNA-enhanced condensation comparable to MORC2FL.

      Taken together, these observations support a model in which the IDR–NLS region plays a critical role in coupling DNA binding to condensation, while additional domains are required to sustain robust phase separation. We have revised the manuscript text to clarify the experimental scope and to avoid overinterpreting the contribution of DNA binding in the absence of fully reconstituted control systems.

      (13) How did the authors assign the backbone amide NMR chemical shifts for MORC2?

      Backbone assignments of MORC2 IBD (1004–1032) were obtained using SOFAST versions of standard triple-resonance experiments, including HNCACB and CBCACONH, recorded at 298 K. Residual assignment ambiguities were resolved using ¹⁵N-edited HMQC-NOESY-HMQC spectra.

      (14) Line 256: 'The partial compaction of IDRa...', what does the author mean here with 'partial compaction'? How did they measure compaction here?

      Regarding the term “partial compaction”, we apologize for the error: this phrase was mistakenly used in place of “key component”.

      (15) Line 312-315: Why is there even a MORC2 readout for MORC2 KO cells with only EGFP? Also, the authors suggest that IDR deletion may impair mRNA stability or transcription; however, the expression levels of MORC2 deltaIDR and MORC2 deltaCC3 do not appear drastically different in Figure 3a.

      We thank the reviewer for raising these points. The apparent MORC2 signal in MORC2 knockout cells transfected with EGFP alone is due to the presence of residual MORC2 mRNA. Although CRISPR–Cas9–mediated knockout introduces a frameshift that prevents MORC2 protein expression, the mRNA can still be detected by RNA-seq. This is because nonsense-mediated decay (NMD), which targets transcripts with premature stop codons for degradation, is not always 100% efficient. Therefore, some MORC2 transcripts remain and produce detectable RNA-seq reads, even though no functional protein is expressed.

      Regarding the apparent discrepancy in expression levels, Fig. 3a displays only EGFP-positive cells, within which the fluorescence intensity of MORC2ΔIDR and MORC2ΔCC3 appears comparable to that of WT MORC2. However, the overall fraction of EGFP-positive cells is markedly reduced for these mutants compared to WT. Thus, while expression levels among successfully transfected cells are similar, fewer cells express detectable levels of the ΔIDR or ΔCC3 constructs across the total population. We therefore interpret this reduction in EGFP-positive cell fraction as reflecting impaired expression efficiency of these mutants, potentially arising from altered transcriptional output, mRNA stability, or protein stability. We have revised the manuscript text to clarify this distinction and to avoid overinterpreting the underlying mechanism in the absence of direct measurements.

      Author response image 1.

      EGFP, EGFP–MORC2 (FL), EGFP–MORC2 (ΔCC3), and EGFP–MORC2 (ΔIDR) were re-expressed in MORC2-knockout HeLa cells. Confocal imaging revealed that full-length MORC2 formed condensates in the nucleus, whereas mutants lacking either the CC3 or IDR domain failed to exhibit such behavior. Notably, under identical experimental conditions, we observed a marked reduction in the transfection efficiency of the EGFP-MORC2 (ΔIDR) construct. In contrast to the other variants, EGFP signals for ΔIDR were detectable in only a small fraction of the total cell population, despite consistent DNA loading and protocol synchronization. This observation suggests that the IDR might be required not only for biomolecular condensation but also for maintaining the steady-state levels of the MORC2 mRNA/protein or overall cellular fitness.

      (16) Line 330: 'MORC2 deltaCC3 failed to repress any of the 18 downregulated targets...'. This does not appear to be entirely true as repression of some targets (LBH, TGFB2, GADD45A) are closer to MORC2 FL than the EGFP control.

      We thank the reviewer for pointing out this inconsistency and for highlighting the need for precise wording. We have updated the dataset and revised the text to describe the results more accurately. We now describe that the mutants impair MORC2FL-mediated transcriptional regulation, consistent with the overall trend observed across these target genes.

      (17) Line 347-350: Based on the percent of cells with condensates, the authors conclude that CMT2Z-linked E236G and SMA-linked T424R mutants promote MORC2 phase separation. Again, the effect of these mutations on MORC2 condensation in cells may be direct or indirect. This can be investigated by comparing the in vitro effect of these mutations on MORC2 phase separation.

      We thank the reviewer for raising this important point and fully agree that the effects of disease-associated MORC2 mutations on condensate formation in cells may arise from either direct alteration in intrinsic phase separation propensity or indirect influences mediated by the cellular environment.

      In our study, disease-associated MORC2 mutants were assessed for condensate formation in HEK293F cells. Attempts were made to characterize these mutants in vitro; however, the E236G mutant exhibited markedly reduced solubility and stability upon purification, which precluded reliable in vitro phase separation analysis. We therefore evaluated the impact of E236G in cells and found that this mutation significantly impaired the dynamics of nuclear MORC2 condensates. For the T424R mutant, we note that its intracellular condensates displayed FRAP recovery kinetics comparable to those of WT MORC2, suggesting broadly similar dynamic properties of the assemblies formed in cells, but not necessarily implying a direct enhancement of intrinsic phase separation.

      In light of these considerations, we have revised the text in Lines 347–350 to avoid attributing a direct causal role of these mutations in promoting MORC2 phase separation. Instead, we now describe the observed increase in the fraction of cells containing condensates as a descriptive cellular correlation. We further emphasize that systematic in vitro characterization of disease-associated MORC2 mutants will be required to distinguish direct from indirect effects and represents an important direction for future investigation.

      (18) The discussion section lacks referencing to individual figures in the results section as well as previous literature.

      We agree with the reviewer that the Discussion would benefit from clearer integration with both the Results figures and prior literature. In the revised manuscript, we have substantially restructured the Discussion to explicitly reference key figures when interpreting experimental findings and to more clearly distinguish conclusions drawn from specific datasets. In addition, we have expanded citations to previous studies where relevant, particularly in the context of MORC2 DNA binding, ATPase regulation, chromatin association, and disease-linked mutations. These revisions aim to better situate our findings within the existing literature and to guide readers more clearly between experimental observations and their interpretation.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Zhang et al. demonstrates that MORC2 undergoes liquid-liquid phase separation (LLPS) to form nuclear condensates critical for transcriptional repression. Using a combination of in vitro LLPS assays, cellular studies, NMR spectroscopy, and crystallography, the authors show that a dimeric scaffold formed by CC3 drives phase separation, while multivalent interactions between an intrinsically disordered region (IDR) and a newly defined IDR-binding domain (IBD) further promote condensate formation. Notably, LLPS enhances MORC2 ATPase activity in a DNA-dependent manner and contributes to transcriptional regulation, establishing a functional link between phase separation, DNA binding, and transcriptional control. Overall, the manuscript is well-organized and logically structured, offering mechanistic insights into MORC2 function, and most conclusions are supported by the presented data. Nevertheless, some of the claims are not sufficiently supported by the current data and would benefit from additional evidence to strengthen the conclusions.

      Thank you for your insightful review and constructive suggestions, which have been invaluable in refining our manuscript.

      The following suggestions may help strengthen the manuscript:

      Major comments:

      (1) The central model proposes that multivalent interactions between the IDR and IBD promote MORC2 LLPS. However, the characterization of these interactions is currently limited. It is recommended that the authors perform more systematic analyses to investigate the contribution of these interactions to LLPS, for example, by in vitro assays assessing how the IDR or IBD individually influence MORC2 phase separation.

      We appreciate the reviewer’s insightful comment regarding the characterization of IDR–IBD interactions. In this study, we combined NMR spectroscopy, domain deletion analysis (in vivo), and in vitro phase separation assays to demonstrate that interactions between the IDR and IBD contribute to MORC2 condensate formation. To systematically assess the individual contributions of the IDR and IBD to MORC2 phase separation, we performed in vitro reconstitution assays using purified domain constructs (Fig. S6). Neither the isolated IDR nor the IBD alone exhibited phase separation under buffer conditions approximating the physiological environment, indicating that each domain is individually insufficient to drive condensation. Upon the addition of 10% PEG8000, phase separation was selectively observed for the IDR but not for the IBD, suggesting that the IDR possesses an intrinsic propensity for phase separation that can be enhanced by molecular crowding. Importantly, when the IDR and IBD were mixed, phase separation was robustly induced, supporting a model in which cooperative inter-domain interactions between the IDR and IBD promote MORC2 condensation. In the absence of PEG, no phase separation was observed for the IDR–IBD mixture. These observations imply that IDR–IBD interactions cannot drive phase separation on their own, but require cooperation with CC3-mediated dimerization to achieve this process, which is the central point we wish to emphasize.

      (2) The authors mention that DNA binding can promote MORC2 LLPS. It is recommended that they generate a phase diagram to systematically assess how DNA influences phase separation.

      We agree that constructing a full phase diagram would provide a more systematic evaluation of the effect of DNA on MORC2 phase separation. In the current study, we assessed DNA-dependent condensation across multiple protein and DNA concentrations, which consistently showed that DNA enhances MORC2 phase separation. At low protein concentration (0.5 µM), phase separation requires sufficient DNA, whereas increasing either DNA or protein concentration promotes liquid droplet formation. At high DNA and protein concentrations, amorphous structures dominate, indicating a transition away from dynamic assemblies. We have clarified this point in the Results and Discussion sections and now note that a comprehensive phase diagram analysis represents an important direction for future work.

      (3) The authors use the N39A mutant as a negative control to study the effect of DNA binding on ATP hydrolysis. Given that N39A is defective in DNA binding, it could also be employed to directly test whether DNA binding influences MORC2 phase separation.

      We thank you for your constructive suggestions. The purified wild-type MORC2(1–603) exhibited weak but detectable ATPase activity, whereas the N39A mutant was completely inactive [5]. Based on this characteristic, the N39A mutant was used as a negative control for the ATP-binding-deficient mutant in this study [3]. However, no evidence has been provided to demonstrate that the N39A mutant is defective in DNA binding. Importantly, both our results and previous studies [5-6] indicate that MORC2 engages DNA via multiple domains, suggesting that a single-point mutation is unlikely to significantly compromise its overall DNA-binding capacity.

      (4) Many of the cellular and in vitro LLPS experiments employ EGFP fusions. The authors should evaluate whether the EGFP tag influences MORC2 phase separation behavior.

      We appreciate the reviewer’s concern regarding the potential influence of the EGFP tag. The use of EGFP fusions in our study was primarily to maintain consistency with the in-cell experiments. Importantly, we confirmed that EGFP alone does not undergo phase separation in cells, and this observation is consistent with previous studies [7]. Additionally, in vitro phase separation of MORC2 was independently validated using Cy3–labeled CTD (Fig. S5), which recapitulated the condensate formation seen with EGFP-fused protein. Together, these results indicate that the EGFP tag does not significantly influence MORC2 phase separation, supporting the validity of our conclusions.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors claim to have obtained nucleic acid-free protein, but no data are provided to support this assertion. It is recommended that they include appropriate validation to confirm the absence of nucleic acids.

      We thank the reviewer for highlighting this point. To validate that the purified MORC2 protein is indeed free of nucleic acid contamination, we now provide additional experimental evidence (e.g., A260/280 measurements, agarose gel analysis, or EMSA in Fig. 5), which has been added to the Methods section and Table S2.

      Note: Agarose gel analysis of MORC2 constructs to confirm the absence of nucleic acids. The pET32 vector served as the positive control; 0.05 mg of each protein preparation was analyzed. E denotes E. coli and H denotes HEK293F.

      (2) The FRAP recovery curves are not normalized to 0, making comparison difficult. The authors should normalize the post-bleach intensity to 0 and re-plot the curves to allow a more standard interpretation of mobile fractions.

      We agree with the reviewer and have now normalized the FRAP recovery curves by setting the post-bleach intensity to 0. The revised plots are presented in Figs. 2f, j, l; 6c; and 7f, allowing for more direct comparison of mobile fractions across different conditions.
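
      For readers unfamiliar with this kind of rescaling, the following is a minimal sketch of one common way to do it, not necessarily the exact procedure used for the revised figures. It assumes a single intensity trace in which the first `n_prebleach` samples precede the bleach; the function names, the example trace, and the choice to estimate the mobile fraction from the last few points are all illustrative assumptions.

```python
def normalize_frap(intensities, n_prebleach):
    """Rescale a FRAP trace so the pre-bleach mean maps to 1
    and the first post-bleach sample maps to 0."""
    if n_prebleach < 1 or n_prebleach >= len(intensities):
        raise ValueError("n_prebleach must leave at least one post-bleach sample")
    pre_mean = sum(intensities[:n_prebleach]) / n_prebleach
    post_bleach = intensities[n_prebleach]  # first sample after the bleach
    span = pre_mean - post_bleach
    if span <= 0:
        raise ValueError("post-bleach intensity must be below the pre-bleach mean")
    return [(value - post_bleach) / span for value in intensities]


def mobile_fraction(normalized, n_tail=3):
    """Estimate the mobile fraction as the mean of the last n_tail
    normalized samples (assumes the curve has reached its plateau)."""
    tail = normalized[-n_tail:]
    return sum(tail) / len(tail)


# Hypothetical trace: two pre-bleach samples, then recovery to a plateau.
trace = [100.0, 100.0, 20.0, 40.0, 52.0, 60.0, 60.0, 60.0]
curve = normalize_frap(trace, n_prebleach=2)
plateau = mobile_fraction(curve)
```

      With the post-bleach point fixed at 0 and the pre-bleach level at 1, the plateau of each curve can be read directly as an approximate mobile fraction, which is what makes curves from different conditions comparable.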

      (3) The HSQC spectra for IBD appear inconsistent: the peak positions in Fig. 4C do not align with those shown in panels D-F. The authors should verify the spectral assignments and ensure consistency across figures.

      We thank the reviewer for pointing this out. The apparent inconsistency arose from the fact that different spectral regions were displayed in Fig. 4c versus Fig. 4d-f for visualization purposes, which may have given the impression of mismatched peak positions. The spectral assignments themselves are consistent across all panels.

      To avoid confusion, we have now adjusted the spectral window shown in Fig. 4c to match that used in Fig. 4d-f. The revised figure ensures consistent presentation of the same spectral region across all panels.

      References:

      (1) Zhang, Y., Stöppelkamp, I., Fernandez-Pernas, P. et al. Probing condensate microenvironments with a micropeptide killswitch. Nature 643, 1107–1116 (2025).

      (2) Fendler NL, Ly J, Welp L, et al. Identification and characterization of a human MORC2 DNA binding region that is required for gene silencing. Nucleic Acids Res. 53(4):gkae1273 (2025).

      (3) Tchasovnikarova, I., Timms, R., Douse, C. et al. Hyperactivation of HUSH complex function by Charcot–Marie–Tooth disease mutation in MORC2. Nat Genet 49, 1035–1044 (2017).

      (4) Douse, C. H. et al. Neuropathic MORC2 mutations perturb GHKL ATPase dimerization dynamics and epigenetic silencing by multiple structural mechanisms. Nat Commun 9, 651 (2018).

      (5) Tan, W., Park, J., Venugopal, H. et al. MORC2 is a phosphorylation-dependent DNA compaction machine. Nat Commun 16, 5606 (2025).

      (6) Sánchez-Solana B, Li DQ, Kumar R. Cytosolic functions of MORC2 in lipogenesis and adipogenesis. Biochim Biophys Acta. 1843(2):316-326 (2014).

      (7) Li, C.H., Coffey, E.L., Dall’Agnese, A. et al. MeCP2 links heterochromatin condensates and neurodevelopmental disease. Nature 586, 440–444 (2020).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Behavioral labels rely on video-based scoring, which may not fully capture subtle or hidden movements.

      This is very true; certainly, this work is only a starting point. But the techniques used for this manuscript, despite starting with video-based scoring, specifically did allow us to differentiate behaviors that were too subtle to recognize in the video. For the revision, we will describe how this work leads to future studies in which we will be able to explore other means of collecting behavioral labels, potentially directly from simultaneous recordings of multiple muscles.

      (2) The relationship between brain activity and behavior is correlational, but sometimes interpreted more strongly.

      We will comb through the manuscript and make edits to be more precise and technically correct in presenting this relationship, and clarify that our suggestion of a causal link is only indirect and related to previous work (Mukherjee et al. 2019).

      (3) The manuscript could be clearer and more accessible to readers outside the field.

      We will edit the manuscript in multiple places to make technical and field-specific aspects more accessible. As part of this, in appreciation of Reviewer 2’s comments, we will take additional care to elaborate on and clarify our need and interpretation of SHAP values and classifier structure.

      Reviewer #2 (Public review):

      (1) I have several concerns regarding the methodological comparisons used to establish the superiority of the proposed XGBoost classifier. In particular, the comparison between the XGBoost classifier and previously used QDA approaches (Figure 3) may not be entirely well-matched. The QDA framework was originally designed primarily to detect gape events and does not explicitly assign labels to MTM movements. As a result, the apparent advantage of XGBoost in identifying MTMs may partly reflect differences in task formulation rather than intrinsic differences in classification performance. From visual inspection, gape detection performance appears broadly comparable across methods.

      A more informative benchmark would involve comparing XGBoost to an extended pipeline in which QDA-based gape detection is combined with a secondary movement-detection stage, distinguishing MTMs from periods of no movement. Such a comparison would better isolate the contribution of classifier architecture per se. Without this control analysis, the strength of the claim that XGBoost provides superior performance for behavioral decoding remains somewhat uncertain.

      The revision will further clarify that, as the reviewer notes, the primary improvement in XGB classification compared to QDA (in multi-class aggregated metrics) comes specifically from its ability to classify MTMs, and that for gapes, both QDA and XGB perform on par. We will be more explicit about the fact that our goal in constructing the classifier is not to compare “classifier architecture”—not to find the very best classifier possible—but rather to take the next step by generating an instance of a classifier that performs demonstrably better on aggregated orofacial movements. We will update the manuscript to be more clear in our claims in this regard, and how the current XGB classifier can, once validated, be bootstrapped by future techniques (possibly using more informative data sources) to more fully characterize orofacial movements.

      (2) The presentation of the neural ensemble analyses is considerably less comprehensive and intuitive than that of the behavioral analyses. The manuscript would benefit from more direct visualization of inferred neural state transitions. For example, plotting predicted neural states in a manner analogous to the behavioral states illustrated in Figure 6B would improve interpretability and help readers understand how neural dynamics relate temporally to behavioral changes.

      In addition, the interpretation that GC ensemble dynamics drive behavioral state transitions may require further clarification. If GC activity plays a causal role in initiating behavioral changes, one might expect a consistent brain-to-behavior lag across changepoints. However, Figure 6 appears to show such lag primarily at the second transition but not at the first. This raises questions about how uniformly the proposed causal interpretation applies across state boundaries, and additional analysis or discussion is needed.

      We are happy to update the figures (likely by adding another panel to Figure 6) to clearly show inference of neural state transitions, in a manner similar to how we have shown behavioral state transitions in Fig. 6B. In addition, we will do a more comprehensive job of describing and referencing earlier work in which we have unpacked these analyses in greater detail—work that makes it clear why we would predict a lag-relationship for one set of change points and not the other.

      (3) The neural ensemble analyses primarily focus on constructing higher-level behavioral state variables rather than directly testing how individual movement subtypes relate to neural activity. The behavioral interpretation of the inferred state structure, therefore, remains somewhat unclear. While this approach is consistent with previous work from the authors and with broader state-transition frameworks of gustatory processing, it is not immediately obvious that this is the most informative level of analysis for the present dataset.

      In particular, it would strengthen the manuscript to examine whether GC neurons or ensembles also encode lower-level motor structure, such as the occurrence of gapes or specific MTM subtypes. Demonstrating selective or mixed encoding across hierarchical levels (movement motifs versus abstract behavioral states) would help clarify the functional interpretation of the reported neural dynamics. At present, the manuscript largely assumes that GC activity reflects higher-order behavioral states without directly testing alternative representational possibilities.

      The reviewer makes a good point. While previous work from the lab (Li et al. 2016) has assessed the relationship of GC activity with both the onset of gaping (i.e., the behavioral state transition) and individual gapes and found only a relationship with onset of gaping (findings that we now explicitly describe in the revision), we have not performed a similar analysis for MTMs. We will do so and add it to the paper.

      (4) Because direct behavioral ground truth for intra-oral ingestive movements is difficult to obtain, MTM subtypes are inferred primarily through clustering of EMG waveform features. Although the authors demonstrate statistical separability and cross-session stability of these clusters, it remains unclear whether they correspond to discrete motor programs or instead reflect a structured partitioning of a continuous behavioral space shaped by feature selection and preprocessing choices. Perhaps some additional robustness analyses or convergent validation (e.g., alternative clustering methods, feature perturbation tests, or stronger neural and behavioral dissociations) would help clarify the biological significance of the inferred subtype structure.

      We admit (in fact, we have done so in the text) that we are not yet to the point of being able to “split hairs” to this degree (although we, like R2, see that as a goal). In the meantime, we will expand the section of Results text in which we describe the fact that the clustering of behaviors is observed both in “waveform space” (Fig. 4E was generated using standardized waveforms) and “feature space” (Fig. 4B, C, and F), and that as such the clusters are NOT simply a partitioning of continuous, unimodal behavioral space. We will report convergent results from alternative (k-means) clustering methods to further support that conclusion. Finally, we will describe (in the Discussion section) ways to more rigorously test and extend this claim in future work.

      Reviewer #3 (Public review):

      Some aspects of the EMG-based movement classification pipeline warrant careful interpretation. The training dataset used for classifier development is relatively small and is derived from a subset of trials in which mouth movements were clearly visible in video recordings. While the classifier performs well on this labeled dataset, it is not entirely clear how representative these labeled examples are of the full range of EMG signals present in the larger dataset.

      Very good point. We will update the text to note this qualification to the reader. We will also, however, highlight the fact that our focus on a highly reliable and representative (i.e., agreed upon by 2 independent, blind scorers) subset of labels allows us to perform more targeted analyses and make more targeted interpretation in our results. And we will also be more pointed in the revision, as we have noted above, about the fact that this work is only scratching the surface of what can be accomplished in this domain, and that future work will involve STARTING with the waveforms that aren't accounted for in terms of gapes and MTMs.

      The interpretation of the three identified MTM subtypes also remains somewhat tentative. The study convincingly demonstrates that distinct waveform-defined clusters exist in the EMG data, but the functional significance of these clusters as ingestive "behaviors" is less clear. As acknowledged by the authors, the specific roles of these movement patterns in the ingestion process remain speculative.

      We share R3’s desire for clarity on this point—we do not wish to imply that we understand more than we understand—and will be sure to fine-tune our language to make clearer and more explicit the fact that the distinction in the roles of the MTM subtypes in ingestion at this point remains speculative.

      Finally, several conclusions in the Discussion rely on relatively strong mechanistic language when describing the relationship between GC dynamics and ingestive behavior. The data clearly demonstrate a temporal association between GC state transitions and changes in the frequencies of the different MTM subtypes. However, the results primarily support the interpretation that similar cortical dynamics are associated with ingestive and rejection-related behaviors rather than definitively establishing that these behaviors "are governed by the same underlying neural mechanisms".

      We will soften our language to clarify which of our Discussion suggestions are speculation, highlighting for the reader the fact that our data, while consistent with evidence suggesting a causal link between the GC transition and gaping (Li et al., 2016; Mukherjee et al., 2019), do not prove a causal neural-behavioral link for MTMs.

      References:

      Li, Jennifer X., et al. “Sensory Cortical Activity Is Related to the Selection of a Rhythmic Motor Action Pattern.” The Journal of Neuroscience, vol. 36, no. 20, May 2016, pp. 5596–607. DOI.org (Crossref), https://doi.org/10.1523/JNEUROSCI.3949-15.2016.

      Mukherjee, Narendra, et al. “Impact of Precisely-Timed Inhibition of Gustatory Cortex on Taste Behavior Depends on Single-Trial Ensemble Dynamics.” eLife, edited by Laura L. Colgin et al., vol. 8, June 2019, p. e45968. eLife, https://doi.org/10.7554/eLife.45968.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The authors have used a macaque (two animals only) to follow the migration of 'seeded' TDP43 protein in neuronal pathways - thus mimicking the spread of ALS in the human CNS. Previous experiments in rodents failed to demonstrate this, posing interesting and important biological differences, possibly related to the UMN-LMN system in higher order apes and humans. 

      Strengths: 

      An important step forward. 

      Weaknesses: 

      No weaknesses were identified by this reviewer. Only 2 animals were used, but that is appropriate given the sensate status of the macaque. In the opinion of this reviewer, the results are entirely convincing. 

      Reviewer #2 (Public review): 

      Summary: 

      There are astonishingly few papers trying to reproduce the process of initiation and spreading that Braak's studies have suggested and postulated. The authors should be applauded for pioneering such a difficult experiment. They overexpressed the TDP-43 protein in the motor neuron pool of the brachioradialis muscle and showed that by this technique, motor neurons in this pool died, and the muscle got denervated. They had evidence of a spreading process from the spinal cord to the cortex, demonstrated by showing widespread deposits of phosphorylated TDP-43 bilaterally in the cervical cord and the motor cortex. By their experiment, they created a dying-backwards model, not a model of corticofugal spread, like that shown by Braak. No muscle weakness was observed, not even in the brachioradialis. 

      Strengths: 

      The strength of this innovative study is the fact that this spreading experiment uses the phylogenetically young connectome of primates (macaques). They also made the thought-provoking observation of spreading from the cord to the motor cortex, not the corticofugal spread model observed by Heiko Braak. This is thought-provoking because this enables the observer to compare their model with the findings in humans. 

      Weaknesses: 

      The following aspects are not a weakness but need to be better explained for the interested reader - and potentially improved in future studies for which the authors laid the foundation: 

      (1) Why do the authors use the brachioradialis motor neuron pool to overexpress TDP-43? More is known about other muscles and how they are embedded in the motor connectome of primates. Why not the biceps brachii or the hand extensors or - even better - the small muscles of the hand? These are known to be strongly monosynaptically connected with the motor cortex. The authors should explain this. I am unclear if there was a specific reason which I did not see or understand. In my view, the brachioradialis is not the best representative of the primate connectome, for example, to examine this model and compare it with the corticofugal spread. 

      The brachioradialis muscle was chosen primarily for reasons of animal welfare; our concern when designing the experiments was that the muscle we chose for injection might become very wasted and weak before the experiment had been completed. If we had injected a hand muscle, this would have affected manipulation, feeding and grooming behaviours, whereas had we injected biceps brachii or forearm extensors, this would have affected more important behaviours requiring strength for body support in the home cage (e.g. climbing, swinging, etc.). The advantage of choosing brachioradialis is that there is some functional redundancy; in macaques, compared to biceps brachii, brachioradialis has a relatively minor role in elbow flexion and supination of the forearm. We therefore reasoned that there should be physiological compensation for any weakness in brachioradialis, and thus minimal effects on normal behaviour.

      A secondary practical consideration was the importance of good quality MR imaging of the injected muscle and the positioning of the focussing coil; because of the physical constraints related to the monkey sitting in our narrow-bore scanner, the forearm muscles were the optimal choice. 

      With reference to the ‘primate connectome’, whilst hand muscles are known to have strong cortico-motoneuronal connections, we have shown previously that monosynaptic corticomotoneuronal connections are as strong in muscles innervated by the deep radial nerve (like brachioradialis) as in intrinsic hand muscles (Witham et al, 2016).

      Finally, for the purposes of these experiments, all we required was a method for inoculating TDP-43 into a motor neuron pool within the spinal cord, without direct surgical trauma to the spinal cord. Our aim was to test the hypothesis that extracellular TDP-43 is sufficient to cause spreading neuronal changes in macaque, similar to those observed in human ALS/MND; our aim was not to replicate the actual pattern of human MND observed clinically.

      These points will be addressed in a revised version of the manuscript. 

      (2) In Braak's experiment, only (seemingly soluble) non-phosphorylated TDP-43 "crossed" synapses. Phosphorylated TDP-43 did not do this. The authors of this study saw phosphorylated TDP-43 in motor neurons and the cortex. Is there any potential explanation for how it crosses synapses? If it really does, there is an obvious difference to the human situation which needs to be emphasized and explained (in the future). 

      To clarify, there was no evidence of phosphorylated TDP-43 crossing synapses. It is more likely that excess non-phosphorylated TDP-43 crossed synapses, and that this then subsequently led to TDP-43 phosphorylation.  

      (3) There were significant deposits of phosphorylated TDP-43 in oligodendrocytes in humans. Whilst I understand that one experiment cannot solve every question - I am curious about whether the authors saw anything in oligodendrocytes? 

      We have not looked at this.

      (4) Which was the pattern of damage? Of course, this pattern is not likely to have a monosynaptic pattern - like in humans... but was there a pattern? Did it have a physiologically meaningful basis? Was there any relation to the corticofugal monosynaptic pattern? What are the differences? The authors speak of "multiple waves". Does this mean that if this were a corticofugal model, for example, oculomotor neurons would also degenerate? 

      The description of ‘multiple waves’ in paragraph 2 of the discussion section is entirely hypothetical, based on the assumption that there are different mechanisms by which TDP-43 spreads through the nervous system, from slow local spread by diffusion to more rapid long-range axonal spread to widely separated regions. For the neuropathological staging analysis, we therefore looked at different brain regions (hypoglossal nuclei, reticular formation, inferior olives, frontal cortex, temporal cortex and hippocampal formation). This analysis only showed loss of motor neurons in the spinal cord ipsilateral to the side of the muscle injections, in segments consistent with the location of brachioradialis motoneurons. We did not demonstrate a ‘pattern of damage’ as described in humans in our experiments because this is a pre-symptomatic pre-clinical model, with no established ‘damage’ from each wave. We speculate that this is because animals were terminated too early in the disease process.

      However, whilst there was no established neuronal degeneration outside the cervical spinal cord, the observation that there were more pTDP-43 positive Betz cells in left (contralateral to the brachioradialis injection) New M1 than Old M1 (see Figure 6I and J) would support spread via monosynaptic connections to motoneurons; New M1 is where most monosynaptic cortico-motoneuronal connections originate.

      Reviewer #3 (Public review): 

      Summary: 

      In this paper by Jones and colleagues, a non-human primate model is described in which wild-type TDP-43 is expressed in the cervical spinal cord. This gave rise to loss of motor neurons in the ventral horn at that level in the cervical spinal cord. MRI of the muscles revealed increased intensity in the most affected brachioradialis muscle, suggesting this muscle becomes denervated. At the neuropathological level, TDP-43 and pTDP-43 staining in the cytoplasm is increased, not only at the specific level of the cervical spinal cord, but also at a distance. 

      Strengths: 

      A clear strength is the state-of-the-art focal expression of the TDP-43 transgene at a focal site in the cervical spinal cord. This is achieved by combining a general expression of a flipped loxP flanked TDP-43 vector using AAV9 intrathecal administration, followed by an intramuscular AAV2 hSyn CRE-TdTomato vector in the brachioradialis muscle in order to induce focal recombination and expression of TDP-43 in motor neurons innervating this muscle on one side. 

      Another strength is the non-human primate background, which is much closer to the human situation. 

      Weaknesses: 

      Given the complexity and cost of the model, the n is very low. 

      As is common in most studies in non-human primates, we have carried out all statistical analysis within one animal (e.g. the comparison of motoneuron numbers between left and right cord). We then show that results are reproducible in two animals. Although the number of animals is lower than in a typical rodent study, we see this as an advantage of the model, adhering to the 3Rs principle of ‘reduction’.

      The design of the experiments and the results shown about the toxicity induced by this focal TDP-43 expression do not allow us to conclude that it is a good model for ALS for several reasons. It is not clear that the TDP-43 overexpression results in spreading weakness or in spreading motor neuron loss. The neuropathological changes described suggest that there is a kind of stress response, which extends to regions away from the site of primary damage, but more is needed to provide convincing evidence that there is spreading of disease pathology reminiscent of human ALS. 

      As already noted in our response to Reviewer 2 (point 1), animal welfare is an important consideration when designing these complex experiments in primates. We could not therefore justify allowing the animals to survive until extensive wasting and weakness were evident, recapitulating the human disease. 

      The model developed in these experiments is therefore a pre-symptomatic pre-clinical model, in which animals are terminated before pathology leading to widespread motor neuron loss is evident. At post mortem we do have evidence of motor neuron loss in the segments supplying brachioradialis (C4-C8).

      Stress of various forms, including blunt trauma (e.g. Anderson et al, 2021), stab/electrode insertion injury (e.g. Zambusi et al, 2022), chemical (e.g. arsenite) exposure (e.g. Huang et al, 2024), or hypoxia (Marcus et al, 2021) can result in pathological nucleocytoplasmic translocation of TDP-43. In our model, there was no direct trauma to the brain or spinal cord ante mortem, excluding one major cause of tissue stress. Hypoxia during the process of euthanasia is possible, but we would expect there would not be enough time before death for this to manifest as TDP-43 translocation. In the literature TDP-43 translocation due to stress is diffuse; we have demonstrated that in our model the TDP-43 pathology is not diffuse but selective. For example, there was no evidence of disease in the oculomotor nuclei; in the primary motor cortex (M1) there are significantly more pathological changes in the evolutionarily younger ‘NewM1’ compared to the neighbouring ‘OldM1’.

      It is therefore improbable that our findings could be explained by ‘a kind of stress response’. Our findings are better explained by spread of the TDP-43 protein.

      Reviewer #4 (Public review): 

      Summary: 

      In this manuscript, the authors present data describing the development of a model of ALS in rhesus macaques. They use a viral intersectional model to overexpress TDP-43 in a population of motor neurons and then study the spread of the pathology about 7 months later. They demonstrate that both the cervical spinal cord and motor cortex (new and old M1) are full of TDP-43, suggesting that the pathology spreads from the single motor pool to presumably related neurons. 

      Strengths: 

      This is a super-important study in two main ways: 

      (1) This could be the birth of a really important model, one that is really needed for making progress in understanding ALS and the development of therapeutics. There are shortfalls with all the rodent models. Models dependent on cell cultures are superb for understanding cell-autonomous processes, but miss out on connectivity, particularly the long-range connectivity. Organoids may ultimately prove to be beneficial, but they would need cortex, spinal cord, and muscle, and translatability from them is not assured. So a NHP model is needed, and this may be it.

      Furthermore, the Methods are meticulously described and will undoubtedly facilitate reproducibility. 

      (2) The concept of the spread of pathology has been proposed for some time, I think, based initially on the detailed clinical observations of Ravits and colleagues. The authors have looked at this directly and provide supporting evidence for this interesting hypothesis. They show spread locally and contralaterally in the spinal cord (although a figure would be nice) and to the motor cortex. 

      Taking only these 2 points into account is more than sufficient for me to be enthusiastic about this work. 

      Weaknesses: 

      I'd like to make a couple of points that if addressed, could, in my view, help the authors strengthen this work. 

      (1) We don't know how many MNs were transduced by the rAAV. There was no tdTom expression, for whatever reason. The authors show an image of a control experiment with a single MN transduced, but there should be a red motor pool, at least in the control experiments. The impression that I get is that very few were transduced, and, in my mind, this makes the findings even more interesting - maybe you don't need many "starter" MNs. 

      Unfortunately, we cannot know how many motoneurons were transduced.

      However, the reviewer may be correct, that it is actually only a small fraction of the brachioradialis pool. This is supported by the evidence for rather focal denervation seen on MRI.

      (2) Continuing on this point, this leads the authors to conclude that all BR MNs have died. They support this by the reduced MN count (see point 3). Firstly, do we know how many BR MNs there are in the rhesus macaque, and does the reduction seen correspond to this number? Secondly, and more importantly, the muscle looks normal on MRI at 28 weeks - it does not look like a denervated muscle. The authors state that it has maybe been reinnervated, but by what, if all the BR MNs are dead? This does not seem like a plausible explanation to me. Muscle histology, NMJs, and fibre typing would have been useful to understand what's going on with the MNs. (And electrophysiology would have been wonderful, but beyond the scope of this study.) 

      To clarify, we did not conclude that all brachioradialis motor neurons had died, but rather that all transfected motor neurons in the brachioradialis pool had died. As noted above, when these cells die and the muscle is denervated, the MRI signal changes occupy only a small volume of the muscle and are transient. We would not expect to see long-term MRI changes in muscle anatomy after this limited denervation-reinnervation event.

      Analysis of muscle histology, including fibre typing, is outwith the scope of this initial paper reporting the model; we hope that this will form the basis of a future publication.

      (3) Some MN biologists, like me, fuss a lot about how to count MNs, which is almost as difficult as counting the number of angels on the head of a pin. Every method has its problems. Focusing on the two methods here: (a) ChAT immunohistochemistry is pretty good in healthy states, but we don't know what happens to ChAT expression in different diseases, particularly when you have a new model. If its expression is decreased, then it is not a good marker for MNs; (b) Identifying MNs based on the size and morphology of neurons in the ventral horn is also insufficient. For example, ~30% of neurons in a typical pool are small gamma MNs, and a significant proportion (depending on the muscle) of the remainder will be small alpha MNs. So what one is counting is, at best, the large alpha MNs, not all the MNs in a pool. And in ALS, it's these largest MNs that are affected at the earliest stages. The small ones might be fine. So results will be skewed. (Hence, it would be interesting to see if the muscle had a higher proportion of Type I fibres after being reinnervated by S-type MNs.) 

      This is an interesting point, and we agree that each method used to quantify MN number carries its own limitations. The problem of MN identification is heightened in a MND-like pathological state, especially when considering evidence of reduced ChAT activity in spinal motoneurons in end-stage disease in post mortem human samples (Oda et al, 1995), and more recent evidence from Casas et al. (2013), who demonstrated early presymptomatic reduction in ChAT expression in SOD1G93A mice. It is important to note that this was a modest reduction, not complete abolition of signal (76% of control levels). ChAT immunoreactivity was still present and motor neurons were still identifiable as ChAT-positive at this pre-clinical stage of disease. As counts in our study were performed based on detecting ChAT in cells, it seems unlikely that we would miss cells. However, we cannot rule this out. If indeed this did occur, it would mean that the reduced motoneuron counts which we observed reflect not only cell death, but also profound motoneuron dysfunction which is presumably the proximal precursor to cell death.

      We acknowledge that size-based criteria applied to ChAT-positive neurons will preferentially capture large alpha motor neurons, and that gamma motor neurons and small alpha motor neurons are likely underrepresented in our counts. Our counts therefore reflect the large alpha motor neuron population rather than the total motor neuron pool. We believe that this is not a critical limitation in the context of the present study. Large alpha motor neurons are the population of primary pathological interest in ALS and related MND, being the earliest and most severely affected subtype. The selective vulnerability of fast-fatigable large alpha motor neurons in ALS is well established, and their preferential loss is the defining feature of disease progression in both human post mortem tissue and rodent models (Lalancette-Hébert et al., 2016). In this respect, our size threshold selects for precisely the population whose degeneration is most relevant to the disease phenotype we are modelling. 

      We intend to include comments on these important points in the revised version of the manuscript.

      In response to the final point regarding muscle histology and proportions of Type I fibres, as stated above, reporting of muscle histology, including fibre typing, is planned for a separate publication.

      (4) Statistics. These are complex experiments looking at the spread of a disease. The experimental unit is therefore the monkey, n=2. In each monkey, multiple sections are analysed, which are key technical replicates and often summative. For example, do we care about the average cell number in Figures 4D, E, 5 I, J or 6G, H, or rather the total cell number? Do the error bars mean anything? To be clear, I am by no means minimising the importance of the overall convincing findings. But I do not think this statistical analysis is particularly meaningful. 

      Here, the experimental unit is the tissue slice, mounted on a slide for histological analysis, and not the monkey. All statistical comparisons are made within a single animal. We then show that the findings can be replicated in two animals, both of which show significant results. This is the standard approach taken in primate neuroscience, given the need to reduce animal numbers to the minimum consistent with producing convincing results.
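
      As an illustration of what a within-animal comparison across sections can look like, the following is a minimal Python sketch. The per-section counts are invented placeholders, not data from the study, and the paired sign-flip permutation test is an assumed choice for illustration; it is not necessarily the test used in the manuscript. The function name `paired_sign_flip_test` is hypothetical.

```python
import random


def paired_sign_flip_test(side_a, side_b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-section counts.

    side_a, side_b: counts from the two sides of the cord, one pair per section.
    Returns an approximate p-value for the mean paired difference.
    """
    if len(side_a) != len(side_b):
        raise ValueError("each section needs one count per side")
    diffs = [a - b for a, b in zip(side_a, side_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)


# Placeholder counts for ten sections from ONE animal (illustrative only).
injected_side = [12, 10, 11, 9, 13, 10, 12, 11, 10, 9]
control_side = [18, 17, 19, 16, 20, 18, 17, 19, 18, 16]
p_value = paired_sign_flip_test(injected_side, control_side)
print(f"within-animal p-value: {p_value:.4f}")
```

      A p-value obtained this way describes the left-right difference within that one animal, with sections as the unit of analysis; generalisation beyond the animal rests on the replication in the second animal.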

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective they provide evidence that CHI3L1 serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, and the authors have addressed some of my concerns, there are some concerns about the current data that continue to limit my enthusiasm for the study. Please see my specific comments below.

      Major:

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. The authors have added new data that suggest LysM-Cre only leads to a 40% reduction of Chil1 in KCs and that this explains the difference in the phenotype compared to the Clec4F-Cre. However, this claim would be made stronger using flow-sorted TIM4hi KCs, as the plating method can lead to heterogeneous populations and thus an underestimation of knockdown by qPCR. Moreover, in the supplemental data the authors show that Clec4f-Cre x Chil1flx leads to a significant knockdown of this gene in BMDMs. As BMDMs do not express Clec4f, this calls into question the rigor of the data. I am still concerned that the phenotype differences between Clec4f-Cre and LysM-Cre are not related to the degree of knockdown in KCs but rather to some other aspect of the model (microbiota etc). It would be more convincing if the authors could show the CHI3L1 reduction via IF in the tissue of these mice.

      We thank the reviewer for these constructive comments. We have performed FACS sorting of KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) or MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) from Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> or Clec4f<sup>∆Chil1</sup> mice, respectively. Compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in KCs from Clec4f<sup>∆Chil1</sup> mice but were unchanged in MoMFs (Revised Figure S3B). In addition, compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in MoMFs from Lyz2<sup>∆Chil1</sup> mice but by only roughly 40% in KCs (Revised Figure S5B). These revised data support the phenotypic difference between Lyz2-CKO and Clec4f-CKO mice.

      We agree with the reviewer that the significant knockdown of Chil1 in BMDMs from Clec4f<sup>∆Chil1</sup> mice is confusing. To maintain the rigor of our data, we have removed this part from our manuscript.

      Additionally, we performed immunofluorescence staining to detect Chi3l1 expression in liver tissues of these mice. The results show a reduction of Chi3l1 expression in KCs (TIM4+F4/80+ cells) of both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice, with a more pronounced decrease in Clec4f<sup>∆Chil1</sup> mice (Author response image 1).

      Author response image 1.

      The expression of Chi3l1 in liver tissues of Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice. Immunofluorescent staining to detect Chi3l1 (green) expression in liver sections of Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice under normal chow diet. TIM4 (KC marker, white), F4/80 (macrophage marker, red); nuclei were counterstained with DAPI. Scale bar = 20 µm and 10 µm (inset).

      (2) Figure 4 suggests that KC death is increased with KO of Chil1. The authors have added new data with TIM4 that better characterize this phenotype. The lack of TIM4 low, F4/80 hi cells further supports that their diet model is not producing any signs of the inflammatory changes that occur with MASLD and MASH. This is also supported by no meaningful changes in the CD11b hi, F4/80 int cells (predominantly monocytes and early MdMs). It is also concerning that loss of KCs does not lead to an increase in Mo-KCs as has been demonstrated in several studies (PMID: 37639126, PMID: 33997821). This would suggest that the degree of resident KC loss is trivial.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the degree of resident KC loss is trivial, since 60% of KCs die by 16 weeks compared with 0 weeks (Revised Figure 5D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure 5D), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4-MoKCs. This timing is supported by prior studies, where TIM4KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (PMID: 33440159; PMID: 32888418). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      (3) The authors demonstrated that Clec4f-Cre itself was not responsible for the observed phenotype, which mitigates my concerns about this influencing their model.

      We thank the reviewer for this comment and are pleased they agree that our control experiment using Clec4f-Cre alone confirms that the phenotype is specific to our genetic manipulation and not an artifact of the Cre driver.

      (4) I remain somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. The authors agree that mRNA levels of this gene are hard to see in the datasets; however, they argue that IF demonstrates clear evidence of the protein, CHI3L1. The IF in the paper only shows a high-power view of one KC. I would like to see what percentage of KCs express CHI3L1 and how this changes with HFHC diet. In addition, showing the knockout IF would further validate the IF staining patterns.

      We thank the reviewer for their thoughtful and constructive feedback. We agree that our initial conclusion regarding Chil1 expression in liver macrophages relied heavily on prior observations and was not sufficiently supported by the data presented. In response, we have revised our conclusion to state: "Hepatic macrophages express Chi3l1 and upregulate its expression following HFHC feeding." (Revised manuscript, page 4, lines 136-137)

      To strengthen this finding, we have replaced the original high-power image of a single Kupffer cell with a representative low-power view showing multiple F4/80+ macrophages (Revised Figure 1A). Furthermore, we performed quantitative colocalization analysis, which revealed that under normal chow diet (NCD), approximately 8% of F4/80+ macrophages are Chi3l1-positive. This proportion significantly increases to 15% upon HFHC feeding (Revised Figure 1A).

      Additionally, to validate the specificity of the Chi3l1 immunofluorescence signal, we have included staining of liver sections from Chil1 knockout mice. In contrast to wild-type mice, Chi3l1 signal was completely absent within F4/80+ macrophages in Chil1<sup>-/-</sup> mice, confirming the specificity of the staining (Revised Figure 1B, Revised manuscript, page 4, lines 152-157).

      Minor:

      (1) The authors have answered my question about liver fibrosis. In line with their macrophage data their diet model does not appear to induce even mild MASH.

      We thank the reviewer for this observation. We agree that under our HFHC dietary conditions, the mice do not develop MASH pathology. However, we believe this early-stage model is a strength of our study, as it allows us to dissect the initial role of the Chi3l1-glucose interaction in regulating Kupffer cell fate during early MASLD, prior to the onset of significant fibrosis. This approach enables us to capture early macrophage adaptations (such as Chi3l1 upregulation) that might otherwise be masked or become secondary to the overt inflammation and scarring characteristic of late-stage MASH models.

      Reviewer #2 (Public review):

      In the revised version of the manuscript, the authors have attempted to address my questions, however, a number of my original concerns still remain.

      Firstly, I had asked for a validation of the different Cre lines used - LysM and Clec4f. The authors have now looked at BMDMs and KCs (steady state) from these animals. They conclude LysM only targets BMDMs not KCs, while Clec4f targets both KCs and BMDMs. This I do not understand: BMDMs do not express Clec4f, so why are they targeted with this Cre? Additionally, BMDMs are not the correct control here; rather, the authors should look at the incoming moMFs in the livers of these mice in the MASLD setting. Similarly, the KO in the MASLD KCs should be verified.

      We thank the reviewer for these constructive comments. We have performed FACS sorting of KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) or MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) from Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> or Clec4f<sup>∆Chil1</sup> mice fed NCD or HFHC for 4 weeks, respectively. Compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in KCs from Clec4f<sup>∆Chil1</sup> mice but were unchanged in MoMFs at both 0 and 4 weeks (Revised Figure S3B). In addition, compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in MoMFs from Lyz2<sup>∆Chil1</sup> mice but by only roughly 40% in KCs at both 0 and 4 weeks (Revised Figure S5B). These revised data support the phenotypic difference between Lyz2-CKO and Clec4f-CKO mice.

      Then I had asked for validation of macrophage expression of Chil1 in other MASLD human and mouse datasets. The authors have looked into this, but the data provided do not suggest it is highly expressed by these cells either in the other mouse models or in the human. Nevertheless, they include a statement suggesting a similar expression pattern (although also being expressed by other cells). This is not an accurate discussion of the data and hence must be revised. This also prompted me to take another look at their data and this has left me querying the data in Figure 1D. Is the percent expressed 1%? In Figure 1C the scale goes from 0-100 but here 0-1. If we are talking about expression in 1% of cells which would fit with the additional public mouse data now analysed then how relevant are any of these claims? How sure are the authors that the effects seen are through KCs/moMFs? In figure 1D all cells profiled by scRNA-seq should be shown not just MFs to get a better sense of this data. What is macrophage expression of Chil1 compared with all other liver cells?

      We thank the reviewer for the thoughtful feedback. We agree that the expression pattern of Chil1 should be described more accurately. To address this point, we examined four additional publicly available scRNA-seq datasets, including two mouse MASLD models and two human MASLD datasets (Author response image 2). Across these studies, the cell type with the highest Chil1 expression varied, whereas Chil1 transcripts were detected at relatively low frequency in macrophages (~1% of cells; Author response image 2C, E, K). To better present these data, we regenerated the UMAP plots to include all captured liver non-parenchymal cells, defined using the top two lineage-specific markers (Author response image 3A–B). Consistent with Figure 2A–C, violin plots show that Chil1 is highly expressed in neutrophils, with only modest expression detected in macrophages (Author response image 3C). Further analysis of monocyte/macrophage subsets indicates that approximately 1% of MoMFs or KCs express Chil1 (Author response image 3D–F). As the reviewer noted, the y-axis in Author response image 3F ranges from 0–1%, reflecting the low transcriptional detection frequency of Chil1 in macrophages, which is consistent with the additional public datasets analyzed.
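
      For readers less familiar with the metric, the detection frequency discussed here is, under the common dot-plot convention, the percentage of cells in a cluster with at least one transcript of the gene. The short Python sketch below shows that calculation. The counts and cluster labels are invented placeholders, not values from any of the datasets above, and `percent_expressing` is a hypothetical helper name.

```python
from collections import defaultdict


def percent_expressing(counts, clusters):
    """Percentage of cells per cluster with a non-zero count for one gene.

    counts:   per-cell transcript counts for the gene of interest
    clusters: per-cell cluster labels, in the same order as counts
    """
    if len(counts) != len(clusters):
        raise ValueError("counts and clusters must have the same length")
    totals = defaultdict(int)
    positive = defaultdict(int)
    for count, cluster in zip(counts, clusters):
        totals[cluster] += 1
        if count > 0:
            positive[cluster] += 1
    return {c: 100.0 * positive[c] / totals[c] for c in totals}


# Placeholder data: 100 "KC" cells with one positive cell,
# and 4 "Neutrophil" cells with three positive cells.
toy_counts = [0] * 99 + [3] + [2, 0, 5, 1]
toy_clusters = ["KC"] * 100 + ["Neutrophil"] * 4
print(percent_expressing(toy_counts, toy_clusters))
```

      Because this metric only records whether a transcript was detected, it is sensitive to sequencing depth and dropout, which is one reason a low percentage at the mRNA level need not rule out detectable protein.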

      We also recognize that mRNA detection by scRNA-seq does not necessarily reflect protein abundance. Therefore, we assessed Chi3l1 protein expression in hepatic macrophages using immunofluorescence staining for F4/80, TIM4, and Chi3l1 in liver sections from mice fed either normal chow diet (NCD) or HFHC diet. These analyses show that Chi3l1 protein is detectable in both KCs (TIM4<sup>+</sup>F4/80<sup>+</sup>) and MoMFs (TIM4<sup>-</sup>F4/80<sup>+</sup>) (Revised Figure 1A). Quantitative colocalization analysis revealed that under NCD conditions, approximately 8% of F4/80<sup>+</sup> macrophages are Chi3l1-positive, which increases to ~15% following HFHC feeding (Revised Figure 1A). To confirm antibody specificity, we additionally performed staining in Chil1 knockout mice. In contrast to wild-type mice, Chi3l1 signal was completely absent in F4/80<sup>+</sup> macrophages from Chil1<sup>-/-</sup> mice, validating the specificity of the staining (Revised Figure 1B). Together, these results suggest that low-abundance Chil1 transcripts may be under-detected by scRNA-seq, whereas immunofluorescence captures accumulated protein. Importantly, our functional experiments using Clec4f-Cre-mediated deletion directly support that the observed phenotypes are mediated through Kupffer cells, regardless of expression levels in other liver cell types.

      In response to the reviewer’s comments, we have made the following revisions:

      (1) Softened our conclusion to: “Hepatic macrophages express CHI3L1 and upregulate its expression following HFHC feeding” (Revised manuscript, page 4, lines 136–137).

      (2) Included representative low-magnification images showing multiple F4/80<sup>+</sup> macrophages along with quantitative analysis (Revised Figure 1A).

      (3) Added immunofluorescence staining of Chil1<sup>-/-</sup> liver sections demonstrating complete absence of Chi3l1 signal in F4/80<sup>+</sup> macrophages, validating antibody specificity (Revised Figure 1B).

      (4) Regenerated UMAP plots to display all liver non-parenchymal cells and clearly indicate the low detection frequency of Chil1 transcripts in macrophages (Author response image 3).

      (5) Revised the relevant text to more accurately describe Chil1 expression patterns in hepatic macrophages (Revised manuscript, page 4, lines 136–157).

      Author response image 2.

      Analysis of Chil1 expression in additional single-cell RNA sequencing datasets. (A-C) Chil1 expression in a mouse model of NASH. (A) t-SNE projection of cell clusters from scRNA-seq data (GSE1283338) of livers from C57BL/6J mice fed a control or NASH diet for 30 weeks. (B) Dot plot showing scaled Chil1 expression across all identified cell clusters. (C) Dot plot of scaled Chil1 expression after excluding the neutrophil cluster, highlighting expression in macrophage populations. Analyzed cell clusters and cell numbers: KC_H (healthy, 1178); KC3_Control (1142); KC_N (NASH, 1045); KN_RM (recruited macrophage in KC niche, 950); Proliferating_KC (364); PDC_Control (356); Ly6CHi_RM (320); LSEC (299); NK_NKT (393); B_cell (244); DC_1 (107); DC_2 (118); Ly6CLo_RM (127); Hepatocyte (57); PDC_NASH (46); Neutrophil (21). (D-E) Chil1 expression during NAFLD progression in a mouse Western diet model. (D) t-SNE projection of cell clusters from scRNA-seq data (GSE156059) of livers from C57BL/6J mice fed a Western diet with fructose/sucrose for 12, 24, and 36 weeks. (E) Dot plot showing scaled Chil1 expression across all identified cell clusters. Analyzed cell clusters and cell numbers: capsule macs (250), LAMs (1419), Ly6chi monocytes (6912), mac1 (638), moKCs (767), Patrolling monocytes (690), Prolif.macs (521), Resident KCs (3629), Transitioning monocytes (3615). (F-H) Chil1 expression in human cirrhotic liver biopsies. (F) t-SNE projection of cell clusters from scRNA-seq data (GSE136103) of healthy and cirrhotic human liver samples. (G) Dot plot showing scaled Chil1 expression across major cell lineages. (H) Dot plot of scaled Chil1 expression specifically within the mononuclear phagocyte (MP) population. Analyzed cell clusters and cell numbers: B cell (1951); cycling (967); Epithelia (3751); ILC (10091); mast cell (2511); Mesenchyme (2382); MP (10874); pDC (317); Plasma cell (877); T cell (19076). (I-K) Chil1 expression in a human NAFLD explant. (I) t-SNE projection of cell clusters from scRNA-seq data (GSE190487) of a human NAFLD liver explant. (J) Dot plot showing scaled Chil1 expression across all identified cell clusters. (K) Dot plot of scaled Chil1 expression within the MP subpopulations. Analyzed cell clusters and cell numbers: B cell (1278); Cycling (152); MP (2897); pDC (391); Plasma cell (85); T cell (1551); KC (403); SAMac (scar-associated macrophages, 723); TM (tissue monocytes, 1265).

      Author response image 3.

      Hepatic macrophages express Chi3l1. (A-D) Wild-type C57BL/6J mice were fed either a normal chow diet (NCD) or HFHC for 16 weeks. NPCs were isolated and subjected to BD Rhapsody scRNA sequencing. (A) Uniform manifold approximation and projection (UMAP) plots illustrate the clustering of NPCs from the livers of mice fed NCD and HFHC. Major cell types are colored. (B) Heatmap showing the mean expression of the top two markers of each cell type. (C) Violin plots show the RNA expression of Chil1 between NCD and HFHC livers in each cell cluster. (D) UMAP plots depict the clustering of monocytes/macrophages in the livers of mice fed NCD and HFHC. Cell clusters are color-coded. (E) Dot plot displays the scaled gene expression levels of lineage-specific marker genes in different cell clusters. (F) Dot plot shows the scaled gene expression levels of Chil1 in the indicated cell clusters.

      The cell death had also previously concerned me, in that 40-60% of KCs were TUNEL +ve. I do not understand how 60% are +ve at 8 weeks but then they have more or less the same number of TIM4+ cells at 16 weeks. How can this be? Why do the TUNEL +ve cells not die? This concern remains as I don't understand how they reached these numbers given the images. Additionally, larger images were not provided to be sure that the images in the figure are representative. Now in the images provided, there are clearly cells which are TIM4+ where the TUNEL does not overlap; likely it is in an LSEC or other neighbouring cell. Indeed, also taking Fig S11b as an example, there are ~7 KCs and at best 1 expresses TUNEL, so how do they get to 60%?

      We thank the reviewer for this constructive feedback. We agree that the sustained TUNEL positivity without corresponding KC depletion presents an apparent paradox. Based on our data, we propose that TUNEL-positive KCs represent cells in a prolonged stressed or pre-apoptotic state rather than undergoing immediate clearance. This interpretation is supported by the relatively stable TIM4+ cell numbers between 8 and 16 weeks, which would be inconsistent with rapid cell death and removal. Previous studies (PMID: 33440159; PMID: 32888418) have similarly documented gradual KC loss during MASLD progression, supporting our view that KC death occurs over an extended timeframe rather than acutely.

      Regarding quantification concerns, we acknowledge that the representative images in the original figure may have been misleading. To address this, we have now quantified KC apoptosis using low-magnification fields across multiple liver sections to ensure statistical rigor. Figure S11B (now Revised Figure S9B) presents these data, showing that under NCD conditions, KC apoptosis rates are minimal in both genotypes. Following HFHC feeding, apoptosis rates are comparable between Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> mice. Importantly, we have replaced all TIM4/TUNEL co-staining images with low-magnification representative images in the revised figures (Revised Figure 1A, 1B, 5E, S9A, S9B). These images better reflect the quantitative data and confirm that the originally highlighted high-magnification fields were not representative of global apoptosis rates.

      Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      Here are my comments:

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID:31250532) in the context of fibrosis, which is a main observation from the current study.

      We thank the reviewer for raising this important point and acknowledge previous studies linking Chi3l1 to macrophage function in liver disease. However, several aspects of our work extend beyond these prior reports. First, although global Chi3l1 deficiency has been shown to promote macrophage apoptosis in toxin-induced fibrosis models (PMID: 31250532), our study demonstrates that Chi3l1 differentially regulates the fate of two distinct hepatic macrophage subsets in MASLD: embryo-derived Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs). To our knowledge, this subset-specific regulation of hepatic macrophages has not been previously described. Second, we identify a previously unrecognized metabolic mechanism by which Chi3l1 regulates macrophage survival. Specifically, we find that Chi3l1 binds glucose and promotes glucose uptake, thereby protecting the highly glucose-dependent KCs from metabolic stress-induced death, while exerting minimal effects on MoMFs. This mechanism is distinct from the previously reported Fas/Akt-mediated pathway (PMID: 31250532) and highlights a metabolic checkpoint controlling macrophage subset-specific vulnerability. Third, our findings reveal context- and cell type-dependent roles of Chi3l1. While myeloid-specific deletion of Chi3l1 has been reported to ameliorate steatohepatitis and fibrosis (PMID: 37166517), our KC-specific deletion model shows that loss of Chi3l1 in KCs exacerbates disease, indicating a previously unrecognized protective role of Chi3l1 in KCs during early MASLD. Together, these findings provide new insights into macrophage subset-specific regulation, identify a novel glucose-related metabolic mechanism, and reveal context-dependent functions of Chi3l1 in MASLD pathogenesis.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      We thank the reviewer for raising this important point regarding the specificity of the genetic models and the apparent discrepancy with the study by Feldstein and colleagues (PMID: 37166517). To address these concerns, we performed additional experiments to directly assess the efficiency and cell-type specificity of Chi3l1 deletion in our models.

      (1) Efficiency and specificity of LysM-Cre and Clec4f-Cre models

      We isolated KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) or MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) by FACS from Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice fed either NCD or HFHC diet. Consistent with the known specificity of these Cre lines, Clec4f-Cre resulted in >90% reduction of Chil1 mRNA in KCs with no significant change in MoMFs (Revised Figure S3B), confirming efficient KC-specific deletion. In contrast, LysM-Cre reduced Chil1 expression by >90% in MoMFs but only ~40% in KCs (Revised Figure S5B). These data support the reviewer’s concern that LysM-Cre mediates incomplete recombination in KCs, whereas the Clec4f-Cre model provides KC-specific deletion, explaining why the phenotype observed in Lyz2<sup>∆Chil1</sup> mice is relatively modest.

      (2) Relationship to the study by Feldstein et al.

      We agree that our LysM-Cre results appear different from those reported by Feldstein and colleagues. However, considering the new recombination data and differences in disease models, we believe the findings are complementary rather than contradictory. First, the disease models differ substantially. Feldstein et al. used a CDAA-HFAT diet for 10 weeks, which rapidly induces severe inflammation and fibrosis, whereas our study employed a long-term HFHC diet, modeling the more gradual metabolic progression of MASLD. These distinct disease contexts may engage different CHI3L1-dependent pathways. Second, the mechanistic focus differs. Feldstein et al. reported that myeloid Chi3l1 promotes steatohepatitis and fibrosis through inflammatory macrophage recruitment and IL13Rα2-mediated stellate cell activation. In contrast, our study identifies a metabolic mechanism in which CHI3L1 binds glucose and promotes glucose uptake, protecting metabolically vulnerable KCs from stress-induced death. Finally, and importantly, KC-specific deletion using Clec4f-Cre recapitulates the key phenotypes observed in our study, including effects on KC survival and metabolic regulation. This confirms that the observed effects are KC-autonomous and not due to broader Cre activity in other myeloid populations.

      Together, these additional experiments clarify the recombination efficiency of our models and demonstrate that our conclusions are supported by KC-specific genetic evidence.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      We thank the reviewer for this valuable suggestion. To address this point, we tested our key findings in an additional MASH model using a methionine–choline-deficient (MCD) diet. First, we examined Chi3l1 expression in this model. Wild-type mice fed an MCD diet for 6 weeks showed significantly increased Chi3l1 mRNA and protein levels in liver tissues compared with NCD controls, confirming diet-induced upregulation (Revised Figure 3A–B). To determine the functional contribution of Kupffer cell–derived Chi3l1, we subjected Clec4f<sup>ΔChil1</sup> mice and Chil1<sup>fl/fl</sup> controls to MCD feeding for 6 weeks. Body weight was comparable between genotypes throughout the feeding period (Revised Figure 3C). However, KC-specific deletion of Chi3l1 significantly exacerbated MCD diet–induced liver pathology, including increased steatosis, inflammation, and fibrosis, as indicated by higher MASLD activity scores, enhanced Oil Red O staining, increased Sirius Red deposition, and elevated α-SMA expression (Revised Figure 3D). Consistent with these histological findings, Clec4f<sup>ΔChil1</sup> mice exhibited an increased liver index, whereas serum ALT levels remained comparable between groups, suggesting increased hepatic lipid accumulation rather than aggravated hepatocellular injury (Revised Figure 3E). In addition, serum and hepatic triglyceride levels and serum cholesterol were significantly elevated, while hepatic cholesterol levels were not significantly different from controls (Revised Figure 3E). Together, these results validate our findings in an independent MASH model and further support a protective role for Kupffer cell–derived Chi3l1 in limiting steatosis and disease progression (Revised manuscript, page 5, line 188-205).

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

      We thank the reviewer for raising this important point. We agree that additional human validation would further strengthen the translational relevance of our findings. We initially attempted to examine macrophage cell death in human liver samples by performing TUNEL and F4/80 co-staining on human liver cancer tissues. However, we did not detect clear colocalization in these samples. We speculate that this may reflect differences in disease context and stage, as the available samples represent end-stage liver disease, whereas our study focuses on early MASLD progression. Despite this limitation, we provide several lines of evidence supporting the human relevance of our findings. First, analysis of multiple public human MASLD scRNA-seq datasets demonstrates Chi3l1 expression in hepatic macrophages (Figure 2F–K). Second, analysis of public bulk RNA-seq datasets shows that Chi3l1 expression positively correlates with MASLD disease activity and progression (Revised Figure 1E–F). Third, our observations are consistent with previous clinical studies reporting elevated CHI3L1 levels in patients with MASLD/MASH and advanced liver disease. We acknowledge that functional validation in primary human macrophages or human liver tissues would further strengthen the translational significance of this work. This limitation and future direction have now been added to the Discussion (Revised manuscript, page 10, lines 409–411).

      Comments on revisions:

      The authors have done a thorough job addressing my comments. However, I am not convinced about the MCD diet model, which is somewhat hidden in the Supplementary Files. Neither seems MASH different nor are any fibrosis data shown to support the conclusions. I am not satisfied with this part of the revised manuscript, and I do not agree that the second MASH model would support the conclusions.

      We thank the reviewer for their continued careful evaluation and for highlighting the need for clearer presentation of the MCD model data. To address this concern, we have substantially revised this section of the manuscript. First, the MCD model results have now been moved from the Supplementary Figure to a new main figure (Revised Figure 3) to improve visibility and clarity. Second, we have added additional fibrosis analyses, including Sirius Red staining and α-SMA immunostaining, to directly assess fibrotic changes. These analyses show that MCD feeding induces significant collagen deposition in control mice and that fibrosis is further increased in Clec4f<sup>ΔChil1</sup> mice (Revised Figure 3D). Importantly, the MCD model recapitulates the key phenotypes observed in the HFHC model, with KC-specific Chi3l1 deletion leading to increased MASLD progression. These findings support the conclusion that the protective role of Kupffer cell–derived Chi3l1 is not restricted to a single dietary model, but is observed across distinct models of steatohepatitis. We hope that these revisions clarify the results and strengthen the evidence supporting our conclusions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor:

      Line 73 - should be moMfs not moKCs

      We thank the reviewer for this helpful comment. The term moKCs was used intentionally in line 73 to refer to monocyte-derived Kupffer cells, rather than MoMFs (monocyte-derived macrophages). To avoid potential confusion, we have clarified the terminology in the revised manuscript.

      Methods: diet is mentioned for 6 weeks but for HFHC should be 16.

      The correction has been made in the Methods section (page 3, line 115).

      Liver/body weight ratios are >3 then I think it is body/liver weight ratio?

      We thank the reviewer for this query. The reported values represent liver-to-body weight ratios, calculated as (liver weight ÷ body weight) × 100%. A value of ~3% is consistent with the expected range for mice with MASLD-associated hepatomegaly.

      This clarification has been added to the revised figure legend.
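
      The liver-to-body weight calculation described above can be illustrated with a minimal sketch. The function name and the example weights are hypothetical and are not taken from the manuscript; they only show why a value above 3 is plausible for liver ÷ body × 100, while the inverse ratio would be much larger.

```python
def liver_index(liver_weight_g: float, body_weight_g: float) -> float:
    """Liver-to-body weight ratio as a percentage: (liver / body) * 100."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return liver_weight_g / body_weight_g * 100.0


# Hypothetical example values (not data from the study).
liver_g = 1.2
body_g = 30.0

example = liver_index(liver_g, body_g)  # liver-to-body ratio, in percent
inverse = body_g / liver_g              # body-to-liver ratio, unitless

print(f"liver index: {example:.1f}%")
print(f"body/liver ratio: {inverse:.1f}")
```

      With weights in this range, the liver index falls in the low single-digit percentages, whereas a body-to-liver ratio would be on the order of tens.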

      Figure 5F - what happens in Clec4f-CRE mice fed HFHC?

      We thank the reviewer for this question. Western blot analysis showed that HFHC feeding upregulated Chi3l1 protein in the livers of Clec4f-Cre mice (Author response image 4), similar to the increase observed in wild-type mice.

      Author response image 4.

      The expression of Chi3l1 in serum of Clec4f-Cre mice. (A) Western blot detecting Chi3l1 expression in murine serum of Clec4f-Cre mice before and after HFHC feeding. n = 3 mice/group.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents convincing findings that oligodendrocytes play a regulatory role in spontaneous neural activity synchronisation during early postnatal development, with implications for adult brain function. Utilising targeted genetic approaches, the authors demonstrate how oligodendrocyte depletion impacts Purkinje cell activity and behaviours dependent on cerebellar function. Delayed myelination during critical developmental windows is linked to persistent alterations in neural circuit function, underscoring the lasting impact of oligodendrocyte activity.

      Strengths:

      (1) The research leverages the anatomically distinct olivocerebellar circuit, a well-characterized system with known developmental timelines and inputs, strengthening the link between oligodendrocyte function and neural synchronization.

      (2) Functional assessments, supported by behavioral tests, validate the findings of in vivo calcium imaging, enhancing the study's credibility.

      (3) Extending the study to assess the long-term effects of early-life myelination disruptions adds depth to the implications for both circuit function and behavior.

      We appreciate this positive evaluation.

      Weaknesses:

      (1) The study would benefit from a closer analysis of myelination during the periods when synchrony is recorded. Direct correlations between myelination and synchronized activity would substantiate the mechanistic link and clarify if observed behavioral deficits stem from altered myelination timing.

      We appreciate the reviewer’s thoughtful suggestion and have expanded the manuscript to clarify how oligodendrocyte maturation relates to the development of Purkinje-cell synchrony. The developmental trajectory of Purkinje-cell synchrony has already been comprehensively characterized by Good et al. (2017, Cell Reports 21: 2066–2073): synchrony drops from a high level at P3–P5 to adult-like values by P8. We found that myelination in the cerebellum begins to appear at P5–P7 (Figure S1A, B), indicating that the timing of Purkinje cell desynchronization coincides with the initial appearance of oligodendrocytes and myelin in the cerebellum. To determine whether myelin growth could nevertheless modulate this process, we quantified ASPA-positive oligodendrocyte density and MBP-positive bundle thickness and area at P10, P14, P21 and adulthood (Fig. 1J, K, Fig. S1E). Both metrics increase monotonically and clearly lag behind the rapid drop in synchrony, indicating that myelination may not be the primary trigger for the desynchronization. When oligodendrocytes were ablated during the second postnatal week, the synchrony was reduced (new Fig. 2). Thus, once myelination is underway, oligodendrocytes become critical for maintaining the synchrony, acting not as the initiators but as the stabilizers and refiners of the mature network state.

      We have now added a new subsection to the Discussion (lines 451–467), in which we propose a two-phase model. Phase I (P3–P8): High early synchrony is generated by non-myelin mechanisms (e.g. transient gap junctions, shared climbing-fiber input). Phase II (P8 onward): As oligodendrocytes proliferate and ensheath axons, they fine-tune conduction velocity and stabilize the mature, low-synchrony network state.

      We believe these additions fully address the reviewer’s concerns.

      (2) Although the study focuses on Purkinje cells in the cerebellum, neural synchrony typically involves cross-regional interactions. Expanding the discussion on how localized Purkinje synchrony affects broader behaviors - such as anxiety, motor function, and sociality - would enhance the findings' functional significance.

      We appreciate the reviewer’s helpful suggestion and have expanded the Discussion (lines 543–564) to clarify how localized Purkinje-cell synchrony can influence broader behavioral domains. In the revised text we note that changes in PC synchrony propagate into thalamic, prefrontal, limbic, and parietal targets, thereby impacting distributed networks involved in motor coordination, affect, and social interaction. Our optogenetic rescue experiments further support this framework, as transient resynchronization of PCs normalized sociability and motor coordination while leaving anxiety-like behavior impaired. This dissociation highlights that different behavioral domains rely to varying degrees on precise cerebellar synchrony and underscores how even localized perturbations in Purkinje timing can acquire system-level significance.

      (3) The authors discuss the possibility of oligodendrocyte-mediated synapse elimination as a possible mechanism behind their findings, drawing from relevant recent literature on oligodendrocyte precursor cells. However, there are no data presented supporting this assumption. The authors should explain why they think the mechanism behind their observation extends beyond the contribution of myelination or remove this point from the discussion entirely.

      We thank the reviewer for pointing out that our original discussion of oligodendrocyte-mediated synapse elimination was not directly supported by data in the present manuscript. Because we are actively analyzing this question in a separate, follow-up study, we have deleted the speculative passage to keep the current paper focused on the demonstrated, myelination-dependent effects. We believe this change sharpens the mechanistic narrative and fully addresses the reviewer’s concern.

      (4) It would be valuable to investigate the secondary effects of oligodendrocyte depletion on other glial cells, particularly astrocytes or microglia, which could influence long-term behavioral outcomes. Identifying whether the lasting effects stem from developmental oligodendrocyte function alone or also involve myelination could deepen the study's insights.

      We thank the reviewer for raising this point and have performed the requested analyses. Using IBA1 immunostaining for microglia and S100b for Bergmann glia, we quantified cell density and marker signal intensity at P14 and P21. Neither the microglial nor the Bergmann-glial measures differed between control and oligodendrocyte-ablated mice at either time point (new Figure S2). These results indicate that the behavioral phenotypes we report are unlikely to arise from secondary activation or loss of other glial populations.

      We have now added these results (lines 275–286) and also discuss myelination and other oligodendrocyte functions (lines 443–450). It remains difficult to disentangle conduction-related effects from myelination-independent trophic roles of oligodendrocytes. We therefore note explicitly that future work employing stage-specific genetic tools or acute metabolic manipulations will be required to parse these contributions more definitively.

      (5) The authors should explore the use of different methods to disturb myelin production for a longer time, in order to further determine if the observed effects are transient or if they could have longer-lasting effects.

      We agree that distinguishing transient from enduring effects is critical. Importantly, our original submission already included data demonstrating a persistent deficit of PC population synchrony (Fig. 4, previous Fig. 3): (i) at P14—the early age after oligodendrocyte ablation—population synchrony is reduced, and (ii) the same deficit is still present in adults (P60–P70) despite full recovery of ASPA-positive cell density and MBP-area and -thickness (Fig. 2H-K, Fig. S1E, and Fig. 4). We also performed the ablation of oligodendrocytes after the third postnatal week. Despite a similar acute drop in ASPA-positive cells, neither population synchrony nor anxiety-, motor-, or social behaviors differed from littermate controls. Thus, extending myelin disruption beyond the developmental window does not exacerbate or prolong the phenotype, whereas a short perturbation within that window leaves a permanent timing defect. These findings strengthen our conclusion that it is the developmental oligodendrocyte/myelination program itself—rather than ongoing adult myelin production—that is essential for establishing stable network synchrony. We now highlight this point explicitly in the revised Discussion (lines 507–522).

      (6) Throughout the paper, there are concerns about statistical analyses, particularly on the use of the Mann-Whitney test or using fields of view as biological replicates.

      We appreciate the reviewer’s guidance on appropriate statistical treatment. To address these concerns we have re-analyzed all datasets that contained multiple measurements per animal (e.g., fields of view, lobules, or trials) using nested statistics with animal as the higher-order unit. Specifically, we applied a two-level nested ANOVA when more than two groups were compared and a nested t-test when two conditions were present. The re-analysis confirmed all original conclusions. Because the nested models yielded comparable effect sizes to the Mann–Whitney tests, we have retained the mean ± SEM for ease of comparison with prior literature but now also report all values for each mouse in Table 1. In cases where a single measurement per mouse was compared between two groups, we used the Mann–Whitney test and present the results in the graphs as median values.
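
      The principle of treating the animal, rather than the field of view, as the unit of analysis can be sketched as follows. This is not the nested ANOVA or nested t-test described above; it is a simpler summary-statistics approach that collapses repeated measurements to one mean per animal and then computes a Welch t statistic on those means. All names and values are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def animal_means(records):
    """Collapse (group, animal_id, value) rows to one mean per animal."""
    per_animal = defaultdict(list)
    for group, animal, value in records:
        per_animal[(group, animal)].append(value)
    by_group = defaultdict(list)
    for (group, _animal), values in sorted(per_animal.items()):
        by_group[group].append(mean(values))
    return dict(by_group)


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical synchrony values: two fields of view per mouse, three mice per group.
records = [
    ("control", "m1", 0.42), ("control", "m1", 0.46),
    ("control", "m2", 0.50), ("control", "m2", 0.54),
    ("control", "m3", 0.47), ("control", "m3", 0.49),
    ("ablated", "m4", 0.30), ("ablated", "m4", 0.34),
    ("ablated", "m5", 0.36), ("ablated", "m5", 0.40),
    ("ablated", "m6", 0.31), ("ablated", "m6", 0.33),
]

groups = animal_means(records)
t_stat, df = welch_t(groups["control"], groups["ablated"])

# n per group is the number of animals (3), not the number of fields of view (6).
print(len(groups["control"]), len(groups["ablated"]))
print(round(t_stat, 2), round(df, 1))
```

      Collapsing to animal means discards within-animal variability, which a nested model retains, but it avoids inflating the sample size by counting fields of view as independent replicates.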

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      In this important work, it is demonstrated that certain high-resolution cryo-EM structures can be obtained by using concentrated cell extracts without purification. The compelling results with the mammalian ribosomes demonstrate the utility of this approach for this molecule and complexes with elongation factor 2. Moreover, this work also demonstrates the utility of 2D template matching for particle picking for structure determination by single-particle averaging pipelines.

      We thank the reviewers for their valuable comments and suggestions, which have helped us to improve the manuscript. We provide a response to the referees’ comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Seraj et al. introduces a transformative structural biology methodology termed "in extracto cryo-EM." This approach circumvents the traditional, often destructive, purification processes by performing single-particle cryo-EM directly on crude cellular lysates. By utilizing high-resolution 2D template matching (2DTM), the authors localize ribosomal particles within a complex molecular "crowd," achieving near-atomic resolution (~2.2 Å). The biological centerpiece of the study is the characterization of the mammalian translational apparatus under varying physiological states. The authors identify elongation factor 2 (eEF2) as a nearly universal hibernation factor, remarkably present not only on non-translating 80S ribosomes but also on 60S subunits. The study provides a detailed structural atlas of how eEF2, alongside factors like SERBP1, LARP1, and IFRD2, protects the ribosome's most sensitive functional centers (the PTC, DC, and SRL) during cellular stress.

      Strengths:

      The "in extracto" approach is a significant leap forward. It offers the high resolution typically reserved for purified samples while maintaining the "molecular context" found in in situ studies. This addresses a major bottleneck in structural biology: the loss of transiently bound or labile factors during biochemical purification.

      The finding that eEF2 binds and sequesters 60S subunits is a major biological insight. This suggests a "pre-assembly" hibernation state that allows for rapid mobilization of the translation machinery once stress is relieved, which was previously uncharacterized in mammalian cells.

      The authors successfully captured eIF5A and various hibernation factors in states that are traditionally disrupted. The identification of eIF5A across nearly all translating and non-translating states highlights the power of this method to detect ubiquitous but weakly bound regulators.

      The manuscript beautifully illustrates the "shielding" mechanism of the ribosome. By mapping the binding sites of eEF2 and its co-factors, the authors provide a clear chemical basis for how the cell prevents nucleolytic cleavage of ribosomal RNA during nutrient deprivation.

      Weaknesses:

      (1) While 2DTM is a powerful search tool, it inherently relies on a known structural "template." There is a risk that this methodology may be "blind" to highly divergent or novel macromolecular complexes that do not share sufficient structural similarity with the search model. The authors should discuss the limitations of using a vacant 60S/80S template in identifying highly remodeled stress-induced complexes. For instance, what happens if an empty 40S subunit is used as a template? In the current work, while 60S and 80S particles are picked, none are 40S. The authors should comment on this.

      Thank you for your comment. As noted by the reviewer, 2DTM inherently favors particles that share sufficient similarity with the search template and may underrepresent highly remodeled or structurally divergent complexes. Importantly, once particles are identified, subsequent 2D/3D classification and refinement are not constrained by the template used for particle picking. Consistent with this, we observe classes displaying additional or altered densities absent in the original template, indicating that template matching does not preclude the detection of remodeled ribosomal states, although highly divergent species may still escape detection.

      Regarding the use of a 40S subunit as a template for 2DTM, we tested two templates: a complete 40S subunit and the 40S body alone. Using these 40S templates, we captured several 40S-, 43S-, and 48S-containing complexes, as well as 80S particles. As expected, no individual 60S classes emerged with 40S-TM. 40S-TM yielded 80S classes similar to those obtained with 60S-TM, although the number of particles was lower than with 60S template matching, resulting in lower resolution of these classes. Since this study focuses on ribosome hibernation, we chose to proceed with the 60S-TM results and do not report results using 40S-TM. We reported 40S-TM results in another study from our groups (Zottig et al., bioRxiv, 2025), which focuses on translation initiation on 40S subunits and was deposited as a preprint after this submission.

      We have added a comment and reference describing the use of the 40S template in the initial section of Results and Discussion: “This result echoes our concurrent finding that using 40S or partial 40S templates yields a variety of initiation complexes and 80S classes, revealing densities beyond those in the template [44].”

      (2) In the GTPase center, the authors identify density for "DRG-like" proteins. However, due to limited local resolution in that specific region, they are unable to definitively distinguish between DRG1 and DRG2. While the structural similarity is high, the functional implications differ, and the identification remains somewhat speculative. The authors should acknowledge this in the text.

      We agree with this comment and address it in the main text:

      “Whereas the overall shape and secondary structure resemble DRG1 or DRG2, the local resolution is insufficient to distinguish between these or other similarly structured proteins. Both yeast and mammalian counterparts are reported to function with a companion factor (Tma146p or Gir2 in yeast; or DFRP1 and DFRP2 in mammals), but our maps do not contain density that could correspond to DFRP1/2 near the putative DRG1/2 density. Future work will elucidate the function of these or other DRG-like GTPases in the context of an elongation complex.”

      (3) While "in extracto" is superior to purified SPA, the act of cell lysis (even rapid permeabilization) still involves a change in the chemical environment (pH, ion concentration, and dilution of metabolites). The authors could strengthen the manuscript by discussing how post-lysis changes might affect the occupancy of factors like GTP vs. GDP states.

      Thank you for pointing this out. Cell lysis can indeed lead to a change in the chemical environment, although we do not know how post-lysis changes may specifically affect the occupancy of factors, such as GTP- vs. GDP-bound states. We tried to minimize this effect by performing a rapid permeabilization. Our efforts to optimize our protocols are ongoing, and we expect to have a better answer to this question in the future.

      Nevertheless, to address this reviewer’s concern, our discussion states: “Additional optimization of buffer conditions may be required to more accurately represent the translation states observed in cells, as ionic conditions are known to affect the conformation of the ribosomes (e.g. rotated/non-rotated) and binding of protein factors”.

      (4) The study provides excellent snapshots of stationary states (translating vs. hibernating), but the kinetic transition, specifically how the 60S-eEF2 complex is recruited back into active translation, is not well discussed. On page 13, the authors present eEF2 bound to 60S but do not mention anything regarding which nucleotide is bound to the factor. It only becomes clear that it is GDP after looking at Figure S9. This should be clarified in the text. Similarly, the observations that eEF2 is bound to GDP in the 60S and 80S raise questions as to how the factor dissociates from the ribosome. This could also be discussed.

      Thank you for bringing this to our attention. We now state in the main text that eEF2 is bound with GDP on the 60S subunit.

      As for the kinetic transitions of 60S-eEF2 complexes, like this reviewer, we are fascinated by the possible roles and mechanisms of the 60S-eEF2 complex. The averaged particle ensembles derived from cryo-EM data do not report on the kinetics or transition pathways directly. We acknowledge in the main text that “Future studies will bring insights into the roles of the protein(s) and into the functions and transitions of 60S•eEF2 complexes to the pool of translating ribosomes”.

      Overall Assessment:

      The work reported in this manuscript likely represents the future of structural proteomics. The combination of high-resolution structural biology with minimal sample perturbation provides a new standard for investigating the cellular machines that govern life. After addressing minor points regarding template bias, protein identification, and transition dynamics, this work may become a landmark in the field of translation.

      Reviewer #2 (Public review):

      In this manuscript, the authors describe using "in extracto" cryo-EM to obtain high-resolution structures of mammalian ribosomes from concentrated cell extracts without further purification or reconstitution. This approach aims to solve two related problems. The first is that purified ribosomes often lose cellular cofactors, which are often reconstituted in vitro; this precludes the ability to find novel interactions. The second is that while it is possible to perform cryo-EM on cellular lamella, FIB milling is a slow and laborious process, making it unfeasible to collect datasets sufficiently large to allow for high-resolution structure determination. Extracts should contain all cellular cofactors and allow for grid preparation similar to standard single-particle analysis (SPA) approaches. While cryo-EM of cell extracts is not in itself novel, this manuscript uses 2D template matching (2DTM) for particle picking prior to structure determination using more standard SPA pipelines. This should allow for improved picking over other approaches in order to obtain large datasets for high-resolution SPA.

      This manuscript has two main results: novel structures of ribosomes in hibernating states; and a proof-of-principle for in extracto cryo-EM using 2DTM. Overall, I think the results presented here are strong and serve as a proof-of-principle for an approach that may be useful to many others. However, without presenting the logic of how parameters were optimized, this manuscript is limited in its direct utility to readers.

      Thank you for this valuable comment. We have expanded our Methods section “Optimization of 2DTM in RRL data” to present the logic behind parameter optimization, in the paragraph beginning with “We optimized high-resolution template matching procedures…”

      Reviewer #3 (Public review):

      Summary:

      The authors describe a new structural biology framework termed "in extracto cryo-EM," which aims to bridge the gap between single-particle cryo-EM of purified complexes and in situ cryo-electron tomography (cryo-ET). By utilizing high-resolution 2D template matching (2DTM) on mammalian cell lysates, the authors sought to visualize the translational apparatus in a near-native environment while maintaining near-atomic resolution. The study identifies elongation factor 2 (eEF2) as a major hibernation factor bound to both 60S and 80S particles and describes a variety of hibernation scenarios involving factors such as SERBP1, LARP1, and CCDC124.

      Strengths:

      (1) The use of 2DTM effectively overcomes the signal-to-noise challenges posed by the dense and viscous nature of cellular extracts, yielding maps as high as 2.2 Å.

      (2) The discovery of eEF2-GDP as a ubiquitous shield for ribosomal functional centers, particularly its unexpected stabilization on the 60S subunit, provides a compelling model for ribosome preservation during stress.

      Weaknesses:

      (1) Representative nature of cell samples and lower detection limit

      The cells used in this study (MCF-7, BSC-1, and RRL) are either fast-growing cancer cell lines or specialized protein-synthetic systems. For cells with naturally low ribosomal abundance (such as quiescent primary cells), achieving the target concentration (e.g., A260 > 1000 ng/uL) would require an exponentially larger starting cell population.

      Is there a defined lower limit of ribosomal concentration in the raw lysate below which the 2DTM algorithm fails to yield high-resolution classes? In ribosome-sparse lysates, A260 becomes an unreliable proxy for ribosome density due to the high background of other RNA species and proteins. How do the authors estimate specific ribosome abundance in such heterogeneous fields?

      We have not tested these specific points, but we found that 2DTM can yield high-resolution reconstructions even with 1-2 particles per micrograph. This would require a substantially larger dataset than the one used in this work, yet it could provide a viable strategy for dilute or low-abundance samples. Other optimizations, including lysate concentration, may help as well. The following text in the manuscript reflects these points:

      “Additional optimization of buffer conditions may be required to more accurately represent the translation states observed in cells, as ionic conditions are known to affect the conformation of the ribosomes (e.g. rotated/non-rotated) and binding of protein factors [91-94]. For cells or samples with lower abundance of ribosomes or other macromolecules/complexes of interest, a lysate concentration step or collection of a larger dataset may be considered.”

      (2) Quantitation in heterogeneous lysates and crowding effects

      The authors utilize A260 as a key quality control measure before grid preparation. However, if extreme physical concentration is required to see enough particles, the background concentration of other cytoplasmic components also increases. This may lead to molecular crowding or sample viscosity that interferes with the formation of optimal thin ice. How do the authors calculate or estimate the specific abundance of ribosomes in the cryo-EM field of view when they represent a much smaller percentage of the total cellular content?

      We reported A260 as a reference that may be useful to achieve particle distributions resembling those in our work, rather than as a key quality control measure. Accordingly, we do not use it to estimate ribosome concentration or the specific abundance of ribosomes; instead, we recommend adjusting the sample concentration/dilution by grid screening.

      The reviewer raises the important aspect of ice thickness. We found that ribosome particles are most abundant in thicker ice regions, and these particles make up the majority of the datasets that led to our high-resolution reconstructions. We have added this observation to “Optimization of 2DTM in RRL data”.

      (3) Optimization of sample preparation

      The authors describe lysates as dense and viscous, requiring multiple blotting steps (2-3 times) for 3-8 seconds. Have the authors tested whether a larger molecular weight cutoff (e.g., 100 kDa) during concentration could improve the ribosome-to-background ratio without losing small factors like eIF5A (approx. 17 kDa)? Could repeated blotting of a concentrated, viscous lysate introduce shearing forces or increased exposure to the air-water interface that perturbs the native conformation of the complexes?

      We strived to minimize the number of steps in sample preparation, so we did not extensively test concentration steps. We also found that a concentration step can be omitted; the eIF5A-containing structure from the RRL dataset was determined without this step. We agree with the reviewer that repeated blotting may change ribosome complex equilibrium and result in a different distribution of functional states than in cells. However, we did not find evidence of perturbation of the native conformations of complexes, as the positions of ribosomes and factors are nearly identical to those observed in previous studies, including the recent high-resolution structures from cells that we cite.

      (4) The regulatory switch and mechanism of eEF2

      The finding that eEF2-GDP occupies dormant ribosomes is striking. What drives eEF2 from its canonical role in translocation to this hibernation state? Is this transition purely driven by stoichiometry (lack of mRNA/tRNA) and the GDP/GTP ratio, or is there a role for post-translational modifications? How do these eEF2-bound dormant ribosomes rapidly re-enter the translation pool upon stress relief?

      We are glad that this reviewer is fascinated by the eEF2-GDP occupancy on dormant ribosomes (just like we are)! These are important open questions that require further research, as our cryo-EM analyses cannot directly address the kinetic or mechanistic aspects of the mentioned processes. We did explore the known modification/phosphorylation sites in eEF2 densities but did not find evidence for such modifications, which does not rule out the possibility of transient or new modifications.

      (5) Hibernation diversity and LARP1 contextualization

      The study reveals that hibernation strategies vary across cell types. Does the high hibernation rate in RRL reflect a physiological state, or does it hint at “preparation-induced stress” due to resource exhaustion or mRNA degradation in the cell-free system? How do the authors reconcile their discovery of LARP1 on 80S particles with recent 2024 reports that primarily describe LARP1 as an SSU-bound repressor?

      Based on the high abundance of hibernating ribosomes in RRL (relative to many other samples we have tested so far), we speculate that this scenario may result from the stresses induced during lysate preparation: first, the rabbits are treated with phenylhydrazine inducing cell stress, then lysates are treated with micrococcal nuclease to degrade endogenous mRNAs. In addition, the specialization of reticulocytes may contribute to the distinct expression of stress/hibernation factors.

      As for LARP1, our finding is consistent with the 2024 work by Saba et al, who reported LARP1 binding to both 40S subunits and 80S ribosomes. They also noted that LARP1-bound ribosomes are “non-translating”, consistent with our structures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 3, it would be easier for the reader if the authors would report the % of particles in each class. Also, indicating body rotation and head swiveling values would help.

      Because our high-resolution maps result from a combination of data sets (e.g., RRL with an mRNA and RRL without an mRNA), we specify the particle percentages in the corresponding classification schemes in supplemental figures. To avoid excessive labeling in this figure, body rotation and head swiveling values for the new classes are shown in Figure 4.

      (2) Page 16, what is 'elongation factor 1'? It doesn't seem the authors refer to eEF1A?

      Thank you for pointing out this inconsistency; this is indeed eEF1A. We have corrected the text.

      (3) Page 16, after 'individual 60S subunits', there is a missing full stop.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      I am not an expert in ribosome biology and do not have any specific comments on the various states presented here. Instead, I will mainly focus on the image processing aspects of this manuscript.

      Major points:

      (1) Were any AI-based particle pickers, such as crYOLO, topaz, or warp tested? While more traditional template-based or LoG pickers were shown to be inferior to 2DTM, it is unclear if AI methods would perform just as well. Given that a major point of this manuscript is the image processing pipeline, and that these AI tools have been widely adopted in the field, I think this is an important consideration.

      We used other particle pickers before using 2DTM and have listed them in the Supplementary Information: please see Table S1 for a complete list of particle pickers evaluated in this study. Since our present work focuses on a sample preparation method, a more extensive evaluation of particle picking methods is beyond the scope of this study.

      (2) While the methods used to obtain the structures presented are detailed, I think it would also be useful to provide some logic for how parameters were determined or optimized. This would serve as a useful foundation for readers who wish to try out this in extracto approach on their own specimens. Some of these optimizations seem quite specific, such as optimization of angular search parameters, but with no clear logic: e.g., why is the out-plane search coarser than the in-plane search; what is the effect of increasing the angular step sizes? Some seem inconsistent, e.g., why is e2pdb2mrc.py sometimes used and the cisTEM simulate used other times? Some are poorly described, such as "the defocus search turned on for micrographs with thicker ice" where there is no mention of how ice thickness is assessed and how thick is too thick. I think a workflow figure with accompanying text would help the reader understand the logic used in this work and how to apply that logic to their own projects.

      To address the comments in (2), we provide separate responses addressing each comment:

      (1) Provide some logic for how parameters were determined or optimized:

      The logic behind determining and optimizing search parameters is a balance between search precision and computational cost. In practice, users must weigh the benefit of finer sampling against the substantial increase in runtime, particularly for large datasets. For example, enabling defocus searching with a 200 Å step size and a 1000 Å range increases the computational time approximately 11-fold compared with running the same search with defocus disabled (since every defocus plane in both the positive and negative directions is searched), which can be prohibitive when GPU resources are limited. In such cases, reducing the defocus search to a 250 Å step size and a 500 Å range can dramatically shorten the runtime while preserving nearly the same number of reliable matches. In summary, we found that optimizing the defocus search, the in-plane and out-of-plane angles, and the image/micrograph pixel size can substantially reduce the processing time while sacrificing only a small percentage of particles.
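
      The arithmetic behind the approximately 11-fold figure can be sketched as follows. This is a minimal illustration, not the cisTEM implementation: the function name is hypothetical, and it assumes that defocus offsets are sampled symmetrically from minus the range to plus the range in fixed steps and that runtime scales roughly linearly with the number of planes searched.

```python
def defocus_planes(search_range, step):
    """Count defocus planes for a symmetric search (hypothetical helper).

    Assumes offsets of -range, ..., -step, 0, +step, ..., +range (in Angstrom).
    A range of 0 corresponds to a disabled defocus search (a single plane).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if search_range < 0:
        raise ValueError("search_range must be non-negative")
    planes_per_side = int(search_range // step)
    return 2 * planes_per_side + 1


# 200 A step over a 1000 A range: 5 planes on each side plus the central plane
full_search = defocus_planes(1000, 200)
# 250 A step over a 500 A range: 2 planes on each side plus the central plane
reduced_search = defocus_planes(500, 250)

print(full_search, reduced_search)
```

      Under these assumptions the first setting searches 11 planes, in line with the approximately 11-fold increase in runtime quoted above, whereas the reduced setting searches 5 planes.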

      We have expanded our parameter optimization paragraph in “Optimization of 2DTM in RRL data”, as mentioned in a previous response.

      (2) Some seem inconsistent, e.g., why is e2pdb2mrc.py sometimes used and the cisTEM simulate used other times?

      e2pdb2mrc.py is simpler to use and was used at the beginning of the project. Later, we switched to the simulate program since it performed slightly better. Either program is suitable for generating templates for 2DTM.

      (3) Some are poorly described, such as "the defocus search turned on for micrographs with thicker ice" where there is no mention of how ice thickness is assessed and how thick is too thick.

      We did not quantitatively assess ice thickness; instead, we tested whether it is advantageous to include the defocus search. To this end, we first performed CTF estimation and grouped micrographs based on their fit resolution. From each group, we selected ten micrographs representing the highest and lowest fit resolutions. Template matching was then performed using identical parameters, once with defocus search enabled and once with it disabled, and the number of picked particles for each micrograph was compared between the two conditions. When a significant difference was observed, most commonly for icy micrographs with low fit resolution, we enabled defocus search for that group of images. Enabling the defocus search sometimes yielded twice as many matches. These images/datasets appeared to have a higher background than in vitro reconstituted samples. The template-matching results from these micrographs were subsequently combined with results from groups processed with defocus search disabled.
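
      The per-group decision described above could be sketched as below. This is only an illustration under stated assumptions: the function name, the use of the median ratio, and the 1.5-fold threshold are hypothetical choices made for the example (the response does not define a numerical criterion), and the counts are toy numbers, not measurements from this study.

```python
from statistics import median


def should_enable_defocus(picks_off, picks_on, min_ratio=1.5):
    """Decide whether a micrograph group benefits from defocus search.

    picks_off / picks_on: particle counts per test micrograph with the
    defocus search disabled / enabled (same micrographs, same order).
    min_ratio is a hypothetical threshold, not a value from the manuscript.
    """
    if len(picks_off) != len(picks_on):
        raise ValueError("both lists must cover the same micrographs")
    # Skip micrographs with no picks in the disabled run to avoid dividing by zero
    ratios = [on / off for off, on in zip(picks_off, picks_on) if off > 0]
    if not ratios:
        return False
    return median(ratios) >= min_ratio


# Toy counts for an icy, low-fit-resolution group: roughly 2x more matches
icy_group = should_enable_defocus([10, 12, 8], [21, 24, 15])
# Toy counts for a thin-ice group: almost no gain from the defocus search
thin_group = should_enable_defocus([100, 90], [105, 92])

print(icy_group, thin_group)
```

      Groups for which the check returns True would be processed with defocus search enabled, and the remaining groups with it disabled, before the results are combined.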

      To address this point, we have included this description in “Optimization of 2DTM in RRL data”.

      (4) I think a workflow figure with accompanying text would help the reader understand the logic used in this work and how to apply that logic to their own projects.

      Thanks for this suggestion. We have added a workflow figure as Figure 1—figure supplement 2.

      Minor Points:

      (1) While the image processing described seems appropriate, I think it is still necessary to include Fourier shell correlation plots for the final structures as supplemental data.

      Thank you for pointing out this inadvertent omission. We have added FSC curves in Figure 3—figure supplement 3.

      (2) One of the initial workflows used is a Relion 3 pipeline, which is, at this point, quite dated. Is there a reason Relion 4 or 5 was not used instead?

      The project started when Relion 3 was the latest version.

    1. Author response:

      We thank the editors and reviewers for their careful evaluation of our manuscript, “GM-CSF regulates ILC states and myeloid cell signaling during ulceration in Crohn’s disease.” We appreciate the constructive feedback and agree that strengthening the mechanistic understanding of GM-CSF signaling in the regulation of ILC populations will significantly improve the study.

      The reviewers identified a key gap regarding the downstream mechanisms by which GM-CSF maintains ILC3 populations and limits ILC1 expansion. In response, we will focus our revision on defining the myeloid-mediated pathways downstream of GM-CSF that regulate ILC states.

      Specifically, we plan to: 

      (1) Characterize myeloid cell responses to GM-CSF signaling

      We will perform additional analyses of both our Xenium spatial transcriptomics and zebrafish single-cell RNA-seq datasets to identify transcriptional changes in macrophages and monocytes associated with GM-CSF signaling. This will include differential gene expression and pathway enrichment analyses to uncover candidate signaling pathways (e.g., cytokine and STAT5-associated programs) that may mediate ILC regulation.

      (2) Strengthen spatial niche analysis in human tissue

      We will refine our Xenium-based analyses to better define the cellular microenvironments surrounding GM-CSF-producing cells, including higher-resolution visualization and quantification of receptor-expressing target cells and signaling niches within ulcerated regions.

      (3) Further define immune cell populations in the zebrafish model

      We will enhance the definition of ILC subsets by incorporating additional marker-based analyses and clarifying their relationship to human ILC populations. In parallel, we will more thoroughly characterize the myeloid compartment in csf2rb-deficient zebrafish to determine how GM-CSF signaling impacts these populations.

      (4) Clarify analysis methods and presentation

      We will address all points related to statistical testing, data visualization, and figure clarity raised by the reviewers, including the use of appropriate statistical comparisons for multi-group analyses and improved annotation of gene modules and data sources.

      Together, these revisions will provide a clearer mechanistic framework linking GM-CSF signaling in myeloid cells to the maintenance of ILC3 populations and suppression of inflammatory ILC1 responses.

      We believe these additions will substantially strengthen the manuscript and address the reviewers’ concerns. We appreciate the opportunity to revise our work and look forward to submitting a revised version.

    1. Author response:

      We would like to thank the editors and the reviewers for their thoughtful and constructive assessment of our manuscript. We appreciate the reviewers' positive recognition of our research and their thoughtful assessment of our data.

      In the upcoming revision, we will incorporate rigorous statistical analysis (p-values) for our binding assays, optimize the structural figures and summary tables for better clarity, and discuss the recent preprint paper alongside the nuances of Egl-BicD stoichiometry. Regarding the suggestion for CLIP-seq, we agree that a global analysis would be a valuable extension of this work. However, as our lab’s core expertise is in structural biology, and the in vivo functional studies in this manuscript were conducted through a collaboration to validate our structural findings, we feel that such a large-scale genomic study falls beyond the scope of the current structural report.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zare-Eelanjegh et al. investigate how the endoplasmic reticulum, the nucleus, and the cell periphery are mechanically linked by indenting intact cells with specially shaped atomic force probes that double as drug injection devices. Fluorescence-lifetime imaging of the membrane tension reporter Flipper-TR reveals that these three compartments are mechanically linked and that the actin cytoskeleton, microtubules, and lamins modulate this coupling in complex ways.

      Strengths:

      (1) The study makes an important advance by applying FluidFM to probe organelle mechanics in living cells, a technically demanding but powerful approach.

      (2) Experimental design is quantitative, the data are clearly presented, and the conclusions are broadly consistent with the measurements.

      Weaknesses:

      (1) Calcium-dependent effects: Indentation can evoke cytoplasmic Ca<sup>2+</sup> elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation: PMID: 9200614, 32179693), possibly confounding the Flipper-TR responses; without simultaneous/matching Ca<sup>2+</sup> imaging, cell viability assays (e.g., Sytox), and intracellular Ca<sup>2+</sup> sequestration or myosin inhibition experiments, a more complex mechanochemical coupling cannot be excluded, weakening conclusions.

      (2) Baseline measurements: Flipper-TR lifetime images acquired without indentation do not exclude potential light-induced or time-dependent changes, which weakens the conclusions.

      (3) Indentation depth versus nuclear stiffness/tension: Because lamin-A/C depletion softens nuclei, a given force may produce a deeper pit and thus greater membrane stretch. It is unclear how the cytoskeletal perturbations affect indentation depth, which weakens the conclusions.

      Reviewer #2 (Public review):

      Summary:

      This useful study combines atomic force microscopy with genetic manipulations of the lamin meshwork and microinjection of cytoskeletal depolymerizing drugs to probe the mechanical responses of intracellular organelles to combinations of cytoskeletal perturbations. This study demonstrates both local and distal responses of intracellular organelles to mechanical forces and shows that these responses are affected by disruption of the actin, microtubule, and lamin cytoskeletal systems. Interpretation of these effects is limited by the absence of key data determining whether acute microinjection of cytoskeleton-depolymerizing drugs has complete or partial effects on the targeted cytoskeletal networks.

      Strengths:

      This study uses a sensitive micromanipulation system to apply and visualize the effects of force on intracellular organelles.

      Weaknesses:

      The choice to deliver cytoskeleton-depolymerizing drugs by local microinjection is unusual, and it is unclear to what extent actin and microtubule filaments are actually depolymerized immediately after microinjection and on the minutes-length timescale being evaluated in this study. This omission limits the interpretation of these data.

      Reviewer #3 (Public review):

      Summary:

      Using an approach developed by the authors (FluidFM) combined with FLIM, they discover that a mechanical force applied over the cell nucleus triggers mechanical responses dependent on the Lamina composition.

      Strengths:

      The authors present a new approach to study mechano-transduction in living cells, with which they uncover lamin-dependent properties of the nucleus.

      Weaknesses:

      (1) The transfer of the mechanical response from the Lamina to the ER is not fully covered.

      (2) In Figure 4D, WT dots are the same for each compartment. Why do the authors not make one graph for each compartment with WT, A-KO, B-KD, and A-KO/B-KD together?

      (3) In Figure 1E, the authors showed well how the probe deforms the nucleus. It is not indicated in the material and methods section or in the figure legend, where, in Z, the acquisition of FLIM images was made or if it is a maximum projection. I assume it was made at a plane in the middle of the nucleus to see the nuclear envelope border and the ER at the same time. Did the authors look at the nuclear membrane facing upward, where most of the deformation should occur? Are there more lifetime changes? In Figure D, before injection of CytoD, we can clearly see a difference at the pyramidal indentation site with two different lifetime colors.

      (4) A great result of this article regards the importance of Lamins, A and B, in triggering the response to a mechanical force applied to the nucleus. Could 3D imaging for LaminA and LaminB be performed at the different time points of indentation to see how the lamins meshworks are deformed and how they return to basal state? This could be correlated with the FLIM results described in the article.

      (5) Lamins form a meshwork underneath the nuclear membrane. They are connected to the cytoskeletons mainly by the LINC complex. Results presented here show that the cytoskeletons are implicated in transferring the stimulus from the nuclear envelope to the ER. Could the author perform the same experiments using Nesprin-2 or/and Nesprin-1 or/and SUN1/2 knockdowns to determine if this transmission is occurring through the LINC complex or rather in a passive way by modifying the nuclear close surroundings?

      (6) The authors used cytoskeleton drugs, CytoD and Nocodazole, with their FluidFM probe, but did not show if the drugs actually worked and to what extent by performing actin or microtubule stainings. In the original paper describing FluidFM, 15s were enough to obtain a full FITC-positive cell after injection. Here, the experiments are around 5 minutes long. I therefore interrogate the rationale behind the injection of the drugs compared to direct incubation, besides affecting only the cell currently under indentation.

      We thank the reviewers for their constructive criticisms and suggestions. Accordingly, we amended the manuscript and the figures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Calcium-dependent effects: Indentation can evoke cytoplasmic Ca<sup>2+</sup> elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation: PMID: 9200614, 32179693) that may affect Flipper-TR signals independent of membrane tension; without simultaneous Ca<sup>2+</sup> imaging, cell viability assays (e.g., Sytox costaining), intracellular Ca<sup>2+</sup> sequestration or myosin inhibition, a more complex mechanochemical coupling cannot be excluded. Tracking ER morphology during the experiments with luminal and membrane markers would further clarify this point.

      The goal of our article is to demonstrate and quantify tension propagation and tension homeostasis across the different organelles that govern the mechanosensitivity, and thus the mechanoresponse, of the cell. To this end, the test cells (drug-injected cells) were compared with a control group of non-drug-injected cells (Fig. 2 and Fig. 3). Any general response of the cells to indentation, e.g. potential changes in Ca<sup>2+</sup> sequestration, is therefore also present in the control group and is accounted for in the comparison.

      Interestingly, injecting CytoD while indenting cells with cylindrical probes produced higher tension at the NE than in the control group of non-drug-injected cells. This indicates that, at least when cells are stimulated with cylindrical probes, the effect of F-actin disruption is larger than that of the indentation itself. For this reason, only cylindrical probes, which minimize the indentation effect on the NE and the ER, were used in the subsequent steps of this study, namely varying the indentation site from the nucleus to the ER or the cell periphery and comparing WT cells with cells of varied lamina composition.

      Lastly, to examine the response to tension changes and calcium dynamics simultaneously, we have since extended our study. We analyzed cells treated with different cytoskeleton-disrupting drugs (e.g., CytoD), subjected to viscoelasticity measurements using AFM indentation (i.e., cell relaxation studies following indentation), and injected with drugs that perturb Ca<sup>2+</sup> homeostasis (i.e., Thapsigargin), combined with simultaneous Ca<sup>2+</sup> imaging. Another manuscript describing this work is in preparation.

      (2) Baseline measurements: Flipper-TR lifetime images acquired without indentation, collected with identical timing and illumination, are needed as controls to gauge potential light-induced or time-dependent changes.

      For every cell, a baseline corresponding to its tension in the relaxed state (without indentation) was quantified from a Flipper-TR image taken before the indentation and injection processes (“before”). As explained in the manuscript (lines 180-184), this baseline value was then subtracted from the tension measured by time-lapse Flipper-TR imaging over the 3-4 min of stimulation (indentation + injection), as well as immediately or 5 min post-stimulus. The control group (i.e., non-drug-injected cells where the effect of F-actin depolymerization was studied, or WT cells where the effect of lamina composition was studied) was always measured in the same manner as the test group. As such, tension increases due to light-induced or time-dependent changes, or to indentation alone, were excluded.
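
      The baseline subtraction can be illustrated with a short sketch. The function name is hypothetical and the numbers are toy readouts chosen only to show the arithmetic; they are not values from the study.

```python
def baseline_corrected(before, series):
    """Subtract the per-cell "before" readout from each later time point.

    before: Flipper-TR readout of the relaxed cell (no indentation).
    series: readouts of the same cell during and after the stimulus.
    Returns the change relative to the cell's own baseline.
    """
    return [value - before for value in series]


# Toy readouts for one test cell and one control cell (arbitrary units)
test_change = baseline_corrected(4.0, [4.0, 4.5, 5.0])
control_change = baseline_corrected(4.0, [4.0, 4.25, 4.5])

# Drug-specific effect at the last time point: the part of the change
# that is not shared with the identically handled control cell
drug_effect = test_change[-1] - control_change[-1]

print(test_change, control_change, drug_effect)
```

      Because the control cell is imaged and indented in the same way, any change caused by illumination, elapsed time, or indentation alone appears in both traces and cancels in the final difference.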

      (3) Indentation depth versus nuclear stiffness/tension: Because lamin-A/C depletion softens nuclei, a given force may arguably produce a deeper pit and thus greater (not less) membrane stretch. Demonstrating that pit geometry depends only on applied force - and not on genetic or pharmacological perturbations - is necessary to rule out alternative interpretations.

      We thank the reviewer for raising this important point regarding the relationship between indentation depth and nuclear stiffness. To address whether pit geometry depends on applied force rather than genetic perturbations, we analyzed the piezo movement required to reach the 150 nN force setpoint across all experimental conditions (WT, LMNA KO, LMNB KD, and LMNA KO/LMNB KD cells).

      Our results (Fig. S6) demonstrate that there is no statistically significant difference in the piezo displacement from the contact point to the 150 nN setpoint between any of the experimental groups (Kruskal-Wallis H-test: H = 1.744, p = 0.627). This indicates that for a constant applied force of 150 nN, the indentation depth is equivalent across all conditions despite differences in nuclear stiffness.
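
      As a consistency check on the reported statistic, the p-value can be recomputed from H. The sketch below assumes the usual chi-square approximation for the Kruskal-Wallis test with k - 1 = 3 degrees of freedom for the four groups; this is an assumption about how the p-value was obtained, and the helper name is hypothetical. It uses the closed-form survival function of the chi-square distribution with 3 degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf_df3(x):
    """Survival function (1 - CDF) of a chi-square variable with 3 d.o.f.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Four groups (WT, LMNA KO, LMNB KD, LMNA KO/LMNB KD) give 3 degrees of freedom
h_statistic = 1.744
p_value = chi2_sf_df3(h_statistic)

print(round(p_value, 3))
```

      Under this assumption the recomputed value agrees with the reported p = 0.627.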

      Therefore, the observed differences in tension response and perhaps the membrane stretch cannot be attributed to variations in indentation depth but rather reflect the intrinsic differences in molecular mechanical response to equivalent mechanical stimuli.

      This has been added in the manuscript in lines 282-286.

      Reviewer #2 (Recommendations for the authors):

      (1) Please clarify the distinctions between the pyramidal and cylindrical probes. The manuscript alludes to sharpening the cylindrical probe to facilitate membrane rupture. Do both probes rupture the plasma membrane upon force application? If so, at what applied force does this occur? It seems that PM rupture would also affect tension on intracellular membranes during and especially after force application.

      Yes, both cylindrical and pyramidal probes rupture the plasma membrane (PM), as well as the nuclear membrane when the nucleus is targeted. In the HeLa cells used for this study, pyramidal probes puncture the membrane at a higher force of 100 nN, compared with rupture forces between 10 nN and 50 nN for the sharpened cylindrical probes used here. This is explained in the manuscript in lines 112-115 for cylindrical probes and has been revised for pyramidal probes in lines 115-119.

      (2) Also re: probes: it is clear from Figure 1 that the total volume displacement induced by the pyramidal probe is far greater than the cylindrical probe. This greater displaced volume seems to be a very reasonable explanation for the increased membrane tension detected with the pyramidal probe, but this interpretation is not discussed.

      That is a good point, thank you! This has been added in lines 138-140.

      (3) Both cytochalasin D and nocodazole work by preventing new polymerization of monomers, which acutely affects new assembly and, over time, leads to loss of polymerized filaments. On the timescale of the experiments shown, it seems possible that acute effects on new filament assembly may be occurring, but that pre-assembled filaments may remain stable. It may thus be a misinterpretation to describe these conditions as "without actin fibers" or "without MTs". Further complicating matters, it is possible that the kinetics of filament disassembly may be altered by combinatorial treatment and/or in lamin knockout conditions versus wild-type cells. For instance, it has been shown that microtubule depolymerization increases actin contractility (see PMID 33089509). For these reasons, control experiments showing the extent of actin and/or microtubule disassembly in each condition tested are essential to interpret the data shown.

      Thank you for raising this valid point. This has been corrected, and the conditions are now described as "less actin fibers" and "less MTs". Regarding the timescale on which the drugs (e.g., CytoD and Nocodazole) affect filament assembly: a higher concentration of 50 µM of each of CytoD and Nocodazole, leading to a final concentration of 0.5 µM, was used for intracellular injection. This physiologically relevant final concentration was expected to act as fast as 12 min for CytoD and 1-5 min for Nocodazole when delivered directly inside the cell, which excludes the time required to cross the plasma membrane. Our study examines the dynamic response of cells and the change in tension, and therefore focuses on the early effects of the drugs and the deviation from the control groups rather than on the steady state reached at longer time points. The time estimates rely on values reported in the literature. For instance, a recent comprehensive study quantified actin dynamics and its interaction with CytoD using high-resolution images of single actin filaments acquired by total internal reflection fluorescence (TIRF) microscopy. It reported approximately 150 s (read from the graphs in Fig. 2D and 2F) as the starting point of inhibition of actin filament polymerization after introducing a flow of 5 nM CytoD into a chamber containing actin filaments.<sup>1</sup> In another study, a half-time of 40 s for the complete disassembly of microtubules was reported for monocytes incubated with 1 µM Nocodazole.<sup>2</sup> This part has also been included in the SI file, section “Mechanochemical stimulation”.

      (4) The presentation of some of the data could be clarified. For instance, it is unclear how some time course experiments can be non-significant but the endpoint analysis can be significant (for instance, Figure 3C vs. Figure 3D.)

      We agree that some instances require clearer interpretation. Indenting the cell nucleus with cylindrical probes induced a higher tension in CytoD-injected cells than in control cells at both the ER and the NE, during and after the stimulus (Fig. 2E-F and Fig. 3C-D). Time-lapse tension analysis of these cells at the ER and NE showed close-to-significant and significant differences between test and control groups, respectively. p-values of 0.087 for Fig. 2E (bottom row, ER) and 0.042 for Fig. 3C (top row, ER) were captured at the ER for the last time point during the stimulus. For the “after stimulus” condition, significant differences between CytoD-injected and control cells were captured at both the ER and the NE. The ER’s complex morphology consists of many curved structures of lumens and disks, which can deform under external mechanical perturbation, making the ER prone to absorbing stress and strain when directly targeted. That could explain the similar tension levels in CytoD-injected and control cells during ER indentation. Notably, unlike nucleus-targeted cells, ER-targeted cells show increased tension at the ER and NE in CytoD-injected cells relative to control cells only after stimulation. This suggests fundamental differences in the mechanical coupling of the nucleus and the ER to the cytoskeleton. While the nucleus maintains direct, structural actin connections through the nuclear lamina and LINC complexes [3], making it immediately sensitive to actin disruption, the ER relies on indirect, signaling-mediated cytoskeletal interactions [4,5]. Thus, the ER functions as a dynamic tension buffer that engages cytoskeletal support primarily during active repair processes following mechanical perturbation. This explains why nuclear probing reveals immediate tension differences in actin-disrupted cells, while ER probing only shows post-retraction effects. Consequently, statistical analysis detects significant differences between test and control groups after probe removal, but not during probe contact, in ER-targeted experiments. This is now explained more clearly in the manuscript at line 236.

      References

      (1) Mitani, T. et al. Microscopic and structural observations of actin filament capping and severing by Cytochalasin D. bioRxiv, 2025.2001.2028.635382 (2025).

      (2) Cassimeris, L. U., Wadsworth, P. & Salmon, E. D. Dynamics of microtubule depolymerization in monocytes. J Cell Biol 102, 2023-2032 (1986).

      (3) Maurer, M. & Lammerding, J. The Driving Force: Nuclear Mechanotransduction in Cellular Function, Fate, and Disease. Annu Rev Biomed Eng 21, 443-468 (2019).

      (4) Shi, X. et al. Actin nucleator formins regulate the tension-buffering function of caveolin-1. J Mol Cell Biol 13, 876-888 (2022).

      (5) van Vliet, A. R. & Agostinis, P. PERK and filamin A in actin cytoskeleton remodeling at ER-plasma membrane contact sites. Molecular & Cellular Oncology 4, e1340105 (2017).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript entitled 'The Role of ATP Synthase Subunit e (ATP5I) in Mediating the Metabolic and Antiproliferative Effects of Biguanides', Lefrancois G et al. identify ATP5I, a subunit of F1Fo-ATP synthase, as a key target of medicinal biguanides. ATP5I stabilizes F1Fo-ATP synthase dimers, essential for cristae morphology, but its role in cancer metabolism is understudied. The research shows ATP5I interacts with a biguanide analogue, and its knockout in pancreatic cancer cells mimics biguanide treatment effects, including altered mitochondria, reduced OXPHOS, and increased glycolysis. ATP5I knockout cells resist biguanide-induced antiproliferative effects, but reintroducing ATP5I restores the effects of metformin and phenformin. These findings highlight ATP5I as a promising mitochondrial target for cancer therapies. The manuscript is well written.

      Strengths:

      Demonstrated the experiments in systematic and well-accepted methods.

      Weaknesses:

      The significance of the target molecule and mechanisms may help in understanding the molecular mechanisms of metformin.

      We greatly appreciate the reviewer’s insightful comment regarding the importance of the target molecule and its mechanisms in elucidating metformin’s molecular actions. ATP5I plays a key role in the dimerization and assembly of the F1F0-ATP synthase complex. To address this, we performed Blue Native-PAGE followed by western blotting using an antibody against the β-subunit of the F1 domain. Our results show that metformin affects the oligomeric state of the F1F0-ATP synthase in a way that partially reproduces the effect of the KO of ATP5I (Fig 2G). This provides direct evidence that metformin acts on-target through ATP5I.

      Reviewer #2 (Public review):

      Summary:

      The mechanism(s) by which the therapeutic drug metformin lowers blood glucose in type 2 diabetes and inhibits cell proliferation at higher concentrations remain contentious. Inhibition of complex 1 of the mitochondrial respiratory chain with consequent changes in cellular metabolites which favour allosteric activation of phosphofructokinase-1, allosteric inhibition of fructose bisphosphatase-1 and cAMP signalling and activation of AMPK which phosphorylates transcription factors are candidate mechanisms. The current manuscript proposes the e-subunit of ATP-synthase as a putative binding protein of biguanides and demonstrates that it regulates the expressivity of the Complex 1 protein NDUFB8.

      Strengths:

      (1) The metformin conjugate and metformin show comparable efficacy on inhibition of cell proliferation in the millimolar range.

      (2) Demonstration of compromised expression of the Complex I protein NDUFB8 by the ATP5I knockout and its reversal by ATP5I expression is an important strength of the study. This shows that the decreased "sensitivity" to metformin in the ATP5I knock-out cells could be due to various proteins.

      (3) Demonstration of converse effects of ATP5I KO and re-expression ATP5I on the NAD/NADH ratio.

      Weaknesses:

      (1) The interpretation of the cellular co-localization of the biotin-biguanide conjugate with TOMM20 (Figure 1-D) as mitochondrial "accumulation" of the conjugate is overstated because it cannot exclude binding of the conjugate to the mitochondrial membrane. It would have been more convincing if additional incubations with the biotin-biguanide conjugate in combination with metformin had shown that metformin is competitive with the biotin-conjugate.

      We appreciate the reviewer’s comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the inner mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we revised the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      (2) The manuscript reports the identification of 69 proteins by mass spectrometry of the pull-down assay of which 30 proteins were eluted by metformin. However, no Mass Spectrometry data is presented of the peptides identified. The methodology does not state the minimum number of peptides (1, 2?) that were used for the identification of the 31/69 proteins.

      We added a comprehensive table summarizing these findings (Figure 1- figure supplement 2). We considered all peptides and decided to perform stringent validation tests for those chosen to be further studied.

      (3) The validation of ATP5I was based on the use of recombinant protein (which was 90% pure) for the SPR and the use of a single antibody to ATP5I. The validity of the immunoblotting rests on the assumption that there is no "non-specific" immunoactivity in the relevant mol wt range. Information on the validation of the antibody would be helpful.

      Regarding the recombinant protein used for SPR, its purity was evaluated using a Coomassie-stained gel. For the antibody used in immunoblotting, its specificity was validated through knockout cell lines (Figure 2A), ensuring minimal concerns about non-specific immunoactivity within the relevant molecular weight range. Unfortunately, the KO data comes in the paper after the first immunoblots are presented. We outlined this validation in the methods section.

      (4) Knock-out of ATP5I markedly compromised the NAD/NADH ratio (Fig.3A) and cell proliferation (Figure 3D). These effects may be associated with decreased mitochondrial membrane potential which could explain the low efficacy of metformin (and most of the data in Figures 3-5). This possibility should be discussed. Effects of [metformin] on the NAD/NADH ratio in control cells and ATP5I-KO would have been helpful because the metformin data on cell growth is normalized as fold change relative to control, whereas the NAD/NADH ratio would represent a direct absolute measurement enabling comparison of the absolute effect in control cells with ATP5I KO.

      The mitochondrial membrane potential depends on a functional electron transport chain, which drives proton pumping from the matrix to the intermembrane space. Metformin can decrease the mitochondrial membrane potential, and this is usually explained as a consequence of complex I inhibition [1]. It has been published that metformin requires this membrane potential to accumulate in mitochondria, so the actions of metformin are self-limiting due to this requirement. The reviewer is right that ATP5I KO cells could be resistant to metformin because they may have a lower membrane potential. We do not believe this to be the case because the response to phenformin, another biguanide that can enter mitochondria through the membrane without the need of the OCT transporters [2], is also affected in ATP5I KO cells. Of note, compensatory mechanisms such as enhanced glycolysis, as observed in ATP5I KO cells (elevated ECAR and increased sensitivity to 2-deoxy-D-glucose), and the ATPase activity of F<sub>1</sub>F<sub>0</sub>-ATP synthase could potentially help maintain membrane potential, suggesting that this might not be an issue in the ATP5I KO cells. Chandel and colleagues already proposed that reversal of the F<sub>1</sub>F<sub>0</sub>-ATPase keeps this membrane potential in metformin-treated cells [3].

      Nevertheless, to experimentally address this point, we measured the mitochondrial membrane potential using tetramethylrhodamine methyl ester (TMRE) and ATP levels using luciferase-based assays (CellTiter-Glo) in ATP5I KO cells. We now show that ATP levels are not significantly reduced in ATP5I KO cells, likely because of compensatory glycolysis (Figure 5D), while the mitochondrial membrane potential remains close to normal (Figure 6D and E).

      We did not measure the NAD<sup>+</sup>/NADH ratio in control and KO cells treated with metformin because we now provide a more direct measurement of metformin acting on ATP5I: the state of oligomerization of the F<sub>1</sub>F<sub>0</sub>-ATPase (Figure 2G) as well as a Seahorse Bioenergetic Stress test (Figure 6A-C). Both figures provide results consistent with targeting of ATP5I by biguanides. We also discuss that targeting ATP5I can result in complex I inhibition due to the well-known role of F<sub>1</sub>F<sub>0</sub>-ATPases in cristae formation and the assembly of the respiratory complexes. We do not believe ATP5I is the only target of metformin, and in the paper we properly acknowledged and discussed other proposed targets in the introduction, in the results section (page 8) and in the discussion.

      (5) Figure-6 CRISPR/Cas9 KO at 16mM metformin in comparison with 70nM rotenone and 2 micromolar oligomycin (in serum-containing medium). The rationale for the use of such a high concentration of metformin has not been explained. In liver cells metformin concentrations above 1mM cause severe ATP depletion, whereas therapeutic (micromolar) concentrations have minimal effects on cellular ATP status. The 16mM concentration is ~2 orders of magnitude higher than therapeutic concentrations and likely linked to compromised energy status. The stronger inhibition of cell proliferation by 16mM metformin compared with rotenone or oligomycin raises the issue of whether the changes in gene expression may be linked to the greater inhibition of mitochondrial metabolism. Validation of the cellular ATP status and NAD/NADH with metformin as compared with the two inhibitors could help the interpretation of this data.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I (DepMap score: -0.47) [4]. The concentration of 16 mM metformin was chosen based on the IC<sub>50</sub> for this cell line. Both ATP status and NAD<sup>+</sup>/NADH ratios will depend on the extent of the compensatory glycolysis. On the other hand, our genetic screening evaluates cell proliferation as an integration of all metabolic activities required for the process. This unbiased screening revealed a common pathway affected by metformin and oligomycin that is different from the pathway affected by rotenone, which is consistent with the finding that metformin acts on the F<sub>1</sub>F<sub>0</sub>-ATPase. Our new Seahorse data demonstrate that oligomycin has a markedly reduced effect in metformin-treated cells, supporting a shared mechanism of action. Notably, uncouplers restore respiration in both metformin-treated and ATP5I knockout cells, which aligns with the mechanism we propose (please see our new section on the Seahorse Mito Stress test and the new discussion). In the discussion, we acknowledged, based on existing literature, that the cellular context may play a significant role in determining the response to this drug.

      Reviewer #3 (Public review):

      Most of the data are based on measurements of the oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) measured by the Seahorse analyser in control and ATP5I KO cells. However, these measurements are conducted by a single injection of a biguanide, followed over time and presented as fold change. By doing so, the individual information on the effect of metformin and its derivatives on control and KO cells is lost. In addition, the usual measurement of OCR is coupled with certain inhibitors and uncouplers, such as oligomycin, FCCP, and Antimycin A/rotenone, to understand the contribution of individual complexes to respiration. Since biguanides and ATP5I KO affect protein levels of components of complex I and IV, it would be informative to measure their individual contributions/effects in the Seahorse. To further strengthen the data, it would be helpful to obtain measurements of actual ATP levels in these cells, as this would explain the activation of AMPK.

      Thank you for this valuable comment. We have now performed the suggested analysis, which is presented in the new Figure 6. The data are consistent with our proposition that biguanides target ATP5I, but they also suggest the possibility of additional targets, such as Complex I, as proposed by other groups. Please see our new section on the Seahorse Mito Stress test and the new discussion. We also measured ATP (Figure 5D) and the mitochondrial membrane potential (Figure 6D and E). These measurements reflect the powerful compensation provided by glycolysis.

      The authors report on alterations in mitochondrial morphology upon ATP5I KO, which is measured by subjective quantifications of filamentous versus puncta structures. Fiji offers great tools to quantify the mitochondrial network unbiasedly and with more accuracy using deconvolution and skeletonization of the mitochondria, providing the opportunity to measure length, shape, and number quantitatively. This will help to understand better whether mitochondria are really fragmented upon ATP5I KO and rescued by its re-introduction.

      Thanks for the suggestion. We used the Mitochondria Analyzer plugin for ImageJ/Fiji to redo Figures 2 and 4, and quantified details of the mitochondrial network, reporting differences in branch number, length, endpoints and diameter.

      Finally, the authors report in the last part of the paper a genetic CRISPR/Cas9 KO screen in NALM-6 cells cultured with high amounts of metformin to identify potential new mediators of metformin action. It is difficult to connect that to the rest of the paper because a) different concentrations of metformin are used and b) the metabolic effects on energy consumption are not defined. They argue about the molecular function of the obtained hits based on literature and on a comparison of the pattern of genetic alterations based on treatments with known inhibitors such as oligomycin and rotenone. However, a direct connection is not provided, thus the interpretation at the end of the results that "the OMA1-DELE1-HRI pathway mediates the antiproliferative activity of both biguanides and the F1ATPase inhibitor oligomycin" while increasing glycolysis, needs to be toned down. This is an interesting observation, but no causality is provided. In general, this part stands alone and needs to be better connected to the rest of the paper.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I [4], forcing us to use higher concentrations of metformin to inhibit their growth. Recent results show that metformin targets PEN2 in the cytosol to increase AMPK activity, controlling both the glucose-lowering and the life span extension abilities of metformin [5]. This work raises the question of whether the antiproliferative and anticancer effects of metformin are due to a mitochondrial activity or are controlled by this new pathway of AMPK activation. Hence, the genetic screening was performed to find, in an unbiased way, how metformin works. The results provide compelling evidence for mitochondria, and in particular the ATP synthase, as potential targets of metformin and a foundation for future studies. We added the following text to the beginning of this section: “Several candidate targets have been reported for biguanides and our results presented so far suggest a new one. Clues about drug mechanism of action can be obtained in unbiased manner using genetic perturbation [6]. To obtain an unbiased observation of biological processes affected by metformin, we performed a genome-wide pooled CRISPR/Cas9 KO screen in NALM-6 cells cultured in the presence of metformin at a concentration affecting growth (16 mM).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1B, the total ACC antibody is missing, and the total AMPK should be replaced, especially since they claim pAMPK increases with metformin and BFB treatment. Additionally, the streptavidin pull-down image in Figure 1F needs to be resized to show the fully cropped section.

      We repeated this experiment three times and added the new figures to the supplemental data. We corrected the main figure in the manuscript with a representative blot for total ACC (Fig 1B).

      (2) Clarify whether ATP5I alone activates mitochondrial respiratory activity or if it functions in a complex with other proteins. Also, explain how metformin affects ATP5I: is it phosphorylated directly or through an upstream target?

      ATP5I interacts directly with ATP5L and both proteins form part of the peripheral stack of the F<sub>1</sub>F<sub>0</sub>-ATP synthase. ATP5I and ATP5L play demonstrated roles in the dimerization of the F<sub>1</sub>F<sub>0</sub>-ATP synthase. We discussed that they may affect other functions of the enzyme as part of the peripheral stack which interact with the OSCP (oligomycin sensitivity conferring protein) located in the F1 portion of the enzyme. Further work is needed to understand how ATP5I may affect the interactions between the F0 and F1 parts of the enzyme. We did not investigate whether metformin affects the phosphorylation of ATP5I, but this remains an important question for future studies. The PhosphoSitePlus database indicates that ATP5I undergoes phosphorylation and acetylation at multiple sites, suggesting potential regulatory mechanisms worth exploring.

      (3) Ensure that all immunofluorescence (IF) images include a scale bar.

      Done

      Reviewer #2 (Recommendations for the authors):

      (1) Details of the mass spectrometry analysis and the number of peptides for the proteins identified would increase the merit of the study.

      We added a comprehensive table summarizing these findings (Figure 1- figure supplement 2). We considered all peptides and decided to perform stringent validation tests for those chosen to be further studied.

      (2) The lower NAD/NADH ratios in the ATP5I KO cell lines and the higher ratios with ATP5I expression are convincing data of the cellular redox state of these cells (with variable NDUFB8). Other data sets (e.g. OCR and ECAR and Relative growth, %) are normalized to the respective control and therefore do not show the relative effect of metformin (in control cells) to the ATP5I knock-out. The effects of metformin concentration on the NAD/NADH ratio would provide a direct measure of the extent to which metformin mimics ATP5I KO. This data would be clearer to interpret than Figure 3GHKL; Figures 5EF; S1; S2).

      We did not measure the NAD<sup>+</sup>/NADH ratio in control and KO cells treated with metformin because we now provide a more direct measurement of metformin acting on ATP5I: the oligomerization state of the F<sub>1</sub>F<sub>0</sub>-ATPase and its vestigial assembly intermediates (Figure 2G) as well as a Seahorse Bioenergetic Stress test (Figure 6A-C). Both figures provide results consistent with targeting of ATP5I by biguanides. We also discuss that targeting ATP5I can result in complex I inhibition due to the well-known role of F<sub>1</sub>F<sub>0</sub>-ATPase oligomerization in cristae formation and the assembly of the respiratory complexes.

      (3) Figure 6: NAD/NADH data for metformin (16 mM) and rotenone (70 nM)/oligomycin (2 µM) would establish whether the concentrations are "matched" to allow a comparison of their gene signatures.

      We used those concentrations based on similar effects on cell growth, since the NAD/NADH ratio depends on the extent of glycolytic compensation induced by blocking respiration.

      (4) Intramitochondrial accumulation of the biotin conjugate could be demonstrated in Figure 1D from competition between metformin and the biotin-conjugate.

      We appreciate the reviewer’s comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the inner mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we revised the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      Reviewer #3 (Recommendations for the authors):

      In addition to my comments for the public review, the manuscript would be strengthened by the following points:

      (1) The abstract needs to be streamlined to communicate more clearly what the paper is about. The last part of the results is not mentioned and is completely disconnected from the ATP5I KO story.

      We have significantly modified our abstract to include both the genetic screening significance and our new findings on the F<sub>1</sub>F<sub>0</sub>-ATP synthase oligomerization.

      (2) Quantifications of the western blots (Figure 1B) are missing. Seems like AMPK total protein levels go down with BFB.

      We quantified the blots.

      (3) How often was the pull-down repeated (Figure 1F)? It would be also important to show this in other cell types, such as pancreatic cancer cells.

      The pull-down was an initial large-scale discovery experiment performed once. However, the findings were subsequently validated in KP-4 pancreatic cancer cells in three independent experiments. As a direct readout of metformin’s impact on ATP5I, we assessed the oligomerization state of the F1ATPase and compared the effects of metformin with those of ATP5I knockout. We show that metformin partially phenocopies the ATP5I KO phenotype, and we reproduced this effect in a second cell line, U2OS osteosarcoma cells.

      (4) Does the KO of ATP5I affect other subunits of the v-ATP5a?

      Yes—we added an immunoblot to document this in Fig. 2A. Notably, ATP5I knockout also reduces ATP5L and OSCP levels.

      (5) Does metformin and BFB itself affect mitochondrial morphology and respiration?

      To evaluate the activity of BFB in comparison with metformin, we performed immunoblot analyses of the AMPK pathway, growth assays, and microscopy-based assessment of mitochondrial morphology. These data are shown in Fig. 1B–D. A more comprehensive analysis of metformin’s effects on mitochondrial respiration has now been added as Fig. 6, using Seahorse measurements and multiple respiratory inhibitors.

      (6) Since there is a strong increase in ECAR, does this correspond to an increase in glucose uptake? Are the proteins or genes involved altered or how to explain the increased flux through glycolysis in ATP5I KO cells?

      This is a very interesting idea, as our CRISPR screen identified several genes that could potentially enhance glycolysis as a vulnerability in metformin-treated cells. In future work, we will explore this biology in greater depth.

      (7) Line 242, for easier understanding, states clearly that metformin reduces growth by x-percent.

      Yes, it is a 65-fold change. We added it to the text.

      (8) The conclusion at the end of the result section is not supported by the data or not well explained. I guess oligomycin will stop the action of metformin on vATP5l, or how to explain this?

      We clarified the conclusion.

      (1) Xian, H., Liu, Y., Rundberg Nilsson, A., Gatchalian, R., Crother, T. R., Tourtellotte, W. G., Zhang, Y., Aleman-Muench, G. R., Lewis, G., Chen, W., Kang, S., Luevanos, M., Trudler, D., Lipton, S. A., Soroosh, P., Teijaro, J., de la Torre, J. C., Arditi, M., Karin, M. & Sanchez-Lopez, E. Metformin inhibition of mitochondrial ATP and DNA synthesis abrogates NLRP3 inflammasome activation and pulmonary inflammation. Immunity 54, 1463-1477 e1411, (2021).

      (2) Hawley, S. A., Ross, F. A., Chevtzoff, C., Green, K. A., Evans, A., Fogarty, S., Towler, M. C., Brown, L. J., Ogunbayo, O. A., Evans, A. M. & Hardie, D. G. Use of cells expressing gamma subunit variants to identify diverse mechanisms of AMPK activation. Cell metabolism 11, 554-565, (2010).

      (3) Wheaton, W. W., Weinberg, S. E., Hamanaka, R. B., Soberanes, S., Sullivan, L. B., Anso, E., Glasauer, A., Dufour, E., Mutlu, G. M., Budigner, G. S. & Chandel, N. S. Metformin inhibits mitochondrial complex I of cancer cells to reduce tumorigenesis. eLife 3, e02242, (2014).

      (4) Hlozkova, K., Pecinova, A., Alquezar-Artieda, N., Pajuelo-Reguera, D., Simcikova, M., Hovorkova, L., Rejlova, K., Zaliova, M., Mracek, T., Kolenova, A., Stary, J., Trka, J. & Starkova, J. Metabolic profile of leukemia cells influences treatment efficacy of L-asparaginase. BMC Cancer 20, 526, (2020).

      (5) Ma, T., Tian, X., Zhang, B., Li, M., Wang, Y., Yang, C., Wu, J., Wei, X., Qu, Q., Yu, Y., Long, S., Feng, J. W., Li, C., Zhang, C., Xie, C., Wu, Y., Xu, Z., Chen, J., Yu, Y., Huang, X., He, Y., Yao, L., Zhang, L., Zhu, M., Wang, W., Wang, Z. C., Zhang, M., Bao, Y., Jia, W., Lin, S. Y., Ye, Z., Piao, H. L., Deng, X., Zhang, C. S. & Lin, S. C. Low-dose metformin targets the lysosomal AMPK pathway through PEN2. Nature 603, 159-165, (2022).

      (6) Bruno, P. M., Liu, Y., Park, G. Y., Murai, J., Koch, C. E., Eisen, T. J., Pritchard, J. R., Pommier, Y., Lippard, S. J. & Hemann, M. T. A subset of platinum-containing chemotherapeutic agents kills cells by inducing ribosome biogenesis stress. Nat Med 23, 461-471, (2017).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      (1) Problems associated with averaging: The authors intended to focus on the oviposition clock in individual females, however due to the inherent noise in the oviposition rhythm they had to resort to averaging across Lomb-Scargle periodograms generated from individual time-series. They then tested whether the averaged periodogram contains a significant frequency. However, this reduction in noise also reduces the ability to compare differences in power of the rhythm across individuals. Furthermore, this method makes it especially difficult to distinguish the contribution of subsets of the circuit on the proportion of rhythmic flies and the power of the rhythm. In this revised version the authors use two manipulations to disrupt the molecular clock, which could have different success rates based on the type and number of cells targeted. Unfortunately, the type of averaging used prevents the detection of any such effects. It is to be noted that, indeed, individual-level differences in period between the PdfDicerGal4 > perRNAi and UAS-perRNAi lines help the authors to establish that there is a significant reduction in period length when the molecular clock is abolished in PDF cells. These individual measurements are now very helpful in discerning the effect of manipulations carried out on different circadian neural subsets, some of which could have been missed if only averages were considered.

      First, it is important to emphasize that we are certainly not "averaging across Lomb-Scargle periodograms". As explained in the paper (and at length in the Supplementary Material), what we do is first detrend each individual time series, then average _all_ the resulting time series (and not only those of rhythmic individuals), and finally take the Lomb-Scargle periodogram of this average series. Nevertheless, we agree with the reviewer that the use of averages reduces our ability to understand what happens at the individual level. The problem is that in most cases the presence of noise has made it difficult to draw any meaningful conclusions from individual records. One fortunate exception is the one mentioned by the reviewer. Averaging, on the other hand, has allowed us to extract some useful information in those cases.
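      The three steps described above (detrend each series, average all of them, take one periodogram of the average) can be sketched as follows. This is only an illustrative sketch, not the procedure from the paper: the linear detrending, the hourly sampling, the simulated data and all function names are assumptions made here, and the actual detrending and denoising are the ones described in the Supplementary Material.

```python
import math
import random


def detrend_linear(t, y):
    """Subtract a least-squares straight line (assumed detrending method)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return [yi - (y_mean + slope * (ti - t_mean)) for ti, yi in zip(t, y)]


def average_series(series_list):
    """Point-by-point mean over all individuals, rhythmic or not."""
    n = len(series_list)
    return [sum(values) / n for values in zip(*series_list)]


def lomb_scargle_power(t, y, period):
    """Classical normalized Lomb-Scargle power at one trial period."""
    w = 2.0 * math.pi / period
    y_mean = sum(y) / len(y)
    yc = [yi - y_mean for yi in y]
    variance = sum(v * v for v in yc) / (len(yc) - 1)
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                     sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
    cos_terms = [math.cos(w * (ti - tau)) for ti in t]
    sin_terms = [math.sin(w * (ti - tau)) for ti in t]
    cos_part = (sum(v * c for v, c in zip(yc, cos_terms)) ** 2
                / sum(c * c for c in cos_terms))
    sin_part = (sum(v * s for v, s in zip(yc, sin_terms)) ** 2
                / sum(s * s for s in sin_terms))
    return (cos_part + sin_part) / (2.0 * variance)


def population_peak_period(t, series_list, trial_periods):
    """Detrend each series, average them all, then scan one periodogram."""
    detrended = [detrend_linear(t, y) for y in series_list]
    mean_series = average_series(detrended)
    powers = [lomb_scargle_power(t, mean_series, p) for p in trial_periods]
    return trial_periods[powers.index(max(powers))]


def simulate_population(n_flies=30, hours=168, true_period=24.0, seed=0):
    """Hypothetical data: shared rhythm + slow drift + independent noise."""
    rng = random.Random(seed)
    t = [float(h) for h in range(hours)]
    series = []
    for _ in range(n_flies):
        series.append([5.0 + 0.01 * ti
                       + math.cos(2.0 * math.pi * ti / true_period)
                       + rng.gauss(0.0, 2.0) for ti in t])
    return t, series


if __name__ == "__main__":
    t, series = simulate_population()
    trial_periods = [16.0 + 0.25 * k for k in range(65)]  # 16 h to 32 h
    print(population_peak_period(t, series, trial_periods))
```

      Note that the sketch averages the time series before computing a single periodogram, which is the distinction the response draws against averaging individual periodograms. It also assumes that all individuals share the same phase; otherwise the rhythm itself would partly cancel in the average.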

      (2) Sensitivity to sample size: Averaging reduces the effect of random background noise but noise reduction is dependent upon sample size. Comparing genotypes with different sample sizes in addition to varying signal to noise ratios (which might also change with neural manipulations) makes it difficult to estimate how much of the rhythm structure is contributed by a given neuronal subset; thus, whenever possible comparisons should be made between groups that include similar number of flies. This problem is compounded when the averaged periodogram is composed of both rhythmic and weakly rhythmic individuals. For instance, in the main text the reported value of period length of pdfDicerGal4 > perRNAi is 20.74h (see also Fig 2J) but in the Supplementary figure 2S1 this is close to 22h, while the values reported for the control are largely similar (24.35h in Fig 2H versus ~24h in Fig 2S1). A difference of 3.6h between control and experimental flies is much greater than 2h. Which estimate (average versus individual) is more reliable in predicting the behavior of these flies is difficult to determine without further experiments.

      In most of the experiments analyzed for this paper, the numbers of flies for control and experimental genotypes are very similar. In the remaining ones, the number of flies for experimental genotypes is roughly twice the number of flies for control genotypes. As mentioned, noise reduction depends on sample size. This implies that, when a genotype is assessed as rhythmic, the sample size used is evidently large enough. On the other hand, when a genotype is assessed as arrhythmic, it is important to know whether the sample size is large enough. It is for this reason that we have used many more flies for arrhythmic genotypes than for their control genotypes.
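      The dependence on sample size can be made explicit under a simplifying assumption that is ours, not the paper's: if each fly's detrended record is a common rhythmic signal plus independent, zero-mean noise of standard deviation σ, then averaging N flies leaves residual noise of

```latex
\sigma_{\text{avg}}(N) = \frac{\sigma}{\sqrt{N}},
\qquad
\frac{\sigma_{\text{avg}}(2N)}{\sigma_{\text{avg}}(N)} = \frac{1}{\sqrt{2}} \approx 0.71 .
```

      Under that assumption, doubling the number of flies for a genotype lowers the noise floor of its averaged series by roughly 29%, which is consistent with using the larger groups for genotypes that are assessed as arrhythmic.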

      Regarding the period difference between the average over rhythmic individuals and the denoised population average, notice first that they are not necessarily exactly the same thing, since our population average uses all flies, and the denoising might introduce some variations over the underlying periods (which would be undetectable without the denoising). Also, and more importantly, Fig. 2S1 shows that the error bars for the average of the individual periods are large, and thus, statistically, the reported value for the population average falls within the confidence interval for the individual average.

      (3) Based on the newly provided data for individual fly periodograms the reader can visually evaluate the rhythmicity associated with each genotype. Such visual inspection did not reveal any clear difference between the proportion of rhythmic individuals between experimental and parental GAL4 and/or UAS controls, except for experiments using per01 mutant animals. This is surprising since if these circuits are controlling the oviposition rhythm, perturbing them should affect most individuals in a similar way.

      The problem here is that, given the amount of noise present in this behavior, it is difficult to obtain any reliable information from individual records: because noise is random, in a given experiment it might disturb the expected behavior of different individuals in very different ways. That is the reason why we have resorted to population averages.

      Other comments

      Disrupting the clock in the 5th sLNv and 3 Cry+ LNds (and weakly in a small subset of DN1) affected egg-laying. Although the work emphasizes the importance of the LNd, the role of the 5th sLNv should be discussed.

      As mentioned in the paper, what the experiments show is that the 3 Cry+ LNds and 5th sLNv (usually called E cells) are candidates to be the main drivers of the oviposition rhythm, but the connectomics show that only 2 Cry+ LNds are connected to the oviposition circuit. In order to be more accurate, throughout the corresponding section (now called "The molecular clock in E neurons is necessary for rhythmic egg-laying") of the corrected manuscript we have always referred to the cells marked by the driver as E-cells. In the Discussion, we have added a line commenting that, in the connectome, the 5th sLNv is not connected to any cells of the oviposition circuit.

      Minor corrections:

      In subsection "Two Cry+ LNd neurons directly oviIN", there was a mistake in the use of "E1" and "E2" (their meanings were interchanged). We have corrected this section, giving the correct definitions. We have also corrected some minor English typos.

      Joint Recommendations for the authors:

      (1) Line 234 'to disrupt the molecular clock in (those) neurons', Please clearly describe the cell types in which MB122B driver works.

      We have clarified the cell types in which the MB122B driver is expressed (line 236).

      (2) Line 235 gen cycle, should be gen'e' cycle

      The typo has been corrected.

      (3) The authors should provide the raw data in repositories as per journal policy of eLife.

      The data are now available at the following links:

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_+> UAS-perRNAi.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_M 122Bsplit-Gal4>+.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_MB122Bsplit-Gal4>UAS-perRNAi.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Figures1

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shahbazi et al used a recurrent neural network model trained to control a musculoskeletal model of the arm to investigate how neural populations accommodate activity patterns underpinning savings. The paper draws upon the recent finding of a "uniform shift" in preparatory activity in monkey motor cortex associated with savings, and leverages full access to a computational model to establish causality.

      Strengths:

      The paper is well written, and the figures are clearly presented. The key finding that the uniform shift first reported based on neural recordings by Sun et al. emerges in artificial neural networks performing a similar task is interesting and well-backed by their analyses. Manipulating this uniform shift to show that it drives behavioural savings is an important causal confirmation of the proposal by Sun et al.

      Weaknesses / Comments:

      As mentioned earlier, the core results are well backed by the analyses. Most of my comments relate to adding more controls and additional questions that could be explored with the model to strengthen the paper.

      (1) Savings are quantified as more rapid relearning of the FF upon re-exposure (e.g., Figure 3). This finding is based on backpropagation through time, but would this hold when using a different optimiser, e.g., FORCE?

      This is an interesting question, and indeed, there are an increasing number of studies addressing how different neural network learning rules may affect the kinds of representations that arise after learning (Codol et al., 2024). However, the focus of the present paper is not on which neural network approaches or which specific optimisers produce savings; rather, the focus is on the basis and neural geometry of savings when it emerges.

      We have added a short paragraph to the Discussion section [lines 349-355] to address this:

      “The present results are based on RNNs trained in an error-based approach using backpropagation through time (Werbos, 1990) using the Adam optimizer (Kingma and Ba, 2014). Other techniques for training RNNs have been proposed including the FORCE algorithm (Sussillo and Abbott, 2009). In addition, several recent reports have demonstrated success using reinforcement learning approaches to train neural networks in the context of sensorimotor control tasks (Lillicrap et al., 2015; Codol et al., 2024a). An interesting avenue for future work is to determine how the present results may or may not generalize to different neural network architectures and learning rules.”

      (2) The authors should include a "null model" showing that training on a different reaching task following NF, as opposed to FF2, won't show something akin to a uniform shift during preparation due to the adoption of TDR and having similar targets.

      This is a critical point. Training on a different reaching task other than FF2 (e.g. a different force field) will indeed result in a uniform shift, but critically, a shift in a different direction in neural state space than the uniform shift associated with FF2. The central focus of the present paper is to show that when there remains a non-zero projection of preparatory neural activity along the direction of the uniform shift associated with a given learning task, this residual projection underlies savings when networks are subsequently re-exposed to the same task.
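      The notion of a "non-zero projection along the direction of the uniform shift" can be illustrated with a toy calculation. The sketch below is a hypothetical simplification in Python, not the TDR-based analysis used in the paper: the function names, the three-unit "network", and all numbers are invented for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def uniform_shift_axis(prep_before, prep_after):
    """Unit vector along the change in preparatory state shared by all targets."""
    diffs = [[a - b for a, b in zip(after, before)]
             for before, after in zip(prep_before, prep_after)]
    shift = mean_vector(diffs)
    norm = math.sqrt(dot(shift, shift))
    return [x / norm for x in shift]


def residual_projection(prep_baseline, prep_washout, axis):
    """Mean displacement from baseline after washout, measured along `axis`."""
    diffs = [[w - b for w, b in zip(wash, base)]
             for base, wash in zip(prep_baseline, prep_washout)]
    return dot(mean_vector(diffs), axis)


# Toy preparatory states: one row per target, one column per hidden unit.
baseline = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
after_ff = [[1.0, 2.0, 0.0], [-1.0, 2.0, 0.0]]        # every target shifted along unit 2
after_washout = [[1.0, 0.5, 0.0], [-1.0, 0.5, 0.0]]   # shift only partly undone

learned_axis = uniform_shift_axis(baseline, after_ff)
other_axis = [0.0, 0.0, 1.0]   # stand-in for the shift direction of a different task

residual_learned = residual_projection(baseline, after_washout, learned_axis)
residual_other = residual_projection(baseline, after_washout, other_axis)
print(residual_learned, residual_other)
```

      In this toy case the residual lies entirely along the axis of the previously learned task and has no component along the other axis, which is the situation the response associates with task-specific savings.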

      In the Results section we had included a short paragraph to describe control simulations that we performed that address this concept. We have expanded this text and added a Figure and the results of statistical tests to better describe this control [lines 179-187]:

      “As an additional control we trained networks after the growing up phase on an opposing force field (CCW) and then as above, exposed the networks to a NF washout phase, and then to a CW force field. In this case no savings was observed in the CW force field, either for initial lateral deviation, or for learning rate (Figure 3). In fact, we observed that initial lateral deviation is larger for the novel force field (t(39)=-4.918, p=1.6e-5). This observation is in line with the finding that learning opposing force fields sequentially results in interference (Sun et al., 2022). The results of these control simulations underscore that the savings effect observed in our main study was learning-specific—it was due to prior learning of the CCW force field, and not a general effect of learning any novel dynamics.”

      (3) The analyses of network activity during movement preparation (Figure 4) nicely replicate the key finding in Sun et al, but I think the authors could leverage the full access to their network and go further, e.g., by examining changes (or the lack of) during execution in FF2 with respect to FF (and perhaps in a future NF2 with respect to NF), including whether execution activity also lives in parallel hyperplanes, etc.

      We agree that a visualization of the neural activity during movement would be beneficial to the reader. To address this we have added a new Figure (Fig. 6) and associated text [lines 210-219]. The Figure shows the neural trajectories when the RNNs are first exposed to the FF1 and when they are first exposed to FF2 (after NF2 washout). Trajectories are plotted in 3D corresponding to the first 3 principal components, starting at the go cue and ending 200 ms into the movement, for each of the 8 movement targets.

      “The neural trajectories for preparation and for movement can be visualized in principal component space. Figure 6 shows trajectories during planning and early execution for initial FF1 and FF2 exposure. Hidden unit activity was subjected to a principal components analysis, and neural trajectories within the first three PCs are shown for movements to each of the eight movement targets. Filled circles indicate neural state 200 ms prior to the go cue. During the preparatory period trajectories travel along PC1 and then disperse across PC2 and PC3 into the circular pattern indicated by the filled stars, which indicate time of the go cue (also see Figure 5A). After the go cue neural trajectories shift back along PC1 and rotate along oscillatory patterns characteristic of populations of motor cortical neurons in non-human primates during movement (Churchland and Shenoy, 2024).”

      (4) Related to the above, while the results are interesting and the paper is well done, I kept wishing that the authors had done "more" with their model. This could be one or two final sections on "predictions" that would nicely complement their "validation" of the uniform shift, and that, in my opinion, would greatly increase the impact of the paper. In particular:

      (a) What would be the effect of learning more "tasks"? For example, is there a limit on how many fields can be learned? (You show something related by manipulating network size, but this is slightly different.)

      These are interesting questions and to some extent they are already addressed in the paper. Of course, the number of tasks that a network is able to learn will be related to how much those tasks overlap in a control space. Indeed, this idea goes back to early theoretical accounts of connectionist models such as Hopfield nets and capacity for representing information (Hopfield, 1982; Hopfield et al., 1983). The control simulations that we described in the paper [lines 179-187 and Figure 4] are a test of one extreme version of this, in which two tasks are in direct opposition to each other (opposite force fields), and in this situation no savings emerges. We believe it is an interesting question, but a comprehensive exploration of the nature of task overlap in upper limb reaching learning tasks is beyond the scope of the present paper.

      (b) Figure 5 is a nice causal demonstration that the uniform shift is related to savings. However, and related to comment #3, it'd be interesting to see more details about how the behaviour and the network activity change as preparatory activity shifts along this axis, in particular regarding how moving the preparatory states affects the organisation and dynamics of upcoming execution activity; these are the kinds of intuitions that modelling studies like this one can provide.

      This has been addressed above by the changes we made to address the reviewer’s comment #3.

      (c) The authors focus on a task design that spans baseline, FF, NF, FF2 to replicate the original study by Sun et al. However, it would be interesting if they generated predictions for neural changes to other types of tasks that have been studied behaviourally. These could include, for example: (i) modelling a visuomotor rotation or a mirror reversal task; (ii) having to adapt to a FF in the opposite direction; (iii) investigating the role of adding an explicit context and having the networks learn multiple FF; and (iv) trying to learn FF fields in opposite directions, perhaps restricted to specific targets. As the authors know, all these questions and more have been studied with similar behavioural paradigms, and it would be nice to see what neural predictions are generated by this model.

      See the responses above, e.g. to comment 4. We have clarified the text and provided a new Figure to illustrate our opposite FF control simulations. The other suggestions about visuomotor rotations and contextual cues are interesting and potentially important questions that we are working on, but we believe they are beyond the scope of the current paper, which is focused specifically on the question of savings in FF learning.

      (5) On the Discussion: When extrapolating from neural network results to animals, the fact that your networks can learn implicitly doesn't mean that animals do learn implicitly. Indeed, I think the consensus view is that different perturbations may lead to the expression of different types of savings (e.g., FF vs VR, which seems to be more explicit). Besides, these different mechanisms may be primarily implemented by brain regions less directly tied to motor control (e.g., cerebellum, parietal cortex?), which are not directly implemented in the authors' model.

      Of course the reviewer is correct that our simulations are not evidence that savings in motor tasks learned by animals is only implicit, and we do not make any such claims in the paper. The model we describe in the present paper is not meant to be a comprehensive model of motor learning in humans/animals. Indeed, the pure “context free” type of learning that we implement in our simulations basically cannot occur in animals, because there is always some information that provides contextual information. Indeed there are computational models of motor learning that include these effects, e.g. the COIN model (Heald et al., 2021). Our model however provides a useful window into what the context-free component of savings may look like. The approach we describe in the present paper is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is highly unrealistic, as some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Weaknesses:

      (1) To be transparent, savings in motor adaptation have been a primary focus of my own research. Some core findings presented in this paper are at odds with the ideas I and others have previously put forward. While I don't want to impose my agenda on the authors of this paper, I do think the authors should address these issues.

      (a) The authors acknowledge the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies. For example, when people are asked to report their strategy, they recall a strategy that was useful during the first learning block (Morehead et al. 2015). Furthermore, savings are abolished under experimental manipulations designed to eliminate strategic contributions (e.g., Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). The authors briefly state that their findings support the hypothesis that a neural basis of memory retention underlying savings can be independent of cognitive or strategic learning components, and that savings can be characterized as implicit. While these statements may be true, it is not clear how this work substantiates these claims.

      We have addressed a similar point raised by Reviewer 1, see point #5 above. Our work represents an example of how savings can occur from implicit mechanisms in the absence of explicit contextual cues. Our goal is not to resolve the debate about how this occurs in humans/animals. Rather, our model provides a useful window into what the context-free component of savings may look like. Our approach is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is not meant to be a full model of biological learning, as in biological systems some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      (b) Our research has also demonstrated that if implicit adaptation is completely washed out after the initial learning block, it not only fails to exhibit savings but is actually attenuated relative to the first learning block (Avraham et al., 2021). This phenomenon of attenuation upon relearning can also be seen in other studies of visuomotor adaptation (e.g., Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). More recently, we have shown that this attenuation is due to anterograde interference arising from the experience with the washout block (Avraham and Ivry, 2025). We illustrated that the implicit system is highly susceptible to interference; it doesn't require exposure to salient opposite errors and can occur even following prolonged exposure to veridical feedback. The central thesis of this paper, namely that implicit savings can emerge through RNNs, is at odds with these empirical results. The authors should address this discrepancy.

      These empirical results are interesting and intriguing, and we agree that they are relevant in the context of the debate about the relative contributions and interactions between explicit and implicit learning systems and savings. Importantly, contextual interference is impossible in our model, since there are no contextual cues about which force field is present or absent. Interactions between an explicit system and an implicit learning system are also impossible in our model, since there is no possibility of context-driven explicit learning or memory. The approach we have taken in the present paper is not to model a full explicit plus implicit learning system but rather to probe how savings may emerge from a purely implicit learning mechanism alone and to compare the neural geometry underlying this implicitly driven savings to the neural recording results from monkey electrophysiology studies. Nevertheless, we have added some text to the Discussion [lines 380-391] to situate our findings in the context of the studies mentioned above by the reviewer.

      “Recent empirical work suggests that relearning after washout of implicit adaptation can be attenuated rather than facilitated, a phenomenon attributed to anterograde interference from the washout phase (Avraham et al., 2021; Hadjiosif et al., 2023; Hamel et al., 2022, 2021; Leow et al., 2020; Wang and Ivry, 2025; Yin and Wei, 2020). The savings observed in our simulations differs from these behavioral findings. Crucially, our model excludes both contextual interference (since no cues signal which force field is present) and explicit-implicit interactions (since context-driven explicit learning is absent). Our goal was not to model a complete explicit-implicit system, but rather to probe how savings may emerge from a purely implicit mechanism and to compare the underlying neural geometry to monkey electrophysiology data. Our results suggest that high-dimensional neural circuits possess an intrinsic capacity for savings via persistent preparatory traces. How and when this capacity may be masked by interference or explicit-implicit interactions in biological systems remains an open question for future work.”

      (2) This brings me to the question about neural correlates: The results are linked to activity in the primary motor cortex. How does that align with the well-established role of the cerebellum in implicit motor adaptation? And with the studies showing that savings are due to explicit strategies, which are generally associated with prefrontal regions?

      The modeling approach we use in the present paper is area agnostic, and we do not include different neural modules to represent specific brain areas such as cerebellum or prefrontal regions. In the current approach we specifically exclude explicit strategies, as a way to specifically probe implicit mechanisms alone. Also see response to reviewer 1 comment 5 above.

      (3) The analysis on the complexity of the neural network (i.e., the number of hidden units) and its relationship to savings is very interesting. It makes sense to me that more complex networks would show more savings. I'm not sure I follow the authors' explanation, but my understanding is that increased network complexity makes it more difficult to override the formed memory through interference (e.g., from the experience with NF2). Also, the results indicate that a network with 32 units led to a less-than-chance level of networks exhibiting savings (Figure 3b). What behavioral output does this configuration produce? Could this behavior manifest as attenuation upon relearning? Furthermore, if one were to examine an even smaller, simpler network (perhaps one more closely reflecting cerebellar circuits), would such a model predict attenuation rather than savings?

      These are interesting and potentially important questions for future work to explore. Our interpretation of the results for smaller networks is that these small RNNs fail to show savings presumably because the learned FF behavior is 'erased' during washout, owing to their limited capacity to retain the FF learning in a distinct neighborhood of neural state space. Our paper is focused specifically on the relationship between savings, implicit learning, and neural capacity via network size, in the context of the monkey electrophysiology results in motor cortex. It would be interesting in future work to explore a cerebellar-like modeling approach.

      (4) The authors emphasize that their network did not receive any explicit contextual signals related to the presence or absence of the force field (FF), thus operating in a 'context-free' manner. From my understanding, some existing models of context's role in motor memories (e.g., Oh and Schweighofer, 2019; Heald et al., 2021) propose that memory-related changes can be observed even without explicit contextual information, as contextual changes can be inferred from sudden or significant environmental shifts (e.g., the introduction or removal of perturbations). Given this, could the observed savings in the current simulation be explained by some form of contextual retrieval, inferred by the network from the re-presentation of the perturbation in FF2?

      It is important to note that this is not possible in the context of the modeling approach described in the present paper. For example, in trial 1 of FF2, because the network has no contextual cue signaling the FF’s presence, the network has no information before movement begins that a FF will be present during movement (recall that the FF is velocity-dependent, and so is zero before movement begins). Once the network encounters the FF during movement, some component of its response I suppose could be described as contextual inference derived from effector state (similar to the account described in the COIN model), but strictly speaking the model is only responding to what it encounters in the moment. Any change in behaviour due to prior learning (e.g. savings) is due to the interaction between the residual learning-related neural state (e.g. the uniform shift), the effector state in the moment, and the errors encountered during movement. We don’t interpret this as “inference” in the traditional sense of an explicit learning system.
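      Why the network cannot sense the field before moving follows directly from the form of a velocity-dependent force field. The sketch below assumes the curl-field form commonly used in this literature; the gain value and the sign convention for clockwise versus counterclockwise are assumptions made for illustration, not parameters taken from the paper.

```python
def curl_field_force(velocity, gain=15.0, clockwise=True):
    """Force (N) from a velocity-dependent curl field.

    velocity: (vx, vy) hand velocity in m/s.
    gain: field strength in N*s/m (illustrative value).
    The force is perpendicular to the velocity and proportional to speed,
    so it vanishes whenever the hand is at rest.
    """
    vx, vy = velocity
    sign = 1.0 if clockwise else -1.0
    return (sign * gain * vy, -sign * gain * vx)


at_rest = curl_field_force((0.0, 0.0))                    # preparatory period
moving_cw = curl_field_force((0.0, 0.3))                  # forward reach at 0.3 m/s
moving_ccw = curl_field_force((0.0, 0.3), clockwise=False)
print(at_rest, moving_cw, moving_ccw)
```

      With zero velocity the force is zero regardless of the gain or direction, so nothing in the effector state during preparation distinguishes a null field from a force field.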

      (5) If there is residual hidden unit activity related to the FF at the end of the NF2 phase, how does the simulated movement revert back to baseline? Are there any differences in the movement trajectory, beyond just lateral deviation, between NF1 and NF2? The authors state that "changes in the preparatory hidden unit activity did not result in substantive changes in the motor commands (Figure 5b), which emphasizes that the uniform shift resides in the null space of motor output." However, Figure 5b appears to show visible changes in hidden unit activity. Don't these changes reflect a pattern of muscle activity that is the basis for behavior? These changes are indeed small, but it seems that so is the effect size for savings (Figure 3a). Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?

      This is precisely the point of the paper, i.e. to show that neural activity during the preparatory period before movement onset is different, even though the behaviour during the preparatory period is the same (i.e. no muscle activity and no movement). This recapitulates the empirical findings from the neural data reported in the Sun et al. (2022) paper.

      The reviewer asks “Don't these changes reflect a pattern of muscle activity that is the basis for behavior?” Yes indeed they do, but not during the NF and not during the preparatory activity prior to movement onset.

      The reviewer asks “Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?” We addressed this in the paper (Results/Washout) by comparing kinematics after washout to those prior to FF learning; e.g., any differences in lateral deviation of the hand path over the entire reach trajectory were in the range of 0.1 mm, which is less than 0.25% of the lateral deviation encountered in the FF and only 0.1% of the reach distance (10 cm).
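      As a quick check of the quoted percentages, taking the stated figures at face value (a residual difference of about 0.1 mm and a reach distance of 10 cm = 100 mm):

```latex
\frac{0.1\ \text{mm}}{100\ \text{mm}} = 0.001 = 0.1\%,
\qquad
0.1\ \text{mm} < 0.0025\, d_{\text{FF}}
\;\Longrightarrow\;
d_{\text{FF}} > 40\ \text{mm}.
```

      Here d_FF denotes the lateral deviation encountered in the force field; the symbol is introduced only for this check, and the 40 mm bound is what the two quoted numbers jointly imply rather than a value reported in the response.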

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1c, lower panel: Is this from the early or late stage of FF1?

      This is an example movement after learning in a null field (NF). We have clarified this in the Figure caption.

      (2) Please clarify what the two panels in Figure 1e represent.

      We have clarified in the Figure caption that these are activity from two example hidden units.

      (3) If Figure 2c is intended to illustrate the changes in motor commands for individual muscles, consider reorganizing the plots by muscle to more clearly show the change for each muscle from NF1 to FF1.

      The point here is not to make fine-grained comparisons between specific muscles, but rather to show a general example of how muscle activity differs. For the sake of visual simplicity in a Figure that already has many components, we have decided to keep Figure 2c the same.

      (4) The text mentions that no savings were observed when the network was trained on CCW followed by CW perturbations. However, no data or statistical analysis is presented to support this claim. I wonder if the authors would expect attenuated learning when exposed to the CW perturbation, given a memory of the opposite perturbation.

      We have added a Figure to provide data for the FF opposite control.

      (5) The relevance of the discussion on choking under pressure to the paper wasn't clear.

      We have modified the relevant text in the Discussion section [lines 356-363] to clarify the relevance of the present work to other recent work on how complex features of motor behaviour can arise due to the dynamics of preparatory neural activity in motor cortex.

      References

      Avraham G, Morehead JR, Kim HE, Ivry RB. 2021. Reexposure to a sensorimotor perturbation produces opposite effects on explicit and implicit learning processes. PLoS Biol 19:e3001147. doi:10.1371/journal.pbio.3001147

      Codol O, Krishna NH, Lajoie G, Perich MG. 2024. Brain-like neural dynamics for behavioral control develop through reinforcement learning. bioRxiv. doi:10.1101/2024.10.04.616712

      Hadjiosif AM, Morehead JR, Smith MA. 2023. A double dissociation between savings and long-term memory in motor learning. PLoS Biol 21:e3001799. doi:10.1371/journal.pbio.3001799

      Hamel R, Dallaire-Jean L, De La Fontaine É, Lepage JF, Bernier PM. 2021. Learning the same motor task twice impairs its retention in a time- and dose-dependent manner. Proc Biol Sci 288:20202556. doi:10.1098/rspb.2020.2556

      Hamel R, Lepage J-F, Bernier P-M. 2022. Anterograde interference emerges along a gradient as a function of task similarity: A behavioural study. Eur J Neurosci 55:49–66. doi:10.1111/ejn.15561

      Heald JB, Lengyel M, Wolpert DM. 2021. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600:489–493. doi:10.1038/s41586-021-04129-3

      Hopfield JJ. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A 79:2554–2558. doi:10.1073/pnas.79.8.2554

      Hopfield JJ, Feinstein DI, Palmer RG. 1983. “Unlearning” has a stabilizing effect in collective memories. Nature 304:158–159. doi:10.1038/304158a0

      Leow L-A, Marinovic W, de Rugy A, Carroll TJ. 2020. Task errors drive memories that improve sensorimotor adaptation. J Neurosci 40:3075–3088. doi:10.1523/JNEUROSCI.1506-19.2020

      Wang T, Ivry RB. 2025. Contextual effects during sensorimotor adaptation are an emergent property of population coding in a cerebellar-inspired model. Sci Adv 11:eadr4540. doi:10.1126/sciadv.adr4540

      Yin C, Wei K. 2020. Savings in sensorimotor adaptation without an explicit strategy. J Neurophysiol 123:1180–1192. doi:10.1152/jn.00524.2019

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ritzau-Jost et al. investigate the potential contribution of AP broadening in homeostatic upregulation of neuronal network activity with a specific focus on dissociated neuronal cultures. In cultures obtained from a few brain regions from mice or rats using different culture conditions and examined by different laboratories, AP half-width remained stable despite chronic activity block with TTX. The finding suggests that AP width is not significantly modulated by changes in sodium channel activity.

      Strengths:

      The collaborative nature of the study amongst the neuronal culture experts and the rigorous electrophysiological assessments provides for a compelling support of the main conclusion.

      Weaknesses:

      Given the negative nature of the results, a couple of remaining issues (such as the cell density of cultures and the presentation of imaging experiments with a voltage sensor) warrant further consideration. In addition, a discussion of the reasons for the seeming stability of AP half-width to sodium channel modulation might help extend the scope of the study beyond the presentation of a negative conclusion.

      We would like to thank the reviewer for positively evaluating our manuscript. Please find below our detailed point-by-point response to the reviewer’s comments.

      Reviewer #2 (Public review):

      Summary:

      This study reexamined the idea that action potential broadening serves as a homeostatic mechanism to compensate for changes in network activity. The key finding was that, while action potential broadening does occur in certain neurons, such as CA3 pyramidal cells, it is far from a universal response. This is important because it helps resolve longstanding discrepancies in the field, thereby contributing to a better understanding of network dynamics. The replication of these findings across multiple laboratories further strengthened the study's rigor.

      Strengths:

      Mechanisms of network homeostasis are essential to understand network dynamics.

      Weaknesses:

      No weaknesses were noted by this reviewer.

      We would like to thank the reviewer for the positive evaluation of our manuscript. Please find below our detailed point-by-point response to the reviewer’s comments.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "Unreliable homeostatic action potential broadening in cultured dissociated neurons" by Ritzau-Jost et al. investigates action potential (AP) broadening as a mechanism underlying homeostatic synaptic plasticity. Given the existing variability in the literature concerning AP broadening, the authors address an important and timely research question of considerable interest to the field.

      The study systematically demonstrates cell-type- and model-specific AP broadening in hippocampal neurons after chronic treatment with either tetrodotoxin (TTX) or glutamatergic transmission blockers. The findings indicate AP broadening in CA3 pyramidal neurons in organotypic cultures after TTX treatment, but notably not in dissociated hippocampal neurons under identical conditions. However, blocking glutamatergic neurotransmission caused AP broadening in dissociated hippocampal neurons. Moreover, extensive evaluations in neocortical dissociated cultures robustly challenge previous findings by revealing a lack of AP broadening following TTX treatment. Additionally, the proposed role of BK-type potassium channels in mediating AP broadening is convincingly questioned through complementary electrophysiological and voltage-imaging experiments.

      Strengths:

      The manuscript exhibits an outstanding experimental design, employing state-of-the-art techniques and a rigorous multi-lab validation approach that greatly enhances scientific reliability. The experimental results are meticulously illustrated, and the conclusions drawn are justified and supported by the presented data. Furthermore, the manuscript is comprehensively and clearly written.

      Weaknesses:

      Concerning the statistical analyses employed, it is advisable to consider the Kruskal-Wallis test with corrections for multiple comparisons when evaluating more than two experimental groups.

      We would like to thank the reviewer for the positive evaluation of our manuscript. In the following, we first address the comment regarding the statistical tests used. Please also find below the detailed response to the reviewer’s further comments. Indeed, we did not apply a correction for multiple comparisons in Figure 2. This seems justified because in this exceptional case we are more concerned about type II errors (false negatives). The Kruskal-Wallis test does not seem appropriate for this type of data, for which only the comparison between the control and the respective TTX data is relevant. Instead, we followed the reviewer’s suggestion by applying corrections for false discovery rate (FDR). We thank the reviewer for pointing out this statistical issue and have addressed it in the revised manuscript (lines 121–128):

      “Even though AP durations varied up to 2-fold between conditions, statistically significant homeostatic AP broadening was not detectable in any of the tested conditions (Fig. 2B). To minimize type II errors (false negative) we intentionally did not apply a correction for multiple comparisons. The only significance was observed in condition III but in an opposite direction (i.e. AP narrowing with TTX, P=0.026; Fig. 2B). However, this is likely a false positive because application of corrections for false discovery rate results in P=0.268 for both Benjamini–Hochberg and Bonferroni correction.”
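
      Why the two corrections named in the quoted passage can yield the same adjusted value for the smallest P value can be illustrated with a minimal sketch. The P values used in the note after the code are hypothetical and are not the data of Fig. 2B, and the function names are chosen for illustration. Note that, strictly speaking, Bonferroni controls the family-wise error rate, whereas Benjamini–Hochberg controls the false discovery rate.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (step-up procedure).

    The k-th smallest P value is scaled by m / k, and monotonicity is then
    enforced by taking a running minimum from the largest rank downwards.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      For the hypothetical input `[0.02, 0.5, 0.6, 0.7]`, both functions return 0.08 for the smallest P value. The Benjamini–Hochberg adjustment of the smallest value reduces to multiplication by the number of tests whenever the remaining P values are large, which is one way both corrections can give the same result for a single nominally significant comparison.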

      Recommendations for the authors:

      Reviewing Editor Comments:

      The main and most important observation of the study is that the AP does not change in most cases examined. A discussion of the mechanisms of the changes in CA3 neurons would significantly strengthen the compelling evidence presented. The individual reviews are also provided, in case the authors find them useful to include other aspects suggested by the reviewers.

      We would like to thank the Reviewing Editor for handling our manuscript and for the positive evaluation of our work. The main focus of our study was the analysis of homeostatic plasticity in cultured neurons of the neocortex. We agree that the findings in CA3 neurons are interesting. As explained in more detail below, we have carefully discussed the mechanisms of the changes in CA3 neurons in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) AP widths measured in the present study under basal conditions are generally larger than the value reported in previous work by Li et al. 2020 (~1.5 ms). In particular, rat cortical cultures prepared using the same conditions show that the mean AP half-width in controls of the present study (~2.5 ms) is closer to the mean AP half-width in TTX-treated neurons in Li et al. (~2.0 ms).

      We thank the reviewer for the detailed and positive feedback as well as for the thoughtful questions. The inconsistency in action potential half-duration between our data and those of Li et al. is partially due to differences in the way the half-duration was measured. In Li et al. the exact method is unfortunately not defined, but from a personal communication with the authors we know that they measured half-duration based on the AP amplitude between AP peak and AP voltage threshold. In contrast, we measured half-duration based on the AP amplitude between AP peak and the resting membrane potential preceding current injections. When we instead measure AP half-duration from voltage threshold, the average half-duration is 1.97 ms (compared to 2.64 ms from baseline, n = 106 cells; average across conditions I–IV, control and TTX merged). Thus, the discrepancy in half-duration is to a large extent due to methodological differences in the way the half-duration was measured.
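
      The effect of the reference level on the measured half-duration can be illustrated with a minimal sketch. It uses an idealized triangular waveform rather than recorded data, together with a hypothetical threshold of -50 mV; the function names and all waveform parameters are assumptions made for illustration only. Half-duration is taken as the width at half the amplitude between the peak and the chosen reference (resting potential or voltage threshold).

```python
def triangular_ap(dt_ms=0.01):
    """Idealized AP (illustrative, not recorded data): rest at -70 mV,
    linear rise to +30 mV between 1 and 2 ms, linear decay back by 5 ms."""
    times, volts = [], []
    for k in range(int(round(6.0 / dt_ms)) + 1):
        t = k * dt_ms
        if t < 1.0:
            v = -70.0
        elif t < 2.0:
            v = -70.0 + 100.0 * (t - 1.0)
        elif t < 5.0:
            v = 30.0 - (100.0 / 3.0) * (t - 2.0)
        else:
            v = -70.0
        times.append(t)
        volts.append(v)
    return times, volts


def half_duration(times, volts, reference_mv):
    """Width (ms) at half amplitude, with amplitude measured from
    `reference_mv` (e.g. resting potential or threshold) to the peak."""
    n = len(volts)
    peak = max(range(n), key=volts.__getitem__)
    half = reference_mv + 0.5 * (volts[peak] - reference_mv)

    def interp(a, b):
        # Linear interpolation of the time at which `half` is crossed.
        frac = (half - volts[a]) / (volts[b] - volts[a])
        return times[a] + frac * (times[b] - times[a])

    # Walk back from the peak to the first sample at or above half level.
    i = peak
    while i > 0 and volts[i - 1] >= half:
        i -= 1
    # Walk forward from the peak to the last sample at or above half level.
    j = peak
    while j < n - 1 and volts[j + 1] >= half:
        j += 1
    if i == 0 or j == n - 1:
        raise ValueError("half level is not crossed on both sides of the peak")
    return interp(j, j + 1) - interp(i - 1, i)


t, v = triangular_ap()
from_baseline = half_duration(t, v, reference_mv=-70.0)   # resting potential
from_threshold = half_duration(t, v, reference_mv=-50.0)  # hypothetical threshold
```

      With these assumed values, the baseline-referenced half-duration is 2.0 ms and the threshold-referenced one is 1.6 ms. The same waveform therefore appears narrower when amplitude is measured from threshold, which is the direction of the 2.64 ms versus 1.97 ms difference reported above.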

      One parameter that is not stated in either study is cell plating density, which can potentially bias the neuronal network activity levels of cultures. Could the authors comment on the possible contribution of neuronal culture density to AP half-width under basal recording conditions and its sensitivity to chronic TTX treatment? Are there any data available? For example, cultures used by Li et al. may have been plated at a high density and experienced high activity levels during culturing, which could have contributed to the enhanced sensitivity to chronic activity suppression by TTX.

      We agree that neuronal culture density is an important factor influencing neuronal activity and hence potentially also the sensitivity to chronic activity suppression. In our experiments, the number of plated cells per cover slip varied between conditions about 3-fold: 30–50k cells for conditions I and II, 25–30k cells for conditions III, VII, XI, 50k cells for condition IV, 65k for conditions V, VI and VIII, and 70k cells for conditions IX and X. Li et al. do not provide the cell density or the number of plated cells. Despite the difference in the number of plated cells in our dataset across various laboratories, we did not observe a systematic effect of cell number on baseline AP half-duration. Furthermore, we observed strongly different baseline activity across our various experimental conditions (Fig. 3A), which did not correlate with cell density. Also, we did not notice an impact of baseline activity on the sensitivity to chronic activity suppression with TTX (cf. Fig. 3A and 2B). We have now added the number of plated cells per condition to the methods section as well as the following paragraph to the discussion section (lines 256–262):

      “The sensitivity to chronic TTX treatment might depend on baseline neuronal activity, which is in part related to neuronal culture density[37]. However, TTX did not induce AP broadening despite different baseline activities (Fig. 3A) and a nearly threefold variation in the number of plated cells per cover slip between conditions (25k – 70k cells per coverslip).”

      In addition, a discussion of the reasons for the seeming stability of AP half-width to sodium channel modulation might help extend the scope of the study beyond the presentation of a negative conclusion.

      We thank the reviewer for this suggestion and have added a paragraph to the end of the discussion emphasizing potential advantages of cell-type specific AP broadening (lines 353–362):

      “Despite the lack of homeostatic, TTX-induced AP broadening in dissociated cultures, AP duration was broadened upon Kyn-treatment in dissociated cultures and using TTX in CA3 neurons in organotypic cultures. Because BK-channels control AP duration in CA3 neurons of organotypic cultures[79], homeostatic BK-channel downregulation as proposed by Li et al. may be involved in AP broadening in this specific cell type. While the reasons for the variable occurrence of homeostatic AP broadening remain unknown, this may render neuronal circuitries more robust to perturbations. The regulation of AP duration therefore might represent one element in the repertoire of neuronal plasticity that is, similar to other plasticity mechanisms, not generally shared, but specifically expressed in some cell types and neuronal compartments.”

      (2) In this study, CA3 neurons in organotypic cultures were the only cells that showed AP broadening with TTX treatment. Notably, CA3 neurons show strong recurrent activity in general and would be expected to have experienced high levels of activity in culture. For CA3 neurons in organotypic cultures, does IbTx increase basal AP half-width?

      We thank the reviewer for this interesting idea. Even though, to our knowledge, there is no study investigating the effect of IbTx on AP width in CA3 neurons of organotypic cultures, Raffaelli et al. (DOI 10.1113/jphysiol.2004.062661) reported ~15% AP broadening using the BK-channel blocker paxilline. Therefore, TTX-induced broadening in CA3 neurons might be related to BK-channel-dependent AP repolarisation, consistent with the model proposed by Li et al. Because organotypic cultures show increased activity for longer cultivation periods and higher connectivity compared to acute slices (De Simoni et al., DOI 10.1113/jphysiol.2003.039099), the effect of TTX may be aggravated in organotypic cultures compared to acute slices or in vivo. However, the lack of a TTX-effect was not dependent on background neuronal activity or culture density in our recordings (see above as well as lines 306–310 of the revised manuscript).

      (3) Figures 4E-G. In experiments to test the efficacy of IbTx with GEVI, larger fields of view of neuron(s) used for recordings should be included. As shown, it is difficult to discern the quality of the preparation and does not provide a representative indication of the type of signals measured.

      We thank the reviewer for this suggestion and have included an image of a representative neuron expressing the GEVI in Fig. 4E.

      Minor points

      (1) Lines 222-228. With respect to cell-type specificity of TTX-induced AP broadening, the observed lack of effect of TTX in dissociated hippocampal cultures might suggest that the cultures are predominantly DG granule cells and CA1 neurons, with few CA3 neurons surviving. Could the authors comment?

      We thank the reviewer for this interesting hypothesis and have discussed it in the manuscript as a potential explanation for our different findings in the hippocampus (lines 263–270):

      “Although we mainly focus on neocortical cultured neurons (condition I to VIII, Fig. 2) because Li et al. used neocortical neurons, the absence of AP broadening in hippocampal neurons (group IX to XI) could in principle be explained by the selective loss of CA3 neurons, which show AP broadening in organotypic cultured neurons (Fig. 1A and B). However, CA3 neurons were shown to survive in dissociated cultures following region-specific microdissection[40], and CA1 neurons are generally more stress-sensitive to excitotoxicity with glutamate or NMDA than CA3 and DG neurons[42], arguing against a general selective loss of CA3 neurons in dissociated cultures.”

      (2) Figures 3D, E. To what extent is the observed increase in sEPSC amplitude due to an increase in sEPSC frequency? Is quantal amplitude increased following TTX treatment, a postsynaptic strength parameter that one would not expect to be affected by a change in AP width, but that is known to undergo up-scaling with chronic TTX treatment?

      We would like to thank the reviewer for the question. We cannot rule out an interplay between sEPSC amplitude and frequency. We did not measure quantal amplitude in the presence of TTX. Our experiments were designed to test whether TTX successfully induced homeostatic plasticity, but not to attribute the observed effect to pre- and postsynaptic mechanisms. We have added the following statement to the revised manuscript, to highlight the possible interaction of sEPSC amplitude and frequency (lines 176–178):

      “These changes in sEPSC amplitude and frequency are not specific for somatic, pre- or postsynaptic adaptations. However, the results show that blocking AP firing with TTX successfully induced homeostatic plasticity under our experimental conditions.”

      (3) Line 132. Could the authors explain the rationale for using AP amplitude as a measure of neuronal "viability"?

      In a response to Cell, Li et al. suggested that the lack of a TTX effect was due to recordings from unhealthy neurons and that small AP amplitudes could indicate impaired cell viability. Indeed, we also believe that cells which appear morphologically less healthy tend to have small and slow APs. A mechanistic rationale could be a change in resting membrane potential or changes in the expression of voltage-gated sodium and potassium channels. However, AP amplitudes were not affected following TTX treatment in any of the eleven recording conditions (Fig. 2D) or in a cross-conditional comparison (Fig. 2E). In the revised manuscript, we have now added a possible rationale (lines 134–137):

      “Because unhealthy neurons tend to have small and slow APs, possibly due to changes in resting membrane potential or expression of voltage-gated sodium and potassium channels, we first analyzed AP amplitude as a measure of neuronal viability.”

      Reviewer #3 (Recommendations for the authors):

      I propose addressing the following questions, either through additional experiments (recommended) or a deeper theoretical discussion:

      (1) Since the authors demonstrate that blocking glutamatergic neurotransmission in dissociated hippocampal neurons causes AP broadening, do similar phenomena occur in organotypic cultures and dissociated neocortical neurons?

      We thank the reviewer for the interesting question. In dissociated hippocampal cultures, we show that AP duration is maintained following treatment with TTX and NBQX, while Kyn-treatment leads to AP broadening (Figure 1C). To our knowledge, the effect of Kyn on AP duration has not been studied in neocortical dissociated cultured neurons. However, Kyn induced AP broadening in CA3 neurons of hippocampal organotypic cultures (Zbili et al., DOI 10.1073/pnas.2110601118) while CNQX did not induce such broadening in CA1 neurons (Karmarkar and Buonomano, DOI 10.1111/j.1460-9568.2006.04692.x). Both findings are in accord with our recordings from dissociated hippocampal cultures. These data, however, do not allow inference as to whether AP broadening is a cell-type specific or blocker-specific mechanism in hippocampal organotypic cultures. Because the main focus of our study is the absence of AP broadening in neocortical cultured neurons as described by Li et al., we adjusted the corresponding discussion section (lines 299–322):

      “In contrast, APs were not significantly broader following synaptic block by NBQX (Fig. 1C, D), in accord with recordings from CA1 neurons in organotypic cultures using CNQX. TTX-induced broadening may therefore be cell-type specific or due to a differential effect of the glutamate receptor blockers on NMDA receptors which are blocked by Kyn but not NBQX/CNQX or TTX and which have recently been demonstrated to be important for the induction of synaptic homeostatic plasticity[41].”

      (2) Are BK channels involved in AP broadening observed in CA3 pyramidal neurons in organotypic cultures?

      We thank the reviewer for the question. BK channels control spike duration in CA3 neurons of organotypic cultures (~15% broadening upon block by paxilline; Raffaelli et al., DOI 10.1113/jphysiol.2004.062661). Even though there are no data available on the contribution of BK channels to homeostatic spike broadening in this cell type, CA3 neurons in organotypic cultures thereby fulfil the two necessary preconditions of the model proposed by Li et al. (namely, the control of the resting AP width by BK-channels and TTX-induced AP broadening). We include this possibility in the discussion (lines 355–357):

      “Because BK-channels control AP duration in CA3 neurons of organotypic cultures[79], homeostatic BK-channel downregulation as proposed by Li et al. may be involved in AP broadening in this specific cell type.”

      (3) AP broadening consistently occurs in CA3 neurons within organotypic cultures; what molecular or cellular mechanisms underpin this phenomenon, and is there a potential contribution from glial cells?

      We thank the reviewer for this interesting question. Across various studies, CA3 neurons show AP broadening upon chronic inactivity, which has not been observed in CA1 or DG neurons. Recordings from CA3 neurons served as a positive example for TTX-induced AP broadening in our study, in contrast to a lack of broadening in dissociated (neocortical and hippocampal) cultured neurons. The discrepancy between the results in dissociated and organotypic cultured neurons could indeed be due to interactions with glial cells. We have added this possibility to the discussion in the revised version of the manuscript (lines 270–273):

      “Altered cell-cell interactions with glia and neurons in organotypic and dissociated neuronal cultures could instead contribute to the different findings in various hippocampal preparations.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the role of the E3 ubiquitin ligase ITCH in regulating the viral life cycle of SARS-CoV-2. The authors showed that ITCH mediates ubiquitination of the membrane (M) and envelope (E) proteins of SARS-CoV-2. Ubiquitination of E and M results in enhanced interactions between the structural proteins and redistribution of the structural proteins into autophagosomes. The authors claim that the enhanced interactions between structural proteins and trafficking of the structural proteins into autophagosomes contribute to SARS-CoV-2 replication and egress, suggesting ITCH as a potential antiviral target. ITCH also alters the cellular distribution of host proteases important for spike cleavage, which protects and stabilizes spike against cleavage. The authors also demonstrated that SARS-CoV-2 replication is augmented by ITCH, as virus replication is significantly impaired in cells lacking ITCH expression.

      Strengths:

      The authors provided high-quality data with appropriate experimental controls to justify their claims and conclusions. The mechanistic analyses are excellent and presented in a logical manner. The investigation of the role of ubiquitination in coronavirus assembly and egress is novel as most previous studies focused on its role in mediating innate immune responses.

      Weaknesses:

      Although the authors showed that ITCH ubiquitinates E and M proteins, the claim that such ubiquitination promotes virion assembly and egress is circumstantial. The enhanced interaction between the structural proteins and targeting of ubiquitinated structural proteins into autophagosomes does not necessarily result in increased virion production and release as suggested by the authors. There is a disconnect between the ubiquitination of structural proteins and the role of ITCH in augmenting virus replication as shown in Fig. 6A and B. In addition, the authors showed that the catalytic activity of ITCH is important for the localization and maturation of host proteases. However, the underlying mechanism is unknown. Also, it is unclear how protection of spike from cleavage conferred by ITCH explains its role in promoting replication, as a lack of spike cleavage would inevitably compromise entry. The major weakness of the manuscript is the lack of experimental data that explains the molecular role of ITCH in relation to its phenotype observed during SARS-CoV-2 infection.

      We sincerely thank the reviewer for the positive evaluation of the quality, rigor, and novelty of our study. We particularly appreciate the thoughtful comments regarding the mechanistic link between ITCH-mediated ubiquitination and viral assembly/egress, as well as the broader implications for SARS-CoV-2 replication.

      Our data support a model in which ITCH-mediated ubiquitination of the structural proteins M and E enhances their interactions and promotes their trafficking into autophagosomal compartments, ultimately contributing to increased virion production and release. The phenotypic outcomes observed in Fig. 6A-B (replaced by re-measured viral infectious titer and genomic copy number in the culture medium of vT2-WT and vT2-KO cells) are consistent with our earlier findings in Figs. 1-5, which demonstrate that ITCH promotes SARS-CoV-2 replication. Thus, the replication defect observed in ITCH-deficient cells aligns with the mechanistic effects of ITCH on structural protein ubiquitination and trafficking.

      We agree with the reviewer that directly linking ubiquitination of structural proteins to virion production would further strengthen the mechanistic connection. However, direct detection of ubiquitinated virions in vitro, particularly by electron microscopy (EM), remains technically challenging. Our laboratory has not yet established an EM-based platform optimized for high-resolution SARS-CoV-2 virion analysis. Furthermore, it is possible that ubiquitin chains conjugated to structural proteins are cleaved during or after virion egress, which would complicate their detection in released particles. These technical and biological considerations currently limit direct visualization of ubiquitinated virions.

      Regarding the role of ITCH in regulating the localization and maturation of host proteases, our recent studies [1, 2] have demonstrated that ITCH is involved in Golgi fragmentation, leading to altered furin distribution and impaired cathepsin L maturation. These findings provide mechanistic insight into how ITCH catalytic activity may influence host protease processing. We have incorporated this discussion into the revised manuscript (last paragraph of the Discussion section) to better contextualize our observations.

      With respect to spike cleavage, although S1/S2 processing is required for SARS-CoV-2 entry, accumulating evidence suggests that excessive intracellular cleavage may be detrimental to virion stability. For example, in Vero cells lacking TMPRSS2, virions containing cleaved S1 and S2 are less stable [3]. Additionally, the D614G substitution renders the spike protein more resistant to cleavage, reduces S1 shedding, and enhances incorporation of intact spike into virions, thereby increasing infectivity and stability [4-6]. These findings suggest that maintaining intact spike during intracellular assembly may be advantageous for the viral life cycle. In this context, ITCH-mediated modulation of host protease distribution and spike processing may help preserve spike integrity within assembling virions.

      Taken together, the ability of ITCH to (i) enhance structural protein interactions, (ii) facilitate trafficking through autophagosomal pathways, and (iii) promote incorporation of intact spike into virions provides a coherent mechanistic framework explaining how ITCH enhances virion production and release. While additional studies will be required to further dissect the precise molecular details, our data collectively support a functional link between ITCH ubiquitin ligase activity and SARS-CoV-2 assembly and egress.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript Qiwang Xiang et al. investigated the role of the E3 ubiquitin ligase ITCH in the life cycle of SARS-CoV-2. They claim the following:

      (i) ITCH promotes virion assembly by interacting with E and M proteins and enhancing their K63-linked ubiquitination

      (ii) ITCH-mediated ubiquitination promotes autophagosome-dependent secretion of viral particles.

      (iii) ITCH stabilizes the viral spike protein by impairing its processing by furin and cathepsin L proteases.

      The manuscript provides an interesting exploration of ITCH's role in the SARS-CoV-2 life cycle but requires additional work to strengthen key claims and address potential confounding factors.

      Strengths:

      The experiments are sufficiently clear in documenting that ITCH activity is critical for efficient SARS-CoV-2 replication and for K63-linked ubiquitination of the M and E proteins.

      Weaknesses:

      The manuscript does not convincingly demonstrate how ITCH-mediated ubiquitination of E and M impacts virus assembly and release. Identifying the specific lysine residues in M and E targeted by ITCH, and generating mutant VLPs or recombinant viruses, would strengthen the conclusions.

      Most of the conclusions rely on ITCH overexpression data, which may have off-target effects on Golgi integrity and vesicular trafficking. For instance, Figure 4F provides evidence of altered Golgi morphology and TGN46 fragmentation, raising concerns that ITCH overexpression could indirectly mislocalize furin, affecting S1/S2 cleavage of the spike protein. In addition, inhibition of furin activity may also lead to off-target effects, given its role in processing numerous host proteins.

      Similarly, ITCH overexpression is likely to indirectly affect cathepsin L maturation. In addition, the manuscript does not clarify how impaired cathepsin L activity would influence virus assembly or release.

      A major concern is also the lack of quantification and statistical analysis of immunofluorescence images throughout the manuscript, which undermines the reliability of these observations.

      We sincerely thank the reviewer for recognizing the importance of ITCH in SARS-CoV-2 replication and for the constructive and insightful suggestions to further strengthen the manuscript.

      Regarding the impact of ITCH-mediated ubiquitination of E and M on virus assembly and release, our data support a model in which ITCH promotes K63-linked ubiquitination of the E and M proteins, facilitating their recruitment to p62-positive autophagosomal compartments. This recruitment likely enhances the spatial proximity and interaction frequency of structural proteins within assembly sites, thereby promoting efficient virion assembly and subsequent release via autophagosome-dependent secretory pathways.

      We agree that identifying the specific lysine residues in M and E targeted by ITCH and generating mutant VLPs or recombinant viruses would provide a more direct mechanistic link. These are important and technically demanding experiments that require extensive mutagenesis and reverse genetics approaches. While beyond the scope of the current study, we fully acknowledge their value and plan to pursue these directions in future work to further refine the mechanistic understanding of ITCH-dependent ubiquitination during coronavirus assembly.

      Regarding the reliance on ITCH overexpression systems, we acknowledge the reviewer’s concern that ectopic ITCH expression may affect Golgi integrity and vesicular trafficking. Indeed, our recent studies [1, 2] demonstrate that ITCH catalytic activity disrupts Golgi structure, resulting in altered furin distribution and impaired cathepsin L maturation. These findings provide mechanistic context for the phenotypes observed in the present study and suggest that ITCH regulates host protease localization through defined cellular pathways rather than nonspecific overexpression artifacts. We have now expanded the Discussion section (last paragraph) to clarify this mechanistic framework.

      Importantly, SARS-CoV-2 infection itself significantly activates endogenous ITCH, and therefore our ectopic expression system likely mimics infection-induced ITCH activation rather than representing a purely artificial condition. In addition, key phenotypes, such as reduced viral replication and altered structural protein behavior, are consistently observed in ITCH-deficient cells, supporting the physiological relevance of ITCH activity in the viral life cycle.

      Regarding cathepsin L (CTSL) maturation, we have expanded the Discussion to clarify how impaired CTSL activity may influence viral assembly and egress. ITCH inhibits CTSL maturation, thereby reducing excessive spike cleavage into smaller fragments. Although CTSL-mediated spike processing facilitates genome release following endocytosis [7, 8], CTSL is a lysosomal protease, and lysosomes are exploited by β-coronaviruses as egress organelles [9]. Excessive lysosomal proteolysis may therefore compromise virion integrity during egress. In this context, ITCH-mediated inhibition of CTSL maturation may preserve spike stability within assembling or trafficking virions, thereby promoting the production and release of infectious particles during the replication phase.

      Regarding quantification and statistical analysis of immunofluorescence data, we appreciate this important point. In the revised manuscript, we have included expanded image panels with increased cell numbers and quantitative colocalization analyses to enhance the rigor of these observations.

      Reviewer #3 (Public review):

      Summary:

      Xiang et al. investigated the role of ubiquitin E3 ligase ITCH in SARS-CoV-2 replication. First, they described the role of ITCH on the structural proteins. Here, the ubiquitination of E and M (but not S) leads to an enhanced interaction and presumably virion assembly. In addition, E and M ubiquitination seems to be necessary for p62-guided sequestration into autophagosomes for secretion. Furthermore, ITCH regulates S proteolytic cleavage by changing furin localization and inhibiting CTSL protease maturation. In addition, SARS-CoV-2 infection upregulates ITCH phosphorylation, whereas knockout of ITCH reduces SARS-CoV-2 replication.

      Strengths:

      The proposed study is of interest to the virology community because it aims to elucidate the role of ubiquitination by ITCH in SARS-CoV-2 proteins. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our knowledge of ubiquitination's diverse functions in cell biology.

      Weakness:

      The involvement of ubiquitin ligases in SARS-CoV-2 replication is not entirely new (see E3 Ubiquitin Ligase RNF5; Yuan et al., 2022; Li et al., 2023). While the data generally support the conclusions, additional work is needed to confirm the role of ITCH in SARS-CoV-2 replication in a biologically relevant context. The vast majority of data is based on transient overexpression experiments of ITCH, which ultimately leads to massive ubiquitination of several viral and host cell factors, including potentially low-affinity substrates not typically recognized under physiological conditions. In addition to that, nearly all experiments were done in cells co-overexpressing ITCH and the viral structural proteins (or cellular proteases) in HEK293T cells. Therefore, a proteomic analysis of protein ubiquitination in a) SARS-CoV-2-infected cells (ideally several cell types) and b) SARS-CoV-2-infected v2T-ITCH-KO cells would verify the ITCH-related ubiquitination of e.g., E and M and would strengthen the whole manuscript. In addition, the few key experiments using SARS-CoV-2 infected cells were performed in VeroE6 cells, which are neither human nor lung-derived. Only in one experiment were lung-derived Calu3 cells included.

      Moreover, the manuscript names ITCH as a central regulator of SARS-CoV-2 replication. If ITCH is beneficial for E and M interaction and thereby aids virion assembly, showing its effect on VLP production would be desirable. Clarifications regarding data acquisition and data analysis could strengthen the manuscript and its conclusions.

      We sincerely thank the reviewer for the thoughtful evaluation and for highlighting the importance of demonstrating physiological relevance.

      We agree that the involvement of E3 ubiquitin ligases in SARS-CoV-2 replication is not entirely unprecedented. Accordingly, we have expanded the Introduction to discuss RNF5 and other E3 ligases previously implicated in SARS-CoV-2 biology (e.g., Yuan et al., 2022; Li et al., 2023), thereby clarifying how ITCH differs mechanistically.

      Regarding the reliance on transient overexpression systems, we acknowledge the reviewer’s concern. Importantly, SARS-CoV-2 infection itself significantly induces ITCH phosphorylation and activation. Therefore, our ectopic expression system likely mimics infection-driven ITCH activation rather than representing a purely artificial condition. Moreover, key findings, including reduced viral replication and diminished E/M ubiquitination, were validated in ITCH knockout cells, supporting the physiological relevance of ITCH-dependent structural protein ubiquitination under endogenous conditions.

      We appreciate the suggestion to perform a global proteomic analysis of ubiquitinated proteins in (i) SARS-CoV-2-infected cells and (ii) SARS-CoV-2-infected ITCH-KO cells. Such analyses would indeed provide a comprehensive and unbiased assessment of ITCH-dependent ubiquitination events. While this approach is beyond the scope of the current study, we fully recognize its value and plan to pursue it in future investigations to further refine the mechanistic understanding of ITCH-mediated ubiquitination during coronavirus assembly.

      With respect to the cellular models used, Vero E6/TMPRSS2 cells are widely established for SARS-CoV-2 propagation due to their robust viral replication, rapid growth, and reduced culture-adapted mutations. Compared with Calu-3 cells, which grow more slowly and may acquire specific adaptations in certain viral genes during prolonged passage, Vero E6/TMPRSS2 cells maintain high viral stability and reproducibility, making them suitable for mechanistic studies. Nevertheless, we agree that human lung-derived systems are highly relevant, and we have included Calu-3 cell data where feasible to support translational relevance.

      Regarding the role of ITCH in virion assembly, our data in Fig. 2 demonstrate that ITCH-mediated K63-linked ubiquitination enhances the interaction between E and M proteins, supporting a functional role in virus-like particle (VLP) formation. We agree that direct visualization and quantification of VLP production by EM would further strengthen this conclusion. Such experiments require additional optimization and will be pursued in future work to provide more direct structural evidence.

      Finally, in response to the reviewer’s comments on data acquisition and analysis, we have expanded image panels, increased the number of quantified cells, and included quantitative colocalization analyses with appropriate statistical evaluation in the revised manuscript to enhance rigor and reproducibility.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the infectivity of SARS-CoV-2 generated in cell lines expressing or lacking ITCH to investigate the effects of ITCH on infectivity, possibly by measuring RNA to PFU ratio and determining the S cleavage pattern in purified virions.

      We re-measured the viral infectious titer and genomic copy number in the culture medium of vT2-WT and vT2-KO cells infected at an MOI of 0.0001 for 24 h. ITCH ablation reduced the viral copy number by approximately 8-fold (Fig. 6B), while the infectious titer (TCID₅₀) decreased by at least 25-fold (Fig. 6A), indicating that loss of ITCH markedly impairs the formation of infectious viral particles. This finding is consistent with the role of ITCH in promoting Spike (S) protein cleavage.

      As suggested, to assess the S cleavage pattern in secreted virions, we precipitated proteins from the culture medium of SARS-CoV-2–infected cells with or without ITCH expression. Analysis of the precipitated S proteins revealed that the loss of ITCH markedly altered the integrity of full-length S in SARS-CoV-2 virions (Fig. S7A).

      (2) The authors should strengthen the connection between ubiquitination of structural proteins and viral egress by measuring infectious virus particles in the supernatants from cells with or without ITCH expression by plaque assay. However, this cannot be accurately achieved without performing the experiment described in point 1 as cleavage of spike and infectivity would affect the results.

      While a plaque assay was not performed, we quantified infectious viral particles in the supernatants using the TCID₅₀ assay. These analyses showed that loss of ITCH resulted in a marked reduction in infectious virion production (>25-fold; Fig. 6A). In contrast, viral genomic copy numbers, which reflect both infectious and non-infectious particles, were reduced by approximately eightfold (Fig. 6B). The disproportionate reduction in infectious titer relative to viral copy number (approximately threefold difference) is consistent with a defect in virion infectivity, most likely due to impaired S cleavage in the absence of ITCH (Fig. S7A). The reduction in viral copy numbers suggests that ITCH-dependent ubiquitination of viral structural proteins contributes to efficient viral assembly and egress.
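      The "approximately threefold" figure follows from dividing the two fold reductions quoted in this response. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the inputs are the rounded values from the text (the 25-fold titer reduction is reported as a lower bound, so the ratio is a lower bound as well).

```python
def relative_infectivity_loss(titer_fold_reduction: float,
                              genome_fold_reduction: float) -> float:
    """Fold drop in infectious units per genome copy (hypothetical helper).

    Dividing the reduction in infectious titer by the reduction in genome
    copies isolates the loss that is not explained by fewer particles.
    """
    if titer_fold_reduction <= 0 or genome_fold_reduction <= 0:
        raise ValueError("fold reductions must be positive")
    return titer_fold_reduction / genome_fold_reduction


# Rounded values quoted in the response: >25-fold titer, ~8-fold genome copies.
ratio = relative_infectivity_loss(25.0, 8.0)
print(f"{ratio:.3f}")  # prints 3.125
```

      A ratio above 1 indicates that each released genome copy is, on average, less likely to be infectious in the knockout than in the wild type.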

      (3) The authors should strengthen the connection between ubiquitination of structural proteins and virion assembly by EM.

      We appreciate the reviewer’s insightful comment. However, detecting ubiquitinated virions in vitro via electron microscopy (EM) remains technically challenging. At present, our laboratory has not yet established an EM-based system optimized for SARS-CoV-2 virion analysis. Moreover, it is also possible that ubiquitin chains present on virions may be cleaved during or after the viral egress process, further complicating their detection.

      Reviewer #2 (Recommendations for the authors):

      Supp. Figure 2: the authors should provide sequencing data for both ITCH-KO clones for consistency.

      The sequencing data for both ITCH-KO clones have now been included (Fig. S2C).

      Figure 2: All interaction data between structural proteins and p62 rely on ITCH overexpression. It would be helpful to include data in ITCH-KO cells as controls to validate these findings.

      As suggested, we performed E-based immunoprecipitation in wild-type (WT) and ITCH-knockout (KO) cells and found that E pulled down less p62 in the absence of ITCH, confirming that ITCH-mediated ubiquitination of E facilitates its interaction with p62 (Fig. 3C).

      Figure 3H: Verify the middle LC3B panel, as it does not match the merge panel. Please, correct any discrepancies.

      We thank the reviewer for pointing out this error. Fig. 3H (now Fig. 3J) has been corrected accordingly.

      Figure 4F: the labeling of the different panels seems incorrect.

      We have corrected the figure labeling.

      The authors should perform cell viability assays in clomipramine-treated cells. In addition, the authors should clarify whether clomipramine's antiviral effects depend on ITCH expression, given the comparable virus copy numbers in treated WT (Fig. S7B) and ITCH-KO cells (Fig. S7C)

      We thank the reviewer for this helpful comment. As shown in Author response image 1, clomipramine (Clom) treatment for 48 hours resulted in a modest reduction in cell number compared with the DMSO control, but no apparent cell death was detected under these conditions.

      Author response image 1.

      Vero-TMPRSS2 (A) or Vero-ITCH-KO (B) cells were treated with DMSO or clomipramine (Clom) for 48 h, and cell viability was assessed by calcein AM staining (n = 3).

      Reviewer #3 (Recommendations for the authors):

      Results:

      Fig.2A and 2E display controversial results with different outcomes depending on the used bait. In my opinion, in both approaches, the overexpressed ITCH should be able to ubiquitinate M and E (since they are co-expressed). However, the interaction of E and M is not affected by the overexpression of ITCH or ITCH-CS when E is used as a bait (Fig.2A). In contrast, the interaction of E and M is enhanced in the presence of overexpressed ITCH (Fig.2E), when M is used as a bait.

      We thank the reviewer for pointing this out. It should be noted that the blots display only the major (un-ubiquitinated) bands of E and M. When M was used as the bait, more E (main band, un-ubiquitinated form) was co-precipitated in the presence of ectopically expressed ITCH. In contrast, when E was used as the bait, comparable levels of M (main band, un-ubiquitinated form) were detected regardless of ITCH expression. These results suggest that ubiquitin-modified M can bind more E, whereas ubiquitin-modified E does not significantly affect its interaction with M. A more detailed explanation has been added to the revised text.

      Fig.3A+3F: The authors claim a reduced E secretion when ITCH-KO cells or shRNA-treated p62 cells are used. I believe an input loading control of the supernatant displaying an equal amount of e.g. BSA is missing.

      In response to the reviewer’s suggestion, we have now included Coomassie Brilliant Blue (CBB) staining of the culture medium (now shown in Fig. 3A and Fig. 3F).

      Fig.3B: ITCH does not interact with E (or M) alone in the displayed data. The data is comparable with data observed for the interaction with S (Supp.4A). However, the author claims that ITCH interacts with M and E but not S (page 11).

      We would like to clarify that in ECL-based Western blotting, strong signals can mask weaker ones due to contrast limitations. In this experiment, ectopic expression of ITCH produced a strong signal that obscured the endogenous ITCH band. Upon longer exposure, the endogenous ITCH signal becomes visible. Additionally, our data presented in Fig. 1 and the new data in Fig. 3C demonstrate the interaction between the relevant proteins.

      Fig 3F: A scrambled control is missing. Moreover, it would be desirable to see if overexpression of p62 would enhance E release to verify that ITCH ubiquitination and p62-positive autophagosomes are necessary for E release.

      We appreciate the reviewer’s comment. Proteins in the culture medium were precipitated using TCA, and Coomassie Brilliant Blue (CBB) staining has been included (now shown in Fig. 3F). Additionally, E release was examined in the presence of overexpressed p62, and the results showed that p62 overexpression increased the level of E detected in the medium (now shown in Fig. 3G).

      Fig.3: Overall, an experiment using, e.g. cycloheximide (protein synthesis inhibitor) and MG132 (proteasome inhibitor) would strengthen the hypothesis that E and M are not degraded in a lysosome after ITCH overexpression. In my opinion, a colocalization experiment with LAMP1 is unsuitable to draw this conclusion. Would the overexpression of a deubiquitinating enzyme diminish M, E and p62 interaction? Does ITCH/p62 only regulate the release of the overexpressed single E or M protein, or does it also affect VLP release? An experiment analyzing purified VLPs produced in ITCH- or ITCH-CS overexpressing cells would be desirable.

      We thank the reviewer for these important questions. As suggested, we performed additional CHX and MG132 experiments. As shown in Fig. 3H and Fig. S3I, degradation of both E and M proteins was blocked by MG132 treatment, indicating that they are degraded via the proteasome pathway. Notably, MG132 treatment did not rescue the ITCH-mediated decrease of E/M levels, suggesting that the ITCH-dependent reduction of E and M is not mediated through the proteasome pathway. In addition, our recent back-to-back studies [1, 2] demonstrated that ITCH overexpression inhibits lysosomal function by impairing hydrolase maturation, suggesting that ITCH-mediated ubiquitination of E or M is unlikely to promote their degradation through the lysosomal pathway. Together, these data suggest that ITCH-mediated reduction of E and M is not due to enhanced degradation but is instead associated with their secretion.

      Overexpression of deubiquitinating enzymes specifically targeting E or M (which remain to be identified) would likely reduce their interaction with p62.

      Our data in Fig. 2 indicate that ITCH-mediated ubiquitination of E and M enhances their mutual interaction, supporting a role for this process in virus-like particle (VLP) formation. p62 would then facilitate the release of VLPs by promoting the secretion of ubiquitinated E and M.

      Fig.4A: PPC site mutation indicated in yellow. There is no yellow color.

      We have revised the label to read “PPC site mutation indicated in red and green”.

      Fig.4C: Why should the overexpression of ITCH or ITCH-CS affect the S protein cleavage when the cleavage site is anyhow mutated?

      In this analysis, we aimed to verify that neither ITCH nor ITCH-CS affects the cleavage pattern of the mutated S protein. As these data are already presented in Fig. 4D (now Fig. 4C), the redundant result has been removed, and the corresponding description has been added to the revised text.

      Fig.4C: Lysates from the single expression of S wt protein (-ITCH/ +ITCH-CS; as indicated in Fig.4B) is missing for comparison to S mut protein.

      As these controls and related data are already presented in Fig. 4D (now Fig. 4C), the redundant result here has been removed.

      Fig. 4D: Lane 5 and Lane 7 are labeled similarly. ITCH+ in Lane 5 needs to be removed.

      We thank the reviewer for pointing out this error. The labeling (now Fig. 4C) has been corrected.

      Fig 4G: A theoretical MOI of 1 does not lead to an infection of all cells. Therefore, including a third marker for infection control, e.g., N protein, would be helpful. This would clarify whether the changes in furin localization are due to infection.

      We thank the reviewer for raising this point. Our goal was to examine whether SARS-CoV-2 infection affects the localization of furin (mouse antibody) relative to the Golgi marker (rabbit antibody). As suitable E, N, or M antibodies raised in goat or donkey were not available, we could not include those markers in this experiment. However, we did confirm M protein expression in parallel, and the infection efficiency was higher than 80% (Author response image 2). To further validate that the observed changes in furin localization were due to viral infection, we have now included additional images showing a larger field of view containing more cells.

      Author response image 2.

      Fig.4: Generally, the colocalization of proteases with TGN46 should be analyzed quantitatively using, for example, Manders' overlap coefficient. This would be needed to draw the conclusion stated in the manuscript.

      We appreciate the reviewer’s suggestion. We have now included the colocalization analysis in Fig. 4E and F.
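      For readers unfamiliar with the metric the reviewer names, the sketch below computes Manders' overlap coefficient, r = Σ(Aᵢ·Bᵢ) / √(ΣAᵢ² · ΣBᵢ²), for two channels given as flat lists of pixel intensities. It is an illustration only and is not the analysis pipeline used for Fig. 4E and F, which this response does not describe; the function name and the toy intensity values are hypothetical.

```python
import math
from typing import Sequence


def manders_overlap(ch1: Sequence[float], ch2: Sequence[float]) -> float:
    """Manders' overlap coefficient for two equally sized intensity lists.

    Returns a value between 0 (no co-occurring signal) and 1 (intensities
    proportional in every pixel), assuming non-negative intensities.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must contain the same number of pixels")
    numerator = sum(a * b for a, b in zip(ch1, ch2))
    denominator = math.sqrt(sum(a * a for a in ch1) * sum(b * b for b in ch2))
    if denominator == 0:
        raise ValueError("at least one channel contains no signal")
    return numerator / denominator


# Toy example: hypothetical protease and TGN46 intensities for four pixels.
protease = [0.0, 2.0, 4.0, 0.0]
tgn46 = [0.0, 1.0, 2.0, 0.0]
print(manders_overlap(protease, tgn46))  # prints 1.0 (proportional signals)
```

      In practice the coefficient would be computed per cell after background subtraction and then compared across conditions with a statistical test, which is the quantitative treatment the reviewer asks for.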

      Fig.4/5: Overview IF pictures displaying additional cells would be desirable to clarify furin/cathepsin L localization in ITCH/ITCH-CS expressing cells. Otherwise, it looks (in my opinion) very subjective.

      In response to the reviewer’s suggestion, we have included additional images with a larger field of view encompassing more cells for Fig. 4 and 5 (presented in Fig. S5B and S5H).

      Fig.5D/G: MOI is missing in the figure legend.

      As suggested, the MOI information has been added to the figure legend.

      Fig.5D/G/6C/F: Infection control (e.g., N-protein) is missing in the Western Blots.

      We have added M protein as an infection control to the figures.

      Fig.6: Why is the overall amount of ITCH reduced during the course of infection?

      We appreciate the reviewer for raising this point. As shown in Fig. 6C and F, ITCH was significantly activated, as indicated by its phosphorylation at the T222 site during viral infection. This activation promotes ITCH self-ubiquitination.

      Fig.6A: Would an overexpression of ITCH enhance viral replication?

      Moderate upregulation of ITCH promotes viral replication, whereas excessive ITCH overexpression leads to cell death, which in turn partially reduces viral titers.

      Discussion:

      Is there an explanation of how ITCH changes furin localization and CTSL maturation?

      Our recent back-to-back studies [1, 2] demonstrated that ectopic ITCH expression disrupts Golgi integrity, resulting in altered furin distribution and impaired CTSL maturation. The relevant discussion has now been incorporated into the revised text (last paragraph of the Discussion section).

      It would also be helpful to discuss the role of other known ubiquitin ligases like RNF5 in the replication of SARS-CoV-2 and other CoVs. Since the pandemic began, many interactome and host-factor studies in various cell types have been published. None of these studies identified ITCH so far. Could you comment on this?

      As suggested, we have included additional known ubiquitin ligases involved in SARS-CoV-2 replication and in other viral systems (see the third paragraph of the Introduction).

      Overall, in my opinion, the figure legends need to be improved. It is often not clear if ITCH is endogenously detected or overexpressed.

      We thank the reviewer for the helpful suggestion. Additional details have been incorporated into the figure legends.

      (1) Xiang Q, Lu Y, Wang H, Chen H, Chen P, Zhao X, et al. ITCH regulates Golgi integrity and proteotoxicity in neurodegeneration. Science Advances 2025; 11:eado4330.

      (2) Xiang Q, Liu Y, Wang J. Golgi fragmentation driven by the USP11-ITCH axis triggers autolysosomal failure in neurodegeneration. Autophagy 2026.

      (3) Peacock TP, Goldhill DH, Zhou J, Baillon L, Frise R, Swann OC, et al. The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets. Nature Microbiology 2021; 6:899-909.

      (4) Zhang L, Jackson CB, Mou H, Ojha A, Peng H, Quinlan BD, et al. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity. Nature Communications 2020; 11:1-9.

      (5) Plante JA, Liu Y, Liu J, Xia H, Johnson BA, Lokugamage KG, et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature 2021; 592:116-21.

      (6) Daniloski Z, Jordan TX, Ilmain JK, Guo X, Bhabha G, Sanjana NE. The Spike D614G mutation increases SARS-CoV-2 infection of multiple human cell types. eLife 2021; 10:e65365.

      (7) Jaimes JA, Millet JK, Whittaker GR. Proteolytic cleavage of the SARS-CoV-2 spike protein and the role of the novel S1/S2 site. iScience 2020; 23:101212.

      (8) Zhao M-M, Yang W-L, Yang F-Y, Zhang L, Huang W-J, Hou W, et al. Cathepsin L plays a key role in SARS-CoV-2 infection in humans and humanized mice and is a promising target for new drug development. Signal Transduction and Targeted Therapy 2021; 6:1-12.

      (9) Ghosh S, Dellibovi-Ragheb TA, Kerviel A, Pak E, Qiu Q, Fisher M, et al. β-Coronaviruses use lysosomes for egress instead of the biosynthetic secretory pathway. Cell 2020; 183:1520-35.e14.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this remarkable study, the authors use some of their recently-developed oxytocin receptor knockout voles (Oxtr1-/- KOs) to re-examine how oxytocin might influence partner preference. They show that shorter cohabitation times lead to decreased huddling time and partner preference in the KO voles, but with longer periods preference is still established, i.e., the KO animals have a slower rate of forming preference or are less sensitive to whatever cues or experiences lead to the formation of the pair bond as measured by this assay. This helps relate the authors' recent study to the rest of the literature on oxytocin and partner preference in prairie voles. To better understand what might lead to slower partner preference, they quantified changes to the durations and frequency of huddling. In separate assays, they also found that Oxtr1-/- KOs interacted more with stranger males than wild-type females. In a partner choice assay, they found that wild-type males prefer wild-type females more than Oxtr1-/- KO females. They then performed bulk RNA-Seq profiling of nucleus accumbens of both wild-type and Oxtr1-/- KO males and females, either housed with animals of the same sex or paired with a wild-type of the opposite sex. 13 differentially expressed genes were identified, mostly due to downregulation in wild-type females. These genes were also identified in a module lost in the Oxtr1-/- voles by correlated expression profiling. They also compared results of transcriptional profiling in female and male wild-type vs Oxtr1-/- voles (independently of bonding state) and found hundreds of differentially expressed genes in nucleus accumbens, mostly in females and often with some relation to neural development and/or autism. Some of the reduction in the transcript was confirmed with in-situs, as well as compared to changes in transcription in the lateral septum and paraventricular nucleus (PVN) of the hypothalamus. 
      Finally, they find fewer oxytocin+ and AVP+ neurons in the anterior PVN.

      Strengths:

      This is an important study helping to reveal the effects of oxytocin receptor knockout on behavior and gene expression. The experiments are thorough and reveal a surprising number of genetic and anatomical differences, with some sexual dimorphism as well, and the authors have more carefully examined the behavioral changes after shorter and longer periods of partner preference formation.

      We thank Reviewer #1 for the positive assessment of the study’s significance and for recognizing the value of our behavioral and transcriptional analyses in refining the role of oxytocin signaling in pair bonding.

      Weaknesses:

      It is surprising that given all the genetic changes identified by the authors, the behavioral phenotypes are fairly mild. The extent of gene changes also might be underreported given the variability in the behavior and relatively low number of animals profiled.

      Pair bonding is a robust behavior composed of distinct modules that are supported by redundant and compensatory neural pathways. Our findings support a model in which Oxtr functions in parallel with other mechanisms to modulate specific components of social attachment. We have addressed this point in the Discussion. We have also updated our Results and Methods sections to more clearly reflect our cohort size, which is comparable to that of similar studies.

      Reviewer #1 (Recommendations for the authors):

      How do the wild-type males 'know' which animal is which during the three-chamber assay test of Figure 4B? Do the Oxtr1-/- KO females act in some way different from the wild types in this experiment?

      We thank the reviewer for this question. During follow-up analyses prompted by reviewer requests to characterize the behaviors underlying the apparent bias in WT male choice, we discovered a labeling error in the metadata used to analyze these assays. The error flipped the genotypes of the tethered stimulus animals at the ends of the chamber. After correcting this error and reanalyzing the data, we find that naïve WT males do not show a significant preference for naïve WT females over naïve Oxtr1-/- females. We have reconfirmed the metadata used in all assays in this study; no other datasets or conclusions are affected.

      While overall choice frequency is equivalent for males and females, our revised analyses demonstrate that Oxtr loss nonetheless alters the dynamics of social interactions in a sex-specific manner. In particular, the presence of an Oxtr1-/- male significantly alters WT females’ social behavior, enhancing prosocial engagement and reducing aggression, independent of which male is ultimately chosen. These findings support the conclusion that Oxtr function modulates early reciprocal social interactions rather than categorical choice outcomes.

      MOAT and LOAT seem like cumbersome acronyms, more so than something simpler like vole 1 vs vole 2.

      We have replaced these acronyms throughout the manuscript with simpler, descriptive terminology: winner (MOAT) and loser (LOAT).

      Only three animals per condition seemed to have been used for RNA-Seq studies in Figure 5. Given the high behavioral variability in the earlier figures, did the authors screen for animals with exemplar or similar behavior within groups? The lack of significance of other genes or across other groups might just be due to a low-powered experiment given the high behavioral and genetic variability.

      We thank the reviewer for raising the important point regarding behavioral preselection, which has been performed in some similar studies. For our study, animals were not preselected based on exemplar or matched behavioral performance prior to tissue collection, as doing so would risk introducing variation in gene expression patterns due to the experience of complex social interactions. Instead, given that our prairie vole lines are maintained on an outbred background, tissue from three animals was pooled for each RNA-seq sample to reduce inter-individual variability and to capture representative transcriptional states within each experimental group. While this approach increases robustness to individual variability, we acknowledge that it may limit sensitivity to detect low-expression, behavior-linked gene transcripts.

      On lines 426-429, the authors state that "While there was no significant difference in Oxtr transcript levels by genotype (padj = 0.753)-consistent with minimal nonsense-mediated decay despite a premature stop codon-we have previously shown that no functional protein is produced in Oxtr1-/- animals (52)." This assertion could use strengthening, even if just to explain how this was verified in their previous publication. What is the evidence for nonsense decay and a full knockout of functional receptors at the protein level?

      We agree that this point benefits from clarification. Although Oxtr transcript levels were not significantly different by genotype (padj = 0.753), consistent with minimal nonsense-mediated decay, transcript abundance alone does not reflect receptor functionality. In our prior study, we directly assessed Oxtr protein function using receptor autoradiography and found a complete absence of specific ligand binding in Oxtr<sup>1-/-</sup> animals across brain regions that show robust Oxtr binding in wild-type voles, demonstrating a full loss of functional receptor protein. We have clarified this in our manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript uses a recently published oxytocin receptor null prairie vole line to examine the effects of this mutation on pair bonding behavior and PVN gene expression. Results reveal that Oxtr sex specifically influences early courtship behavior and partner preference formation as well as suppressing promiscuity toward novel potential mates. PVN gene expression varies between Oxtr null and WT prairie voles.

      Strengths:

      Behavioral analyses extend beyond the typical reporting of frequency and duration. The gene expression models and analyses are well-done and convincing. The experimental designs and approaches are strong.

      We thank Reviewer #2 for highlighting the strengths of the gene expression modeling and behavioral analyses.

      Weaknesses:

      More details and background literature explaining the role of the Oxt system in pair bonding behaviors is necessary, particularly for the Introduction. The authors overstate several times that Oxtr expression is not necessary for partner preference formation, based on their previous findings. However, it does appear, particularly, in the short cohabitation that it is necessary. Thus, the nuanced answer may be that Oxt may accelerate partner preference formation. Improving the presentation of the statistics and figures will make the manuscript more reader-friendly.

      We thank the reviewer for this thoughtful feedback and agree that additional background on the oxytocin (Oxt) system’s role in pair bonding will strengthen the manuscript. We have revised the introduction to expand our discussion of prior pharmacological and comparative studies suggesting that Oxt signaling modulates multiple components of pair bonding.

      Finally, in response to the reviewer’s suggestion, we have improved the presentation of figures and statistical reporting by interlacing figures with figure legends and updating the supplementary statistics table.

      Reviewer #2 (Recommendations for the authors):

      Major concerns

      (1) The Introduction provides a "broad strokes" approach to link the oxytocin and vasopressin systems as neuromodulators of social attachment processes. This study is a follow-up to a recent publication by the senior authors' groups which reported that the Oxtr null prairie voles were able to form typical pair bonds. Now, the authors are revisiting the same question by developing a series of behavioral assays to probe distinct aspects of pair bonding behavior. However, the Introduction lacks a nuanced examination of how the oxytocin system has been shown to regulate an array of social behaviors in prairie voles and other social species.

      We thank the reviewer for this observation and agree that the original Introduction did not capture the breadth and nuance of oxytocin system involvement in social behavior. We have substantially revised the Introduction in response to the reviewer’s suggestion to include a more detailed discussion of the role played by oxytocin signaling in social behaviors displayed across multiple phyla, including during the early stages of pair bonding.

      (2) In addition, there seems to be relevant viral Oxtr KD and KO studies in prairie voles which could be referenced to reflect differences between acute pharmacological Oxtr inhibition and prolonged viral KD of Oxtr on behavioral outcomes. This could also be put into context with the authors' first paper in prairie voles and others' work with mice showing how congenital Oxtr null rodent models may result in behavioral changes that are not reflected in the pharmacological or viral manipulation research. This could help justify the approach of the current study.

      We thank the reviewer for suggesting this comparison and have included a section in the discussion comparing pharmacological manipulations and global knockouts as well as the discrepancy in phenotypes that arise due to these methods. This expanded discussion clarifies why a congenital genetic model provides complementary insights: it allows us to identify which components of pair bonding are robust to developmental loss of Oxtr and which remain sensitive, thereby distinguishing between Oxtr-dependent behavioral modules and those supported by parallel mechanisms. Additionally, we have included viral manipulations of Oxtr in prairie voles during the early phase of interactions between the sexes in the introduction, to contextualize our study in the broader field.

      (3) On lines 129-130: The authors state, "We previously found that Oxtr is not required for the display of partner preference following 1 week of cohabitation". While this is the general conclusion of their previous publication, this seems like a rather large overgeneralization. There are many studies that have documented the functional regulation and necessity of the Oxt system for partner preference behavior in prairie voles. Therefore, it would be more appropriate to state that their previous study demonstrated that "Oxtr null prairie voles are able to develop a partner preference", but not that Oxtr is not necessary for partner preference formation. This may be a question about when the KO occurs, whether it be congenital or conditional.

      (4) This statement is repeated in Lines 350-352. However, the authors can now qualify this statement at this point in the manuscript with their new data which suggests that Oxtr null voles fail to form a partner preference after short cohabitation, but WT still form such preferences. This would suggest the qualification of this statement should be on the onset of partner preference formation as Oxtr is necessary for partner preference formation after a "short" cohabitation. Therefore, both findings are more in line with previous results which suggest that Oxt signaling accelerates partner preference formation.

      We have revised this language throughout the manuscript to state that our prior work demonstrated that Oxtr null voles are capable of forming a partner preference after extended cohabitation.

      (5) It appears Supplementary Table 1 is not scaled to the page size, so not all statistical results are clear. This limits the accuracy of my review.

      This table has been reformatted to ensure all statistical results are properly scaled to page size.

      (6) It is not always clear what statistical analyses are being performed. For example, how were the data in Figures 4G-H analyzed? What statistics were used and the output should be more readily available.

      During follow-up behavioral analyses prompted by Reviewer #1’s requests to characterize the basis of the apparent WT male bias, we discovered a labeling error in the metadata associated with a subset of naïve three-chamber choice assays. In these cases, the genotypes of the tethered stimulus animals had been inadvertently flipped. After correcting this error and reanalyzing the data, we find that naïve WT males do not show a significant preference for naïve WT females over naïve Oxtr1-/- females. We have rechecked the metadata for all assays included in this study and confirmed that this was the only instance in which such an error occurred. We further analyzed the temporal dynamics of naïve choice and found that Oxtr function modulates early reciprocal social interactions but does not affect the genotype ultimately chosen.

      To improve the clarity of the statistical analyses performed, we have reformatted our presentation of figure legends and our statistics table. All statistical tests, sample sizes, and relevant parameters (including exact tests used, correction methods where applicable, and definitions of units of analysis) are explicitly stated in the figure legends and compiled in the supplementary statistical summary table, in accordance with eLife reporting guidelines.

      (7) Oxytocin plays a critical role in development as early as embryogenesis. It may be useful to frame some of the Introduction and Discussion recognizing the congenital deletion of Oxtr may affect much of development. With that in mind, it is not surprising to see changes in gene expression associated with neurodevelopmental disorders.

      We now explicitly acknowledge in both the Introduction and Discussion that congenital Oxtr deletion likely impacts neural development, which provides context for the observed enrichment of neurodevelopmental gene expression changes.

      Minor concerns

      (1) It was not clear why vasopressin was referenced in the Introduction. Specifically, the study documents that Oxtr null prairie voles have a reduction in Avp neurons in the PVN, which would suggest some aspects of Oxt signaling regulate Avp expression. However, the Introduction is not focused on how Oxt regulates the Avp system but rather on how each is a modulator of social attachment. It would improve the justification of this study to focus on Avp expression if the Introduction presented this concept.

      We thank the reviewer for pointing out the need for greater clarity around our reference to vasopressin (Avp) in the Introduction. We have simply stated that the potential for pair bonding is correlated with the patterns of expression of Oxtr and V1ar in the introduction. The goal of this study was to find evidence of behavior and gene expression changes due to the chronic loss of Oxtr, which led to our finding that a population of Avp neurons is lost in the animals lacking Oxtr. As we did not intend to justify our study on this basis, we have clarified our discussion to include previous studies where OT manipulation affects Avp neurons.

      (2) Figures and supplemental figures need figure legends.

      We have re-arranged the figure legends for each figure (including the supplementary figures) to follow the figures for easier readability and accessibility.

      (3) Figure 1 Timeline is focused more on the male timeline with "bond formation" and "bond maintenance" reflecting the days required to form a partner preference for males. The figure should be revised to reflect similar time points for female pair bonding.

      Figures have been revised to reflect each sex's bonding timeline.

      (4) Figure 1 has a color theme with females represented by red/pink and males represented by dark/light blue. However, this is not true for Figures 1C and 1D. Please revise these color schemes.

      Color schemes have been standardized across all figures.

      (5) It is not clear what is being graphed in Figures 2 and 3. The duration graphs have many more data points than the frequency graphs. Can this be explained?

      We thank the reviewer for pointing out this lack of clarity. The difference in the number of data points reflects how these measures are defined. Duration plots are generated at the level of individual huddle events, specifically pooling all huddles whose duration falls within the top quartile for a given animal, whereas frequency plots are generated at the level of individual animals and therefore contain one data point per subject. As a result, duration graphs necessarily include more data points than frequency graphs. The figure legends and Methods section now explicitly state the unit of analysis for each metric and clarify why the number of data points differs between duration and frequency plots.
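
      The difference in unit of analysis can be illustrated with a minimal sketch. This is not the authors' analysis code: the animal names and huddle durations are invented, and the reading of "top quartile" as "at or above the animal's own 75th percentile" is an assumption made only for illustration.

```python
from statistics import quantiles

# Hypothetical huddle durations (seconds) per animal; values are illustrative only.
huddles = {
    "vole_A": [12, 30, 45, 60, 95, 120, 200, 310],
    "vole_B": [5, 8, 20, 22, 40, 75, 90, 150],
    "vole_C": [15, 18, 25, 33, 50, 64, 88, 140],
}


def top_quartile_events(durations):
    """Return the huddle events at or above this animal's own third quartile.

    Assumption: 'top quartile' means duration >= the animal's 75th percentile.
    """
    q3 = quantiles(durations, n=4)[2]  # third quartile cut point
    return [d for d in durations if d >= q3]


# Frequency metric: one value (number of huddles) per animal.
frequency_points = {animal: len(durs) for animal, durs in huddles.items()}

# Duration metric: one point per qualifying huddle event, pooled across animals.
duration_points = [
    d for durs in huddles.values() for d in top_quartile_events(durs)
]

print(len(frequency_points), "frequency points (one per animal)")
print(len(duration_points), "duration points (one per top-quartile huddle)")
```

      Because every animal contributes several top-quartile events but only one frequency value, the pooled duration plot necessarily carries more points than the frequency plot.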

      (6) What are the black bars in Figure 4H meant to represent?

      We thank the reviewer for this question. In the original submission, the black bars in Figure 4H were intended to indicate time periods showing statistically significant convergence in the chooser’s preference for the MOAT (More Of Assay Time, now winner) animal, based on the sliding preference index analysis. However, as mentioned above, during revision we identified a metadata error affecting the dataset used to generate this figure. After correcting the error, the figure was fully reanalyzed and regenerated. As a result, Figure 4H now presents a different analysis and no longer includes these black bars, and the conclusions drawn from this panel have been revised accordingly. The updated figure, legend, Results text and statistics table now accurately reflect the new analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      …other neurons such as AWB, AWA, and ADL are also involved in the coding process. These neurons likely communicate with different interneurons to contribute to 1-oct-induced outputs. The authors' conclusion that loss of tax-4 reduces attractive responses and that osm-9 mutants reduce repulsive responses is not entirely convincing. TAX-4 is required for both AWC (an attractive neuron) and AWB (a repulsive neuron), and osm-9 is essential for ASH, ADL, and AWA (attraction-associated). Therefore, the observed effects on the attractive and repulsive responses could be more complex. Additionally, the interpretation of results involving the use of IAA to reduce the contribution of AWC at lower concentrations lacks clarity. A more effective approach might involve using transgenically expressed miniSOG or histamine (HisCl1) to specifically inhibit AWC neurons.

      We agree that the sensory inputs into chemotactic behavior are likely more complex, involving other neurons besides ASH and AWC. We now explicitly discuss this possibility in the Discussion (lines 449-467).

      We have also utilized transgenically expressed HisCl1 in ASH and AWC to address this concern. Crucially, we observe that some of the effects of the broad mutations are reproduced by inactivating ASH and AWC. This finding validates our overall hypothesis that sensory-driven behavior is a balance of simultaneous afferent inputs of opposite valence AND shows that ASH and AWC are involved as expected. We are currently performing a comprehensive analysis of sensory inputs into locomotory decision making, including the neurons mentioned in the Reviewer’s comment.

      We also agree that using IAA is not a very clean way to inactivate AWC. The AWC HisCl results referenced above should alleviate this concern. However, the IAA result does put our findings into a broader context of multi-sensory integration which demonstrates the potential usefulness and selective advantages of the dual-input coding architecture that we are hypothesizing.

      Furthermore, they did not observe significant entrainment of AIB activity with the 2.2 mM 1-oct application. This might be due to the animals being anesthetized with 1 mM tetramisole hydrochloride, which could affect neural activity and/or feedback from locomotion. 

      We now mention these caveats: “It is possible that immobilization and anesthetization may be affecting AIB responses to sensory activity and/or proprioceptive feedback from locomotion. However, it is also possible that motor feedback from RIM was obscuring the sensory signal.” (line 357)

      It is unclear whether subtracting AVA activity from AIB activity provides a valid measure. Similarly, it is unclear how the behavioral data from freely moving worms compares to the whole-network calcium imaging results obtained from immobilized worms.

      Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)
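
      The logic of the subtraction can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline or the model of Ray and Gordus: the traces and kernels are invented, and the motor kernel is taken to be the identity (AIB mirrors AVA with no lag or scaling), which is the condition under which a direct subtraction exactly isolates the sensory term.

```python
def causal_convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * signal[t - k]
        out.append(acc)
    return out


# Hypothetical traces (arbitrary units), illustrative only.
ava = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]       # motor state
stimulus = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]  # sensory input

motor_kernel = [1.0]          # assumption: AIB follows AVA with no lag or gain
sensory_kernel = [0.5, 0.25]  # assumption: short, decaying sensory response

# AIB modeled as a sum of convolutions of motor and sensory activity.
motor_part = causal_convolve(ava, motor_kernel)
sensory_part = causal_convolve(stimulus, sensory_kernel)
aib = [m + s for m, s in zip(motor_part, sensory_part)]

# Subtracting the motor trace leaves the sensory-driven component.
residual = [a - v for a, v in zip(aib, ava)]
print(residual)
```

      If the real motor kernel has a lag or a gain different from one, the residual would also contain leftover motor signal, so the subtraction should be read as an approximation.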

      The relationship between network activity in freely moving worms and immobilized worms has been explored by Kato et al 2015 (Cell 163:656-669); we now refer to this work on line 131 “These transitions are related to network state changes which drive spontaneous reversals during foraging in freely moving worms. Immobilization and anesthetization, necessary for confocal imaging, distort certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback. However, the intrinsic motor programs remain intact under these conditions.” (lines 131-136)

      Reviewer #2 (Public review):

      tax-4, but not osm-9 mutants were used in chemotaxis and imaging assays. It would have been nice to have osm-9 results as well for these assays. The mutants are not specific to AWC and ASH. Cell-specific rescue of these neurons would have strengthened the proposed model.

      Osm-9 data are now included in the chemotaxis assays (Fig. 4E).

      Cell-specific HisCl data are now included for ASH and AWC (Fig. 4F, G, 5D), confirming our proposed model.

      Limited tax-4 data were included in the imaging (Fig. 6), but unfortunately, NeuroPAL imaging in tax-4 has proven to be technically difficult. NeuroPAL images in the tax-4 background appear different, perhaps because of developmental effects on gene expression due to the lack of sensory input (recall that the NeuroPAL color scheme is based on the relative expression levels of 40+ neuronal promoters). Inactivation of individual sensory neurons using HisCl1 or other transgenes may be the simpler approach.

      The Results and Discussion have been significantly rewritten to incorporate these new data.

      We are currently working on a comprehensive study of the sensory inputs into locomotory decision making in the context of chemosensation, which we expect to reveal roles of other neurons besides ASH and AWC and provide a fuller picture of the complexities of this system.

      Reviewer #3 (Public review):

      (1) It is not clear precisely how important AWC is (compared to other cells) for the attractive response, though the presence of odor-off behavior implicates it. This could be resolved by looking at additional mutants (tax-4 is broad).

      We have addressed this concern using transgenically-expressed HisCl1 which has demonstrated a clear role for AWC in overall chemotaxis and locomotory decision making upon encountering the 1-oct/buffer interface in microfluidics devices (Fig. 4F, G, 5D).

      (2) Relatedly, dose-dependent chemotaxis data (Figure 4C, D) should be provided for osm-9 animals to get a sense of the degree to which dose-dependence is explained by ASH.

      Osm-9 data are now included (Fig. 4E).

      The Results and Discussion have been significantly rewritten to incorporate these new data.

      (3) Figure 4A, B should include average traces with errors, as there are several ways the responses can vary across conditions.

      Averaged traces with error bars are now shown (Fig. 4A, B).

      (4) The data in Figure 6G does not appear to have error bars.

      Error bars are now shown for Fig. 6G.

      Also, it would help to include a more conventional demonstration of AIB responding to stimuli (e.g. averaging stimulus-aligned responses as a percent of the fluorescence value at stimulus onset to perform the desired subtraction).

      Fig. 6G top panel shows the stimulus-aligned responses of AIB with no subtraction performed. The 6 sequential stimulations are shown as a single continuous trace, consistent with the experimental protocol utilized. Averaging was performed across the 12 individuals of the sample set. However, we did not calculate the average of responses within a dataset (i.e. first plus second plus third etc.) to avoid obscuring any sensitization/desensitization that might be occurring with multiple stimuli.
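
      The two averaging choices described above can be contrasted in a small sketch. The data are invented (3 individuals and 2 stimulations of 3 samples each, rather than the 12 individuals and 6 stimulations of the study), and the function names are hypothetical.

```python
# Hypothetical traces: one row per individual, two stimulation epochs of
# three samples each, concatenated in time. Values are illustrative only.
traces = [
    [1.0, 2.0, 1.0, 0.8, 1.6, 0.8],
    [1.2, 2.2, 1.2, 0.6, 1.4, 0.6],
    [0.8, 1.8, 0.8, 0.4, 1.2, 0.4],
]
samples_per_stim = 3


def mean_across_individuals(rows):
    """Average time point by time point across individuals.

    The sequence of stimulations is preserved as one continuous trace.
    """
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def mean_across_stimulations(trace, epoch_len):
    """Collapse repeated stimulations within a trace into one average epoch.

    This is the alternative that the response says was NOT used.
    """
    epochs = [trace[i:i + epoch_len] for i in range(0, len(trace), epoch_len)]
    return [sum(column) / len(epochs) for column in zip(*epochs)]


continuous_mean = mean_across_individuals(traces)
collapsed = mean_across_stimulations(continuous_mean, samples_per_stim)

print("continuous:", continuous_mean)  # per-stimulation peaks remain visible
print("collapsed: ", collapsed)        # a single averaged epoch
```

      In the continuous trace, the second peak is smaller than the first, so any change across repeated stimulations stays visible. Collapsing the epochs would hide that trend, which is the stated reason for not averaging within a dataset.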

      Subtracted calcium traces are harder to interpret. As it stands, the evidence that sensory signals are persisting in AIB and not being shunted by proprioceptive feedback in microfluidic devices is not strong.

      Addressing the point about proprioceptive feedback in microfluidics devices, the following sentence was added in the Results section: “Immobilization distorts certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback, but the intrinsic motor programs remain intact.” (lines 131-136).

      To add context for the AIB-AVA subtraction, Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 1: The number of replicates (n) is missing.

      In Fig. 1D, only a single trial is shown as a representative example rather than averages, which would necessitate error bars. The Results and Figure Legend text has been updated to clarify this, and the average CI is now included in the first Results section (lines 111, 976).

      Figure 4: The sample size (n = 3-5) is relatively small, which may limit the statistical power.

      Sample size was increased to 5 for all data points shown on the new graph (Fig. 4E), as noted in the figure legend (line 1019).

      Figure 4: The 0.22 mM concentration significantly affects both AWC and ASH. It is also unclear whether this concentration also affects other neurons, such as AWB, ADL, and AWA.

      We have not performed an exhaustive analysis of other neurons in these datasets. These analyses are difficult and time-consuming, so we have opted to present a dataset which supports our hypothesis that multiple afferent pathways of opposite valence act in a balanced way to drive chemotaxis. We are currently performing an in-depth analysis of the sensory inputs into the circuit, which we expect to present in a future study.

      Reviewer #2 (Recommendations for the authors):

      The tax-4 and osm-9 experiments are great, but I recommend clarifying that tax-4 and osm-9 are expressed in other neurons as well. The text gives the impression that these mutants are specific to AWC and ASH, respectively. The authors should note these caveats.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The authors should also provide the code used to interpret their results.

      Code will be provided through Zenodo.org.

      Reviewer #3 (Recommendations for the authors):

      It would help to clarify (early on) the degree to which you are attributing responses to particular cells (e.g. AWC) as opposed to a class of cells with AWC as an example.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The NeuroPAL imaging and analysis (especially Figures 3D, E) is a bit distracting and appears non-essential. If possible, it would help to combine Figures 2 and 3 with a focus on panels 3ABC to streamline the narrative.

      We would prefer to keep the present format so the reader can appreciate the power of the whole-brain approach for analyzing network activity and behavioral outputs in the context of sensory-motor responses. Specifically, our insight that attractive and aversive afferent inputs were activated simultaneously was wholly dependent on this approach. Otherwise, there would have been little to no reason for examining AWC activity at aversive 1-oct concentrations, which was essentially the foundation of the study.

      To highlight this point, we have added the following sentence in the Discussion: “This novel insight highlights the value of the whole-brain approach (enabled by the NeuroPAL system) for studying the network dynamics underlying sensory driven behaviors.” (lines 431-433)

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use an effort-exertion task and an effort-based decision-making task, while recording brain dynamics with EEG, to test the hypothesis that the brain processes reward outcomes for effort differently when rewards are earned for oneself versus for others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants reported feeling less subjective effort, but also greater dislike of the tasks, when they were earning rewards for others versus self. The concern is that participants worked with less vigor during other- versus self-benefiting trials, and this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons), they did so more sluggishly when earning rewards for others. The Authors argue that this reflects less motivation when working for others, which is a plausible explanation. The Authors also try to rule out this diminished vigor as a confounding explanation by showing that the two-way interaction remains even when including reaction times (and also self-reported task liking) as covariates. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels, which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

      We thank Reviewer #1 for the continued positive assessment and for continuing to highlight the caveat regarding the potential influence of differential vigor on the observed RewP interaction effects.

      We agree that a caveat is warranted. As detailed in our previous response (R5), we had already conducted control analyses addressing this concern; however, we acknowledge that these results were not incorporated into the manuscript itself. We have now addressed this by adding the covariate analyses to the Results section, along with an explicit caveat in the Discussion.

      Before describing the specific revisions, we would like to offer a minor clarification: the covariates in our control analyses were trial-by-trial response speed and self-reported effort ratings, rather than task liking ratings as noted in the summary above. Neither response speed nor effort rating predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged. However, as the reviewer rightly pointed out, covariates may not fully capture the effects of differential motivation. Specifically, we have made the following revisions:

      First, we added the covariate control analyses to the Results section: “To rule out the possibility that the differential vigor between self- and other-benefiting trials drove the Recipient × Effort and Recipient × Effort × Magnitude interactions on the RewP, we conducted two control analyses by including trial-by-trial response speed and subjective effort ratings as separate covariates in the RewP model. Neither response speed (b = -0.07, p = .641) nor effort rating (b = 0.10, p = .186) predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged (see Supplementary Table S3 for full regression estimates)” (page 12, para. 1).
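
      The logic of such a covariate control can be sketched on simulated data. This is not the authors' model: it swaps in ordinary least squares for the mixed-effects regression used in the study, and all variable names, effect sizes, and the coding of the predictors are assumptions chosen for illustration. In the simulation the covariate (response speed) differs between recipients but has no effect on the outcome, so adding it should leave the Recipient × Effort estimate essentially unchanged.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
base_design, cov_design, rewp = [], [], []
for _ in range(200):
    recipient = rng.choice([-0.5, 0.5])  # hypothetical effect coding
    effort = rng.choice([-0.5, 0.5])     # hypothetical effect coding
    # Covariate differs by recipient but does not influence the outcome.
    speed = rng.gauss(0.0, 1.0) + (0.8 if recipient > 0 else 0.0)
    amplitude = 2.0 + 1.5 * recipient * effort + rng.gauss(0.0, 0.1)
    base_row = [1.0, recipient, effort, recipient * effort]
    base_design.append(base_row)
    cov_design.append(base_row + [speed])
    rewp.append(amplitude)

base_coefs = fit_ols(base_design, rewp)
cov_coefs = fit_ols(cov_design, rewp)
interaction_base = base_coefs[3]  # Recipient x Effort, no covariate
interaction_cov = cov_coefs[3]    # Recipient x Effort, covariate included
speed_coef = cov_coefs[4]         # covariate slope

print("interaction without covariate:", round(interaction_base, 3))
print("interaction with covariate:   ", round(interaction_cov, 3))
print("covariate slope:              ", round(speed_coef, 3))
```

      A stable interaction estimate alongside a near-zero covariate slope is the pattern the response reports. As the reviewer notes, this pattern does not rule out motivational differences that the measured covariate fails to capture.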

      Second, we added a caveat to the Discussion section acknowledging this alternative explanation, which reads, “Another concern is that participants exhibited less vigor when working for others, as indicated by slower response speed and lower subjective effort ratings for other- versus self-benefiting trials. Although our control analyses confirmed that neither covariate predicted RewP amplitudes and the critical interactions remained significant, covariates may not fully capture the effects of differential motivation, and this alternative explanation cannot be entirely ruled out” (page 22, para. 2, lines 9–12; page 23, para. 1).

      Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We sincerely appreciate Reviewer #2’s positive evaluation of our manuscript and thank the reviewer for recognizing the strength of our experimental design and analysis approach.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a wonderful and landmark study in the field of human embryo modeling. It uses patterned human gastruloids and conducts a functional screen on neural tube closure, and identifies positive and negative regulators, and defines the epistasis among them.

      Strengths:

      The above was achieved following optimization of the micro-pattern-based gastruloid protocol to achieve high efficiency, and then optimized to conduct and deliver CRISPRi without disrupting the protocol. This is a technical tour de force as well as one of the first studies to reveal new knowledge on human development through embryo models, which has not been done before.

      The manuscript is very solid and well-written. The figures are clear, elegant, and meaningful. The conclusions are fully supported by the data shown. The methods are well-detailed, which is very important for such a study.

      Thank you for this feedback! We are excited for the possibilities of this method to discover genes required for various morphogenetic processes associated with human embryonic development.

      Weaknesses:

      This reviewer did not identify any meaningful, major, or minor caveats that need addressing or correcting.

      A minor weakness is that one can never find out whether the findings from in vitro human embryo models can be revalidated in humans in vivo. This is for obvious and justified ethical reasons. However, the authors acknowledge this point in the section of the manuscript detailing the limitations of their study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript is a technical report on a new model of early neurogenesis, coupled to a novel platform for genetic screens. The model is more faithful than others published to date, and the screening platform is an advance over existing ones in terms of speed and throughput.

      Thank you for this feedback! We agree that the robust symmetry breaking observed in our model, the comparisons to the human embryo in our cell type analysis, and the ability to conduct large-scale genetic screens represent advancements in the modeling of human neural tube closure that may be built upon in the future.

      Strengths:

      It is novel and useful.

      Weaknesses:

      The novelty of the results is limited in terms of biology, mainly a proof of concept of the platform and a very good demonstration of the hierarchical interactions of the top regulators of GRNs.

      The value of the manuscript could be enhanced in two ways:

      (1) by showing its versatility and transforming the level of neural tube to midbrain and hindbrain, and looking at the transcriptional hierarchies there.

      We thank the reviewer for this valuable suggestion and will keep this in mind for future work. As accurate answers to this question would require the development of robust midbrain and hindbrain organoid models, we believe that this question is outside the scope of the present work.

      (2) by relating the patterning of the organoids to the situation in vivo, in particular with the information in reference 49. The authors make a statement "To compare our findings with in vivo gene expression patterns, we applied the same approach to published scRNA-seq data from 4-week-old human embryos at the neurula stage" but it would be good to have a more nuanced reference: what stage, what genes are missing, what do they add to the information in that reference?

      We agree that a more comprehensive comparison of in vitro and in vivo data would add value to the study. We have added an analysis of the human Week 3 data, as neurulation occurs between Weeks 3 and 4 of human embryogenesis (new Figure 1F). We see our in vitro cell types in both datasets. We also included volcano plots in our supplementary figure to show major differences in gene expression (new Figure S1G). Somewhat surprisingly, embryo samples show higher expression of hemoglobin subunits and other hypoxia-related genes than organoids do, which may indicate hypoxic stress during sample handling in ex vivo experimentation (Schelshortn, et al., 2008) or, alternatively, reflect differences in the metabolic environment between embryos and organoids. We did not find any differences that would have affected our transcription factor candidate selection.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers were very enthusiastic about the work and provided suggestions for textual changes that will clarify the figures, methods, and results for readers.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1:

      (a) What is the orientation of the images in 1C?

      We have specified in the text and figure legend that this is a top-down view of an outer organoid.

      In this panel, what is the problem with ZO-1 in D4?

      We believe this is non-specific staining of dead cells that are shed into the lumen during folding and closure. We have added this interpretation to the figure legend and added two supplementary time-lapse videos (new Supplementary Video 1 and new Supplementary Video 2) of organoid closure that show dead cells being shed into the lumen, in support of this interpretation.

      (b) What is the three-dimensional organization of these structures, if any? Or are they two-dimensional? In a way, this also refers to 1C.

      We have clarified in the text and figure legend that these organoids are three dimensional, and that Fig. 1B-C are top-down views.

      (c) Why can't we see FOXG1 among the forebrain markers? This is a very characteristic one.

      We see sparse FOXG1 expression in the human embryo samples at Week 4 (new Figure 1F), which may indicate that FOXG1 expression is upregulated later in the human embryo, after neural tube closure. By this time, however, we do see high levels of other forebrain-associated transcription factors, including OTX2, LHX2, and SIX3.

      (d) The Figure 1 legend needs to be clear about the issues raised here.

      We have updated the Figure 1 legend to address these points.

      (2) Figure 2, could they explain in the text better how they organize the ML gene expression? What are their criteria?

      We thank the reviewer for catching this critical omission. We have added details of our mediolateral axis generation to the Methods section under “Single cell RNA sequencing analysis.”

      (3) Explain how and why the 77 genes were picked up?

      We have clarified at our first mention of 77 genes that this is a subset of our original 78 candidate genes, which were selected as described in the text (last paragraph in the results section “Identifying transcription factor candidates for regulation of anterior neurulation”). We have added a line in the Methods section stating that we were unable to clone a functional guide plasmid against one of our candidates (NR6A1).

      (4) The authors mention the value of the geometry and the mechanics in neural tube closure, but they make no attempt to unravel these inputs, or at least the genes, from their screen, associated with them.

      We have rewritten this discussion of the literature to emphasize the active role of the neural ectoderm compared to the surface ectoderm, in order to justify the genetic analysis of the neural ectoderm rather than the surface ectoderm. We have clarified that our goal is to find upstream developmental drivers (transcription factors) of folding and closure, rather than investigate mechanical mechanisms of this process.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Liao et al. present SCOPE (Spatial reConstruction via Oligonucleotide Proximity Encoding), a method for reconstructing spatial organization from diffusion-defined DNA barcode interactions without the use of optical imaging. In SCOPE, hydrogel beads bearing unique DNA barcodes contain both "sender" and "receiver" oligonucleotides. Upon enzymatic release, sender oligos diffuse locally and hybridize to receiver oligos on neighboring beads, forming chimeric molecules that encode spatial proximity. Sequencing these products yields an interaction matrix, which is then used to reconstruct a spatial coordinate map.

      The authors demonstrate reconstruction of synthetic two-dimensional shapes, a large multicolor Snellen eye chart, and the interior surface of three-dimensional molds. The work expands the conceptual and experimental landscape of optics-free spatial sequencing.
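      The step from sequenced chimeric molecules to an interaction matrix can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `interaction_matrix`, the toy bead barcodes, the symmetric counting (motivated by every bead carrying both sender and receiver oligos), and the dropping of self-pairs are all assumptions made for illustration.

```python
from collections import Counter


def interaction_matrix(chimeric_reads):
    """Count sender/receiver barcode pairs into a symmetric bead-by-bead matrix.

    chimeric_reads: iterable of (sender_barcode, receiver_barcode) tuples,
    one per sequenced chimeric molecule (hypothetical input format).
    """
    beads = sorted({barcode for pair in chimeric_reads for barcode in pair})
    index = {barcode: i for i, barcode in enumerate(beads)}

    counts = Counter()
    for sender, receiver in chimeric_reads:
        if sender == receiver:
            continue  # assumption: self-pairs carry no neighbour information
        i, j = index[sender], index[receiver]
        counts[(min(i, j), max(i, j))] += 1  # direction is ignored

    n = len(beads)
    matrix = [[0] * n for _ in range(n)]
    for (i, j), c in counts.items():
        matrix[i][j] = c
        matrix[j][i] = c
    return beads, matrix


# Toy data: bead B sits between A and C, so A and C never pair directly.
reads = [("A", "B"), ("B", "A"), ("B", "C"), ("A", "A"), ("A", "B")]
beads, matrix = interaction_matrix(reads)
```

      In a sketch like this, a matrix of pair counts would then be handed to an embedding method to recover relative coordinates; that step is not shown here.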

      Thank you for this accurate summary of the work.

      Strengths:

      SCOPE employs bidirectional sender and receiver oligonucleotides on every bead, rather than using asymmetric transmitter-receiver architectures found in other diffusion-based methods. The symmetric design may improve detection sensitivity and reconstruction strategies, and represents a meaningful variation on optics-free spatial encoding.

      A notable strength of this study is the physical scale achieved. The authors reconstruct a Snellen chart spanning approximately 704 mm² and demonstrate molded 3D structures on the order of 75-100 mm³. Although some larger-scale warping is evident, and is discussed as potentially due to non-uniform diffusion, the relative local positioning across these large areas appears impressively accurate.

      The authors extend reconstruction beyond two-dimensional arrays to three-dimensional molded surfaces. This demonstrates that the assay and the computational methods for interpreting proximity graphs can support non-planar spatial relationships, expanding the scope of optics-free spatial inference.

      Thank you for highlighting these strengths of SCOPE.

      Weaknesses:

      Although the method is discussed in the context of spatial genomics and potential tissue applications, it is currently demonstrated only on engineered two-dimensional bead arrays and three-dimensional shapes fabricated in molds. It remains unclear how SCOPE would perform in heterogeneous biological environments, where diffusion may exhibit additional non-uniformities. A biological proof-of-concept, even limited in scope, would help define the method's strengths and limitations more clearly.

      We concur with the reviewer that a biological proof-of-concept is a key next step, and that diffusion will be more heterogeneous in this more complex environment. To this end, we are actively working to further develop SCOPE for use in tissue sections, with the goal of capturing transcriptomes, accessible chromatin, and genomes. As part of this work, we also hope to systematically explore a range of tissue permeabilization and tissue clearing approaches to mitigate the impact of heterogeneity on performance.

      The reconstruction of three-dimensional structures lacks strong sampling from volume interiors. This is speculated to be due to several possible factors; however, this limitation constrains the method to reconstruction of volume surfaces rather than comprehensive three-dimensional profiling.

      Thank you for highlighting this important limitation. The 3D reconstructions are indeed constrained by undersampling of volume interiors. We anticipate that this might be addressed via relatively minor adjustments to the protocol, e.g. using light or base-labile linkers to trigger oligo release, with the expectation that this will improve reaction consistency throughout the volume. However, even if we are unable to resolve this issue, we note that surface-resolved reconstructions may be useful for some goals, e.g. embedding a bead-packed gel within a tissue lumen, such as the gut. This could enable surface beads to capture RNA transcripts from adjacent cells, while bead–bead associations serve to define the surface topology.

      The reconstruction workflow involves multiple preprocessing steps and embedding choices. While these appear to work well for synthetic shapes with known geometry, it is less clear how parameter choices would be made in contexts where ground truth is unknown. Clarifying how reconstruction robustness is assessed without prior knowledge of spatial structure would help readers understand how the method could be practically deployed, particularly in more heterogeneous tissue contexts.

      Thank you for the opportunity to clarify. The computational pipeline used for 2D SCOPE reconstruction is designed to operate on a standardized input format and can be applied to arbitrary datasets without prior knowledge of spatial structure. For example, as shown in Figure 3, both the circle and “swoosh” geometries were reconstructed using the same algorithm and identical initial parameters. While certain hyperparameters are pre-specified (e.g. the number of k-nearest neighbors used to compute the pairwise distance matrix for UMAP), these are fixed across datasets. Other parameters, such as UMAP’s “min_dist,” are selected via an automated heuristic grid search that proceeds without user intervention. The agreement with ground truth in these controlled settings, together with the reproducibility of stochastic reconstructions (see Figure 3E-F), supports the robustness of the approach.

      Importantly, there was one exception. Reconstruction of the Snellen eye chart dataset required a manual step, involving an initial 3D UMAP embedding followed by a 2D projection to “flatten” the result. We suspect this reflects radial non-uniformities in sender/receiver oligo diffusion at larger spatial scales. Addressing such confounders algorithmically by explicitly modeling diffusion heterogeneity represents an important area for future work, with the goal of entirely eliminating the need for manual intervention.

      Finally, we note that these benchmark shapes represent somewhat contrived examples, and the geometries encountered in practice may often be much less complex. For example, in conventional spatial genomics, the geometry consists of a bead monolayer forming a flat, regular surface on a rectangular slide of known dimensions. Regardless of the tissue architecture overlaid on this surface, the reconstruction problem is defined by the bead monolayer itself, inferred through sender-receiver interactions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates, through a series of EEG and MEG experiments, that the human brain automatically categorizes words from alphabetic and non-alphabetic languages, and it unpacks the neural mechanisms of this process from multiple angles. The work examines not only univariate repetition-suppression (RS) effects, but also how repeating or alternating languages influences the representational similarity of words within and across language categories.

      Strengths:

      The univariate RS effects across multiple experiments lend support to some of the main conclusions.

      Weaknesses:

      I have reservations about the logic underlying the multivariate analyses, and I believe the implications of the control experiments merit fuller discussion.

      (1) Question 1: Logic of the multivariate analyses

      The original text states:

      "The processing of intra-language similarity was quantified as correlation distances between neural responses to two words of the same language, which occurred more frequently and would be inhibited in the Rep-Cond (vs. Alt-Cond) due to habituation (Fig. 1c)...".

      I argue that this passage conflates two levels. Building a representational dissimilarity matrix (RDM) is a data-analysis step; it cannot be equated with a cognitive computation. Hence, there is no sense in which this computation occurs "more frequently" in one condition. RDM construction rests on the pairwise similarity of activity patterns, so even if a task engaged no cognitive computation of representational similarity, we could still compute an RDM. Conversely, if a task factor alters the RDM, we must explain how that factor changes the underlying neural patterns, not claim that it triggers specific cognitive processing. Therefore, I neither understand what "more frequent processing" the authors refer to, nor accept their account of the multivariate results.

      The multivariate result pattern, briefly, is that distances between words, both within and across languages, are larger under the repetition condition. One plausible interpretation is that a word representation comprises two parts: language-type (alphabetic vs. non-alphabetic) and fine-grained identity features (visual shape, orthography, semantics, phonology, etc.). Repetition of language type may, via RS, reduce the weight of the first component, thereby increasing the relative contribution of fine-grained features and amplifying inter-word differences. This could explain the multivariate findings.
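      The reviewer's alternative account can be made concrete with a small worked example. The sketch below is purely illustrative and uses invented toy vectors, not the study's data: each word pattern is modelled as a weighted shared language-type component plus a word-specific identity component, and the names `LANGUAGE`, `WORD_1`, `WORD_2`, and `word_pattern` are hypothetical. It shows only the within-language case, where lowering the weight of the shared component increases the correlation distance between two words.

```python
import math


def correlation_distance(x, y):
    """Return 1 minus the Pearson correlation of two activity patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def word_pattern(weight, language, identity):
    """Toy pattern: weighted language-type component plus identity features."""
    return [weight * l + f for l, f in zip(language, identity)]


# Toy components, chosen to be zero-mean and mutually orthogonal.
LANGUAGE = [1, 1, -1, -1, 0, 0, 0, 0]   # shared by both words
WORD_1 = [0, 0, 0, 0, 1, -1, 0, 0]      # fine-grained features of word 1
WORD_2 = [0, 0, 0, 0, 0, 0, 1, -1]      # fine-grained features of word 2


def within_language_distance(weight):
    return correlation_distance(
        word_pattern(weight, LANGUAGE, WORD_1),
        word_pattern(weight, LANGUAGE, WORD_2),
    )


d_full = within_language_distance(1.0)     # language component at full weight
d_reduced = within_language_distance(0.5)  # language component down-weighted
```

      With these toy vectors the distance rises from 1/3 to 2/3 when the shared component is halved, which is the direction of effect the reviewer describes. The same distance can be computed whether or not any cognitive similarity computation takes place, which is the reviewer's first point.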

      Thank you for these insightful comments regarding the logic of the multivariate analyses. In the revision, we will clarify that the multivariate analyses were conducted to assess correlation distances between neural responses to pairs of words, either within the same language or across different languages. The processing of intra-language similarity was assessed rather than defined by conducting the multivariate analyses. We will further elaborate the rationale underlying our experimental design, specifically why the processing of intra-language similarity is expected to occur more frequently in the repetition condition (Rep-Cond) than in the alternation condition (Alt-Cond).

      We also appreciate the alternative account of the observed neural repetition suppression (RS) effects in terms of language-type versus fine-grained identity feature processing. This perspective will be incorporated into the revised Discussion. In particular, we will outline the patterns of neural activity predicted by an account that assumes an increasing contribution of fine-grained features, and evaluate the extent to which our findings are consistent with these predictions.

      (2) Question 2:

      For unlearned languages, people cannot distinguish lexical from sub-lexical levels. What, then, determines (i) the RS-effect difference between letters and radicals in familiar languages and words in unlearned ones, and (ii) the similarity of repetition effects between words in unlearned and familiar languages? An explicit account is needed.

      Thank you for this helpful suggestion. In the revised manuscript, we will include a dedicated paragraph addressing these two issues. Specifically, we will provide a more precise account of the differences in repetition suppression (RS) effects between letters and radicals in familiar languages, as well as the similar RS effects observed for unlearned and familiar languages. These additions will help clarify the interpretation of the neural RS effects associated with visual word processing and strengthen the theoretical implications of our findings.

      Reviewer #2 (Public review):

      Summary:

      This study investigates how the human brain categorizes visual words from distinct writing systems (alphabetic vs. non-alphabetic) as a neural basis for the social-categorization function of language. Using a repetition suppression paradigm combined with electroencephalography and magnetoencephalography, the authors conducted nine experiments with independent participants to identify the neural network underlying language-based categorization, characterize its temporal dynamics, and test whether this process operates independently of linguistic properties such as semantic meaning and pronunciation.

      Strengths:

      (1) The study employs a well-validated design with clear control conditions and systematically manipulates key variables, including writing system, language familiarity, and native language background. The use of nine experiments with independent participant samples strengthens the reliability and replicability of the results.

      (2) The work combines EEG and MEG, cross-validating findings across imaging modalities to support the reported neural effects. A combination of univariate, multivariate, and connectivity analyses is used to characterize neural responses and network interactions.

      (3) Results are consistent across multiple language groups and for both familiar and unfamiliar languages, supporting the generalizability of the identified neural mechanism beyond specific languages or prior experience.

      Weaknesses:

      The authors provide compelling evidence that the identified neural network supports the categorization of words by language, including computations of intra-language similarity and inter-language difference. However, the conceptual framing of this finding as directly reflecting the social-categorization function of language may be premature. While the task captures spontaneous language categorization, it does not involve social evaluation or intergroup processes. The connection to social categorization is inferred from prior literature rather than demonstrated within the current experimental design. Clarifying this distinction would strengthen the conceptual precision of the manuscript.

      Thank you for raising this important point. In the revised Discussion, we will include an additional paragraph to clarify several related issues. First, prior research suggests that language can serve as a socially relevant category cue. Second, these findings imply that rapid categorization of words by language may occur in the human brain. Third, our results identify a neural network supporting such rapid language-based categorization but do not directly test how this process relates to social categorization. Highlighting these points will help delineate the scope of our findings and point to important directions for future research.

      We'll work on a revision of the manuscript and will submit the revision when it's ready.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      We thank the reviewer for this important point. We agree that our experimental conditions do not fully recapitulate the in vivo architecture of either breast or bronchial epithelia. As the reviewer points out, the two cell lines need specific culture conditions to grow in an in vivo-like architecture, such as acinar structures for mammary tissue and a pseudostratified architecture for bronchial tissue, and it would certainly be interesting to grow the cell lines in these organotypic architectures and study the fate of oncogenic mutant cells. However, this would be an independent study of its own and is outside the scope of the current manuscript. Here, we intend to compare these two well-established epithelial lines from mammary and bronchial epithelial tissues, with distinct intrinsic mechanical and organisational properties, in minimal culture conditions, and study how the source of the epithelial cells alone can change the fate of oncogenic cells present in the wild-type population. We have now also performed experiments with the MDCK cell line, which, unlike the BEAS2B line, has well-defined cell-cell adhesions [Supplementary figure. 4a] and epithelial morphology, and have shown that the fate of HRasV<sup>12</sup> mutants is different here as well, compared to the MCF10A cell line.

      (2) As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in the segregation of oncogenic cells.

      We agree with the reviewer and, in line with this suggestion, we have repeated the key experiments using Madin-Darby Canine Kidney (MDCK) cells, a well-established model epithelial cell line. Our results show that even though MDCK cells have significantly distinct properties compared to BEAS2B cells (MDCK being more epithelial-like than BEAS2B), the dynamics of the HRasV<sup>12</sup> clusters in both these systems are similar [Supplementary figure. 4b], and distinctly different from those in the mammary epithelial cells (MCF10A). We did not observe the formation of an actin belt around HRasV<sup>12</sup> clusters in MDCK monolayers, whereas such a belt does form in MCF10A monolayers. Additionally, in MDCK cells, the HRasV<sup>12</sup> mutant clusters are not under compaction or jamming; instead, they form protrusions similar to the ones seen in BEAS2B monolayers. These results solidify our hypothesis of tissue-specific differences in the mechanics of cancer initiation.

      (3) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      We thank the reviewer for this insightful comment. As correctly noted, Brodland’s 2002 work provided a foundational formulation of the Differential Interfacial Tension Hypothesis (DITH), which frames tissue organization in terms of effective interfacial tensions.

      While in its original form DITH emphasised segregation as a consequence of global differences in the intrinsic (bulk) tensions of juxtaposed tissues, our results specifically show that segregation is determined by local interfacial mechanics between transformed and host cells. These local interfacial dynamics, however, are related to the global contractility of the cells. In our experiments with blebbistatin, we observed a loss in the efficiency of segregation upon reducing global contractility, which inhibits the formation of the interfacial actomyosin belt that serves as the source of the interfacial tension between healthy and mutant populations. Therefore, the differences in local interfacial mechanics stem from the intrinsic global contractility of the cells discussed here.

      We have also clarified this distinction in the discussion and have explicitly stated that while DITH provided the foundation for conceptualizing tissue mechanics, our findings on interactions between transformed and healthy cells specifically demonstrate that a higher efficiency of segregation is driven by high heterotypic interfacial tension at the tissue boundary.

      (4) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      We agree that a detailed visualisation of actomyosin distribution would strengthen our conclusions. We have now added a few more images of the interface to the Supplementary Data [Supplementary figure. 5], which show that cortical actin accumulates in individual cells at the interface between wild-type and mutant cells, and that actin levels go up in both wild-type and mutant populations at the interface. This is also clear from the quantification of different regions of interest [Figure 2e], which was done by segmenting individual cells in these regions and quantifying actin intensity in each cell.

      (5) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Author response image 2). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      We thank the reviewer for raising this important point. While a direct experimental dissection of how HRasV<sup>12</sup> mutation affects actin levels in BEAS2B and MCF10A cells individually is beyond the scope of the present study, we do not rule out the possibility that a HRasV<sup>12</sup> mutation may exert cell-type-specific biochemical effects on actin regulation in these two epithelial systems.

      Although the difference in actin between the mutants and the wild-type cells has not been incorporated into the model presented in the manuscript, we have now shown how actin levels change in response to the interfacial tension formed between the mutant and wild-type cells by adding a mechanochemical feedback to the model. Rather than prescribing intrinsic differences in actin levels between mutant and wild-type cells, we asked whether the feedback between the actin cytoskeleton and mechanical stress alone is sufficient to generate the observed actin reorganization. To address this, we incorporated a mechanochemical feedback loop (MCFL-I), originally developed in our earlier work [35], into the vertex model framework. This feedback captures the experimentally observed coupling between cell shape, actomyosin organization, and mechanical stress (i.e., heterotypic interfacial tension), and has previously been shown to reproduce biologically realistic epithelial behaviours such as dynamic cell shapes and heterogeneous actomyosin distributions [35].

      In this framework, actin is not introduced as an explicit or intrinsic variable. Instead, changes in actomyosin organization emerge dynamically in response to mechanical stresses. Specifically, MCFL-I allows the preferred area and preferred perimeter of cells to evolve depending on cell shape and actomyosin binding, rather than remaining fixed. From these evolving parameters, we compute the normalized contractility, which we interpret as a proxy for bulk actin, and the normalized line tension, which we interpret as a proxy for junctional actin. These normalized quantities provide size-independent measures of actomyosin organization across the tissue.

      The equations for MCFL-I can be written as:

      Thus, with MCFLs, the vertex model does not have fixed 𝐴<sub>0</sub> and 𝑃<sub>0</sub>. The cells dynamically change these parameters depending on the vertex model dynamics. The constitutive relations for the and are given below [1]:

      Here, the fraction of myosin bound to actin is a function of cell area 𝐴. This nonlinear dependence arises from the load- or strain-dependent binding of myosin to actin, and the associated model parameter is proportional to the binding affinity of myosin to actin in the absence of any strain. We consider this parameter to be the same for both mutant and wild-type cells. Importantly, both mutant and wild-type cells obey identical mechanochemical rules in the model. Differences in actin organization arise solely due to differences in mechanical stress generated by differential interfacial tension. Positive differential interfacial tension compresses mutant cells within clusters. This leads to different 𝐴<sub>0</sub> and 𝑃<sub>0</sub> across the monolayer via MCFL-I, and thus to reduced bulk actin and increased junctional actin [Appendix figure. 4], consistent with experimental observations. Conversely, when differential interfacial tension is weak or negative, mutant and wild-type cells experience similar stresses, and the model predicts minimal differences in actin organization [Appendix figure. 5].

      Thus, while HRas<sup>V12</sup>-dependent biochemical effects may indeed differ between BEAS2B and MCF10A cells, our results demonstrate that mechanical interactions at mutant–wild-type interfaces are sufficient to generate distinct actin signatures in the two tissues, without invoking cell-type-specific actin regulation. We have added the details of the mechanochemical feedback loop in the model to the Appendix to emphasize that the model tests the sufficiency of mechanics-driven actin reorganization rather than excluding additional biochemical contributions.

      Although the normalized line tension appears to be negative even for Λ > 0, this is just an artefact of the colorbar limits we used to allow comparison with the Λ < 0 case. If we plot with different colorbar limits, we see that the normalized line tension at the interface is not negative, as shown in Author response image 1.

      Author response image 1.

      Reviewer #2 (Public review):

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling.

      The reviewer is correct in stating that most of the observed behaviour is already captured by the differential tension, without the shape-tension coupling. However, the shape-tension coupling has been used here in accordance with the experimental observation that the cells at the interface are aligned and elongated along the interface [Fig. 2h], which cannot be captured without the shape-tension coupling. The difference between the shape indices of cells at the interface and away from the boundary is plotted versus the interfacial tension in the case of no shape-tension coupling [Appendix figure 2]. The red dashed line represents the experimental value of the shape index difference. The blue line is the shape index difference between two randomly chosen groups of cells (each group contains half of the total number of cells). At zero line tension, the difference in shape index between interface cells and cells away from the interface is the same as that between randomly chosen groups of cells, which is expected since there should be no interface at zero line tension. The data without shape-tension coupling presented here are averaged over 19 seeds. Although the results without shape-tension coupling reach experimental values at high enough differential tension [Appendix figure 3], a closer inspection of the simulation results shows that the cells are just squeezed and are aligned perpendicular to the interface, which is contrary to what is seen in experiments [Fig. 2h].

      Calculating the average of the absolute value of the dot product of the nematic director and the interface edge for simulations with and without shape-tension coupling [Appendix figure 3] clearly shows that with shape-tension coupling, the cells align and elongate along the interface as seen in experiments, given by an interface dot product value > 0.5 at high enough line-tension values. Further, shape-tension coupling or biased edge tension has been used before to model cell elongation during embryo elongation [45], and here we use it as an active line-tension force, which elongates cells along the interface, in addition to the differential tension, which is passive. This additional quantification of the alignment and elongation of cells along the interface will be added to the Appendix.
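
The alignment measure described above can be illustrated with a minimal sketch. It assumes each interface cell has a 2D nematic director and an associated interface edge direction; the function name and the toy inputs are hypothetical and are not taken from the simulation code.

```python
import numpy as np

def interface_alignment(directors, edges):
    """Mean |n . e| between cell nematic directors and interface edge directions."""
    n = np.asarray(directors, dtype=float)
    e = np.asarray(edges, dtype=float)
    # Normalise both sets of vectors so that only orientation matters.
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    # Absolute value because a nematic director has no head or tail.
    return float(np.mean(np.abs(np.sum(n * e, axis=1))))

# Toy inputs (hypothetical): cells elongated along vs. across the interface.
parallel = interface_alignment([[1, 0], [-2, 0]], [[1, 0], [1, 0]])
perpendicular = interface_alignment([[0, 1], [0, -3]], [[1, 0], [1, 0]])
```

In this sketch a value near 1 means the cells are elongated along the interface, and a value near 0 means they are oriented perpendicular to it.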

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      The difference in shape indices between the interfacial and bulk cells in simulations has now been calculated over 11 different seed values. The observed differences in simulations, along with the standard deviations have been plotted in Figure 4b. This figure will be updated to include the standard deviations. The nonzero difference in shape index in the absence of differential line tension for low values of stress threshold is due to the shape-tension coupling acting even at low differential tension. Thus, a non-zero, sufficiently high value of the stress threshold is required in our model with shape-tension coupling. This has also been stated in section 4 of the paper. The importance of the shape-tension coupling has been stated in response to the previous point.

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.

      The line tension for each edge can be calculated as .

      Although the line tension distributions look similar to the ones obtained from Bayesian Force Inference, a better comparison is between the normalized line tension and the actin seen in experiments, as discussed in our response to comment (4) from Reviewer 1.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors claim that the negative tension Lambda<0 resembles the Beas2b phenotype. This is not consistent with the expression of actin in Figure 2f, which seems very similar in all four regions of interest (ROIs). Also, the segregation index data for Beas2b in Figure 1h looks very different from the demixing parameter in Figure 4f for the negative value of Lambda.

      In the model presented in the previous version of the manuscript, actin differences have not been incorporated. We have only added an interfacial line tension, which might arise only at the interface between cells. In response to comment (4) from Reviewer 1, we have considered a vertex model with mechanochemical feedback and interfacial line tension to understand how actin distribution in the tissue is affected by interfacial tension. The results presented match very well with experimental images.

      The reviewer has rightly pointed out that the segregation index (SI) data presented in Fig. 1h have a different trend compared to those in Fig. 4f. However, it is essential to note that in the simulation, the initial condition is one in which the mutant cluster is already fully segregated at the initial time point, which is not the case in experiments. Thus, the two plots are not directly comparable and only show how SI changes in our simulations. It is more effective to compare the final time points in Fig. 2f with those in Fig. 4e, where we observe that Mcf10a has a higher SI compared to Beas2b, and the case with Λ > 0 has a higher SI than the case with Λ < 0. This supports our claim that Λ < 0 resembles the Beas2b phenotype and Λ > 0 resembles the Mcf10a phenotype.

      (2) It is unclear how the threshold pressure Pi_0 is implemented for the shape-tension coupling in the vertex model. Is the value of the additional tension gamma_ij equal to 0 if the internal pressure is below that threshold?

      The stress threshold is implemented for the shape-tension in the vertex model in the following way. The line tension forces can be written as:

      where, and . If the stress on the cell is below the threshold, then for those cells.

      (3) In vertex model simulations, the authors use identical parameters for wild-type and mutant cells. This does not seem to be consistent with experimental observations in Figure 2, where the expression of actin is different, and also, cell shape indices are different for the wild-type and mutant cells. The authors should comment on how that choice affects their simulation results.

      We thank the reviewer for this comment. As noted in our response to comment (4) from Reviewer 1, we have now run our simulations after adding a mechanochemical feedback to the model. Here, both wild-type and mutant cells follow identical mechanochemical rules within the vertex model. This choice does not imply that the cells are mechanically identical in the tissue; rather, it allows us to test whether differences in cell shape and actin organization can emerge purely from mechanical interactions.

      By incorporating the mechanochemical feedback loop (MCFL-I), the model captures how heterotypic interfacial tension redistributes mechanical stresses between mutant and wild-type cells. These stresses lead to differences in cell area, perimeter, and shape, which are then translated via MCFL-I into distinct bulk and junctional actin signatures. Consequently, even though the intrinsic parameters are the same, the emergent mechanical environment reproduces the experimentally observed differences in actin intensity and cell shape indices (as shown in Figure 2).

      Thus, our approach demonstrates that the experimentally observed heterogeneity between mutant and wild-type cells can arise solely from interface-driven mechanical effects, without prescribing any cell-type-specific parameters in the model.

      (4) Also provide data for cell line tensions in the vertex model, which can then be compared with the experimental data in Figure 2. This is especially important because the differential cell line tension at the interface of mutants and wild-type cells seems to be playing a very important role.

      The cell tensions from the vertex model have been plotted in the response to main comment (3) from Reviewer 2. Since the interfacial tension has been included by hand as an extra term in the vertex model, it is not trivial to compare the line tensions from the vertex model directly to the experimental data. However, we can understand how the tensions are distributed by looking at the normalized line tension and normalized contractility plotted in response to comment (4) from Reviewer 1. Those plots are from a vertex model with mechanochemical feedback, and they match well with the experimental actin images.

      (5) In Figure 2j, the authors should report the relative cell pressure and line tension for all four ROIs. The data is only shown for the wild-type cells and for mutants in clusters, even though the figure caption states that the data is presented for all four ROIs. It would also be useful to report the cell tension at the interface between the mutant cells and wild-type cells since this is the key parameter for the vertex model simulations.

      We agree and have updated the graph [Figure 2j].

      (6) The tangential motion of cells around oncogenic clusters only shows up towards the end of Supplementary Video 3. It is unclear whether this is a transient effect or whether this tangential motion would persist for a longer time.

      We thank the reviewer for raising this point. In our experiments, tangential cell motion in the wild-type population along the boundary of the oncogenic cluster consistently emerges as the oncogenic cluster becomes compacted. We have plotted the tangential velocity of interfacial wild-type cells over time (Supplementary Fig. 6b) and show that such motion persists at the cluster–wild-type interface until the end of the time-lapse recordings in all cases.

      (7) It is very awkward that the authors are representing an integral of the tangential velocity over different loops in Figures 3c and 4i. Thus, it is very hard to separate how much of the increase in the integrated velocity is due to larger loops and how much is due to changes in the average tangential velocity. Since different loops have different perimeters, it would have been better to report the average tangential velocity by dividing the integrated tangential velocity by the perimeter length of each loop. In the methods, the authors state that the concentric circles go from the center to a point twice the radius of the mutant cluster, but this is not consistent with the image in Figure 3c, where the concentric circles seem to go only to the boundary of the mutant cluster.

      We thank the reviewer for raising the point regarding the dependence of the loop-integrated tangential velocity on the perimeter length. While the circulation (loop-integrated tangential velocity) indeed scales with loop size, it increases with radius only if tangential velocity components are directionally coherent along the loop.

      In our data, concentric-loop analysis centered on mutant clusters reveals a systematic increase in tangential motion with radius, with the largest values occurring at the outermost loops corresponding to the cluster–tissue interface. In contrast, applying the identical analysis to randomly selected wild-type regions does not yield any monotonic increase with radius, despite the increasing perimeter of the loops, and instead shows fluctuations around zero. This control demonstrates that the observed increase around mutant clusters is not a trivial geometric consequence of larger loop size but reflects the emergence of coherent tangential motion specifically at the mutant cluster boundary.

      To further address the reviewer’s concern, we additionally computed the mean tangential velocity by normalizing the loop-integrated tangential velocity by the loop perimeter. As shown in Supplementary figure. 6a, this normalization preserves the same qualitative trend: tangential motion peaks near the periphery of mutant clusters, whereas no such trend is observed in wild-type regions. We therefore conclude that both metrics capture the same physical phenomenon: enhanced tangential cell motion localized to the mutant cluster boundary, consistent with the behavior observed in the time-lapse videos.
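
The relation between the two metrics can be sketched as follows. This is only an illustration under stated assumptions: the velocity field is given as a callable, the loops are circles, and the function name and the rigid-rotation test field are hypothetical rather than part of the actual analysis pipeline.

```python
import numpy as np

def loop_tangential_velocity(velocity_at, center, radius, n_points=360):
    """Return (circulation, mean tangential velocity) on a circular loop."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    # Sample points on the loop and the counter-clockwise unit tangent at each.
    unit = np.column_stack((np.cos(theta), np.sin(theta)))
    points = np.asarray(center, dtype=float) + radius * unit
    tangents = np.column_stack((-np.sin(theta), np.cos(theta)))
    v = np.array([velocity_at(p) for p in points])
    v_t = np.sum(v * tangents, axis=1)          # tangential component
    ds = 2.0 * np.pi * radius / n_points        # arc length per sample
    circulation = float(np.sum(v_t) * ds)       # loop-integrated tangential velocity
    mean_vt = circulation / (2.0 * np.pi * radius)  # normalise by perimeter
    return circulation, mean_vt

# Toy field (hypothetical): rigid rotation about the origin.
omega = 0.5
rigid_rotation = lambda p: omega * np.array([-p[1], p[0]])
circ_small, mean_small = loop_tangential_velocity(rigid_rotation, (0.0, 0.0), 1.0)
circ_large, mean_large = loop_tangential_velocity(rigid_rotation, (0.0, 0.0), 2.0)
```

For this coherent toy field, both the circulation and the perimeter-normalized mean grow with radius. For an incoherent field, the tangential components would largely cancel and both would stay near zero regardless of loop size.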

      Author response image 2.

      From simulation data

      (8) The authors should comment on how jamming and unjamming are related to shape indices because some readers may not be familiar with them.

      We have updated the same in the text of Results 2.

      (9) In the captions of Figure 3, the authors state that the bronchial epithelium gets kinetically arrested. This is not evident from the data in Figure 3d, where the velocity magnitude drops just a little bit for the bronchial epithelium, and it remains much higher compared to the mammary epithelium at long times.

      We agree with this comment: describing Beas2b cells as "kinetically arrested" is misleading, since their velocity remains much higher even after the initial drop. We have updated the text in the caption accordingly.

      (10) It is unclear why the authors have used the segregation index for analyzing experiments and the demixing parameter for analyzing simulations. Both parameters are trying to quantify the same thing, so it would have been better to use the same quantity for both experiments and simulations to enable easier comparison.

      We agree that using the same quantity for both experiments and simulation would enable easier comparison. Thus, we have replaced the demixing parameter with segregation index in Figure 4. 

      (11) It is unclear what experimental data were used for shape indices in Figure 4c. Was it the data from Mcf10a or Beas2b? It is also unclear which ROIs were used because different ROIs have very different shape indices in experiments, according to Figure 2e,f.

      We have used the experimental ∆(𝑆ℎ𝑎𝑝𝑒 𝑖𝑛𝑑𝑒𝑥) = 0.75, which is a rough estimate of the difference between the shape index for ROI 2 (interface) and those for ROI 1, ROI 3 and ROI 4 (away from the interface) from Fig. 2e for Mcf10a.

      (12) The authors find that the differences in shape indices are non-zero even for Lambda=0 for some threshold pressure parameters Pi_0 in Figure 4c. This should not happen because all the cells are identical in that case. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. How is this simulation data obtained? Is it from a single simulation, or is this averaged over a certain number of simulations? Authors should perform multiple simulations and report both the mean values and the standard deviation.

      We have addressed this in the response under main comments (1) and (2) from Reviewer 2.

      (13) It is unclear how the cell extrusion was simulated in the vertex model.

      Extrusion probability calculation: Simulations with just a single mutant cell were run for a range of differential interfacial line tension values (Λ = 0, 0.1, 0.4, 0.8, 1.2, 1.6) with shape tension coupling. The simulation was run till the area of the mutant cell fell below a threshold area = 0.1, after which we consider the mutant cell to be extruded. 9 different random initial seeds were run and analysed. Each seed gives a binary result – either extruded or not. This was used to calculate the extrusion probability. We have added this section to the Appendix.
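
The calculation described above amounts to counting extruded seeds. A minimal sketch, assuming the final mutant-cell area of each seed is available as a number; the function name and the nine example areas are made up for illustration and are not simulation results.

```python
def extrusion_probability(final_areas, area_threshold=0.1):
    """Fraction of seeds in which the mutant cell area fell below the threshold."""
    # Each seed gives a binary outcome: extruded (True) or not (False).
    extruded = [area < area_threshold for area in final_areas]
    return sum(extruded) / len(extruded)

# Hypothetical final mutant-cell areas for nine seeds at one value of Lambda.
example_areas = [0.05, 0.80, 0.09, 0.02, 0.95, 0.07, 0.03, 0.08, 0.60]
p_extrusion = extrusion_probability(example_areas)
```

Repeating this for each value of Λ gives one probability per differential interfacial line tension.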

      (14) The authors claim that HRas^V12 clusters in bronchial epithelium grew on top of one another, but it is not clear how this can be observed in Figure 2b or in any other Figure.

      We thank the reviewer for raising this point. Our original statement that cells were growing on top of each other was based on observations from the Z-stack images, which allowed us to resolve cell positions along the apico–basal axis. However, since these Z-stack data are not included in the current manuscript, we agree that this claim cannot be directly supported by the figures shown. We have therefore removed this statement from the text and restricted our conclusions to what is directly supported by the presented data.

      (15) In the main text, the authors state that bronchial epithelial cells exhibited higher F-actin intensities compared to mammary bronchial cells, but this difference is not statistically significant according to Figure 5e.

      We agree with the reviewer and have changed the text accordingly: although the F-actin intensities appeared visually higher in the bronchial epithelium, the difference was not statistically significant.

      (16) The definition of eccentricity is incorrect in the text. The authors state that the eccentricity is quantified as the ratio of the length of the minor axis to the major axis of an ellipse. According to this definition, the eccentricity would be 1 for a circle and not 0.

      We have updated the definition of eccentricity in the text to the correct one, including the correct equation.
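
For reference, the standard geometric eccentricity of an ellipse is zero for a circle, as the reviewer expects. The sketch below assumes this standard definition, e = sqrt(1 − (b/a)²); the exact equation used in the revised manuscript is not reproduced here, and the function name is hypothetical.

```python
import math

def ellipse_eccentricity(major_axis, minor_axis):
    """Standard eccentricity e = sqrt(1 - (b/a)^2) with a >= b > 0."""
    if not (major_axis >= minor_axis > 0):
        raise ValueError("expected major_axis >= minor_axis > 0")
    return math.sqrt(1.0 - (minor_axis / major_axis) ** 2)

circle = ellipse_eccentricity(2.0, 2.0)      # equal axes
elongated = ellipse_eccentricity(5.0, 3.0)   # axis ratio b/a = 0.6
```

By contrast, the plain axis ratio b/a equals 1 for a circle, which is the inconsistency the reviewer pointed out.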

      (17) It is unclear whether the active force F_act is used in the vertex model simulations. The active force is defined, but then its value is never specified. Note that the motility force is also an active force, so it is unclear why the motility and active forces were separated.

      In our model, the line tension force arising from the shape-tension coupling is the active force. We agree that the motility force is also an active force. However, in the absence of any directional movement, as in the homeostatic tissues discussed here, we have neglected the motility force in the model presented here.

      (18) The authors use inconsistent naming for different types of epithelia throughout the manuscript. Mcf10a cells are referred to as either mammary epithelium or breast epithelium, and Beas2b cells are referred to as either lung epithelium or bronchial epithelium. Because of the very broad spectrum of journal readers, it may not be obvious to all readers that different names refer to the same cell types.

      We have updated the text to keep the naming consistent throughout.

      (19) Many references to individual figure panels in the main text are incorrect. The authors should carefully check all the references to figures.

      We apologize for these errors. We have updated the incorrect references after carefully reviewing the entire manuscript.

      (20) In Figure 5, panel b is incorrectly labeled as d.

      We have corrected the same.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The aim of this work is to directly image collagen in tissue using a new MRI method with positive contrast. The work presents a new MRI method that allows very short, powerful radio frequency (RF) pulses and very short switching times between transmission and reception of radio frequency signals.

      Strengths:

      The experiments with and without the removal of 1H hydrogen, which is not firmly bound to collagen, on tissue samples from tendons and bones, are very well suited to prove the detection of direct hydrogen signals from collagen. The new method has great potential value in medicine, as it allows for better investigation of ageing processes and many degenerative diseases in which functional tissue is replaced by connective tissue (collagen).

      Weaknesses:

      It is clear that, due to the relatively long time intervals between RF excitation and signal readout, standard hardware in whole-body MRI systems can only be used to examine surrounding water and not hydrogen bound to collagen molecules.

      We agree that this is a regrettable situation (see also Discussion section). We are hoping that current and future efforts of MRI manufacturers towards improved hardware will eventually enable the technique for broader application.

      Reviewer #2 (Public review):

      Summary:

      This work presents direct magnetic resonance imaging (MRI) of collagen, which is not possible with conventional MRI or other tomographic imaging modalities.

      Strengths:

      The experimental work is impressive, and the presentation of results is clear and convincing. Through a series of thoughtfully prepared experiments, I found the evidence that the images reflect direct measurements of collagen to be highly compelling.

      Due to the technical demands, direct collagen imaging is unlikely to become widespread for routine clinical work, at least not anytime soon. That said, this work is nonetheless transformative and will likely be highly significant for research and perhaps clinical trials.

      Reviewer #3 (Public review):

      The paper is well written and well presented. The topic is important, and its significance is explained succinctly and accurately. I am only capable of reviewing the clinical aspects of this work, which is very largely technical in nature. Several clinical points are worth considering:

      (1) Tendons typically display large magic angle effects as a result of their highly ordered collagen structure (cortical bone much less so), and so it would have been of interest to know what orientation the tendons had to B<sub>0</sub> (in vitro and in vivo). This could affect the signal level at the longer echo time and thus the signal on the subtracted images.

      We have added arrows in the images showing the direction of the main magnetic field. For the in vivo case, the subject lay in the superman position, with B<sub>0</sub> pointing from the hand towards the shoulder.

      (2) The in vivo transverse image looks about mid-forearm, where tendons are not prominent. A transverse image of the lower forearm, where there is an abundance of tendons, might have been preferable.

      We have added a distal view of the forearm, where more tendon structures are observed.

      (3) The in vivo images show the interosseous membrane as a high signal on both the shorter and longer TE images. The structure contains ordered collagen with fibres at different oblique angles to the radius and ulna, and thus potentially to B<sub>0</sub>. Collagen fibres may have been at an orientation towards the magic angle, and this may account for the high signal on the longer TE image and the low signal on the subtracted image.

      This is certainly an interesting take. While the magic angle effect is well established for collagen-bound water, the orientation effects on the macromolecular collagen signal are still to be investigated. Our initial experience so far suggests that the direct collagen signal is not as sensitive to orientation as the bound water.

      Regarding the described observation for the interosseous membrane, we expect the high signal to come from collagen-bound water (yet not quite at the magic angle), which hardly decays between the two TEs, as their difference is small compared to the T<sub>2</sub>* of this signal. Hence, this signal is removed in the subtraction image, and only the macromolecular collagen signal remains, which appears to be very low. Working with samples of the interosseous membrane may provide further insights into why this is the case.

      (4) Some of the signals attributed to the muscle may be from an attachment of the muscle to the aponeurosis.

      We have added the aponeurosis as a possible signal contributor in the muscle tissue.

      (5) There is significant collagen in subcutaneous tissues, so the designation "skin" may more correctly be "skin and subcutaneous tissue".

      We have updated the label accordingly.

      (6) Cortical bone is very heterogeneous, with boundaries between hard bone and soft tissue with significant susceptibility differences between the two across a small distance. This might be another mechanism for ultrashort T<sub>2</sub>* tissue values in addition to the presence of collagen. The two effects might be distinguished by also including a longer TE spin echo acquisition.

      Solid cortical bone may also have an ultrashort T<sub>2</sub>* in its own right.

      The described effect is clearly of importance for bone water but has a negligible effect on the macromolecular signal. We would like to support this with a brief, coarse estimation. 𝑇<sub>2</sub>* can be approximated by 1/𝑇<sub>2</sub>* = 1/𝑇<sub>2</sub> + 1/𝑇<sub>2</sub>′, where 1/𝑇<sub>2</sub>′ = 𝛾∆𝐵 = 𝛾∆𝜒𝐵<sub>0</sub> (Ref. 1).

      The susceptibility difference reported for the interface between bone and water is ∆𝜒 = 2.5 ppm (Refs. 2 and 3), which at 3T leads to a 𝑇<sub>2</sub>′ ≈ 3000 𝜇𝑠. From our recorded FIDs, we use a 𝑇<sub>2</sub>* of 10 𝜇𝑠 and thus obtain 𝑇<sub>2</sub> = 10.03 𝜇𝑠.

      As can be seen, the change in the transverse relaxation constant due to susceptibility is negligible compared to the intrinsic decay of the macromolecular collagen signal. Notably, this is not the case for the pore water signal where T<sub>2</sub>s are on the order of milliseconds (Ref. 2).
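
The coarse estimate above can be reproduced numerically. The sketch below assumes that 𝛾 is taken as the proton value 𝛾/2π ≈ 42.577 MHz/T, a convention that is my assumption and that reproduces the quoted 𝑇<sub>2</sub>′ of roughly 3000 𝜇𝑠; the variable names are illustrative.

```python
GAMMA_BAR_HZ_PER_T = 42.577e6   # proton gamma/(2*pi); assumed convention
DELTA_CHI = 2.5e-6              # bone-water susceptibility difference (2.5 ppm)
B0_T = 3.0                      # main field strength in tesla
T2_STAR_S = 10e-6               # T2* taken from the recorded FIDs (10 us)

# 1/T2' = gamma * delta_chi * B0
r2_prime = GAMMA_BAR_HZ_PER_T * DELTA_CHI * B0_T
t2_prime_us = 1e6 / r2_prime

# 1/T2 = 1/T2* - 1/T2'
r2 = 1.0 / T2_STAR_S - r2_prime
t2_us = 1e6 / r2

print(f"T2' ~ {t2_prime_us:.0f} us, T2 ~ {t2_us:.2f} us")
```

Under these assumptions, 𝑇<sub>2</sub>′ comes out slightly above 3000 𝜇𝑠 and 𝑇<sub>2</sub> differs from 𝑇<sub>2</sub>* by only about 0.3%, in line with the figures quoted above.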

      A footnote was added in the Introduction section regarding this topic.

      (7) It may be worth noting that in disease T<sub>2</sub>* may be increased. As a result, the subtraction image may make abnormal tissue less obvious than normal tissue. Magic angle effects may also produce this appearance.

      This is an important point regarding image interpretation. For this reason, it is advantageous that the original anatomical images prior to subtraction are also available, as they will show such effects. They can be used in conjunction with the collagen-specific image to provide further insights regarding tissue disease. Increased T<sub>2</sub>* of diseased tissue has so far been reported for the bound water components due to a reduction of dipolar interactions between bound water and collagen (Ref. 4). A potential related change in T<sub>2</sub> for the macromolecular collagen component itself is certainly of interest and an avenue to explore in future work.

      (8) It may be worth distinguishing fibrous connective tissue (loose or dense), which may be normal or abnormal, from fibrosis, which is an abnormal accumulation of fibrous connective tissue in damaged tissue. Fibrosis typically has a longer T<sub>2</sub> initially and decreases its T<sub>2</sub>* over time. In places, the context suggests that fibrous connective tissue may be more appropriate than fibrosis.

      We are aware of this important distinction. We therefore checked the manuscript for references to fibrosis, making sure that the meaning is as intended.

      Overall, the paper appears very well constructed and describes thoughtful and important work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It should be stated that various methods with very short echo times (e.g. SWIFT by Garwood et al.) have been described in the past. This work shows for the first time that direct signals from collagen can be systematically detected in tissue samples.

      We have expanded a sentence in the introduction and reference selected publications studying short-T<sub>2</sub> water signal in collagen, including SWIFT.

      (2) It should be noted that the 1H atoms bound to collagen are located at different sites (at different amino acids of the protein) of the molecule and have different frequencies, and that further signal analyses are of interest.

      We have included additional information regarding distinct resonances of proton-binding sites of collagen in the introduction. The discrete observation of such signals requires advanced NMR methodology such as magic-angle spinning and RF decoupling, which is not a suitable approach for in vivo MRI. Without such methods, the broad lineshapes overlap strongly and are rather observed as a single decaying exponential with the dipolar oscillation as we observe in the FIDs.

      (3) Is it certain that the bump at 30 microseconds comes from 'dipolar coupling'? Is the development time probably too short for chemical shift-induced interference or J-coupling effects?

      30 microseconds is an extremely short interval in which to accumulate phase, and large resonance offsets are required to observe significant changes. To investigate the nature of the bump, we also collected data on a Bruker 7T NMR spectrometer (see Author response image 1). Overall, the same signal characteristics are observed as at 3T. In particular, the position of the bump is the same, excluding chemical shift as a source. However, with the higher field strength, chemical shift becomes significant for the signal phase, as observed by the change in the phase behavior at 50 microseconds, when the collagen component has decayed.

      While J-coupling is independent of field strength, its typical range is single-digit to tens of Hertz. In contrast, dipolar coupling strengths are on the order of thousands of Hertz, which coincides with the values extracted from our signal model.

      To clarify this point, we extended the respective sentence in the Results section.
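
The order-of-magnitude argument can be made concrete with a short calculation. This is a deliberately simplified sketch that treats each coupling as a plain frequency offset accumulating phase 2π·f·t; the values of 10 Hz and 5 kHz are illustrative choices within the ranges quoted above, not measured values, and the function name is hypothetical.

```python
import math

def phase_deg(frequency_hz, time_s):
    """Phase (degrees) accumulated by a frequency offset or coupling over time_s."""
    return math.degrees(2.0 * math.pi * frequency_hz * time_s)

T_BUMP_S = 30e-6                       # time of the bump in the FID
j_phase = phase_deg(10.0, T_BUMP_S)    # illustrative J-coupling of 10 Hz
d_phase = phase_deg(5000.0, T_BUMP_S)  # illustrative dipolar coupling of 5 kHz
```

In this simplified picture, a 10 Hz coupling accumulates only about a tenth of a degree within 30 microseconds, whereas a coupling of several kilohertz accumulates tens of degrees, enough to produce a visible modulation.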

      Author response image 1.

      (4) It should be noted that short RF pulses have a relatively high energy content, and whether there are any particular stresses on patients during the examination (SAR, nerve stimulation?).

      SAR is an important issue in ZTE MRI. Since imaging bandwidths are large and excitation is performed with the imaging gradient on, broadband pulses are necessary. Hence, significant RF power deposition occurs, and in vivo the flip angle often cannot be optimized for maximum signal but is instead constrained by the SAR limit. We have added an explanation in the Discussion section.

      Peripheral nerve stimulation is generated by rapid switching of strong gradients. However, ZTE sequences are usually operated without switching gradients on and off, but with only minor adjustments of the gradient direction between TR intervals. Therefore, PNS is not a relevant issue.

      (5) In the Results section, Part B, 'substantial signal intensity' should be written instead of 'substantial image intensity'.

      We have changed this as suggested.

      References

      (1) Chavhan GB, Babyn PS, Thomas B, Shroff MM, Haacke EM. Principles, techniques, and applications of T2*-based MR imaging and its special applications. Radiographics. 2009 Sep-Oct;29(5):1433-49. doi: 10.1148/rg.295095034. PMID: 19755604; PMCID: PMC2799958.

      (2) Seifert, AC, Wehrli, SL, and Wehrli, FW (2015), Bi-component T<sub>2</sub>* analysis of bound and pore bone water fractions fails at high field strengths. NMR Biomed., 28, 861–872. doi: 10.1002/nbm.3305.

      (3) Hopkins JA, Wehrli FW. Magnetic susceptibility measurement of insoluble solids by NMR: magnetic susceptibility of bone. Magn Reson Med. 1997 Apr;37(4):494-500. doi: 10.1002/mrm.1910370404. PMID: 9094070.

      (4) Loegering IF, Denning SC, Johnson KM, Liu F, Lee KS, Thelen DG. Ultrashort echo time (UTE) imaging reveals a shift in bound water that is sensitive to sub-clinical tendinopathy in older adults. Skeletal Radiol. 2021 Jan;50(1):107-113. doi: 10.1007/s00256-020-03538-1. Epub 2020 Jul 8. PMID: 32642791; PMCID: PMC7677198.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Based on the effects observed with OC vs. Ntf3 cKO, it is unclear whether OC is indeed exerting its non-cell-autonomous effects via Ntf3. Knocking out both Ntf3 and OC and comparing the effects to those seen with just OC cKO alone could provide more insight on this point.

      In this study, we did not intend to demonstrate that Onecut transcription factors exert their non-cell-autonomous action on spinal interneuron development by regulating Ntf3 expression, and we do not state in the manuscript that this is the case. We only show that Onecut factors and Ntf3, the expression of which they regulate, contribute to the non-cell-autonomous regulation of spinal interneuron development by the motor neurons. We are convinced that Onecut factors could regulate multiple independent factors and pathways involved in extrinsic regulation of interneuron development, as supported by the regulation of the expression of multiple secreted factors or membrane proteins in motor neurons detected in the reported RNA-sequencing experiment (this manuscript and [1]). This possibly also includes, as demonstrated in cell culture for multiple homeoproteins including human Onecut factors [2], the intercellular transfer of the Onecut homeoproteins during spinal cord development, a process that we are currently investigating. Knocking out both OC and Ntf3 in the motor neurons, beyond being technically extremely challenging (1/64 probability of obtaining triple-mutant embryos), would not allow us to address this question, as it would simply result in the addition of two different defects.
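      The 1/64 figure is what one obtains under a simple Mendelian assumption, sketched below. This is an illustration only: it assumes three independently segregating loci, parents heterozygous at each, and a requirement for homozygosity at all three. The actual breeding scheme is not described in this response, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def p_homozygous_all(n_loci: int) -> Fraction:
    """Probability that one embryo is homozygous mutant at every locus.

    Assumes both parents are heterozygous at each locus and that the loci
    segregate independently (1/4 per locus).
    """
    return Fraction(1, 4) ** n_loci


def embryos_needed(p: Fraction, confidence: float = 0.95) -> int:
    """Smallest number of embryos giving at least one mutant with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - float(p)))


if __name__ == "__main__":
    p = p_homozygous_all(3)
    print(f"Probability per embryo: {p}")
    print(f"Embryos to screen for 95% chance of one mutant: {embryos_needed(p)}")
```

      Under these assumptions, a single triple-mutant embryo is rare enough that collecting several per condition would require screening a very large number of embryos.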

      Also, a quantitative summary of the effects of Ntf3 overexpression in motor neurons in the chick is lacking.

      A quantitative summary of the effects of Ntf3 overexpression in the chicken embryonic spinal cord is provided in Figure S2.

      (2) How the authors assess changes in the spatial distribution of interneurons is unclear. In Figures 2 and 4, the control distributions (despite reporting the same populations in the same regions) look different, suggesting large sample-to-sample variance in distribution. Although the authors report that several sections in each level were taken from at least three animals for each condition, it's unclear how variance within WT or cKO sections was accounted for in the final statistical evaluation. It seems at a glance that a comparison between control samples in Figure 2 and Figure 4 could report statistically significant differences, which would be problematic. A more rigorous report of sample-to-sample variance and a more in-depth explanation of the statistical methods are needed.

      The experimental procedure used to analyze the spatial distribution of spinal interneurons at different stages of development is described in detail in the “Statistical analyses” paragraph of the Materials and Methods section of the manuscript, and has been repeatedly used by ourselves [3,4] and by others (see for example [5-7]) to conduct similar analyses.

      We also noticed that the distribution of the different analyzed interneuron populations in the control embryos showed some differences between the cOc1/Oc2<sup>-/-</sup> and the cNtf3<sup>-/-</sup> lines. Several parameters can account for this observation. First, this study has been conducted over a period of 15 years, with different investigators each contributing to different steps of the analysis. Second, the genetic background of these two lines is not identical, impacting both the duration of the gestation (hence, the embryonic stage of the performed analyses, even if the embryos were collected on the same gestation day) and possibly the distribution of some interneuron populations. Third, because of changes in the availability of the primary antibodies used to label the interneuron populations of interest, the same antibodies were not used throughout the study, as stated in the Materials and Methods section, although the same antibody was used by the same investigator to label the same interneuron population in each mouse line at each developmental stage.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses, which takes into account variance within control or mutant samples, will be provided in the revised version of the manuscript.

      Reviewer #2 (Public review):

      (1) The study primarily quantifies interneuron numbers and distribution at different levels of the spinal cord and under different genetic manipulations. Experimental details are lacking, defining how many sections were analyzed (several are noted in the methods) and how the rostrocaudal levels of the spinal cord were precisely aligned.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses will be provided in the revised version of the manuscript. The rostrocaudal levels of the spinal cord were precisely aligned using the distribution of Foxp1 in the Lateral Motor Columns (LMCs) at brachial or lumbar levels of the spinal cord [8,9], which will also be indicated in the revised version.

      In different figures, the values and distributions shown for controls vary quite a lot. For example, in Figure 2B vs Figure 4B, the number of FoxP2+ V1 neurons at brachial levels is ~350 vs 125. Similarly, the control distributions in 2I and 4I are quite different. This makes it challenging to determine whether the conclusions regarding the impact of each genetic manipulation on interneuron numbers and distribution are valid.

      Multiple factors may explain these observations. First, this study spans a 15-year period, with different researchers contributing to various stages of the analysis. Second, the genetic backgrounds of the two mouse lines are not identical, affecting both gestation length (thus influencing the embryonic stage at which analyses were performed, even when embryos were collected on the same gestational day) and potentially the distribution of certain interneuron populations. Third, due to changes in the availability of primary antibodies used to label the targeted interneuron populations, the same antibodies were not consistently employed throughout the study, as noted in the Materials and Methods section, though each investigator used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) The relationship between OC and NT3 deletion data is not entirely clear. Both deletions presumably lead to changes in interneuron distribution, but is there any reverse relationship between the two that relates to relative changes in NT3 levels? The authors do not directly compare NT3 and OC KO IN distributions. Similarly, one might expect a decrease in interneuron numbers in OC mutants, which is only reported for V2c neurons. However, the image presented in Figure 2G shows an equal number of V2c INs in control and mutant.

      This study was not designed to demonstrate that Onecut transcription factors influence spinal interneuron development in a non-cell-autonomous manner through Ntf3 regulation, nor do we claim this in the manuscript. Instead, we show that Onecut factors and Ntf3, whose expression they control, contribute to the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We believe Onecut factors may regulate multiple independent factors and pathways involved in the extrinsic control of interneuron development. For instance, as noted earlier [2], we observed intercellular transfer of Onecut homeoproteins during spinal cord development, suggesting alternative mechanisms for non-cell-autonomous regulation.

      The two mouse lines studied here consist, on the one hand, of a combination of OC inactivation and increased Ntf3 expression and, on the other hand, of Ntf3 inactivation. Therefore, a reverse relationship between the changes in interneuron distribution is not expected. Furthermore, gain-of-function and loss-of-function experiments in mouse models frequently generate phenotypes that are not inverse to each other [10-13].

      (3) It is not clear that the behavioral phenotypes seen in the olig2-cre mediated deletion of NT3 can be attributed to changes in interneuron development. How about a role of NT3 in oligodendrocytes? There is a big gap between the embryonic changes shown here and behavior, with no in-between circuit-level changes in locomotor circuits shown.

      We agree that the motor behavior changes that we recorded in Ntf3 conditional mutant mice are, as stated, “consistent with the hypothesis that Ntf3 produced by MNs is required to generate locomotor circuits with properly coordinated activity” but do not demonstrate a direct causal relationship. However, investigating the intrinsic activity of the spinal locomotor circuits independently of, for example, oligodendrocyte contribution may prove to be extremely challenging and was beyond the scope of this study. In addition, to the best of our knowledge, Ntf3 has not been shown to be expressed in healthy oligodendrocytes in vivo, and TrkC has not been reported to be displayed by these cells in the same conditions.

      A more restricted manipulation would be deleting TrkC from specific interneuron populations. Related to this, although TrkC is shown to be broadly expressed in ventral interneurons, it is not shown specifically to colocalize with any of the interneuron markers. The authors should validate that the receptor is expressed in the subsets that they are investigating.

      We agree that investigating the consequences of inactivating the TrkC receptor in specific interneuron populations would be extremely informative. However, this experiment is also very challenging to perform, as most of the driver lines available to target spinal interneuron populations additionally target multiple neuronal populations outside of the spinal cord that are also involved in the control of movements and could therefore induce confounding effects on motor behavior analyses [14-20].

      We thank the reviewer for suggesting that we investigate in more detail the interneuron populations that display TrkC receptors. This will be included in the revised version of the manuscript.

      (4) The rationale for following up on NT3 seems to be the chick electroporation experiments; however, no changes in distribution are shown in those experiments, and only a very minor decrease in Chx10 interneurons. Shouldn't NT3 overexpression lead to substantial decreases in IN numbers according to the authors' model? The "data not shown", which presumably refers to distribution, would be important to show here, to further support this rationale.

      Chicken spinal cord electroporation only allows spinal cord development to be studied within a limited time window, given the high mortality rate observed after longer incubation. At the stage at which we collected the electroporated embryos for analysis, interneuron migration has barely been initiated, and distribution cannot be studied yet. Consistently, we are not aware of any report of interneuron distribution analysis in electroporated chicken embryonic spinal cord, in contrast to mouse embryos [3-7].

      (5) The idea that NT3 downregulation causes an increase in IN numbers is not intuitive. Also, considering the DTA experiments in Figure 1, showing that MN ablation leads to a decrease in several IN subtypes and no changes in V2a neurons. It would be helpful for the reader if the authors could synthesize their results in the discussion and reconcile their experimental findings.

      We agree. This will be included in the revised version of the manuscript.

      Reviewer #3 (Public review):

      (1) The manuscript relies heavily on quantifying numbers and the spatial distribution of interneuron populations. However, these do not seem to be consistent in control animals across experiments, making it difficult to interpret any changes observed in genetic manipulations. Specifically, in Figures 2 and 4, the same markers are being used to quantify V1, V2a, V2b, and V2c interneurons in controls vs. OC (Figure 2) or Ntf3 (Figure 4) conditional knockouts, but the numbers of neurons and their distribution in control animals are variable between these two figures. For example, there seems to be a mean of >300 V1 neurons in E12.5 brachial sections of Fig. 2 controls, but this number is <150 in Fig. 4 controls. The cell distribution scoring is similarly variable between these controls without any explanation. The same is true for E14.5 controls used in Figure S1 vs. Figure S3.

      We indeed observed variations in the quantifications and distributions of the analyzed interneuron populations in control embryos between the cOc1/Oc2<sup>⁻/⁻</sup> and cNtf3<sup>⁻/⁻</sup> lines. Several factors may explain this discrepancy. First, the study was carried out over 15 years, with different investigators contributing to distinct stages of the analysis—meaning interneuron distribution was not assessed by the same researchers in both lines. Second, the genetic backgrounds of the two lines differ, affecting gestation length (and thus the embryonic stage at analysis, even when embryos were collected on the same gestational day) as well as potentially altering the distribution of certain interneuron populations. Third, changes in the availability of primary antibodies targeting the interneuron populations of interest led to inconsistencies in antibody use across the study, as detailed in the Materials and Methods section. However, each investigator consistently used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) Neurotrophic factors generally promote neuronal survival. However, in this study, the loss of Ntf3 leads to increased numbers of interneurons. This finding is in disagreement with previous observations in slice cultures of spinal cords, as stated in the discussion. This discrepancy makes it even more important that the cell counts reported in the figures discussed above are robust.

      Considering that neurotrophic factors only support neuronal survival would overlook their important function in neuronal differentiation, which has been broadly demonstrated. Severe immunotoxic ablation of motor neurons or antiserum blockade of Ntf3 activity severely depleted inhibitory, but not excitatory, interneurons in a highly apoptosis-prone organotypic culture model of embryonic rat spinal cord slices, a depletion that was rescued by Ntf3 in the first model [21]. Opposite results were obtained in vivo by other researchers using mouse models lacking almost all MNs due to the elimination of skeletal muscles, in which the number of spinal INs remained unaffected [22,23]. Combined with our results, these in vivo observations suggest that Ntf3 is involved in interneuron differentiation rather than in their survival. Consistently, Ntf3 has been shown to promote neuronal differentiation [24].

      (3) The claim that phenotypes are non-cell autonomously driven by motor neurons is not well supported. In Olig2-Cre conditional knockouts of Onecut and Ntf3, there is no confirmation that the loss of these factors is specific to motor neurons. Therefore, it cannot be ruled out that other cell populations may be mediating the phenotypes.

      Combined conditional inactivation of Oc1 and Oc2 has been reported in [1]. Conditional inactivation of Ntf3 only impacts motor neurons, as they are the only cell population in the ventral spinal cord in which this factor is produced (this study and [25-27]). Furthermore, Olig2-Cre has been shown to be active in motor neurons and in V3 interneurons (see for example [10]), which, for this reason, have not been studied as part of this project, as stated in the manuscript.

      (4) The claim that interneuron development is regulated by OC control of Ntf3 expression in motor neurons is not well supported. The authors show that loss of OC1/2 leads to an increase in Ntf3 expression in motor neurons. If this pathway were controlling interneurons, loss of OC function and overexpression of Ntf3 would have the same phenotype, which is not the case. Additionally, it would also be expected that loss of OC function and loss of Ntf3 function would have inverse phenotypes, which is also not the case. The phenotypes from OC loss of function and Ntf3 loss of function seem distinct from one another. The authors state that too little and too much Ntf3 are both bad for interneuron development, but there is no data to support their claim that OC1/2 mutants have altered interneuron development because of higher Ntf3 expression.

      This study was not aimed at proving that Onecut transcription factors mediate their non-cell-autonomous effects on spinal interneuron development through Ntf3 regulation, nor do we make this claim in the manuscript. Rather, we demonstrate that Onecut factors and Ntf3, whose expression they control, participate in the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We propose that Onecut factors likely modulate multiple independent factors and pathways involved in the extrinsic regulation of interneuron development, as evidenced by the regulation of various secreted factors and membrane proteins in motor neurons observed in our RNA-sequencing data (this study and [1]). This may also involve intercellular transfer of Onecut homeoproteins during spinal cord development, a mechanism previously shown in cell culture for several homeoproteins, including human Onecut factors [2], which we are currently exploring.

      (5) It is not clear that interneurons being studied express the Ntf3 receptor TrkC, which makes it difficult to assess whether changes in Ntf3 signaling are directly responsible for the phenotype.

      The immunofluorescence experiment in Figure 3C shows that the TrkC receptor is present in cell populations surrounding motor neurons at E12.5, a stage at which only the pre-motor interneuron populations reported in the manuscript are present. However, we thank the reviewer for suggesting that we investigate in more detail the interneuron populations that display TrkC receptors. This will be included in the revised version of the manuscript.

      (6) While the behavioral phenotypes are consistent with Ntf3 playing a role in motor circuits, there is no evidence to suggest that Ntf3's influence on premotor interneurons being studied is driving or contributing to this phenotype, as discussed by the authors.

      We acknowledge that the motor behavior changes observed in Ntf3 conditional mutant mice—as noted—are “consistent with the hypothesis that MN-derived Ntf3 is necessary for the formation of locomotor circuits with properly coordinated activity,” but they do not establish a direct causal link. However, analyzing the intrinsic activity of spinal locomotor circuits was beyond the scope of this study.

      (1) Toch, M. et al. Onecut-dependent Nkx6.2 transcription factor expression is required for proper formation and activity of spinal locomotor circuits. Sci Rep 10, 996 (2020). https://doi.org/10.1038/s41598-020-57945-4

      (2) Lee, E. J. et al. Global Analysis of Intercellular Homeodomain Protein Transfer. Cell Rep 28, 712-722 e713 (2019). https://doi.org/10.1016/j.celrep.2019.06.056

      (3) Harris, A. et al. Onecut factors and Pou2f2 regulate the distribution of V2 interneurons in the mouse developing spinal cord. Front Cell Neurosci 13 (2019). https://doi.org/10.3389/fncel.2019.00184

      (4) Kabayiza, K. U. et al. The Onecut Transcription Factors Regulate Differentiation and Distribution of Dorsal Interneurons during Spinal Cord Development. Front Mol Neurosci 10, 157 (2017). https://doi.org/10.3389/fnmol.2017.00157

      (5) Deska-Gauthier, D. et al. Embryonic temporal-spatial delineation of excitatory spinal V3 interneuron diversity. Cell Rep 43, 113635 (2024). https://doi.org/10.1016/j.celrep.2023.113635

      (6) Bikoff, J. B. et al. Spinal Inhibitory Interneuron Diversity Delineates Variant Motor Microcircuits. Cell 165, 207-219 (2016). https://doi.org/10.1016/j.cell.2016.01.027

      (7) Hayashi, M. et al. Graded Arrays of Spinal and Supraspinal V2a Interneuron Subtypes Underlie Forelimb and Hindlimb Motor Control. Neuron 97, 869-884 e865 (2018). https://doi.org/10.1016/j.neuron.2018.01.023

      (8) Rousso, D. L., Gaber, Z. B., Wellik, D., Morrisey, E. E. & Novitch, B. G. Coordinated actions of the forkhead protein Foxp1 and Hox proteins in the columnar organization of spinal motor neurons. Neuron 59, 226-240 (2008). https://doi.org/10.1016/j.neuron.2008.06.025

      (9) Roy, A. et al. Onecut transcription factors act upstream of Isl1 to regulate spinal motoneuron diversification. Development 139, 3109-3119 (2012). https://doi.org/10.1242/dev.078501

      (10) Debrulle, S. et al. Vsx1 and Chx10 paralogs sequentially secure V2 interneuron identity during spinal cord development. Cell Mol Life Sci 77, 4117-4131 (2020). https://doi.org/10.1007/s00018-019-03408-7

      (11) Brunklaus, A. et al. in Brain Vol. 145 3816-3831 (2022).

      (12) Scekic-Zahirovic, J. et al. in EMBO J Vol. 35 1077-1097 (2016).

      (13) Wong, J. C. in Epilepsy Curr Vol. 25 347-349 (2025).

      (14) Hafler, B. P., Choi, M. Y., Shivdasani, R. A. & Rowitch, D. H. Expression and function of Nkx6.3 in vertebrate hindbrain. Brain Res 1222, 42-50 (2008). https://doi.org/10.1016/j.brainres.2008.04.072

      (15) Nardelli, J., Thiesson, D., Fujiwara, Y., Tsai, F. Y. & Orkin, S. H. Expression and genetic interaction of transcription factors GATA-2 and GATA-3 during development of the mouse central nervous system. Dev Biol 210, 305-321 (1999).

      (16) Bretzner, F. & Brownstone, R. M. in J Neurosci Vol. 33 14681-14692 (2013).

      (17) Chopek, J. W., Zhang, Y. & Brownstone, R. M. in J Neurophysiol Vol. 126 1978-1990 (2021).

      (18) Miyagi, S., Kato, H. & Okuda, A. in Cell Mol Life Sci Vol. 66 3675-3684 (2009).

      (19) French, C. A. et al. in Mol Psychiatry Vol. 24 447-462 (2019).

      (20) Khouri-Farah, N., Guo, Q., Perry, T. A., Dussault, R. & Li, J. Y. H. in Nat Neurosci Vol. 28 2022-2033 (2025).

      (21) Bechade, C., Mallecourt, C., Sedel, F., Vyas, S. & Triller, A. in J Neurosci Vol. 22 8779-8784 (2002).

      (22) Grieshammer, U., Lewandoski, M., Prevette, D., Oppenheim, R. W. & Martin, G. R. Muscle-specific cell ablation conditional upon Cre-mediated DNA recombination in transgenic mice leads to massive spinal and cranial motoneuron loss. Dev Biol 197, 234-247 (1998). https://doi.org/10.1006/dbio.1997.8859

      (23) Kablar, B. & Rudnicki, M. A. Development in the absence of skeletal muscle results in the sequential ablation of motor neurons from the spinal cord to the brain. Dev Biol 208, 93-109 (1999). https://doi.org/10.1006/dbio.1998.9184

      (24) Dutton, R., Yamada, T., Turnley, A., Bartlett, P. F. & Murphy, M. Regulation of spinal motoneuron differentiation by the combined action of Sonic hedgehog and neurotrophin 3. Clin Exp Pharmacol Physiol 26, 746-748 (1999). https://doi.org/10.1046/j.1440-1681.1999.03108.x

      (25) Buck, C. R., Seburn, K. L. & Cope, T. C. Neurotrophin expression by spinal motoneurons in adult and developing rats. J Comp Neurol 416, 309-318 (2000).

      (26) Henderson, C. E. et al. Neurotrophins promote motor neuron survival and are present in embryonic limb bud. Nature 363, 266-270 (1993). https://doi.org/10.1038/363266a0

      (27) Usui, N. et al. Role of motoneuron-derived neurotrophin 3 in survival and axonal projection of sensory neurons during neural circuit formation. Development 139, 1125-1132 (2012). https://doi.org/10.1242/dev.069997

    1. Author response:

      General Statements

      We would like to extend our gratitude to all reviewers for their supportive feedback, which acknowledges our study as well performed, of interest to investigators studying muscle development and diseases, and supportive of a role for the fly model in testing potential human disease gene variants. We also thank the reviewers for their valuable critical comments. We carefully considered all of them, performed additional experiments, and made the suggested text amendments.

      We believe these modifications substantially improve the quality of our results and enhance the general interest of our work.

      Point-by-point description of the revisions

      Reviewer #1:

      In this manuscript, Zmojdzian et al. provide an analysis of ryanodine receptor (RyR) expression and function in Drosophila. They also use CRISPR to engineer into flies a RyR variant of unknown significance (VUS) found in a human myopathy patient and demonstrate that it is likely a pathogenic mutation. From studies of RyR expression in embryonic and larval stages, and effects of RyR knockdown or overexpression in various muscle groups, the authors show that, in addition to its known actions in calcium-dependent excitation-contraction coupling, RyR promotes myogenesis during development.

      The key conclusions of the paper are convincing. I do not have suggestions for necessary additional experimental work, and my comments are minor. One conclusion, that RyR dysfunction may be involved in aging, is stated in multiple places, sometimes speculatively but once very forcefully. The latter is in the final paragraph of the Discussion, which states RyR "plays an instrumental anti-aging role in differentiated striated muscle". This conclusion must be tempered, as even if RyR knockdown phenotypes resemble some of those seen in aging flies, the study does not examine aged flies, and there is no mechanistic analysis that might link the two. I assume the authors would prefer to modify that sentence than initiate work with aging flies to prove the assertion.

      We thank the Reviewer for this comment and have removed the hypothetical anti-aging role of RyR from the concluding sentence. The modified sentence reads as follows:

      “To conclude, we report functional analysis of dRyR, the sole fruit fly RyR gene and show that in addition to ensuring contractile properties of differentiated striated muscle it plays a key pro-myogenic role during muscle development.”

      Finally, the use of CRISPR to test a VUS is excellent and suggests a good way for testing of additional RyR variants in the future.

      Minor comments:

      (1) Figure 1A: In the Introduction it is stated that non-mammalian vertebrates have two RyR genes, alpha and beta. In Fig. 1A, a single chicken and single frog gene are listed under names different than alpha or beta. The figure also focuses on RyR2 genes, yet the Introduction states that the non-mammalian vertebrate genes are homologous to RyR1 and RyR3 in mammals. The dichotomy between the text and the figure is confusing. Finally, the font used in Fig. 1A should be enlarged for better visibility.

      To avoid the dichotomy, we modified our sentence concerning the non-mammalian vertebrate RYR genes in the Introduction section. As indicated, there are two RYR genes in chicken and frog, one of which shares homology with vertebrate RYR2 and is represented in the phylogenetic tree (Fig. 1A). As requested by the reviewer, we enlarged the font in the revised Fig. 1A to ensure better visibility.

      (2) Figure 3G-I: IF to Kettin is used to reveal sarcomeres but is not mentioned in the text. This protein is not present in vertebrates (I believe) and may not be familiar to many readers. It should be described in the text when it is used.

      We are grateful to the Reviewer for reminding us to provide information about Kettin, which represents the Drosophila counterpart of Titin. The following information has been added to the text on page 9: “…which in turn correlated with shortening of Kettin/D-Titin-labelled sarcomeres…”

      (3) Figure S2: The panels are labelled E, F, G. They should be A-D, as is used in the text.

      In the revised version of Fig. S2, the panel labels were amended and the panel E view was enlarged. We also provide an additional control context (C57>LacZ).

      (4) The dRyR16 allele is used in Figure 5 and S4. It is described as a hypomorph in the text on page 12 but as a null in the legend to Figure 5. Do the authors actually mean "homozygous" in the legend? The difference should be clarified.

      The dRyR<sup>16</sup> allele has been previously described as a hypomorph. Indeed, in the legend of Fig. 5 we described it as a “null” by mistake. As suggested by the Reviewer, we have changed this to “homozygous”.

      (5) The Met codon that is mutated in the variant studied in Figure S5 and Figure 6 is position 488 in humans. It is referred to that way in the fly version also. Is that true, the actual amino acid number is identical in humans and flies? In Figure S5B, it might be worth showing the primary amino acid sequence surrounding Met488 to reveal the degree of local conservation (beyond the orange domain in that panel).

      To provide more information about the conservation, we have added to the revised Fig. S5 an alignment of the amino acid sequence surrounding the human RYR1 4881 variant position, which corresponds to position 4971 in the Drosophila dRyR.

      Author response image 1 shows a snapshot from a larger portion of the alignment encompassing the variant mutation, showing high amino acid conservation around the variant position:

      Author response image 1.

      (6) At least two references cited in the text are not listed in the References section (Hadiatullah et al. and Nishimura et al.).

      We double-checked the reference citations, and the two indicated references are now listed in the References section.

      Reviewer #1 (Significance):

      The paper is significant in that RyR is known to be a critical protein in calcium-dependent excitation-contraction coupling but its role in developmental myogenesis is poorly studied. This study demonstrates that it is expressed during, and is important for, embryonic and larval myogenesis in the fly. RyR is also understudied in this valuable model organism, even though a P element-based mutant has been available since 2000. The mechanistic basis for the functional observations is not explored here but the work is well performed and will be of interest to investigators studying muscle development (my own field) and diseases caused by RyR mutations.

      To reinforce the mechanistic/functional side of our study, we include in the revised Fig. 5 a new panel G showing the promyogenic role of another major cellular calcium regulator, the ER calcium pump SERCA. Lms-targeted RNAi knockdown of SERCA affects myotube growth, resulting in a thin muscle fiber phenotype. This indicates that both dRyR-regulated cytosolic and SERCA-regulated ER store calcium levels are required to promote muscle development.

      Reviewer #2:

      Summary:

      This paper presents data using the Drosophila model to analyze the effects of a rare human mutation in the gene encoding the ryanodine receptor (ryr). The authors present a nice, comprehensive phylogenetic analysis that shows the Drosophila version of Ryr to be most similar to human RYR2 and that the known "hot spots" for mutations in RYR2 coincide with highly conserved regions of the Drosophila Ryr. They characterize the functional effects of ryr knockdown and overexpression on both adult heart function and larval body wall muscle. They identified embryonic ryr expression in association with actin-stained muscle precursor cells and provide beautiful stains, which clearly showed that embryonic muscle cell development was disrupted in ryr mutants. In support of these findings, KD of Calmodulin in larva (an Ryr inhibitor) phenocopied Ryr OE. They recreated a human variant of unknown function (RyR1 p.Met4881Ile ) in the conserved region of the fly gene and tested the effect on larval muscle. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters. This work supports a role for the fly model in testing potential human disease gene variants.

      Major comments:

      (1) Fig. 1: In G there is no data for the RNAi KD situation.

      We are grateful to the Reviewer for pointing this out. We initially did not include these data because of the large difference in the crawling capacity of dRyR RNAi larvae. In the revised version of Fig. 1 we now provide dRyR-RNAi larva crawling data. Because of their inefficient crawling, the time scale in panel 1G was modified.

      (2) Fig. 2 Authors should include Diastolic Diameters; they mention dilated cardiomyopathy but don't show the dilation. The authors should also show staining in hearts with RYR OE and RNAi. It would be nice to have some kind of quantification of disorganized myofibrils.

      As requested, in the revised Fig. 2 we provide diastolic diameter measurements. We also include a systolic interval graph to give a full picture of the cardiac parameters. We do not observe all signs of dilated cardiomyopathy in the dRyR-RNAi context, as there is an increase in systolic diameter but no significant change in diastolic diameter.

      We modify our comments in the text accordingly (page 7).

      “…As the diastolic diameter remained unchanged, we conclude that cardiac dRyR knockdown affects cardiac performance without causing dilated cardiomyopathy…”

      Regarding the circular myofibril pattern, we do not observe irregular myofibril orientation but rather a fuzzy and less distinct sarcomeric pattern that is difficult to quantify. We specify this in the Figure 2 legend (page 8).

      “…circular fibers in Hand>dRyR RNAi (E) context showed a fuzzy pattern suggesting an affected sarcomeric organisation…”

      Author response image 2 shows the entire cardiac tube in the dRyR-RNAi context (stained with phalloidin). Despite the less distinct circular myofibrils, no obvious differences from wt are observed.

      Author response image 2.

      (3) To evaluate and reproduce the data on the larva muscle parameters the authors should provide more details on how sarcomere length was quantified in each larva (replicates, ROI size, etc). Similarly, how were # of nuclei quantified / normalized? Importantly for these measurements, did the authors know what the contraction state of the muscles were when fixed?

      We add the requested information to the Materials and Methods section:

      “Muscle characteristics measurements:

      All analyses of muscle length and sarcomere size were performed on fixed larval muscle preparations in a relaxed state. Acquired confocal images were analysed in Fiji using the line tool. The Analyze – Measure tool was then applied to obtain muscle length values, and measurements were analysed with Prism. Sarcomere size and number were calculated using the Analyze – Plot Profile tool in Fiji. The sarcomere size was measured between peaks corresponding to Z-discs (revealed with a Z-line-specific marker) over approximately 100 µm of muscle length. Sarcomere measurements were then analysed with Prism.

      DAPI-stained nuclei were counted in Z-stacks of confocal views of VL3 larval muscle and data analysed with Prism. About 30 larval muscles from 6-8 larval filets were analysed for each measurement.”

      Statistics

      All statistical analyses were performed using Prism (v9.5.1, GraphPad Software, La Jolla, CA, USA). The t test was used to compare control to variant context and one-way ANOVA tests were used for comparisons with more than two datasets. Bar plots represent the mean and the standard deviation. On the figures, statistical comparisons of sample vs control are indicated as ****: P ≤ 0.0001; ***: P ≤ 0.001; **: P ≤ 0.01; *: P ≤ 0.05; ns > 0.05.
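The peak-to-peak sarcomere measurement described in the Methods text above can be illustrated with a minimal sketch. This is not the authors' Fiji workflow: it assumes an intensity profile exported from Plot Profile as a list of values sampled at a known pixel size, and the function names, the height threshold and the synthetic profile are all hypothetical.

```python
import math


def find_peaks(profile, min_height):
    """Return indices of local maxima whose value is at least min_height."""
    peaks = []
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_local_max and profile[i] >= min_height:
            peaks.append(i)
    return peaks


def sarcomere_lengths(profile, pixel_size_um, min_height):
    """Distances (in µm) between consecutive Z-disc peaks along the profile."""
    peaks = find_peaks(profile, min_height)
    return [(b - a) * pixel_size_um for a, b in zip(peaks, peaks[1:])]


# Synthetic profile (assumption): one Z-disc peak every 20 samples.
profile = [math.cos(2 * math.pi * i / 20) for i in range(101)]

# Assumed sampling of 0.1 µm per pixel.
lengths = sarcomere_lengths(profile, pixel_size_um=0.1, min_height=0.5)
mean_length = sum(lengths) / len(lengths)
print(len(lengths), mean_length)
```

The threshold keeps background fluctuations between Z-discs from being counted as peaks; on real images it would need to be tuned to the staining intensity.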

      (4) Fig. 3, Are RNAi and OE in the same background? I only see one control in the graphs for the RNAi line background.

      We agree, and to avoid potential bias between the RNAi and OE genetic contexts we now provide an additional OE control (C57>lacZ) in the revised version of Fig. 3.

      Thus, two controls, one for RNAi and one for OE contexts are now included.

      (5) Fig. 3 How VL3 length was determined needs more detail, the Zhang ref is not adequate.

      We thank the Reviewer for this comment and now provide more details about the method used to calculate VL3 length (new paragraph in Materials and Methods); see also our answer to point 3. The Zhang et al. reference relates to the quantification of the mitochondrial pattern.

      (6) In order to be able to evaluate the data, the statistical tests used should be cited in the figure legends along with what *, ** ,*** stand for (or just provide p values).

      We have now added the information about the statistical tests to the figure legends, in addition to the specific paragraph in the Materials and Methods section (see answer to point 3).
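As an illustration of the significance notation, the thresholds quoted in the Statistics paragraph (answer to point 3) can be written as a small lookup. This is only a sketch: the function name is hypothetical, and the P values themselves would come from the Prism analyses described by the authors.

```python
def significance_label(p_value):
    """Map a P value to the star notation quoted in the Statistics paragraph."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    if p_value <= 0.0001:
        return "****"
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "ns"


for p in (0.00005, 0.0005, 0.005, 0.03, 0.2):
    print(p, significance_label(p))
```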

      Minor comments:

      (1) Need more detail in the figures, e.g. add what colors go with which stain to the picture.

      We provide this information in the revised version of the figure legends.

      (2) Page 13, (Fig. ?F, G).

      We apologize for this mistake and add the number - Fig. 5

      (3) Fig. 4 "partially co-localizing with actin".... this is confusing and probably an overstatement based on the staining pattern in a whole embryo and not on an optical section or a higher power image with a more restricted field of view.

      We agree and remove this statement from the Fig.4 legend.

      (4) Some of the graphs are a bit small, recommend reducing the statistical comparison brackets to straight lines, which eliminates a lot of white space and would allow the graphs to be enlarged.

      We increased the size of graphs in revised Fig. S2 and Fig.5.

      Reviewer #2 (Significance):

      The authors nicely characterized the role of Ryr in muscle development and function and recreated a human variant of unknown function (RyR1 p.Met4881Ile) in the conserved region of the fly gene. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters. This work supports a role for the fly model in testing potential human disease gene variants. The reviewer's field of expertise is in Drosophila genetics and in the use of the fly as a model system for understanding the genetic networks contributing to muscle structure and function at the cellular level.

      Reviewer #3:

      Summary

      This paper examines the Drosophila Ryanodine Receptor (RyR or dRyR). Ryanodine receptors are enormous channel proteins that mediate calcium efflux from the endoplasmic reticulum and sarcoplasmic reticulum. One goal of the work is to describe salient developmental features of Drosophila RyR (i.e., where it localizes in the cell and how it contributes to muscle development and function) and to refine knowledge from prior reports. Many of the analyses toward that goal are well done; this reviewer especially likes the examination of how muscles develop (Fig. 5).

      Another goal is to compare this information with what is known about mammalian RyRs. There seems to be a lot in common between Drosophila and mammalian RyRs. The paper finishes by taking a human ryanodine receptor variant of unknown significance and generating the corresponding amino-acid substitution in Drosophila RyR. The substitution has some phenotypic consequences for fly coordination, so the authors conclude that the human variant is likely to be pathogenic.

      In terms of investigation, a refined description of RyR biology is welcome. Ryanodine receptors are critical contributors/mediators of intracellular calcium signaling processes. Understanding their properties can help to contextualize the results of studies where calcium dynamics are at play. This is true for both Drosophila and non-Drosophila work. For this version of the paper, there are several statements that should be edited, both in terms of accuracy and in terms of reporting prior knowledge. Additionally, some experiments are missing controls or reagent verification. Importantly, the anti-RyR antibody needs supporting information regarding its specificity.

      Main Comments

      (1) The paper does not fully state what has been done before in terms of studying Drosophila ryanodine receptor expression. In comparing the work on ryanodine receptors in vertebrates versus Drosophila, the authors write, "By contrast, no systematic analyses have yet been performed to assess the expression of the sole Drosophila dRyR gene." I was a little surprised by this sentence, so I examined the literature. There are hundreds of Drosophila publications that mention the ryanodine receptor in some way, but they are not about gene expression. As stated, the sentence might depend on what the authors mean by "systematic analyses." Two early works are relevant here: the Hasan and Rosbash, 1992 paper and the Sullivan et al., 2000 paper. Both are cited in this study. And both of these early papers addressed RyR gene expression, so that fact should be acknowledged up front.

      We agree with the Reviewer that a large number of publications mention the Drosophila ryanodine receptor, and that the two identified by the Reviewer provide information about Drosophila RyR expression. We refer to both of them and follow the Reviewer’s suggestion to further acknowledge this work. The modified sentence in the text reads as follows:

      “…in spite of early works by Hasan and Rosbash (1992) and Sullivan et al., (2000) no systematic analyses have yet been performed to assess the developmental expression pattern of the sole Drosophila dRyR gene…”

      By “systematic analyses” we mean analyses of dRyR expression at both the transcript and protein levels during embryonic development and in differentiated muscles.

      (2) (Related) I examined those two early papers to cross-check the extent of analysis done previously. The text of Hasan and Rosbash reports in situ examination of RyR transcript using a digoxigenin probe (though the online version of that 1992 paper seems to have left out the relevant mesodermal and muscle images referenced in the paper, in favor of duplicating Figure 5 three times - I emailed Development to alert them). More relevant, several experiments executed in the Sullivan paper agree closely with the current paper. As such, it needs more complete referencing. The Sullivan paper showed short, round larvae in mutants (Fig. 1 of Sullivan); ubiquitous mRNA, strongly in muscle and mesoderm (Fig. 2 of Sullivan); impaired muscle function in mutants (Fig. 3 of Sullivan), and impaired larval heart rate (Fig. 4 of Sullivan).

      The Sullivan et al. paper is indeed a reference paper for Drosophila RyR. Our data are, however, largely novel and/or substantially extend those reported by Sullivan et al. Notably, we show for the first time the developmental dRyR protein expression pattern in embryos and in larval filets, we analyse the expression of dRyR isoform transcripts, and we provide for the first time embryonic muscle phenotype analyses that shed light on the so far under-investigated developmental function of dRyR.

      We follow the Reviewer’s suggestion and provide additional citations of this work in the revised version:

      “…attenuation of dRyR (C57>dRyR RNAi) led to a significantly reduced larva body length (Fig. 3B, M) compared to control (Fig. 3A, Q), an observation that correlates with previously observed (Sullivan et al., 2000) reduced body size of dRyR<sup>16</sup> mutant larvae…”.

      “…our data extend previous observations of affected muscle contractility in RyR mutants (Sullivan et al., 2000)…”

      “…Overall, observed dRyR loss-of-function heart phenotypes with a slow heart rate and increased arrhythmia correlate with impaired cardiac function in RyR mutant larvae (Sullivan et al., 2000)…”

      (3) Fig. 1B-D (antibody staining): There are puzzles with this experiment. The first is with the anti-Dlg channel. Dlg is a core component of the NMJ postsynaptic density, and the antibody reveals a bright cage of Dlg around the boutons. But with the muscle images in Figure 1B, there are no boutons apparent (unless they are so far out of focus as to be invisible).

      Indeed, Dlg also stains postsynaptic NMJs at the muscle surface. In Fig. 1B, which shows more internal optical sections to reveal T-tubules, the Dlg-positive NMJs are out of focus.

      The second question centers on the dRyR antibody. The results state, "We first tested the expression of dRYR at the protein level." This sentence appears immediately after the sentence for gene expression from point 1. Technically, this antibody will help determine protein localization, not gene expression. But more importantly, there is no supporting/verifying information about this guinea pig anti-dRYR antibody. The methods state that it was provided by Robert Scott from NIMH. But there is no accompanying citation, no information about the antigen used to raise the antibody, and no negative control (either mutant or RNAi) to show that the staining is specific. If this is a published anti dRyR antibody that already meets the standards of specificity, that should be made clear, and the citation should be given. But if not, the information and data about the production of the antibody and the testing of its quality needs to be shared.

      We apologize for this omitted citation. The anti-dRyR antibody has been previously described and its specificity tested in Gao et al. (2013). The corresponding author of that paper, David J. Sandstrom, has left NIMH, and the anti-dRyR antibodies are currently curated by Rob Scott from Benjamin White’s lab at NIMH.

      He generously sent us a sample of this antibody. We have added this information to the Materials and Methods section.

      (4) Fig. S1: Similar to the antibody, is there a negative control probe that does not reveal this expression pattern? There are any number of probes or secondary antibodies that non-specifically label Drosophila muscles in patterns just like this.

      We are confident that the HCR probes are working properly, as they reveal a dRyR transcript expression pattern that is consistent with the dRyR protein expression pattern. In parallel, they show differential expression in embryos.

      Author response image 3 shows the control HCR ISH experiment with a probe that detects Apterous transcripts (specific for a subset of embryonic muscles and not present in L3 larval muscles).

      Author response image 3.

      A comparison between Ap HCR (A, A’) and dRyR Ex23 HCR (E, E’) signals.

      Minor Comments

      (1) "Overall, observed dRYR loss-of-function heart phenotypes...are reminiscent of those associated with aging (Nishimura et al., 2010), indicating that dRyR RNAi-induced impairment of Ca2+ homeostasis contributes to cardiac aging..." The conclusion of the sentence does not logically follow from the first part. This is because the tests conducted here were on rhythm, not on calcium homeostasis and cardiac aging.

      So, the tests cannot definitively say anything about those latter phenotypes.

      To address this comment, we have modified the concluding sentence as follows:

      “…We hypothesize that dRyR RNAi-induced impairment of Ca2+ homeostasis could contribute to cardiac aging, for which Drosophila is a recognized model (Nishimura et al., 2011).”

      (2) Fig. S2 (bar graph): "% of total" - Is this supposed to refer to the percentage of the total muscle area that is positive for ATP5a staining? That should be clarified.

      We provide clarification in the Fig. S2 legend: “% of total” means the percentage of the measured muscle area that is positive for ATP5a staining.

      (3) Fig. 3M, should say length

      Done

      (4) Fig. 5A legend - See Sullivan; that paper concluded that RyR[16] was hypomorphic instead of null, based on RyR[16]/Df comparison to RyR[16]/RyR[16]. Intuitively, I agree; a lesion that rips out the start site would likely be null. The antibody could help with classifying the allele, depending on the part of RyR used as the antigen.

      The RyR<sup>16</sup> mutants were indeed described by Sullivan et al. as hypomorphic and not null. In the Fig. 5 legend we have modified the description to: “…homozygous dRyR<sup>16</sup> mutant embryo…”

      (5) Discussion: "This also suggests that all dRyR isoforms are collectively required for larval muscle function." That sentence does not logically follow the expression information. In order to test that idea, individual isoforms would need to be eliminated or knocked down.

      We agree with this comment and have modified our sentence accordingly:

      “However, whether all dRyR isoforms are collectively required for larval muscle function requires further investigation.”

      Reviewer #3 (Significance):

      The idea that RyR is expressed in many kinds of muscle is put forth as a major conclusion. It is good that the authors report this fact, and the impacts on muscle development documented in Figure 5 are some of the best data in the paper. However, in terms of opening up a new understanding of RyR biology, the impact of this information seems modest. Prior Drosophila work and the work of others studying these channels show that ryanodine receptors are ubiquitous. The fact that there is only one Drosophila RyR gene would lead most scientists to hypothesize that it would be present on the ER surfaces of all kinds of tissues, including different types of muscle. Novel phenotypic information for Drosophila RyR is reported in the study, and this is good. But in terms of the model system, the strength of Drosophila is in using genetic combinations to make refined conclusions. That toolkit is not fully used here; therefore, the paper is mostly descriptive. This study is mostly a single-gene study (dRyR), with isolated exceptions, like Cam knockdown in Figure 5.

      To improve the functional/mechanistic aspect of the manuscript, in the revised version we include in Fig. 5 an analysis of the myogenic role of an additional calcium regulator, the ER calcium pump SERCA.

    1. Author response:

      General Statements

      We thank the reviewers for their careful and supportive reviews of our manuscript. We have addressed all the reviewers' comments and extensively revised the manuscript accordingly.

      During our revisions, we discovered a bug in the code that calculated the linear genomic distance between the captured promoter regions (bait regions) and the promoter-interacting fragments (PIFs). The error inadvertently halved the distance measurements in the output tables. This has been corrected in the revised manuscript and has resulted in updates to Figure 1B and corrected values in the ‘interaction_distance’ and/or ‘interaction_type’ columns of Supplementary Tables 2, 3, 6 and 8. We thank the reviewers for the opportunity to correct this.
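A distance calculation of this kind can be sketched as follows. The authors' code was not available at the time of review, so everything here is an assumption: the tuple layout, the function names, the choice to measure between fragment midpoints, and the convention of returning None for fragments on different chromosomes are all hypothetical.

```python
def midpoint(start, end):
    """Integer midpoint of a genomic interval given in base pairs."""
    return (start + end) // 2


def interaction_distance(bait, pif):
    """Linear distance in bp between a bait region and a PIF.

    Each region is a (chromosome, start, end) tuple. The distance is taken
    between midpoints (an assumption). None is returned when the two regions
    lie on different chromosomes, where a linear distance is undefined.
    """
    if bait[0] != pif[0]:
        return None
    return abs(midpoint(pif[1], pif[2]) - midpoint(bait[1], bait[2]))


# Hypothetical coordinates chosen so the expected value is easy to verify by hand.
bait = ("chr1", 1_000_000, 1_004_000)
pif = ("chr1", 1_500_000, 1_506_000)
distance = interaction_distance(bait, pif)
print(distance)
```

A fixed-value check on hand-computed coordinates like these is one simple way to catch a scaling error such as an unintended halving of the reported distance.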

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      In this article, the authors conducted promoter-capture HiC experiments (pcHiC) in Mouse Cerebellar granule cell progenitors (GCps) and obtained a good 3D genome interaction map of protein-coding gene promoters. This dataset was later integrated with ATAC-seq and ChIP-seq experiments to identify putative enhancer regions within promoter-interacting regions, and with higher base-pair resolution than what is obtained by pcHiC experiments. This set of enhancers is then compared to and presented as being more reliable than those present in the VISTA enhancer database. In addition, ATAC-seq sites and RNA-seq datasets, both obtained in WT and CHD7 KO conditions, are integrated to correlate expression of a set of genes to the chromatin accessibility of their distal enhancer(s), which is believed to be promoted by CHD7. The study is completed by focusing on transcription factor motif analysis on CHD7-regulated enhancers, which shows an enrichment for proneural transcription factors, with special emphasis on Atoh1, found to be frequently co-recruited with CHD7. Data and methods are well detailed and correctly replicated and will be useful as a resource for the community. The overlap obtained between pcHiC experiments and auto-criticized by the authors is very common and expected in this kind of experiment. In general, the conclusions drawn in the article are convincing but some aspects such as comparison to VISTA and the naming of 'enhancers' should be moderated.

      We thank the reviewer for their positive and constructive comments. We have amended the manuscript as indicated in detail below.

      (1) The comparison of pcHiC-identified enhancers vs. VISTA enhancers should be more balanced, as the two approaches have important conceptual differences. Although VISTA enhancers are based on functional annotation, their target genes might not necessarily be correctly assigned based on the distance. On the other hand, putative enhancer regions identified by pcHiC experiments do not rely on functional testing. So both types of information are useful but can be put in perspective.

      We thank the reviewer for making this point. We have amended the text to present a more balanced view e.g. “Using VISTA-designated hindbrain enhancers as an example, we identify the genes most likely regulated directly by these enhancers and update their annotation accordingly.”

      (2) To increase the strength of the paper, it would be preferable that authors include simple functional enhancer assays (e.g. CRISPR deletion of contacting enhancer, luciferase assay) to support their perspective since 3D conformation information in KO condition is lacking in the article. Although ideally these experiments should be better performed for a full demonstration, it would be acceptable to at least include a simple functional assay in the WT context to demonstrate that the regulatory regions obtained by crossing genomic data are real enhancers. This point is even more critical knowing that enhancers lacking classical histone marks (H3K27ac+H3K4me1) have been described. The same comment applies to promoter interacting fragments lacking these marks, that could be missing enhancers (i.e. enhancers without these marks).

      To address this point, we performed luciferase assays to show that putative enhancers identified with our integrated bioinformatic approach (pcHi-C + ATAC-seq + H3K4me1 + H3K27ac) do indeed exhibit enhancer activity. For these experiments, we tested the putative enhancer fragments in SHH-NPD, an immortalized GCp-derived cell line generated by the Fults laboratory (Jenkins et al. 2014). The results of these experiments are included as Suppl. Fig. 1 in the revised manuscript.

      Minor point

      - Figure 5B is lacking labels.

      We apologise for this oversight – labels have now been added.

      Reviewer #1 (Significance):

      This article, when completed with possible revision, will be useful for the community as a resource of experimentally determined putative enhancers in Cerebellar granule cell progenitors. It also provides some insights into the association of CHD7 and Atoh1 in distal regulation in these cells.

      We thank the reviewer for acknowledging the significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, the authors aim to identify active, long-range regulatory interactions in cerebellar granule cell progenitors (GCps). As such, the authors perform promoter capture Hi-C to map long-range interactions for all gene promoters, using cells isolated from P7 mouse brain samples. While the resolution of these maps is limited by the relatively large fragment sizes generated from a 6-bp cutter, the authors combine these interactions with other available published datasets, including from their own previous work, (e.g. ATAC-seq and ChIP-seq) to more precisely map putative enhancers within the long-range interacting regions of captured promoters. The paper further focuses on the importance of transcription factor Atoh1 and chromatin remodeler CHD7 in regulation of these putative enhancers in GCps. The authors suggest a direct interaction between CHD7 and Atoh1 by overexpression and co-immunoprecipitation in human embryonic kidney cells.

      As stated by the authors, this study represents a valuable resource for researchers interested in the identification of enhancers in GCps cells, and their linked target genes. While broadly descriptive, the study does highlight some gene loci of interest and of biological relevance. For example, through integration of previously published datasets, the study resolves which putative regulatory elements at the Reln locus may regulate its activity.

      We thank the reviewer for their supportive comments.

      We provide a summary of our major and minor comments here.

      Major comments:

      (1) The main take-home messages of the manuscript could be more clearly stated in the introduction to help readers understand the main conclusions of the work.

      We have added a sentence to the Introduction to clarify the key take-home messages:

      “We report putative distal regulatory elements for >12,000 genes, identify CHD7- and Atoh1-regulated enhancer elements and show that these factors interact and likely co-regulate the expression of key genes in the GCp lineage.”

      (2) In the discussion, a previous Hi-C dataset is referred to "Reddy et al. annotated 5,175 promoter-enhancer interactions in GCps using Hi-C without enrichment (Reddy, Majidi et al. 2021)." It would be beneficial to compare the interactions identified previously with the current study (5,175 vs 46,428 interactions).

      To address this comment, we have performed an additional analysis and include text, Suppl. Figure 3 and Suppl. Table 13 to show the extent to which the two datasets overlap and diverge. We have also added text to the discussion to highlight the differences and technical considerations between the two approaches and how they complement each other.

      The 5,174 enhancer-promoter (E-P) interactions identified by Reddy et al. were downloaded and intersected with the 46,428 promoter-accessible PIF regions identified in our study. The new Suppl. Figure 3A illustrates that 82% (843/1207) of the genes for which Reddy et al. identify long-range interacting regions are represented in our pcHiC dataset. Our pcHiC data contain information on distal interacting regions and potential enhancer regions for an additional 11,511 protein-coding genes. Suppl. Figure 3B provides an overview of the Reddy et al. E-P interactions that are, and are not, identified in the pcHiC data. We replicate 38% of Reddy et al.’s E-P findings, whilst 53% of the 3229 interactions unique to the Reddy data would not be detected in the pcHiC data for technical reasons resulting from the capture design and analysis protocol. For the remaining interactions that are specific to the Reddy data, we identify other distal regions interacting with those same promoters. Suppl. Table 13 details the full comparison of Reddy’s E-P interactions that are found within our dataset.

      The differences between the two datasets, and the larger number of interactions detected in the pcHiC dataset, likely result from the enrichment for the captured promoters, which enables the detection of interactions that would have been below the detection threshold of the HiC study. In addition, there are notable differences in the analysis strategies for the two datasets, which also contribute to differences in the regions detected. Reddy et al. binned the HiC data into 10 kb regions to identify interacting regions and subsequently used chromatin marks to identify possible enhancer and promoter regions within these large regions. In contrast, we used pcHiC and the CHiCAGO algorithm to identify individual HindIII restriction fragments that are proximal to targeted promoter regions (PIFs), and prioritised those that contain accessible regions. Such regions could represent various types of regulatory elements, such as enhancers, CTCF sites or facilitator regions, independent of their chromatin mark composition, rather than enhancers alone.
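The intersection of one dataset's E-P interactions with another dataset's PIFs can be sketched as below. This is an illustrative assumption, not the authors' analysis: here an interaction counts as replicated when its enhancer interval overlaps at least one PIF recorded for the same gene, and the data layout, function names and toy coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open (chromosome, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def replicated_fraction(ep_interactions, pifs_by_gene):
    """Fraction of (gene, enhancer) pairs whose enhancer overlaps a PIF of that gene."""
    if not ep_interactions:
        raise ValueError("no interactions to compare")
    hits = sum(
        1
        for gene, enhancer in ep_interactions
        if any(overlaps(enhancer, pif) for pif in pifs_by_gene.get(gene, []))
    )
    return hits / len(ep_interactions)


# Toy data (hypothetical genes and coordinates).
ep_interactions = [
    ("GeneA", ("chr1", 100, 200)),    # overlaps the GeneA PIF
    ("GeneA", ("chr1", 5000, 5100)),  # same gene, no overlapping PIF
    ("GeneB", ("chr2", 300, 400)),    # overlaps the GeneB PIF
    ("GeneC", ("chr3", 10, 20)),      # gene absent from the PIF set
]
pifs_by_gene = {
    "GeneA": [("chr1", 150, 900)],
    "GeneB": [("chr2", 350, 800)],
}

fraction = replicated_fraction(ep_interactions, pifs_by_gene)
print(fraction)
```

Because 10 kb bins and single restriction fragments differ in size, the overlap criterion chosen (any shared base, as here, versus a minimum overlap) would affect the replicated fraction.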

      (3) The authors identify an overlap with some of their identified enhancers with those from VISTA. Is this a fair comparison seeing as the enhancer reporters were tested during early embryonic development (e.g. E11.5 and E13.5) and seen to be active in the hindbrain, would these stages be relevant to GCps from P7? Can the authors identify ATAC-seq for example from hindbrain from embryonic stages and determine if the enhancer accessibility profile looks similar to that for the P7 GCps cells?

      We thank the reviewer for this important question regarding the developmental relevance of our VISTA comparison and acknowledge that direct comparison between the time points requires careful consideration. First, to address the question of how similar the chromatin accessibility profiles are between the embryonic and P7 timepoints, we compared the ATAC-seq data from our paper to ENCODE data from the hindbrain. Of the 140 VISTA enhancers that were intersected with the pcHi-C dataset, 119 were identified from the lacZ studies as active in the hindbrain at E11.5, whilst 21 were identified as active at E12.5. We compared ENCODE ATAC-seq peaks from the E11.5 (ENCFF743IYX) and E12.5 (ENCFF198TLF) hindbrain to those of the GCps from P7, both across the entire genome (global accessibility) and specifically within +/- 3MB of the VISTA enhancer regions in the PIFs from the pcHi-C data, to assess the conservation of local accessibility profiles.

      When looking at the global accessibility profile of embryonic hindbrain versus P7 GCps across the whole genome there was a large degree of overlap with ~85% (E11.5) and ~88% (E12.5) of all ENCODE ATAC peaks overlapping with accessible ATAC summit regions from P7 GCps:

      Author response image 1.

      To determine whether this was consistent in the immediate chromatin environment of the VISTA enhancers themselves, we compared the accessibility profiles across timepoints in the local environment surrounding the VISTA enhancers. This local environment was defined as a region extending an additional 3MB on either side of all VISTA enhancer positions found in PIFs. 3MB was chosen because the longest interaction found for a single VISTA element was approximately 2.7MB. Consistent with the global analysis, a similarly high level of overlap of accessible regions between the timepoints was found in the local chromatin environment surrounding the VISTA enhancers found within PIFs in the pcHi-C dataset, with ~87% (E11.5) and ~89% (E12.5) of ENCODE-detected peaks overlapping with accessible ATAC summit regions from P7 GCps.

      Author response image 2.

      Regions +/-3MB of VISTA enhancers in PIFs

      Author response image 3.

      Regions +/-3MB of VISTA enhancers in PIFs

      Genome browser shots at the three example VISTA loci from Figure 1 further support this approach. In addition, we note that a recent study by Chen et al (2024; https://www.nature.com/articles/s41588-024-01681-2), in which capture-HiC was performed at E11.5 on 935 VISTA enhancers across multiple tissues, confirmed that the majority of VISTA enhancer regions (61%) bypass adjacent genes, which is consistent with our nearest gene comparison.

      (4) The co-IP experiment appears to support the conclusion that Atoh1 and CHD7 can interact, however there are bands in lanes where there should not be (i.e. Input lanes 1 and 4 for FLAG blot). It would be recommended to repeat this result at least once. [Expected time 2-4 weeks].

      This experiment has been repeated 3 times with the same result. It is normal for non-specific background bands to appear on Western blots from total cell lysates (inputs), as most antibodies have significant cross-reactivity. The anti-FLAG antibody clearly detects bands above background in lysates where FLAG-tagged CHD7 is expressed. Most critically, despite the presence of non-specific bands in the input, FLAG-tagged CHD7 is only detected in immunoprecipitated samples in two cases: where FLAG-tagged proteins have been precipitated and FLAG-tagged CHD7 is expressed, or where HA-tagged Atoh1 has been precipitated and both FLAG-tagged CHD7 and HA-tagged Atoh1 are expressed.

      (5) The methods section describes analysis of several datasets, however we could not access the code at the time of review. Do the authors intend to make this code available at the time of publication?

      Yes, once the publication is approved, all code will be made available along with conda environment yaml files to replicate the software environment in which the analysis was performed.

      (6) Page 7 "replicate one and two, respectively". Can the authors clarify the number of biological replicates performed for pcHi-C?

      Two biological replicates were performed for pcHiC, which were then bioinformatically combined into a ‘superset’ for CHiCAGO interaction calling, as is standard practice for pcHiC data (see e.g. Cairns et al, 2016). We have revised the text to make this clearer.

      Minor comments:

      (1) Page 3 "controlling the expression of 577 genes in GCps" - the authors do not provide evidence that these enhancers control gene expression directly, this should be reworded.

      Thank you. We have reworded to: “contacting the promoters of 577 genes” to indicate that these were identified using pcHi-C and not functional assays.

      (2) Page 5 "where transient amplifying divisions exponentially expand GCps" - at what stages of embryonic/postnatal development are GCps first detected, and when do they amplify and then differentiate?

      GCps that form the EGL are specified in the rhombic lip from E13.5 (Machold, 2005 and Wang, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 of development (Ben-Arie, 1997). They amplify from this stage, and differentiation, induced by neurogenic factors like NeuroD1, is visible from P0 onwards (Miyata, 1999). We have amended the text to include this additional information: “GCps that form the EGL are specified in the rhombic lip from E13.5 (Ben-Arie et al, 1997; Machold & Fishell, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 (Ben-Arie et al., 1997) of development. They amplify from this stage and differentiation, induced by neurogenic factors like NeuroD1 is visible from P0 onwards (Miyata et al, 1999).”

      (3) Page 7 "identified 164,387 unique and significant interactions" - how is an interaction defined, a single read, or evidenced by a certain number of reads. "promoter interacting fragments or PIFs" - is PIF referring to a single read evidencing an interaction?

      An interaction is defined by the CHiCAGO algorithm. The number of reads needed to score an interaction depends on two components. The first is the distance of the PIF from the promoter, modelled using a distance-dependent component that accounts for the decay of contact frequency with genomic distance. The second models how sequence or other technical artifacts might bias the capture of some fragments compared to others. For each promoter, a background model is generated of the expected number of reads that would be captured based on the above considerations, and if the number of reads for a region exceeds this background model by a certain threshold, the interaction is deemed significant using a p-value-like score. In practice, this means that regions further from the promoter will often require fewer reads to reach significance than regions that are much closer to the promoter. The significant PIFs in the dataset are all evidenced by a minimum of 3 reads in at least one biological replicate. We have included a short explanation of this in the methods of the revised manuscript for clarity.
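
      The logic of a distance-dependent background can be illustrated with a deliberately simplified sketch. This is not CHiCAGO's actual statistical model: the power-law decay, the constant noise term, the Poisson tail test and every parameter value below are assumptions chosen only to show why a distal fragment can reach significance with fewer reads than a proximal one.

```python
import math


def expected_reads(distance_bp, scale=2.0e5, decay=1.0, noise=0.05):
    """Toy background: power-law decay with distance plus constant technical noise.

    All parameter values are illustrative assumptions, not fitted estimates.
    """
    return scale * distance_bp ** (-decay) + noise


def poisson_upper_tail(n_obs, mu):
    """P(X >= n_obs) for X ~ Poisson(mu), summing the lower tail term by term."""
    term = math.exp(-mu)  # P(X = 0)
    cdf = 0.0
    for k in range(n_obs):
        cdf += term
        term *= mu / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


def min_reads_for_significance(distance_bp, alpha=1e-5, max_reads=2000):
    """Smallest read count whose upper-tail probability falls below alpha."""
    mu = expected_reads(distance_bp)
    for n in range(1, max_reads + 1):
        if poisson_upper_tail(n, mu) < alpha:
            return n
    return None


for distance in (10_000, 100_000, 1_000_000):
    print(distance, min_reads_for_significance(distance))
```

      Under these assumptions the read threshold falls as the distance from the promoter grows, which mirrors the behaviour described above.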

      The maximum reads in a single replicate library for a specific PIF was 1557, and the median number of reads per PIF was 17.

      (4) Page 8. What is the distinct between PIFs and "promoter interacting regions (PIRs)"? These could be better defined in the text.

      Thank you for picking up this discrepancy; we were using PIR and PIF interchangeably. We have amended the manuscript to refer to PIFs consistently throughout.

      (5) Figure 1C-F. Labels "Random" and "PIFs" don't line up well with the two bars.

      Thank you, this has been corrected.

      (6) Page 9. Could the authors show some representative images for the "VISTA hindbrain enhancers" (e.g. for Figure 1I-K).

      We have inserted representative images showing in vivo activity of these enhancers in mouse embryos from the VISTA enhancer site.

      (7) Fig 2G, Page 11 "The 12,354 genes that were linked to a PIF containing an ATAC-seq peak were found to have a higher median expression level than the 2,049 genes that had PIFs that did not coincide with ATAC-seq peaks" - is this significant?

      Apologies for this oversight. We have performed a two-sided t-test on the log-transformed TPMs between the two groups and have included the significance in the revised figure (p = 1.8e-40).
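
      A comparison of this kind can be sketched as follows. Several details are assumptions, since the response does not specify them: the unequal-variance (Welch) form of the statistic, the log2(TPM + 1) transform, and the use of a normal approximation for the two-sided p-value, which is only reasonable for large groups such as the thousands of genes compared here. The function name and the toy TPM values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t_on_log_tpm(tpm_a, tpm_b, pseudocount=1.0):
    """Welch t statistic and approximate two-sided p-value on log2(TPM + pseudocount).

    The p-value uses a normal approximation to the t distribution, which is
    adequate only when both groups are large.
    """
    a = [math.log2(x + pseudocount) for x in tpm_a]
    b = [math.log2(x + pseudocount) for x in tpm_b]
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / standard_error
    # erfc keeps precision for very small tail probabilities.
    p_two_sided = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_two_sided


# Toy TPM values (hypothetical), far smaller than the real gene sets.
with_atac = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0]
without_atac = [1.0, 2.0, 1.0, 3.0, 2.0, 1.0]
t_stat, p_value = welch_t_on_log_tpm(with_atac, without_atac)
print(t_stat, p_value)
```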

      (8) "Gene Ontology analysis of genes with accessible PIFs revealed a significant enrichment for 119 biological processes" - can you include the GO terms in a supplementary table? Is there a way to prioritise down the 12,354 genes to a shorter more significant list of genes, this seems a long list to include in GO analysis.

      We have included a supplementary table with this data in the revised manuscript (Suppl. Table 6). We included all 12,354 genes in this analysis as the point of this analysis was to demonstrate that developmental processes are enriched in the PIFs with accessible chromatin, compared to the genes where only PIFs without ATAC were identified.

      (9) Page 11 - "The chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum (Whittaker et al., 2017b) and deletion of Chd7 from GCps results in striking cerebellar hypoplasia and polymicrogyria (Feng et al., 2017; Reddy et al., 2021; Whittaker et al., 2017b). CHD7 haploinsufficiency is also sufficient to cause cerebellar hypoplasia and foliation defects both in mouse models and in the context of CHARGE syndrome in humans (Whittaker et al, 2017a; Yu et al, 2013)." - this appears more suitable for the introduction.

      Thank you, we have moved this text to the Introduction.

      (10) Page 12 "the majority of which (4,663/5,369) displayed decreased accessibility when Chd7 is depleted". This was difficult to understand initially - which are expected to be the direct effects? Increased or decreased accessibility? Perhaps it would be better to focus only on the decreased accessibility sites?

      We have previously shown that the majority of differentially accessible regions in Chd7-deficient GCps show decreased accessibility. Chromatin remodelling by CHD7 could conceptually reduce or increase accessibility of a particular locus, and the only way to infer direct effects is by identifying regions to which CHD7 is recruited.

      Approximately 10% of the sites that decreased in accessibility overlapped with regions bound by CHD7 (464/4663), whilst ~2% of sites that increased in accessibility overlapped with regions of CHD7 binding (14/706). Whilst it is likely that the majority of directly regulated sites decrease in chromatin accessibility when CHD7 is removed, the number of sites that increase in accessibility is small but observed and should be included for completeness.
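
      The overlap percentages can be recomputed directly from the counts quoted in this response and in the reviewer's comment (4,663 of 5,369 regions with decreased accessibility). The variable names are illustrative only.

```python
# Counts quoted in the response and in the reviewer's comment.
decreased_total, decreased_bound = 4663, 464
increased_total, increased_bound = 706, 14
all_differential = 5369

# The two classes should account for every differentially accessible region.
assert decreased_total + increased_total == all_differential

pct_decreased_bound = 100 * decreased_bound / decreased_total
pct_increased_bound = 100 * increased_bound / increased_total

print(f"CHD7-bound among decreased sites: {pct_decreased_bound:.2f}%")
print(f"CHD7-bound among increased sites: {pct_increased_bound:.2f}%")
```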

      (11) The analysis in Fig 3A reveals that only a small number of CHD7-bound enhancers show differential accessibility and altered linked gene expression upon CHD7-knock down. This requires a little more discussion - why do so many sites change in accessibility compared to the number of sites which change accessibility or are associated with gene expression change?

      Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, the integration of this data with ATAC-seq accessibility, chromatin modification and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect. We have added the following text to the discussion to indicate this: “Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, integrating CHD7 ChIP-seq data with ATAC-seq accessibility, histone modification ChIP-seq and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect, as suggested by the data in Fig. 3A.”

      (12) Page 12 - "Over-representation analysis confirmed an enrichment of genes linked to nervous system development" - could this and the GO term analysis be included in a supplementary figure?

      We have included these results as Suppl. Table 7 in the revised manuscript.

      (13) Fig 3D - what does the arrow represent in the chromatin schematic?

      The arrow in the schematic indicates chromatin remodelling – we have clarified this in the figure legend and added headings to these panels to indicate the 3 different types of elements: Direct CHD7 targets, Indirect targets and CHD7-bound elements.

      (14) Fig 3G does not appear to be referenced in the text. The value of the Upset plots in the main figure 3 wasn't very clear, perhaps these could be moved to the supplement? Is there a clearer plot to support the conclusion "CHD7 primarily regulates enhancers".

      We apologise; the panels were mislabelled in the text. This has now been corrected. We hope that the amendments in response to point 13 above now clarify these findings, showing that direct CHD7 targets are characterised by active enhancer marks.

      (15) Page 14 "putative consensus sites for proneural bHLH TAL-family of proteins Neurog2, Neurod2, Neurod1, and, Atoh1 in elements" - HOCOMOCO motifs are only shown for Atoh1 and Nhlh1. It may be valuable to show the sites for all the listed TFs. What does white represent in the heatmap in Fig 3H? This plot is difficult to interpret, and also relatively small in the figure but appears important to conclusions. Perhaps Fig 3H could be made more prominent?

      Thank you for highlighting that the white boxes might be confusing. The white blocks indicate that these motifs do not pass the threshold for significant enrichment in the dataset based on the p and q values. This has now been clarified in the figure legend.

      We have enlarged panel H to make it more prominent.

      (16) Page 15 - "Myb was the only motif specific to CHD7 bound regions that changed in accessibility compared to those that exhibited accessibility changes without CHD7 binding or CHD7 binding without accessibility changes (Suppl. Fig. 1)." I couldn't interpret this sentence, requires clarifying.

      We agree that this description is confusing and since it is difficult to draw clear conclusions about the significance of enhancers with Myb motifs in this context, we have removed this sentence from the revised manuscript.

      (17) Page 16 and Fig 4B - a discussion of why both up and down regulated genes are detected for Atoh1 depletion? Which class of genes are expected to be directly regulated (the down-regulated genes)?

      Like most transcription factors, ATOH1 may be able to function as both a repressor and an activator depending on the context. Although the majority of genes are downregulated in Atoh1-deficient cells, suggesting that Atoh1 functions as an activator in most cases, our analyses have identified several up-regulated genes that contain Atoh1 ChIP-seq peaks in their cognate enhancers (see Suppl. Table 7), consistent with these also being direct Atoh1 targets.

      (18) Fig 5B - the genomic traces are not labelled in this figure.

      Thank you, labels have been added.

      (19) Page 17 - "Pathway enrichment analysis of the 22 genes compared to all genes that were expressed in GCps shows a significant enrichment of terms: Hypoplasia of the pons (HP:0012110 P=0.006) and Abnormal pons morphology (HP:0007361 P=0.016) from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2." - this analysis should be included in the supplementary tables.

      These results have been included as Suppl. Table 12 in the revised manuscript.

      (20) Do the authors have a suggestion for which domains of Atoh1 and CHD7 could be interacting? Could the authors design truncated constructs for overexpression in HEK cells to test this hypothesis? [Expected time 4-6 weeks, interesting but not essential to do experimental work here].

      We agree this is an interesting question. Our collaborator, Professor Peter Scambler (UCL), has performed a yeast two-hybrid screen for CHD7-interacting proteins in a mouse E11.5 library using the CHD7 BRK domain (aa 2521-2708) as bait. The screen had a single hit, which encompassed the N-terminal 127 aa of ATOH1 (personal communication). This observation supports our co-IP data and suggests that the N-terminus of ATOH1 interacts with the BRK domain of CHD7, but further validation will be needed to confirm this.

      (21) Page 28 "Differential accessibility analysis was performed using DESeq2 (v 1.22.1)" and Page 19 "Whereas chromatin accessibility at some of these enhancers were affected by Chd7-deficiency" - what were the cutoffs used for looking at differentially accessible regions? Complete loss of accessibility or a quantitative change?

      Quantitative change rather than complete loss was used. Thresholds based on adjusted p-values (padj<0.05) were used as indicated in the methods.
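
      DESeq2 reports Benjamini-Hochberg adjusted p-values by default, so a padj < 0.05 cutoff amounts to the procedure sketched below. This is a generic illustration of the adjustment in Python, not the authors' R code; the function name and toy p-values are hypothetical, and DESeq2's additional independent filtering step is not modelled.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values for four regions (hypothetical).
raw = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(padj) if q < 0.05]
print(padj, significant)
```

      A region is then called differentially accessible on the basis of its adjusted p-value, regardless of whether accessibility is lost completely or only reduced.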

      Requested comments on referencing:

      - "Long-range" - how do the authors define long-range? Can this be referenced. CO? good reference here.- look to CHiCAGO paper

      - "When chromatin conformation or 3D organisation data is not available, studies typically assign regulatory elements to the nearest gene promoter" - needs referencing.

      - "Many of these 22 genes regulated by CHD7 and Atoh1 have established critical roles in cerebellar development, including Neurod2, Pax6 and Gli2 (Fig. 5B)" - needs referencing. "from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2" - needs referencing.

      Thank you, references have been added.

      - "active enhancers (H3K27ac+, H3K4me1+), promoters (H3K27ac+, H3K4me3+), regulatory elements (H3K27ac+, H3K4me1+, H3K4me3+), or poised enhancers (H3K4me1+)" - needs referencing.

      Thank you, references have been added.

      - Reference required in main text for VISTA (e.g. Visel et al., 2007)

      Thank you, reference added.

      Reviewer #2 (Significance):

      The strengths of this manuscript are the integrated approach to identify cell-type specific enhancers utilizing available epigenomic datasets, and leveraging 3D genome topology to directly link them to their target genes. For example for the Reln gene previously implicated in cerebellar phenotypes for CHD7 mutants. The pcHi-C dataset generated in this study provides a valuable reference for the community of enhancer-promoter pairs for a specific cell-type of interest with human disease relevance.

      We thank the reviewer for recognising the potential value of our work to the community.

      The limitations of the study are partially addressed in the text by the authors, including the resolution from the pcHi-C using a 6-bp cutter, the limitation of sequencing depth (more interactions may have been identified with more depth), and the limited correlation between replicates (likely due to undersampling the library). Page 9 "some additional interactions with the nearest gene promoters might be identified in our pcHi-C dataset with deeper sequencing".

      We thank the reviewer for highlighting our acknowledgements of the potential limitations of our work.

      Additional limitations include the use of the VISTA browser mouse LacZ embryos to validate some of their enhancers, the limitation here being that the VISTA browser tests enhancers at embryonic stages (focused at E11.5 and E13.5) while the GCps cells were collected at P7. The LacZ images from VISTA are also not shown. The HEK cells used for the co-IP could be seen as a limitation as these are not relevant cells for the cell state studied, the authors could clarify their use of these cells.

      We thank the reviewer for their careful assessment of the limitations of our study. We have now included images of the VISTA enhancers in Fig. 1I,J,K. Rather than a limitation, using irrelevant cells for co-IP might be seen as a better approach, as the chance that an apparent interaction between the two proteins is indirect and mediated by a bridging complex is conceivably lower in an irrelevant cell type that might not contain such complexes. Either way, HEK293T cells are the standard laboratory model for co-IP studies, as they can be transfected with ease.

      The study reported here is largely based on previous work from the authors (Whittaker et al 2017b). This study reported that the chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum and deletion of CHD7 from GCps resulted in the phenotype of cerebellar hypoplasia. This study also largely leverages previously published datasets from the Whittaker et al 2017b (e.g. CHD7 deletion data) and reanalyses it in the light of the new pcHi-C datasets.

      This manuscript will be of interest to researchers interested in analysing long-distance targets of as well as researchers trying to understand the precise gene regulation in cerebellar development. It may also be of interest to clinical geneticists to interpret novel putative non-coding disease mutations.

      We thank the reviewer for highlighting the wide interest of our manuscript.

      In assessing this manuscript, my expertise lies in models of human development and gene regulation, with a focus on enhancer function.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Riegman et al have explored the gene regulatory landscapes of cerebellar granule cell progenitors (GCps). They have generated promoter capture Hi-C data to identify regions that interact with promoters in these cells. In addition, they generate ATAC-seq data in wild-type and CHD7 knock-out cells. They integrate these data to identify enhancers that potentially regulate genes in GCps. In addition, the authors identify an interaction between CHD7 and ATOH1, whose binding sites also overlap in the genome.

      The dataset can be potentially interesting for people studying cerebellar development.

      I have a few concerns regarding the paper. The most pressing one is that the authors seem to equate interactions in pcHi-C with regulation. This is problematic for two reasons. First whether interaction equates regulation is still debated and whether this can be detected with a low-resolution C-method (i.e. using HindIII) is a further point of contention.

      We thank the reviewer for pointing this out. We agree and apologise for not being clear in our manuscript. We have made the necessary amendments to indicate that pcHi-C by itself only assesses proximity in the nucleus, not function.

      We acknowledge the limitations of the pcHi-C method, including that resolution is limited by the use of a restriction enzyme. However, we (see e.g. Suppl. Fig. 1) and others (see e.g. Freire-Pritchett et al (2017) and Mifsud et al (2015)) have used this approach successfully to identify functional enhancer elements.

      The second issue has to do with the way the pcHi-C data is interpreted. What is detected as a significant interaction by CHiCAGO are regions that have a contact frequency above background. This means that local regions with a (much) higher contact frequency may not be called as significant. When we follow the logic that contact frequency is related to gene activation (which may not necessarily be true), whether a fragment is more frequently contacted than the background should not matter (relative contact frequency); rather, it should be interpreted based on the absolute contact frequency.

      The reviewer is right that local regions will have a higher contact frequency and that local contacts aren’t always captured by the CHiCAGO model. However, the purpose of this study was to prioritise the identification of distal elements that are not captured by existing methods including nearest gene annotation.

      There are a number of reasons why absolute contact frequency might not be an appropriate measure to infer gene regulation: 1) Many factors can affect the absolute contact frequency, including the proportion of cells in a population that are exhibiting active transcription at that time, especially if expression is limited to a small number of cells. 2) Absolute contact frequency assumes that more contact results in more regulation, which is not necessarily true and would depend on the combination of factors that are associated with that regulatory element. Figure 1 of https://www.nature.com/articles/s41596-023-00817-8 (Micro capture C) shows that regions with low absolute contact frequency compared to adjacent regions have the potential to regulate gene expression, as have other studies that have used CHiCAGO to identify regulatory elements. 3) The sequence of some fragments makes them more likely to be captured or enriched in the HiC protocol, which the relative contact frequency above background controls for.

      This becomes relevant because the authors claim that 80% of enhancers are wrongly annotated based on their metrics. The only way to correctly annotate an enhancer is to knock it out and check the effect on genes in the vicinity. Therefore, the claim that their method can correctly annotate enhancers is grossly overstated, particularly when considering the issues with contact frequency stated above. Claims like 80% of enhancers are wrongly annotated should therefore be removed from the paper. The authors should discuss in the Discussion how to annotate enhancers and what the proper method for annotation is.

      We have amended the text to indicate that we do not suggest that VISTA enhancers are wrongly annotated but incompletely assigned. We apologise for making this suggestion in the first draft. There is, however, complementary evidence from Chen et al (2024), now referenced in the revised manuscript, which also finds that 60% of the VISTA enhancers skip their adjacent gene. It is also well established in the literature that the nearest gene is not always the one that is regulated.

      Other points:

      - The authors claim that PIFs have 2.14 and 2.69 fold enrichment of H3K4me1 and H3K27ac sites. Did the authors use the whole genome as background? If so, they should take into account that promoters are more likely in regions of high gene density, which are more dense in active marks. It would be better to perform local, circular permutation of the PIFs around the promoter.

      The reviewer is correct that a whole genome background is not an appropriate background for testing enrichment of active marks within PIFs. Fortunately, this is taken into account in the CHiCAGO enrichment test which selects the background from fragments that are matched to the same distance of the PIFs to account for the observation that promoters are more likely in regions of high gene density and are therefore more enriched for active chromatin modifications.
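
      For readers unfamiliar with the reviewer's suggestion, a local circular permutation test could be sketched as below. This illustrates the general idea only; it is neither the CHiCAGO enrichment test nor the authors' analysis. The window, the clipping of fragments at the window edge, the function names and the toy coordinates are all assumptions.

```python
import random


def count_overlaps(fragments, marks):
    """Number of fragments overlapping at least one mark (half-open intervals)."""
    return sum(
        any(f_start < m_end and m_start < f_end for m_start, m_end in marks)
        for f_start, f_end in fragments
    )


def circular_shift(fragments, offset, window_start, window_end):
    """Shift every fragment by the same offset, wrapping starts within the window.

    Simplification: a fragment that would run past the window end is clipped
    rather than split across the wrap point.
    """
    length = window_end - window_start
    shifted = []
    for f_start, f_end in fragments:
        new_start = window_start + (f_start - window_start + offset) % length
        new_end = min(new_start + (f_end - f_start), window_end)
        shifted.append((new_start, new_end))
    return shifted


def permutation_enrichment(fragments, marks, window, n_perm=1000, seed=0):
    """Observed/expected overlap ratio and empirical one-sided p-value."""
    rng = random.Random(seed)
    start, end = window
    observed = count_overlaps(fragments, marks)
    null_counts = [
        count_overlaps(
            circular_shift(fragments, rng.randrange(1, end - start), start, end),
            marks,
        )
        for _ in range(n_perm)
    ]
    expected = sum(null_counts) / n_perm
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_perm + 1)
    enrichment = observed / expected if expected else float("inf")
    return enrichment, p_value


# Toy locus (hypothetical): both PIFs sit on histone-mark peaks.
window = (0, 100_000)
marks = [(10_000, 11_000), (50_000, 51_000)]
pifs = [(10_200, 10_700), (50_200, 50_700)]
enrichment, p_value = permutation_enrichment(pifs, marks, window)
print(enrichment, p_value)
```

      Because all fragments are shifted together within the locus, the spacing between them and the local density of marks are preserved in the null distribution.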

      - The authors talk about "lead PIF", which is the fragment with the "most significant CHICAGO score". What does this mean? Something is significant or not, despite common misuse of the term there is no gradient of significance.

      The reviewer makes a good point here and we apologise for the oversight in wording. We have corrected the text to specify that the lead PIF is the one with the highest CHiCAGO score.

      - In the GO analysis the categories with the lowest p-value are presented, but this biases for large categories. It would be more relevant to also select for and show the enrichment scores.

      We agree with the reviewer that a drawback of GO analysis is that it biases for large categories. If by ‘enrichment score’ the reviewer means the –log10(p-value), we have included this in the supplementary tables, which also include the size of each category and the number of genes detected in it.

      Reviewer #3 (Significance):

      The study provides a dataset that may be interesting for people studying cerebellar development. In that sense the data is mostly interesting from a fundamental viewpoint. The data seem of good quality.

      The authors claim that a very sizeable fraction of enhancers are misannotated, but I do not believe that this is correct.

      We thank the reviewer for pointing this out. We apologise for creating the impression that VISTA enhancers are incorrectly annotated. We have amended the text to reflect that these are incompletely annotated.

      My expertise is 3D genome, bioinformatics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study concerns the propagation of waves in bacterial biofilms, bridging active matter physics and bacterial biophysics. While the experimental observations are solid, the theoretical interpretation and model validation are currently incomplete and require further refinement. This work will be of interest to microbiologists, biophysicists, and researchers studying collective behavior in biological systems.

      In the revised manuscript, we have added new experimental results that strengthen the connection between our observations and the modeling framework used to interpret the collective oscillations. We have not introduced a new theoretical model; rather, we employed established active matter models and sought to link the observed phenomena to these frameworks. In particular, our new data demonstrate that the transition between the motile and biofilm-forming states specifically modulates the elasticity and elasto active coupling of the bacterial structure. This behavior is in excellent agreement with the predictions of the active solid model. All the experimental details are given below. We believe that the revised version of the manuscript now establishes this connection more clearly and convincingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting paper. The authors have found multiple experimental knobs to perturb a mechanical wave behavior driven by pili feedback. The authors framed this as nonreciprocal interactions - while I can see how nonreciprocity could play a role - what about mechanical feedback? Phenomenological models are fine, but a lack of mechanistic understanding is a weakness. I think it will be more interesting to frame the model based on potential mechanochemical feedback to understand microscopic mechanisms. Regardless, more can be done to better constrain the model through finding knobs to explain experimental observations (in Figures 3, 4, 5, and 7).

      We thank the reviewer for the positive assessment and for highlighting this important point. The reviewer is correct that the phenomenological Kuramoto-based model does not explicitly show the detailed cell–cell interactions. However, the active solid model is formulated on detailed elastic couplings and active forces, which inherently represent mechanical feedback within the biofilm structure. In this framework, nonreciprocity emerges naturally from the tensorial nature of active forces between bacteria—a concept already well established in the active matter literature. Importantly, this mechanism is purely mechanical and closely parallels nonreciprocal hydrodynamic interactions among active particles, which also arise from tensorial couplings.

      In our system, elastic interactions within the biofilm matrix, combined with pilus-generated active forces, provide a natural origin for nonreciprocal interactions. To further validate this, we improved our imaging to record single-cell dynamics both at the colony edge and on the biofilm surface (new Supplementary Video). These experiments show that motile bacteria at the leading edge of the biofilm structure do not generate waves, whereas stationary bacteria within the biofilm display local oscillations within the elastic network. This observation supports the view that collective oscillations are a property of the elastic biofilm state rather than of freely motile cells.

      Moreover, the main control parameter for these oscillations is the ratio between elastic strength and the active force generated by pili. In the active solid model, this ratio is captured by the parameter π and alpha terms. Experimentally, we can tune this ratio simply by adding or removing water from the biofilm, thereby modulating its elasto-active coupling. We further demonstrated the controllability of this feature experimentally. We let the plate dry nonuniformly and observed that the transition between spiral, target and plane waves could emerge spontaneously across the plate (see Figure 3a). This observation also underscores the importance of moisture in the biofilm. Starting from this point, we established the connection between experimental observation and modelling. In our new simulations, we also noticed that the transition from spiral to target waves is driven in particular by the merging of spiral pairs with opposite topological charges (+/- 1). This critical point was also confirmed by modelling, which links the process to elasto-active coupling. We further supported our claim by imaging the edge and the biofilm structure. These new results clarify that the elastic structure of the biofilm is critically important (Supplementary Figure 3). We have clarified this mechanistic link in the revised manuscript and rewritten the relevant sections to make this connection explicit.

      Modification in the manuscript:

      “To gain deeper insight into the mechanisms underlying wave formation, we imaged the dynamics of individual bacteria from the fingering regions toward the center of the biofilm. This distinction is critical because, unlike the biofilm center, the edges do not generate waves. We observed that bacteria near the fingering regions remain motile and exhibit collective flow. In contrast, bacteria at the biofilm center are surface-attached and undergo periodic lifting motions. This behavior strongly resembles Mexican-wave dynamics.”

      “We further found that the central region of the biofilm is mechanically more elastic, whereas the edge regions—where wave formation is absent—are motile. These observations suggest that gradual biofilm maturation is a key factor that transforms motile bacteria into a periodically moving but spatially constrained state. Consistent with this picture, the PAO1 strain, which has a strong biofilm-forming capability, completely suppresses surface oscillations. In contrast, the PA14 strain exhibits intermediate behavior, sustaining a partial transition between motile and locally constrained dynamics. Remarkably, signatures of this transition and wave generation are already detectable at the earliest stages of finger formation.”

      Strengths:

      The report of mechanical waves in bacterial collectives. The mechanism has potential application in a multicellular context, such as morphogenesis.

      We thank the reviewer for the positive assessment and for highlighting this potential broad impact of our findings.

      Weaknesses:

      My most serious concern is about left-right symmetry breaking. I fail to see how the data in Figure 6 shows LR symmetry breaking. All they show is in-out directionality, which is a boundary condition. LR SM means breaking of mirror symmetry - the pattern cannot be superimposed on its mirror image using only rigid body transformations (translation and rotation) - as far as I am aware, this condition is not satisfied in this pattern-forming system.

      We thank the reviewer for pointing out this critical issue. We acknowledge that we overlooked the distinction between biological and physical definitions of left–right symmetry in our initial submission, and we agree that our terminology was confusing.

      In developmental biology, the term “left–right symmetry breaking” is often used to describe asymmetric flows generated by nodal cilia, which subsequently establish developmental asymmetry. This usage differs fundamentally from the physical definition of mirror symmetry breaking, which refers to chirality switching upon mirror reflection. As the reviewer correctly noted, our system does not exhibit mirror symmetry breaking in this strict physical sense.

      To avoid confusion, we have revised the manuscript and replaced the term left–right symmetry breaking with left–right asymmetry between the edge and the center of the biofilm. This asymmetry arises from frequency gradients across the biofilm and is not a trivial boundary effect. For circular colonies, this phenomenon is more accurately described as radial asymmetry. We have rewritten the relevant sections of the manuscript to clarify this distinction and prevent misinterpretation.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Altin et al. examines the dynamics of bacterial assemblies, building on previously published work documenting mechanical spiral waves. The authors show that the emergent dynamics can be influenced by various factors, including the strain of bacteria and water content in the sample. While the topic of this paper would be of broad interest, and the preliminary results are certainly interesting, various aspects of this paper are underdeveloped and require further exploration.

      Strengths:

      One of the nice features of this system is the ability to transition between the different states based on the addition or withdrawal of water. The authors use a similar experimental model system and mathematical model to previously published work (Reference 49), but extend by showing that the behaviour can be modified through simple interventions. Specifically, the authors show that adding water droplets or drying the sample through heating can result in changes in the observed wave structure. This represents a possible way of controlling active matter.

      The mathematical model proposed in this paper involves a phase-oscillator model of Kuramoto-style coupling (similar to previously reported models). A non-reciprocal phase lag is introduced in order to facilitate the patterns seen in experiments. The qualitative agreement in the behaviour is quite striking, showing both spiral waves and travelling waves.

      We thank the reviewer for the positive assessment and for pointing out areas that required further development. The reviewer is correct that our work builds on previously reported bacterial spiral wave systems; however, there are several significant differences that we now emphasize more clearly in the revised manuscript.

      First, our study involves a different bacterial species and reveals a distinct dynamical process: the waves we report are strictly localized on the surface of the biofilm, in contrast to the bulk oscillations detected through density fluctuations in the earlier work (Ref. 49). The surface waves in our system resemble “Mexican wave”-like motions, in which surface bacteria periodically lift upward. To highlight this key distinction, we performed new imaging experiments that directly visualize this process (new Videos 5 and 6; Author response image 1).

      Second, we systematically compared different bacterial strains, including pathogenic species such as P. aeruginosa PA14 and PAO1, alongside our BSL-1 strain. This comparative approach demonstrates that the observed phenomenon spans strains with different pathogenicity levels and genetic variations, while also showing that our strain provides a safer and more broadly usable model system for laboratory investigations.

      Third, the modeling frameworks differ. Whereas the referenced study relied primarily on phase models similar to those used in cilia systems, we combine a delayed Kuramoto-style oscillator model with an active solid model. This combination provides both a phenomenological description and a physical interpretation of the collective dynamics. We acknowledge that, in the original submission, the physical interpretation of the model in relation to our experimental system was underdeveloped. In the revision, we have now established this link explicitly through the elasticity and elasto-active coupling of the biofilm. Specifically, we show that the transition from motile to biofilm states is accompanied by changes in elasticity, which directly influence the observed transitions between different types of wave defects. This connection is consistent with prior theoretical work and has so far been studied only in robotic active matter systems.

      Together, these clarifications and new results reinforce the novelty of our findings and establish a stronger connection between the experiments and the modeling framework.

      Author response image 1.

      Comparison between the elastic biofilm core and the motile colony edge. High-resolution video recordings revealing individual bacterial motion highlight the key physical differences driving wave generation. Time-lapse snapshots show that bacteria at the colony edge move freely and form fingering structures, whereas bacteria in the elastic central biofilm periodically lift vertically, producing a Mexican-wave–like collective motion across the surface. See new Video.

      Weaknesses:

      The principal observation of the paper - that spiral waves emerge in these systems and can be controlled in various ways - is not linked to microscale dynamics at the cell level. It is recognised that hydrodynamics can introduce non-reciprocity, an essential ingredient of this model. However, in this work the authors have not identified a physical mechanism for the lag, e.g., either through steric interactions or hydrodynamic disturbances. This is also relevant in the phase oscillator modelling section. In low Reynolds number flows, dynamics are instantaneously determined. In this light, what does the phase lag term represent?

      The reviewer is correct that, at low Reynolds numbers, fluid dynamics are instantaneous and do not generate real temporal delays. However, nonreciprocity in hydrodynamic interactions can still emerge from the tensorial structure of the Blake–Oseen Green’s function. In this formalism, the effective asymmetry can be represented mathematically as a phase-lag–like term. This has been theoretically demonstrated in Ref. 40. While this is not a literal time delay, it functions analogously by breaking odd symmetry in the coupling.

      In our system, strong long-range hydrodynamic interactions are absent, as the bacteria are embedded in an elastic biofilm matrix. Instead, the dominant interactions are active elastic couplings mediated by pili and biofilm structure. The elastic solid model behaves in a way that is conceptually similar to the hydrodynamic case: pili-induced deformations of the elastic medium produce anisotropic stresses that play a role analogous to the tensorial hydrodynamic Green’s function. Thus, the phase-lag term in our Kuramoto-based model can be interpreted as an effective representation of these nonreciprocal elastic interactions.

      We have clarified this point in the revised manuscript by explicitly connecting the phenomenological phase-lag term to the underlying elastic coupling in biofilms.
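
      To make the role of the phase-lag term concrete, the generic Sakaguchi–Kuramoto form of a phase-lagged oscillator model is written out below. This is a sketch under assumptions: the manuscript's exact equations are not reproduced in this response, and the neighbour set \mathcal{N}_i and the use of a single lag α for all pairs are illustrative choices rather than the authors' stated model.

```latex
\dot{\phi}_i = \omega_i + b \sum_{j \in \mathcal{N}_i} \sin\!\left(\phi_j - \phi_i - \alpha\right),
\qquad
\sin(\Delta - \alpha) = \cos\alpha \,\sin\Delta \;-\; \sin\alpha \,\cos\Delta ,
\quad \Delta = \phi_j - \phi_i .
```

      The first term of the decomposition is odd in Δ and therefore reciprocal, whereas the second term is even in Δ and is present only when α ≠ 0. In this reading, a nonzero lag is what breaks the odd symmetry of the coupling without implying any literal time delay.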

      What is the origin of the coupling term, b? Can this be varied systematically or derived from experimental measurements or parameters?

      The term b represents the enhanced elasto-active coupling of the pili process. Pili vary in length, and elongated pili have more potential to modulate the coupling between bacteria, which is known to depend on a critical threshold. This process resembles pinning dynamics and is driven by the activity of molecular motors within the pili machinery. However, the detailed mechanisms that set the effective coupling strength remain highly complex and are not yet fully understood.

      At present, we do not have a direct way to systematically manipulate b in experiments. A major technical limitation is the nanoscale nature of type IV pili: these protein assemblies are extremely small and difficult to monitor or manipulate directly. Even basic tools such as GFP-based labeling have proven challenging to implement, which restricts our ability to track the detailed dynamics of these structures in live biofilms.

      While we cannot currently derive b directly from experimental parameters, we emphasize in the revised manuscript that b should be understood as an effective parameter capturing the excitability of pili retractions. We also highlight this limitation and note that future advances in molecular imaging and manipulation of pili will be essential for quantitatively linking b to microscopic processes.

      Classification of wave properties is an important aspect of this paper, but is not accomplished in a quantitative sense. What is the method for distinguishing between travelling and spiral waves? There is a range of quantitative tools that could be used to investigate these dynamics (and also compare quantitatively with the models). For example, examining the correlation functions and order parameters could assist with the extraction of wave features (see extensive literature on oscillator models).

      We thank the reviewer for emphasizing this important point. In the revised manuscript, we have incorporated the classic Kuramoto order parameter (S) to characterize the dynamics in our model simulations. However, this metric is not directly applicable to our experimental system, because we cannot resolve the phase of individual bacteria at large scales.

      Instead, we have focused on a flux-based parameter, as previously used in Ref. 40, which can be measured experimentally from collective surface dynamics. Interestingly, we find that the directional flux extracted from our experimental movies closely matches the trends predicted by the model order parameter. We suspect that this similarity arises from the combination of our optical illumination method and the characteristic surface modulations of the biofilm. We currently lack a rigorous theoretical justification for this correspondence, so we prefer to keep this discussion in the review document.

      In summary, we now use the classic Kuramoto order parameter in simulations and rely on the experimentally accessible flux measure for our experimental data. This dual approach allows us to compare model and experiment in a consistent manner.
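
      For readers unfamiliar with the metric, the following minimal sketch computes the classic Kuramoto global order parameter from a set of phases. The function name and the two synthetic phase configurations are illustrative assumptions and are not taken from the authors' simulation code.

```python
import numpy as np

def kuramoto_order_parameter(phases):
    """Return (S, psi): magnitude and mean phase of the complex order parameter.

    S = |(1/N) * sum_j exp(i * phi_j)| lies between 0 and 1.
    """
    z = np.mean(np.exp(1j * np.asarray(phases, dtype=float)))
    return np.abs(z), np.angle(z)

# Synthetic case 1 (assumption): all oscillators share the same phase.
S_sync, _ = kuramoto_order_parameter(np.full(100, 0.3))

# Synthetic case 2 (assumption): a wave whose phases evenly span one full cycle.
wave_phases = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
S_wave, _ = kuramoto_order_parameter(wave_phases)
```

      In this toy comparison the synchronized configuration gives S close to 1, while the evenly spread wave gives S close to 0. This illustrates why a global order parameter alone does not distinguish well between travelling-wave states and motivates a complementary measure such as the flux.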

      Author response image 2.

      Critical order parameters of the coupled biofilm system. (a) The Kuramoto global order parameter increases continuously as the system becomes globally synchronized. In contrast, in the nonreciprocally coupled system the order parameter saturates at a critical level. (b) In the experimentally observed biofilm, however, the flux generated by the coupled oscillations provides a more appropriate measure of synchronization. Blue curves indicate directionally propagating planar waves, red curves correspond to spiral wave formation, and green curves represent the globally synchronized reciprocal system.

      Author response image 3.

      Comparison of flux profiles of the simulations with experimental measurements. Directional optical illumination enhances the flux term on the surface of the biofilm.

      The methodology of changing the dynamics through moisture content appears to be slightly underdeveloped, e.g., adding water involves a droplet, and removing water is accomplished by heating (which presumably could cause other effects). Could the dynamics not be controlled more directly by varying the humidity?

      We thank the reviewer for this valuable suggestion. Our results indicate that water content in the biofilm plays a key role in driving the transition to the biofilm state by modulating its elasticity. During the initial submission, we did not know how to systematically vary humidity without simultaneously altering temperature. Standard approaches typically involve water evaporation in controlled chambers, which inherently changes both parameters.

      Following the reviewer’s recommendation, we first measured the ambient moisture levels inside closed culture plates. To our surprise, the relative humidity was already ~98%, leaving virtually no room to increase it further. We then attempted to decrease humidity by flowing dry synthetic air, but even under these conditions we could not reduce it below ~85%, and achieving this required unrealistically high flow rates. Moreover, we noticed that in closed-lid NGM plates, evaporation is already substantial, and when the lid is left open the evaporation rate reaches ~1 µm/s. This rapid surface thinning severely limits the quality of long-term time-lapse imaging.

      Taken together, these technical constraints explain why we have to rely on localized perturbations such as water droplets and heating rather than global humidity control. We have clarified this point in the revised manuscript and now explicitly discuss both the challenges and limitations of humidity-based approaches.

      At the same time, the authors also mention that temperature itself plays a role in shaping the behaviour. What is the mechanism for this? Is it just through evaporation? Since the frequency increases with temperature, could it just be that activity increases with temperature?

      We thank the reviewer for raising this critical point. We believe that temperature has two distinct impacts operating on different timescales.

      Short timescale (~minutes): We observed that biofilm oscillations respond to temperature changes very rapidly and in a reversible manner. This timescale is too short to be explained by modulation of water content or bulk elasticity of the biofilm. Instead, we attribute the immediate frequency increase to enhanced biological activity of the bacteria at elevated temperatures.

      Long timescale (~tens of minutes to hours): During processes such as the transition from planar to spiral waves, prolonged heating can significantly alter the biofilm structure. These changes are not reversible and likely involve modifications of elasticity and other structural properties.

      In the modeling framework, the short-timescale effect is represented as an increase in the active force term, while the long-timescale effect is captured by concurrent changes in both the active force and the elastic properties of the biofilm. We have clarified this mechanism and its representation in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a novel investigation into unidirectionally propagating waves observed on the surface of Pseudomonas nitroreducens bacterial biofilms. The authors explore how these waves, initially spiral in form, transition into combinations of spiral, target, and planar patterns. The study identifies the periodic extension-retraction cycles of type IV pili as the driving mechanism for wave propagation, which preferentially moves from the colony's edge to its center. Furthermore, the manuscript proposes two theoretical models (a phase-oscillator model and a continuum active solid model) to reproduce these phenomena, and demonstrates how external manipulations (e.g., water droplets, temperature, PEG) can control wave patterns and direction, often correlating with oscillation frequency gradients. The work aims to bridge the fields of active-matter physics and bacterial biophysics by providing both experimental observations and theoretical frameworks for understanding these complex biological wave phenomena.

      We thank the reviewer for the positive assessment of our work and for highlighting both the novelty and the key contributions of our study.

      Strengths:

      The experimental discovery of unidirectionally propagating waves on bacterial biofilms is highly intriguing and represents a significant contribution to both microbiology and active-matter physics.

      The detailed observations of wave pattern transitions (spiral to target to planar) and their response to various environmental perturbations (water, temperature, PEG) provide valuable empirical data. The identification of type IV pili as the driving force offers a concrete biological mechanism. The observed correlation between frequency gradients and wave direction is a compelling finding with potential for broader implications in understanding biological pattern formation. This work has the potential to stimulate further research in the collective behavior of living systems and the physical principles underlying biological organization.

      We thank the reviewer once again for emphasizing the importance of wave directionality. We also believe that this phenomenon may provide insight into early symmetry-breaking processes observed in developmental biology, where oxygen or nutrient gradients in dense environments could play a similar role.

      Weaknesses:

      The manuscript attempts to link unidirectional wave propagation to non-reciprocal couplings but ultimately shows that the wave direction is determined by the gradient of the oscillation frequency. The couplings in the two theoretical models are both isotropic and thus cannot dictate the wave direction. A clear distinction should be made between non-reciprocity as a source of wave generation and non-uniformity as a controlling factor of wave direction.

      We greatly appreciate the reviewer’s careful evaluation, particularly for highlighting this important and often confusing distinction. The relationship between nonreciprocity, spontaneous symmetry breaking, and frequency gradients has also been a challenging concept for us and required significant effort to clarify.

      Recent theoretical studies have established that traveling wave formation requires nonreciprocity, which provides a framework for understanding phenomena ranging from spiral to target and planar waves. In our system, nonreciprocity arises between the displacement field (U) and the pili force vector (P): as a result, in the broken phase, U effectively “chases” P, breaking PT symmetry locally and thereby enabling the generation of local directional flux and traveling waves. In this sense, nonreciprocity is essential for travelling wave generation and spontaneous symmetry breaking in either direction.

      However, we now agree that global directionality (always from right to left, or edge to center) is set by an independent factor—namely, the oscillation frequency gradient across the biofilm. Thus, while nonreciprocity determines whether waves can travel, frequency gradients determine the large-scale direction in which they propagate. Put differently, PT symmetry is already broken in spiral waves due to nonreciprocity, but global asymmetry (frequency gradients) is required to align the overall propagation in one direction.

      We have clarified this distinction in the revised manuscript, emphasizing that nonreciprocity is a necessary ingredient for travelling wave generation, whereas global asymmetry controls global wave direction.

      Modification in the manuscript:

      “We should note that traveling waves indicate broken PT symmetry between these fields triggered by nonreciprocity, with spiral waves serving as a classic signature of this phenomenon. A further transition from spiral to planar waves reflects an overall asymmetry in the frequency profile, which is not directly related to PT-symmetry breaking.”

      The relationship between the phase oscillator model and the active solid model is unclear. Given that U and P are both dynamical variables evolving in three-dimensional space, defining the phase Φ precisely in the phase space spanned by U and P could be challenging. A graphical illustration of the definition of Φ would be beneficial. To ensure reproducibility of the numerical results, the parameter values used in the numerical simulations and an explicit definition of the elastic force in the active solid model should be provided.

      We agree with the reviewer that the relationship between the phase oscillator model and the active solid model can be confusing, but establishing this link is essential to connect different modeling approaches in the literature. As the reviewer notes, in a fully three-dimensional setting with freely moving bacteria, defining the oscillation phase (Φ) in the phase space spanned by U and P is indeed complicated.

      However, our recent imaging results show that bacteria within the biofilm do not undergo large translational motions but instead exhibit periodic “Mexican wave”-like oscillations. These oscillations are confined to a restricted phase space, which allows us to define Φ in a straightforward way. In this context, the phase oscillator model becomes a natural reduction of the dynamics.

      Similarly, in the active solid (or active gel) model, we can plot not only the displacement and force vectors but also the local phase, which shows strong agreement with the phenomenological Kuramoto-style model. To make this connection clearer, we have now included a schematic illustration in the revised manuscript that explicitly shows how Φ is defined in the reduced phase space, and we provide the parameter values used in the simulations as well as the explicit definition of the elastic force in the active solid model to ensure reproducibility.
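
      As a purely illustrative sketch of how a phase can be read off in a reduced two-variable phase space, the snippet below takes one component of a displacement-like signal and one component of a force-like signal and defines the phase as the polar angle in the normalized plane they span. This definition, the function name `phase_from_fields`, and the synthetic signals are assumptions made for illustration; the authors' actual definition of Φ is the one given in their schematic in the revised manuscript.

```python
import numpy as np

def phase_from_fields(u, p):
    """Hypothetical phase definition: polar angle in the normalized (u, p) plane."""
    u = np.asarray(u, dtype=float)
    p = np.asarray(p, dtype=float)
    # Remove offsets and rescale so that both signals have unit variance.
    u_n = (u - u.mean()) / u.std()
    p_n = (p - p.mean()) / p.std()
    return np.arctan2(p_n, u_n)

# Synthetic signals (assumption): two oscillations a quarter cycle apart,
# with different amplitudes, sampled over exactly two periods.
theta = np.linspace(0.0, 4.0 * np.pi, 400, endpoint=False)
u = 2.0 * np.cos(theta)   # stand-in for one displacement component
p = 0.5 * np.sin(theta)   # stand-in for one pili-force component

# Unwrapping removes the 2*pi jumps so the phase grows continuously in time.
phi = np.unwrap(phase_from_fields(u, p))
```

      For these idealized signals the recovered phase follows the underlying angle `theta`. With real data the trajectory in the (U, P) plane would not be a clean ellipse, so any such definition would need to be checked against the measured orbits.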

      The link between the theoretical models and experimental results is weak. For example, the propagation of the kink from the lower to the higher part of the surface (Figure 1e) could be addressed within the framework of the active solid model. The mechanism of transition from spiral to target waves (Figure 3a, b) requires clarification, identifying which model parameter is crucial for inducing this transition. The wave propagation toward the lower frequency side is numerically demonstrated using the phase oscillator model, but a physical or intuitive explanation for this phenomenon is missing. Also, the wave transitions induced by the addition of water droplets and temperature rise are not linked to specific parameters in the theoretical models.

      We thank the reviewer for highlighting this important weakness, which was also consistently noted by the other reviewers. We fully agree that the link between our theoretical models and experimental results required significant strengthening.

      With improved imaging in the revised study, we were able to uncover additional connections that help establish this link more clearly. We acknowledge that our ability to measure detailed biofilm parameters is limited, which restricts us from providing fully quantitative mappings. Nonetheless, based on the reviewers’ suggestions, we carried out additional imaging and simulations to compare bacterial dynamics at the colony edge and within the biofilm surface. These data confirm that cells within the biofilm undergo restricted, “Mexican wave”-like oscillations, emphasizing the critical role of elasticity in governing the collective dynamics.

      Experimentally, we found that adding water or PEG, or alternatively inducing drying, strongly modulates the effective elasticity of the biofilm. Within the active solid framework, elasticity and the elasto-active coupling are the key parameters controlling the system. By tuning these parameters in simulations, we could reproduce the qualitative transitions observed experimentally. Specifically, we observed that:

      At low elasticity, topological defects are mobile and can move, merge, or annihilate, leading to the emergence of planar waves.

      At high elasticity, defects remain pinned across the biofilm surface, dominating the dynamics.

      These observations suggest that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves. Although we cannot independently manipulate each parameter in experiments, varying the moisture content provides an effective and experimentally accessible control.

      Finally, our simulations and new analyses reveal that spiral defect cores can move and merge to form target waves or annihilate entirely—processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions, and we believe it warrants further theoretical exploration. We have clarified this connection and its implications in the revised manuscript.

      First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in Supplementary Figure 10, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11).
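
      The defect analysis described above relies on locating +1 and −1 phase defects and measuring the distance between them. A common way to do this on a gridded phase field is to compute the winding number around each grid plaquette, as sketched below. The helper names, the grid size, and the synthetic two-defect field are assumptions for illustration and do not represent the authors' analysis pipeline.

```python
import numpy as np

def wrap(d):
    """Wrap phase differences into the interval [-pi, pi)."""
    return (d + np.pi) % (2.0 * np.pi) - np.pi

def topological_charge(phi):
    """Winding number on each plaquette of a 2D phase field phi[y, x].

    Returns an integer array of shape (ny - 1, nx - 1).
    """
    a = phi[:-1, :-1]   # corner (y,   x)
    b = phi[:-1, 1:]    # corner (y,   x+1)
    c = phi[1:, 1:]     # corner (y+1, x+1)
    d = phi[1:, :-1]    # corner (y+1, x)
    # Sum wrapped phase increments around the loop a -> b -> c -> d -> a.
    total = wrap(b - a) + wrap(c - b) + wrap(d - c) + wrap(a - d)
    return np.rint(total / (2.0 * np.pi)).astype(int)

# Synthetic field (assumption): a +1 defect and a -1 defect on the same row.
y, x = np.mgrid[0:20, 0:20]
phi = np.arctan2(y - 9.5, x - 5.5) - np.arctan2(y - 9.5, x - 14.5)

q = topological_charge(phi)
plus = np.argwhere(q == 1)     # (row, column) indices of +1 defects
minus = np.argwhere(q == -1)   # (row, column) indices of -1 defects

# Separation of the unlike pair, in grid units.
separation = np.linalg.norm(plus[0] - minus[0])
```

      Applied frame by frame to a simulated or measured phase map, the same routine would give defect trajectories and pair separations. The total charge summed over the field is zero for an isolated unlike pair.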

      This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs.

      Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation. To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Figure 12). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Author response image 4) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs. We have updated the details of the defect dynamics in the revised manuscript to clarify the transition between these waves.

      Author response image 4.

      Experimental observation showing that small nonuniformities on the biofilm surface trigger the formation of closely separated defect pairs. Arrows indicate the positions of the nonuniformities.

      Modification in the manuscript:

      Defect dynamics controlling the transition between spiral to target waves

      “To better understand the dynamics of the transition between different forms of the waves, we focused on numerical simulations. We noticed that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves; varying the moisture content provides an effective and experimentally accessible way to control this motility. Our analyses revealed that spiral defect cores can move and merge to form target waves or annihilate entirely—processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions. First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in Supplementary Figure 10, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11). This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs. Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation (Supplementary Video 9). To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Videos 12-13). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Supplementary Video 9) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs.”

      All the recommended points have been addressed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      Sullivan and colleagues examined the modulation of reflexive visuomotor responses during collaboration between pairs of participants performing a joint reaching movement to a target. In their experiments, the players jointly controlled a cursor that they had to move towards narrow or wide targets. In each experimental block, each participant had a different type of target they had to move the joint cursor to. During the experiment, the authors used lateral perturbation of the cursor to test participants’ fast feedback responses to the different target types. The authors suggest participants integrate the target type and related cost of their partner into their own movements, which suggests that visuomotor gains are affected by the partner’s task.

      Strengths

      The topic of the manuscript is very interesting, and the authors are using well established methodology to test their hypothesis. They combine experimental studies with optimal control models to further support their work. Overall, the manuscript is very timely and shows important findings - that the feedback responses reflect both our and our partner’s tasks.

      We thank the reviewer for the positive comments regarding our work.

      Weaknesses

      However, in the current version of the manuscript, I believe the results could also be interpreted differently, which suggest that the authors should provide further support for their hypothesis and conclusions.

      Major Comments

      (1) Results of the relevant conditions:

      In addition to the authors’ explanation regarding the results, it is also possible that the results represent a simple modulation of the reflexive response to a scaled version of cursor movement. That is, when the cursor is partially controlled by a partner, which also contributes to reducing movement error, it can also be interpreted by the sensorimotor system as a scaling of hand-to-cursor movement. In this case, the reflexes are modulated according to a scaling factor (how much do I need to move to bring the cursor to the target). I believe that a single-agent simulation of an OFC model with a scaling factor in the lateral direction can generate the same predictions as those presented by the authors in this study. In other words, maybe the controller has learned about the nature of the perturbation in each specific context, that in some conditions I need to control strongly, whereas in others I do not (without having any model of the partner). I suggest that the authors demonstrate how they can distinguish their interpretation of the results from other explanations.

      We thank the reviewer for the thoughtful comment. While it is possible that the change in the visuomotor feedback responses could arise just from a scaling factor, this hypothesis could explain the difference between two conditions but would fail to explain differences between two other conditions. Specifically, this hypothesis could explain a decrease in involuntary visuomotor feedback responses between partner-irrelevant/self-relevant and partner-relevant/self-relevant. Critically, this hypothesis could not explain the difference between partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant. That is, there is no reason to scale a response to correct for a partner’s relevant target when your own target is irrelevant. However, our finding that there is a greater involuntary visuomotor feedback response in partner-relevant/self-irrelevant compared to partner-irrelevant/self-irrelevant is predicted by the notion that humans form a representation of others and consider their movement costs.

      We have added a paragraph in the discussion to justify our hypothesis over the scaling factor hypothesis.

      “Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses can parsimoniously explain all of our experimental findings. There are a few alternative hypotheses that could explain a subset of results. One alternative hypothesis is that participants simply learned the hand to center cursor mapping in each experimental condition. That is, instead of using a model of their partner, participants simply adapted to the dynamics of the center cursor. However, this hypothesis would not predict an increased involuntary visuomotor feedback response in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition. If participants did not form a model of their partner nor consider their partner’s costs, then they would not display an increased feedback response when they had an irrelevant target and their partner’s target was relevant. An increased feedback response to help a partner achieve their goal is captured by our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses.”

      (2) The effect of the partner target:

      The authors presented both self and partner targets together. While the effect of each target type, presented separately, is known, it is unclear how presenting both simultaneously affects individual response. That is, does a small target with a background of the wide target affect the reflexive response in the case of a single participant moving? The results of Experiment 2, comparing the case of partner- and self-relevant targets versus partner-irrelevant and self-relevant targets, may suggest that the system acted based on the relevant target, regardless of the presence and instructions regarding the self-target.

      We thank the reviewer for bringing up another valid point, which we discussed at length as a group when designing the experiment. The reviewer is correct in pointing out that the lack of difference in the involuntary epoch between the partner-relevant/self-relevant and partner-irrelevant/self-relevant conditions could potentially suggest that the sensorimotor system acted based only on relevant targets, irrespective of whether it was a self or partner relevant target. While the effect of the simultaneous presentation of a narrow and wide target on an individual’s response by themselves is unknown, comparing the differences between our other experimental conditions controls for this potential confound. Participants viewed a wide target and a narrow target on the screen in both the partner-irrelevant/self-relevant condition and the partner-relevant/self-irrelevant condition. Crucially, we found that the visuomotor feedback responses were greater in the partner-irrelevant/self-relevant condition compared to the partner-relevant/self-irrelevant condition in both Experiment 1 and 2. That is, participants were able to distinguish between the self-target and partner target and appropriately modify their feedback responses in both Experiment 1 and 2, despite there being both a wide and narrow target on the screen in both conditions. Given that we found different visuomotor feedback responses between the two conditions that had both a narrow and wide target, this rules out the alternative hypothesis that the sensorimotor system acted based just on a relevant target being present. We have added to our discussion to clarify this point.

      “Another alternative hypothesis would be that the sensorimotor system was responding only to the relevant target displayed on the screen. Again, this hypothesis would only explain a subset of our results. In particular, this relevant target hypothesis cannot explain the observed feedback response differences between the partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions in both Experiments 1 and 2.”

      (3) Experiment instructions:

      It is unclear what the general instructions were for the participants and whether the instructions provided set the proposed weighted cost, which could be altered with different instructions.

      Our instructions explicitly informed participants that their performance bonus was only based on them stabilizing within their own self-target within the time constraint. We have added the following in the methods to emphasize this instruction.

      “In other words, we ensured participants had a clear understanding that their performance in the task was only based on stabilizing the center cursor in their own self-target within the time constraint. Therefore, the instructions and timing constraints did not force participants to work together.”

      (4) Some work has shown that the gain of visuomotor feedback responses reflects the time to target and that this is updated online after a perturbation (Cesonis & Franklin, 2020, eNeuro; Cesonis and Franklin, 2021, NBDT; also related to Crevecoeur et al., 2013, J Neurophysiol). These models would predict different feedback gains depending on the distance remaining to the target for the participant and the time to correct for the jump, which is directly affected by the small or large targets. Could this time to target be used instead to explain the results? I don’t believe that this is the case, but the authors should try to rule out other interpretations. This is maybe a minor point, but perhaps more important is the location (and time remaining) for each participant at the time of the jump. It appears from the figures that this might be affected by the condition (given the change in movement lengths - see Figure 3 B & C). If this is the case, then could some of the feedback gain be related to these parameters and not the model of the partner, as suggested? Some evidence to rule this out would be a good addition to the paper - perhaps the distance of each partner at the time of the perturbation, for example. In addition, please analyze the synchrony of the two partners’ movements.

      (1) Time to target and forward position

      The reviewer raises an interesting point. In our task, the cursor/target jump occurs once the center cursor crosses 6.25 cm from the start. We analyzed the time it took for the center cursor to intercept the targets from perturbation onset (Supplementary D). In Experiment 1, an ANOVA with center cursor time-to-target as the dependent variable showed no main effect of self-target (F[1,47] = 2.45, p = 0.124) or partner target (F[1,47] = 2.50, p = 0.120), nor any interaction (F[1,47] = 1.97, p = 0.166). In Experiment 2, an ANOVA with center cursor time-to-target as the dependent variable showed a significant interaction (F[1,47] = 5.87, p = 0.019). Post-hoc mean comparisons showed that only the difference between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant condition was significant (p = 0.006). Given that only one comparison in Experiment 2 showed a difference in time-to-target, we do not believe that time-to-target was a significant driver of the change in involuntary visuomotor feedback responses observed between conditions. While time-to-target is likely a metric around which the nervous system modifies feedback gains, our results suggest that the nervous system can also use a partner model to modify feedback gains. We have added a supplemental analysis on time-to-target:

      “Previous work by Česonis and Franklin (2020) showed that time-to-target is a key variable the sensorimotor system uses to modify feedback responses. In their experiment, they manipulated the time-to-target of the participant’s cursor, while controlling for other movement parameters (e.g., distance from goal) [1]. When compared to classical optimal feedback control models, they showed that a model that modifies feedback responses based on time-to-target best predicted their results. In our task, it’s possible that the time-to-target could have influenced visuomotor feedback responses, since the distance to the center of the target is greater for a narrow target than a wide target on perturbation trials.”

      “We calculated the time from perturbation onset to the center cursor reaching the forward position of the targets (Supplementary Fig. S5). In Experiment 1, an ANOVA with center cursor time-to-target as the dependent variable showed no main effect of self-target (F[1,47] = 2.45, p = 0.124) or partner target (F[1,47] = 2.50, p = 0.120), nor any interaction (F[1,47] = 1.97, p = 0.166). In Experiment 2, an ANOVA with center cursor time-to-target as the dependent variable showed a significant interaction (F[1,47] = 5.87, p = 0.019). Post-hoc mean comparisons showed that only the difference between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant condition was significant (p = 0.006). Although time-to-target and hand position are important variables for the control of movement [1,2,3], they are likely not driving factors of the different involuntary visuomotor feedback responses between our experimental conditions.”

      However, it is possible that the participant forward position at perturbation onset could also influence the involuntary feedback response. We show the forward positions at perturbation onset in Supplementary D. Statistical analysis of the forward positions in Experiment 1 showed a main effect of self-target (F[1,47] = 12.72, p < 0.001), main effect of partner target (F[1,47] = 12.82, p < 0.001), and no interaction (F[1,47] = 0.00, p = 0.991). We see the same trend in Experiment 2, showing a main effect of self-target (F[1,47] = 12.11, p < 0.001), main effect of partner target (F[1,47] = 12.04, p < 0.001), and no interaction (F[1,47] = 0.00, p = 0.986). The fact that there was no interaction implies that the results could not solely be due to forward position. Nevertheless, given there were main effects, we proceeded to run an ANCOVA on the involuntary visuomotor feedback responses with forward position as a covariate. For Experiment 1, we still observed a significant interaction between self and partner target (F[1,47] = 43.14, p < 0.001). Further, we also observed no significant main effect of forward position on the involuntary visuomotor feedback responses. The ANCOVA for Experiment 2 also showed that there was still a significant interaction of self and partner target on the involuntary visuomotor feedback responses (F[1,47] = 9.80, p = 0.002). However, here we did find a significant main effect of the forward position (F[1,47] = 5.06, p = 0.026). Therefore, we ran follow-up mean comparisons with the covariate adjusted means. We found the same statistical trend as reported in the main results. We found significant differences between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant conditions (p = 0.003), partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions (p < 0.001), partner-relevant/self-irrelevant and partner-relevant/self-relevant conditions (p < 0.001). We found no significant difference between the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions (p = 0.381). Given that there was no main effect of forward position in Experiment 1, and that our adjusted mean comparisons in Experiment 2 showed the same trends as the unadjusted mean comparisons in the main manuscript, our results show that the forward position of the participants is not a significant factor in explaining the differences in involuntary visuomotor feedback responses between conditions.

      “Supplementary Fig. 6 shows the participant hand forward position at perturbation onset time for Experiment 1 (A) and Experiment 2 (B). It is possible that the participant forward hand position at perturbation onset time could influence their visuomotor feedback responses. Therefore, we ran an ANCOVA with self-target and partner target as factors, and participant forward hand position at perturbation onset time as a covariate. In Experiment 1, we found no main effect of participant forward hand position on involuntary visuomotor feedback responses (F[1,47] = 1.466, p = 0.228). Further, when including the covariate, we still found a significant interaction between self-target and partner target on involuntary visuomotor feedback responses (F[1,47] = 43.2, p < 0.001).”

      “In Experiment 2, we found a significant main effect of participant forward hand position on involuntary visuomotor feedback responses (F[1,47] = 6.73, p = 0.010). We still found a significant interaction between self-target and partner target (F[1,47] = 9.78, p = 0.002). Since we found a main effect of participant forward hand position, we calculated the adjusted means of the involuntary visuomotor feedback responses. We then performed follow-up mean comparisons on the adjusted means of the involuntary visuomotor feedback responses (using emmeans in R). We found the same significant trends as the unadjusted means in the main manuscript. Specifically, we found involuntary visuomotor feedback responses to be: significantly greater in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition (p = 0.003), significantly greater in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-relevant condition (p < 0.001), significantly greater in the partner-relevant/self-relevant condition compared to the partner-relevant/self-irrelevant condition (p < 0.001), and not different between the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions (p = 0.824).”

      We have also included in the discussion how time-to-target and participant forward hand position are important control variables to consider, and their potential relationship to our findings.

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G-H) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation. Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses parsimoniously accounts for the differences observed between all conditions.”

      (2) Synchrony

      In our task, participants’ movements were not self-initiated. We had them begin the movement as soon as they heard an audible tone so that they would begin their movements at as similar a time as possible. We have analyzed the movement onset synchrony between participants within a pair, shown in Supplementary F.

      Supplementary: “We calculated movement onset times at the time that the participants left the start target [8]. We then took the absolute value of the difference between the participants within a pair as a measure of movement onset synchrony. For Experiment 1, an ANOVA with movement onset synchrony as the dependent variable showed no main effect of self-target (F[1,47] = 1.38, p = 0.252), no main effect of partner target (F[1,47] = 0.057, p = 0.813), and no interaction (F[1,47] = 0.45, p = 0.508). For Experiment 2, an ANOVA with movement onset synchrony as the dependent variable showed no main effect of self-target (F[1,47] = 0.07, p = 0.788), no main effect of partner target (F[1,47] = 2.75, p = 0.111), and no interaction (F[1,47] = 2.31, p = 0.142).”
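
      The synchrony measure described above (onset taken as the moment the hand leaves the start target, then the absolute onset difference within a pair) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the circular start target with a `start_radius` parameter, and the sample trajectories are all hypothetical and chosen only for illustration.

```python
import math


def movement_onset(times, xs, ys, start_radius):
    """Return the first sample time at which the hand is outside the start target.

    Positions are expressed relative to the start-target centre.
    start_radius is a hypothetical parameter in the same units as xs and ys.
    """
    for t, x, y in zip(times, xs, ys):
        if math.hypot(x, y) > start_radius:
            return t
    return None  # the hand never left the start target


def onset_synchrony(onset_a, onset_b):
    """Absolute difference between the two partners' movement onset times."""
    return abs(onset_a - onset_b)


# Made-up samples for one trial (time in ms, position in cm); not real data.
times = [0, 50, 100, 150, 200, 250]
p1 = movement_onset(times, [0, 0, 0.2, 0.6, 1.5, 3.0],
                    [0, 0, 0.1, 0.9, 2.0, 4.0], start_radius=1.0)
p2 = movement_onset(times, [0, 0, 0, 0.1, 0.5, 1.2],
                    [0, 0, 0, 0.2, 0.6, 1.4], start_radius=1.0)
sync = onset_synchrony(p1, p2)
```

      A per-trial value computed this way could then be averaged within each condition before entering the ANOVA; that aggregation step is likewise an assumption, not something stated in the response.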

      Further, we have modified our methods to emphasize that participants within a pair generally began their movement at the same time.

      “Instead of self-initiating their movements, we specifically had participants move at the sound of a tone so that the movement onset between participants in a pair was as synchronous as possible (see Supplementary F for movement onset synchrony analysis).”

      Reviewer #1 (Recommendations for the authors):

      (1) Lines 291-292: One study extensively examined cursor and target jump visuomotor onset times and found no difference (Franklin et al., 2016; J Neuroscience), which strongly argues against this interpretation.

      We thank the reviewer for pointing out this work. We have modified the following lines:

      “However, other work by Franklin and colleagues (2016) found no difference in visuomotor feedback response latencies between cursor and target jumps [6].”

      (2) Line 411: What were the instructions regarding partner performance in terms of the reward? Did you explain that individual performance alone will determine the reward?

      As addressed above, we have made the following changes to emphasize the instructions given to participants.

      “In other words, we ensured participants had a clear understanding that their performance in the task was only based on stabilizing the center cursor in their own self-target within the time constraint. Therefore, the instructions and timing constraints did not force participants to work together.”

      (3) Line 506: Ten probe trials in each direction is very low. Can this still be in the transition state of the feedback response, rather than at steady state? There are many studies done looking at the learning of visuomotor responses in which changes are still occurring after several hundred trials (e.g., Franklin et al., 2017 J Neurophysiol; Franklin et al., 2008; J Neuroscience). In this experiment, each block only lasts 151 trials total if my calculations are correct. How certain are you that the results are at a steady state and not continuously changing? Perhaps with further experimental experience, the feedback responses would approach the predictions of a different model.

      The reviewer raises an important point. We had run these analyses prior to submitting the manuscript and did not see any evidence of learning. However, we believe this information is important to include since both we and the reviewer asked the same question. Specifically, we have analyzed the visuomotor feedback responses over the trials (Supplementary G), which shows little to no learning over time. Additionally, we also found no difference in the visuomotor feedback response trends between the first and second half of trials in each condition (Supplementary H). Therefore, it appears that the sensorimotor system reached steady-state behaviour very quickly, and we do not believe that the feedback responses would approach the predictions of a different model if participants performed more trials. We have added the following:

      Supplementary: “Given there were 151 trials and 10 left/right probe trials for each experimental condition, it is possible that completing more trials may have led to different involuntary visuomotor feedback responses. Therefore, we analysed the involuntary visuomotor feedback responses over the course of each experimental condition. Visually, involuntary visuomotor feedback responses in neither Experiment 1 (Fig. S8) nor Experiment 2 (Fig. S9) show any consistent learning (see Fig. S10 for statistical analysis). Therefore, it appears participants rapidly formed a partner model based on knowledge of their movement goal to modify their involuntary visuomotor feedback responses.”

      Supplementary: “Supplementary Fig. S10 shows the involuntary visuomotor feedback responses in the first half (A,C) and second half (B,D) for each experimental condition. In Experiment 1, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 37.09, p < 0.001) and second half (F[1,47] = 48.68, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (see Fig. S10A-B).”

      Supplementary: “In Experiment 2, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 9.42, p = 0.004) and second half (F[1,47] = 17.40, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (Fig. S10C-D).”

      Supplementary: “Showing the same involuntary visuomotor feedback response trends across the experimental conditions for the first half, second half, and all trials suggests that the sensorimotor system quickly formed a model of a partner and considered their costs to modify rapid motor responses.”

      We have also added to the discussion:

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation.”

      (4) The authors should also discuss some of the prior work which is very relevant to the tasks studied: (Knill, Bondada & Chhabra, 2011, J Neuroscience). There may also be other papers that use this task for visuomotor feedback responses and therefore, should be included.

      We have included the Knill 2011 paper and also Cross 2019 in our discussion:

      “This modification of feedback responses based on a relevant/irrelevant task goal has also been shown in response to visual perturbations [7,8].”

      (5) Lines 301-303: The terms ’relevant’ and ’irrelevant’ here describe different concepts than the ones used in this study. I suggest making a distinction to avoid confusion for the reader.

      We thank the reviewer for pointing out that this is confusing. We’ve made the following changes to improve the clarity:

      “Further, Franklin and colleagues (2008) designed a visual perturbation to be relevant or irrelevant when reaching to the same target, showing greater involuntary visuomotor feedback responses to a relevant visual perturbation compared to an irrelevant visual perturbation [9].”

      (6) Line 459: The reaching movement was quite slow (25cm in about 1.2 seconds). Is this needed to ensure that both participants can complete the movements, given potentially very different start times? Please comment as this is different than many previous studies.

      Participants needed to stabilize the cursor for 500ms in their target within a time constraint of 1400 - 1600 ms. Therefore, they had to reach the target between 900 - 1100 ms (before stabilizing). Additionally, participants did not perform self-initiated movements, but were required to begin their movement as soon as they heard an audible tone. Given that reaction times are ~200ms, participants had ~700 - 900 ms to reach the target, which aligns with previous research (Franklin et al. (2008), Franklin et al. (2012), Nashed et al. (2012)). We have clarified the time constraints of the task in our Methods:

      “They therefore had 700 - 900 ms to first reach the target, since humans generally have response times ~200 ms, and they needed to stabilize within the target for 500 ms (i.e., 1400 - 200 - 500 = 700 ms and 1600 - 200 - 500 = 900 ms). Movement times of 700 - 900 ms are thus consistent with previous human reaching studies [4,9,10].”
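
      The arithmetic behind the 700 - 900 ms reach window can be restated as a small sketch. The function name and argument names are hypothetical; the numbers are the ones given in the response (a 1400 - 1600 ms trial window, a ~200 ms response time, and a 500 ms stabilization period).

```python
def reach_window_ms(trial_min_ms, trial_max_ms, response_ms, hold_ms):
    """Time available to reach the target after subtracting the response time
    and the required stabilization (hold) period from the trial time window."""
    earliest = trial_min_ms - response_ms - hold_ms
    latest = trial_max_ms - response_ms - hold_ms
    return earliest, latest


# Values taken from the response: 1400-1600 ms window, ~200 ms response, 500 ms hold.
window = reach_window_ms(1400, 1600, response_ms=200, hold_ms=500)
```

      With these inputs `window` evaluates to `(700, 900)`, matching the movement times quoted above.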

      (7) Reference [25] is incomplete

      Thank you for catching this.

      And thank you for the thoughtful and clear review. We feel it has greatly improved the quality and clarity of our manuscript!

      Reviewer #2 (Public review):

      Summary

      Sullivan and colleagues studied the fast, involuntary, sensorimotor feedback control in interpersonal coordination. Using a cleverly designed joint-reaching experiment that separately manipulated the accuracy demands for a pair of participants, they demonstrated that the rapid visuomotor feedback response of a human participant to a sudden visual perturbation is modulated by his/her partner’s control policy and cost. The behavioral results are well-matched with the predictions of the optimal feedback control framework implemented with the dynamic game theory model. Overall, the study provides an important and novel set of results on the fast, involuntary feedback response in human motor control, in the context of interpersonal coordination.

      We thank the reviewer for the kind words!

      Review:

      Sullivan and colleagues investigated whether fast, involuntary sensorimotor feedback control is modulated by the partner’s state (e.g., cost and control policy) during interpersonal coordination. They asked a pair of participants to make a reaching movement to control a cursor and hit a target, where the cursor’s position was a combination of each participant’s hand position. To examine fast visuomotor feedback response, the authors applied a sudden shift in either the cursor (experiment 1) or the target (experiment 2) position in the middle of movement. To test the involvement of partner’s information in the feedback response, they independently manipulated the accuracy demand for each participant by varying the lateral length of the target (i.e., a wider/narrower target has a lower/higher demand for correction when movement is perturbed). Because participants could also see their partner’s target, they could theoretically take this information (e.g., whether their partner would correct, whether their correction would help their partner, etc.) into account when responding to the sudden visual shift. Computationally, the task structure can be handled using dynamic game theory, and the partner’s feedback control policy and cost function are integrated into the optimal feedback control framework. As predicted by the model, the authors demonstrated that the rapid visuomotor feedback response to a sudden visual perturbation is modulated by the partner’s control policy and cost. When their partner’s target was narrow, they made rapid feedback corrections even when their own target was wide (no need for correction), suggesting integration of their partner’s cost function. Similarly, they made corrections to a lesser degree when both targets were narrower than when the partner’s target was wider, suggesting that the feedback correction takes the partner’s correction (i.e., feedback control policy) into account.

      The strength of the current paper lies in the combination of clever behavioral experiments that independently manipulate each participant’s accuracy demand and a sophisticated computational approach that integrates optimal feedback control and dynamic game theory. Both the experimental design and data analysis sound good. While the main claim is well-supported by the results, the only current weakness is the lack of discussion of limitations and an alternative explanation. Adding these points will further strengthen the paper.

      Reviewer #2 (Recommendations for the authors):

      (1) While the current version is already well-written, it would be helpful for readers to further discuss the relationship between the current study and some potentially relevant studies, such as Braun et al. (2009), Ganesh et al. (2014), and Takagi et al. (2017, 2019).

      Thank you for pointing out these papers that we missed, which we now cite appropriately in light of our own work. In particular, we have added the following to our discussion, including Braun et al. (2009) and Takagi et al. (2017, 2019). However, Beckers et al. (2020) showed results that conflict with Ganesh et al. (2014), and since these works are about learning, we feel they are outside the scope of our work.

      “Further, others have shown that the sensorimotor system modifies movement selection according to game-theoretic predictions [11], and that the sensorimotor system modifies movements using an estimate of the joint goal during human-human interactions [12,13].”

      (2) For an alternative interpretation of the results, one could consider, for instance, that the target’s visual appearance could have served as a contextual cue for learning different movement gains in the lateral direction (e.g., whether the partner corrects the shift might be approximated as a gain change). Although less likely, this alternative account could be tested by simulation and would strengthen the argument.

      This is a thoughtful comment, also brought up by Reviewer 1. Here we provide our previous response that addresses this concern. While it is possible that the change in the visuomotor feedback responses could arise just from a scaling factor, this hypothesis could explain the difference between two conditions but would fail to explain differences between two other conditions. Specifically, this hypothesis could explain a decrease in involuntary visuomotor feedback responses between partner-irrelevant/self-relevant and partner-relevant/self-relevant. Critically, this hypothesis could not explain the difference between partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant. That is, there is no reason to scale a response to correct for a partner’s relevant target when your own target is irrelevant. However, our finding that there is a greater involuntary visuomotor feedback response in partner-relevant/self-irrelevant compared to partner-irrelevant/self-irrelevant is predicted by the notion that humans form a representation of others and consider their movement costs.

      We have added a paragraph in the discussion to justify our hypothesis over the scaling factor hypothesis.

      “Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses can parsimoniously explain all of our experimental findings. There are a few alternative hypotheses that could explain a subset of results. One alternative hypothesis is that participants simply learned the hand to center cursor mapping in each experimental condition. That is, instead of using a model of their partner, participants simply adapted to the dynamics of the center cursor. However, this hypothesis would not predict an increased involuntary visuomotor feedback response in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition. If participants did not form a model of their partner nor consider their partner’s costs, then they would not display an increased feedback response when they had an irrelevant target and their partner’s target was relevant. An increased feedback response to help a partner achieve their goal is captured by our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses.”

      (3) Another (maybe unlikely) alternative interpretation is that the targets’ visual appearances might have been confusing. One might find that the closed square is common to both targets for the “Partner Relevant Self Irrelevant” and the “Partner Relevant Self Relevant”, and that this might have elicited the response to perturbation in “Partner Relevant Self Irrelevant”. Related to this point, it would be informative to describe how the “cooperative” fast feedback response developed over the course of the experiment, for instance, by comparing behaviors across experimental blocks.

      We have partitioned this question into two responses, relating to visual appearance of the targets and the development (i.e., learning) of visuomotor feedback responses over the course of the experiments.

      (1) Participants confused by visual appearance of the targets.

      We were also concerned that participants might be confused by the targets, and therefore confirmed with participants after the experiment that they correctly understood that the light grey filled rectangle was their own target and the dark grey hollow rectangle was their partner’s. Furthermore, the partner-relevant/self-irrelevant, partner-irrelevant/self-relevant, and partner-relevant/self-relevant conditions each contain a small square target. However, we found that the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions both elicited significantly greater involuntary visuomotor feedback responses than the partner-relevant/self-irrelevant condition. Thus, participants’ involuntary visuomotor feedback responses suggest that they correctly formed different representations based on an accurate understanding of the self vs partner target. The other reviewer had related comments about the visual stimuli, which we also address within the discussion.

      “Another alternative hypothesis would be that the sensorimotor system was responding only to the relevant target displayed on the screen. Again, this hypothesis would only explain a subset of our results. In particular, this relevant target hypothesis cannot explain the observed differences between the partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions in both Experiments 1 and 2.”

      (2) Comparing feedback responses over time

      We have included the visuomotor feedback responses over each experimental condition in Supplementary G. Notably, we did not find any learning effect, suggesting that the sensorimotor system quickly developed a model of a partner’s behaviour and used that model to modify feedback responses. We have also added a paragraph on learning to our discussion.

      We’ve addressed how learning did not play a role in this study:

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G-H) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation.”

      Supplementary: “Given there were 151 trials and 10 left/right probe trials for each experimental condition, it is possible that completing more trials may have led to different involuntary visuomotor feedback responses. Therefore, we analysed the involuntary visuomotor feedback responses over the course of each experimental condition. Visually, involuntary visuomotor feedback responses in neither Experiment 1 (Fig. S8) nor Experiment 2 (Fig. S9) show any consistent learning (see Fig. S10 for statistical analysis). Therefore, it appears participants rapidly formed a partner model based on knowledge of their movement goal to modify their involuntary visuomotor feedback responses.”

      Supplementary: “Supplementary Fig. S10 shows the involuntary visuomotor feedback responses in the first half (A,C) and second half (B,D) for each experimental condition. In Experiment 1, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 37.09, p < 0.001) and second half (F[1,47] = 48.68, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (see Fig. S10A-B).”

      Supplementary: “Showing the same involuntary visuomotor feedback response trends across the experimental conditions for the first half, second half, and all trials suggests that the sensorimotor system used a model of a partner based on their goals and considered their costs to modify rapid motor responses.”
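
      The split-half check quoted above can be illustrated with a minimal sketch. It assumes a 2 x 2 within-participant design (self-target x partner-target), which is consistent with the reported F[1,47] but is not confirmed in this response. In that design the one-degree-of-freedom interaction F equals the squared one-sample t statistic on each participant's interaction contrast. The function names and the data layout are hypothetical and are not taken from the authors' analysis code.

```python
import math
import statistics


def split_halves(trials):
    """Split an ordered list of probe-trial responses into first and second halves."""
    mid = len(trials) // 2
    return trials[:mid], trials[mid:]


def interaction_f(cell_means):
    """Interaction F for a 2 x 2 within-participant design (assumed layout).

    cell_means: one dict per participant, keyed by
    (self_relevant, partner_relevant) -> mean feedback response.
    Returns (F, df1, df2), where F is the squared one-sample t
    computed on the per-participant interaction contrast.
    """
    contrasts = [
        (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])
        for m in cell_means
    ]
    n = len(contrasts)
    standard_error = statistics.stdev(contrasts) / math.sqrt(n)
    t = statistics.fmean(contrasts) / standard_error
    return t * t, 1, n - 1
```

      Running `interaction_f` once on cell means computed from the first halves and once on those from the second halves gives two statistics that can be compared with the all-trials analysis.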

      (4) It looks slightly counter intuitive (and therefore interesting) that the participant shows some amount of fast feedback responses in the “Partner Relevant Self Irrelevant” condition, since they were instructed to only consider the self-target. Based on the results, the authors suggest an altruistic feature of the motor system (lines 333-340). It would be helpful to clarify the basis for this interpretation, whether it is formally derived from the game-theoretic framework or represents a more conceptual interpretation. Providing additional explanation that translates the game-theoretic reasoning into more accessible, intuitive terms would help readers better understand and evaluate this claim.

      We are glad the reviewer also finds this result interesting. The reviewer raises an important point that there needs to be a more clear explanation for why we believe this result was found. We have made the following changes to the discussion:

      “Furthermore, this result is predicted by our dynamic game theory models that include the partner’s costs in the self cost function. In other words, a dynamic game theory model that selects feedback gains to minimize both the self and partner cost reflects an altruistic control policy.”
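
      The idea that adding the partner's cost to the self cost function yields a non-zero feedback gain can be illustrated with a deliberately simplified sketch. It is not the authors' dynamic game theory model: it assumes a single controller, a scalar lateral error with dynamics x[t+1] = a*x[t] + b*u[t], and a quadratic cost whose state weight is q_self + alpha * q_partner. The names `alpha`, `q_self`, and `q_partner` are hypothetical.

```python
def lqr_gain(q, r=1.0, a=1.0, b=1.0, iters=500):
    """Steady-state feedback gain of a scalar discrete-time LQR.

    Iterates the Riccati recursion for the cost sum(q*x^2 + r*u^2)
    and returns the gain K in the control law u = -K*x.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def feedback_gain(q_self, q_partner, alpha):
    """Gain when the partner's state cost enters the self cost with weight alpha.

    alpha = 0 ignores the partner; alpha > 0 is the 'altruistic' case
    in this simplified, single-controller setting.
    """
    return lqr_gain(q_self + alpha * q_partner)
```

      With an irrelevant self target (`q_self = 0`) and a relevant partner target (`q_partner > 0`), the gain is zero when `alpha = 0` and positive when `alpha > 0`, which mirrors the qualitative pattern in the partner-relevant/self-irrelevant condition.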

      (5) Please check whether all references are displayed correctly. Some of them (e.g., 25, 65) seemed not correctly shown in the References section.

      We have fixed the citation.

      We thank the reviewer for providing a clear and insightful review. Their comments have significantly improved the manuscript.

      References

      (1) Česonis, J., & Franklin, D. W. (2020). Time-to-Target Simplifies Optimal Control of Visuomotor Feedback Responses. eneuro, 7 (2), ENEURO.0514–19.2020.

      (2) Česonis, J., & Franklin, D. W. (2022). Contextual Cues Are Not Unique for Motor Learning: Task-dependant Switching of Feedback Controllers. PLOS Computational Biology, 18 (6), ed. by Haith, A. M.: e1010192.

      (3) Crevecoeur, F., Kurtzer, I., Bourke, T., & Scott, S. H. (2013). Feedback Responses Rapidly Scale with the Urgency to Correct for External Perturbations. Journal of Neurophysiology, 110 (6), 1323–1332.

      (4) Franklin, S., Wolpert, D. M., & Franklin, D. W. (2012). Visuomotor Feedback Gains Upregulate during the Learning of Novel Dynamics. Journal of Neurophysiology, 108 (2), 467–478.

      (5) Liu, Y., Leib, R., Dudley, W., Shafti, A., Faisal, A. A., & Franklin, D. W. (2025). Partner-Sourced Haptic Feedback Rather than Environmental Inputs Drives Coordination Improvement in Human Dyadic Collaboration. Scientific Reports, 15 (1), 40347.

      (6) Franklin, D. W., Reichenbach, A., Franklin, S., & Diedrichsen, J. (2016). Temporal Evolution of Spatial Computations for Visuomotor Control. The Journal of Neuroscience, 36 (8), 2329–2341.

      (7) Knill, D. C., Bondada, A., & Chhabra, M. (2011). Flexible, Task-Dependent Use of Sensory Feedback to Control Hand Movements. The Journal of Neuroscience, 31 (4), 1219–1237.

      (8) Cross, K. P., Cluff, T., Takei, T., & Scott, S. H. (2019). Visual Feedback Processing of the Limb Involves Two Distinct Phases. The Journal of Neuroscience, 39 (34), 6751–6765.

      (9) Franklin, D. W., & Wolpert, D. M. (2008). Specificity of Reflex Adaptation for Task-Relevant Variability. The Journal of Neuroscience, 28 (52), 14165–14175.

      (10) Nashed, J. Y., Crevecoeur, F., & Scott, S. H. (2012). Influence of the Behavioral Goal and Environmental Obstacles on Rapid Feedback Responses. Journal of Neurophysiology, 108 (4), 999–1009.

      (11) Braun, D. A., Ortega, P. A., & Wolpert, D. M. (2009). Nash Equilibria in Multi-Agent Motor Interactions. PLoS Computational Biology, 5 (8), ed. by Friston, K. J.: e1000468.

      (12) Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M., & Burdet, E. (2017). Physically Interacting Individuals Estimate the Partner’s Goal to Enhance Their Movements. Nature Human Behaviour, 1 (3), 0054.

      (13) Takagi, A., Hirashima, M., Nozaki, D., & Burdet, E. (2019). Individuals Physically Interacting in a Group Rapidly Coordinate Their Movement by Estimating the Collective Goal. eLife, 8, e41328.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The lack of availability of codes and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. Our code and data are available on Figshare (https://doi.org/10.6084/m9.figshare.28950032.v1). We have now revised the manuscript to include a link to our dataset.

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works: they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages, such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms, it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences). Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and their biological basis cannot be determined from the navigational data alone. For instance, it remains unexplored whether pigeons adapt their behavior based only on social cues from their partners or also use other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues, or previously learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ in which the bird pairs behave as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with confidence only in a meta sense. Considering this, we regard our analysis as a top-down approach to describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer, 2024 [1], who used a bottom-up approach with naive agents and showed, in the same dataset, that cumulative route improvements emerged in the absence of any sophisticated communication, in agreement with our approach. We have now added a paragraph: “It is also important to clarify that we use the terms…… that lead to these meta-mechanisms arising remain an open question.” found in lines 120-129 in our Introduction to make this clarification.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised, and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria that previous studies have used to provide evidence of CCE in non-human animals. In our literature review, we found that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), in which individuals are able to compare strategies and select those that result in higher individual fitness. This preferential selection of strategies, termed innovations, allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more a statistical outcome than a matter of deliberate individual judgement. We believe that this strategy is suited to certain task types (homing route choice in our study) and may change for others (for example, solving a puzzle box); the task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which do not. Keeping this framework in mind, studies should clearly state which definition of CCE they are using, and their underlying task type and cognitive mechanisms should be critically evaluated before the findings are deemed CCE. Considering these points, we have now expanded our Discussion to include a paragraph: “Our results highlight the need for more…..range of task types and cognitive abilities.” found in lines 420-433 to highlight these key questions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not have any major objections, but I am clarifying my points as major or minor depending on the effort required to address (mostly via rewriting and clarifications).

      Major comments:

      (1) A schematic summary of the original study: Since the current manuscript builds directly on Sasaki & Biro (2017), it would greatly help readers if you included a concise schematic figure summarizing the original experiment. For instance, a simple panel could depict the chain design (experienced + naïve replacements), the control treatments, and the key empirical findings (improvements in route efficiency across generations, and route similarity within vs. between chains). Presenting this visually would save readers the effort of reconstructing the design and main results from text alone, especially for those unfamiliar with the original paper. It would also clarify exactly what empirical patterns your simulations are intended to reproduce.

      We thank the reviewer for this comment. We have now revised the manuscript with a schematic illustration adapted from the original study by Sasaki and Biro (2017). We hope this clarifies the experimental design and results we aimed to highlight in our work.

      (2) Reproducibility: Code and data are only "available on request." I believe eLife has strong policies on open science; a lack of immediate open access to analysis would be a barrier. I find it jarring that a paper intending to reproduce and improvise a previously published paper does not make the codes and data available for peer review or to readers without an explicit request.

      We have taken the feedback into consideration and updated the Data Availability section with a link to our Figshare dataset.

      (3) One huge drawback of the current format of the manuscript, where Methods come after Results, is that one has to really struggle to understand and appreciate Figures 2 and 3. I would strongly urge authors to have a shorter methods section embedded either as a subsection before the Results, or within the results section, as described in each figure. Perhaps a lot of my confusion also comes from not having known the previous paper, but it may be true for other readers, too. More specifically, for Figure 3, how is social weight for the experiments inferred? Figure 3 caption talks of mean difference, but one has to check the manuscript at multiple places throughout to really understand what this difference is (the definition) and how it is computed.

      While we agree that our manuscript includes the Methods section at the end, we tried to structure our text to tell a story (as stated in our manuscript title). To this end, we organized the text into short titled subsections that briefly convey the relevant background, identify the knowledge gap and outline our approach. We chose this structure to reserve the in-depth details about model implementation and statistical analysis for the Methods.

      Additionally, we made sure to include references to methodological details in relevant segments of the Introduction and Results sections, so as not to bog down the reader with model complexities and to keep a coherent narrative that delivers the message of our study. To further address the background of our work, we have now added a schematic of the original study in response to a previous comment by the reviewer, which we hope helps the reader better understand our work. We hope this explanation clarifies the intention behind our writing choice and our decision to retain the current structure.

      (4) The introduction of the 'effective group size' concept is a potentially valuable and intuitive way to interpret chain dynamics, but the explanation is somewhat buried in the Results/Methods; I suggest highlighting it more prominently (e.g., in the Discussion or with a schematic in the Results) so readers can readily grasp this useful idea.

      We are glad that the reviewer found our concept of ‘effective group size’ useful. However, we do believe that we introduced the idea and rationale behind using this method in the Results: “We asked to what extent……to an equivalent group size” found in lines 305-314. We reserved the detailed description of this method for the Methods section. However, to further emphasize the importance of the concept, we have now added the following text: “This is further supported….. slightly better than two individuals.” found in lines 389-394 in the Discussion.
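
      The dilution of earlier individuals' contributions under a chain design can be illustrated with a short sketch. It rests on two assumptions that are not taken from the manuscript: each pair route is a weighted average of the experienced bird's current route and the newcomer's solo route, and group size is summarized with Kish's effective sample size, (sum of weights)^2 / (sum of squared weights). This measure is a stand-in for, not a reproduction of, the authors' definition of effective group size, so its values need not match those reported in the manuscript.

```python
def chain_weights(generations, w_new=0.5):
    """Weight of each bird's solo route in the final route of a transmission chain.

    Assumes each new pair route is (1 - w_new) * current route + w_new * newcomer route.
    The first entry belongs to the earliest bird in the chain.
    """
    weights = [1.0]
    for _ in range(generations - 1):
        weights = [w * (1.0 - w_new) for w in weights] + [w_new]
    return weights


def effective_group_size(weights):
    """Kish effective sample size: equals n when all n weights are equal."""
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

      With equal averaging (`w_new = 0.5`), the weights halve with each step back along the chain, so a chain of five birds contributes far less than five equally weighted individuals would.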

      Minor comments:

      (1) Line 12: "what is the navigation mechanism(s)" - the (s) is a bit awkward. Either remove (s) or ask what the mechanisms are.

      We have fixed the typo to clarify the statement.

      (2) Line 78: "Such 'ratchet'-like improvements is referred to..." → "are referred to."

      We have fixed the typo to clarify the statement.

      (3) Figure 3 caption: "color scheme in the plots are same" → should be "is the same."

      We have fixed the typo to clarify the statement.

      (4) Clarification on reporting confidence intervals: The manuscript reports confidence intervals (CIs) for the model-based comparisons (e.g., Figures 2-3). This might seem unnecessary for simulation studies, since running more iterations can arbitrarily shrink uncertainty. However, in your case, the CIs are justified because the simulations are anchored to a finite empirical dataset (only 9 solo trajectories), sampled with replacement, and analyzed with mixed-effects models that incorporate bird identity as a random effect. Thus, the intervals reflect biological sample variability rather than simulation noise. This must be clarified.

      We have added a clarifying statement: “...and reflect the biological uncertainty in the empirical dataset, not simulation noise” found in lines 241 and 293 in the captions of Figures 2 and 3 in accordance with the reviewer’s comment. 
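
      The distinction between sample variability and simulation noise can be illustrated with a percentile bootstrap, used here as a simpler stand-in for the mixed-effects models described in the manuscript. The function name and any input values are hypothetical. Because each resample draws only as many values as the empirical sample contains (nine solo trajectories in the comment above), raising the number of resamples stabilizes the interval endpoints but does not shrink the interval toward zero.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of a small sample.

    Resamples `values` with replacement n_boot times, keeping the resample
    size equal to the original sample size.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower_index = round((1.0 - level) / 2.0 * n_boot)
    upper_index = round((1.0 + level) / 2.0 * n_boot) - 1
    return means[lower_index], means[upper_index]
```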

      (5) One part of the issue is that details of methods come much later in the manuscript, perhaps following journal style. Therefore, I recommend explicitly highlighting this rationale in the Results, so readers do not misinterpret the CIs as simply reflecting simulation error.

      We believe that the clarifying statements we have now added in the captions of Figures 2 and 3 should convey this interpretation of CIs and further changes in the Results may not be required.

      With these proposed changes we hope that we improved upon the clarity of our manuscript.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript reports a very interesting, novel and important research angle to add to the now enormous interest in how pesticides can be toxic to beneficial insects like the honey bee. Many studies have reported on how pesticides in standard use formulations show both lethality as well as sublethal negative effects on behavior and reproduction. The authors propose to use machine learning algorithms to identify new volatile compounds that can be tested for repellency. They use as input chemical structures that are derived from chemicals that have known repellent effects as identified in their initial behavioral assays.

      Strengths:

      The conclusion is that such chemicals specific to repelling bees and not pest insects (using the fruit fly as a model for the latter) can be identified using the ML approach. Having a list of such chemicals that can be rotated in any field application would be a benefit because of the honey bee's ability to learn its way around any kind of stimulus designed to keep it from nectar and pollen, even when they may be tainted by pesticide.

      Weaknesses:

      The use of machine learning seems well-executed and legitimate. But this is beyond my expertise. So other reviewers can maybe comment more on that.

      The behavioral data report on the use of a two-choice assay for bees in small Petri plates. Bees can feed from two small wells placed on filter paper impregnated with control or the control containing a chemical. The primary behavior, for example in Fig. 2C, is the first choice, by one of the five bees in the plate, of which well to feed from. For some chemical compounds, there seems to be a 50:50 choice, indicating no repellent effects. In other cases, the first bee making the choice chose the control, indicating possible repellent effects of the test chemical. Choices in this assay were validated in a free-flying assay.

      Concerns with the choice assay:

      50-70 microliters amounts to what one hungry bee will drink. Did the first bee drink most of it, such that measures of bait consumed reflect a single bee or multiple bees?

      The measure of lure consumed reflects multiple bees. We observed that the first bee did not empty the 70 µl of honey, allowing us to estimate honey consumption by several bees.

      How many bees were repelled to the control side? Was it just the one bee?

      All the bees in a group were repelled to the control side for repellents. Evaluating the lack of honey consumption also allowed us to assess repellency. As an example: if 100% of the honey was consumed on the control side, the bees were hungry; if 0% of the honey was consumed on the repellent side, the bees were not hungry enough to drink the honey on the repellent side.

      Were other measures considered? E.g. time to first approach; the number of bees feeding at different time points; the total number of bees observed feeding per unit time.

      Bees were cooled down to place them in the plates for the experiments. Therefore, time to first approach could also depend on how long it took the bees to warm up, which was not as relevant for our research question. Because bees can communicate the location of food sources to each other, we restricted ourselves to the first choice only, to obtain independent data points for each plate. However, we also checked whether the first cup the first bee chose was the one it drank from, which was the case.
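
      Treating each plate's first choice as one independent data point means that a 50:50 null hypothesis can be checked with an exact binomial test. The response does not state which test was applied, so the sketch below is an assumption, and the function name is hypothetical. It sums the probabilities of all outcomes that are no more likely than the observed count of plates whose first choice was the control well.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    k: number of plates whose first choice was the control well
    n: total number of plates
    p: null probability of choosing the control (0.5 means no preference)
    """
    pmf = [comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties in probability are included.
    tail = sum(x for x in pmf if x <= observed * (1.0 + 1e-9))
    return min(1.0, tail)
```

      A count near n/2 returns a p-value close to 1, which matches the 50:50 pattern described for non-repellent compounds, whereas a count near n returns a small p-value.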

      Reviewer #2 (Public review):

      Summary:

      The search for new repellent odors for honey bees has significant practical implications. The authors developed an iterative pipeline through machine learning to predict honey bee-repellent odors based on molecular structures. By screening a large number of candidate compounds, they identified a series of novel repellents. Behavioral tests were then conducted to validate the effectiveness of these repellents. Both the discovery and the methodological approach hold value for related fields.

      Strengths:

      The study demonstrates that using molecular structures and a relatively small training dataset, the model could predict repellents with a reasonably high success rate. If the iterative approach works as described, it could benefit a wide range of olfaction-related fields.

      The effectiveness of the predicted repellents was validated through both laboratory and field behavioral tests.

      Weaknesses:

      The small size of the training dataset poses a common challenge for machine learning applications. However, the authors did not clearly explain how their iterative approach addresses this limitation in this study. Quantitative evidence demonstrating improvements achieved in the second round of training would strengthen their claims. For instance, details on whether the success rate of predictions or the identification of higher-affinity components would be helpful. Furthermore, given that only 15 new components were added for the second round of training, it is surprising that such a small dataset could result in significant improvements.

      The original repellency dataset was collected from multiple older studies, each with differences in assays for bee behavior, and using differing delivery and chemical concentrations. Moreover, the strong repellents were limited in number, and because they varied structurally from non-repellents in the dataset, the AUC appeared high. A smaller dataset results in unusual AI/ML model performance trends, as any algorithm is just a reflection of its training data. As a result, we found that the Round 1 predictions had a low success rate in behavior assays (~20%). Subsequently, even small amounts of data collected using one standard concentration and assay could dramatically change the quality of the dataset, not just for structures of repellents, but also related structures that were not repellent. What we observe is a more complete representation of how repellents and non-repellents are distributed when adding just 15 chemicals. And the prediction success of Round 2 is more than doubled in repellent behavior assays at >50%. The initially observed performance gains with even small additions to the training dataset will stabilize and ultimately plateau due to the limits of the ML algorithm and/or chemical featurization technique. A more complex model, trained on a large dataset, may not be expected to benefit from a handful of additional examples, because the chemical feature distributions are already better approximations of the real world. Put simply, smaller datasets imply there is more to learn.

      It is also true that the size of the training dataset is important for AI/ML algorithms. Artificial neural networks, for instance, are highly sensitive to noise and generalize poorly with limited data; the noise is amplified in these cases, and the solution, reducing the complexity of the model, impedes learning. Many algorithms like the decision trees and support vector machines featured in our paper can handle noise more efficiently and are suitable for smaller datasets in that they can still make reasonably successful predictions.

      Reviewer #3 (Public review):

      The manuscript of Kowalewski et al. titled "Machine learning of honey bee olfactory behavior identifies repellent odorants in free flying bees in the field" used machine learning to predict potential candidates for honeybee repellents, which may keep foraging bees from pesticides. This is a pilot study with strong significance for research on olfactory behavior and for pest control. However, some major issues need to be addressed to enhance the manuscript's clarity, strength, and overall coherence.

      (1) Drosophila melanogaster is not considered as a true agricultural pest. The manuscript would be more compelling if using true pests, for example, Drosophila suzukii or others.

      Honeybees face a critical risk of lethal pesticide exposure when they drift from their designated orchards into adjacent blooming crops or honeydew-coated fields, where they encounter chemical treatments intended for insects like Citrus Thrips, Asian Citrus Psyllid, Alfalfa Weevil, Peach Twig Borer, Oriental Fruit Moth, Lygus Bugs, Cotton Aphids, Whiteflies, Corn Rootworm, Sunflower Head Moth, Vine Mealybug, Cucumber Beetles, and Sugarcane Aphids. Unfortunately, testing such pest species is outside the scope of this paper, but would deserve further research.

      (2) For repellency test, the result relies on dosage. An attractant may become a repellent at high concentration. Test a range of concentrations for each chemical and compare responses between honeybees and pests.

      Testing freely flying honey bees in the field is an extremely challenging undertaking. Nevertheless, we added extra tests for two strong repellents, BR4.5 and BR3.81, at half dose of 0.05 mg/cm<sup>2</sup>. As expected, we found that there was a reduction in repellency. Testing more concentrations was not within the scope of this paper.

      (3) Be more clear about bee behavior data and their scores (as in Page 4 Results "184 training chemicals and later for 203 chemicals" and Page 10 Methods). I suggest that authors add a supplemental table with each chemical and its behavioral score, feature and reference - which ones were used for training, and which ones for testing. Also add your own behavioral test data (second input) to this table

      We have added the training chemical lists as Supplemental Tables S3 and S4.

      (4) The AUC in the first validation was 0.88 (Page 4), and in Page 5, "As expected, the computational validation results based on the AUC values, show an improvement." However, there were no other AUC values to show improvement.

      (5) Show plots of ROC AUC curves from Round 1 and Round 2.

      The round one ROC curve is shown in Figure 1. The round two ROC curves obtained from 3 different approaches are shown in Author response image 1. The manuscript shows direct behavioral validation of chemicals identified, which is more important.

      Author response image 1.
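
      For readers less familiar with the metric discussed in comments (4) and (5), the ROC AUC can be read as the probability that a randomly chosen repellent receives a higher predicted score than a randomly chosen non-repellent. The following is a minimal sketch under that definition. The function name, labels, and scores are made up for illustration and are not values or code from the manuscript.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: 1 for repellent, 0 for non-repellent (hypothetical coding).
    scores: predicted repellency scores, higher means more repellent.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical held-out chemicals: three repellents, four non-repellents.
example_labels = [1, 1, 1, 0, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(f"AUC = {roc_auc(example_labels, example_scores):.3f}")
```

      In this toy case 11 of the 12 repellent/non-repellent pairs are ranked correctly, so the AUC is about 0.917. With very few positives, a single re-ranked chemical shifts the value noticeably, which is one reason AUC values from small datasets should be read with caution.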

      (6) In the Discussion, the authors mentioned olfactory receptors in honeybees. It would be useful to provide a general review of the current understanding of these receptors and their (potential) functions.

      We have expanded the discussion and pointed to a review on honey bee olfaction.

      (7) I suggest combining Fig. 1 and Fig. 3A as one pipeline for this work.

      (8) Figure 2C, some sample sizes are very small, such as 2-piperidone: 1 first-choice control vs 0 first-choice repellent? Increase sample size and do statistical analysis.

      Most compounds, except the one pointed out, have small sample sizes because of the low percentage of bees participating in the trials. Consequently, we improved methods in round 2 and were able to increase participation from 68% to 81%, as described in the methods. However, since the compound was included in the second round of training, we would like to report it anyway. This compound had the highest rate of non-participating plates compared to the others, and there is a possibility that it may affect both stimuli.

      (9) In general, to assist reviewers, include line numbers to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Other factors about the newly identified chemicals:

      Is there a toxicity index for these chemicals that can be listed? This would be important obviously for any humans around the repellents

      While toxicity index determination is outside the scope of this manuscript, it is possible to predict Rat LD50 values using the EPA Suite’s toxicity prediction tool. In a pilot test, the software predicted an average oral toxicity of ~3064 mg/kg for the 18 repellents in Round 2, which is considered “Practically non-toxic” by the EPA.

      Was there any indication of bees being behaviorally impaired or dying when exposed to the chemicals in a confined space? Even exposure to intense floral perfumes in a confined space can be toxic over a longer period.

      Less than 5% of the 2225 honey bees died after the experiments, and none of the compounds showed a significantly higher level of dying, suggesting that the minor effect was not due to chemicals, but possibly due to handling steps (starving, chilling, recovery, etc.).

      The 'plates not participating' measure indicates plates in which no bees fed on either choice. Is that correlated to the choice index? That is, when bees showed some repellency was it the case that often that led to no choice?

      Yes, non-participating plates were those, in which the bees did not drink any honey at all. The reason for this could have been that the bees were too cold and unable to heat up enough to participate in the trials, or that the chemical was so repellent, the bees did not want to drink any honey at all. Because we were not able to distinguish between these two reasons, we excluded plates in which the bees did not drink any honey at all from our dataset.

      It is unclear why the McNemar test was used.

      The McNemar test is used for hypothesis testing for paired dichotomous data. In our data file, we created two columns to report our first-choice results: “Control side first” and “Repellent side first”. When the first bee in a plate drank from the control side first, we added a “1” to the “Control side first” column and a “0” to the “Repellent side first” column. Because one control and one repellent-side honey pot were in the same Petri dish, the bees could only choose one side first, which meant they could not choose the other side at the same time. Consequently, our dataset consisted of paired samples, which were dependent on each other. We therefore split the dataset by Repellent candidate, and we used the paired-sample McNemar tests for non-parametric data. (Lachenbruch P.A. McNemar Test, Wiley StatsRef: Statistics Reference Online)
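
      As a hedged illustration of the test described above: for paired dichotomous data, the exact McNemar test reduces to a two-sided binomial test on the two discordant counts. The sketch below assumes that `control_first` and `repellent_first` are the numbers of plates in which the first bee chose the control or the repellent side. The function name and the counts are hypothetical and are not taken from the manuscript's data or analysis code.

```python
from math import comb


def mcnemar_exact(control_first: int, repellent_first: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant plate is equally likely to
    fall on either side, so the smaller count follows Binomial(n, 0.5).
    """
    n = control_first + repellent_first
    if n == 0:
        return 1.0  # no informative plates, no evidence against the null
    k = min(control_first, repellent_first)
    # Probability of observing a count as small as k, in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap at 1.
    return min(1.0, 2.0 * tail)


# Hypothetical candidate: 9 plates control-first, 1 plate repellent-first.
print(f"p = {mcnemar_exact(9, 1):.4f}")
```

      With these made-up counts the p-value is about 0.021. The same calculation also shows why very small samples cannot reach significance: with a single participating plate (1 versus 0) the p-value is 1.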

      The statistical result is not discussed in the text, only shown in the figure. And it looks to be significant only for one chemical and DEET. Yet on page 4 the end of the second paragraph, the authors write "For many of the tested compounds the bees preferred to visit the honey-water pots on the control side versus the repellent side,". That implies that they are not really using the test as a meaningful means for showing differences. If they are arguing only from trends, then that should be clearer in the text.

      We reported the p-values for each test we had used in tables in Figure 2C and S2. In the methods section we report which statistical tests were used to evaluate the data.

      There is no mention of attractant chemicals:

      Slessor and Winston used queen pheromone to attract bees to fields and improve pollination. Honey bees use the Nasonov pheromone to attract other bees to feeding locations. Could the addition of their chemical features change ML outcomes? This should be at least discussed.

      We thank the referee for the suggestion; however, the focus of this manuscript is repellents, and therefore we restricted the background to that area of knowledge.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Releasing the dataset and code will benefit the readers interested in this study.

      The behavioral data are reported within the figures, tables, and supplementary. The computational code will be available upon request from the communicating author for non-commercial use.

      Figure 1, AUC curve, "AUC = 0.XX", should there be an actual value from the experiment?

      Added

      Page 4, "(Talbe S1)" should be placed in the next sentence, as "From the initial training set we identified 45 features that were considered important for predicting aversive valence (Table S1)."

      We have added this in the appropriate spot.

      Page 5, "As expected, the computational validation results based on the AUC values, show an improvement.". Please list the AUC values.

      Author response image 2.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Page 3: "they sense using a sophisticated olfactory system of >180 odorant receptor genes in the genome". In the cited Robertson & Wanner's paper, there are around 160 receptors, and 170 if pseudogenes are included.

      We thank the referee and have updated the numbers.

      (2) Page 4: "initially for 184 training chemicals and later for 203 chemicals (Table S1)." Table S1 is about features, not chemicals?

      We have moved the reference to an appropriate location.

      (3) Figure 2A: What is the control? Acetone or another solvent?

      Acetone, but it rapidly evaporates before the time of experiment.

      (4) Figure 2A: What do the asterisks mean?

      Statistically significant.

      (5) Figure 3: When you added your own testing data as a second input for Round 2, put details about these data: chemical names, preference scores... Also, were the Round 2 data (Round 1 plus your own) also split 90:10 into training and testing partitions?

      Yes, the validation was performed on the updated data set including the new chemicals.

      (6) Figure 3D: Is asterisk at correct location? What does it mean?

      Means that BR3.15 was significantly different from BR4.5

      (7) Figure 4D: "4D" in legend is missing. Also, "... tested at the regular dose (0.1mg/cm2) and half dose (0.05mg/cm2)". In the panel, it is only 0.05mg/cm2.

      Added

      (8) Table S2 is the same as Fig. 2C? Remove one.

      We have deleted Table S2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) While the manuscript is written for a scientific audience, the authors are likely aware that findings like this will be of broad appeal to the field of neurology, where treatments for memory loss are desperately needed. For this reason, the authors could consider including a statement regarding an interpretation of this meta-analysis from a clinical standpoint. Statements such as 'safe and effective' imply a clinical indication, and yet the manuscript does not engage with clinical trials terminology such as blinding, parallel arm versus crossover design, and trial phase. While the authors might prefer not to engage with this terminology, it can be confusing when studies delivering intervention-like five days of consecutive TMS (e.g., Wang et al., 2014) are clustered with studies that delivered online rhythmic TMS, which tests target engagement (e.g., Hermiller et al., 2020). While the 'sessions' variable somewhat addresses the basic-science versus intervention-like approach, adding an explicit statement regarding this in the discussion might help the reader navigate the broad scope of approaches that are utilized in the meta-analysis.

      We appreciate the suggestion to enhance interpretability of our report by broader audiences. First, to avoid confusion, we have eliminated “safe” and “effective” descriptors from the main summary of findings in the Abstract (pg. 1) and Discussion (pg. 6). Second, we now describe that reviewed studies included those categorized as traditional clinical trials, as well as non-clinical studies that generally follow clinical trial designs (i.e., multi-day intervention-like studies), in addition to more basic-oriented studies that are geared towards target engagement (Introduction, pg. 2). Third, we now clarify that the Design and Control factors (Figure 3) correspond to fairly standard distinctions in the clinical trials literature and were intended to capture major study design choices that are used in both clinical-trial and non-trial studies (Methods, pg. 9; Table S1). Finally, we now clarify that future clinical trials would be needed to evaluate HITS for any specific indication, and that our findings motivate such investigations but do not conclusively indicate efficacy for any given indication (Abstract, pg. 1; Discussion, pg. 7).

      Reviewer #1 (Recommendations for the authors):

      (1) The color scheme of Figure 1 was a bit confusing. All of the colors used for the flagged regions were incredibly similar. At first glance, it looks like the hippocampus was targeted directly due to the subtle color difference. Could the authors use colors that are more different? Similarly, zooming into the specific locations shows blue dots encompassed by teal. I am not sure what I am looking at here.

      We have updated the figure for clarity.

      (2) Given the broad appeal of the current study, I would encourage the authors to include a brief visual depiction of "HITS." This could help the more casual reader to understand the general approach.

      We have included this in Figure 1A.

      Reviewer #2 (Public review):

      (1) While the introduction centers on the role of the hippocampus in episodic memory and posits hippocampal neuromodulation by TMS as causative, the true mechanism may be more complex. Clean hippocampal lesions in primates cause focal loss of spatial and place memory, and I am aware of no specific evidence that the hippocampus does more than this in humans. Moreover, there is evidence that lateral parietal TMS also reaches neighboring temporal lobe regions, which contribute to episodic memory. The hippocampus may, therefore, be a reliable deep seed for connectivity-based targeting of the episodic memory network, but might not be the true or only functional target.

      We regret to have implied that we think the hippocampus is the true or only functional target. We agree with the reviewer that the hippocampus is “a reliable deep seed for connectivity-based targeting of the episodic memory network” and that the specific locus/loci of the HITS effects and mechanisms are not yet clear. We now emphasize that although hippocampus is used to define the targeted network, effects of TMS are likely distributed throughout the network, citing relevant studies that have shown that brain activity changes due to HITS are certainly not restricted to the hippocampus (Introduction, pg. 2).

      (2) The meta-analysis combines studies with confirmation of targeting and target-network engagement from fMRI and studies without independent evidence of having stimulated the putative target (e.g., Koch et al). That seems like a more important methodological distinction than merely the use of any individual targeting method. In my experience, atlas-based estimates are at least as accurate as eyeballing cortical areas in individuals. Hence, entering individual functional targeting as a factor might reveal an effect on efficacy.

      Our current definition of the “Targeting” factor appears to satisfy this concern. That is, we distinguish studies that used “individual functional targeting” (i.e., resting-state fMRI or DTI connectivity in each individual to select the target) from those that did not (i.e., atlas or other group-average approach). Notably, the Targeting factor modulation effect failed to survive correction for multiple comparisons. We think this satisfies the reviewer criticism, unless the reviewer is suggesting that we categorize studies based on whether they included evaluation of target engagement (e.g., tested for change in fMRI activity or connectivity of the network due to HITS) versus those that measured only behavioral outcomes. We did not include this distinction as a factor, as our analysis focuses on behavioral effects of HITS, and it is not clear what the neural effects would have been in studies in which they were not measured. Notably, we are providing the full raw dataset of effect sizes in a public repository with our final version of record, such that any other categorization schemes could be assessed by others.

      (3) The funnel plot and Egger's regression for episodic memory outcomes suggested possible bias, and the average sample size of 23 is small, contributing to the likelihood of false positive results. It would be informative, therefore, to know how many or which studies had formal power estimates and what the predicted effect sizes were.

      Regarding the average sample size of 23, we note that we used Hedges’ g for the effect size measure because it corrects for bias associated with small samples (pg. 10). Further, small sample sizes contribute to noisy estimates of true effects, allowing outliers to contribute to false positives and low power to contribute to false negatives, but without any reason to systematically yield bias towards false positives. Regarding potential publication bias, although we cannot rule this out based only on the statistics, we think that bias against publication of negative results is unlikely. First, HITS experiments are time consuming and expensive, and most in the field seem to be motivated to publish, whatever the outcome. Second, the notion of memory enhancement via brain stimulation is controversial, and groups have certainly been motivated, if not overly eager, to publish “failure to replicate” studies for HITS (e.g., the failure-to-replicate publication by Hendrikse et al. 2020, which was then re-analyzed by many of the original authors to arrive at different conclusions in Cash et al. 2022). Given these considerations, we think that it is very unlikely that publication bias had any major impact on our conclusions, but of course it cannot be conclusively excluded. Finally, we note that our finding of HITS selectivity for recollection enhancement is likely not affected by publication bias, as this selectivity versus other memory and non-memory outcomes was found only within published studies (i.e., it is very unlikely that publication bias would have led researchers to withhold publication of studies that found effects of HITS on recognition but not on recollection).
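
      To make the small-sample correction mentioned above concrete, here is a minimal sketch of Hedges' g for two independent groups, using the common approximation J = 1 - 3 / (4 * df - 1) applied to Cohen's d. The function name and all input values are hypothetical, and the meta-analysis itself may have used a different variant (for example, for within-subject designs).

```python
from math import sqrt


def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Hedges' g for two independent groups (approximate correction)."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd          # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)         # small-sample correction factor
    return j * d


# Hypothetical study with 23 participants split 12 / 11.
g = hedges_g(mean1=0.70, mean2=0.60, sd1=0.2, sd2=0.2, n1=12, n2=11)
print(f"g = {g:.3f}")
```

      For these made-up inputs Cohen's d is 0.5 and the corrected g is about 0.48, so the correction shrinks the estimate by a few percent at this sample size and becomes negligible as the sample grows.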

      (4) In the Discussion, the authors might provide a comparison between the effect size for memory improvement found here with those reported for other brain-targeted interventions and behavioral strategies. It may also be worthwhile pointing out that HITS/memory is one of the very few, or perhaps the only, neuromodulatory effects on cognition that has been extensively reproduced and survived rigorous meta-analysis.

      We now emphasize that this is, to our knowledge, the only neuromodulatory effect on cognition that is selective, has been extensively reproduced, and survived rigorous meta-analysis (Discussion, pg. 6). However, we wish to avoid the clinical overinterpretation of our findings that might result if we were to compare directly to effect size estimates for other current therapies, which have been evaluated for specific clinical indications. For example, antibody and pharmacological interventions for Alzheimer’s dementia typically have been associated with similar effect sizes to our estimate for HITS. However, those estimates derive from systematic review of randomized controlled trials measuring clinically relevant outcomes at relatively long delays, whereas the HITS studies we review include a mix of controlled and uncontrolled trials, vary in whether clinical outcomes were assessed, and mostly assessed outcomes at shorter delays. Thus, it could be misleading to directly compare the effect sizes. We instead continue to highlight that the HITS effects are promising and warrant rigorous testing for any given clinical indication.

      (5) The section of the Discussion on specificity compares HITS to transcranial electrical stimulation without specifying an anatomical target or intended outcome. A better contrast might be the enormous variety of cognitive and emotional effects claimed for TMS of the dorsolateral prefrontal cortex.

      We now also note that TMS of lateral frontal cortex has not been associated with similarly high specificity (Discussion, pg. 6). Note however that we cannot exclude anti-depressant or other psychological effects of HITS, as such outcomes were not consistently assessed in HITS studies and so were not included in our analyses.

      (6) With reference to why other nodes in the episodic memory network have not been tested, current flow modeling shows TMS of the medial prefrontal cortex is unlikely to be achievable without stronger stimulation of the convexity under the coil, in addition to being uncomfortable. The lateral temporal lobe has been stimulated without undue discomfort.

      We now additionally indicate that medial prefrontal stimulation may be ineffective given conventional TMS (Discussion, pg. 7). However, we are aware of no studies that have stimulated the portion of middle temporal gyrus that shows strong connectivity with hippocampus. We have tried this location, which positions the coil on or slightly above the ear and bordering on the temple area that is very sensitive to most. We were not able to minimize pain/discomfort for most subjects in pilot experiments, and so had to abandon it. Perhaps others have succeeded? If the reviewer has any specific references that could be included we would be happy to add them and update this section accordingly.

      (7) Finally, a critical question hanging over the clinical applicability of HITS and other neuromodulation techniques is how well they will work on a damaged substrate. Functional and/or anatomical imaging might answer this question and help screen for likely responders. The authors' opinion on this would be informative.

      We appreciate this point but don’t think there are enough data to assess the level of substrate damage needed to frustrate any stimulation benefits. The only thing we can say is that HITS was equally effective for mild to moderate Alzheimer’s dementia as it was for other non-neurodegenerative groups (nonsignificant effect of the Population factor, Figure 3B), suggesting that whatever degree of damage present in that group is insufficient to prevent the stimulation effects. We now highlight this point and raise the issue that, presumably, some level of damage would render HITS ineffective (Discussion, pg. 8).

      Reviewer #3 (Public review):

      (1) My only significant concern is how studies are categorized in the 'Timing' factor (when stimulation is applied). Currently, protocols in which TMS is administered across days are categorized as 'pre-encoding' in the Timing factor. This has the potential to be misleading and may lead to inaccurate conclusions. When TMS is administered across multiple days, followed by memory encoding and retrieval (often on a subsequent day), it is not possible to attribute the influence of TMS to a specific memory phase (i.e., encoding or retrieval) per se. Thus, labeling multi-day TMS studies as 'pre-encoding' may be misleading to readers, as it may imply that the influence of TMS is due to modulation of encoding mechanisms per se, which cannot be concluded. For example, multi-day TMS protocols could be labeled as 'pre-retrieval' and be similarly accurate. This approach also pools results from TMS protocols with temporal specificity (i.e., those applied immediately during encoding and not on board during memory testing) and without temporal specificity (i.e., the case of multi-day TMS) regarding TMS timing. Given the variety of paradigms employed in the literature, and to maximize the utility/accuracy of this analysis, one suggestion is to modify the categories within the Timing factor, e.g., using labels like 'Temporally-Specific' and 'Temporally Non-specific'. The 'Temporally-Specific' category could be subdivided based on the specific memory process affected: 'encoding', 'retrieval', or 'consolidation' (if possible). I think this would improve the accuracy of the approach and help to reach more meaningful conclusions, given the variety of protocols employed in the literature.

      We agree in principle with this criticism and think that the most straightforward way to address it is to relabel the “Pre-Encoding” category as “Pre-Task”. The issue with labeling/considering single-session stimulation delivered immediately before encoding as “Pre-encoding” is that this makes the assumption that this stimulation doesn’t also affect retrieval (i.e., is temporally specific). We do not have certainty about the timecourse of how a single session of stimulation affects brain activity. We think the “Pre-Task” label and interpretation is the best way to address this, to avoid suggesting that we are confident about the timecourse/selectivity of stimulation effects. Notably, the “Sessions” factor directly compares among designs that delivered stimulation in a single session versus in multiple consecutive sessions, and was a nonsignificant modulator. Thus, our analyses already compare studies that are relatively temporally specific versus those that, likely, are less so. In addition to relabeling, we have also added clear caveats to address the interpretive constraint imposed by the unknown timecourse of stimulation effects (Discussion, pg. 6-7) and revised the Abstract to reflect this change.

      (2) As the scope of the meta-analysis is limited to TMS applied to parietal or superior occipital cortex, it is important to highlight this in the Introduction/Abstract. The 'HITS' terminology suggests a general approach that would not necessarily be restricted to parietal/nearby cortical sites.

      This was previously highlighted only in the Methods and Discussion (with a Discussion paragraph dedicated to the issue of target selection; see also Comment 6 from Reviewer 2). We now also note this in the Introduction (pg. 2) and Abstract.

      Minor:

      (1) To reduce the number of study factors tested, data reduction was performed via Lasso regression to remove factors that were not unique predictors of the influence of TMS on memory. This approach is reasonable; however, one limitation is that factors strongly correlated with others (and predict less unique variance) will be dropped. This may result in a misrepresentation, i.e., if readers interpret factors left out of this analysis as not being strongly related to the influence of TMS on memory. I do see and appreciate the paragraph in the Discussion which appropriately addresses this issue. However, it may be worth also considering an alternative analysis approach, if the authors have not already done so, which explicitly captures the correlation structure in the data (i.e., shown in Figure S2) using a tool like PCA or an appropriate factor analysis. Then, this shared covariance amongst factors can be tested as predictors of the influence of TMS - e.g., by testing whether component scores for dominant PCs are indeed predictive of the influence of TMS. This complementary approach would capture rather than obfuscate the extent to which different factors are correlated and assess their joint (rather than independent) influence on memory, potentially resulting in more descriptive conclusions. For example, TMS intensity and protocol may jointly influence memory.

      We argue that feature selection via Lasso regression is a better approach for our research question than PCA, factor analysis, or other latent variable methods. The main reason is that PCA would sacrifice the interpretability of our findings with respect to the design of future experiments using or testing HITS. That is, because PCA creates composite components that are linear combinations of multiple variables, we would lose the ability to provide clear, actionable guidance to researchers about which specific study design choices (e.g., stimulation intensity, protocol type, timing) influence memory outcomes. Given that a major goal of our meta-analysis is to inform future experimental design, we believe that it is essential to maintain interpretability of the individual factors that must be decided when designing a study. Regarding factor analysis, this approach would require making a priori theoretical decisions about how to group individual moderators, which could introduce subjective bias into the analysis and would introduce other complications such as a need for validation of the resulting factor scores. We believe that the exploratory nature of our investigation, examining which among many possible study design factors substantially determine TMS efficacy, is better suited to a data-driven selection approach like Lasso. While the reviewer correctly notes that Lasso may drop factors that are correlated with stronger predictors, this feature can be considered advantageous in terms of identifying factors for inclusion in future study designs. That is, this can help identify the most parsimonious set of independent predictors, such that researchers can focus on the study design elements that matter most when controlling for other factors. Notably, we provide the table of factor relationships (Figure S2) so that interested readers can inspect how dropped factors were related to those that were retained.

      It is also important to note that we have provided the full dataset with our resubmission, which has been deposited in Dryad with a link in the Data Availability section (pg. 15). Thus, others are free to explore alternative analytical approaches should they wish to examine the data from different perspectives or to answer different questions.

      (2) Given the specific focus on TMS applied to parietal cortex to modulate hippocampal and related network function, it would be fruitful if the authors could consider adding discussion/speculation regarding whether this approach may be effectively broadened using other stimulation methods (e.g., tACS, tDCS), how it may compare to other non-invasive brain stimulation methods with depth penetration to target hippocampal function directly (transcranial temporal interference, or transcranial focused ultrasound), and/or how or whether other stimulation sites may or may not be effective.

      We briefly discuss a meta-analysis of tACS studies which reported nonspecific effects, including for parietal targets overlapping those used for HITS (Discussion, pg 6). We briefly speculate about how tES effects remain mechanistically uncertain. We are afraid that further speculation about other stimulation modalities and targets would be beyond the scope of this focused meta-analysis, given especially the few datapoints for newer approaches such as TI or tFUS.

      (3) Studies were only included in the meta-analysis if they contained objective episodic memory tests. How were studies handled that included both objective and subjective memory, or other non-episodic memory measures? For example, Yazar et al. 2014 showed no influence of TMS on objective recall, but an impairment in subjective confidence. I assume confidence was not included in the meta-analysis. Similarly, Webler et al. 2024 report results from both the mnemonic similarity task (presumably included) and a fear conditioning paradigm (presumably excluded). Please clarify in the methods how these distinctions were handled.

      Studies were included in our meta-analysis if they included at least one objectively scorable test of episodic memory. We only included objectively scorable test performance in our analysis, excluding scores from any other subjective measures if they were also reported. This is now clarified in Methods (pg. 9).

      (4) The analysis comparing memory to non-memory measures is important, showing the specificity of stimulation. Did the authors consider further categorizing the non-memory tasks into distinct domains (i.e., language, working memory, etc.)? If possible, this could provide a finer detail regarding the selectivity of influences on memory vs. other aspects of cognition. It is likely that other aspects of cognition dependent on hippocampal function may be modulated as well, i.e., tasks with high relational/associative processing demands.

      This is an interesting idea, but it is beyond our expertise to categorize these other tasks based on the nature of processing demands that they capture. Note that the task names are provided in the data table that we are making available online with our submission of record (via Dryad), such that other groups could address this question if interested.

      (5) In the analysis of the Intensity factor, how were studies using Active (rather than resting) MT categorized? Only resting MT is mentioned in Table S1. This is important as the original theta-burst TMS protocol from Huang et al. 2005 determines intensity based on Active Motor Threshold.

      MT was resting/passive in all reviewed studies except for one (Tambini et al. 2018), which used 80% of active MT. We categorized this as <100% MT for the Intensity factor, as it was <100% of MT as defined in that study. Although one could make the argument that 80% AMT might instead correspond to 100+% RMT, this change would have very little influence on our results or conclusions. We now clarify this in Table S1.

      (6) Is there a reason why the study by Koen et al. 2018 (Cognitive Neuroscience) was not included? TMS was performed during encoding to the left AG, and objective memory was assessed, so it would seemingly meet the inclusion criterion.

      The failure to include Koen et al. 2018 was our error. Koen et al. 2018 is the only study that used “online” stimulation, delivered during the trials when memoranda were displayed for encoding in the task. In contrast, all other reviewed studies delivered “offline” stimulation either before the memoranda were presented (“Pre-Task”) or after the encoding period but before retrieval (“Post-Encoding”). Therefore, categorization for the “Timing” factor would be problematic for its inclusion in the main analysis. We therefore now include Koen et al. 2018 in the “Supplementary Results” section as well as the corresponding main Results section on “Similar outcomes in studies that were excluded from meta-analysis”. We also note in the relevant discussion that “online” stimulation, as done in Koen et al. 2018, is typically considered disruptive (e.g., Beynel et al. 2019 Neuroscience & Biobehavioral Reviews; Yeh & Rose 2019 Frontiers in Psychology), which should be taken into account when considering the findings of Koen et al. 2018 relative to other reviewed studies that used “offline” designs.

      (7) It would be helpful to briefly differentiate the current meta-analysis from that performed by Yeh & Rose (How can transcranial magnetic stimulation be used to modulate episodic memory?: A systematic review and meta-analysis, 2019, Frontiers in Psychology) (other than being more current).

      Beyond being more current and therefore including many more studies in which stimulation targets were based on hippocampal connectivity (which tend to have been published more recently), the differences with Yeh & Rose 2019 are subtle. Our review focuses on assessment of network targeting and whether effects were specific to episodic memory versus other tasks, which differs somewhat from the focus of Yeh & Rose 2019. The main difference in conclusions likely derives from there being more network-focused memory TMS experiments now than were available for Yeh & Rose’s review. We also differentiate episodic memory into recollection versus other components to test specificity and analyze modulation by many study design factors relevant to HITS studies that were not emphasized in Yeh & Rose’s review. Note that we now cite Yeh & Rose for those interested in potential differences.

      (8) For transparency and to facilitate further understanding of the literature and potential data re-use, it would be great if the authors consider sharing a supplementary table or file that describes how individual studies/memory measures were categorized under the factors listed in Table S1.

      As promised in our original submission, we are providing the full data table, including how individual studies and memory measures were categorized, as an open dataset in Dryad. The Dryad dataset is cited in “Data availability” (pg. 15).

      Reviewer #3 (Recommendations for the authors):

      Please explicitly state in the Methods (Meta-analysis of effect modifiers section) that the criteria used for categorizing each measure into a factor (e.g., probing Recollection, Recognition, etc.) are fully described in Table S1; this will help readers to find these details (it took me a while!).

      This is now emphasized (pg. 10).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1 (Public review):

      (1) "The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within one one-microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place."

      We thank the Reviewer for this valuable feedback and we agree with the Reviewer. Our work on the IRE1 cLD activation mechanism is focused on generating a hypothesis of the binding mechanism driven by MD simulations. We recognize the limitations in defining a stable binder due to the time scales sampled. However, our primary focus was to sample and characterize a possible binding pose in the center of the cLD dimer. We contextualized our statements about stable binders and limited our claims to stating that the protein-peptide complex is stable within 1 µs-long simulations. However, we believe that our finding that the cLD dimer groove is not able to accommodate peptides is solid, as the steric impediment described is present in all our replicas, both with and without peptides, in a cumulative sampling time of 24 µs without peptides and 66 µs with peptides. Additionally, we included a plot showing the distribution of groove width across all replicas.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) The title was changed from “Unfolded polypeptides can stably bind to hIRE1α cLD dimer” to “Unfolded polypeptides bind to hIRE1α cLD dimer surface”

      Addition to the text. (Figure 15 A legend) “(A) Distributions of the groove width of peptide-bound cLD dimers throughout all simulations performed. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      (2) Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength or the free energy difference.

      We thank the Reviewer for the insightful comment. As explained in the previous point, we believe that our simulations provide useful hypotheses. We are aware of the limitations due to the timescale and agree that these limitations cannot be overcome with standard equilibrium simulations. To address these limitations, we used orthogonal methods, specifically MM/PB(GB)SA calculations, to calculate binding free energies from existing trajectories. We also added AlphaFold 3 predictions for all the peptides to confirm the binding region. Importantly, we now provide experimental results to assess the binding affinity of cLD dimer mutants E102R and Y161R.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. 16A). We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Figure 16 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (pLDDT). (B) Difference in enthalpy (enthalpy of binding, ∆H) as an estimate of the binding free energies of unfolded polypeptides to hIRE1α cLD dimer derived from MM/PBSA calculations of our peptide simulations.”

      Addition to the text. (Figure 4 G legend) “(G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Results section: Point mutations destabilize unfolded peptide binding to cLD) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub> = 6.35 µM and Y161R K<sub>1/2</sub> = 5.4 µM, Supplementary Table 3) compared to the wild type (K<sub>1/2</sub> = 2.14 µM, Supplementary Table 3), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Supplementary Table 3)

      Reviewer 2 (Public review):

      (1) Improving presentation to include more computational details.

      We thank the Reviewer for raising this critical point. We agree that the manuscript is tailored for a biology audience, as the data are particularly relevant for that community. Nevertheless, we also understand the importance of providing sufficient methodological detail for computational readers. We added more references to the methods for computational information in the main text.

      (2) More quantitative analysis in addition to visual structures.

      We added an uncertainty estimate for the HDX calculations using bootstrapping and included additional information on bond distances for E102 and Y161. We also incorporated time-series data showing the distance of the peptide from the groove across all replicas.

      Addition to the text. (Figure 1C legend) “(C) The deuterated fraction obtained from experimental results (dashed line, shaded area indicates the error we calculated from bootstrapping) published by Amin-Wetzel et al. and the fraction computed from MD simulations (solid lines, blue for TIP3P water and orange for TIP4PD water) for the PDB and AF model at incubation time point 0.5 min. This time point corresponds to experimental incubation times, not MD simulation time. Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping. Below each absolute value plot, we report the discrepancy, which is defined as the difference between the simulated and experimental deuterated fractions, with the shaded area indicating the corresponding error.”
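
      As a rough illustration of how a bootstrap error on a mean over three replicas and two monomers per replica can be obtained, the following minimal sketch resamples six values with replacement. The exact bootstrap procedure is not spelled out in this response, so the resampling scheme, the number of resamples, and the example fractions are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_mean_error(values, n_resamples=2000, seed=0):
    """Return (mean, bootstrap standard error) for a small sample."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    resampled_means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.fmean(values), statistics.stdev(resampled_means)


# Hypothetical deuterated fractions for one peptide segment:
# three replicas x two monomers = six values.
fractions = [0.42, 0.45, 0.40, 0.47, 0.43, 0.44]
mean_fraction, error = bootstrap_mean_error(fractions)
```

      With only six values the bootstrap distribution is coarse, so an error obtained this way is best read as an approximate indication of replica-to-replica spread.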

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      Reviewer 3 (Public review):

      A potential weakness of the study is the usage of equilibrium (unbiased) molecular dynamics simulations, so that processes and conformational changes on the microsecond time scale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions.

      We appreciate the Reviewer’s thoughtful comment. As noted in our response to Reviewer 1, we addressed the concern about sampling by applying orthogonal methods and experimental techniques. We agree with the Reviewer that some form of enhanced sampling is necessary if we want to assess binding in a more quantitative way, e.g., via free energy calculations. However, we also realize that applying any enhanced sampling scheme to our system is very challenging, given its large size and the complex peptide-protein interactions, which are not easily captured in a few collective variables. After a careful assessment and some preliminary tests, we decided that estimating free energies using enhanced sampling would necessitate a separate paper due to both the conceptual complexity of the project and the size of the necessary sampling campaign.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Some enhanced sampling or path sampling simulations may be carried out to identify the peptides’ binding and unbinding mechanisms to the protein. This can show whether the disordered peptides studied in this work do indeed bind to the protein.

      We thank the Reviewer for this constructive criticism. We acknowledge the limitations associated with investigating binding and unbinding mechanisms of disordered peptides within the time scales accessible to our equilibrium simulations. However, the primary objective of our study was to sample and characterize a plausible binding pose at the center of the cLD dimer. We wanted to understand whether unfolded model peptides require an open groove able to contain them in order to bind to IRE1’s core luminal domain, or whether binding also occurs in the absence of an open groove.

      Enhanced sampling is, of course, an important strategy to overcome the limits of equilibrium simulations. However, we note that implementing enhanced sampling approaches in this system poses significant challenges due to its large size and the complexity of peptide–protein interactions, which cannot be easily captured using a limited set of collective variables. We decided that a thorough application of enhanced sampling would therefore constitute a separate study. Instead, we decided to validate our simulations in two ways: 1) we ran a new set of free energy calculations, and 2) we tested key predictions in experiments, adding significant new data to strengthen the conclusions of our manuscript.

      To evaluate whether the binding free energies of MPZ-derived peptides to human IRE1α cLD dimers are consistent with experimentally reported binding constants, we employed the MM/PBSA (Molecular Mechanics/Poisson–Boltzmann Surface Area) method. Calculations were performed over the final 250 ns of each simulation replica using the Single Trajectory Protocol (STP), which avoids the need for additional simulations. This approach provides an estimate of the effective binding free energy (i.e., enthalpy of binding) by accounting for bonded and non-bonded interactions, as well as solvation contributions. The entropic contribution, being computationally more demanding and subject to additional approximations, was not included. Binding enthalpies were obtained for MPZ1-N (in different initial orientations), MPZ1-C, MPZ1-N-2X, and MPZ1-N-2X-RD. The results indicated small differences in effective binding energies between the shorter peptides (MPZ1-N and MPZ1-C), whereas MPZ1-N-2X exhibited the lowest binding energy and MPZ1-N-2X-RD the highest, consistent with experimental trends. These findings support the reliability of our model and sampling strategy as a framework for analyzing peptide binding conformations to cLD.

      We identified residues E102 and Y161 as key contributors to the binding of unfolded peptides in our simulations. Contact analysis revealed these residues as binding hotspots, centrally located within the observed interaction regions. To probe their relevance, we conducted simulations of cLD dimers with single arginine mutations in these residues, aimed at disrupting these hotspots through charge repulsion. These simulations revealed increased instability of MPZ1N-2X on the cLD dimer surface. We further validated these findings experimentally using fluorescence anisotropy assays. Fluorescently labeled MPZ1N-2X was titrated with purified cLD mutants (E102R and Y161R), and anisotropy measurements were fitted to derive K<sub>1/2</sub> values. Both mutations resulted in approximately a two-fold reduction in binding affinity relative to the wild-type cLD, confirming the importance of these residues in stabilizing peptide binding.

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Thus, we investigated how the point mutations of two key residues, E102R and Y161R, would affect peptide binding by simulating the cLD mutant in complex with MPZ1N-2X (Fig. 4C-E). We initialized the systems in the pose described for the other peptide-cLD systems described earlier (Fig. 3B, t = 0 µs). In simulations of the wild-type (WT) cLD dimer, the peptide generally remained near the center (Fig. 4C,F). By contrast, MPZ1N-2X displayed reduced binding to E102R, fully dissociating in one TIP4P-D replica (Fig. 4E,F). A similar trend was observed for Y161R, where one partial dissociation event occurred (Fig. 4D,F). Comparative analysis of MPZ1N-2X contact sites on the WT and mutant cLD dimers (Supplementary Fig. 17B-D) revealed that, in the presence of mutations, the peptide engages a broader surface region rather than remaining centrally localized, while forming fewer contacts with the specific residues (Supplementary Fig. 18A-B).”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub> = 6.35 µM and Y161R K<sub>1/2</sub> = 5.4 µM, Supplementary Table 1) compared to the wild type (K<sub>1/2</sub> = 2.14 µM, Supplementary Table 1), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Figure 4 legend) “(E) Side view snapshot after 1 µs of simulation of E102R hIRE1α cLD dimer (gray) in complex with MPZ1N-2X (orange). The amino acid R102 on both monomers is represented in magenta sticks. (F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”
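
      The 25-frame rolling average mentioned in the legend for panel F can be sketched as follows. Whether the window used for the figure is trailing or centered is not stated, so a trailing window is assumed here, and the distance values are hypothetical.

```python
def rolling_average(series, window=25):
    """Trailing moving average; early frames use however many points exist so far."""
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(series):
        running_sum += value
        if i >= window:
            # Drop the frame that just left the window.
            running_sum -= series[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Hypothetical minimum groove-peptide distances, one value per frame.
distances = [float(d) for d in range(1, 51)]
smoothed = rolling_average(distances, window=25)
```

      Plotting the smoothed series over the raw series reproduces the darker-line-over-shaded-line layout described in the legend.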

      Addition to the text. (Methods section: Binding free energy calculations (MM/PBSA)) “The binding free energy of noncovalently bound complexes of human IRE1 cLD and peptides was calculated with the MM/PBSA (Molecular Mechanics/Poisson–Boltzmann Surface Area) method via gmx_MMPBSA (version 1.6.4) [1, 2]. The Poisson–Boltzmann method was used to estimate the electrostatic contribution to solvation free energy as recommended for data obtained with the CHARMM force field. The contribution of the entropic term was omitted, obtaining effective binding free energy values, or enthalpy of binding (∆H). We used the Single Trajectory Protocol (STP), using the cLD-peptide simulations as input. The calculations were performed on the last 250 ns of each replica. Single-term total non-polar solvation free energy (inp = 1) was used. The charmm_radii (PBRadii = 7) were used to build Amber topology files [3]. The default parameters were applied for other terms.”

      Addition to the text. (Methods section: Protein purification) “To express hIRE1α LD (24-443), human cDNA sequences were cloned into pET47b(+) to create a coding sequence with an N-terminal His6-tag. Mutations of hIRE1α LD were introduced by overlap extension PCR and restriction cloning into pET47b(+). For expression of the proteins, the plasmid of interest was transformed into Escherichia coli strain BL21DE3* RIPL (Agilent Technologies). Cells were grown in Luria Broth until OD600 = 0.6-0.8. Protein expression was induced with 0.6 mM IPTG, and cells were grown at 20°C overnight. For purification, harvested cells were resuspended in Lysis Buffer (50 mM HEPES pH 7.2, 400 mM NaCl, 20 mM imidazole, 5% glycerol, 5 mM β-mercaptoethanol) and lysed in a Constant Systems cell disruptor at 25 000 psi. The supernatant was collected after centrifugation for 45 minutes at 48000×g at 4°C. The supernatant was loaded onto a Ni-NTA column (Cytiva) and the protein was eluted with a linear gradient of imidazole from 20 to 500 mM. Fractions containing the protein were diluted 1:8 with anion exchange wash buffer (50 mM HEPES pH 7.2, 5 mM β-mercaptoethanol), loaded onto a HiTRAP-Q ion exchange column (Cytiva) and eluted with a linear gradient from 50 mM to 1 M NaCl. Afterwards, the His6-tag was removed by cleavage with PreScission protease (GE Healthcare, 1 µg of enzyme per 100 µg of protein). The cleavage was performed overnight at 4°C. The protein sample after cleavage was loaded onto a Ni-NTA column, and the flow-through containing protein without the tag was collected. The protein was further purified on a Superdex 200 10/300 gel filtration column equilibrated with Buffer A (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT). Protein concentrations were determined using the extinction coefficient at 280 nm predicted by the Expasy ProtParam tool (http://web.expasy.org/protparam/).”

      Addition to the text. (Methods section: Fluorescence anisotropy) “For fluorescence anisotropy measurements, the MPZ1-N-2X peptide attached to 5-carboxyfluorescein (5-FAM) at its N-terminus was obtained from GenScript at >95% purity. Binding affinities of hIRE1α LD mutants to FAM-labeled peptides were determined by measuring the change in fluorescence anisotropy on a Tecan CM Spark Micro Plate Reader with excitation at 485 nm and emission at 525 nm with increasing concentrations of hIRE1α LD variants. Measurements were performed in Buffer A supplemented with Tween 20 (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT, 0.025% Tween 20). Fluorescently labeled peptides were used at a concentration of 90 nM. The reaction volume of each data point was 25 µL and the measurements were performed in 384-well, black flat-bottomed plates (Corning) after incubation of peptide with hIRE1α LD variants for 30 min at 25 °C. Binding curves were fitted using Prism Software (GraphPad) using the following equation: F<sub>bound</sub> = r<sub>free</sub> + (r<sub>max</sub> − r<sub>free</sub>)/(1 + 10<sup>(log K<sub>1/2</sub> − x)·n<sub>H</sub></sup>), where F<sub>bound</sub> is the fraction of peptide bound, and r<sub>max</sub> and r<sub>free</sub> are the anisotropy values at maximum and minimum plateaus, respectively. n<sub>H</sub> is the Hill coefficient and x is the concentration of the protein in log scale. Curve-fitting was performed with minimal constraints to obtain K<sub>1/2</sub> values with high R<sup>2</sup> values. However, as this equation does not consider the equilibria between hIRE1α LD dimers/oligomers, these apparent K<sub>1/2</sub> values do not reflect the dissociation constant.”
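
      To make the fitted model concrete, the following minimal sketch evaluates the binding curve, assuming it has the standard variable-slope form r<sub>free</sub> + (r<sub>max</sub> − r<sub>free</sub>)/(1 + 10<sup>(log K<sub>1/2</sub> − x)·n<sub>H</sub></sup>). It is not the Prism fitting routine: the function name and the plateau values are hypothetical, and only the wild-type K<sub>1/2</sub> of 2.14 µM is taken from the text.

```python
import math


def anisotropy(log_conc, r_free, r_max, log_k_half, hill):
    """Evaluate the variable-slope binding curve at a log10 protein concentration."""
    return r_free + (r_max - r_free) / (1.0 + 10.0 ** ((log_k_half - log_conc) * hill))


# Hypothetical plateau values; K1/2 = 2.14 µM is the wild-type value quoted in the text.
# Concentrations are expressed as log10 of the value in µM.
log_k_half = math.log10(2.14)
midpoint = anisotropy(log_k_half, r_free=0.05, r_max=0.15, log_k_half=log_k_half, hill=1.0)
```

      At x = log K<sub>1/2</sub> the exponent is zero, so the curve sits halfway between the two plateaus. This is the sense in which K<sub>1/2</sub> is an apparent half-saturation concentration rather than a dissociation constant.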

      (2) Wherever possible, conclusions related to binding affinity should not be drawn from single unbinding events. For example, the title of Figure 4, "Single point mutation of cLD alters the binding affinity of unfolded peptide," should be softened. Similar changes should be made throughout the manuscript where such claims have been presented.

      We thank the Reviewer for highlighting this important point. In the revised manuscript, we have adjusted the text to remove or soften conclusions related to binding affinity that were based on single unbinding events in the MD simulations.

      Addition to the text. (Figure 4 title) “Single point mutations of cLD alter the binding of unfolded peptide MPZ1N-2X.”

      Addition to the text. (Results section title: Unfolded polypeptides can stably bind to hIRE1α cLD dimer) “Unfolded polypeptides bind to hIRE1α cLD dimer surface.”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Our goal was to elucidate a potential binding pose and identify the relevant features of unfolded proteins and the cLD that affect the binding.”

      Reviewer #2 (Recommendations for the authors):

      (1) A table of all simulated trajectories, including simulation conditions, number of replicas, box size, number of atoms, equilibration length, recording time step, number of frames for further analysis.

      We thank the Reviewer for this helpful suggestion. We have added a summary table of all simulations, including the requested details, to the Supplementary Information (Table 1).

      Addition to the text. (Supplementary figures and tables: Table 2)

      (2) The current NVT equilibration time was 0.125ns, and then no productive NPT simulations were mentioned as equilibration. Even though this is a simulation of mostly folded structures, it still takes some time for these amino acids to relax within the force field.

      We thank the Reviewer for this constructive comment and acknowledge the validity of the concern. However, our simulations were extensively sampled, and equilibration was achieved within the first 50 ns of the production runs. Therefore, the segments of the trajectories from which we draw conclusions correspond to equilibrated states (see RMSD analysis, Figure 1). Additionally, binding free energy calculations (MM/PBSA) were carried out on the last 250 ns of the simulation replicas.

      (3) At least three histograms were presented in Figure 2C, which I guess is from multiple simulations, and does not seem to be discussed.

      We thank the Reviewer for pointing out the lack of reference to Figure 2C. We added the correct reference to the text where the groove width of luminal domains of human and yeast is discussed.

      Author response image 1.

      RMSD analysis of human IRE1α cLD dimer simulated in complex with unfolded peptides.

      Addition to the text. (Results section: The putative groove of human IRE1α cLD is dynamic but unable to contain peptides) “In simulations of the dimeric structures, the average groove width was 7.3 ± 0.1 Å for the human cLD and 8.9 ± 0.1 Å for the yeast cLD, averaged over three TIP3P and three TIP4P-D replicas per system (Fig. 2C).”

      (4) The comment regarding the CHARMM force field on Page 6 is not justified. Actually the force field the authors used (CHARMM36m, Jing et al Nat Methods 2016) did include scaling of TIP3P LJ parameters to correctly capture the dimensions of the intrinsically disordered proteins (IDPs). However, the authors cited a couple of examples of literature of previous versions of CHARMM force fields and commented that it cannot capture IDP dimensions with TIP3P.

      We thank the Reviewer for pointing out this source of confusion. We cited the main papers of CHARMM as [4, 5], which were misleading, and following the Reviewer’s advice, we removed these citations.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “Current all-atom force fields used in MD simulations are mainly designed to reproduce the dynamics of folded and globular proteins [6].”

      (5) I am fine that the authors used TIP4PD with CHARMM36m, but caution should be taken for such a combination of protein and water force fields. Note that when optimizing force fields for IDPs, one often has to balance protein-water interactions by either enhancing protein-water interactions, enhancing water dispersions, or reducing protein-protein interactions. So, all such optimization is dependent on both protein and water force fields. TIP4PD was designed to pair with Amber99sb-ildn or, most recently, Amber99sb-disp instead of CHARMM36m. This could result in rescaling of LJ parameters.

      We thank the Reviewer for raising this issue. We argue that the TIP4P-D water model has been used in combination with the CHARMM36m force field [7] and has been shown to yield satisfactory results for disordered regions.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “The TIP4P-D water model was developed to address limitations of existing force fields in reproducing the structural ensembles of intrinsically disordered proteins and regions. It incorporates enhanced dispersion and moderately stronger electrostatic interactions to improve the balance between water dispersion and electrostatics [8]. Zapletal et al. [7] showed that for proteins containing both folded and disordered regions, the CHARMM36m force field [9] in combination with the TIP4P-D water model provides a robust framework, preventing collapse of disordered regions while preserving folded regions. Acknowledging that the behavior of disordered regions can be case-specific, we conducted molecular dynamics simulations of the two cLD dimer models using the CHARMM36m force field with both TIP3P and TIP4P-D water models.”

      (6) I suggest referring to the methodology part for simulation details as much as possible when presenting the story.

      We thank the Reviewer for this suggestion. In the revised manuscript, we now refer the reader to the Methodology section for detailed descriptions of the HDX-MS data analysis and the MM/PBSA free energy calculations.

      Addition to the text. (Results section: Hydrogen-deuterium exchange experimental data validate the cLD dimer structure) “From our simulations, we calculated the theoretical deuterated fraction using the method by Bradshaw et al. [10] and compared it to the experimental data (Fig. 1C-D and Supplementary Fig. 10) (see Methods).”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      (7) Error bars and methodology of error analysis should be provided for all cases of all-atom simulations if possible, since convergence is always an issue when considering these conformational changes within microseconds of all-atom simulations.

      We thank the Reviewer for the important observation. We agree and added error methodology for the estimation of theoretical deuterated fractions (Fig. 1C).

      Addition to the text. (Figure 1C legend) “Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping.”

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To reproduce the time points after incubation in deuterium (D<sub>2</sub>O), we computed deuterated fractions separately for each of the two monomers constituting a dimer for the time points 0.5 min (30 s) and 5 min (300 s). Then, we computed the mean and standard deviation over the data coming from replicas of the same cLD dimer model (AF or PDB model) and the same water model (TIP3P or TIP4P-D). To estimate the uncertainty of the mean values obtained from our datasets and the dataset from Amin-Wetzel et al. ([11] Figure 3—source data 1), we applied a non-parametric bootstrap resampling procedure. For each sequence range from HDX-MS analysis, we treated the measurements from the N=6 independent datasets as independent samples, accounting for 3 replicas each with two monomers (6 monomers total). We then generated 10,000 bootstrap replicates by sampling the datasets with replacement, maintaining the same number of samples N in each resample. For each replicate, we calculated the mean at each sequence position. The resulting distribution of bootstrap means was used to compute the standard deviation as an estimate of the standard error. We computed the difference between simulation and experimental data (deuterated fraction discrepancy), and for each residue, we selected as the ‘best structure’ the model with the discrepancy closest to zero among PDB-TIP3P, PDB-TIP4P-D, AF-TIP3P, and AF-TIP4P-D systems.”
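
      The bootstrap procedure described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the fixed random seed, and the six example values are assumptions made for demonstration only. It resamples N = 6 measurements with replacement 10,000 times and reports the standard deviation of the resampled means as the standard error.

```python
import random
import statistics


def bootstrap_standard_error(samples, n_resamples=10_000, seed=0):
    """Estimate the standard error of the mean by non-parametric bootstrap.

    Each resample draws len(samples) values with replacement; the standard
    deviation of the resampled means is returned.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    n = len(samples)
    means = []
    for _ in range(n_resamples):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.stdev(means)


# Hypothetical deuterated fractions for one peptide segment:
# 3 replicas x 2 monomers = 6 values (illustrative numbers, not real data).
fractions = [0.42, 0.47, 0.39, 0.45, 0.44, 0.41]
mean = statistics.mean(fractions)
sem = bootstrap_standard_error(fractions)
print(f"mean = {mean:.3f}, bootstrap SE = {sem:.3f}")
```

      With only six samples per segment the bootstrap distribution is coarse, so the resulting error bars are best read as rough indicators of replica-to-replica spread.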

      (8) Technically I would call DR1 and DR2 linker regions within a folded structure. Their motions are quite restrained by the folded part. I am therefore not sure how much TIP4PD really helps in contrast to a scaled TIP3P. A plot of structures colored with pLDDT score or B-factor within the PDB should be provided. Quantitative metrics of these regions (e.g. chi-squared) might help justify the choice of the AF model against the PDB model. Currently, the two models look very similar in Figures 1c and 1d. Similarly, quantitative metrics as a function of different simulation time windows will help justify the convergence of the simulation and indicate the flexibility of these regions.

      We thank the Reviewer for this thoughtful comment. In response, we analyzed the AlphaFold2 and AlphaFold3 predictions, which consistently assign very low pLDDT values (<50) to the DR2 region, while DR1 is predicted with higher but still low confidence (50 < pLDDT < 70). These scores indicate intrinsic uncertainty in the structural definition of both regions, supporting their flexibility despite being located within a folded context.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “All five AlphaFold 2 predictions closely resembled the top-ranked model used for our simulations (Supplementary Fig. 7C). In contrast, the five AlphaFold 3 predictions yielded greater variability in DR2 organization and longer helices in DR2, but still consistently maintained low pLDDT scores in this region, indicating disorder (Supplementary Fig. 7D).”

      Addition to the text. (Figure 7 C-D legend) “(C) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT). (D) Superposition of the 5 structures predicted by AlphaFold 3 for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (9) Fluorescence anisotropy seems to be an important set of experimental data to justify the binding of multiple unfolded peptides to IRE. I suggest the authors include a bar plot of binding affinity of different variants in Figure 3. The raw titration curves should also be included in SI.

      We thank the Reviewer for this valuable suggestion. The binding affinities reported in previous studies are summarized in Table 2; the reader is referred to those works for the corresponding raw titration curves. The binding affinities for the cLD mutants analyzed in the present study are provided in Table 3, and the associated titration curves are shown in Figure 4G.

      Addition to the text. (Figure 4G legend) “Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Supplementary figures and tables: Table 3) See Tab. 1

      (10) The authors should discuss the dependence of initial orientations of unfolded peptides on the final results. The authors claimed that after 1 microsecond simulations, the orientation of these peptides to IRE changed. Quantitative metrics showing both the binding (e.g., number of contacts) and binding orientation (contact region or angles) should be provided to tell whether the simulation is converged. The comparison to the experimental data lacks quantitative metrics. The authors mentioned the dissociation of MPZ1N-2X-RD in half of the simulations; they might want to provide such a metric for all peptides. Technically, 1 microsecond brute-force simulation is quite short for observing such a binding event, and enhanced sampling methods (e.g. metadynamics) might be necessary for investigating binding. However, at least the presentation and interpretation of the current results should be improved for comparing simulations and experiments.

      We thank the Reviewer for the insight. We expanded the discussion of the peptide orientation and added an analysis of the peptide angle with respect to the cLD central groove and contacts. Additionally, we inserted AlphaFold 3 predictions of all the simulated complexes.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Addition to the text. (Supplementary Figures and Tables Fig. 14) “(A) Peptide orientation with respect to the central groove principal axis. The angle was computed as the dihedral angle described by the Cα atoms of Y161 residues (groove principal axis) and the Cα atoms of residues L1 and A12 of the MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 10 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between hIRE1α cLD dimer and MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 50 frames, while the shaded lines indicate the value per frame. The analyses were performed on three sets of simulations: "90 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis; "270 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis but flipped 180 degrees with respect to the 0 degree; "0 degrees" orientation, the peptide is placed parallel to the groove principal axis.”
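
      As an illustration of the orientation metric in panel (A), a dihedral angle defined by four points can be computed as in the sketch below. The atom ordering (peptide L1 Cα, the two Y161 Cα atoms, peptide A12 Cα) and the example coordinates are assumptions made here for demonstration; the legend does not state the ordering used by the authors.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in (-180, 180]) around the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central axis b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (nm): the two Y161 C-alpha atoms define the axis.
y161_a, y161_b = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
pep_l1, pep_a12 = (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)
angle = dihedral_degrees(pep_l1, y161_a, y161_b, pep_a12)
print(f"dihedral = {angle % 360.0:.1f} degrees")
```

      Taking the result modulo 360 maps the signed angle onto the 0–360° range, which is how poses labelled 90° and 270° can be distinguished.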

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. ??A).”

      Addition to the text. (Supplementary Figures and Tables Fig. 16A) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (pLDDT).”

      (11) I also have a couple of questions regarding the point mutant Y161R. a) The motivation of mutating Y161 to R is more speculative (Figures 4a,b) than quantitative. The authors might want to show an intermolecular contact map between IRE and unfolded peptides or IRE contact probability along residue indexes to show the interaction hotspots. Figure S11 only showed the structure instead of any metrics for such a purpose. b) It might be better to also show a histogram of the distances of Figure 4e and 4f. Figure 4f actually suggested 1 microsecond simulation is quite short to observe the dissociation event. c) Testing the mutation within the experiment, if possible, would clearly strengthen this part of the manuscript.

      We thank the Reviewer for these constructive suggestions. We have added an analysis of intermolecular contacts for the Y161R and E102R mutants (Fig. 18A–B), which highlights the interaction hotspots between IRE1 residues and the unfolded peptides. To further characterize peptide–groove interactions, we now provide minimum peptide–groove distance time series for all peptides (Fig. 15B). Moreover, to experimentally support our simulations, we performed fluorescence anisotropy measurements on the MPZ1N-2X peptide with cLD WT and mutant constructs. These experiments confirm our computational observations (Fig. 4F–G and Fig. 18C).

      Addition to the text. (Figure 18 legend) “(A) Number of contacts between residues 102 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between residues 161 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (C) Protein purification of WT hIRE1α LD and mutants E102R and Y161R.”

      Addition to the text. (Figure 4F-G legend) “(F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      (12) Similar comments of quantitative analysis (e.g. contact map as a function of simulation time) apply to the last part of results when discussing the intermolecular interactions. Observations such as "the interface predicted by AlphaFold showed stability across MD simulation replicas lasting 200 ns" were provided, but there is no quantitative analysis. How consistent was this observation across multiple replicas of simulations, and how many replicas were used?

      We thank the Reviewer for this valuable suggestion. To provide a quantitative assessment, we performed new triplicate simulations of the BiP–cLD monomer complex and plotted the fraction of native contacts over time. These results, which demonstrate the consistency of the interface across replicas, are now included in the Supplementary Material.

      Addition to the text. (Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Figure 20 legend) “Fraction of native contacts between BiP and cLD monomer in simulations of the structures predicted by AlphaFold 3 without ligands or in complex with ADP or ATP. The dark lines indicate the rolling average of the fraction of native contacts over 100 frames, while the shaded lines indicate the value per frame. The fraction of native contacts (Q) was calculated according to the definition of Best et al. [12]: $Q(X) = \frac{1}{N} \sum_{(i,j)} \frac{1}{1 + \exp\left[\beta \left( r_{(i,j)}(X) - \lambda\, r_{(i,j)}^{0} \right)\right]}$. For N pairs of native contacts (i, j), r<sup>0</sup><sub>(i,j)</sub> is the distance of the pair in the initial configuration (here the AlphaFold 3 prediction), r<sub>(i,j)</sub>(X) is the distance at frame X, β is a smoothing parameter (β = 50 nm<sup>−1</sup>), λ is the tolerance of the reference distance (λ = 1.8) and the cutoff used to define a contact between heavy atoms was 0.45 nm.”
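
      For readers unfamiliar with this metric, the sketch below evaluates the fraction of native contacts for a single frame, assuming the commonly used sigmoidal form Q = (1/N) Σ 1/(1 + exp[β(r − λ r0)]) with the parameters quoted in the legend (β = 50 nm⁻¹, λ = 1.8). The function name and the example distances are hypothetical, and the selection of native pairs (heavy atoms within 0.45 nm in the reference structure) is assumed to have been done beforehand.

```python
import math


def fraction_native_contacts(r_frame, r_ref, beta=50.0, lam=1.8):
    """Fraction of native contacts Q for one frame.

    r_frame: pair distances in the current frame (nm)
    r_ref:   distances of the same pairs in the reference structure (nm)
    """
    if len(r_frame) != len(r_ref) or not r_ref:
        raise ValueError("need two non-empty distance lists of equal length")
    total = 0.0
    for r, r0 in zip(r_frame, r_ref):
        x = beta * (r - lam * r0)
        # 1 / (1 + exp(x)) rewritten with tanh to avoid overflow for large x.
        total += 0.5 * (1.0 - math.tanh(0.5 * x))
    return total / len(r_ref)


# Hypothetical reference distances for three native pairs (all below 0.45 nm).
reference = [0.40, 0.35, 0.42]
q_intact = fraction_native_contacts(reference, reference)
q_broken = fraction_native_contacts([5.0, 5.0, 5.0], reference)
print(f"Q (intact) = {q_intact:.3f}, Q (dissociated) = {q_broken:.3f}")
```

      A pair contributes exactly one half when its distance equals λ times the reference distance, so λ sets how far a contact may stretch before it is counted as lost.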

      (13) The figure legends are noted using lowercase letters but are described using uppercase.

      We thank the Reviewer for pointing that out, and we changed everything to capital letters.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1: I am confused about the HDX-MS results shown in Figure 1. Here, I must also mention that I am not familiar with comparing HDX-MS experiments with MD simulations. The authors mention that they show the deuterated fraction computed from MD simulations for the PDB and AF model at time points 0.5 min and 5 min. However, this time certainly does not correspond to the MD simulation time, thus, it is unclear to me where the difference between the results comes from. Are the two time points some input parameters to the script used to calculate the deuterated fraction? Thus, I would ask the authors to better explain what is the difference in the results between the two time points. Especially, since the general reader might not be familiar with comparing HDX-MS experimental results to MD simulations. Furthermore, I would ask the authors to clarify in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      We thank the Reviewer for pointing us to this possible source of confusion. The time points are effectively input parameters to the calculations of theoretical deuterated fractions from MD simulations. We expanded the explanation of the method in the Methods section and clarified in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To determine the deuterated fraction of a peptide segment from simulations, the protection factor for each residue i, P<sub>i</sub>, must be computed from the simulation snapshots, following the approach of Best and Vendruscolo [13]: $\ln P_i = \beta_C N_{C,i} + \beta_H N_{H,i}$. Here, N<sub>C,i</sub> and N<sub>H,i</sub> are the number of heavy-atom contacts and H-bonds of the backbone amide of residue i, respectively, and the scaling factors β<sub>C</sub> and β<sub>H</sub> are set to 0.35 and 2.0, respectively. The simulated deuterated fraction of a peptide segment, D<sub>j</sub><sup>sim</sup>(t), defined by residues m<sub>j</sub> +1 to n<sub>j</sub>, was then calculated at any exchange time point t as:

      $D_j^{\mathrm{sim}}(t) = \frac{1}{n_j - m_j} \sum_{i = m_j + 1}^{n_j} \left[ 1 - \exp\left( -\frac{k_i^{\mathrm{int}}}{P_i}\, t \right) \right]$

      where m<sub>j</sub> and n<sub>j</sub> are the first and last residue numbers of the j-th protein fragment, respectively. The intrinsic exchange rate constants for each residue type (k<sub>i</sub><sup>int</sup>) were obtained from Bai et al. with updated acidic residues and glycine [14, 15].”
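
      To make explicit how the incubation time enters the calculation as an input parameter, the following sketch evaluates a deuterated fraction at 0.5 min and 5 min. It assumes the Best–Vendruscolo form ln P = β_C N_C + β_H N_H and a simple average of 1 − exp(−k_int t / P) over the residues of a segment; the function names, contact counts, and intrinsic rates are hypothetical values chosen only for illustration and are not taken from the manuscript.

```python
import math

BETA_C = 0.35  # scaling factor for heavy-atom contacts
BETA_H = 2.0   # scaling factor for hydrogen bonds


def protection_factor(n_contacts, n_hbonds):
    """Protection factor P_i from (ensemble-averaged) contact and H-bond counts."""
    return math.exp(BETA_C * n_contacts + BETA_H * n_hbonds)


def deuterated_fraction(k_int, protection, t):
    """Mean deuterated fraction of a segment at exchange time t.

    k_int:      intrinsic exchange rates of residues m_j+1..n_j (1/min)
    protection: protection factors P_i of the same residues
    t:          incubation time (min); unrelated to MD simulation time
    """
    terms = [1.0 - math.exp(-k * t / p) for k, p in zip(k_int, protection)]
    return sum(terms) / len(terms)


# Hypothetical three-residue segment: (heavy-atom contacts, H-bonds) per residue.
counts = [(2.0, 0.0), (8.0, 0.5), (15.0, 1.0)]
factors = [protection_factor(nc, nh) for nc, nh in counts]
rates = [60.0, 30.0, 45.0]  # illustrative intrinsic rates in 1/min

d_short = deuterated_fraction(rates, factors, t=0.5)
d_long = deuterated_fraction(rates, factors, t=5.0)
print(f"D(0.5 min) = {d_short:.3f}, D(5 min) = {d_long:.3f}")
```

      The same set of protection factors yields a larger deuterated fraction at the longer incubation time, which is why the two time points give different curves from a single set of simulations.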

      Addition to the text. (Figure 1 legend) “This time point corresponds to experimental incubation times, not MD simulation time.”

      Addition to the text. (Figure 10 legend) “Time points correspond to experimental incubation times, not MD simulation time.”

      (2) For AlphaFold 2 Multimer prediction, the authors only considered the top predicted structure. However, with AF2-M, one generally obtains 5 structures, and it is also possible to obtain more structures by using an additional random seed. Thus, it would be interesting if the authors would consider the difference between the 5 structures they obtained from the AF2-M prediction. Are they all very similar? (Especially considering the DR1 and DR2 segments, which are the main difference between the PDB and AF2 structures.) Analyzing the different predicted AF2 structures would give more insight into the accuracy of the AF2-M predicted model.

      We thank the Reviewer for this insightful suggestion. All AF2-M predicted structures were found to be highly similar, and we now include them in Figure 7E for comparison.

      Addition to the text. (Figure 7E legend) “(E) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (3) On Page 6, the authors talk about a "an early PDB model". First, I find the nomenclature "early" confusing here; perhaps it would be better to talk about "an initial PDB model", but I leave it up to the authors to think about if they want to change that. More importantly, reading the Comp. detail on Page 23, it is not so clear what the difference is between the "early" and "final" PDB models, and how the difference in their setups leads to different results. The information is somewhat there on Page 6 and Page 23, but it can be made much clearer. Thus, I would ask the authors to better explain the difference between the early and final PDB models.

      We thank the Reviewer for this helpful comment. In the revised manuscript, we have clarified the terminology and provided a more explicit explanation of the differences between the two IRE1 models, both in the Results section and in the Methods.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “An initial PDB model, with modified side chain orientations in residues L116 and Y166 due to the modelling of the neighbouring missing DR1, caused the dimer to dissociate in one-third of the replicas. [...] The final PDB model, with correctly oriented L116 and Y166 (Supplementary Fig. 9B), was stable in simulations in both TIP3P and TIP4P-D water (Supplementary Fig. 7B).”

      Addition to the text. (Methods section: IRE1α core Luminal Domain (cLD) structural models - Human PDB dimer) “An initial PDB model was briefly equilibrated in NPT, and a conformation with a groove width of approximately 0.6 nm was selected. This snapshot was used as the initial structure for the initial “PDB model” simulations, in which the dimer dissociates.”

      (4) Page 12: "In early simulations", again, I find the nomenclature "early" confusing here. Perhaps it would be better to talk about "In initial simulations" or "In preliminary simulations", but I leave it up to authors to think about this.

      We thank the Reviewer for pointing out this possible source of confusion. We improved the text by referring to these simulations based on the different orientations of the peptide on the cLD dimer in the modeled complex.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Here, we provide a detailed description of the additional changes made to the manuscript.

      Additional edits to the manuscript

      Following discussions with Prof. Dr. David Ron, we refined our BiP model by removing the signal peptide (residues 1–18). Using AlphaFold 3, we predicted BiP–cLD heterodimeric complexes in the presence of ADP, ATP, or without nucleotide. Each of the three complexes was simulated in TIP3P water, in three independent replicas of 1 µs each.

      Addition to the text. (Results section: hIRE1α cLD intermolecular interactions guide the activation process) “We used AlphaFold 3 to model the interaction between a cLD monomer and BiP (residues E19–L654) in the presence of ATP and ADP (Fig. 5B, Supplementary Fig. 19A). Prediction quality was limited in the apo and ADP-bound states (pTM = 0.48, ipTM = 0.59; pTM = 0.49, ipTM = 0.61, respectively), whereas ATP binding improved accuracy (pTM = 0.66, ipTM = 0.72). The predicted interfaces involved DR2, particularly residues 314PLLEG-318, forming a short parallel β-sheet with the substrate-binding domain (SBD) of BiP through two hydrogen bonds. All AlphaFold 3 models were stable across three 1-µs simulations (Supplementary Fig. 19B), with cLD–BiP interfaces retaining 60–80% of initial contacts (Supplementary Fig. 20). In the apo and ADP-bound states, the nucleotide-binding domain (NBD) showed high Predicted Aligned Error (PAE) relative to the cLD, indicating uncertain positioning of the two domains relative to each other. Notably, in the ADP-bound state, which is thought to interact with hIRE1α cLD, the NBD remained mobile but proximal to the αB-helices, thereby restricting access to this region. Together, the AlphaFold 3 predictions suggest that BiP engages hIRE1α cLD by sterically hindering the oligomerization interface defined by DR2 and the αB-helices [16].”

      Addition to the text. (Figure 5 legend) “(B) BiP-cLD monomer complex as predicted by AlphaFold (BiP in shades of purple, cLD in orange) before the simulation (t = 0 µs) and at the end of the simulation (t = 1 µs). The SBD (residues E19-D408) is colored in light purple, the NBD (residues C420-E650) in dark purple, and the interdomain linker (residues D409-V419) and KDEL motif (residues K651-L654) in light purple.”

      Addition to the text. (Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Methods section: cLD monomer in complex with BiP) “The BiP-cLD heterodimer systems were predicted with AlphaFold 3 using the AlphaFold server [17] at https://alphafoldserver.com/. The hIRE1α cLD sequence used is the same used for predicting the dimer: the PDB 2HZ6 sequence, Uniprot identifier O75460 with mutations C127S and C311S, and residues P29-P368. The BiP sequence used is taken from UniProt identifier P11021, residues E19–L654. We predicted three complexes: one without any nucleotide, one containing ADP, and another containing ATP. Simulations of the BiP-cLD complex were run in TIP3P water.”

      We have updated the Zenodo repository with additional data and calculations, and the corresponding link is provided in the manuscript.

      References

      (1) Mario S. Valdés-Tresanco, Mario E. Valdés-Tresanco, Pedro A. Valiente, and Ernesto Moreno. gmx_mmpbsa: A New Tool to Perform End-State Free Energy Calculations with GROMACS. Journal of Chemical Theory and Computation, 17(10):6281–6291, October 2021. Publisher: American Chemical Society.

      (2) Bill R. III Miller, T. Dwight Jr. McGee, Jason M. Swails, Nadine Homeyer, Holger Gohlke, and Adrian E. Roitberg. MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. Journal of Chemical Theory and Computation, 8(9):3314–3321, September 2012. Publisher: American Chemical Society.

      (3) Fanhao Wang, Yuzhe Wang, Laiyi Feng, Changsheng Zhang, and Luhua Lai. Target-Specific De Novo Peptide Binder Design with DiffPepBuilder. Journal of Chemical Information and Modeling, 64(24):9135–9149, December 2024. Publisher: American Chemical Society.

      (4) Alexander D. MacKerell Jr., Bernard Brooks, Charles L. Brooks III, Lennart Nilsson, Benoit Roux, Youngdo Won, and Martin Karplus. CHARMM: The Energy Function and Its Parameterization. In Encyclopedia of Computational Chemistry. 2002.

      (5) Bernard R. Brooks, Robert E. Bruccoleri, Barry D. Olafson, David J. States, S. Swaminathan, and Martin Karplus. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4(2):187–217, 1983.

      (6) Junxi Mu, Hao Liu, Jian Zhang, Ray Luo, and Hai-Feng Chen. Recent Force Field Strategies for Intrinsically Disordered Proteins. Journal of Chemical Information and Modeling, 61(3):1037–1047, March 2021.

      (7) Vojtech Zapletal, Arnošt Mládek, Kateřina Melková, Petr Louša, Erik Nomilner, Zuzana Jasenáková, Vojtěch Kubáň, Markéta Makovická, Alice Laníková, Lukáš Žídek, and Jozef Hritz. Choice of Force Field for Proteins Containing Structured and Intrinsically Disordered Regions. Biophysical Journal, 118(7):1621–1633, April 2020.

      (8) Stefano Piana, Alexander G. Donchev, Paul Robustelli, and David E. Shaw. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. Journal of Physical Chemistry B, 119(16):5113–5123, April 2015.

      (9) Jing Huang, Sarah Rauscher, Grzegorz Nawrocki, Ting Ran, Michael Feig, Bert L. de Groot, Helmut Grubmüller, and Alexander D. MacKerell. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nature Methods, 14(1):71–73, January 2017.

      (10) Richard T. Bradshaw, Fabrizio Marinelli, José D. Faraldo-Gómez, and Lucy R. Forrest. Interpretation of HDX Data by Maximum-Entropy Reweighting of Simulated Structural Ensembles. Biophysical Journal, 118(7):1649–1664, April 2020.

      (11) Niko Amin-Wetzel, Lisa Neidhardt, Yahui Yan, Matthias P. Mayer, and David Ron. Unstructured regions in IRE1 specify BiP-mediated destabilisation of the luminal domain dimer and repression of the UPR. eLife, 8, December 2019.

      (12) Robert B. Best, Gerhard Hummer, and William A. Eaton. Native contacts determine protein folding mechanisms in atomistic simulations. Proceedings of the National Academy of Sciences, 110(44):17874–17879, October 2013. Publisher: Proceedings of the National Academy of Sciences.

      (13) Robert B. Best and Michele Vendruscolo. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins: Characterization of the Native State Fluctuations of CI2. Structure, 14(1):97–106, January 2006.

      (14) Yawen Bai, John S. Milne, Leland Mayne, and S. Walter Englander. Primary structure effects on peptide group hydrogen exchange. Proteins: Structure, Function, and Bioinformatics, 17(1):75–86, 1993.

      (15) David Nguyen, Leland Mayne, Michael C. Phillips, and S. Walter Englander. Reference Parameters for Protein Hydrogen Exchange Rates. Journal of the American Society for Mass Spectrometry, 29(9):1936–1939, September 2018. Publisher: American Society for Mass Spectrometry.

      (16) G Elif Karagöz, Diego Acosta-Alvear, Hieu T Nguyen, Crystal P Lee, Feixia Chu, and Peter Walter. An unfolded protein-induced conformational switch activates mammalian IRE1. eLife, 6:e30700, 2017.

      (17) Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, and John M. Jumper. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, May 2024.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn<sup>2+</sup>-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a plausible hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channeling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO, but the concentrations used for THF greatly exceed its reported KD (enzyme concentration used in this assay is not reported). It has previously been shown that HCHO and THF can couple spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic channeling. An additional control that can rule out this possibility would help to strengthen the evidence. For example, mutating the THF binding site to prevent THF binding to the protein complex could clarify whether the observed decrease in HCHO depends on enzyme-mediated proximity effects. A mutation which would specifically disable channeling could be even more convincing (maybe at the narrowest bottleneck).

      We agree with the reviewer that HCHO and THF can react spontaneously in a non-enzymatic manner, and our experiments were not intended to demonstrate enzymatic channeling. The linear regression analysis in Figure 1C was designed solely to confirm that HCHO reacts with THF under our assay conditions. Accordingly, THF was titrated over a broad concentration range starting from zero, and the observed THF concentration–dependent decrease in HCHO reflects this chemical reactivity.

      We do not interpret these data as evidence that the enzyme catalyzes or is required for the HCHO–THF coupling reaction. Instead, the structural observation of an enclosed channel is presented as a separate finding. We have clarified this point in the revised text to avoid overinterpretation of the biochemical data (page 2, line 16).

      Another concern is that the observed decrease in HCHO could alternatively arise from a reduced production of HCHO due to a negative allosteric effect of THF binding on the active site. From this perspective, the interpretation would be more convincing if a clear coupled effect could be demonstrated, specifically, that removal of the product (HCHO) from the reaction equilibrium leads to an increase in the catalytic efficiency of the demethylation reaction.

      We agree that, in principle, a decrease in detectable HCHO could also arise from an indirect effect of THF binding on enzyme activity. However, in our study the experiment was not designed to assess catalytic coupling or allosteric regulation. The assay in question monitors HCHO levels under defined conditions and does not distinguish between changes in HCHO production and downstream consumption.

      Additionally, we do not interpret the observed decrease in HCHO as evidence that THF binding enhances catalytic efficiency, or that removal of HCHO shifts the reaction equilibrium. Instead, the data are presented to establish that HCHO can react with THF under the assay conditions. Any potential allosteric effects of THF on the demethylation reaction, or kinetic coupling between HCHO removal and catalysis, are beyond the scope of the current study, and are not claimed.

      While the enzyme kinetics appear to have been performed thoroughly, the description of the kinetic assays in the Methods section is very brief. Important details such as reaction buffer composition, cofactor identity and concentration (Zn<sup>2+</sup>), enzyme concentration, defined temperature, and precise pH are not clearly stated. Moreover, a detailed methodological description could not be found in the cited reference (6), if I am not mistaken.

      Thank you for the suggestion. We have added reference [24] to the methodological description on page 8. The Methods section has been revised accordingly on page 8 under “TDM Activity Assay,” without altering the Zn<sup>2+</sup> concentration.

      The composition of the complex is intriguing but raises some questions. Based on SDS-PAGE analysis, the purified protein appears to be predominantly full-length TDM, and size-exclusion chromatography suggests an apparent molecular weight below 100 kDa. However, the cryo-EM structure reveals a substantially larger complex composed of two full-length monomers and two half-domains.

      We appreciate the reviewer’s careful analysis of the apparent discrepancy between the biochemical characterization and the cryo-EM structure. This issue is addressed in Figure S1, which may have been overlooked.

      As shown in Figure S1, the stability of TDM is highly dependent on protein and salt conditions. At 150 mM NaCl, SEC reveals a dominant peak eluting between 10.5 and 12 mL, corresponding to an estimated molecular weight of ~170–305 kDa (blue dot, Author response image 1). This fraction was explicitly selected for cryo-EM analysis and yields the larger complex observed in the reconstruction. At lower salt concentrations (50 mM) or higher (>150 mM NaCl), the protein either aggregates or elutes near the void volume (~8 mL).

      SDS–PAGE analysis detects full-length TDM together with smaller fragments (~40–50 kDa and ~22–25 kDa). The apparent predominance of full-length protein on SDS–PAGE likely reflects its greater staining intensity per molecule and/or a higher population, rather than the absence of truncated species.

      Author response image 1.

      Given the lack of clear evidence for proteolytic fragments on the SDS-PAGE gel, it is unclear how the observed stoichiometry arises. This raises the possibility of higher-order assemblies or alternative oligomeric states. Did the authors attempt to pick or analyze larger particles during cryo-EM processing? Additional biophysical characterization of particle size distribution, for example using interferometric scattering microscopy (iSCAT), could help clarify the oligomeric state of the complex in solution.

      Cryo-EM data were collected exclusively from the size-exclusion chromatography fraction eluting between 10.5 and 12 mL. This fraction was selected to isolate the dominant assembly in solution. Extensive 2D and 3D particle classification did not reveal distinct classes corresponding to smaller species or higher-order oligomeric assemblies. Instead, the vast majority of particles converged to a single, well-defined structure consistent with the 2 full-length + 2 half-domain stoichiometry.

      A minor subpopulation (~2%) exhibited increased flexibility in the N-terminal region of the two full-length subunits, but these particles did not form a separate oligomeric class, indicating conformational heterogeneity rather than alternative assembly states (Author response image 2). Together, these data support the 2+2½ architecture as the predominant and stable complex under the conditions used for cryo-EM. Additional techniques, such as iSCAT, would provide complementary information, but are not required to support the conclusions drawn from the SEC and cryo-EM analyses presented here.

      Author response image 2.

      The authors mention strict symmetry in the complex, yet C2 symmetry was enforced during refinement. While this is reasonable as an initial approach, it would strengthen the structural interpretation to relax the symmetry to C1 using the C2-refined map as a reference. This could reveal subtle asymmetries or domain-specific differences without sacrificing the overall quality of the reconstruction.

      We thank the reviewer for this thoughtful suggestion. In standard cryo-EM data processing, symmetry is typically not imposed initially to minimize potential model bias; accordingly, we first performed C1 refinement before applying C2 symmetry. The resulting C1 reconstructions revealed no detectable asymmetry or domain-specific differences relative to the C2 map. In addition, relaxing the symmetry consistently reduced overall resolution, indicating lower alignment accuracy and further supporting the presence of a predominantly symmetric assembly.

      In this context, the proposed catalytic role of Zn<sup>2+</sup> raises additional questions. Why is a 2:1 enzyme-to-metal stoichiometry observed, and how does this reconcile with previous reports? This point warrants discussion. Does this imply asymmetric catalysis within the complex? Would the stoichiometry change under Zn<sup>2+</sup>-saturating conditions, as no Zn<sup>2+</sup> appears to be added to the buffers? It would be helpful to clarify whether Zn<sup>2+</sup> occupancy is equivalent in both active sites when symmetry is not imposed, or whether partial occupancy is observed.

      The observed ~2:1 enzyme-to-Zn<sup>2+</sup> stoichiometry likely reflects the composition of the 2 full-length + 2 half-domain (2+2½) complex. In this assembly, only the core domains that are fully present in the complex contribute to metal binding. The truncated or half-domains lack the Zn<sup>2+</sup> binding domain. As a result, only two metal-binding sites are occupied per assembled complex, consistent with the measured stoichiometry.

      We note that Zn<sup>2+</sup> was not deliberately added to the buffers, so occupancy may not reflect full saturation. Based on our cryo-EM and biochemical data, both metal-binding sites in the full-length subunits appear to be occupied to an equivalent extent, and no clear evidence of asymmetric catalysis is observed under these current experimental conditions. Full Zn<sup>2+</sup> saturation could potentially increase occupancy, but was not explored in these experiments.

      The divalent ion Zn<sup>2+</sup> is suggested to activate water for the catalytic reaction. I am not sure if there is a need for a water molecule to explain this catalytic mechanism. Can you please elaborate on this more? As one aspect, it might be helpful to explain in more detail how Zn-OH and D220 are recovered in the last step before a new water molecule comes in.

      Thank you for your suggestion. We revised our text on page 2 as below.

      Based on our structural and biochemical data, we propose a structurally informed working model for TMAO turnover by TDM (Scheme 1). In this model, Zn<sup>2+</sup> plays a non-redox role by polarizing the O–H bond of the bound hydroxyl, thereby lowering its pK<sub>a</sub>. The D220 carboxylate functions as a general base, abstracting the proton to generate a hydroxide nucleophile. This hydroxide then attacks the electrophilic N-methyl carbon of TMAO, forming a tetrahedral carbinolamine (hemiaminal) intermediate. Subsequent heterolytic cleavage of the C–N bond leads to the release of HCHO. D220 then switches roles to act as a general acid, donating a proton to the departing nitrogen, which facilitates product release and regenerates the active site. This sequence allows a new water molecule to rebind Zn<sup>2+</sup>, enabling subsequent catalytic turnovers. This proposed pathway is consistent with prior mechanistic studies, in which water addition to the azomethine carbon of a cationic Schiff base generates a carbinolamine intermediate, followed by a rate-limiting breakdown to yield an amino alcohol and a carbonyl compound, in the published case, an aldehyde (Pihlaja et al., J. Chem. Soc. Perkin Trans. 2, 1983, 8, 1223–1226).

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channeling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channeling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn2+ cofactor, there was a dependence upon non-heme Fe2+, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense to me, for several reasons:

      (i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which is unprecedented, and even if it were possible, would generate methanol, not formaldehyde.

      (ii) The amine oxide is then proposed to deoxygenate, with hydroxide appearing on the Zn - unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn2+ is not a redox-active metal ion;

      (iii) The authors say "forming a tetrahedral intermediate, as described for metalloproteinase", but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism, there is no carbonyl to attack, so this statement is just wrong.

      So on several counts, the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn2+ cannot fulfil that role. Fe2+ could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so, then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands, the proposed catalytic mechanism is unacceptable.

      We thank the reviewer for the detailed and thoughtful mechanistic critique. We fully agree that Zn<sup>2+</sup> is not redox-active, and cannot directly mediate oxidative demethylation or amine oxide deoxygenation. We acknowledge that the oxidative step required for the conversion of TMAO to HCHO is not explicitly resolved in the present study. Accordingly, we have revised the manuscript to remove any implication of Zn<sup>2+</sup>-mediated redox chemistry, and have eliminated the previously imprecise analogy to zinc metalloproteases.

      We recognize and now discuss prior biochemical work on TMAO demethylase from Methylocella silvestris (MsTDM), which proposed an iron-dependent oxidative mechanism (Zhu et al., FEBS 2016, 3979–3993). That study reported approximately one Zn<sup>2+</sup> and one non-heme Fe<sup>2+</sup> per active enzyme, implicated iron in catalysis through homology modeling and mutagenesis, and used crossover experiments suggesting a trimethylamine-like intermediate and oxygen transfer from TMAO, consistent with an Fe-dependent redox process. However, that system lacked experimental structural information, and did not define discrete metal-binding sites.

      In contrast,

      (1) Our high-resolution cryo-EM structures and metal analyses of TDM consistently reveal only a single, well-defined Zn<sup>2+</sup>-binding site, with no structural evidence for an additional iron-binding site as in the previous report (Zhu et al., FEBS 2016, 3979–3993).

      (2) To investigate the potential involvement of iron, we expressed TDM in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>(SO<sub>4</sub>)<sub>2</sub> and determined its cryo-EM structure. This structure is identical to the original one, and no EM density corresponding to a second iron ion was observed. Moreover, the previously proposed Fe<sup>2+</sup>-binding residues are spatially distant (Figure S6).

      (3) ICP-MS analysis shows that iron is undetectable and that zinc is the only metal ion detected (Figure S5).

      (4) Our enzyme kinetics for TDM without iron are comparable to those of MsTDM (Figure 1A). We propose that the differences in Km and Vmax are due to differences in the overall sequences of the enzymes. Please also see the comment at the end regarding a newly published paper on MsTDM.

      While we cannot comment on the MsTDM results, our ‘experimental’ results do not support the presence of an iron-binding site. Our data indicate that this chemistry is unlikely to be mediated by a canonical non-heme iron center as proposed for MsTDM. We therefore revised our model as a structural framework that rationalizes substrate binding, metal coordination, and product stabilization, while clearly delineating the limits of mechanistic inference supported by the current data.

      Scheme 1 and the proposed mechanism section were revised on page 4. Figure S6 was added.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors do quote a Vmax of 16.52 µM/min/mg; however, these are incorrect units for Vmax, they should be µmol/min/mg. There is a further inconsistency between the text saying µM/min/mg and the Figure saying µM/min/µg.

      Thank you for the correction. We converted the V<sub>max</sub> unit to nmol/min/mg and revised the text on page 2. We also revised the text on page 2 to compare our value with the one previously reported for the TDM enzyme. See also the note on a newly published manuscript and its comparison.
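
The unit issue raised above comes down to the difference between a concentration rate (µM/min) and an amount rate (nmol/min). The following is a minimal sketch of that conversion, not the authors' actual calculation: the function names, the assay volume, and the enzyme mass are hypothetical values chosen only for illustration, since the response does not state them.

```python
def specific_activity_nmol_min_mg(rate_uM_per_min, assay_volume_mL, enzyme_mg):
    """Convert a product-formation rate in µM/min into nmol/min/mg enzyme."""
    # 1 µM = 1 µmol/L = 1 nmol/mL, so (µM/min) * mL gives nmol/min.
    nmol_per_min = rate_uM_per_min * assay_volume_mL
    # Normalising by enzyme mass gives the specific activity.
    return nmol_per_min / enzyme_mg


def nmol_to_umol(value_nmol):
    """Convert nmol (or nmol/min/mg) into µmol (or µmol/min/mg)."""
    return value_nmol / 1000.0


# Hypothetical assay: 10 µM/min measured in 0.2 mL containing 0.01 mg enzyme.
example_activity = specific_activity_nmol_min_mg(10.0, 0.2, 0.01)

# The value quoted in the response, 156 nmol/min/mg, expressed in µmol/min/mg.
reported_umol = nmol_to_umol(156.0)
```

Without the assay volume and enzyme mass, a rate in µM/min cannot be turned into a specific activity, which is why the two unit forms are not interchangeable.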

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %.

      We appreciate this important point. We have revised Figure 1C to present HCHO levels in absolute concentration units. While the current data demonstrate reduced detectable HCHO in the presence of THF, we agree that distinguishing between HCHO consumption and potential THF-mediated enzyme inhibition would require dedicated time-course and protein-dependence experiments. We have therefore revised the description on page 2, lines 18-19, to avoid overinterpretation and to limit our conclusions to the observed changes in HCHO concentration.

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme; is that the strain used? Details about the strain are needed, and the accession for the protein sequence.

      Thank you for this comment. We now indicate that the enzyme is derived from Paracoccus sp. DMF and provide the accession number for the protein sequence (WP_263566861) in the Experimental Section (page 8, line 4).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The ITC experiment requires a ligand-into-buffer titration as an additional control. Also, maybe I misunderstood the molar ratio or the concentrations you used, but if you indeed added a total of 4.75 μL of 20 μM THF into 250 μL of 5 μM TDM, it is not clear to me how this leads to a final molar ratio of 3.

      We thank the reviewer for this suggestion. A ligand-into-buffer control ITC experiment was performed and is now included in Figure S8C, which shows no appreciable signal.

      Regarding the molar ratio, this was our mistake. The experiment used 2.45 μL injections of 80 μM THF into 250 μL of 5 μM TDM. This corresponds to a final ligand concentration of ~12.8 μM, giving a ligand-to-protein molar ratio of ~2.6. We revised our text on page 9, ITC section.
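
The arithmetic behind the corrected molar ratio can be checked with a short sketch. It assumes, as a simplification, that the cell volume stays fixed at 250 μL (no dilution or displacement correction); the number of injections is not stated in the response, so the code works backwards from the quoted final concentration instead of assuming one. All function names are illustrative.

```python
def ligand_conc_fixed_cell(total_injected_uL, syringe_uM, cell_uL):
    """Ligand concentration in the cell, treating the cell volume as fixed."""
    return total_injected_uL * syringe_uM / cell_uL


def molar_ratio(ligand_uM, protein_uM):
    """Ligand-to-protein molar ratio."""
    return ligand_uM / protein_uM


# Values as originally described: 4.75 μL of 20 μM THF into 250 μL of 5 μM TDM.
original_ratio = molar_ratio(ligand_conc_fixed_cell(4.75, 20.0, 250.0), 5.0)

# Corrected values: a final THF concentration of ~12.8 μM against 5 μM TDM.
corrected_ratio = molar_ratio(12.8, 5.0)

# Total injected volume of 80 μM THF implied by 12.8 μM in a 250 μL cell.
implied_total_uL = 12.8 * 250.0 / 80.0
```

Under this simplification the original description gives a ratio below 0.1, which is why it could not be reconciled with a final ratio of 3, whereas the corrected values give roughly 2.6 from about 40 μL of titrant in total.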

      (2) Characterization/quality check of all mutant enzymes should be performed by NanoDSF, CD spectroscopy or similar techniques to confirm that proteins are properly folded and fit for kinetic testing.

      We appreciate the reviewer’s suggestion. All mutant proteins, including D220A, D367A, and F327A, were purified with yields similar to the wild-type enzyme. Additionally, cryo-EM maps of the mutants show well-defined density and overall structural integrity consistent with the wild-type. These findings indicate that the introduced mutations do not significantly affect protein folding, supporting their use for kinetic analysis. While NanoDSF might reveal differences in thermal stability due to mutations, it does not provide structural information. Our conclusions are not based on minor differences in thermostability. Our cryo-EM structures of the mutants offer much more reliable structural data than CD spectroscopy.

      (3) Best practice would suggest overlapping pH ranges between different buffer systems in the pH-dependence experiments to rule out buffer-specific effects independent of pH.

      We thank the reviewer for this helpful suggestion. We agree that overlapping pH ranges between different buffer systems can be valuable for excluding buffer-specific effects. In this study, the pH-dependence experiments were intended to provide a qualitative assessment of pH sensitivity rather than a detailed analysis of buffer-independent pKa values. While we cannot fully exclude minor buffer-specific contributions, the overall trends observed were reproducible and sufficient to support the conclusions drawn. We have added a clarifying statement to the revised manuscript to reflect this consideration, page 2, line 12.

      (4) "Structural comparison revealed high similarity to a THF-binding protein, with superposition onto a T protein.": It would be nice to show this as an additional figure, as resolution and occupancy for THF are low.

      We thank the reviewer for this suggestion. To address this point, we have revised Figure S6 by adding a panel (panel C, now Figure S7C) showing the structural superposition of TDM with the THF-binding T protein. This comparison is included to better illustrate the structural similarity, despite the limited resolution and partial occupancy of THF density in our map.

      (5) Editing could have been done more thoroughly. Some spelling mistakes, e.g. "RESEULTS", "redius", "complec"; kinetic rate constants should be written in italics (not uniform between text and figures); Prism version is missing; Vmax of 16.52 µM/min/mg - double-check units; Figure S1B: The "arrow on the right" might have gone missing.

      We corrected the spelling on page 2 (around line 10), page 5 (around line 34), and page 6 (around line 40); all corrections are highlighted in blue. The Prism version was added. The arrow was added to Figure S1B. The Vmax unit was corrected to nmol/min/mg.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must re-examine the metal content of their purified enzyme, looking in particular for Fe or another redox-active metal ion, which could be involved in a reasonable catalytic mechanism.

      We thank the reviewer for this suggestion and have carefully re-examined the metal content of TDM. Elemental analyses by EDX and ICP-MS consistently detected Zn<sup>2+</sup> in purified TDM (Zn:protein ≈ 1:2), whereas Fe was below the detection limit across multiple independent preparations (Fig. S5A,B). To assess whether iron could be incorporated or play a functional role, we expressed TDM in E. coli grown in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>(SO<sub>4</sub>)<sub>2</sub> and performed activity assays in the presence of exogenous Fe<sup>2+</sup>. Neither condition resulted in enhanced enzymatic activity.

      Consistent with these biochemical data, all cryo-EM structures reveal a single, well-defined metal-binding site coordinated by three conserved cysteine residues and occupied by Zn<sup>2+</sup>, with no evidence for an additional iron species or other redox-active metal site.

      (2) The specific activity of the enzyme should be quoted in the same units as other literature papers, so that the enzyme activity can be compared. It could be, for example, that the content of Fe (or other redox-active metal) is low, and that could then give rise to a low specific activity.

      Thank you for the suggestion. We now quote the enzyme activity in the same units as the previous report and have revised the text on page 2.

      Since the submission of our paper, a new report on MsTDM has been published (Cappa et al., Protein Science 33(11), e70364), which further supports our findings. First, the kinetic parameters reported using ITC (Vmax = 0.309 μmol/s, approximately 240 nmol/min/mg; Km = 0.866 mM) are comparable to our observed values (156 nmol/min/mg and 1.33 mM, respectively) in the absence of exogenous iron. Second, the optimal pH for enzymatic activity is similar to that observed for our paraTDM. Third, the reported two-state unfolding behavior is consistent with our cryo-EM structural observations, in which the more dynamic subunits appear to destabilize prior to unfolding of the core domains. Based on these findings, we now propose that Zn<sup>2+</sup> functions primarily as an organizational cofactor at the core catalytic domain (revised Scheme 1).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments from both the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over/under representation of context information.

      My general assessment of the work is unchanged, and I still have some questions requesting methodological clarification.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g. learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be heavily improved. Judgment of generality and plausibility of the results is severely hampered but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is impossible to judge whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work on the field.

      Thank you for pointing this out.

      In the revised text, we clarified the definition of “time step” and how hippocampal neurons behave in each time step (see individual comments below). We also clarified the implementation of the disorder conditions in our model by stating the exact number of neurons in the stimulus domain of the H module, as below. (Other parameters were common to all conditions.)

      “𝑋 consists of two domains: stimulus domain 𝑋 and context domain 𝑋. The neuron ratio in the stimulus domain over the whole neurons dim 𝑋/𝑁 is 16.7% (200 neurons) for the control condition, 2.5% (30 neurons) for the SZ condition, and 50% (600 neurons) for the ASD condition.”
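
The quoted percentages and neuron counts can be cross-checked with a few lines. This is only a consistency sketch: the total of 1200 neurons is inferred here from the statement that 600 neurons make up 50%, and is an assumption, not a number quoted in the response.

```python
# Stimulus-domain neuron counts quoted for each condition.
stimulus_neurons = {"control": 200, "SZ": 30, "ASD": 600}

# Assumed total population size, inferred from 600 neurons = 50%.
TOTAL_NEURONS = 1200

# Percentage of the whole population assigned to the stimulus domain.
stimulus_percent = {
    condition: round(100.0 * count / TOTAL_NEURONS, 1)
    for condition, count in stimulus_neurons.items()
}

# Remaining neurons form the context domain under the same assumption.
context_neurons = {
    condition: TOTAL_NEURONS - count
    for condition, count in stimulus_neurons.items()
}
```

With this assumed total, the three conditions reproduce the quoted 16.7%, 2.5%, and 50%, and differ only in how the population is split between the stimulus and context domains.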

      Comments:

      The authors have made strong efforts to improve on their description of the methods, however, it is still very hard to understand. As a result of some of their clarifications, new issues appeared that I was not able to extract in the previous version.

      (1) Particularly I had problems figuring out how the individual dynamical systems are interrelated (sequences, attractor, action, learning). As I understand it now (and I still might be wrong) there is one discrete time dynamics, where in each time step one action takes place as well as the attractor and sequence dynamics are moved one step forward. Also, synaptic updates happen in every one of those time steps. The authors may verify or correct my interpretations and further improve on their description in the manuscript. It is also confusing that time in the figure panels is given in units of trials, where each trial may consist of (maybe different amounts of) multiple time steps. Are the thin horizontal red and blue lines time steps?

      Thank you for raising the confusing point.

      The reviewer’s understanding is correct. In our model, at each time step the agents transition to the next environmental state (which also corresponds to the contextual state). During this step, each processing stage proceeds in order: Context selector performs attractor selection, Sequence composer performs sequence selection, followed by action selection and synaptic updates. As learning progresses, hippocampal sequences begin to predict longer futures, reducing the need for step-by-step planning. However, at least at the beginning of each task, all processes are conducted at each time step (see Fig. 1G).

      In all tasks, trials are reset when the agents visit the reward sites (i.e., S4 or S5). In Fig. 2C, for example, one trial consists of three time steps (i.e., three state transitions), and the red and blue shaded regions indicate individual trials. During each time step, two types of hippocampal neurons are activated: a state-coding neuron and a transition-coding neuron. (In contrast, in X, one contextual state is active during one time step.) Therefore, in Fig. 2E, two neuronal activities correspond to a single time step.

      For clarification, we have revised Fig. 2 and related descriptions in the manuscript as follows.

      “Here, we simplified this task by using an environment with five discrete states (S1-S5), i.e., five discrete external stimuli (Figure 2A), where agents transition to the next state at each time step.”

      “Figure 2C illustrates an example of both the environmental state transition and the corresponding contextual state transition of an agent, with each trial resetting upon visiting the reward sites (S4 or S5).”

      “At each time step, one state-coding neuron and one transition-coding neuron are active in this order.”

      “At each time step, the agents transition between environmental states.”

      “The model’s computational dynamics are fundamentally synchronized with the environmental (behavioral) time step, and at each time step, the agents transition to the next environmental state. Upon a state transition, the agents first perform contextual state estimation by Context selector and activate a corresponding hippocampal neuron.”

      (2) As a consequence of my new understanding of the model dynamics, I have developed doubts about the interpretation of the attractor network as context encoding. Since the X population mainly serves to disambiguate sequence continuation, right before the action has to be taken (active for only two time steps in Figure 1C?), they could also be considered to encode task space (El-Gaby et al. 2024; doi: 10.1038/s41586-024-08145-x).

      We thank the reviewer for this insightful comment.

      First of all, we would like to clarify that Figure 1C shows the following process: the activity of H at time step t−1 and the external stimulus at time step t jointly provide input to X module, and the activity of X settles into a contextual state at the time step t. As explained in our response to comment (1), the activity of X remains constant during each time step.

      The primary function of X module in our model is to disambiguate the environmental states defined by the external stimuli based on the history information. It is true that, in practice, whether an ongoing sequence is maintained or remapped depends on whether the observed stimulus is consistent with the predicted stimulus. However, this is a consequence of the predictive sequence obtained from scratch rather than the primary computational role of X module. In contrast, X module becomes particularly important when past experience does not uniquely determine the next state. In this situation, the agent must infer the contextual state by associating the current situation with previously experienced contexts, rather than relying solely on temporal continuity.

      We also add that, in most successful cases, the contextual states learned by the agent often correspond to the hidden states of each task as a result of disambiguation. In this sense, the resulting representation may resemble a “task space” encoding, as suggested by the reviewer. However, an important aspect of our model is that the agent does not assume the existence or number of hidden states a priori. Instead, we considered the situation where the agent initially underestimates the number of contextual states, and through remapping it incrementally increases the number of contextual representations. When the number of contextual states matches the number of hidden task states, the task is typically solved.

      (3) Also technically, I wonder why the authors introduce the criterion of 50(!) time steps to allow the attractor to converge, if the state of the attractor network is only relevant in one time step to choose the appropriate continuation of the sequence of actions. Is attractor dynamics important at all? What would happen if just the input and output weights to the X population are kept and the recurrent weights are set 0?

      We thank the reviewer for raising this confusing point.

      First, we would like to clarify that the “50 iterations” mentioned in the manuscript does not refer to 50 environmental time steps. We implemented multiple iterations of attractor updates (typically until convergence) by Context selector within each behavioral time step.

      We clarify this point in the Method section as below.

      “After history-based or landmark-based initialization, X iteratively updates its contextual state at the beginning of each time step according to the associative memory dynamics:”

      The recurrent connectivity within the X population is essential for attractor updates. If the recurrent weights were removed (i.e., set to zero), the network would lose the ability to retrieve distinct contextual states for the same stimulus. In that case, the model would be unable to solve the context-dependent task as we showed in this manuscript.
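
      The distinction between attractor iterations and behavioral time steps can be sketched with a generic Hopfield-style associative memory. This is only an illustration under stated assumptions, not the manuscript's Context selector: the network size, the Hebbian weight rule, the synchronous sign update, and the names `hebbian_weights` and `settle` are hypothetical; only the cap of 50 iterations per time step is taken from the text.

```python
MAX_ITERS = 50  # cap on attractor updates inside ONE behavioral time step


def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights with zero diagonal (assumed rule)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def settle(w, state, max_iters=MAX_ITERS):
    """Apply synchronous sign updates until the state stops changing."""
    state = list(state)
    for iteration in range(1, max_iters + 1):
        new = []
        for i, row in enumerate(w):
            h = sum(w_ij * s_j for w_ij, s_j in zip(row, state))
            # A zero input field leaves the unit unchanged.
            new.append(state[i] if h == 0 else (1 if h > 0 else -1))
        if new == state:
            return state, iteration
        state = new
    return state, max_iters


# Two stored "contextual states" (hypothetical, mutually orthogonal patterns).
ctx_a = [1, 1, 1, 1, -1, -1, -1, -1]
ctx_b = [1, -1, 1, -1, 1, -1, 1, -1]
w = hebbian_weights([ctx_a, ctx_b])

# A corrupted cue settles onto the stored pattern within one time step.
noisy = list(ctx_a)
noisy[0] = -noisy[0]
final, n_iters = settle(w, noisy)

# With the recurrent weights set to zero, the cue is never cleaned up.
w_zero = [[0.0] * len(ctx_a) for _ in ctx_a]
stuck, _ = settle(w_zero, noisy)
```

      In this toy network the corrupted cue is restored by the recurrent weights, whereas with zero recurrent weights the state simply stays at whatever the input set it to, which is the qualitative point made in the response above.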

      (4) Figure 3E: How many time steps are the H cells active (red bars?) Figure 4J: What are the units of the time axis?

      Thank you for pointing this out.

      In Figure 3E, each time step is indicated by the x-axis ticks (i.e., each environmental state). As we explained in our response to comment (1), the activity of two hippocampal neurons (red bars) corresponds to each time step.

      Similarly, in Figure 4J, each time step is indicated in the X-axis ticks. To better represent the results, we added descriptions of the environmental states in our model to the X-axis tick labels in Figure 4J.

      We added the following text to the figure captions.

      “The x-axis represents each time step (corresponding to environmental states), and the y-axis shows the sorted activity of H module.”

      “The x-axis represents each time step (corresponding to environmental states), and the y-axis shows the decoding accuracy of each context based on hippocampal activity.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Reviews:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.

      While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:

      (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. Particularly, it is not evident how the kinetic rates are related to any biological data, nor if the parameter distributions used in this study have any biological relevance.
      (2) Related to this problem, it is not clear whether the considered 100,000 random parameter samples sufficiently explore parameter space due to the combinatorial explosion that arises from having 94 free parameters, nor 100,000 random initial conditions for a system with 50 species (variables).
      (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience, rather than biological plausibility.
      (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.
      (5) On the same line, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated on the simulation dynamics are critical to interpret the results.
      (6) The presented validation of the model results (Fig. 7) is only qualitative, and the interpretation is not carefully discussed in the manuscript, particularly considering the comparison between fold-change responses without specifying the baseline states.

      We thank the reviewers for their thoughtful and constructive comments. In response to their comments, we have undertaken a substantial revision to address all the comments, improve clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.

      Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.

      We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised - consistent with our aims - that comparisons are qualitative.

      Finally, we have rewritten the Discussion to centre on interpretation and implications, and to connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.

      Joint Recommendations for the Authors:

      (1) It is confusing exactly what are the different sets evaluated in each case, e.g. "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or just a clearer explanation is needed.

      In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.

      We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.

      (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.

      We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.

      We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).

      (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.
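
      The arithmetic behind this thought experiment can be checked directly. The sketch below assumes exactly what the comment states (one high and one low value per parameter, 94 parameters, 100,000 samples); the variable names are illustrative.

```python
import math

n_params = 94        # free kinetic parameters in the model
n_samples = 100_000  # simulated model instances

corners = 2 ** n_params            # high/low choice for every parameter
fraction = n_samples / corners     # share of corners that could be visited
log10_fraction = math.log10(n_samples) - n_params * math.log10(2)

print(f"corners  = {corners:.3e}")
print(f"fraction = {fraction:.3e} (log10 = {log10_fraction:.1f})")
```

      The fraction comes out at roughly 5 × 10^(-24), consistent with the order of magnitude (~1/10^(23)) quoted in the comment.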

      We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.

      To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
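
      A sample-size-doubling check of this kind can be sketched as follows. This is a toy stand-in, not the authors' pipeline: each "simulation" is replaced by a random draw of a behaviour label, the labels other than 'rebound' and 'decreasing' and all of the weights are hypothetical, and the N and 2N ensembles are assumed to be drawn independently.

```python
import random

# 'decreasing' and 'rebound' appear in the response; the rest is hypothetical.
LABELS = ["decreasing", "rebound", "increasing", "no_response"]
WEIGHTS = [0.60, 0.20, 0.15, 0.05]  # hypothetical behaviour probabilities


def behaviour_frequencies(n, rng):
    """Frequency of each behaviour label in an ensemble of n toy instances."""
    draws = rng.choices(LABELS, weights=WEIGHTS, k=n)
    return [draws.count(label) / n for label in LABELS]


def doubling_mse(n, rng):
    """MSE between the label distributions of an N and a 2N ensemble."""
    f_n = behaviour_frequencies(n, rng)
    f_2n = behaviour_frequencies(2 * n, rng)
    return sum((a - b) ** 2 for a, b in zip(f_n, f_2n)) / len(LABELS)


rng = random.Random(0)
for n in (100, 1_000, 10_000, 100_000):
    print(n, doubling_mse(n, rng))
```

      In this toy setting the error is pure sampling noise and shrinks as N grows; how quickly it shrinks for the actual model is what Supplementary Figure S12 reports.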

      We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.

      Regarding the parameter range, we intentionally chose a broad, unbiased range (10^(-5) to 10^(4)) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.

      (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.

      We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
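
      The difference between the initial split and the conserved total can be illustrated with a minimal two-state cycle X ⇌ pX. This is a hedged toy example, not a reaction from the ECC model: the first-order rate constants `k_f` and `k_r`, the forward-Euler integrator, and the function name are assumptions made for illustration.

```python
def simulate_pX(x_tot, frac_phos0, k_f=0.5, k_r=1.0, dt=0.01, steps=5000):
    """Integrate d(pX)/dt = k_f * X - k_r * pX with X = x_tot - pX.

    x_tot is the conserved total; frac_phos0 is the initial phosphorylated
    fraction (the 'initial split'). Returns pX at the end of the run.
    """
    pX = frac_phos0 * x_tot
    for _ in range(steps):
        X = x_tot - pX  # conservation: X + pX = x_tot at all times
        pX += dt * (k_f * X - k_r * pX)
    return pX


# Same total, different initial splits: identical steady state.
same_total = [simulate_pX(10.0, f) for f in (0.0, 0.5, 1.0)]

# Different totals, same kinetics: the steady state shifts.
different_totals = [simulate_pX(t, 0.5) for t in (10.0, 20.0)]

print(same_total)
print(different_totals)
```

      Analytically the steady state is pX* = X_tot · k_f / (k_f + k_r), so it is independent of the initial split but scales with the conserved total, which is the distinction drawn in the response.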

      We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416).

      (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.

      Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.

      (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.

      We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.

      Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered it was not an intrinsic property of the parameter sets themselves, but rather an artifact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This discrepancy produced excessively stiff problems, i.e. ones that were unsolvable with the implemented ODE solver settings. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first couple of time points, enabling the solver to get past the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.

      We have revised the "Sampling and filtering of model instances" part of the Methods section (page 5: lines 174 – 189) to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving ODEs is reliant on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.

      Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.

      (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.

      We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
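
      Why an implicit method matters for stiff problems can be seen on the classic linear test equation y' = -k·y. The sketch below is neither CVODE nor a variable-order BDF scheme; it compares hand-written explicit Euler with backward Euler (the first-order BDF) at a step size that is too large for the explicit method. The rate constant, step size, and function names are illustrative assumptions.

```python
def explicit_euler(k, y0, dt, steps):
    """Forward Euler for y' = -k * y: y_new = y + dt * (-k * y)."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y


def backward_euler(k, y0, dt, steps):
    """Backward Euler for y' = -k * y.

    The implicit update y_new = y + dt * (-k * y_new) can be solved in
    closed form for this linear problem: y_new = y / (1 + k * dt).
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + k * dt)
    return y


k, y0, dt, steps = 1000.0, 1.0, 0.01, 50  # k * dt = 10, far above the explicit limit of 2

y_explicit = explicit_euler(k, y0, dt, steps)
y_implicit = backward_euler(k, y0, dt, steps)

print("explicit Euler:", y_explicit)  # grows without bound
print("backward Euler:", y_implicit)  # decays towards the true solution, 0
```

      The true solution decays to zero; the explicit scheme amplifies the value by a factor of 9 in magnitude at every step, whereas the implicit scheme remains stable at the same step size.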

      Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
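
      The rejection-sampling loop described here can be sketched generically. The acceptance test below is a hypothetical stand-in for the actual steady-state and drug-response filters, with an acceptance rate set to about 10% to mirror the proportion quoted above; `draw_instance` and `passes_filters` are illustrative names.

```python
import random


def rejection_sample(draw, accept, n_accept, rng):
    """Keep drawing candidates until n_accept of them pass the filter."""
    accepted, n_drawn = [], 0
    while len(accepted) < n_accept:
        candidate = draw(rng)
        n_drawn += 1
        if accept(candidate):
            accepted.append(candidate)
    return accepted, n_drawn


def draw_instance(rng):
    # Hypothetical stand-in for sampling and simulating one model instance.
    return {"u": rng.random()}


def passes_filters(instance):
    # Hypothetical stand-in for the combined filters (~10% pass).
    return instance["u"] < 0.10


rng = random.Random(42)
accepted, n_drawn = rejection_sample(draw_instance, passes_filters, 1_000, rng)
print(len(accepted), "accepted out of", n_drawn, "drawn")
```

      With an acceptance rate near 10%, the number of draws needed is on the order of ten times the target ensemble size.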

      (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.

      We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
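
      One way to read this criterion is as a relative-change test between the state 100 time steps before the end and the final state. The sketch below encodes that reading; the exact comparison the authors used is not given here, so the window definition, the small `eps` guard for species at zero concentration, and the function name are assumptions.

```python
import math


def reached_equilibrium(trajectory, window=100, rel_tol=0.01, eps=1e-12):
    """Return True if every species changed by < rel_tol over the last window.

    trajectory: list of time points, each a list of species concentrations.
    """
    if len(trajectory) <= window:
        return False  # not enough time points to evaluate the window
    start, end = trajectory[-window - 1], trajectory[-1]
    return all(
        abs(e - s) < rel_tol * max(abs(s), eps)
        for s, e in zip(start, end)
    )


# Two toy species: one relaxing to a plateau, one constant.
settled = [[1.0 - math.exp(-t / 20.0), 2.0] for t in range(1000)]
# First species still rising linearly at the end of the run.
drifting = [[0.001 * t, 2.0] for t in range(1000)]

print(reached_equilibrium(settled))   # plateau reached
print(reached_equilibrium(drifting))  # still changing
```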

      Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.

      (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).

      The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
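
      For readability, the expression quoted in comment (9) can be typeset as below. This is a transcription of the reviewer's quoted string only; the symbol $v_{\mathrm{PI3K}}$ and the square-bracket concentration notation are introduced here for illustration.

```latex
v_{\mathrm{PI3K}}
  = \frac{k_{c2f1}\,[\mathrm{pIR}]\,[\mathrm{PI3K}]}{1 + [\mathrm{pS6K}]/K_{i2}}
  + k_{c2f2}\,[\mathrm{pFGFR}]\,[\mathrm{PI3K}]
```

      In the expression as quoted, the pS6K denominator divides the pIR-driven term: when $[\mathrm{pS6K}] \ll K_{i2}$ that term is essentially unattenuated, and when $[\mathrm{pS6K}] = K_{i2}$ it is halved.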

      (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.

      The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The correct reference is Supplementary Figure 2A, a heatmap showing how frequently each type of dynamic occurs for each active protein form. CDK4/6cycD shows a sustained decreasing dynamic in 59.93% of model instances, which is the source of this number. We have also now explicitly referenced this number in the Supplementary Figure 2A legend.

      We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.

      (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.

      We appreciate the suggestion and the opportunity to clarify. Figure 5A shows that multiple pathways were interrogated, but in the analysis parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” in the Methods section to make this explicit. We have also slightly modified some of the main text to make this clearer (page 11: lines 462-463, page 24: lines 856-857).

      We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
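
      The one-at-a-time design can be illustrated with the short sketch below. It is a toy under stated assumptions, not the authors' pipeline: the 10-fold reduction (`fold=0.1`), the parameter names, and the closed-form `toy_steady_state` read-out are all hypothetical.

```python
def one_at_a_time(base_params, simulate, fold=0.1):
    """Scale each parameter by `fold` in turn, keeping all others at baseline.

    Returns {parameter name: change in simulate() output relative to baseline}.
    """
    baseline = simulate(base_params)
    effects = {}
    for name in base_params:
        perturbed = dict(base_params)  # fresh copy, so perturbations never accumulate
        perturbed[name] = base_params[name] * fold
        effects[name] = simulate(perturbed) - baseline
    return effects

# Hypothetical read-out: steady state of dX/dt = k_on - k_off * X, i.e. k_on / k_off.
def toy_steady_state(p):
    return p["k_on"] / p["k_off"]

params = {"k_on": 2.0, "k_off": 4.0, "k_unused": 1.0}
effects = one_at_a_time(params, toy_steady_state)

# Rank parameters by the magnitude of their individual effect.
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
```

      Each perturbation starts from the same baseline, so every effect is attributable to a single parameter; a combinatorial screen would instead perturb several entries of `perturbed` at once.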

      (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.

      While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.

      The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.

      Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the meaning/units of the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig 7C) is a direct parallel to the rapid suppression seen in the experimental data (Fig 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.

      We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.

      (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.

      Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.

      As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.

      The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.

      The most likely explanation for the discrepancy between experimental dynamics and our simulation dynamics is that the experimental data comes from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high correlation between the experimental data and simulations, indicating that perhaps the heterogeneity of even isogenic cell lines is significantly greater than might be intuited, a point we now mention in the revised Discussion (page 17: lines 716-727).

      It is worth noting again that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.

      The y-axis label has also been updated accordingly. As mentioned in (12) this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.

      With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.

      (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.

      To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.

      CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.

      Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.

      It is however important to reiterate that our goal in Figure 7 is qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.

      (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.

      The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.

      Figure S10 is not a binary map where every implicated node is equal; instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e. small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely occurs due to resistance hubs, not individual proteins; and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency of that hub emerging as a resistance mechanism in a population of cells and patients.

      Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.

      To address these concerns, we have significantly revised the Discussion (page 16: lines 681 – 694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given model abstraction, our findings implicate critical processes (modules), not just single proteins, as the drivers.

      Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.

      (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.

      We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.

      (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.

      Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.

      (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.

      We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenized the language throughout the manuscript. We now use the following three terms in their specific contexts:

      We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.

      We use “initial conditions” to refer to the initial values of the state variables in a model simulation. This term is related to protein abundance, because setting the initial conditions for conserved species sets the protein abundance. This is explicitly stated in the text (page 3: lines 87 - 91).

      We use “state variables” to refer to the time-dependent model species.

      We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.

      We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.

      (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?

      The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.

      This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) Our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.). Mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10<sup>-5</sup> to 10<sup>4</sup>), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate-constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate-constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
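
      For reference, the trade-off can be read off the standard Michaelis-Menten rate law and its two limiting regimes. The symbols $[S]$ and $k_{\mathrm{eff}}$ are introduced here for illustration and do not appear in the response.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}
\;\approx\;
\begin{cases}
  \dfrac{V_{\max}}{K_m}\,[S] = k_{\mathrm{eff}}\,[S], & [S] \ll K_m \quad \text{(first order in } [S]\text{)},\\[6pt]
  V_{\max}, & [S] \gg K_m \quad \text{(zero order in } [S]\text{)}.
\end{cases}
```

      A single mass-action rate constant therefore corresponds to the low-substrate regime with $k_{\mathrm{eff}} = V_{\max}/K_m$, whereas the substrate-independent plateau at $V_{\max}$ is the saturation behaviour that a mass-action term cannot reproduce.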

      For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).

      (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.

      Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using the chemical Langevin equation or the stochastic simulation algorithm (SSA), is a possible extension in future work but is expected to be very computationally expensive. We have added a brief discussion of this as a future direction in the revised Discussion.

      (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.

      We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.

      The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.

      (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.

      We have now included a typeset list of state variable equations and ODEs, along with the original model files.

      (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.

      The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.

      Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.

      This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
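
      A filter of this kind might look like the sketch below. It is illustrative only: the function names, the dictionary layout of each trajectory, and the optional tolerance `tol` (zero by default, so any value below zero triggers rejection) are assumptions, not details from the manuscript.

```python
def has_negative_undershoot(trajectories, tol=0.0):
    """True if any species at any time step falls below -tol."""
    return any(
        value < -tol
        for values in trajectories.values()
        for value in values
    )

def discard_unstable(instances):
    """Keep only instances whose trajectories never go negative."""
    return [inst for inst in instances if not has_negative_undershoot(inst)]

# Hypothetical two-instance ensemble.
ensemble = [
    {"pX": [1.0, 0.5, 1e-9, 0.0]},    # stays non-negative: kept
    {"pX": [1.0, 0.2, -3e-7, 0.1]},   # undershoots zero: discarded
]
kept = discard_unstable(ensemble)
```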

      (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.

      The text has been updated to match the citation.

      (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.

      Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.

      (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.

      We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).

      The reason for this finite limit is mechanistic, as the reviewer's comment prompted us to investigate further. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. As we pushed parameter values to even greater extremes in exploratory simulations, we found they do not generate novel, complex dynamic shapes. Instead, they tend to drive network nodes into saturated states, either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.

      Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Xiong and colleagues presents a compelling validation of UniDesign, a fully computational protein design framework, by using it to engineer a novel, PAM-relaxed variant of Staphylococcus aureus Cas9 (SaCas9) named KRH. The core achievement is the successful de novo generation of a high-performance nuclease (E782K/N968R/R1015H) solely through in silico modeling, without any subsequent experimental optimization or directed evolution. The authors demonstrate that KRH expands the SaCas9 PAM specificity from NNGRRT to NNNRRT, achieving genome editing and base editing efficiencies across multiple human cell types that are comparable to, and sometimes exceed, the well-known evolution-derived KKH variant. The work positions UniDesign not merely as an analytical tool, but as a powerful engine for the generative design of complex molecular functions, offering a scalable and mechanistically insightful alternative to traditional experimental screening.

      Strengths:

      This is an outstanding manuscript that serves as a powerful proof-of-concept for the next generation of computational protein design. The primary selling point, the raw predictive and generative power of UniDesign, is convincingly demonstrated throughout.

      The manuscript shows that the tool can:

      (1) successfully navigate a complex sequence landscape to identify a minimal set of three mutations (KRH) that remodel a critical protein-DNA interface;

      (2) accurately model and balance the delicate interplay between specific base contacts and non-specific backbone interactions to achieve relaxed PAM specificity;

      (3) deliver a final product whose performance is indistinguishable from, and in some cases superior to, a variant that required extensive wet-lab evolution.

      The experimental validation is rigorous, thorough, and directly supports the computational predictions. This work will stand as a landmark study for the field, illustrating that computational design has matured to the point where it can reliably generate sophisticated tools for genome engineering.

      (1) Demonstration of Generative Power:

      The most significant finding is that UniDesign, without any experimental feedback, generated a variant (KRH) that matches the performance of the evolution-derived KKH. This is a remarkable achievement. The iterative design strategy, which first reduces PAM bias (R1015H) and then restores binding through non-specific interactions (e.g., N968R, E782K), is a textbook example of rational design, but it is executed entirely by the algorithm. This validates UniDesign's energy function and search algorithm as capable of capturing the subtle biophysical principles governing PAM recognition.

      (2) Mechanistic Insight as a Built-in Feature:

      A key advantage of UniDesign highlighted by this work is its inherent ability to provide mechanistic explanations. The computational models not only predicted which mutations would work (e.g., N968R over N968K in the KRH variant) but also why they work. The structural and energetic analyses showing the bidentate salt bridge formed by Arg968 versus the single bond formed by Lys968 (Figure 4A) are a perfect example of how the tool's output can rationalize functional differences, a level of insight that is rarely attainable from directed evolution campaigns alone.

      (3) Scalability and Accessibility for Engineering:

      The authors explicitly contrast UniDesign's efficiency (minutes to hours per design run) with the computational expense of methods like COMET and the experimental overhead of directed evolution. The improvements to UniDesign v1.2, specifically the mutation-count and sequence-uniqueness penalties, directly address a key challenge in computational design (generating diverse, low-energy point-mutant libraries). This positions the tool as a highly accessible and scalable platform for engineering other CRISPR systems, a point that will be of immense interest to the community.

      We sincerely thank the reviewer for the comprehensive summary and the highly positive and encouraging comments on our manuscript.

      Weaknesses:

      (1) Title and Abstract Emphasis:

      The title and abstract are effective but could be slightly sharpened to emphasize the primary message. Consider a title like "Fully computational design of a PAM-relaxed SaCas9 variant with UniDesign demonstrates power to match directed evolution." The abstract could more explicitly state upfront that the design was achieved without any experimental iteration.

      Thank you for this valuable suggestion. We have revised the title and abstract accordingly to better reflect your feedback.

      (2) Figure 1, Panel M:

      The data points in panel M are currently presented at a font size that makes them difficult to read, particularly the labels for the many triple-mutant variants. This density obscures the clear identification of the top-performing designs, such as the KRH variant selected for experimental validation. I recommend that the authors increase the font size of all text elements within this panel, including axis labels, tick marks, and data point labels, to improve legibility. If necessary, the panel dimensions can be adjusted or the layout reorganized to accommodate the larger text without compromising clarity. Ensuring this figure is readable is important, as it visually communicates the energetic convergence that led to the selection of KRH.

      Thank you for this helpful suggestion. We have increased the font size in Figure 1M, as well as in Figure 1C and Figure 1E, to improve readability in the revised manuscript.

      (3) Generality of the Design Strategy for Other PAM Positions:

      The design strategy focused on relaxing specificity at the highly constrained third position of the PAM (the guanine in NNGRRT). How transferable is this specific strategy (i.e., disrupting a key specific contact and compensating with non-specific backbone binders) to relaxing other positions in the PAM or to other Cas enzymes with different PAM-interaction architectures? A short discussion on this point would help readers understand the broader applicability of the "fine-tuning the balance" principle.

      Thank you for this insightful question and suggestion. The current study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which eight Cas9 proteins and two Cas12 proteins (each has a different PAM) were investigated. Our computational results demonstrated that UniDesign can effectively capture the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs). For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform similar PAM relaxation designs for other Cas9 or Cas12 proteins, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We have included additional discussion to clarify this point and highlight the broader applicability of our design strategy.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the fully in silico design of a new variant of Staphylococcus aureus Cas9 (SaCas9) using an improved UniDesign workflow.

      The design strategy consists of three sequential steps:

      (1) reducing positional bias at PAM position 3;

      (2) restoring DNA binding through nonspecific interactions;

      (3) combining individually favorable substitutions.

      The overall pipeline is conceptually elegant and logically structured, and the genome-editing activity of the designed variants is comprehensively characterized. The resulting KRH variant exhibits relaxed PAM specificity, expanding the targeting range of SaCas9 across diverse cell types. Notably, the KRH variant demonstrates performance comparable to that of the evolution-derived KKH variant, underscoring the effectiveness of the proposed computational design framework.

      Strengths:

      The design pipeline is entirely computational and does not rely on experimental data for pretraining or iterative optimization.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, which may reflect insufficient exploration of the available sequence space.

      Thank you for this insightful critique. In the present study, our strategy was not to allow UniDesign to freely explore all 27 mutable positions simultaneously, which would span the full sequence space (approximately 20^27), but rather to constrain the search to low-order point mutants (e.g., double or triple mutants) within that space. Even with this constraint, UniDesign effectively samples a substantially larger design space than traditional protein engineering approaches.

      Through iterative design, we observed that only certain residue types became enriched at a subset of positions when identifying effective double mutants. These enriched residues were then systematically combined to generate performance-enhancing triple mutants in an automated manner. Although we ultimately selected the KRH mutant for experimental validation due to its high similarity to the known KKH variant, UniDesign also proposed additional multi-mutants that are distinct from KKH (see Figure 1M).
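      To give a sense of the scale involved in this response, the short sketch below counts variants under a simplifying assumption: each of the 27 mutable positions may take any of the 19 non-wild-type amino acids, with no position held fixed. This is only a back-of-the-envelope count, not a description of the actual UniDesign search (which, for example, carried R1015H forward between rounds); the function name is hypothetical.

```python
from math import comb

POSITIONS = 27  # mutable positions, as stated in the response
ALPHABET = 20   # standard amino acids


def n_point_mutants(order, positions=POSITIONS, alphabet=ALPHABET):
    """Number of variants with exactly `order` substituted positions."""
    return comb(positions, order) * (alphabet - 1) ** order


if __name__ == "__main__":
    full_space = ALPHABET ** POSITIONS
    print(f"full sequence space: {full_space:.3e}")
    for order, label in ((1, "single"), (2, "double"), (3, "triple")):
        print(f"{label} mutants: {n_point_mutants(order):,}")
```

      Under this assumption, the triple-mutant space alone contains about twenty million variants, which is tiny relative to 20^27 yet far larger than what is typically screened experimentally.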

      Reviewer #3 (Public review):

      Summary:

      This study reports KRH, a SaCas9 variant computationally engineered via UniDesign to recognize an expanded NNNRRT PAM with substantially enhanced editing efficiency at non-canonical sites. KRH achieves genome- and base-editing efficiencies comparable to or exceeding the evolution-derived KKH variant across multiple human cell types, demonstrating that computational design can effectively remodel PAM specificity while preserving nuclease activity.

      Strengths:

      The research follows a clear line of reasoning, and the results appear sound. The computational design strategy presented offers a valuable alternative to directed evolution, with potential applicability beyond Cas9 engineering.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The benchmarking of the UniDesign method is insufficient. It is unclear how its performance compares to other protein design algorithms, whether the energy function parameters were systematically optimized, and whether the design strategy can be generalized to other Cas9 orthologs or genome engineering tasks.

      Thank you for this valuable critique. The present study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which many of these concerns were systematically addressed. In that study, UniDesign was benchmarked against Rosetta, a well-established protein design platform, across eight Cas9 proteins and two Cas12 proteins, each recognizing distinct PAM sequences.

      Our results demonstrated that UniDesign effectively captures the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs) across these CRISPR–Cas systems. For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform analogous PAM relaxation designs for other Cas9 or Cas12 proteins in this work, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We have incorporated additional discussion in the revised manuscript to address these points and clarify the broader applicability of our approach.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) SaCas9 is highlighted for its AAV compatibility, but the manuscript does not further discuss how the KRH variant may benefit AAV-based genome editing applications. A brief discussion on how expanded PAM compatibility could facilitate target selection in AAV-constrained therapeutic settings would strengthen the translational relevance of the work, potentially reducing the need for split-Cas9 or dual-vector strategies.

      Thank you for your helpful suggestion. We have added a brief discussion in the revised manuscript highlighting how the KRH variant’s expanded PAM compatibility may enhance AAV-based genome editing applications. Specifically, this property can broaden the range of targetable genomic sites and may reduce the need for split-Cas9 or dual-vector delivery strategies in size-constrained AAV therapeutic contexts.

      (2) The study shows that a fully computational workflow can recapitulate the performance of an evolution-derived variant. A short discussion comparing the scalability and practical advantages of computational design versus directed evolution for future PAM engineering would help emphasize the broader methodological significance of UniDesign.

      Thank you for your valuable suggestion. We have added a brief discussion in the revised manuscript comparing the scalability and practical advantages of computational design with directed evolution for PAM engineering. Specifically, we highlight that UniDesign enables rapid and scalable exploration of sequence space without requiring iterative experimental screening, thereby offering a complementary—and in some cases more efficient—approach to directed evolution for future protein engineering applications.

      (3) There is noticeable variation in editing efficiency across cell types, particularly the lower activity in A549 cells. Could the authors explain why the differences in editing efficiency are so large?

      Thank you for this insightful comment. We agree that the variation in editing efficiency across cell types—particularly the lower activity observed in A549 cells—warrants clarification, and we have added a corresponding discussion in the revised manuscript. We attribute this observation to two main factors. First, transfection efficiency varies substantially across cell lines; in our experiments, A549 cells exhibited lower transfection efficiency compared to HEK293T, HeLa, and U2OS cells, which likely contributes to the reduced editing efficiency. Second, the intrinsic performance of genome editing systems can differ across cellular contexts due to variations in DNA repair pathways, including chromatin accessibility and the expression levels of key repair-related genes. Importantly, despite this cell-type-dependent variability in absolute editing efficiency, the KRH variant consistently outperformed wild-type SaCas9 across all tested cell lines, underscoring the robustness and general applicability of our design.

      (4) Given that the computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, it would be valuable to discuss whether R968 (or saturation mutations at this site) has previously been explored experimentally, and to elaborate on strategies for further expanding the diversity of mutations identified through the computational design framework.

      Thank you for your suggestion. We have added a brief discussion in the manuscript noting that, to the best of our knowledge, R968 has not been experimentally characterized prior to this study. It was identified solely through our computational design workflow, highlighting the strength of our approach.

      Reviewer #3 (Recommendations for the authors):

      (1) During the protein amino acid conformational sampling process in UniDesign, were nucleic acid conformational changes taken into consideration?

      Thank you for this question. Nucleic acid conformational changes were not explicitly considered during the protein sequence design stage in UniDesign after the four specific PAM variants (e.g., TTAGGT, TTCGGT, TTGGGT, and TTTGGT) were defined. We consider this assumption reasonable, as the base conformations in these PAM sequences are expected to remain largely stable, with minimal structural variation due to preserved base-stacking interactions.

      (2) The authors used a mutation-count penalty to control the number of mutations generated during the design process, which appears to occasionally yield results that exceed the intended limit. Is this an efficient approach? Could the count be controlled more directly by imposing constraints within the design procedure itself?

      Thank you for these insightful questions. You are correct that the design process may occasionally yield variants exceeding the intended mutation limit. This occurs because the mutation-count penalty is implemented as a soft constraint, where violations incur a penalty rather than being strictly excluded. Based on our benchmarking, this strategy—combined with the duplicate-design penalty—has been effective in generating multimutant variants with mutation counts close to the desired range. However, we acknowledge that this approach may not achieve optimal efficiency. We are currently developing improved strategies in UniDesign to more directly control mutation counts by incorporating explicit constraints during the sequence simulation process, which we expect will further enhance design precision and efficiency.
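      The distinction between a soft penalty and an explicit constraint can be sketched as follows. This is a minimal illustration under stated assumptions, not UniDesign's implementation: the function names, the linear penalty form, the weight, and the candidate energies are all hypothetical.

```python
def count_mutations(sequence, wild_type):
    """Number of positions at which `sequence` differs from `wild_type`."""
    return sum(a != b for a, b in zip(sequence, wild_type))


def soft_score(energy, n_mutations, limit, weight=1.0):
    """Soft constraint: each mutation beyond `limit` adds a penalty to the energy."""
    return energy + weight * max(0, n_mutations - limit)


def hard_accept(n_mutations, limit):
    """Hard constraint: variants beyond `limit` are rejected outright."""
    return n_mutations <= limit


if __name__ == "__main__":
    limit = 3
    # (label, hypothetical energy, mutation count); lower energy is better.
    candidates = [("triple", -10.0, 3), ("quadruple", -12.5, 4)]

    best_soft = min(candidates, key=lambda c: soft_score(c[1], c[2], limit))
    allowed = [c for c in candidates if hard_accept(c[2], limit)]
    best_hard = min(allowed, key=lambda c: c[1])

    print("soft penalty selects:", best_soft[0])
    print("hard constraint selects:", best_hard[0])
```

      With these illustrative numbers, the quadruple mutant still wins under the soft penalty because its energy gain outweighs the penalty, whereas the hard constraint excludes it. This mirrors why a soft constraint can occasionally return designs above the intended mutation count.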

      (3) Is the new version of UniDesign developed specifically for the Cas9 design task in this study? What are its advantages and disadvantages compared to other state-of-the-art protein design algorithms?

      Thank you for this important question. The new version of UniDesign (v1.2) was not developed specifically for Cas9 engineering. Rather, it is intended as a general framework for protein engineering tasks that focus on introducing point mutations to improve protein properties, as opposed to de novo design. Compared to current state-of-the-art protein design methods—many of which are deep learning–based—UniDesign offers distinct advantages and limitations. Deep learning approaches are often highly efficient and powerful but may lack interpretability in their predictions. In contrast, UniDesign is a well-benchmarked, lightweight, physics-based method that provides greater interpretability, allowing users to better understand the underlying basis of the design decisions. On the other hand, a limitation of UniDesign is that it is less straightforward to incorporate experimental feedback for iterative refinement, such as fine-tuning the scoring function for specific design tasks.

      (4) The study employed a three-round design process to obtain the mutants. Is there a conformational correlation between the mutation sites identified in these three rounds? Could this have been accomplished in a single computational run instead of three separate calculations?

      Thank you for these insightful questions. We adopted a multi-round design strategy for SaCas9 PAM relaxation because this task inherently involves multi-objective optimization: enhancing PAM compatibility—particularly relaxing base recognition at the third PAM position—while preserving editing activity comparable to wild-type SaCas9. In our view, identifying the key mutations (e.g., E782K, N968R, and R1015H) in a single UniDesign run would be highly challenging due to competing energetic requirements. In the first round, R1015H emerged from single-site mutational scanning as the most favorable PAM-relaxing mutation based on its minimal MAD score. However, this mutation also significantly increased the binding energy relative to wild-type SaCas9 with its native PAM, suggesting a likely reduction in editing activity due to weakened binding. To address this, the second round focused on compensatory mutations. Variants such as E782K and N968R (along with several additional candidates) were identified in the context of R1015H to reduce binding energy and partially restore affinity. In the third round, we further combined compatible mutations from the second round, resulting in variants that more effectively lowered binding energy and restored it to levels comparable to wild-type SaCas9 with its native PAM. Notably, the design objectives in rounds one and two drive binding energy in opposite directions, making it unlikely that all key mutations could be identified simultaneously in a single run. During the design process, we also observed conformational correlations among mutation sites. For example, R1015H can form hydrogen-bonding interactions with residue E993, and we observed multiple alternative mutations at position 993 (e.g., E993S, E993P, E993A, E993G, E993K, and E993R), suggesting local structural coupling between these positions.

      (5) In Figure 4D, for the FANCF-1 site, there appears to be a noticeable difference in editing efficiency between KKH-ABE and KRH-ABE. Is this difference statistically significant? If so, please provide an explanation for this observation.

      Thank you for this question. For the FANCF-1 site shown in Figure 4D, we performed statistical analyses and found that the differences in editing efficiency between KKH-ABE and KRH-ABE are not statistically significant: P(A4) = 0.1239, P(A10) = 0.0671, P(A12) = 0.0942, and P(A13) = 0.1349 (two-tailed unpaired Student’s t-test). These results indicate that KRH-ABE and KKH-ABE exhibit comparable editing efficiencies at this site, supporting our overall conclusion that the computationally designed KRH variant achieves performance on par with the KKH variant.

      (6) Does the evolutionary term within the UniDesign scoring function bias the designed sequences towards pre-existing protein features?

      Thank you for this question. In this study, as well as in our previous work on Cas9 PAM recognition modeling (PMID: 37078688), the evolutionary term in the UniDesign scoring function was completely disabled. Therefore, it does not introduce any bias toward pre-existing protein features in the designed sequences.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their thoughtful and constructive feedback. We appreciate that all reviewers recognized the value of our study in linking adult neurogenesis and synaptic plasticity to representational drift in the olfactory system. They described the model as elegant and well-motivated, and agreed that it provides new theoretical insight into how stability and adaptability can coexist in sensory representations. The reviewers also identified areas where our manuscript could be strengthened, and as outlined in our revision plan we have:

      (1) Refined our description of mitral/tufted cell stability and expanded on within-session and across-day variability.

      (2) Substantially expanded the Discussion to compare our modeling assumptions with experimental findings and recent anatomical evidence. Additionally, we have included the limitations of the study and areas for future investigation.

      (3) Included a clearer description of the STDP implementation, plastic synapses, and their functional effects.

      (4) Added a short section outlining model-based predictions that can guide future experiments. We also made minor textual edits to improve precision and flow, including citing prior conceptual work and clarifying model procedures.

      These changes have strengthened both the conceptual framing and technical clarity of the paper. We are grateful for the reviewers’ careful reading and valuable suggestions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower-dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.

      The model is clearly explained and relies on several assumptions and observations:

      (1) Random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity.

      (2) Higher dimensionality of piriform cortex representations compared to M/T responses, which enables superior decoding of odor identity in the piriform cortex.

      (3) Spike time-dependent plasticity (STDP) at synapses involving the abGCs.

      The authors address an open topical problem, and the model is elegant in its simplicity. I have, however, several major concerns with the hypotheses underlying the model and with its biological plausibility.

      Concerns:

      (1) In their model, the authors propose that MTC remain stable at the population level, despite changes in individual MTC responses.

      The authors cite several experimental studies to support their claims that individual MTC responses to the same odors change (some increase, some decrease) across days. Interpreting the results of these studies must, however, take into account the variability of M/T responses across odor presentation repeats within the same session vs. across sessions. In the Shani-Narkiss et al., Frontiers in Neural Circuits, 2023 study referenced, a large fraction of the variability across days in M/T responses is also observed across repeats to the same odorant in the same session (Shani-Narkiss et al., Figure 4), while the authors have M/T responses in the same session that are highly reproducible. This is an important point to consider and address, since it constrains how much of the variability in M/T responses can be attributed to adult neurogenesis in the olfactory bulb versus to other networks' inhibitory mechanisms, which do not rely on neurogenesis. In the authors' model, the variability in M/T responses observed across days emerges as a result of adult-born neurogenesis, which does not need to be the main source of variability observed in imaging experiments (Shani-Narkiss et al., Figure 4).

      We agree with the reviewer and believe this is a critical discussion point. Indeed, in Shani-Narkiss et al., in Kay and Laurent, 1999, and in our own lab, we observe trial-to-trial variability within the same recording session; as the reviewer correctly points out, this cannot be due to neurogenesis. These fluctuations may be trial-to-trial noise, or may reflect dynamics associated with other behaviors such as running (Chockanathan et al., 2021) and decision making (Kay and Laurent, 1999). There is a growing body of literature showing that neural variability in early sensory coding appears to depend on behavioral fluctuations and internal states (Niell and Stryker, for example). The within-session variability in the Shani-Narkiss et al. work may reflect some of these behaviorally relevant features of early olfactory coding, something that our model cannot account for. This is an excellent discussion point, and we have included text (lines 153-157 and lines 321-330) in the manuscript to note this aspect of the data and how one can think of it in the context of our results.

      Another study (Kato et al., Neuron, 2012, Figure 4) reported that mitral cell responses to odors experienced repeatedly across 7 days tend to sparsen and decrease in amplitude systematically, while mitral cell responses to the same odor on day 1 vs. day 7 when the odor is not presented repeatedly in between seem less affected (although the authors also reported a decrease in the CI for this condition). As such, Kato et al. mostly report decreases in mitral cell odor responses with repeated odor exposure at both the individual and population level, and not so much increases and decreases in the individual mitral cell responses, and stability at the population level.

      Thank you for raising this important point regarding the findings of Kato et al. (2012). We agree that their results suggest increased sparsening and stability in M/T cell odor responses with repeated exposure. However, as noted in Yamada et al. (2017), the experimental literature on this question remains mixed. Yamada and colleagues reported a “drastic reorganization of ensemble odor representation” across days and emphasized that “sensory experience does not necessarily cause a major sparsening of the odor response,” explicitly contrasting their findings with those of Kato et al. (2012).

      Our model captures the dynamics observed in Yamada et al. (2017), providing a mechanistic explanation for how significant reorganization can emerge in M/T ensembles despite stable low-dimensional population structure. The studies of Yamada et al. (2017) and Kato et al. (2012) differ in nuanced aspects of experimental design (method of head fixation, behavioral paradigm, training, etc.), all of which are known to affect olfactory responses and therefore the degree of sparsity and overlap in population codes. Our model does not include any of these behavioral features that may differentially engage the olfactory circuit and thus affect population responses. Notably, in previous work, we highlighted how even simple changes to top-down feedback, a single phenomenological manipulation of functional connectivity in the olfactory circuit that some behavior could broadly activate, can have disparate effects on the degree of sparsity in neural representations over time. In our current model, there is no behavior that would allow us to study these features of the neural activity code in the M/T cells. Instead, we focus on one specific aspect, adult neurogenesis, which we can explicitly manipulate in a biologically meaningful way. The reviewer’s point, however, is well taken and important, and we have added text to the Discussion (lines 336-344) to highlight the differing experimental outcomes and to clarify how our model aligns with the Yamada et al. results.

      (2) In Figure 1, a set of GCs is killed off, and new GCs are integrated in the network as abGC. Following the elimination of 10% of GCs in the network, new cells are added and randomly assigned synaptic weights between these abGCs and MTC, GCs, SACs, and top-down projections from PCx. This is done for 11 days, during which time all GCs have gone through adult neurogenesis.

      Is the authors' assumption here that across the 11 days, all GCs are being replaced? This seems to depart from the known biology of the olfactory bulb granule cells, i.e., GCs survive for a large fraction of the animal's life.

      Thank you for raising this important point regarding the lifespan of granule cells (GCs). We agree that developmentally born GCs are not fully replaced. Indeed, multiple studies indicate that some developmentally born GCs can survive for very long periods, up to 18-24 months, essentially the lifetime of the animal (Kaplan, 1985; Petreanu & Alvarez-Buylla, 2002). However, the fraction of total GCs that such long lived GCs constitute remains an open question, in part because of challenges to measure the lifetime survival of newborn neurons. What there is consensus on is the significant size of the granule-cell population undergoing continuous turnover through adult neurogenesis (reviewed in Lepousez et al., 2013).

      We should clarify that we do not assume that 100% of the granule cell population turns over in an 11-day period. We use “day” to represent a static epoch over which we can implement plasticity rules across two time scales. Critically, we also randomize the turnover, treating every cell in the GC population as equally likely to be replaced. Prior experimental evidence suggests that some GCs are more likely to persist (possibly as a result of experience; Magavi et al., 2005), which may in some regards make our result on stabilization following repeated sensory exposure more dramatic (as the GCs that show the largest change following STDP may also be the ones that are the most stable, and therefore least likely to turn over). We do not include this in our model as we could not identify a framework for “selecting” which GCs would persist that would not be tautological. The point the reviewer raises is critical, and a discussion of these points is warranted, which we now include in the manuscript (line 352-361).
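To make the turnover arithmetic concrete, here is a minimal sketch. It assumes (this is an illustrative assumption, not something specified in the response) that 10% of GC slots are drawn uniformly at random for replacement on each simulated day, independently of which slots were replaced on earlier days. Under that assumption the expected fraction of cells never replaced after n days is 0.9^n, so a sizeable minority of the original population would still be present after 11 days. The function names and the population size are hypothetical.

```python
import random


def expected_never_replaced(fraction_per_day: float, days: int) -> float:
    """Closed-form expected fraction of cells never selected for replacement,
    assuming independent uniform selection on each day (an assumption)."""
    return (1.0 - fraction_per_day) ** days


def simulate_never_replaced(n_cells: int, fraction_per_day: float,
                            days: int, seed: int = 0) -> float:
    """Monte Carlo check of the closed form above.

    Each day a fixed number of slots is sampled without replacement within
    the day; a slot that was already replaced can be picked again later.
    """
    rng = random.Random(seed)
    n_replace = round(n_cells * fraction_per_day)
    original = set(range(n_cells))  # slots still holding an original cell
    for _ in range(days):
        replaced_today = rng.sample(range(n_cells), n_replace)
        original.difference_update(replaced_today)
    return len(original) / n_cells


if __name__ == "__main__":
    print(expected_never_replaced(0.10, 11))          # closed form
    print(simulate_never_replaced(10_000, 0.10, 11))  # simulation estimate
```

If the model instead cycles through the population without revisiting replaced cells, the surviving fraction would fall to zero after ten rounds, so the sampling scheme matters for how the 11-day window should be read.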

      Additionally, there is some evidence that behavioral factors, such as novelty, can increase the rate of adult neurogenesis (Kamimura et al., 2022; van Praag et al., 1999; Gheusi and Lledo, 2014), suggesting a complex reciprocal relationship between the mechanisms that generate the cells shaping how olfactory stimuli are encoded and the encoding process itself. Our model does not include any of these dynamic features, which represent an additional layer of complexity and may introduce an intermediate time scale, one of behavioral selection and action, that is slower than the milliseconds on which spike-timing-dependent plasticity operates but faster than the time scale of neurogenesis. We also include this point in the Discussion (line 352-361).

      Our 11-day simulation, however, is designed to uncover how plasticity across multiple timescales (STDP and adult neurogenesis) at the network level shapes odor representations as multiple rounds of GC turnover occur. Changing the timescale and magnitude of replacement in the simulations (either in terms of days or percent of cells replaced) would affect the degree to which drift happens, but not the phenomenon itself. Additionally, the representational structure in our model at intermediate time points (e.g., days 8-10) would correspond well to scenarios in which some fraction of developmentally born GCs persists in the circuit. Thus, our simulations span a range of possible empirical regimes, from high turnover to partial preservation. We have added discussion to the revised manuscript (line 352-361) clarifying this point and acknowledging the biological heterogeneity in GC lifespans.

      (3) The authors' model relies on several key assumptions: random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity. These assumptions are not necessarily accurate, as recent work revealed structure in the projections from the olfactory bulb to the piriform cortex and structure within the piriform cortex connectivity itself (Fink et al., bioRxiv, 2025; Chae et al., Cell, 2022; Zeppilli et al., eLife, 2021).

      How do the results of the model relating adult neurogenesis in the bulb to drift in the piriform cortex representations change when considering an alternative scenario in which the olfactory bulb to piriform and intra-piriform connectivity is not fully distributed and indistinguishable from random, but rather is structured?

      Thank you for pointing us to these important studies. We fully agree with the reviewer that the structure of the olfactory system might not be purely random, but we do not believe these papers contradict the level of abstraction used in our model.

      Zeppilli et al. (2021) map molecularly defined projection neuron subtypes and their preferential targeting of different cortical and subcortical regions, but they do not report any fine-scale topographic organization of bulb → piriform connectivity that would contradict a view of randomly distributed input to piriform cortex. Studies from our lab using retrograde tracers in the bulb show some spatial clustering of piriform cortical neurons whose axons project to the bulb (Padmanabhan et al., 2016, 2019), but these studies do not identify any “functional organization” or structure. Chae et al. (2022) focus on distinct long-range functional loops (mitral ↔ piriform vs tufted ↔ AON) and the differential role of cortical feedback, but again, at the level of cortical regions rather than individual cells and connectivity. Notably, our model does not consider AON.

      Finally, Fink et al. (2025) reports a “like-to-like” excitatory connectivity motif within the piriform cortex and an experience-dependent reorganization of inhibitory synapses. As the authors note, “... this like-to-like motif is unlikely to reflect common input from the olfactory bulb”, so it does not conflict with our assumption of broadly random bulb → piriform input. This “like-to-like” motif is reflected in our model by wiring a certain subpopulation of piriform cells. On the other hand, we agree that the experience dependent changes in inhibitory connectivity within PCx are highly relevant for learning related plasticity but fall outside the scope of our study. We intentionally omitted piriform plasticity to isolate the contributions of adult neurogenesis in the bulb and plasticity acting on adult-born granule cells. But incorporating such cortical plasticity is an important direction for future work. We added a discussion (line 395-405) on this important point raised by the reviewer in the revised manuscript.

      (4) I didn't understand the logic of the low-dimensional space analysis for M/T cells and piriform cortex neurons (Figures 2 & 3). In the authors' model, the full-ensemble M/T responses are reorganized over time, presumably due to the adult-born neurogenesis. Analyzing a lower-dimensional projection of the ensemble trajectories reveals a lower degree of re-organization. This is the same for the piriform cortex, but relatively, the piriform ensembles displayed in a low-dimensional embedding appear to drift more compared to the M/T ensembles.

      This analysis triggers a few questions: which representation is relevant for the brain function - the high or the low-dimensional projection? What fraction of response variance is included in the low-dimensional space analysis? How did the authors decide the low-dimensional cut-off? Why does STDP cause more drift in piriform cortex ensembles vs. M/T ensembles? Is this because of the assumed higher dimensionality of the piriform cortex representations compared to the mitral cells?

      Thank you for these thoughtful questions. We clarify the logic and purpose of the low-dimensional analyses and address each point below.

      (1) Which representation is relevant for brain function, the high-dimensional or low-dimensional one?

      We believe both representations are meaningful, with each capturing different aspects of the neural code. The high-dimensional activity reflects the full variability of individual cell responses, while the low-dimensional projection captures the dominant population-level components that downstream areas are most likely to use for readout. We found that the low-dimensional representations are more stable in the bulb than in PCx, suggesting that information is used differentially between the two areas. The bulb provides a stable, sensory-anchored population code that reliably represents odor identity over time, consistent with both electrophysiological and behavioral studies (Nagayama et al., 2004; Chen et al., 2009; Davison and Katz, 2007; Cavaretta et al., 2018). This is consistent with its role as the first stage of information processing in the olfactory system, which provides faithful representations that downstream circuits receive. The piriform cortex, by contrast, transforms this stable input into a more flexible representation. Drift in its low-dimensional space may reflect ongoing plasticity (Schoonover et al., Nature, 2021), integration of contextual signals, or higher-dimensional computations characteristic of PCx (Fink et al., bioRxiv, 2025), suggesting that it acts more as an associative cortex than as a pure sensory cortex.

      (2) What fraction of variance is included in the low-dimensional space, and how was the cutoff chosen?

      In our simulations, the principal components (PCs) retained for the low-dimensional analysis captured the majority of the variance relevant for odor identity (~60–70% for M/T cells and ~55–65% for piriform cortex). We now report these fractions explicitly in the Methods (line 937-939).

      (3) Why does STDP cause more drift in piriform-cortex ensembles than in M/T ensembles? Does this reflect higher dimensionality in piriform cortex?

      In our model, STDP does not cause more drift in PCx. It actually reduces drift and stabilizes PCx representations relative to the condition without STDP (as shown in Fig. 4C2). STDP has a much smaller effect in the bulb because: (1) M/T cells continue to receive stable odor input from the glomeruli and (2) the low-dimensional M/T representation is already stable even without plasticity. We have edited the manuscript to reiterate this point in both the results and discussion.

      The reviewer is correct that the piriform cortex naturally exhibits more drift than the bulb, and their comment that this is due to its substantially higher representational dimensionality is spot on. The PCx contains many more neurons, receives highly divergent OB → PCx inputs, and has dense recurrent connectivity, all of which create many more degrees of freedom through which representations can drift. Additionally, because individual PCx neurons sample from a substantially more diverse combinatorial space of inputs (including feedback to piriform from an array of regions; Illig, 2005; Majak et al., 2004; Chapuis et al., 2013), the dimensionality of the population code is likely higher. While STDP stabilizes the dimensions of the PCx representation that are reinforced during plasticity, some residual drift remains because of the large number of orthogonal dimensions available. Additionally, as the reviewer notes, some forms of plasticity that are not included in the model, such as inhibitory plasticity in PCx, may also affect both the representations and their underlying dimensionality. We include these points in the Discussion (line 381-394).

      (5) Could the authors comment whether STDP at abGC synapses and its impact on decreasing drift represent a new insight, and also put it into context? Several studies (e.g., Lledo, Murthy, Komiyama groups) reported that abGC integrates in the network in an activity-dependent manner, and not randomly, and as such stabilizes the active neuronal responses, which is consistent with the authors' report.

      Related, I couldn't find through the manuscript which synapses involving abGCs they focus on, or what is the relative contribution of the various plastic synapses shown in the cartoon from Figure 4 A1 (circles and triangles).

      We thank the reviewer for raising this question. As the reviewer pointed out, several studies have shown that abGCs integrate into the bulb circuit in an activity-dependent manner, preferentially forming synapses onto mitral/tufted cells that respond to behaviorally important odors. This “selection of surviving cells” is not included in our model. Instead, we use STDP at the synaptic level. This is of course not analogous, but it provides a computational framework in which the selection of surviving abGCs could be incorporated in future studies. It is perhaps notable that in our large-scale simulations, synaptic changes at the population level may reflect some of this activity-dependent selection.

      To that end, our model provides a new insight and suggests a broader function for adult neurogenesis. For example, when certain odors are reinforced in an activity dependent manner, abGCs born during that period may stabilize the circuits that respond to those odors. The resulting reduction of drift would help keep the representation of those odors stable over time, even while other parts of the circuit continue to change. We now highlight this idea in the Discussion (line 366-373).

      For the second part of the question: in our model, STDP acts on two sets of connections. It applies to the synapses onto abGCs from M/T cells, GC/SAC cells, and PCx neurons. It also applies to the synapses that abGCs project to, including those onto M/T cells and GC/SAC cells. We have clarified this in the revised Methods (line 1001-1004).
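As an illustration of the kind of rule involved, the sketch below implements a standard pair-based exponential STDP update. The functional form, the parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`), the sign convention, and the weight bounds are all assumptions chosen for illustration; the rule actually used for abGC synapses is the one described in the revised Methods and may differ, particularly for inhibitory connections.

```python
import math


def stdp_dw(dt_ms: float, a_plus: float = 0.01, a_minus: float = 0.012,
            tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Weight change for one pre/post spike pair.

    dt_ms = t_post - t_pre. Pre-before-post (dt_ms > 0) potentiates,
    post-before-pre (dt_ms < 0) depresses. Parameter values are hypothetical.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def apply_stdp(weight: float, pre_spikes, post_spikes,
               w_min: float = 0.0, w_max: float = 1.0, **params) -> float:
    """Sum the pairwise updates over all spike pairs, then clip the weight."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_dw(t_post - t_pre, **params)
    return min(w_max, max(w_min, weight))


if __name__ == "__main__":
    # Presynaptic spike 5 ms before the postsynaptic spike: weight increases.
    print(apply_stdp(0.5, pre_spikes=[10.0], post_spikes=[15.0]))
    # Reversed order: weight decreases.
    print(apply_stdp(0.5, pre_spikes=[15.0], post_spikes=[10.0]))
```

In a network simulation the same update would be applied separately to each plastic connection class listed above (inputs onto abGCs and outputs from abGCs).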

      (6) The study would be strengthened, in my opinion, by including specific testable predictions that the authors' models make, which can be further food for thought for experimentalists.

      How does suppression of adult-born neurogenesis in the OB impact the stability of mitral cell odor responses? How about piriform cortex ensembles?

      We appreciate the reviewer’s suggestion and formalize the following two predictions from our model:

      Prediction 1: Suppressing adult neurogenesis will reduce spontaneous representational drift in the PCx. Increasing spike-timing-dependent plasticity during periods of experience with a specific odor will selectively stabilize representations of that odor.

      Prediction 2: Adult neurogenesis will not affect AON representations of odor identity or concentration in the same way that PCx representations are altered and drift.

      We include these two ideas in the discussion as experimentally testable predictions.

      Reviewer #2 (Public review):

      Summary:

      The authors address a critical problem in olfactory coding. It has long been known that adult neurogenesis, specifically in the form of adult-born granule cells that embed into the existing inhibitory networks on the olfactory bulb, can potentially alter the responses of Mitral/Tufted neurons that project activity to the Piriform Cortex and to other areas of the brain. Fundamentally, it would seem that these granule cells could alter the stability of neural codes in the OB over time. The authors develop a spiking network model to explore how stability can be achieved both in the OB over time and in the PC, which receives inputs. The model recapitulates published activity recordings of M/T cells and shows how activity in different M/T cells from the same glomerulus shifts over time in ways that, in spite of the shift, preserve population/glomerular level codes. However, these different M/T cells fan out onto different pyramidal cells of the PC, which gives rise to instability at that level. STDP then, is necessary to maintain stability at the PC level as long as odor environments remain constant. These results may also apply to a similar neurogenesis-based change in the Dentate Gyrus, which generates instability in CA1/3 regions of the hippocampus.

      Strengths:

      A robust network model that untangles important, seemingly contradictory mechanisms that underlie olfactory coding.

      Weaknesses:

      The work is a significant contribution to understanding olfactory coding. But the manuscript would benefit from a brief discussion of why neurogenesis occurs in the first place - e.g., injury, ongoing needs for plasticity, and adapting to turnover of ORNs. There is literature on this topic. It seems counterintuitive to have a process in the MOB (and for that matter in the DG) that potentially disrupts the ability to generate stable codes both in the MOB and PC, and in particular a disruption that requires two different mechanisms - multiple M/T cells per glomerulus in the MOB and STDP in the PC - to counteract.

      We appreciate the reviewer’s suggestion and added discussion on this point in the revised manuscript (line 431-435).

      Given that neurogenesis has an important function, and a mechanism is in place to compensate for it in the MOB, why would it then be disrupted in fan-out projections to the PC? The answer may lie in the need for fan-out projections so that pyramidal neurons in the PC can combinatorially represent many different inputs from the MOB. So something like STDP would be needed to maintain stability in the face of the need for this coding strategy.

      This kind of discussion, or something like it, would help readers understand why these mechanisms occur in the first place. It is interesting that PC stability requires that odor environments be stable, and that this stability drives PC representational stability. This result suggests experimental work to test this hypothesis. As such, it is a novel outcome of the research.

      We agree with the reviewer. The fan-out from the bulb to the piriform cortex is essential for the combinatorial coding that allows PCx neurons to represent many odor features and mixtures. This architecture gives the piriform cortex great coding capacity, but it also makes the system sensitive to small changes in its inputs. As a result, drift that originates in the bulb can spread more easily in PCx. A stabilizing mechanism is therefore needed downstream. In our model, STDP provides this stabilization by reinforcing the dimensions that carry meaningful odor structure. This allows the piriform cortex to keep a stable population code even when its inputs change over time. Neurogenesis supplies the flexibility, the fan-out supplies the expressive power, and STDP supplies the stability. All three elements work together to support a system that must recognize odors reliably while still adapting to new sensory experiences. We have added discussion on this point in the revised manuscript (line 395-405).

      Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights).

      Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      We appreciate the reviewer’s comment and thank them for their thoughtful feedback.

      Weaknesses

      I have one major, general concern that I think must be addressed to permit proper interpretation of the results.

      I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time.

      This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.

      In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.

      This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)

      Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.

      The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.

      We thank the reviewer for highlighting this important issue. We agree that the interpretation of our model requires care to avoid implying that the olfactory bulb exhibits spontaneous drift. As the reviewer points out, the empirical literature shows that M/T-cell tuning is highly stable for infrequently experienced odors, but can change with daily, persistent odor exposure (e.g., Kato et al., 2012; Yamada et al., 2017).

      We thank the reviewer for highlighting the Bhalla and Bower paper, as it is foundational and raises a number of interesting and important points. As the authors noted, there was significant variability in trial-to-trial responses over sessions and days in single neurons. This is likely due to ongoing dynamics (Laurent, 1999), the impact of behaviorally relevant top-down feedback (Chen and Padmanabhan, 2022), decision making (Kay and Laurent, 1999), and an array of factors that our model does not include. In that manuscript, the authors note “the variability of the same neuron recorded over different days…was not statistically different from the within day comparisons.” While these results appear prima facie to differ from ours, there are several reasons why this may not be the case.

      First, different metrics are used for measuring neuronal stability, which may contribute to some of the differences. Second, and perhaps more importantly and interestingly, the authors in that study noted the significant trial-to-trial variability within day, which is not present in our study because our model has none of the richness of behavior that Bhalla and Bower found in the freely behaving rat. This variability within day (which is much higher than what we report) would reduce the impact of drift across days - a result that would complicate how plasticity across multiple timescales occurs. We thank the reviewer for the insights on this critical study and include these points in our discussion (line 321-330).

      Neural responses to odors are incredibly variable across different time scales (Padmanabhan and Urban, 2010; Angelo et al., 2011; Kapoor and Urban, 2006; Friedrich and Laurent, 2001; Smear et al., 2011; Wesson et al., 2008). Our model includes neither behavior-related selection for survival nor specific rules about which synapses may be preferentially strengthened (due to neuromodulation corresponding to behavioral choice and reinforcement learning). Instead, we aimed to recapitulate the experimental design of a few studies (Kato et al., 2012; Yamada et al., 2017) to understand how neurogenesis and drift are related. Over the simulated 10 days, the odor is presented every day, and the network is otherwise frozen between sessions, meaning the model lacks mechanisms that would normally support recovery during intervals without odor exposure. Under these conditions, adult neurogenesis effectively interacts with repeated experience, producing gradual changes in individual M/T-cell tuning. Thus, our results should be interpreted as modeling experience-dependent changes over the timescale of neurogenesis, not as evidence for spontaneous drift in the bulb. We now state this explicitly in the Discussion to prevent confusion and expand the discussion to incorporate some of these critical ideas (line 321-330).

      Major comments (all related to the above point)

      (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)" they also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.

      We thank the reviewer. As this issue is related to the previous comment, we have clarified the revised text to avoid any misleading comparison. We now specify which aspects of our computational model map onto experimental studies, which aspects we cannot recapitulate, and, as a result, where our comparisons are limited.

      (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days" ... "consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.

      We agree with the reviewer that some of the cells in Shani-Narkiss Figure 8B showed relatively stable responses (while others did not). However, there is a clear monotonic increase in the “Average differences” over time, from “Same day” to “1 month” to “6 month”, as quantified in their Figure 8B. Although the authors concluded that they “find a relatively stable response of single neurons”, we would argue that their data also provide evidence for what we would term “relatively unstable responses”, as found in our model. Nevertheless, per the reviewer’s suggestion, we now clarify this in the text (lines 194-197).

      (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L3146). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces), Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.

      We appreciate the reviewer’s suggestion and have edited the text to make it more accurate (lines 319-320).
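
      The change index quoted in comment (3) can be written out as a short calculation. The sketch below is only an illustration of the quoted formula; the function name and the response values are hypothetical, and it assumes non-negative response magnitudes so that the index stays between -1 and 1.

```python
def change_index(response_trial_x: float, initial_response_day1: float) -> float:
    """Change index as quoted from Kato et al.: (R_x - R_1) / (R_x + R_1).

    Assumes non-negative response magnitudes, so the result lies in [-1, 1].
    """
    denominator = response_trial_x + initial_response_day1
    if denominator == 0:
        raise ValueError("CI is undefined when both responses are zero")
    return (response_trial_x - initial_response_day1) / denominator


# Hypothetical response magnitudes (arbitrary units), for illustration only.
loss = change_index(0.0, 2.0)        # complete loss of response
emergence = change_index(2.0, 0.0)   # emergence of a new response
no_change = change_index(2.0, 2.0)   # identical responses
gain_up = change_index(4.0, 2.0)     # a doubling of gain gives a positive CI
print(loss, emergence, no_change, gain_up)
```

      Because a pure change in gain moves the index away from zero, a model cell-odor pair whose firing rate scales over days would show up in this measure.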

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Line 28 "a graduate alteration in sensory perception". We do not know if drift results in changes in perception. If anything, behavioral evidence suggests that perception remains stable in spite of drift. For example, in Driscoll et al. (2017) mice are able to successfully navigate a virtual T maze despite drift, and in Schoonover et al. (2021), mice maintain aversive responses following fear conditioning, despite drift in the piriform. Finally, spatial navigation appears unimpaired despite pronounced drift in the hippocampus (e.g., Climer et al., 2025). It would be more appropriate to say "stimulus-evoked activity patterns" than "sensory perception" or other words that refer to neuronal activity rather than cognition or behavior.

      We edited the text to make it more accurate per the reviewer’s suggestion (line 27).

      (2) In the introduction, the authors state: "This representational drift has led to the hypothesis that PCx, rather than being a primary sensory area, may be more like an association cortical region." (L76-78). However, the hypothesis that PCx operates as an association cortex comes originally from Haberly's work and thinking (e.g., Haberly and Bower, 1984, elaborated in extensive detail in Haberly, 2001). I think it would be appropriate to acknowledge that here.

      We added the references to acknowledge this, per the reviewer’s suggestion (line 77).

      (3) In the methods, the authors elegantly describe how they induce neurogenesis in their model using weight reshuffling (L805-814). I think it could really help the reader understand the model if this idea were also included in the results section. As the results section currently reads, it seems as if their model implemented neurogenesis in a different fashion: "To do this, following elimination of 10% of the GCs in the network, we added new cells and randomly assigned synaptic weights between these abGCs and M/Ts". I appreciate that in their model, shuffling all the weights of a given GC randomly is akin to "elimination", but I feel like at first blush the results section risks giving an impression a bit different than that actually used in the model.

      We edited the text to make it more accurate per the reviewer’s suggestion (lines 110-112).
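
      The idea raised in comment (3), that replacing a granule cell amounts to reshuffling its weights, can be sketched as follows. This is a minimal illustration, not the model's implementation: the weight layout, the function name, and the choice to permute a cell's existing weights (rather than draw new ones) are all assumptions made for the example.

```python
import random


def reshuffle_granule_cells(weights, fraction=0.10, rng=None):
    """Return a copy of `weights` in which a random subset of rows is shuffled.

    weights[g][m] is the (hypothetical) synaptic weight between granule cell g
    and M/T cell m. Shuffling a whole row stands in for replacing that granule
    cell with an adult-born one.
    """
    rng = rng or random.Random()
    n_cells = len(weights)
    n_replaced = max(1, round(fraction * n_cells))
    replaced = rng.sample(range(n_cells), n_replaced)
    new_weights = [row[:] for row in weights]  # leave the input untouched
    for g in replaced:
        rng.shuffle(new_weights[g])
    return new_weights, sorted(replaced)


# Hypothetical network: 20 granule cells, 5 M/T cells, random weights.
rng = random.Random(0)
weights = [[rng.random() for _ in range(5)] for _ in range(20)]
new_weights, replaced = reshuffle_granule_cells(weights, fraction=0.10, rng=rng)
print("replaced granule cells:", replaced)
```

      Only the selected rows change, so the remaining 90% of granule cells keep their connectivity from one simulated day to the next.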

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work develops a simple, rapid, low-cost methodology for assembling combinatorially complete microbial consortia using basic laboratory equipment. The motivation behind this work is to make the study of microbial community interactions more accessible to laboratories that lack specialized equipment such as robotic liquid handlers or microfluidic devices. The method was tested on a library of Pseudomonas aeruginosa strains to demonstrate its practicality and effectiveness. It provided a means to explore the complex functional interactions within microbial communities and identify optimal consortia for specific functions, such as biomass production.

      The primary strength of this manuscript lies in its accessibility and practicality. The method proposed by the authors allows any laboratory with standard equipment, such as multichannel pipettes and 96-well plates, to readily construct all possible combinations of microbial consortia from a given set of species. This greatly enhances access to full factorial designs, which were previously limited to labs with advanced technology.

      Another strength of the manuscript is the measurement and analysis of the biomass of all possible combinations of 8 strains of P. aeruginosa. This analysis provides a concrete example of how the authors' new methodology can be used to identify the best-performing communities and map pairwise and higher-order functional interactions.

      Notably, the authors do exceptionally well in providing a thorough description of the methodology, including detailed protocols and an R script for customizing the method to different experimental needs. This enhances the reproducibility and adaptability of the methodology, making it a valuable resource for researchers wishing to adopt this methodology.

      We thank the reviewer for their thoughtful comments and positive assessment of our work. Below we detail the changes we have introduced in the manuscript to clarify issues raised by the reviewer.

      While the methodology is robust and well-presented, there are some limitations that should be acknowledged more thoroughly. First, the method's scalability is an important factor. The authors indicate that it should be effective for up to 10-12 species, but there is no discussion of what sets this scale: time, amount of labor, consumables, the likelihood of error, sample volume, etc.

      The 10-12 species estimation is based on our own experience implementing the protocol, and set primarily by time, labor, and consumables (as rightly pointed out by the reviewer) rather than conceptual limitations of the approach. We have added clarifications in the Discussion (lines 401-405) regarding these scalability-limiting factors.
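
      The growth in workload with the number of species can be made concrete with a short calculation. The sketch below assumes one consortium per well of a 96-well plate and ignores replicates and intermediate plates; the function names are hypothetical.

```python
from itertools import combinations
from math import ceil

WELLS_PER_PLATE = 96


def n_consortia(n_species: int) -> int:
    """Number of combinations of n species, counting the empty (blank) one."""
    return 2 ** n_species


def plates_needed(n_species: int) -> int:
    """Plates required if each consortium occupies exactly one well."""
    return ceil(n_consortia(n_species) / WELLS_PER_PLATE)


def all_consortia(species):
    """Yield every subset of `species`, from the empty set to the full set."""
    for size in range(len(species) + 1):
        yield from combinations(species, size)


for n in (8, 10, 12, 20):
    print(n, "species:", n_consortia(n), "consortia,", plates_needed(n), "plates")
```

      Under these assumptions, 8 species fit on 3 plates and 12 species need 43, while 20 species would need more than ten thousand, which is consistent with time, labor, and consumables being the practical limit.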

      Second, this methodology is tailored to construct communities where the abundance of each strain is identical in each combination. Therefore, combinations with a different number of strains also differ in the total initial amount of microbial cells. Moreover, variations in the initial proportions of the same set of strains cannot be readily explored.

      Note that the “density homogenization” step is optional and could be skipped entirely, which would result in the same species being present at variable densities across consortia: specifically, skipping this step would make the density of a species in a consortium inversely proportional to the number of species in that consortium. Further variations in initial abundance could be explored by treating the same strain at two (or more) starting abundances as distinct inputs of the protocol, though this would naturally increase the number of combinations to test.

      We have included a paragraph in the Discussion (lines 416-423) describing how we can, in principle, extend our protocol to explore abundance effects.
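
      The two points above can be illustrated numerically. The sketch below is a simplified picture under stated assumptions: densities are expressed in an arbitrary unit, skipping homogenization is modeled as equal-volume mixing, and each extra starting abundance of a strain is counted as one additional input. The function names are hypothetical.

```python
def initial_density(n_members: int, homogenized: bool, unit: float = 1.0) -> float:
    """Per-species starting density in a consortium with `n_members` species.

    With homogenization, every species starts at `unit` in every consortium.
    Without it, each species is diluted by the number of members.
    """
    if n_members < 1:
        raise ValueError("a consortium needs at least one member")
    return unit if homogenized else unit / n_members


def n_combinations(n_inputs: int) -> int:
    """Full factorial size when every input is either present or absent."""
    return 2 ** n_inputs


print(initial_density(4, homogenized=True))    # same density in every consortium
print(initial_density(4, homogenized=False))   # a quarter of the unit density
print(n_combinations(8), n_combinations(9))    # 8 strains vs. one strain at two abundances
```

      Adding a second abundance level for a single strain doubles the number of combinations in this counting, which is the cost mentioned in the response.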

      Third, the manuscript only discusses how to construct the combinations, and not how to assay them afterward (e.g. for community function, interspecific interactions, etc.). While details on how to achieve these goals are clearly outside the scope of this work, the use of biomass as an example function may obfuscate this caveat, which should be stated more explicitly.

      We agree that the manuscript focuses exclusively on the construction of microbial communities and does not address how these communities should be assayed afterward. This is an intentional scope decision. The proposed protocol is fully compatible with a wide range of functional, interaction-based, or omics-based assays. Absorbance is mentioned as an illustrative example of a possible readout, rather than as a recommended or exclusive parameter. We have revised the text to explicitly state that the assessment of community function or interspecific interactions lies outside the scope of this work and must be tailored to the specific biological question being addressed.

      Reviewer #1 (Recommendations for the authors):

      A few specific technical notes and notes about clarity:

      (1) It may be worth being more explicit about how to produce replicates. For example, producing technical replicates by inoculating multiple times from the same set of combinations, while biological replicates require making the combinations multiple times.

      We have updated the main text to clarify this point (lines 780-781).

      (2) Figure 2C: May be worth adding some context to these performance numbers. What are typical accuracies? What would they be in a liquid handler?

      Assessing typical accuracies is nuanced since the error depends not only on the assembly steps, but also on potential intrinsic variation of the specific community function being tested and the method used to quantify it. One of the main reasons for including the experiment using colorant combinations was precisely to minimize these other sources of variation. In this experiment, we find that the error we quantify is consistent with cumulative pipetting variation (as a reference, a typical lab micropipette has an error of 0.5-1%). This is now explicitly mentioned in the manuscript.

      (3) Figure 5A: I realize it is unlikely that strains go extinct in these experiments. But it is still worth clarifying that the number of strains is the number inoculated, rather than the one present at the time of measurement.

      We updated the caption of Figure 5A as recommended by the reviewer.

      (4) Figure 5B: I realize this is just for illustration purposes, but you should provide more information about the magnitude of the difference in performance of these combinations and the confidence in their ranking (or variability in performance across replicates).

      Following this suggestion, we have added a paragraph where we report the variation across replicates for the highest-performing consortia (lines 318-323). Indeed, while variation across replicates is small, it is enough to produce an overlap between the confidence intervals of the function of some of the highest-performing consortia. This is now explicitly acknowledged in the manuscript.

      (5) Figure 5C: I believe the bold black lines indicate the combinations shown in panel D, but that is not explicitly stated.

      We have updated the caption of Figure 5C.

      Reviewer #2 (Public review):

      A simple and effective method for combinatorial assembly of microbes in synthetic communities of <12 species.

      Overall, this manuscript is a useful contribution. The efficiency of the method and the clarity of the presentation are strengths. It is well-written and easy to follow. The figures are great, and the pedagogical narrative is crisp. I can imagine the method being used in lots of other contexts too.

      The authors could better clarify what HOIs mean. They could address challenges with assaying community function. However, neither of these “weaknesses” affects the primary goal of the paper which is methodological.

      We thank the reviewer for the positive assessment. With respect to HOIs, we recognize that defining and quantifying them is a non-trivial subject within the broader field of microbial ecology (see e.g. ref. 24 within the manuscript). Since our aim with this manuscript is methodological, as the reviewer notes, here we have done our best to avoid introducing new or ambiguous definitions. For this reason, we simply adopt a definition given in previous works (including refs. 10, 19, 24, 29, 37, and 38 in the manuscript), where the context-dependence of pairwise interaction terms is taken as a signature of HOIs. With respect to the challenges in assaying community function, please see our responses below.

      Reviewer #2 (Recommendations for the authors):

      Overall, this manuscript is a useful contribution, I appreciate the authors taking the time to write it up! I have a few relatively minor comments.

      (1) It would be nice in the introduction to address why we might want the full factorial construction of communities in the first place. This is an especially relevant question in light of the authors' 2023 Nat E&E paper where they showed that the function of communities can often be learned even when only a fraction of all possible communities is measured. This is addressed in part in the paragraph on line 34, but I think it might be worth expanding a bit given the focus of the paper.

      We sincerely appreciate the reviewer’s feedback. In fact, one of the reasons that make full factorial construction desirable is precisely to test theoretical and computational models of community function, including (but not only) the statistical models developed in our 2023 Nature E&E paper. In that work, we showed that low-order models can explain a substantial fraction of the variation in community function in previously-published datasets, but we also predict that the same models could fail under complex structures of microbial interactions (e.g., strong high-order interactions). The protocol we present here enables the empirical quantification of such interactions, making this prediction (and others) directly testable. We have included that clarification in the revised manuscript (lines 56-58).

      (2) Around line 74, I think it is worth mentioning that even this elegant design will face insurmountable practical challenges (time, liquid handling operations, number of plates will explode) for full factorial design with 20, 30, 40 species or more. This is relevant for some very complex synthetic consortia that some microbiome groups are constructing (e.g. hCom2 from Huang/Fischbach groups) https://www.sciencedirect.com/science/article/pii/S0092867422009904.

      We agree with the reviewer that full factorial designs become impractical for very large species pools. These limits are now more clearly mentioned in the revised manuscript. We refer the reviewer to our response to comment #1 by Reviewer 1 for further details.

      (3) The binary construction is a really nice clean way to explain the protocol. Appreciate the pedagogy!

      We thank the reviewer for the appreciation.

      (4) In the experiment with Pseudomonas strains the consortia are grown in LB. This medium will support growth to relatively high OD (>1). At these densities, OD is almost certainly not linear in cell density, and this nonlinearity likely depends on strain identity. In this case, the assumption of additivity may not hold. As a result, some of the observed "interactions" may simply reflect non-linearity in the assay and not the abundance of bacteria in the communities. Of course, this does not affect the assembly protocol in any way, but it does complicate the interpretation of interactions via this assay. I think this is worth pointing out since other researchers may have to think carefully about the assay they use when constructing these synthetic consortia. I think in this methods paper it is important to emphasize this so other researchers do not mistakenly identify interactions due to issues with the assay.

      We thank the reviewer for pointing out this important aspect. In our experiment, we use Abs<sub>600</sub> simply as an example of a measurable community-level function. The reviewer is absolutely correct in that mapping absorbance to biomass is nuanced at large OD values, where this relationship becomes non-linear. While this is not an issue from the perspective of the protocol itself, it is indeed an important consideration for users who may want to obtain reliable quantifications of biomass. We have updated the manuscript to explicitly mention this potential issue (lines 307-313). We have also emphasized the fact that our focus on Abs<sub>600</sub> is strictly for illustrative purposes, and we have removed all instances where a direct mapping from Abs<sub>600</sub> to biomass was implied in the text.

      (5) Subtle point regarding HOIs. HOI (or pairwise) statistical interactions need not quantitatively be the same as interactions in a Lotka-Volterra sense. I realize the authors do not explicitly use the term "interaction" in a gLV model formalism but this is how the majority of readers will interpret this term. I believe it is a research question as to how pairwise gLV interactions manifest themselves in terms of functional interactions. For example, a purely pairwise LV model could easily have HOI "functional interactions" if the function is total abundance since abundances depend nonlinearly on LV interactions. I think this part of the manuscript could be confusing to readers for this reason. I think the term "functional interaction" really helps with this issue, but just asking the authors to make sure this is clear.

      I say this because ref: 37 is focused on HOIs in an LV sense. Here, as the authors are aware, they are computing statistical "interactions" in the sense of epistasis. Given that they are computing this epistasis averaged across all community compositions a more appropriate citation might be [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004771] where the same quantity is computed in a protein context.

      We thank the reviewer for pointing out this important issue. Indeed, we use the term “interaction” in a statistical sense (as the deviation of the observed community function from a null, additive expectation) rather than in a Lotka-Volterra sense. We agree that the reference suggested by the reviewer is more appropriate in this context. We have updated the reference list accordingly.
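
      The statistical sense of “interaction” described above can be sketched with a small example. The code below assumes the usual epistasis-style definition, in which the interaction between two species in a given background is the deviation of the measured function from the additive expectation; the species names and function values are hypothetical and chosen only to show the arithmetic.

```python
def pairwise_interaction(function, i, j, background=frozenset()):
    """Deviation from additivity for species i and j in a given background.

    `function` maps a frozenset of species to the measured community function.
    """
    b = frozenset(background)
    return function[b | {i, j}] - function[b | {i}] - function[b | {j}] + function[b]


# Hypothetical function values (e.g. an absorbance readout) for three species.
F = {
    frozenset(): 0.00,
    frozenset("A"): 0.30,
    frozenset("B"): 0.20,
    frozenset("C"): 0.10,
    frozenset("AB"): 0.60,
    frozenset("AC"): 0.40,
    frozenset("BC"): 0.30,
    frozenset("ABC"): 0.55,
}

eps_alone = pairwise_interaction(F, "A", "B")               # background: no other species
eps_with_c = pairwise_interaction(F, "A", "B", {"C"})       # background: species C present
print(round(eps_alone, 3), round(eps_with_c, 3))
```

      In this toy dataset the A-B interaction is positive when the pair is alone and negative when C is present. That change with background is the kind of context dependence taken as a signature of higher-order interactions.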

      (6) Figure 5G - a little hard to see. Any way to show this data more clearly? It looks like all interactions have a mean of 0 because of the way the data are presented.

      The reviewer is indeed correct in that, as defined, the interactions that we quantify are background-dependent, and their average across backgrounds lies near zero for all species. More than an issue with the representation, we think that this is an important empirical observation: it indicates that the same species pair may interact positively or negatively depending on its ecological context. We believe that the current representation is most appropriate for making this clear, but we would be open to discussing alternatives if the reviewer had a specific suggestion in mind.

      Reviewer #3 (Public review):

      The authors developed a useful methodology for generating all combinations of multiple reagents using standard lab equipment. This methodology has clear uses for studying microbial ecology as they demonstrated. The methodology will likely be useful for other types of experiments that require exhaustive testing of all possible combinations of a given set of reagents (e.g., drug-drug antagonism and synergy).

      The authors provided a useful R script that generates a detailed experimental protocol for building the desired combination from any number of reagents. The produced document is useful and has clear instructions. The output of the computer script will be strengthened if graphical output is also provided (similar to the one provided in Figure 1C).

      The authors show that the error rate of the method doesn't go up with the number of combinations using dyes (Figure 2).

      The authors demonstrate the value of their methodology for studying interactions within microbial consortia by assembling all possible combinations of eight strains of Pseudomonas aeruginosa. The value of their methodology for this application is well-founded. However, it is also unclear why specific experimental choices were made for this application. It is unclear why authors continue to show the absorbance measurements of strain assemblies over the entire wavelength spectrum and not just for ABS 600 nm (Figures 3 and 4). It is also unclear why the authors provided information on the "sum of the three spectra" as this reference line is meaningless and not a reasonable null model for estimating how well specific strain combinations will grow together.

      Figure 5 illustrates the various analysis types that can be performed on the data collected from growing combinations of eight Pseudomonas aeruginosa strains. It is a very informative figure since it provides a "roadmap" on the various ways in which the dataset produced can be explored. The information in Figures 5 and S6 will likely be very useful for a wide audience.

      Reviewer #3 (Recommendations for the authors):

      (1) Congratulations. I think the manuscript lays out a simple and very elegant methodology that will be useful for many. While I think the method is overall well explained and rationalized, the paper can greatly benefit from further expansion of Figure 5 at the expense of Figures 3 and 4.

      We thank the reviewer for their thoughtful assessment of our work. We have considered the recommendations and discuss the following points in response.

      (2) Unless I am missing something, there is no reason to present data collected across the entire wavelength spectrum for microbial assemblies (Figures 3 and 4). Moreover, using the same color palette for bacterial strains (Figure 3A) and colorants (Figure 2) is highly confusing. I suggest considering using only the 600 nm wavelength for any data collected from microbial assemblies and using a very different color palette for bacteria and colorants to avoid misinterpretation of the data.

      We thank the reviewer for this suggestion. Our goal with Figures 3-4 was to illustrate the convenience of the protocol and the ease with which many measurements can be performed in parallel once the combinatorial assembly has been completed. While we focus on Abs<sub>600</sub> for all subsequent analyses, we chose to display the full spectra in Figs. 3-4 in hopes that future studies can make use of our rich dataset to interrogate questions on microbial interactions, with the option to focus on other wavelengths (which can effectively be treated as different community-level functions in their own right; for instance, we have previously used Abs<sub>405</sub> as a proxy for siderophore concentration). We think there is value in Figs. 3-4 in their current form to make this clear to readers.

      (3) Unlike dye absorbance, bacterial carrying capacity has an upper limit, so summing individual population absorbance as a reference line seems unjustified. If the summation of absorbance is meant to provide a "null model" for expected growth, a more suitable model should be considered (e.g., max spectra or a weighted sum of the spectra from individual members).

      We agree with the reviewer that our null model is not biologically constrained, and we did not intend to imply that the additive expectation was derived from biological principles. Instead, this additive expectation should be interpreted as a simple statistical baseline with minimal assumptions. The use of an additive baseline for quantifying microbial interactions has been addressed in the literature (see, e.g., references 10, 19, 24, 29, 37, and 38), and so here we chose to conform to this convention to avoid introducing new, non-standard quantifications of pairwise and higher-order interactions. We have revised the text to make this more explicit.

      (4) The R script is a valuable tool. I think that a valuable improvement will be to also generate visual representations as part of the script’s output such as the colored plates in Figure 1C that are specific to the generated protocol.

      We have updated the script so that it now also outputs a table specifying the location of each consortium within the plates. We chose to make this a text, rather than a graphics output, to ensure cross-device compatibility.

      (5) The discussion rightly acknowledges the potential to extend the protocol to larger libraries using liquid handlers. To facilitate this implementation, it might be beneficial to modify the script output so that the ‘volume’, ‘plate’, and ‘column’ values are tab- or comma-delimited.

      We thank the reviewer for the suggestion. We have modified the output so that it is now tab-delimited.
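
      A tab-delimited layout of this kind is straightforward for a liquid handler or a downstream script to parse. The sketch below is not the authors' R script; it is a Python illustration in which the three column names follow the reviewer's comment, while the rows, the volume unit (microliters), and the function name are assumptions.

```python
import csv
import io

# Hypothetical pipetting steps; volumes are assumed to be in microliters.
steps = [
    {"volume": 20, "plate": 1, "column": 1},
    {"volume": 20, "plate": 1, "column": 2},
    {"volume": 40, "plate": 2, "column": 1},
]


def write_steps_tsv(rows, handle):
    """Write pipetting steps as tab-delimited text with a header line."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["volume", "plate", "column"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_steps_tsv(steps, buffer)
print(buffer.getvalue())

# Reading the text back shows that the format round-trips cleanly.
parsed = list(csv.DictReader(io.StringIO(buffer.getvalue()), delimiter="\t"))
```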

      (6) Figures 3 and 4 do not provide a lot of insight. I would suggest combining them into a single figure and using only absorbance values at 600 nm. It would also be interesting to add a histogram of these absorbance values and possibly show histograms for subgroups (e.g. all assemblies with more than 3 strains vs all assemblies with 3 or fewer strains).

      With respect to Figs. 3 and 4, we refer the reviewer to our response to comment #2. With respect to the histogram/subgroups plot, we understand that this would be a slightly modified version of the current Fig. 5A, where we show means and standard deviations across all subgroups of 1 to 8 species, and so we find it unclear what this figure would add.

      (7) With the recommendations of removing or reworking Figures 3 and 4, and the fact that Figure 5 is data-rich (and extremely useful), it would be beneficial to split Figure 5 and include the data shown in Figure S6 in the main figure. The analysis in Figure S6 is valuable and it might be beneficial to elevate this analysis to a primary figure and provide a detailed explanation of its rationale and methods in the main text.

      We appreciate this suggestion. In our view, both the text and the figures benefit from a heavy focus on the assembly protocol, as this is the main contribution of this work. While we do think it is valuable to highlight the type and amount of data that can be collected with a full factorial assembly, as well as the types of analyses that can be performed with this data, we are afraid that allocating more space to these analyses may distract readers from the methodology itself. We have therefore chosen to keep the original structure for Figs. 5 and S6.

    1. Author response:

      Reviewer 1:

      We thank the reviewer for bringing a critical theoretical distinction to our attention. We agree that the Temporal Generalization (TG) results specifically rule out the reinstatement of post-onset neural codes, that is, the idea that the brain pre-activates the same neural representation evoked by the stimulus. In fact, we mention in the discussion: "This temporal variability underscores the need for a more nuanced view of what constitutes predictive pre-activation, as no stable representational state appears to persist after word presentation that could serve as its target."

      To our understanding, prediction is rarely explicitly defined in the literature, and the distinction between predictive pre-activation and other forms of prediction is seldom made. Moreover, the idea of compressed or abstract forms of pre-activated representations has not, to our knowledge, been explicitly articulated in the literature. Our TG findings therefore put meaningful constraints on theories of prediction. In the revision, we will expand on this and include a broader description of potential forms of pre-activation. We will emphasize that the TG results specifically rule out that the brain pre-activates the same neural code used for sensory-evoked processing.

      Moreover, although TG analysis does not rule out alternative notions of predictive pre-activation, we believe our second analysis (the inclusion of future word embeddings) provides independent evidence that argues against more abstract forms of prediction. Unlike the TG analysis, this encoding approach is not constrained to a specific neural code; if the brain represented upcoming words in any linearizable format (abstract, probabilistic, or latent) incorporating those embeddings should have improved the brain score at the current word's onset. We found no such improvement until the word was actually heard. In the revised manuscript, we will reformulate the narrative to clarify that while TG alone rejects a specific form of pre-activation, the combined evidence from both analyses suggests there is a broader lack of predictive pre-activation.

      Reviewer 2:

      We thank the reviewer for their constructive feedback and for bringing to our attention the missing information in our Methods section. We realized that the final two sections were inadvertently omitted during formatting changes before submission. These will be restored in the revised version.

      We appreciate the reviewer's careful reading of this analysis and agree that the concern about whether the decorrelation in Figure 4 forces the model to unlearn the associations between pre- and post-onset activity is a valid one. To clarify, this is not what we intended to claim. Rather, our argument follows a different logic: if we assume that pre-onset encoding is purely a signature of predictive pre-activation, then decorrelating the pre- and post-onset brain responses should effectively remove that signature. The fact that pre-onset encoding remains largely intact after this procedure suggests that our initial premise was false; the observed pre-onset encoding is likely not a signature of pre-activation. We would also like to note that in this analysis, we use both residualized neural data and decorrelated embeddings. Therefore, the majority of stimulus dependencies are removed. Nevertheless, as the reviewer notes, some dependencies, such as bi-grams and other word co-occurrences, inevitably remain. These dependencies might explain the remaining pre-onset encoding we observed. This aligns with the main message of our paper. In the revision, we will provide a detailed description of the decorrelation process and make this interpretive logic more explicit in the main text.
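
      The logic of removing a shared linear component can be illustrated with a generic residualization step. The sketch below is not the decorrelation procedure used in the manuscript, which is not described in this response. It shows one common approach (ordinary least squares with an intercept and a single regressor), and the signal values and function names are hypothetical.

```python
from statistics import mean


def residualize(y, x):
    """Remove from y the part linearly predictable from x (OLS with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical pre- and post-onset responses for six words (arbitrary units).
post_onset = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90]
pre_onset = [0.20, 0.50, 0.30, 0.70, 0.75, 0.80]

pre_residual = residualize(pre_onset, post_onset)
r_before = pearson(pre_onset, post_onset)
r_after = pearson(pre_residual, post_onset)
print(round(r_before, 3), round(abs(r_after), 3))
```

      After this step the residual pre-onset signal is linearly uncorrelated with the post-onset signal, so any encoding performance that survives cannot be attributed to a linear copy of the post-onset response.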

      Reviewer 3:

      We are grateful for the reviewer’s detailed comments and for raising several points that will significantly improve the clarity and comparability of our study. Specifically, the reviewer’s feedback helped us realize that our evidence for postdiction required further clarification. While the encoding of the immediately preceding word ($d-1$) may involve recognition lags, we observe that word $d-2$ further improves the brain score even after the current word's onset, beyond what is explained by word $d-1$ alone. This may extend beyond simple recognition delays. To address this, we will visualize this effect further in the upcoming version and expand the manuscript to include alternative explanations for this observation, such as extended lexical processing or integration delays.

      To ensure our results are not biased toward high-frequency or function words, we will re-run our analyses including multi-token words. Given that these words constitute a small part of the datasets, we expect our core findings to remain stable.

      In line with our response to reviewer 2, we will more clearly emphasize that despite our extensive controls, we cannot be sure that we accounted for all regularities inherent to natural speech.

      Additionally, we will increase the context windows of the LLM to match the larger windows used in previous literature and add significance tests, error bars, and noise floor indications to our figures to ensure the reliability and variability of our findings are clearly communicated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. describes the development of an optimized soluble ACE2-Fc fusion protein, B5-D3, for intranasal prophylaxis against SARS-CoV-2. As shown, B5-D3 conferred protection not only by acting as a neutralizing decoy, but also by redirecting virus-decoy complexes to phagocytic cells for lysosomal degradation. The authors showed complete in vivo protection in K18-hACE2 mice and investigated the underlying mechanism by a combination of Fc-mutant controls, transcriptomics, biodistribution studies, and in vitro assays.

      Strengths:

      The major strength of this work is the identification of a novel antiviral approach with broad-spectrum and beyond simple neutralization. Mutant ACE2 enables broad and potent binding activity with the S proteins of SARS-CoV-2 variants, while the fused Fc part mediates phagocytosis to clear the viral particles. The conceptual advance of this ACE2-Fc combination is convincingly validated by in vivo protection data and by the completely abrogated protection of Fc LALA mutant.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      Some aspects could be further modified.

      (1) A previously reported ACE2 decamer (DOI: 10.1080/22221751.2023.2275598) needs to be mentioned and compared in the Discussion part.

      We thank the reviewer for pointing out this weakness.

      Indeed, previous studies reported that the ACE2-IgM decamer, taking advantage of the decameric structure of IgM, exhibited higher avidity to spikes and greater potency for viral neutralization [1-3]. In particular, the study by Guo et al. has demonstrated a broad-spectrum neutralization ability of the ACE2-IgM decamer against multiple SARS-CoV-2 variants and reported the efficacy of intranasal prophylaxis in preventing lethal SARS-CoV-2 challenge in K18-hACE2 mice.

      We agree with the reviewer that our B5-D3 design might benefit from switching to the IgM isotype. However, the distinct biological features imposed by IgM Fc, including short serum half-life and restricted tissue penetration [4], may complicate the study design and divert our focus.

      In our current study, we focused on the IgG1 Fc-based decoy design, while inactivating the enzyme activity of ACE2 to avoid disturbing the renin-angiotensin system. This design allowed us to compare diverse administration routes and regimens and to gain useful insights into the potential of the sACE2-Fc decoy in combating SARS-CoV-2 in vivo.

      We appreciate the reviewer's insightful suggestion. In the revised manuscript, we have included additional discussion regarding the ACE2-IgM decamer, addressing the relevant concern on page 17 lines 409–414.

      (2) Limitations of this study, such as off-target binding and potential immunogenicity, should also be discussed.

      We thank the reviewer for his insightful comments and agree that off-target activity is a major concern for designing the ACE2 decoy.

      (1) In our study, the representative sACE2-Fc decoy candidate B5-D3 contains the H374N mutation (D3), which is designed to inactivate ACE2 enzyme activity by disrupting Zn2+ coordination. Our in vitro enzymatic activity assay demonstrated that the H374N mutation (D3), as well as three other single mutations (D1, D4 and D5), in either WT sACE2-Fc or the B5 mutant, could effectively abolish hACE2 enzyme activity (Supplementary Fig. 2e, h).

      (2) To further address the concern about off-target activity, we performed AAV-based overexpression experiments in K18-hACE2 mice and examined serum levels of RAS hormones, using ELISA methods that specifically detect serum renin, Angiotensin II (Ang II), and Ang (1-7). While our data from WT sACE2-Fc overexpression revealed significantly elevated serum renin and Ang II, indicating a disruption of the RAS (Supplementary Fig. 4d, e), the results from the examined double mutants, including B5-D3, showed negligible change in any of these metabolite levels, demonstrating no off-target effect and minimal disturbance to RAS activity in K18-hACE2 mice (Supplementary Fig. 4d–f).

      (3) Moreover, in this experiment, after prolonged overexpression of all these molecules in K18-hACE2 mice, histological examination of multiple organs showed no evidence of immune cell infiltration or tissue damage, and no difference was observed between the mice receiving WT sACE2-Fc or B5-D3 (Supplementary Fig. 4g).

      In the revised manuscript, we have included the results from the AAV-delivered in vivo overexpression of WT sACE2-Fc and three most promising double mutants (B5-D3, B5-D4 and B5-D5) on page 5 lines 118–122 and on page 6 lines 123–135 in the main text. The relevant data were presented in the new Supplementary Fig. 4.

      Reviewer #2 (Public review):

      Summary:

      Wang et al. engineered an optimized ACE2 mutant by introducing two mutations (T92Q and H374N) and fused this ACE2 mutant to human IgG1-Fc (B5-D3). Experimental results suggest that B5-D3 exhibits broad-spectrum neutralization capacity and confers effective protection upon intranasal administration in SARS-CoV-2-infected K18-hACE2 mice. Transcriptomic analysis suggests that B5-D3 induces early immune activation in lung tissues of infected mice. Fluorescence-based biodistribution assay further indicates rapid accumulation of B5-D3 in the respiratory tract, particularly in airway macrophages. Further investigation shows that B5-D3 promotes viral phagocytic clearance by macrophages via an Fc-mediated effector function, namely antibody-dependent cellular phagocytosis (ADCP), while simultaneously blocking ACE2-mediated viral infection in epithelial cells. These results provide insights into improving decoy treatments against SARS-CoV-2 and other potential respiratory viruses.

      Strengths:

      The protective effect of this ACE2-Fc fusion protein against SARS-CoV-2 infection has been evaluated in a quite comprehensive way.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      (1) The paper lacks an explanation regarding the reason for the combination of mutations listed in Supplementary Figure 2b. For example, for the mutations that enhance spike protein binding, B2-B6 does not fully align with the mutations listed in Table S1 of Reference 4, yet no specific criteria are provided.

      We thank the reviewer for pointing out this oversight.

      We constructed the B2-B6 mutants based on the study by Chan et al. [5] (Reference 4 in the previous version), mainly referring to their Fig. 1A rather than to their Table S1. In Chan’s study, each of the proposed mutations was discovered as a single mutation in monomeric sACE2 molecules based on enrichment in target cell binding. T92 was a notable hot spot for enriched mutations in their Fig. 1A.

      Since monomeric and dimeric forms of sACE2 showed dramatically different kinetics for the ACE2-RBD interaction, we selected five proposed mutations and further examined their affinity and activity in dimeric sACE2-Fc in our study. We chose not only the combinations of mutations, such as B3, B4, and B6 proposed in their Table S1, but also explored less complicated mutation(s) like B2 (T27Y/L79T) and B5 (T92Q) in their Fig. 1A, which were predicted in silico to enhance ACE2-RBD binding but were not tested in sACE2-Fc in Chan’s study.

      Interestingly, although our results confirmed enhanced viral neutralization by all these mutations, the activity increase compared to WT ACE2-Fc was rather limited. Hence, we chose not to explore other mutations but to focus on B2–B6 to construct an enhanced ACE2-Fc decoy as a representative, to investigate the potential of ACE2-Fc decoys in combating SARS-CoV-2 infections.

      In the revised manuscript, we have further amended the writing on page 4 lines 84–87 to enhance the readability. For conciseness, however, we did not describe in great detail how we selected the mutations to be tested.

      Second, for the mutations that abolished enzymatic activity, while D1 and D2, D3, D4, and D5 are cited from References 12, 11, and 33, respectively, the reason for combining D3 and D4 into A2, and D1 and D2 into A3 remains unexplained. It is also unclear whether some of these other possible combinations have been tested. Furthermore, for the B5-derived mutations, only double-mutant combinations with D1-D5 are tested, with no attempt made to evaluate triple mutations involving A2 or A3.

      We thank the reviewer for pointing out this oversight.

      A2 and A3 mutations were originally proposed as double mutations [6,7]. A2 (H374N/H378N) was first reported by Guy et al. [6] (Reference 11 in the previous version), while A3 (R273G/T445G) was originally proposed in Payandeh et al.’s study [7] (Reference 33 in the previous version).

      In this study, we further split the two mutations in A2 and A3 to generate the single enzyme-deactivating mutations: D1 and D2 from A3, and D3 and D4 from A2. Among these single mutations, D2 failed to inactivate ACE2 enzymatic activity (Supplementary Fig. 2e), and it was excluded from subsequent analyses.

      D5 (H345L) was a single mutation directly adopted from the report by Glasgow et al. [8] (Reference 12 in the previous version).

      After combining B5 with the enzyme-deactivating mutations (A2, A3, D1, D3, D4, D5), our neutralization assay results showed that the simpler compound mutants with only two mutations, like B5-D1, B5-D3, B5-D4 and B5-D5, exhibited stronger neutralization capacity than B5-A2 and B5-A3 with triple mutations. Moreover, since fewer mutations reduce the risk of altering protein structure and evoking host immunity, we then focused on the sACE2-Fc double mutants B5-D3, B5-D4 and B5-D5 in the subsequent neutralization and overexpression assays (Supplementary Fig. 3 and 4), and examined B5-D3 as a representative candidate in the in vivo infection tests and follow-up analysis (Figure 2–6, and Supplementary Figures 5–18).

      We agree that the lack of explanation for splitting A2 and A3 into D1 to D4 single mutations made the rationale unclear. In the revised manuscript, we have included our previous test results on B5-A2 and B5-A3, cited Lei et al.’s study using A2 in ACE2 decoy [9], and explained the rationale for splitting A2 and A3 into D1 to D4 mutations. Relevant revision was made on page 4 lines 94–97 in the main text, while the design and data for B5-A2 and B5-A3 were included in the revised Figure 1b and Supplementary Figure 2b, f–h.

      (2) Figures 1b, 1d, and 1e lack statistical analyses, making it difficult to determine whether B5 and D3 exhibit significant advantages. For Wuhan-Hu-1 strain, B2 and B5 are similar, and for D614G strain, B2, B3, B4, B5, and B6 display comparable results. However, only the glycosylation-related single mutant B5 is chosen for further combinatorial constructs. Moreover, for VOC/VOI strains, B5 is superior to B5-D3; for the Alpha strain, B5-D4 and B5-D5 are superior to B5-D3; and for the Delta and Lambda strains, B5-D5 is superior to B5-D3. These observations further highlight the need for a clearer explanation of the selection strategy.

      We agree with the reviewer’s insightful observations.

      Indeed, although our results confirmed enhanced viral neutralization by these reported mutations, the activity increases compared to WT ACE2-Fc were generally limited. Importantly, these observations were largely consistent with other reports (including the study by Chan et al. [5]), suggesting limited potential of mutagenesis in enhancing the ACE2-RBD/Spike interaction. Therefore, we chose to selectively examine B2-B6 to construct an enhanced ACE2-Fc decoy with reasonable performance, as a representative candidate to study the application potential of ACE2-Fc decoy.

      The IC<sub>50</sub> values in Figures 1b, 1d, and 1e were calculated from neutralization curves that measured infection reduction at multiple concentrations in duplicate, and were therefore presented with statistical support. Across the multiple neutralization assays, B5-D3 consistently performed well among the top performers (Figure 1, Supplementary Fig. 2f,g, and Supplementary Fig. 3).
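
To make concrete how an IC50 can be read off a neutralization curve measured in duplicate, here is a minimal sketch. It swaps in log-linear interpolation across the 50% crossing, whereas such values are usually obtained by nonlinear curve fitting. The response does not state which fitting method was used, so this is an illustration only. The function names, concentrations, and readings are hypothetical and are not data from the study.

```python
import math

def mean_of_duplicates(rep1, rep2):
    """Average two replicate readings concentration by concentration."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]

def ic50_by_interpolation(concentrations, percent_infection):
    """Estimate IC50 by log-linear interpolation across the 50% crossing.

    concentrations: ascending decoy concentrations (all > 0, any unit)
    percent_infection: infection relative to untreated control (100 = none blocked)
    Returns None if the curve never crosses 50%.
    """
    points = list(zip(concentrations, percent_infection))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo >= 50.0 >= y_hi and y_lo != y_hi:
            # Fraction of the way from c_lo to c_hi (on a log10 scale)
            frac = (y_lo - 50.0) / (y_lo - y_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical duplicate readings (percent infection) at four concentrations.
concs = [0.01, 0.1, 1.0, 10.0]
rep1 = [92.0, 72.0, 28.0, 9.0]
rep2 = [88.0, 68.0, 32.0, 11.0]
ic50 = ic50_by_interpolation(concs, mean_of_duplicates(rep1, rep2))
```

Interpolation only uses the two points that bracket 50%, so it is more sensitive to noise in those wells than a full curve fit would be. It is shown here because it needs no external libraries.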

      We agree that B2 and B5 performed comparably well in neutralization assays, but B2 contains two mutations (T27Y/T92Q) while B5 carries a single mutation (T92Q). Hence, we decided to focus on B5 because of its lower mutational burden and lower potential risk.

      We agree that for VOC/VOI strains, B5 was superior to B5-D3 in pseudovirus-neutralization assays. However, B5-D3 was enzymatically inactive, which is essential for generating a safe ACE2 decoy and therefore justifies our use of B5-D3 over B5.

      We agree with the reviewer that, altogether, B5-D3 did not show significant advantages over other top performers like B5-D4 and B5-D5. Here, B5-D3 was selected as a representative that performed equally well, rather than as the most outstanding candidate, for subsequent examination of efficacy, safety, and mechanistic insights.

      We thank the reviewer for his valuable feedback. In the revised manuscript, we have further amended our description of B5-D3, as a “representative” candidate, to improve the readability. Relevant changes can be found on page 4 line 84, page 5 line 109, page 14 line 333 and page 15 line 360.

      (3) Figure 1e does not specify the construct form of the control hIgG1, namely whether it is an hIgG1 Fc fragment or a full-length hIgG1 protein. If the full-length form is used, the design of its Fab region should be clarified to ensure the accuracy and comparability of the experimental control.

      We thank the reviewer for pointing out this oversight.

      In this study, we used the in vivo grade recombinant human IgG1 isotype control antibody in its full length (Syd Labs, #PA007125) as the negative control. It is the 4F17 clone, which is widely used and showed low or no specific binding to any human samples [10] (Human IgG1 Isotype Control Antibody | Recombinant, in vivo Grade - Syd Labs). We have added the relevant information in the MATERIALS AND METHODS on page 23 lines 548–549.

      (4) In Figure 2a, all three PBS control mice died, whereas in Figure 2f, three out of five PBS control mice died, with the remaining showing gradual weight recovery. This discrepancy may reflect individual immune variations within the control groups, and it is necessary to clarify whether potential autoimmune factors could have affected the comparability of the results. Also, the mouse experiments suffer from insufficient sample sizes, which affects the statistical power and reliability of the results. In Figure 2a, each group contains only 4 replicates, one of which was used for lung tissue sampling. As a result, body weight monitoring data is derived from only 3 mice per group (the figure legend indicating n=4 should be corrected to n=3). Such a small sample size limits the robustness of the conclusions. Similarly, in Figure 2f, although each group has 5 replicates, body weight data are presented for only 4 mice, with no explanation provided for the exclusion of the fifth mouse. Furthermore, the lung tissue experiments in Figure 3a include only 3 replicates, which is also inadequate.

      We thank the reviewer for his valuable feedback.

      Figure 2a was the first in vivo infection experiment of this study, and we performed the test in aged female K18-hACE2 mice at 10–12 months old. For the subsequent experiments in Figure 2f and Figure 3, we changed to young female K18-hACE2 mice at 2–3 months old because of the limited supply of old mice. While in Figure 2a the four aged mice (not three) in the PBS control group all died within 7 dpi, the results of Figure 2f and Figure 3 consistently showed heterogeneous responses among young mice in the PBS control groups. Since increased susceptibility to SARS-CoV-2 infection has been broadly observed among aged human populations and is also supported by a mouse study [11], we attribute the observed discrepancy to the age difference between the two cohorts in Figure 2a and 2f. In the revised manuscript, we have further elucidated this observation in the results (on page 7 lines 163–167) and included a new reference for better clarification (page 7 line 167).

      Furthermore, the PBS control mice in both Figure 2a and 2f died within 7 dpi, which is too soon for autoimmune factors to take effect. Moreover, we performed AAV-based prolonged overexpression experiments in K18-hACE2 mice (new Supplementary Fig. 4), which showed no tissue damage in either WT sACE2-Fc or B5-D3 treated mice, suggesting low immunogenicity. Collectively, autoimmune factors are unlikely to be the reason for the different survival between PBS controls in Figure 2a and 2f.

      We thank the reviewer for pointing out the weakness regarding small sample sizes in our study.

      (1) In Figure 2a–c, the experiment was performed in an aged cohort at 10–12 months old, starting with 5 mice in each virus-inoculated group and 4 mice in the mock control group. At 4 dpi, we sacrificed one mouse from each group for tissue analysis. Therefore, in the survival analysis, there were 4 mice in each virus-inoculated group and 3 mice in the mock control group, whose survival and body weight changes were presented in Figure 2b, c.

      Despite the relatively small sample sizes in Figure 2b, c, all 4 PBS control mice died, while all 4 mice in the 6-hour B5-D3 IN prophylaxis group survived, demonstrating 100% survival and no sign of body weight loss. The survival and body weight data were highly consistent, strongly supporting that B5-D3 intranasal prophylaxis could protect the mice from lethal SARS-CoV-2 infection.

      To enhance clarity, in the revised manuscript, we have added the sample size information in chart legends in Figure 2a–c.

      (2) In Figure 2f–h, the experiment was performed in a young cohort at 2–3 months old, and the body weight and survival data were presented for 5 mice in each group (not 4). Notably, although 2 out of 5 young mice in the PBS control group eventually survived the viral infection, they suffered significant weight loss during 4–7 dpi, similar to the mice that died. In contrast, all 5 mice in the -6 h B5-D3 IN prophylaxis group showed no sign of weight loss. Hence, these data were highly consistent with Figure 2b, c, supporting the efficacy of B5-D3 IN prophylaxis in protecting against SARS-CoV-2 infection.

      We noticed that some data points in Figure 2g, h were very close to each other, making it difficult to distinguish the data line for individual mice. To enhance clarity, in the revised manuscript, we have added sample-size information in chart legends in Figure 2g and 2h.

      (3) In Figure 3a, we aimed to examine the lung tissues at early time points. For each treatment, 3 mice were sacrificed at each selected time point. Hence, a total of 9 mice were examined in each of the PBS control and B5-D3 IN groups, yielding results at 1 dpi, 2 dpi and 4 dpi that consistently supported each other. Moreover, the viral titer and the S and N expression analyses all showed significant differences among groups. Therefore, the differences between treatment groups in our experiments are sufficient to support the conclusion.
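
As a worked illustration of how much evidence the endpoint survival counts from Figure 2b carry on their own (4 of 4 PBS controls died; 4 of 4 mice in the 6-hour B5-D3 IN group survived), the sketch below computes a two-sided Fisher's exact test on that 2x2 table. This is not an analysis reported by the authors. The choice of test and the function name are assumptions made for illustration, and a time-to-event method such as a log-rank test would normally be preferred for survival curves.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: PBS control, B5-D3 IN (-6 h). Columns: survived, died.
p_value = fisher_exact_two_sided(0, 4, 4, 0)  # 2/70, about 0.029
```

With only four animals per arm, a complete split is the most extreme outcome possible, and it is the only outcome that reaches the conventional 0.05 threshold under this test. This is why the reviewer's concern about statistical power and the authors' emphasis on consistency across experiments are both reasonable.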

      (5) Compared to 6 hours, intranasal administration of B5-D3 at 24 hours before viral infection results in reduced protective efficacy. However, only survival and body weight data are provided, with no supporting evidence from virological assays such as viral titer measurement. Therefore, the long-term effectiveness lacks sufficient experimental validation.

      In Figure 2f–h, we aimed to compare the efficacies of IN administration of B5-D3 at different time points, mainly focusing on the body weight change and survival data over the infection and recovery period. As indicated by early data in Figure 2d, viruses were largely cleared by 4 dpi in mice treated with B5-D3 prophylaxis. Therefore, in this test, we did not examine virus titers in the recovered animals at the end of observation at 14 dpi. Instead, we examined plasma levels of virus-neutralizing antibodies in the survivors at the endpoint, which indeed supported that the 6-hour and 24-hour IN B5-D3 prophylaxis provided effective protection against SARS-CoV-2 infection and resulted in minimal levels of neutralizing antibodies in plasma, as shown in Figure 2i.

      Collectively, the body weight, survival, and antibody data all supported that 6-hour IN B5-D3 prophylaxis achieved the best efficacy. Hence, we performed comprehensive viral titer and profiling analysis at early time points (1 dpi, 2 dpi, and 4 dpi), focusing only on the 6-hour IN B5-D3 prophylaxis. This work also included the B5-D3-LALA control to examine viral titers, host immune responses, and underlying mechanisms (Figure 3,4).

      We agree with the reviewer that it would be more comprehensive if our experiments could include in-depth analysis of the 24-hour IN B5-D3 prophylaxis group. However, due to the limited capacity of the animal service, we chose to focus on the best-performing group as a representative treatment to study the underlying mechanisms.

      (6) In Figures 3b and 3c, viral spike (S) and nucleocapsid (N) RNA relative expression levels are quantified by qPCR. The results show significant individual variation within the B5-D3-LALA treatment group: one mouse exhibits high S and N expression, while the other two show low expression. Viral load levels are also inconsistent: two mice have high viral loads, and one has a low viral load. Due to this variability, the available data are insufficient to robustly support the conclusion.

      We understand the reviewer’s concern on the variability within the B5-D3-LALA group. However, we have some reservations about the importance of further increasing the sample sizes in this test.

      First, since viral gene transcription and viral particle levels represent different phases of the viral life cycle, they may follow different kinetics during infection progression and lead to variability. Second, we used different parts of the lung tissue from each mouse for extracting RNA and preparing tissue homogenates, which were then used for detection of S/N expression and viral load levels, respectively. Uneven viral infection in the lung might also contribute to the variability. Furthermore, in this test, both our qPCR and viral load data consistently demonstrated that B5-D3-LALA was less effective than B5-D3, indicating that Fc function played an important role in supporting full protection by B5-D3 against lethal SARS-CoV-2 infections. This observation is also supported by other studies [12].

      We appreciate the valuable feedback from the reviewer. In the revised manuscript, we have further clarified these observations on page 8, lines 192–194, and included alveolar thickening data on page 9, lines 202–204.

      (7) Figure 3e: "H&E staining indicated alveolar thickening in all groups," including the Mock group. Since the Mock group did not receive virus or active drug treatment, this observed change may result from local tissue reaction induced by the intranasal inoculation procedure itself, rather than specific immune activation. A control group (no manipulation) should be set to rule out potential confounding effects of the experimental procedure on tissue morphology, thereby allowing a more accurate assessment of the drug's effects.

      We thank the reviewer for his insightful comments and suggestions.

      We have further examined our H&E staining and quantified alveolar thickening in different treatment groups. Indeed, the data suggested a transient alveolar thickening in the mock group at 1 dpi, which had improved by 2 dpi. This observation supports that the intranasal procedure itself caused a transient alveolar thickening, which was evident at 1 dpi but disappeared at 2 dpi.

      Notably, moderate alveolar thickening persisted in the B5-D3-treated mice until the end point at 4 dpi. In contrast, the PBS groups with intensive SARS-CoV-2 infection progressively developed severe structural damage and showed much stronger alveolar thickening than the B5-D3 or mock groups at 4 dpi. Consistent with the partial protection by B5-D3-LALA, histological analysis of lung samples in this group revealed more severe yet heterogeneous alveolar thickening. These observations suggested that -6h IN B5-D3 treatment prevented the tissue damage caused by infection, with minimal yet efficient immune activation.

      In the revised manuscript, we have included the quantitation results of alveolar thickening on page 9, lines 200–204 and presented the data in new Supplementary Fig. 7.

      (8) In Supplementary Figure 11b, a considerable number of alveolar macrophages (AMs) are observed in both the PBS and B5-D3 groups. This makes it difficult to determine whether the observed accumulation is specifically induced by B5-D3.

      We thank the reviewer for pointing out this issue.

      In this experiment, the cell populations examined in previous Supplementary Fig. 11b and Fig. 5h are different, though graphs appear similar.

      Supplementary Fig. 11b (new Supplementary Fig. 12b) showed the analysis among CD45+ immune cells, regardless of B5-D3-AF750 signal. The dominance of AMs among immune cell populations is a normal physiological feature of BALF cells. To make this clear, we have added new data of BALF cells from untreated mice in the revised manuscript and new Supplementary Fig. 12b.

      Fig. 5h displayed the cell type analysis among the CD45+ B5-D3-AF750+ cells, that is, only the CD45+ immune cells that took up the AF750-labeled B5-D3.

      To enhance clarity, in the revised manuscript, we have amended the labels as CD45+ B5-D3-AF750+ in Figure 5h (and similarly in revised Supplementary Fig. 13), to differentiate the data from that in CD45+ cells shown in the revised Supplementary Fig. 12b.

      (9) In the flow cytometry experiment shown in Figure 5, the PBS control group is not labeled with AF750, which necessarily results in a value of zero for "B5-D3+ cells" on the y-axis. An appropriate control (e.g., hIgG1-Fc labeled with AF750) should be included.

      We thank the reviewer for his valuable question.

      In this experiment, we intended to analyze all immune cells with positive AF750 signals, to identify the major immune cell types that took up AF750-B5-D3 as the candidate cells responsible for the observed activation of innate immunity. Hence, here we deliberately set PBS vehicle treatment without AF750 signal as the control group for gating.

      This analysis aimed to provide an overall picture of the immune cell types that actively take up the ACE2 decoy, likely via Fc receptor-mediated binding. Control IgG1 labeled with AF750, which has an Fc region, may show a similar profile and biodistribution among BALF immune cells, and was therefore not examined as a control for gating.

      Instead, in the revised manuscript, we have added new analysis results comparing the efficiencies of B5-D3 and IgG1 in mediating pseudovirus uptake in THP-1-derived macrophages. IgG1 isotype control was examined to address ACE2-specific effect. Indeed, we observed no pseudovirus uptake based on p24 signal, in the IgG1 treated samples, indicating that the presence of B5-D3 is crucial for efficient pseudovirus uptake in macrophages due to the sACE2-spike affinity. These results have been added on page 13 lines 310–316 in the main text, and the relevant data was presented in new Supplementary Fig. 17.

      (10) The Methods section: a more detailed description of the experimental procedures involving HIV p24 and SARS-CoV-2 should be included.

      We thank the reviewer for pointing out this weakness.

      In the revised manuscript, we have provided further details of the relevant experimental procedures in the Materials and Methods part, on page 21, lines 507–517.

      Reviewer #3 (Public review):

      Strengths:

      The core strength of this study lies in its innovative demonstration that an engineered sACE2-Fc fusion redirects virus-decoy complexes to Fc-mediated phagocytosis and lysosomal clearance in macrophages, revealing a distinct antiviral mechanism beyond traditional neutralization. Its complete prophylactic protection in animal models and precise targeting of airway phagocytes establish a novel therapeutic paradigm against SARS-CoV-2 variants and future respiratory viruses.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      The study attributes complete antiviral protection to Fc-mediated phagocytic clearance, a central claim that requires more rigorous experimental validation. The observation that abrogating Fc functions compromises protection could be confounded by potential alterations in the protein's stability, half-life, or overall structure. To firmly establish this mechanism, it is crucial to include a control molecule with a mutated Fc region that lacks FcγR binding while preserving the Fc structure itself. Without this critical control, the conclusion that phagocytic clearance is the primary mechanism remains inadequately supported.

      We thank the reviewer for his insightful comments and suggestions.

      The L234A/L235A mutations in human IgG1 Fc region are most widely used to abolish its FcγR binding and Fc effector functions [13]. In this study, we have used B5-D3-LALA in the in vivo infection experiments in K18-hACE2 mice, as the control molecule that lacks FcγR binding while preserving the Fc structure (Figure 3, 4).

      To address the reviewer’s concern, we further performed a new analysis comparing the efficiencies of different versions of B5-D3 in mediating pseudovirus uptake in THP-1-derived macrophages. In this test, B5-D3-LALA and B5-D3 were examined side-by-side to address the role of Fc effector functions in the phagocytosis process. Meanwhile, an IgG1 isotype control was examined to address the ACE2-specific effect. Indeed, we detected a significant reduction of pseudovirus uptake, based on p24 signal, in the B5-D3-LALA treated samples compared to those receiving B5-D3. This decreased pseudoviral uptake correlated with the loss of Fc-mediated effector functions in B5-D3-LALA, indicating the involvement of Fc functions in efficient macrophage uptake of the B5-D3-virus complex.

      In the revised manuscript, we have included these results on page 13 lines 310–316 in the main text and presented relevant data in Supplementary Fig. 17.

      The strategy of deliberately targeting virus-decoy complexes to phagocytes via Fc receptors inherently raises the question of Antibody-Dependent Enhancement (ADE) of disease. While the authors demonstrate a lack of productive infection in macrophages, this only addresses one facet of ADE. The risk of Fc-mediated exacerbation of inflammation (ADE) remains a critical concern. The manuscript would be significantly strengthened by a direct discussion of this risk and by including data, such as cytokine profiling from treated macrophages, to more comprehensively address the safety profile of this approach.

      (1) We thank the reviewer for his insightful comments and suggestions regarding the ADE issue.

      Indeed, Antibody-Dependent Enhancement (ADE) of viral infection is a critical concern when developing the ACE2 decoy strategy. In this study, we have carefully examined the relevant risk based on our data from various in vitro and in vivo assays.

      In our in vivo infection experiments, all B5-D3 prophylaxis and treatment groups, regardless of administration time and route, showed improved outcomes, such as less body-weight loss and better survival, compared with the PBS control groups (Figure 2). None of these treatment groups showed worsened infection, indicating that ADE either did not occur or did not play a major role during B5-D3 treatment. Instead, moderate immune activation was observed in the lungs of B5-D3-treated mice; it occurred much earlier but was milder than in the PBS groups, and may reflect responses that lead to efficient early clearance of virus without observable symptoms (Figures 3 and 4).

      In our in vitro assays shown in Figure 6, B5-D3 treatment in epithelial or non-immune cell models (hACE2-Calu-3 and hACE2-293T) significantly blocked pseudovirus entry into cells and yielded much lower luciferase signals (Figure 6d–g). In THP-1-derived macrophages, by contrast, the presence of B5-D3 greatly enhanced the entry of SARS-CoV-2 pseudovirus into cells (Figure 6a,b) but did not result in active infection and produced no luciferase signal (Figure 6g). These results were robustly reproducible, indicating that the pseudoviruses did not successfully release their genomic RNA and viral proteins (such as RTase and integrase) after entering macrophages. Instead, colocalization analysis of the p24 (pseudovirus), sACE2-Fc (B5-D3), and LAMP1 (lysosome) signals suggested that the pseudoviruses were probably degraded in endosomes/lysosomes after cell entry (Figure 6a,c). Consistently, examination of macrophages that had taken up pseudovirus showed that the Spike (S) proteins of the pseudovirus particles were not cleaved to release the smaller S2’ fragment (Figure 6h). Because cleavage of the S protein in host cells is critical for effective membrane fusion, it is regarded as a hallmark of successful viral entry and escape from the endosome. Collectively, these data indicate that the SARS-CoV-2 pseudoviruses were degraded directly in lysosomes after entering macrophages, with no sign of ADE.

      (2) We thank the reviewer for his valuable suggestion and have performed RNA-seq analysis to profile immune responses in the treated macrophages.

      We performed RNA-seq analysis to investigate major transcriptional changes in THP-1-derived macrophages after pseudovirus infection, with or without B5-D3 treatment. Although no individual gene met the cutoff for significant up- or down-regulation, GSEA revealed antiviral responses in the pseudovirus-B5-D3-treated samples (new Supplementary Fig. 18). This observation indicates that B5-D3 treatment and the subsequent entry of pseudovirus-B5-D3 complexes into macrophages induced moderate immune activation without evoking strong immune responses that could be harmful to the host.

      In the revised manuscript, we have included the new RNA-seq analysis results on macrophage infection tests on page 13 lines 317–322 and page 14 lines 323–325 in the main text and presented the relevant data in the new Supplementary Fig. 18. Furthermore, we agree that ADE is a critical issue and have further enriched our discussion on page 17 lines 415–417, to emphasize that the risk for ADE should be thoroughly evaluated to further develop the decoy strategy for human use.

      The exclusive use of the K18-hACE2 mouse model, which exhibits severe disease, limits the generalizability of the findings. The "complete protection" observed may not translate to models with more robust and naturalistic immune responses or to human physiology.

      We thank the reviewer for pointing out the limitation of the mouse model used.

      (1) Given that wild type mice are not susceptible to SARS and SARS-CoV-2 infection, transgenic mice have been generated to express hACE2, through various designs and strategies, serving as models for viral infection and drug development. However, many of these hACE2 transgenic mouse models exhibit mild infections due to moderate hACE2 levels, failing to develop the severity observed in SARS and COVID patients [14].

      (2) The K18-hACE2 transgenic mouse line (B6.Cg-Tg(K18-ACE2)2Prlmn/J, Jackson Laboratory) used in our study carries multiple copies of the K18-hACE2 transgene cassette [15]. Compared with other hACE2 transgenic mouse models, this K18-hACE2 line shows higher expression of hACE2 in airway and other epithelia and supports more severe infection by both SARS and SARS-CoV-2 viruses, resulting in lethality [16]. Hence, K18-hACE2 mice are a widely used model for studying SARS and SARS-CoV-2 infection and for drug development.

      (3) We agree that K18-hACE2 mice are a relatively weak transgenic line with poor productivity. However, the line shows the best susceptibility to SARS-CoV-2 infection among established mouse models. In this study, we observed robust responses to SARS-CoV-2 infection in both aged and young cohorts, with all infected mice consistently showing significant body-weight loss from 4 dpi to 7 dpi (the PBS groups in Figure 2b,g).

      We agree with the reviewer that it would be more convincing to assess the efficacy of B5-D3 in additional animal models. However, we have some reservations about the importance of these additional tests. First, the generality of the ACE2-Fc decoy concept and its efficacy have been reported in other studies using various models [17,18]. Moreover, different transgenic mice and other animal models exhibit distinct kinetics of pathogenesis and immune responses to SARS-CoV-2 infection, which differ from those in human patients in various respects. Hence, given the limited capacity of our animal facility, we chose to focus on K18-hACE2 mice, which have yielded the most robust and convincing infection data, to investigate the potential of B5-D3 administered through various strategies as well as the mechanisms underlying the full protection observed with IN prophylaxis.

      In the revised manuscript, we have further enriched our discussion regarding this limitation, on page 17 lines 417–422.

      Furthermore, the lack of data on circulating SARS-CoV-2 variants is a concern.

      We thank the reviewer for his valuable comment.

      In this study, we have demonstrated the viral neutralization capacity of B5-D3, as a representative of the enhanced sACE2 decoy, using multiple pseudoviruses and authentic SARS-CoV-2, which collectively covered eleven variants (up to Omicron strains). Our results from both in vitro neutralization and PRNT experiments confirmed the robust resilience of B5-D3 against viral evolution (Figure 1c–g). This observation aligns well with other studies and is broadly supported by various investigations, as was pointed out below by the reviewer.

      Furthermore, studies on viral evolution have observed a robust trend in which later-emerging SARS-CoV-2 variants exhibit higher affinity for the ACE2 receptor, enhancing their infectivity and transmissibility [19]. Therefore, a newly emerged SARS-CoV-2 variant is unlikely to escape B5-D3-mediated neutralization.

      Collectively, the evidence consistently supports the principle of the decoy design: B5-D3 (or other effective ACE2 decoys) has the intrinsic ability to neutralize newly circulating SARS-CoV-2 variants, as long as those variants rely on the ACE2 receptor for cell entry. Hence, although further tests on circulating viral variants would strengthen our study, the significance of these additional data may be limited.

      In the revised manuscript, we have further addressed this concern in the discussion, on page 16 lines 394–397.

      The concept of sACE2-Fc fusion proteins as decoy receptors is not novel, and numerous similar constructs have been previously reported. The manuscript would benefit from a clearer demonstration of how the optimized B5-D3 mutant represents a significant advance over existing sACE2-Fc designs.

      We thank the reviewer for his valuable comments.

      Indeed, previous research has reported multiple ACE2 mutations to enhance its binding to spike proteins and neutralization against SARS-CoV-2. However, combining ACE2 mutations based on in silico predictions to both enhance spike binding and eliminate the ACE2 enzymatic activity resulted in accumulated burdens. For instance, ACE2 decoy candidates with up to five mutations like K31F/N33D/H34S/E35Q/H345L [8] and L79F/M82Y/Q325Y/H374A/H378A [12] have demonstrated excellent potency to neutralize SARS-CoV-2 in both in vitro and in vivo assays. However, the extensive mutations could be associated with structural instability and reduced production efficiency [8,12]. Furthermore, the high mutation loads increase risks for immunogenicity, which is a critical issue in future clinical applications. Corroboratively, Urano et al. detected in vitro T cell stimulation elicited by the L79F mutation, whereas the T92Q mutation (included in our decoy design) showed much lower immunogenicity and enhanced spike binding affinity [20].

      In our ACE2 decoy design, we incorporated only two mutations (such as T92Q and H374N in B5-D3) to enhance neutralization potency while eliminating enzymatic activity, yielding the simplest ACE2 mutants suitable for engineering an enhanced decoy. B5-D3, as one representative, exhibited not only minimal mutation-related risks (Supplementary Fig. 2i) but also top-level neutralization potency among all candidate mutants tested (Figure 1, Supplementary Fig. 2f,g and Supplementary Fig. 3). To further address the safety of B5-D3 for in vivo use, we performed prolonged in vivo overexpression of the B5-D3 ACE2 decoy through AAV delivery in immune-competent K18-hACE2 mice, which showed no sign of RAS disturbance or of immune infiltration causing tissue damage. (In the revised manuscript, we have included these new results on page 5 lines 118–122 and page 6 lines 123–135 in the main text and presented the data in the new Supplementary Fig. 4.)

      Therefore, rather than demonstrating an advantage over existing sACE2-Fc designs, our study used the optimized B5-D3 as a representative top-performing ACE2 decoy to systematically examine various administration strategies as well as the mechanisms underlying the full protection observed with IN prophylaxis. In line with this effort, our study identified 6-hour IN prophylaxis as the most effective regimen, conferring complete protection against SARS-CoV-2 infection in K18-hACE2 mice. Further investigation through transcriptomics, biodistribution, and phagocytosis analyses revealed that IN-delivered B5-D3 not only neutralizes viruses but also engages airway phagocytes to promote early viral clearance and host immune activation, uncovering a distinct antiviral mechanism for the universal “decoy strategy” to combat unknown airborne respiratory viruses in the future.

      In the revised manuscript, we have further clarified our focus on using B5-D3 as a “representative” of ACE2 decoy on page 4 line 84, page 5 line 109, page 14 line 333, and page 15 line 360.

      A direct comparative analysis with previously published benchmarks, particularly in terms of neutralizing potency, Fc effector function strength, and in vivo efficacy, is necessary to establish the incremental value and novelty of this specific agent.

      We thank the reviewer for his valuable comments.

      Indeed, our study aimed to address this concern and made partial progress through in vitro neutralization assays (Figure 1b and Supplementary Fig. 2c,d,f,g). Our limited yet meaningful comparisons with sACE2 lacking the Fc domain and with selected sACE2-Fc mutants previously published or proposed clearly demonstrated “substantial enhancement through Fc-fusion” (Supplementary Fig. 1d) and “modest improvement from protein mutagenesis at the ACE2-Spike interaction interface” (Figure 1b and Supplementary Fig. 2c,d,f,g).

      Based on the results of our various neutralization assays, we chose B5-D3 as a representative enhanced decoy for the in vivo infection experiments, which identified 6-hour IN prophylaxis as conferring complete protection against infection and demonstrated the significant impact of administration strategy on the in vivo efficacy of B5-D3 (Figure 2). Subsequent analysis further uncovered intriguing phenomena regarding the cellular distribution of IN-administered B5-D3 and the early immune activation triggered in the lung, which underlie the full protection by IN prophylaxis and represent an important novelty of this study.

      We agree with the reviewer that further analysis with additional benchmark versions would enhance the value of this study, but we have reservations about its importance. To enhance clarity, in the revised manuscript we have further emphasized our focus on using B5-D3 as a representative ACE2 decoy throughout the text and enriched the discussion on page 15 lines 348–365.

      References

      (1) Ku Z, Xie X, Hinton PR, Liu X, Ye X, Muruato AE, Ng DC, Biswas S, Zou J, Liu Y, Pandya D, Menachery VD, Rahman S, Cao Y-A, Deng H, Xiong W, Carlin KB, Liu J, Su H, Haanes EJ, Keyt BA, Zhang N, Carroll SF, Shi P-Y & An Z. Nasal delivery of an IgM offers broad protection from SARS-CoV-2 variants. Nature 595, 718-723 (2021).

      (2) Liu J, Mao F, Chen J, Lu S, Qi Y, Sun Y, Fang L, Yeung ML, Liu C, Yu G, Li G, Liu X, Yao Y, Huang P, Hao D, Liu Z, Ding Y, Liu H, Yang F, Chen P, Sa R, Sheng Y, Tian X, Peng R, Li X, Luo J, Cheng Y, Zheng Y, Lin Y, Song R, Jin R, Huang B, Choe H, Farzan M, Yuen KY, Tan W, Peng X, Sui J & Li W. An IgM-like inhalable ACE2 fusion protein broadly neutralizes SARS-CoV-2 variants. Nat Commun 14, 5191 (2023).

      (3) Guo H, Cho B, Hinton PR, He S, Yu Y, Ramesh AK, Sivaccumar JP, Ku Z, Campo K, Holland S, Sachdeva S, Mensch C, Dawod M, Whitaker A, Eisenhauer P, Falcone A, Honce R, Botten JW, Carroll SF, Keyt BA, Womack AW, Strohl WR, Xu K, Zhang N, An Z, Ha S, Shiver JW & Fu T-M. An ACE2 decamer viral trap as a durable intervention solution for current and future SARS-CoV. Emerging Microbes & Infections 12, 2275598 (2023).

      (4) Keyt BA, Baliga R, Sinclair AM, Carroll SF & Peterson MS. Structure, Function, and Therapeutic Use of IgM Antibodies. Antibodies 9, 53 (2020).

      (5) Chan KK, Dorosky D, Sharma P, Abbasi SA, Dye JM, Kranz DM, Herbert AS & Procko E. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science 369, 1261-1265 (2020).

      (6) Guy JL, Jackson RM, Jensen HA, Hooper NM & Turner AJ. Identification of critical active-site residues in angiotensin-converting enzyme-2 (ACE2) by site-directed mutagenesis. The FEBS Journal 272, 3512-3520 (2005).

      (7) Payandeh Z, Rahbar MR, Jahangiri A, Hashemi ZS, Zakeri A, Jafarisani M, Rasaee MJ & Khalili S. Design of an engineered ACE2 as a novel therapeutics against COVID-19. Journal of Theoretical Biology 505, 110425 (2020).

      (8) Glasgow A, Glasgow J, Limonta D, Solomon P, Lui I, Zhang Y, Nix MA, Rettko NJ, Zha S, Yamin R, Kao K, Rosenberg OS, Ravetch JV, Wiita AP, Leung KK, Lim SA, Zhou XX, Hobman TC, Kortemme T & Wells JA. Engineered ACE2 receptor traps potently neutralize SARS-CoV-2. Proceedings of the National Academy of Sciences 117, 28046-28055 (2020).

      (9) Lei C, Qian K, Li T, Zhang S, Fu W, Ding M & Hu S. Neutralization of SARS-CoV-2 spike pseudotyped virus by recombinant ACE2-Ig. Nature Communications 11, 2070 (2020).

      (10) Maciuba S, Bowden GD, Stratton HJ, Wisniewski K, Schteingart CD, Almagro JC, Valadon P, Lowitz J, Glaser SM, Lee G, Dolatyari M, Navratilova E, Porreca F & Riviere PJM. Discovery and characterization of prolactin neutralizing monoclonal antibodies for the treatment of female-prevalent pain disorders. MAbs 15, 2254676 (2023).

      (11) Dwivedi V, Shivanna V, Gautam S, Delgado J, Hicks A, Argonza M, Meredith R, Turner J, Martinez-Sobrido L, Torrelles JB & Kulkarni V. Age associated susceptibility to SARS-CoV-2 infection in the K18-hACE2 transgenic mouse model. Geroscience 46, 2901-2913 (2024).

      (12) Chen Y, Sun L, Ullah I, Beaudoin-Bussières G, Anand SP, Hederman AP, Tolbert WD, Sherburn R, Nguyen DN, Marchitto L, Ding S, Wu D, Luo Y, Gottumukkala S, Moran S, Kumar P, Piszczek G, Mothes W, Ackerman ME, Finzi A, Uchil PD, Gonzalez FJ & Pazgier M. Engineered ACE2-Fc counters murine lethal SARS-CoV-2 infection through direct neutralization and Fc-effector activities. Science Advances 8, eabn4188 (2022).

      (13) Lund J, Winter G, Jones PT, Pound JD, Tanaka T, Walker MR, Artymiuk PJ, Arata Y, Burton DR, Jefferis R & Woof JM. Human Fc gamma RI and Fc gamma RII interact with distinct but overlapping sites on human IgG. The Journal of Immunology 147, 2657-2662 (1991).

      (14) Lutz C, Maher L, Lee C & Kang W. COVID-19 preclinical models: human angiotensin-converting enzyme 2 transgenic mice. Hum Genomics 14, 20 (2020).

      (15) McCray PB, Pewe L, Wohlford-Lenane C, Hickey M, Manzel L, Shi L, Netland J, Jia HP, Halabi C, Sigmund CD, Meyerholz DK, Kirby P, Look DC & Perlman S. Lethal Infection of K18-hACE2 Mice Infected with Severe Acute Respiratory Syndrome Coronavirus. Journal of Virology 81, 813-821 (2007).

      (16) Oladunni FS, Park JG, Pino PA, Gonzalez O, Akhter A, Allue-Guardia A, Olmo-Fontanez A, Gautam S, Garcia-Vilanova A, Ye C, Chiem K, Headley C, Dwivedi V, Parodi LM, Alfson KJ, Staples HM, Schami A, Garcia JI, Whigham A, Platt RN, 2nd, Gazi M, Martinez J, Chuba C, Earley S, Rodriguez OH, Mdaki SD, Kavelish KN, Escalona R, Hallam CRA, Christie C, Patterson JL, Anderson TJC, Carrion R, Jr., Dick EJ, Jr., Hall-Ursone S, Schlesinger LS, Alvarez X, Kaushal D, Giavedoni LD, Turner J, Martinez-Sobrido L & Torrelles JB. Lethality of SARS-CoV-2 infection in K18 human angiotensin-converting enzyme 2 transgenic mice. Nat Commun 11, 6122 (2020).

      (17) Urano E, Itoh Y, Suzuki T, Sasaki T, Kishikawa JI, Akamatsu K, Higuchi Y, Sakai Y, Okamura T, Mitoma S, Sugihara F, Takada A, Kimura M, Nakao S, Hirose M, Sasaki T, Koketsu R, Tsuji S, Yanagida S, Shioda T, Hara E, Matoba S, Matsuura Y, Kanda Y, Arase H, Okada M, Takagi J, Kato T, Hoshino A, Yasutomi Y, Saito A & Okamoto T. An inhaled ACE2 decoy confers protection against SARS-CoV-2 infection in preclinical models. Sci Transl Med 15, eadi2623 (2023).

      (18) Higuchi Y, Suzuki T, Arimori T, Ikemura N, Mihara E, Kirita Y, Ohgitani E, Mazda O, Motooka D, Nakamura S, Sakai Y, Itoh Y, Sugihara F, Matsuura Y, Matoba S, Okamoto T, Takagi J & Hoshino A. Engineered ACE2 receptor therapy overcomes mutational escape of SARS-CoV-2. Nature Communications 12, 3802 (2021).

      (19) Cho MJ, Been NR & Son H. From Alpha to Omicron: Structural Insights into SARS-CoV-2 RBD Evolution and ACE2 Binding. European Journal of Public Health 35 (2025).

      (20) Urano E, Itoh Y, Suzuki T, Sasaki T, Kishikawa J-i, Akamatsu K, Higuchi Y, Sakai Y, Okamura T, Mitoma S, Sugihara F, Takada A, Kimura M, Nakao S, Hirose M, Sasaki T, Koketsu R, Tsuji S, Yanagida S, Shioda T, Hara E, Matoba S, Matsuura Y, Kanda Y, Arase H, Okada M, Takagi J, Kato T, Hoshino A, Yasutomi Y, Saito A & Okamoto T. An inhaled ACE2 decoy confers protection against SARS-CoV-2 infection in preclinical models. Science Translational Medicine 15, eadi2623 (2023).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Completeness and clarity of Methods (Weakness #1).

      We will substantially expand the Methods section to include:

      (a) Detailed information on C. difficile strain ribotype 1382 (correcting the typographical error "1482"), including its virulence characteristics, toxin production dynamics, and rationale for its selection.

      (b) Step-by-step protocols for on-chip bacterial quantification by flow cytometry, including sample collection volume, processing, and the specific normalization procedure (with clarification that normalized values are intended for within-experiment comparisons only).

      (c) Full description of mouse experiments: antibiotic pre-treatment regimen, inoculation details (spores vs. vegetative cells, justification of the 1×10^9 CFU dose), animal numbers, housing conditions, and cage-effect considerations. The IACUC approval statement will be moved from Acknowledgments to Methods.

      (2) Mucin layer characterization under anoxia (Weakness #2a).

      We will clarify in the Methods that mucin staining was performed after the initial oxic culture phase to confirm differentiation prior to anaerobic challenge. We will cite relevant literature discussing the stability of pre-formed mucin layers under short-term anoxic conditions and incorporate this discussion to contextualize our experimental design in the revised Methods.

      (3) Discrepancy in C. difficile counts and mechanism of LXA4 action (Weakness #2b, #3).

      We will provide a detailed explanation of our flow cytometry normalization algorithm, emphasizing that values are only comparable within a given experimental batch. We plan to perform additional in vitro experiments to directly assess the effect of LXA4 on bacterial growth and toxin secretion. These data will help distinguish between direct antibacterial effects and host-mediated protection, and the revised Discussion will incorporate this analysis.

      (4) Missing controls and experimental timelines (Weakness #2c–d).

      We will clarify that Figure 4 presents gut-on-chip experiments, not animal studies. The corresponding methods will be fully described. Additionally, we will include cross-experiment alignment analyses (using the CDI group as a common reference) to integrate negative control data from separate experimental batches. We also plan to generate additional data examining the effect of LXA4 alone (without infection) on epithelial barrier integrity and inflammatory status, which will be included as supplementary controls.
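
      As a rough illustration of what aligning batches on a shared reference group can look like, the Python sketch below rescales every measurement in a batch by that batch's mean CDI value, so that groups measured in different batches are expressed relative to the same reference. This is an assumption made for illustration only: the response does not specify the alignment procedure, and the function name, group labels, and numbers are all hypothetical.

```python
from statistics import mean


def align_to_reference(batches, reference_group="CDI"):
    """Rescale each batch so that its reference-group mean equals 1.0.

    `batches` maps a batch name to a dict of {group name: list of readouts}.
    Ratio scaling is an assumed choice, not the authors' stated method.
    """
    aligned = {}
    for batch_name, groups in batches.items():
        ref_mean = mean(groups[reference_group])
        if ref_mean == 0:
            raise ValueError(f"reference mean is zero in batch {batch_name!r}")
        aligned[batch_name] = {
            group: [value / ref_mean for value in values]
            for group, values in groups.items()
        }
    return aligned


# Hypothetical readouts (arbitrary units) from two separate batches.
batches = {
    "batch_1": {"CDI": [100.0, 120.0], "negative_control": [11.0]},
    "batch_2": {"CDI": [40.0, 60.0], "CDI_plus_LXA4": [25.0]},
}
aligned = align_to_reference(batches)

# After alignment, groups from different batches share a common scale.
for batch_name, groups in aligned.items():
    for group, values in groups.items():
        print(batch_name, group, [round(v, 3) for v in values])
```

      Ratio scaling is only one possible choice; it assumes that batch effects are multiplicative, which would need to be checked against the actual data.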

      (5) C. difficile strain characterization (Weakness #1g).

      A comprehensive section on ribotype 1382 will be added to the Methods, detailing its in vitro growth kinetics, toxin production profiles, and disease dynamics in the murine model, with appropriate literature citations.

      (6) Dysbiosis definition and phrasing adjustments (Other comments #b–d).

      We will revise the text to provide a clear definition of dysbiosis in the context of CDI. We will also temper the phrasing in line 82 to more accurately describe the advantages of our GOC system relative to other in vitro models, and correct the description of C. difficile as an obligate anaerobe.

      Reviewer #2 (Public review):

      (1) Synergy between LXA4 and vancomycin in vivo.

      We agree that the synergistic effect observed in the GOC model requires validation in an animal model. We are currently conducting mouse experiments to test the combination of prophylactic LXA4 with vancomycin treatment. The results will be included as a new Figure 5 in the revised manuscript.

      We are confident that these planned revisions will fully address the reviewers' concerns and significantly enhance the rigor and impact of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper presents a reanalysis of a large existing dataset to examine whether serial dependence effects (systematic influences of recent stimulus history on current perceptual judgments) are associated with generalization in perceptual learning. The central hypothesis is that extended, longer-range history effects (beyond the most recent trials) are beneficial for transfer across locations. The authors reanalyze data from a texture discrimination task in which observers discriminated peripheral target orientation against a line background, with performance quantified by stimulus-onset asynchrony thresholds. Three training conditions were compared: a fixed single-location condition, a two-location alternating condition, and a dummy-trial condition with frequent target-absent trials. Transfer was assessed after training at new locations. Serial dependence was quantified using history-sequence analyses and linear mixed-effects models estimating bias weights across stimulus lags, with summary measures distinguishing recent (1-3 trials back) and more distant (4-6 trials back) dependencies.
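
      For readers unfamiliar with lag-based history measures, the Python sketch below shows one simple way to express a bias weight per lag and to average the weights into "recent" (1-3 back) and "distant" (4-6 back) summaries. It is a conditional-proportion stand-in and not the linear mixed-effects model summarized above; the binary coding of stimuli and responses, the function names, and the example sequence are assumptions introduced for illustration.

```python
def lag_bias(stimuli, responses, lag):
    """P(response == 1 | stimulus `lag` trials back == 1) minus the same given 0.

    Stimuli and responses are coded 0/1. A positive value means responses
    are pulled toward the stimulus shown `lag` trials earlier.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # past stimulus -> [n trials, n responses of 1]
    for t in range(lag, len(responses)):
        past = stimuli[t - lag]
        counts[past][0] += 1
        counts[past][1] += responses[t]
    if counts[0][0] == 0 or counts[1][0] == 0:
        raise ValueError("both past-stimulus values must occur at this lag")
    return counts[1][1] / counts[1][0] - counts[0][1] / counts[0][0]


def summarize(stimuli, responses):
    """Return per-lag weights plus the recent (1-3) and distant (4-6) means."""
    weights = {lag: lag_bias(stimuli, responses, lag) for lag in range(1, 7)}
    recent = sum(weights[k] for k in (1, 2, 3)) / 3
    distant = sum(weights[k] for k in (4, 5, 6)) / 3
    return weights, recent, distant


# Hypothetical extreme observer who always reports the previous stimulus.
stimuli = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
responses = [0] + stimuli[:-1]

weights, recent, distant = summarize(stimuli, responses)
print(weights[1], recent, distant)
```

      With real data, the weights for lags other than the one driving behavior also pick up chance correlations in the stimulus sequence, which is one reason a regression over all lags at once (as in a mixed-effects model) is generally preferable to lag-by-lag proportions.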

      The authors report extended serial dependence effects, persisting up to 6-10 trials back, with substantial cumulative bias that remains stable across multiple days of training and is not correlated with overall performance thresholds. Recent history effects are stronger for faster responses, suggesting a contribution from decision- or response-related processes, whereas more distant effects decline within sessions, potentially reflecting adaptation dynamics. Critically, longer-range serial dependence is significantly stronger in training conditions that promote generalization than in the single-location condition. Individual differences in the strength and decay profile of distant history effects predict the magnitude of transfer across locations, whereas recent history effects do not. History effects are also correlated across trained locations, suggesting stable individual differences.

      The authors interpret longer-range serial dependence as reflecting integrative processes that extract task-relevant structure over time, thereby supporting generalization, while shorter-range effects are attributed to more transient mechanisms such as priming or decision-level bias. The discussion connects these findings to Bayesian accounts of perceptual stability and to concepts of overfitting in machine learning.

      The study offers a novel and thoughtful link between short-term serial dependence and long-term generalization in perceptual learning, helping bridge two literatures that are often treated separately. The large dataset enables robust estimation of individual differences, and the use of mixed-effects modeling appropriately accounts for variability across observers. The empirical distinction between recent and more distant history effects is well-supported and adds important nuance to interpretations of serial dependence. Converging evidence from both group-level comparisons and individual-level correlations strengthens the central conclusions.

      Several limitations should be addressed. First, the study relies entirely on previously collected data, without experimental manipulations designed to selectively isolate serial dependence mechanisms. Filtering choices, while theoretically motivated, may amplify history effects in ways that are difficult to quantify. Second, sequential dependencies can arise from multiple sources, including gradual updating of internal weight structures, adaptation processes, and history-dependent biases in decision-making. The current analyses do not clearly separate these contributions, limiting mechanistic attribution of long-range effects. Third, the conclusions are based on a single perceptual task, leaving open questions about generality across paradigms. Finally, while the discussion references computational ideas, no explicit modeling is provided to test whether plausible learning rules can jointly account for the observed history profiles and transfer effects.

      We now address these issues in the manuscript (see below for detailed responses) and provide a toy model (supplementary material) where the observed effects are explained by simple learning mechanisms.

      The findings align with theoretical frameworks that conceptualize perceptual learning as gradual reweighting of stable sensory representations at the decision stage (e.g., Petrov et al., 2005). Trial-by-trial updates in these models naturally give rise to sequential dependencies and sensitivity to training statistics. The observation that longer-range history effects predict generalization is consistent with broader temporal integration supporting more flexible learning, while narrower integration may lead to specificity. The results also indicate that multiple mechanisms, including decision-level biases and adaptation, may coexist with reweighting processes, highlighting the value of hybrid accounts.
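
      The point that trial-by-trial updating produces sequential dependencies can be made concrete with a deliberately minimal Python sketch. It is neither the reweighting model of Petrov et al. (2005) nor the authors' toy model; it assumes a single scalar criterion updated by a delta rule, with hypothetical names and values, and shows that the criterion is then an exponentially weighted sum of past trials, so a smaller learning rate spreads influence over a longer history.

```python
def run_delta_rule(feedback, learning_rate):
    """Update a scalar criterion trial by trial: c <- c + lr * (x - c)."""
    criterion = 0.0
    for value in feedback:
        criterion += learning_rate * (value - criterion)
    return criterion


def lag_weights(learning_rate, max_lag):
    """Weight carried by the trial `k` steps back: lr * (1 - lr) ** (k - 1)."""
    return [learning_rate * (1 - learning_rate) ** (k - 1)
            for k in range(1, max_lag + 1)]


# Hypothetical signed trial outcomes, oldest first.
feedback = [1.0, -1.0, 1.0, 1.0]
learning_rate = 0.5

final_criterion = run_delta_rule(feedback, learning_rate)

# Rebuild the same value as a weighted sum over lags (lag 1 = most recent trial).
weights = lag_weights(learning_rate, len(feedback))
reconstructed = sum(w * x for w, x in zip(weights, reversed(feedback)))
print(final_criterion, reconstructed)

# Relative influence of the trial 4 back versus 1 back for two learning rates.
for lr in (0.5, 0.1):
    w = lag_weights(lr, 4)
    print(lr, w[3] / w[0])
```

      In this simplified picture, the ratio printed in the last loop is (1 - lr) ** 3, so slower updating corresponds to a longer-range history profile; whether this maps onto the recent versus distant effects in the data is an empirical question that the sketch does not address.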

      In summary, this is a careful and data-rich reanalysis that highlights a potentially important role for serial dependence in enabling generalization during perceptual learning. While the underlying mechanisms remain underspecified, the evidence supporting the reported associations is strong, and the work provides a valuable empirical foundation for further experimental and modeling efforts.

      Reviewer #2 (Public review):

      This manuscript investigates how people's perceptual reports are influenced by events and trials in the past, and how this long-range dependence relates to broader learning across locations in a visual learning task. The authors present clear and internally consistent analyses showing that extended temporal integration is associated with greater generalization of learning. The study is thought-provoking and may contribute meaningfully to understanding how short-term influences and long-term improvement interact, although several interpretational points would benefit from clarification.

      Strengths:

      (1) The manuscript identifies unusually long-range perceptual biases extending up to ten trials back, which is a striking and potentially important finding.

      (2) The association between strong long-range dependence and greater learning generalization is clearly documented and supported by consistent analyses.

      (3) The dataset is large and rich, and the authors apply repeated and well-controlled analyses that give confidence in the stability of the effects.

      (4) The writing is generally clear, and the manuscript raises interesting conceptual links between temporal integration and generalization of learning.

      Weaknesses / Points Requiring Clarification:

      (1) The manuscript repeatedly equates generalization with increased efficiency, but this relationship is not universally true. In some populations or tasks, excessive generalization can reduce task-specific efficiency. The authors should discuss this context-dependence to clarify when generalization is beneficial versus detrimental.

      We agree with the reviewer that generalization does not strictly imply increased efficiency; in some contexts, over-generalization can indeed be detrimental. We now explicitly note in the Introduction that serial dependence can impair performance when stimuli vary randomly across trials. We have reviewed the manuscript to ensure we do not explicitly equate generalization with efficiency. Our argument is specifically that long-range SDEs support the transfer of learning (generalization).

      (2) Serial dependence is also present, though smaller, in the central fixation task. It remains unclear whether this bias could contribute to the serial dependence observed in the main task. The authors should clarify whether the two biases are independent or whether the central-task bias might partially influence orientation judgments in the main task.

      These two tasks are independent: one requires T/L discrimination, the other V/H discrimination. See our detailed response below.

      (3) Several figure captions and labels contain minor inconsistencies in formatting and terminology. Careful proofreading would improve clarity.

      We thank the reviewer for pointing this out and have proofread the captions to improve formatting and terminology consistency throughout.

      Reviewer #3 (Public review):

      This reanalysis of a classic study of visual perceptual learning in a texture discrimination task convincingly demonstrates the presence of sequential dependence effects, commonly seen in response time analyses in 2-alternative tasks, on response accuracy in the texture task in the visual periphery and in a simultaneous central letter report at fixation. Overall, this paper provides a new and interesting analysis of the effects of sequential dependencies from trial to trial on performance, learning, and generalizability in perceptual learning.

      Strengths:

      This new analysis of sequential dependency effects (SDEs) extends commonly observed sequential effects in two-choice reaction times to accuracy and relates them to response accuracy during visual learning in a frequently used perceptual learning task. The paper makes a convincing case that different conditions known to impact generalization of learning to a second visual location also express quantitatively distinct n-back SDEs.

      Weaknesses:

      Most of the new analyses emphasize the effects of SDEs, including trials designed to enhance the size of the effects, specifically when the current trial is low visibility, and the prior trial is of high visibility. Unless there is an argument that learning and subsequent generalization primarily occur in low-visibility trials, the presentation should also include displays and an emphasized discussion of analysis for all trials, unfiltered.

      We analyze effects on close-to-threshold (small-medium SOA) current targets preceded by above-threshold (high SOA) reference targets. This is motivated by both technical issues and theoretical assumptions. In psychophysics, when using percent correct as a measure of performance, bias cannot be reliably estimated at or near ceiling performance, as correct responses leave little room for bias to manifest. Regarding the ‘easy’ targets used as a reference, having them at low SOA introduces uncertainty as to the reference orientation against which bias is measured, with their perceptual effect being ambiguous. Theoretically, we note that in perceptual learning with threshold targets, the introduction of clear targets in the absence of feedback enables learning (see Discussion, where we added: 'Most interestingly, in our experiments without feedback on the texture task, the experimental conditions yielding the strongest bias were also found to enhance learning in the absence of feedback (Liu et al., 2012)').

      We have addressed this concern also by conducting additional robustness analyses with unfiltered prior-trial history. We analyzed data without the prior-visibility filter; results are presented in a new Supplementary Figure S3 and confirm our main findings (see addition to Methods: "Finally, to verify that our findings are not artifacts of these filtering choices, we also conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm the robustness of our main findings.").

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) How manipulations of stimulus statistics, uncertainty, or feedback could selectively engage different forms of serial dependence

      We expect serial dependence to be modulated by all these parameters. In classical SDT, stimulus statistics are known to affect response bias, as are temporal correlations in stimulation sequences. We note in the manuscript that we employed random sequences (50% chance for V and 50% for H targets), eliminating expectation-based biases toward either orientation. Stimulus uncertainty is known to increase serial dependence, as we also found here. Feedback is also expected to have an effect, although the literature is somewhat ambiguous about this, and the effect may also depend on experimental design. We note that the main task studied here (TDT) had no feedback while the central T/L task did have feedback, with both showing serial dependencies. In the manuscript we point to reviews of SDE where much of this is discussed.

      (2) How explicit computational models could help distinguish decision bias from structural learning

      We use the drift diffusion model (DDM) to distinguish decision bias (starting point in the DDM) from structural learning (changes in drift rate). The DDM predicts that decision bias is short-lived, mainly affecting fast reaction times (RTs), while biases due to drift rate asymmetry persist to long RTs. We present these results in Figure 3.
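
      The DDM distinction above can be illustrated with a minimal simulation sketch. It is not the authors' analysis: the parameter values (`bound`, `start`, `drift`, `dt`), the function names, and the median-split summary are all assumptions chosen for illustration. It simulates a diffusion between two symmetric bounds and compares the fraction of choices toward the favoured bound in the faster and slower halves of trials.

```python
import random
import statistics


def simulate_ddm(n_trials, start, drift, bound=1.0, dt=0.01, seed=0):
    """Simulate a two-bound drift diffusion process with unit noise.

    Returns a list of (rt_seconds, hit_upper) tuples. All parameter
    values are illustrative assumptions, not fitted to any dataset.
    """
    rng = random.Random(seed)
    step_sd = dt ** 0.5  # standard deviation of the noise per time step
    trials = []
    for _ in range(n_trials):
        x, t = start, 0.0
        while abs(x) < bound:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        trials.append((t, x >= bound))
    return trials


def bias_by_rt_half(trials):
    """Fraction of upper-bound choices in the fast and slow halves (median split on RT)."""
    median_rt = statistics.median(rt for rt, _ in trials)
    fast = [hit for rt, hit in trials if rt <= median_rt]
    slow = [hit for rt, hit in trials if rt > median_rt]
    return sum(fast) / len(fast), sum(slow) / len(slow)


# Starting-point bias: evidence starts closer to the upper bound, no drift.
start_fast, start_slow = bias_by_rt_half(
    simulate_ddm(3000, start=0.3, drift=0.0, seed=1)
)
# Drift bias: unbiased starting point, constant drift toward the upper bound.
drift_fast, drift_slow = bias_by_rt_half(
    simulate_ddm(3000, start=0.0, drift=0.5, seed=2)
)

print(f"starting-point bias: fast={start_fast:.2f}, slow={start_slow:.2f}")
print(f"drift bias:          fast={drift_fast:.2f}, slow={drift_slow:.2f}")
```

      Under these assumptions, the starting-point bias should be concentrated in the fast half of trials, whereas the drift bias should be of similar size in both halves.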

      (3) Whether similar relationships are observed in other perceptual domains

      We are not aware of any other study linking serial dependence and perceptual learning or reporting such a link. We expect the link between long-range serial dependence and learning generalization to extend beyond the TDT (see new paragraph in Discussion). We hope this framework will motivate similar analysis in other labs where comparable datasets exist.

      (4) How sensitive are the results to the filtering choices used in the analysis?

      We analyze effects on close-to-threshold (small-medium SOA) current targets preceded by above-threshold (high SOA) reference targets. This is motivated by both technical issues and theoretical assumptions. In psychophysics, when using percent correct as a measure of performance, bias cannot be reliably estimated at or near ceiling performance, as correct responses leave little room for bias to manifest. Regarding the ‘easy’ targets used as a reference, having them at low SOA introduces uncertainty as to the reference orientation against which bias is measured, with their perceptual effect being ambiguous. Theoretically, we note that in perceptual learning with threshold targets, the introduction of clear targets in the absence of feedback enables learning (see Discussion, where we added: 'Most interestingly, in our experiments without feedback on the texture task, the experimental conditions yielding the strongest bias were also found to enhance learning in the absence of feedback (Liu et al., 2012)').

      We have addressed this concern also by conducting additional robustness analyses with unfiltered prior-trial history. We analyzed data without the prior-visibility filter; results are presented in a new Supplementary Figure S3 and confirm our main findings (see addition to Methods: "Finally, to verify that our findings are not artifacts of these filtering choices, we also conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm the robustness of our main findings.").

      Reviewer #2 (Recommendations for the authors):

      (1) Clarify mechanisms underlying long-range serial dependence. Please better distinguish possible sources of serial dependence (e.g., decision bias, adaptation, reweighting) and clarify which interpretations are supported or remain ambiguous given the current analyses

      Our manuscript discusses the mechanisms underlying the dissociation between recent and distant SDEs in the Discussion section. Specifically, we report that:

      Recent SDEs are RT-dependent (stronger with faster responses), consistent with decision-level criterion shifts (Dekel & Sagi, 2020).

      Distant SDEs are RT-independent, consistent with neural reweighting/template updating.

      We also discuss the role of sensory adaptation in truncating long-range integration, supported by within-session decline of SDEs, reduced distant SDEs in the 1loc condition, and the original findings by Harris et al. (2012).

      We have added an explicit acknowledgment that our correlational approach cannot definitively establish causality (see addition to Discussion: "While these converging findings support distinct mechanisms for recent and distant SDEs, our correlational approach cannot definitively establish causality, and targeted experimental manipulations would further strengthen these interpretations.").

      (2) Test robustness to analytic choices

      We have conducted robustness analyses by removing the prior-trial visibility filter. The results are presented in a new Supplementary Figure S3 and confirm that our key findings remain qualitatively unchanged (see addition to Methods referencing Supplementary Figure S3).

      (3) Strengthen the computational link

      We have expanded the Discussion to reference relevant computational models and specify predictions for future modeling work. We now cite Petrov et al. (2005). We provide a toy model implementing a trial-by-trial template update that shows SDEs correlated with learning transfer. Importantly, in this model, long-range SDE is a consequence of learning dynamics (see new paragraph in Discussion, and model simulation in supplementary material).

      (4) Discuss generality and experimental tests. Briefly address whether similar effects are expected across other tasks or sensory domains, and outline experimental manipulations that could causally test the role of serial dependence in generalization.

      We have added discussion of generality across perceptual domains and outlined the prediction that future work could test the SDE-generalization link in other tasks where both phenomena have been documented (see new paragraph in Discussion).

      Reviewer #2 (Public Review - Point 2): Central task SDE independence

      The SDEs observed in the central letter task and peripheral TDT are likely independent, as they involve different stimulus features (letter identity vs. orientation), different response mappings, and show distinct performance patterns across conditions. The absence of condition differences in central-task SDEs (described at the end of the paragraph under "SDE differences between conditions" in the Results section), despite robust differences in TDT SDEs, further suggests that the peripheral orientation biases are not contaminated by central-task response tendencies. Note that the central task was fixed across conditions and stayed at fixation both when the location was changed and when dummy trials were presented.

      Reviewer #3 (Recommendations for the authors):

      (1) Reference to Falmagne, Cohen, & Dwivedi (1975)

      We have added this reference to the Introduction, acknowledging the historical foundation of sequential effects in perceptual decisions.

      (2) The SDE data of Figure 1 are (per the figure legend) from the 1 loc data of Harris et al., "pooled over all testing days", and filtered for trials with low-visibility current targets (SOA < SOA-threshold+20ms). Specify whether this threshold criterion is on a per-subject basis. State in the legend that "all testing days" includes Days 1-8 (4 days with the first location and another 4 days testing generalization to a second location).

      We have revised the Figure 1 legend to clarify:

      "Days 1–8; 4 days at the first location and 4 days at the second location to assess generalization"

      "calculated on a per-subject basis"

      (3) The leadup emphasizes that the analysis in the figure emphasizes trials where the effect is expected to be as large as possible (cited as 40 +/- 3%), while visible current targets (at n) biases were 5+/-1%.

      See below, after (4).

      (4) Unless a theoretical position associates learning just with low visibility (if so, explain), consider including two other panels showing the sequential dependencies for all trials, and the linear model weights over the last 10 trials for all trials.

      We acknowledge that the main analyses emphasize conditions that maximize SDE expression. To verify robustness, we conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm our main findings.

      There are both theoretical and technical justifications for the filtering applied:

      It is well known that learning, in particular without feedback (as in our TDT), is facilitated by a mixture of threshold level stimuli and suprathreshold easy trials (e.g., Liu et al., 2012).

      Technically, it is impossible to measure bias with highly discriminable stimuli where performance is perfect or close to it, thus such trials are expected to dilute the measured effect. On the other hand, when considering serial effects from low sensitivity trials, we face an uncertainty involved in defining the actual orientation relative to which the bias needs to be computed.

      (5) Figure S1 seems to indicate that average thresholds over all days (location 1 and location 2) are unrelated to the sequential dependence across subjects and that the amount of learning in location 1 is unrelated to the sequential dependencies across subjects in all the varied conditions. Since Figure S1 includes all 50 subjects, it includes some conditions with dummy trials interspersed. Clarify in the description whether the dummy trials are ignored for the purposes of the SDE analyses.

      We have clarified in the Methods how trials are handled in the analysis: "To preserve the precise temporal structure of the data, all trials were included in the sequential n-back count across all experimental conditions, thus dummy trials were counted as time bins but their contribution was ignored. In the Linear Mixed Effects (LME) analysis, we modeled these trial types using distinct regressors: each n-back lag included separate predictors for visible and invisible targets, further differentiated by trial type (dummy vs. target) and relative location (ipsilateral vs. contralateral) where applicable. The SDE values reported here reflect only the influence of relevant target-present history trials; the effects of other history types (e.g., dummy trials), while estimated to ensure the temporal integrity of the model, are not presented."
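
      The n-back bookkeeping described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: the trial fields (`orientation`, `visible`, `dummy`), the function name, the ±1 orientation coding, and the use of a simple indicator for dummy trials are assumptions, and the ipsilateral/contralateral split is omitted for brevity. Every trial, including dummies, advances the lag counter, and each lag gets separate predictors by history type, so dummy-trial effects can be estimated without being reported.

```python
def nback_regressors(trials, max_lag):
    """Build per-lag history predictors for each trial.

    `trials` is a chronological list of dicts with hypothetical keys:
      orientation: +1 or -1 for target trials, None for dummy trials
      visible:     True if the target was above threshold
      dummy:       True for dummy trials
    Lags count every trial, so a dummy trial still occupies a time bin.
    """
    rows = []
    for i in range(len(trials)):
        row = {}
        for lag in range(1, max_lag + 1):
            # Start from zero for all three history types at this lag.
            row[f"visible_{lag}"] = 0
            row[f"invisible_{lag}"] = 0
            row[f"dummy_{lag}"] = 0
            j = i - lag
            if j < 0:
                continue  # no history this far back
            past = trials[j]
            if past["dummy"]:
                row[f"dummy_{lag}"] = 1  # indicator only; no orientation
            elif past["visible"]:
                row[f"visible_{lag}"] = past["orientation"]
            else:
                row[f"invisible_{lag}"] = past["orientation"]
        rows.append(row)
    return rows


trials = [
    {"orientation": +1, "visible": True, "dummy": False},
    {"orientation": None, "visible": False, "dummy": True},
    {"orientation": -1, "visible": False, "dummy": False},
]
history = nback_regressors(trials, max_lag=2)
print(history[2])
```

      In this toy sequence, the visible target preceding the dummy trial enters the third trial's history at lag 2 rather than lag 1, because the dummy trial occupies the intervening time bin.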

      (6) The conclusion from this analysis seems to be that the overall average threshold and the amount of initial learning are both uncorrelated with the strength of sequential dependencies across subjects. This conclusion should be added to the description in the main paper.

      This finding is now discussed in the Discussion section, referring to the main Results section [No significant correlation was found between biases and SOA thresholds across observers (r = -0.13, p = 0.37, average across days 1-8), nor between biases and improvements in performance at the first location (r = -0.09, p = 0.54, average across days 1-4), suggesting that the magnitude of serial dependence does not predict the overall amount of perceptual learning (Supplementary Figure S1)].

      (7) Decay of SDE section clarifications

      We have made the following clarifications:

      RT definition: Added to Methods: "The reaction time (RT) used in the analysis was defined as RT(TDT) – RT (fixation task), where RT for each task was measured from stimulus onset."

      N-back counting: Clarified in Methods (see response to point 5 above): all trials were included in the chronological sequence; the LME analysis assigned separate predictors at each lag for visible/invisible targets and for trial categories (dummy vs. target) and locations (ipsilateral vs. contralateral). The results reported do not include effects of dummy trials, except where response-dependent SDE was reported (Fig 2a, SDE for response key).

      2loc n-back effect: The longer-range effects in the 2loc condition likely reflect reduced adaptation allowing longer temporal integration, combined with the location-selective nature of SDEs.

      RT and mechanism interpretation: The manuscript discusses that the critical observation is the qualitative difference in RT sensitivity between recent and distant SDEs, consistent with the drift-diffusion framework where criterion shifts are RT-dependent while drift bias is RT-independent (Dekel & Sagi, 2020). We have added an acknowledgment of the correlational limitations of this interpretation.

      Moving figures to supplement: We prefer to keep Figures 4 and 5 in the main text as they document important dynamics supporting our mechanistic interpretation.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Pinto and colleagues set out to investigate whether the cow udder is a potential mixing site for the influenza virus. The authors have demonstrated that bovine mammary epithelial cells can be infected with both avian and human influenza A viruses, supporting the idea that the cow udder may be a potential site for reassortment. Furthermore, they demonstrate that the bovine-adapted IAV replicates to similar titers in avian epithelial cells when compared to an AIV precursor virus. This suggests there is no fitness trade-off and confirms the potential for spill-back of the cattle B3.13 into poultry, which has already been observed. Overall, I believe the authors achieved their aims. However, there are instances in which the results do not entirely support the conclusions (noted in weaknesses). Given the ongoing questions surrounding highly pathogenic avian influenza A virus in dairy cows, this work provides valuable evidence for the potential of the cow udder as a site of reassortment. These findings highlight the need for surveillance of influenza A virus incursions into livestock species, particularly cows. Some specific strengths and questions regarding weaknesses have been outlined below.

      Strengths:

      (1) The authors use a diverse range of cell types and influenza A virus strains, as well as a wide range of techniques to address the questions at hand.

      (2) The use of cells from multiple bovine breeds for the MAC-T, bMEC and explants suggests the phenomenon is not unique to a single breed.

      (3) The results suggesting there is no fitness trade-off for Cattle Texas in an avian host are interesting, and confirm the potential for spill-back of the cattle B3.13 into poultry, which has been observed.

      Weaknesses:

      I have listed my complete questions/concerns below. However, there are two main weaknesses of the article in its current state. Firstly, there is no apples-to-apples comparison in terms of determining a preference for IAV to infect the cow udder over other organs (Q4). The mammary gland and respiratory tract are represented by epithelial cells, but for other organs, fibroblasts were chosen. I think the fairer comparison would be to compare epithelial cells from different organs to demonstrate a preference for the mammary gland. Secondly, the main premise of the article relies on bMEC and MAC-T (primary and immortalised mammary epithelial cells), facilitating higher viral growth than the cells from other organs. Yet throughout the article, a 10x higher dose of IAV is used in the bMEC cells compared to everything else (Q6). This raises the question of how much of the results are due to a preference for the mammary epithelial cells, and how much is simply due to the increased dose.

      When we set out to test whether cow mammary gland cells were particularly susceptible to IAV infection compared to other bovine cell types, we used what was available in the Roslin Institute in the first instance – a mix of primary and continuous cells from various anatomical sites: three epithelial cell types (two mammary, one respiratory tract), two immune cell types and four sets of fibroblasts from various organs. Given the representation of different anatomical sites, cell types and differentiation statuses, we considered this a suitably diverse panel with which to characterise infection dynamics of a broad range of IAVs, before more focussed investigations using the mammary bMEC and explant tissues. Both mammary epithelial cell types grew our library of influenza challenge strains significantly better than the BAT-II respiratory epithelial cells, as well as the two immune cell types and all four fibroblast populations. Of the fibroblast cells, those derived from the brain grew IAV significantly better than the skin and turbinate fibroblasts, while blood-derived macrophages grew virus significantly better than the lymphocytes and non-brain fibroblasts. Therefore, there are “apple to apple” comparisons as well as apple to pear comparisons that give significant differences. We therefore think that our conclusions (in the abstract) that mammary cells are particularly replication competent for IAV, (at the end of the introduction) that “a wide range of cow-derived cells are susceptible” and (in the results section) that “mammary cells showed the highest susceptibility” are entirely justifiable. We do not claim that mammary cells are the only permissive bovine cells, but our evidence suggests they are highly susceptible.

      We used a higher MOI for bMECs because test experiments with WT PR8 and the Cattle Texas 6:2 reassortant showed that MOI 0.01 infections gave more variable results than ones run at MOI 0.1, perhaps because of the intrinsic variability of mixed primary cell populations. We therefore chose to go with the higher MOI. However, the end-point titres between the two conditions were not significantly different, so we do not think this choice is a confounding issue. We will add the comparison of the two MOIs as a supplementary figure in the formal revision.

      Reviewer #2 (Public review):

      The authors use a library of influenza A viruses from different strains, classified in lab-adapted, human, avian, and swine according to the animal from which they were isolated. They propose that the cow mammary gland serves as a mixing vessel for influenza A viruses. As a first approach, the authors assess susceptibility to infection across different cell types, including continuous and primary cell lines, bovine mammary cells, and mammary explants. All these cells support polymerase activity. Then, they analyzed changes in the bovine virus's viral fitness relative to an avian precursor. The authors use single-gene replacement to study whether and which RNP segments improve viral transcription. As part of this section, they also test IFN-specific antagonism by NS1 to assess the input of segment 8. Quantitative glycomic analysis was performed on the continuous bovine mammary cell line to demonstrate the presence of both a2,3 and a2,6, which is consistent with their observation that these cells can be co-infected with human and avian IAVs simultaneously. The main question, however, is: what is the glycome in the explants, or directly from tissues?

      We report quantitative glycomics for the primary bovine mammary epithelial cells as well as the continuous line the referee highlights. However, we agree with R2 that a detailed glycomic analysis of primary bovine mammary tissue would allow a better understanding of the actual glycosylation status in vivo. This has now been undertaken by the authors and is available as a bioRxiv preprint:

      Bovine H5N1 influenza viruses have adapted to more efficiently use receptors abundant in cattle

      Jack A. Hassard, Jiayun Yang, Bernadeta Dadonaite, Jonathan E. Pekar, Jin Yu, Samuel A. S. Richardson, Rute M. Pinto, Kristel Ramirez Valdez, Philippe Lemey, Jessica L. Quantrill, Jinghan Xue, Tereza Masonou, Katie-Marie Case, Jila Ajeian, Maximillian N. J. Woodall, Rebecca A. Ross, Nicolas Hudson, Kan Zhong, Hongzhi Cao, Samuel Jones, Hannah J. Klim, Brian R. Wasik, Desi N. Dermawan, Jean-Remy Sadeyen, Dirk Werling, Dylan Yaffy, Joe James, Alessandro Nunez, Paul Digard, Ian H. Brown, Daniel H. Goldhill, Pablo R. Murcia, Claire M. Smith, Yan Liu, Jesse D. Bloom, Munir Iqbal, Wendy S. Barclay, Stuart M. Haslam, Thomas P. Peacock: bioRxiv 2026.04.02.715584; doi:https://doi.org/10.64898/2026.04.02.715584

      Overall, the manuscript is clearly written and provides new insights into the behaviour of the cattle isolate, now compared with a representative group of model or precursor HAs of different origins.

      It would be great if a consistent nomenclature for the IAV strains could be used in the study. There is a mix of origin (Texas), animal from which the virus was isolated (mallard), or abbreviations that do not follow guidelines (IAV07). Are the USSR and Udorn not lab-adapted?

      We chose the abbreviated names for a variety of reasons. Partly from common usage (e.g. PR8, Udorn), partly for consistency with other already published papers from the FluTrailMap consortia (e.g. Cattle Texas; Dholakia et al 2026), partly to make diversity obvious in certain figures (e.g. H3N1, H5N2 etc) and partly to avoid confusion between viruses that originate from the same geographic area (e.g. AIV07, AIV09, H5N8-20 etc which are all Ck/England/isolate numbers). Overall, we found it more confusing to use the expanded nomenclature. Re AIV07 which the referee criticises for not following naming guidelines – if this is a reference to the EURL nomenclature, AIV07 is the abbreviation for the specific virus A/Chicken/England/053052/2021, our representative virus for EURL genotype EA-2020-C, as we say in the text. We should however have included this nomenclature in Table 1, which otherwise provides a cross-reference for all the names. This will be added in the formal revision to help with clarity.

      As to whether USSR and Udorn are lab-adapted – that depends on definitions. There is a continuum of adaptive changes and/or sequence drift starting from the very first growth of an isolate in the laboratory. The viruses we define here as lab-adapted are ones that have been deliberately adapted to other hosts or which have very long passage histories in multiple host species resulting in known functionally significant changes. For example, PR8 has undergone 100s of passages in mice, ferrets and embryonated hens' eggs (doi: 10.3390/v12060590), which makes it unarguably lab-adapted. We admit that A/USSR/77 and A/Udorn/307/1972 are probably further along this adaptive pathway than more recent isolates such as A/Norway/3433/2018, but are unaware of any specific reason that would put them into our lab-adapted category.

      The experimental setup includes bovine mammary primary and continuous cells, as well as mammary explants. Some of the most significant differences, for example, in viral fitness studies and co-infection experiments, are observed in these explants. Perhaps there could be some additional focus on this observation. The implications in comparison to the results obtained in cultured cells could be described. How will the human and other HA subtype viruses fare in the explants?

      We agree that this is an important and interesting question, and have tested the strains we used for co-infections, human seasonal H1N1 “Norway” and low pathogenic avian influenza “H3N1”, in the mammary explants. Both replicate, the avian virus to 20-fold higher titres. We will add this new information to the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This excellent manuscript by Pinto, Sharp, and colleagues examines bovine tissue tropism for influenza viruses. They find that bovine flu, as well as other strains, has strong replication in mammary tissue. They also map the genetic changes to influenza that improve replication in bovine cells. Overall, the study is well designed and executed, and the results are very timely.

      Strengths:

      (1) The experiments are well-controlled.

      (2) The figures are well-constructed and easy to follow.

      (3) The Methods and legends are detailed, with sufficient information.

      Weaknesses:

      (1) A comparison to human cells would strengthen the overall impact of the results. Are human mammary cells also uniquely susceptible to influenza? Are bovine mammary cells special in some way?

      This is an interesting question. We have not tested mammary gland cells from humans (or any other species of mammal), but we have reported elsewhere (Dholakia et al., Nat Commun. 2026 Jan 16;17(1):1603. doi: 10.1038/s41467-026-68306-6) that Cattle Texas grows well in a variety of human respiratory cells. Here we are considering the bovine mammary organ as a potential reassortment site for IAVs; human mammary organs are unlikely to create this opportunity.

      (2) For the virus infection studies with segment 8 swaps, it should at least be noted that some of the phenotypes could be driven by NEP.

      We agree, and will change the text to acknowledge this in a revised version.

      (3) The data demonstrating that bMEC can support co-infection are compelling and important, but would be strengthened with a comparison from a different cell type or species. Do mammary cells uniquely support higher co-infection?

      We have data showing that co-infection also occurs in the continuous MAC-T udder cell line and will include these data in a revision. We have not tested bovine cells from other organs for co-infection potential as they do not seem to be significant sites of infection in vivo.

    1. Author response:

      We sincerely thank the Reviewing Editor, Senior Editor, and both reviewers for their careful and constructive assessment of our manuscript. We are encouraged that the reviewers recognize the value of our dataset and its potential contribution. We greatly appreciate the thoughtful comments and have carefully considered the reviews. We plan to revise the manuscript accordingly. 

      First, we will revise and refine the cross-species comparative analysis, with particular attention to clarifying the basis of the comparisons between ascidian and mouse endodermal lineages. In particular, we will adopt a more cautious and precise comparative framework, clarify the scope and limitations of the mouse comparison, and broaden the context by incorporating additional vertebrate and invertebrate deuterostome systems where relevant.

      Second, we will strengthen the gene-level interpretation of the identified endodermal populations and clarify the molecular basis for the similarities and differences. In particular, we will more clearly identify the key marker genes defining each population and better explain their relationship to previously described developmental sources.

      Third, we will improve the clarity of the Results presentation, including the description of the two major endodermal progenitor populations and their subcategories, as well as the organization of the text, figures, and figure legends. 

      Fourth, we will substantially rewrite the Discussion, especially the sections dealing with evolutionary implications, to ensure that our interpretations are presented in a more cautious manner.

      These revisions are intended to address the reviewers’ concerns regarding both the evolutionary framing and the presentation of the data. We believe that these revisions, which will include both rewriting and additional analyses, will improve the clarity and rigor of the manuscript. We look forward to submitting a revised version.

      We thank the editors and reviewers again for their time and expertise.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Andriani et al. show intracellular zinc is exported from sperm during capacitation and suppresses the alkalinization-induced hyperpolarization in sperm. Intracellular zinc inhibits Slo3 current, which is enhanced by the co-expression of gamma subunit Lrrc52. Computational studies reveal that the Zn binding site on mSlo3 is located near E169 and E205, which are involved in the sustained zinc inhibition of mSlo3 current. The authors propose that intracellular zinc plays a key role in sperm capacitation by inhibiting the Slo3 channel.

      Strengths:

      Overall, the work appears well-designed (e.g., oocyte patch-clamp experiments), and clearly presented. Three-dimensional structural modeling and flooding simulations are executed.

      Weaknesses:

      The simple mutagenesis analysis of E169 and E205 showed partial abolishment, but the molecular mechanism by which zinc inhibits Slo3 current is not yet fully shown. The authors should consider performing more extensive experiments, such as creating double mutants or combination mutants involving other residues. Additionally, could other mechanisms explain the role of zinc in regulating the Slo3 current?

      We thank the reviewer for the thoughtful comments regarding the mutagenesis analysis and the possible mechanisms underlying zinc regulation of Slo3. Regarding the suggestion to generate double or combination mutants, we agree that such experiments would provide valuable mechanistic insight. However, due to limited resources, we were not able to perform these additional experiments within the scope of this study. Our current results show that mutations at E169 and E205 partially abolish zinc inhibition, which suggests that the inhibitory mechanism is not mediated through a single residue and is likely more complex.

      Alternative mechanisms that may contribute to zinc modulation of Slo3 include indirect effects through modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites within the Slo3 channel beyond those identified in this study. At present, these mechanisms remain speculative, and further studies will be required to clarify their contributions. This study provides a foundation for understanding how zinc inhibits the Slo3 channel and serves as an important starting point for defining the molecular mechanism in more detail.

      We already acknowledged in the Discussion section that the precise molecular basis of zinc inhibition remains unknown and that future work involving more extensive mutational and structural analyses will be essential to fully resolve this issue.

      We also added the following text to the Discussion section:

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      While elucidating the mechanism of Slo3 is interesting, there is substantial literature indicating how zinc regulates channel functions at a molecular level. Given this, the manuscript should provide a deeper understanding by clearly elucidating the molecular mechanism of the regulation of Slo3 current by zinc.

      Thank you for highlighting this important point, which requires deeper discussion of how zinc regulates Slo3 current at the molecular level. As reported, Slo3 is gated by membrane depolarization and also by intracellular pH, particularly alkalinization (Leonetti et al., 2012; Schreiber et al., 1998; X. Zhang et al., 2006). This makes the gating mechanism of this channel complex. The molecular mechanism underlying pH regulation of the Slo3 channel remains unknown (M. D. Lyon et al., 2023). We tested different pH conditions and membrane voltages to elucidate the effect of zinc on the Slo3 channel. Our data suggest that zinc inhibition of mSlo3 channels depends on pH (Fig. 2A-E) and voltage (Fig. 2G-H; Fig. 2—figure supplement 1A, B) and is long-lasting (Fig. 2I, K).

      However, we are aware that these data alone cannot explain the molecular mechanism of zinc’s effect on Slo3 current, and our mutagenesis experiments did not provide a straightforward answer either. The single amino acid mutations examined in this study, which target clustered negatively charged residues, did not significantly alter zinc-mediated current reduction compared to the wild type. As the reviewer pointed out, mutating a single amino acid may not be sufficient to identify other contributing residues within the predicted mSlo3 zinc-binding site. Therefore, more extensive mutagenesis studies will be required to fully elucidate the molecular mechanism of zinc inhibition in mSlo3, which this study could not fully resolve.

      On the other hand, when we analyzed the percentage of current recovery for all the mutants, E169A and E205A showed significant current recovery upon washout at pH 8.0 alone. Consistent with the MD simulations, our electrophysiological recordings demonstrated that the long-lasting inhibitory effect of zinc was partly abolished by these mutations. Thus, our findings highlight the contribution of E169, located at the lower end of the S3 domain, and E205, located in the lower region of the S4 domain, to zinc-mediated inhibition of mSlo3 current.

      Additionally, since the molecular mechanism of pH regulation of the Slo3 channel remains unknown, the molecular basis of its dual gating has yet to be elucidated, making it difficult to draw a single definitive conclusion from our current data on how zinc inhibits mSlo3 current. Nevertheless, this study provides a foundation for understanding possible mechanisms of zinc inhibition. Our VCF data suggest that zinc influences the movement of the VSD of mSlo3, and together with our mutagenesis and MD simulation results, these findings represent an important first step toward elucidating the molecular mechanism of zinc inhibition of the mSlo3 current.
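
      The two percentages discussed here can be illustrated with a minimal sketch. The definitions below are assumptions for illustration only (inhibition as the fraction of control current lost in zinc, recovery as the fraction of that lost current regained after washout); the response does not state the exact formulas used, and the function names and current values are hypothetical.

```python
def percent_inhibition(i_control: float, i_zinc: float) -> float:
    """Percentage of the control current that is lost in the presence of zinc."""
    if i_control == 0:
        raise ValueError("control current must be non-zero")
    return (i_control - i_zinc) / i_control * 100.0


def percent_recovery(i_control: float, i_zinc: float, i_washout: float) -> float:
    """Percentage of the zinc-inhibited current that returns after washout."""
    inhibited = i_control - i_zinc
    if inhibited == 0:
        raise ValueError("there is no inhibition to recover from")
    return (i_washout - i_zinc) / inhibited * 100.0


# Hypothetical currents in nA, chosen only to illustrate the arithmetic.
control, zinc, washout = 2.0, 0.5, 1.25
inhibition = percent_inhibition(control, zinc)
recovery = percent_recovery(control, zinc, washout)
print(f"inhibition: {inhibition:.1f} %, recovery: {recovery:.1f} %")
```

      Under these assumed definitions, a mutant can show unchanged inhibition while showing greater recovery, which is the pattern described for E169A and E205A.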

      Intracellular zinc exerts an inhibitory effect on mSlo3, similar to what has been reported for Slo2.2 channels (J. Zhang et al., 2023), high- and low-voltage activated calcium channel families (Sun et al., 2007) and KCNQ4 channels (Gao et al., 2017). These studies identified different regions, amino acids, and possible mechanisms of zinc inhibition among these ion channels. For instance, in Slo2.2 channels, which belong to the same Slo family as Slo3, the zinc-binding site was identified in the RCK2 domain, where cysteine and histidine residues form a canonical zinc binding motif (J. Zhang et al., 2023). In KCNQ4 channels, zinc inhibits channel activity in a non-canonical manner that depends on its physiological activator, the membrane lipid PI(4,5)P<sub>2</sub> (Gao et al., 2017). Although zinc exerts inhibitory effects on these various voltage-gated potassium and calcium channels, the mechanisms differ. Our data suggest another distinct mechanism of zinc inhibition in the mSlo3 channel, with the identified sites located in the VSD, where zinc influences the voltage-sensor motion and consequently affects the complex gating of Slo3.

      We revised the Discussion section as follows (this revision also addresses the previous comment):

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      The manuscript includes no experimental data on the mechanism of intracellular zinc export during sperm capacitation, despite being crucial for the regulation of sperm function.

      We thank the reviewer for this valuable comment. We agree that the mechanism of intracellular zinc export during capacitation is crucial for the regulation of sperm function, and experimental data on this point would be an important finding. However, there are significant technical difficulties in performing such experiments. Two protein families, ZnT and ZIP, transport zinc across cellular and intracellular membranes in opposite directions. ZIP12 has been reported to be highly expressed in mouse testis (Zhu et al., 2022), as has ZnT-1 (Elgazar et al., 2005). To date, there are no known inhibitors of zinc transporters and no suitable antibodies against them, which makes it difficult to design experiments to examine intracellular zinc transport during sperm capacitation. Apart from these two reported zinc transporters, the functional significance of other ZnTs and ZIPs, particularly those related to capacitation, remains largely unclear, leaving the mechanisms of zinc transport in sperm during capacitation poorly understood. Moreover, homozygous Znt-1 knockout mice exhibit a lethal phenotype (Andrews et al., 2004).

      Reviewer #2 (Public review):

      Summary:

      In this paper, Andriani and colleagues are examining the potential role of Zn flux in sperm and its effect on Slo3 channels. This is an interesting question that is likely critical to how sperm function properly and Slo3 channels are a possible candidate for a downstream molecule that is impacted by Zn. In this paper, the authors use Zn imaging, sperm motility assays, and electrophysiology to show that Zn flux impacts sperm function. They then go on to look at the impact Zn has on Slo3 current and propose a binding site based on MD simulations. While the ideas are interesting, the experiments are not well described in many places making understanding the results very difficult. In addition, critical controls are missing throughout the paper.

      Strengths:

      The question of how Zn flux impacts membrane potential and sperm motility is an important one. Moreover, Slo3 presents an interesting candidate for the target of Zn regulation. The combination of methods used here also has the potential to uncover mechanisms of Zn regulation of Slo3.

      Weaknesses:

      Much of the paper lacks experimental description which makes interpretation quite difficult, or a detailed discussion is missing. Examples include:

      (1) Figure 1, particularly the Zn imaging, is not sufficiently described. How is the fluorescence intensity measured? A representative ROI? The whole tail and head? Are the sperm immobile? If not, there is evidence that motion artifacts can significantly distort these sorts of measures from Calcium measurements in Cilia. Were there controls done? Is the small amount of Zn seen in the tail above the background?

      We sincerely thank the reviewer for pointing out important details that we should provide to make this study easier to understand. We respond to the points raised by the reviewer as follows:

      Fluorescence intensity was measured from the signal over the whole head and the proximal part of the tail of the sperm. We have included this in the Materials and Methods.

      Materials and Methods

      “Fluorescence intensity is measured by the signal taken from the whole head and the proximal part of tail in sperm.”

      Yes, the sperm are immobile during zinc imaging.

      We added control data for zinc imaging without capacitation medium and incorporated them into the graph in Figure 1B. For the control in non-capacitation medium, we used HS medium, as now explained in the Methods, the Results, the related figure (Figure 1B), and the figure legends.

      Yes, the small amount of Zn seen in the tail is above the background. As shown in Fig. 1A, we confirmed that the signal intensity at the proximal region of the tail was higher than the background. Therefore, the data for this region were calculated after background subtraction.

      (2) The second half of Figure 1 is also not well described. What is the extracellular solution in the recordings? When you apply the Zn ionophore, do you expect influx or efflux? I assume efflux is based on the conclusions but this should be discussed explicitly.

      The extracellular solution in the recordings for Figure 1 is HS solution (HEPES-buffered saline solution), a standard non-capacitation medium. We have included this information in the Materials and Methods.

      Materials and methods

      “HS-based solution was used as the extracellular solution.”

      We assume that intracellular zinc levels increase upon application of the zinc ionophore. Previous work has reported that sperm contain approximately 35.7 ng of zinc per 10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). When zinc pyrithione is applied, it facilitates the influx of Zn<sup>2+</sup> from the surrounding medium into the cell, thereby increasing the intracellular zinc concentration. Zinc pyrithione functions both as a zinc source and as a transport facilitator, allowing Zn<sup>2+</sup> to cross the otherwise impermeable lipid membrane without compromising membrane integrity.

      (3) Figure 2H labels the Y axis, "normalized current". Normalized to what? Why do neither of the curves end at 1? A better description of what this figure represents is needed.

      Normalization for Figure 2H was performed by dividing the absolute current of mSlo3 at pH 8.0 at each voltage by the absolute current at the pre-determined highest voltage that still produced a stable mSlo3 current (i.e., good patch, good clamp). In this analysis, +140 mV was chosen as the highest voltage for normalization, since in a few cells the patch was lost at +160 mV and +180 mV. As for the control condition, the absolute current of mSlo3 in the presence of 100 µM zinc was normalized to the absolute current of the control at +140 mV. This information has been included in the figure legends and the Materials and Methods section of the revised manuscript.

      Materials and Methods section:

      The figure legend for Figure 2H has been updated.
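
      The normalization described above can be sketched as follows. This is a minimal illustration with hypothetical current values and a hypothetical helper name, not the analysis code used in the study. It shows why neither curve has to end at 1: both conditions are divided by the control current at +140 mV, so control points above +140 mV exceed 1 and the zinc curve stays below 1 at +140 mV.

```python
def normalize_to_reference(currents_by_voltage, reference_current):
    """Divide the absolute current at each voltage by one shared reference current."""
    if reference_current == 0:
        raise ValueError("reference current must be non-zero")
    return {
        voltage: abs(current) / abs(reference_current)
        for voltage, current in currents_by_voltage.items()
    }


# Hypothetical absolute currents (nA) keyed by test voltage (mV).
control = {100: 1.0, 120: 1.6, 140: 2.0, 160: 2.6}
zinc = {100: 0.3, 120: 0.5, 140: 0.8, 160: 1.1}

# Both conditions share the same reference: the control current at +140 mV.
reference = control[140]
control_norm = normalize_to_reference(control, reference)
zinc_norm = normalize_to_reference(zinc, reference)

for voltage in sorted(control):
    print(voltage, round(control_norm[voltage], 2), round(zinc_norm[voltage], 2))
```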

      (4) The alpha fold simulations are not well described. How many Zn binding sites were found? Are all of the histidine mutations in Figure 4 Supplement 1 the ones that were found?

      We thank the reviewer for the question. In our AlphaFold3 input, we only input the transmembrane region of the protein. From there, we found four sites located as follows:

      Because we are interested only in the intracellular (IC) side of the membrane, we focused on the IC site with the highest pLDDT (confidence) value. Only two of the sites are on the IC side; the other sites are located near the pore domain. The selected site is near E310 and K319.

      Author response image 1.

      AlphaFold3 prediction of the Zn binding site on IC side of Slo3

      The histidines in Fig. 4—figure supplement 1 are all histidines that are not in the transmembrane region. These residues were not included in the initial inputs for AlphaFold3. However, we conducted MD simulations including these residues and we were able to show that a few of these residues are in contact with Zn. We have now plotted the minimum distance between each of these residues and Zn in the flooding simulations.

      Author response image 2.

      MD simulations of histidine residues located on the IC side of Slo3

      Minimum distances between histidines in Fig. 4—figure supplement 1 and Zn<sup>2+</sup> from the flooding simulations. Different colors indicate different repeats.

      (5) There is no discussion of physiological intracellular Zn concentration. How much Zn is inside the sperm? How much is likely free vs buffered? Is 100uM a reasonable physiological concentration?

      We estimated the intracellular zinc concentration in sperm based on human sperm data, which report a zinc concentration of approximately 35.7 ng/10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). Considering the volume of a typical human sperm is about 15 µm<sup>3</sup> (Laufer et al., 1977), this translates to an estimated intracellular zinc concentration of approximately 400 mM, although the concentration of free zinc must be much lower than this level. Although exact intracellular zinc concentrations in mouse sperm are not well-documented, this estimate supports the observation of elevated zinc in non-capacitated sperm.
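
      The unit conversion behind an estimate of this kind can be sketched as follows. The sketch assumes that all of the reported zinc is dissolved uniformly in the quoted cell volume and uses the standard atomic weight of zinc (65.38 g/mol); the function name is hypothetical. With exactly these inputs the conversion gives a total concentration in the tens of millimolar, roughly an order of magnitude below the approximately 400 mM quoted above, so that figure presumably rests on a different effective volume or on inputs not stated here.

```python
ZN_MOLAR_MASS_G_PER_MOL = 65.38  # standard atomic weight of zinc


def total_zinc_molar(zinc_ng_per_million_cells: float, cell_volume_um3: float) -> float:
    """Convert zinc content per 10^6 cells and a cell volume into mol/L."""
    grams_per_cell = zinc_ng_per_million_cells * 1e-9 / 1e6
    moles_per_cell = grams_per_cell / ZN_MOLAR_MASS_G_PER_MOL
    litres_per_cell = cell_volume_um3 * 1e-15  # 1 cubic micrometre = 1e-15 L
    return moles_per_cell / litres_per_cell


# Inputs quoted in the response: 35.7 ng per 10^6 cells and 15 cubic micrometres.
concentration_mM = total_zinc_molar(35.7, 15.0) * 1e3
print(f"total zinc: {concentration_mM:.1f} mM")
```

      Either way, the result refers to total zinc; as noted above, the free zinc concentration must be much lower.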

      There are a number of areas where the interpretation is not well supported by the data including:

      (6) You say in the Figure 4 supplement, that "we did not observe any significant decrease in the percentage of current inhibition." But that is a pretty misleading statement. There are large changes (increases) in the amount of zinc inhibition. These might be allosteric changes but I don't think you can safely eliminate these as relevant Zn binding sites. Also, some of these mutations appear to allow at least some unbinding of Zn.

      In our MD simulations, H720 is not at the zinc binding site, and therefore mutation to arginine would indeed eliminate its binding. We show this in the minimum-distance analysis between Zn and H720, in which the two remain more than 4 Å apart (n = 3), as shown in Author response image 2.

      The Slo3/Slo1 RCK2 chimera also showed large increases in the amount of zinc inhibition, and this region might serve as a potential binding site. We agree that the statement “we did not observe any significant decrease in the percentage of current inhibition.” is misleading, and we have therefore revised our interpretation.

      We revised the Results section as follows:
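
      A minimum-distance analysis of this kind can be sketched in a few lines. The sketch below is illustrative only: it uses plain coordinate tuples in Å with hypothetical values and hypothetical function names, whereas a real analysis would read coordinates from the simulation trajectories. The 4 Å contact cutoff is the one mentioned in the response.

```python
import math


def min_distance_per_frame(residue_frames, zinc_frames):
    """For each frame, return the smallest residue-atom to zinc-ion distance (Å)."""
    distances = []
    for residue_atoms, zinc_ions in zip(residue_frames, zinc_frames):
        distances.append(
            min(math.dist(atom, ion) for atom in residue_atoms for ion in zinc_ions)
        )
    return distances


def ever_in_contact(min_distances, cutoff=4.0):
    """True if the residue comes within the cutoff of any zinc ion in any frame."""
    return any(distance <= cutoff for distance in min_distances)


# Hypothetical two-frame trajectory: two residue atoms and one or two zinc ions.
residue_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
zinc_frames = [
    [(6.0, 0.0, 0.0)],
    [(0.0, 8.0, 0.0), (1.0, 0.0, 7.0)],
]

min_distances = min_distance_per_frame(residue_frames, zinc_frames)
print(min_distances, ever_in_contact(min_distances))
```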

      “However, the percentage of current inhibition varied across the mutated constructs, showing either increases or no appreciable change (Fig. 4—figure supplement 1B, C).”

      (7) Following up on the above point, it seems unfair to conclude that the D162S, E169A, and E205 mutants are part of the inhibitory binding site for Zn when the mutation has no effect on inhibition and only an effect on the washout. The mutations on the intracellular side also had an impact on the washout so it seems equally likely that they are the critical residues based on your data.

      We thank the reviewer for this important point. We agree that the absence of a strong reduction in the initial zinc inhibition makes it challenging to assign any single residue as a definitive zinc binding site. However, our interpretation is based not only on the electrophysiological data but also on the MD simulations, which consistently identified E169 and E205 as residues that frequently interact with zinc and stabilize zinc occupancy within the VSD region. Although the mutations did not markedly reduce the peak level of zinc inhibition, both E169A and E205A significantly altered the long-lasting inhibitory component during washout, which is consistent with the MD-predicted interactions. In contrast, the intracellular mutations affected washout but were not supported by MD simulations as potential zinc interaction sites. Taken together, these combined datasets support the idea that E169 and E205 contribute to zinc modulation of Slo3 in the VSD, even though additional residues or mechanisms are likely involved.

      (8) Nowhere in the paper do you make the specific link between Zn flux and membrane hyperpolarization via Slo3. You show that Zn flux changes the ability of the sperm to hyperpolarize and you show that Slo3 is inhibited by Zn but the connection between the two is not demonstrated. There appears to be a specific Slo3 blocker. If you use this in sperm, do you no longer see the Zn effect?

      Thank you for pointing out the need to clarify this point. It is well established that sperm capacitation is associated with an increase in intracellular pH (Vredenburgh‐Wilberg & Parrish, 1995; Y. Zeng et al., 1996), hyperpolarization of the membrane (Arnoult et al., 1999; Y. Zeng et al., 1995), and elevation of the intracellular Ca<sup>2+</sup> concentration (Breitbart, 2002; Publicover et al., 2007) through diverse ion channel activities. To explore whether these pathways are influenced by intracellular zinc, we used patch-clamp techniques to measure the membrane potential (Vm), as shown in Fig. 1D-K. It has been reported that, under whole-cell current clamp of mouse epididymal spermatozoa, the resting membrane potential hyperpolarizes after intracellular alkalinization (Navarro et al., 2007). We mentioned this in lines 100-108 of the manuscript.

      Next, our findings from the experiments using mouse spermatozoa suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH- and voltage-dependent potassium channel Slo3 as responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel and found that zinc inhibits mSlo3 current. We explained the rationale for this experiment in lines 143-150.

      We added the following sentence to improve the clarity of the text:

      “During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010).”

      Accordingly, the text now reads:

      “Our findings suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH-and voltage-dependent potassium channel Slo3 is responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel.”

      Regarding the specific inhibitor, as the reviewer pointed out, a new Slo3 inhibitor, VU0546110, is more than 40-fold selective for human Slo3 over Slo1 (M. Lyon et al., 2023). However, the effect of VU0546110 on mSlo3 has not yet been tested. Mouse and human Slo3 respond similarly to certain inhibitors but differ in their responses to several others (M. D. Lyon et al., 2023), making it uncertain whether VU0546110 will work on mSlo3.

      (9) In the second half of Figure 1, the authors suggest that there is "no hyperpolarization" in 100uM Zn. That is not really true. It is reduced but not absent.

      We modified the wording of “no hyperpolarization in 100 µM Zn” to “alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group.”

      “In contrast, alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group”

      (10) The claim that Lrrc52 with Slo3 shows a higher current inhibition at pH 7.5 than pH 8 is not well supported because there are only 3 replicates in the 7.5 case. In addition, the claim made in the text that 100uM ZnCl2 "already inhibited mSlo3+Lrrc52 at pH7.5", contrasted with mSlo3 alone, is not tested statistically.

      Thank you for the valuable comment. Although Fig. 3F shows a statistical difference, we agree that having only three replicates at pH 7.5 may somewhat weaken the conclusion. Following this suggestion, we have revised the sentence as follows:

      “Alkalinization appeared to increase the percentage of current inhibition by 100 µM ZnCl<sub>2</sub>.”

      We provided a statistical analysis comparing mSlo3 alone and mSlo3+Lrrc52 at pH 7.5 in Figure 3—figure supplement 1D:

      The statistical analysis showed that 100 µM zinc significantly inhibited the mSlo3 + Lrrc52 current at pH 7.5 compared to the mSlo3 current alone. We have incorporated the necessary changes into the revised manuscript and updated the figure legends accordingly.

      In a number of places, better controls are needed.

      (11) How specific is this effect for Zn? Mg2+, for instance, is also a divalent cation that is in the hundreds of uM range inside the cell. Does it exert the same effect? Each ion certainly has unique preferred coordination geometries, does your predicted binding with MD show what you might expect for tetrahedral coordination with Zn? Did you test other divalent cations functionally or in silico?

      To address this question, we built another AlphaFold3 model with Mg<sup>2+</sup> instead of Zn<sup>2+</sup>. We did not run all-atom MD simulations because of their computational cost. In this model, the Mg<sup>2+</sup> ions all cluster at the pore domain and do not reside anywhere near the Zn<sup>2+</sup> site identified in either the MD simulations or the AF3 model.

      Author response image 3.

      AlphaFold3 model of Slo3 channel with Mg<sup>2+</sup>

      The Slo3 AlphaFold model from residue M1 to L330. The colour gradient reflects the pLDDT score range from 1.73 to 95.69. Purple sticks highlight E169, N171, and E205.

      In this study, we did not examine other divalent cations in our electrophysiological recordings. Exploring their effects will be an important direction for future research.

      (12) For the VCF experiments, a significantly higher concentration of Zn was used (10mM). What is the reason for this? There is no discussion of how much a "puff" is. Assuming you are using the RNA injector it is probably on the order of 50nL or less. Assuming the volume of an oocyte is 1uL that would argue that the final concentration is 500uM or higher. But this is also complicated by potential local effects of high Zn at the injection site, artifacts of injecting that much metal, and the fact that a great deal of the Zn will likely be bound to other things inside the cell. Better controls are needed for this experiment.

      As pointed out by the reviewer, the volume of an oocyte is estimated to be approximately 1 µL. We performed manual injections using a glass needle of the type typically used for RNA injection. However, because the injections were done manually during real-time VCF recording (as illustrated in the experimental scheme), the exact volume of solution injected into each oocyte could not be precisely controlled. We estimated each drop to be approximately 50 nL, resulting in a final concentration of around 500 µM, as described by the reviewer.

      The rationale for using a relatively high concentration was to ensure that the zinc concentration inside the oocyte reached an effective level, since manual injection may sometimes deliver less than 50 nL of solution. In some cases, injections failed entirely due to the technical difficulty of the method. Because VCF recordings are already technically difficult, we aimed to ensure that zinc injection was successful in oocytes that exhibited a robust fluorescence signal by injecting an excess amount of zinc that would not disrupt normal oocyte conditions. The 10 mM zinc solution was prepared in an acidic solution (pH 2.5). We verified that this acidic condition did not affect mSlo3 current by performing control injections with the acidic solution alone, since the mSlo3 current is not activated under acidic pH conditions.
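
      The dilution estimate can be written out as a short calculation. This is a sketch under simplifying assumptions: the injected zinc mixes uniformly throughout the oocyte, and none of it is bound or buffered. The function name is hypothetical; the 10 mM stock, the 50 nL drop, and the 1 µL oocyte volume are the values discussed above.

```python
def final_concentration_uM(stock_mM: float, injected_nL: float,
                           oocyte_volume_nL: float = 1000.0) -> float:
    """Concentration after diluting the injected drop into the oocyte volume."""
    total_volume_nL = oocyte_volume_nL + injected_nL
    return stock_mM * 1000.0 * injected_nL / total_volume_nL


# A 50 nL drop of 10 mM zinc in a 1 µL oocyte, and a smaller 25 nL drop.
full_drop = final_concentration_uM(10.0, 50.0)
half_drop = final_concentration_uM(10.0, 25.0)
print(f"50 nL: {full_drop:.0f} µM, 25 nL: {half_drop:.0f} µM")
```

      Under these assumptions, a 50 nL drop gives a little under 500 µM, and a drop of half that size gives roughly half the concentration, which illustrates why an imprecisely controlled injection volume motivates a concentrated stock.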

      Author response image 4.

      VCF control experiments: vehicle injection.

      Reviewer #3 (Public review):

      Summary:

      The study titled "Zinc is a Key Regulator of the Sperm-Specific K+ Channel (Slo3) Function" aims to investigate the role of intracellular zinc in sperm capacitation and its regulation of the sperm-specific Slo3 potassium channel. Capacitation is a crucial physiological process that enables sperm to fertilize an egg, and membrane hyperpolarization through Slo3 activation is a well-established event in this process. The authors propose that intracellular zinc dynamically decreases during capacitation and inhibits Slo3-mediated K⁺ currents, thereby playing a regulatory role in sperm function.

      Strengths:

      (1) Novel Contribution to Sperm Physiology.

      The study provides new insights into how zinc dynamics contribute to sperm capacitation, specifically through its direct inhibition of Slo3 activity.

      Previous research has focused primarily on extracellular zinc's effect on sperm function; this work expands the discussion to intracellular zinc regulation, an area with limited prior investigation.

      (2) Strong Electrophysiological Evidence.

      The study employs inside-out patch-clamp recordings in Xenopus oocytes to demonstrate zinc's direct inhibition of Slo3 currents. The observed slow dissociation of zinc from Slo3 suggests a long-lasting regulatory effect, adding to the understanding of ion channel modulation in sperm cells.

      (3) Molecular Mechanistic Insights

      Using Molecular Dynamics (MD) simulations and mutagenesis, the authors identify potential zinc-binding sites within Slo3's voltage-sensing domain (VSD), particularly E169 and E205. These computational predictions are supported by electrophysiological recordings, strengthening the argument that zinc directly binds and inhibits Slo3.

      (4) Physiological Relevance and Functional Implications

      The study suggests that zinc inhibition of Slo3 could contribute to sperm motility regulation during capacitation.

      The authors provide sperm motility assays as supporting evidence, showing that zinc chelation affects motility only after capacitation has begun, suggesting a dynamic role of intracellular zinc in the capacitation process.

      Weaknesses:

      While the study presents compelling electrophysiological data and molecular insights, there are several critical gaps that must be addressed before fully supporting the physiological relevance of the findings.

      (1) The authors should measure the effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.

      We thank the reviewer for the valuable comments to strengthen the physiological relevance of our findings. We provided additional data of Slo3 currents measured using perforated patch-clamp recording in sperm cells in experiments with zinc pyrithione (ZnPy) before and after the addition of 10 mM NH<sub>4</sub>Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 current were recorded before and after the application of 10 mM NH<sub>4</sub>Cl. These data have been integrated into Figure 1L-N and Figure 1—figure supplement 1A, B.

      It is worth noting that the Slo3 current in these recordings might contain other endogenous currents, as no specific blocker was used. Nonetheless, the data showed that the Slo3 current in sperm tends to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy. The fold change, calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl, was lower in the ZnPy-treated group than in the control group.

      We also provided data normalized to cell capacitance as suggested; however, the cell capacitance obtained from the sperm recordings represents the capacitance across the head and midpiece of the spermatozoon. Slo3 channels, in contrast, are not expressed throughout the spermatozoon, so the cell capacitance acquired from these recordings does not accurately reflect the area where the Slo3 channels are localized. Although we included normalization of Slo3 currents to cell capacitance before and after ZnPy application, this normalization should be interpreted with caution for the reasons mentioned above. The corresponding figure has been included in the supplementary data (Figure 1—figure supplement 1A, B).
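      The two quantities discussed above, the fold change in current and the current normalized to cell capacitance, can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and are not measurements from the study; the sketch only shows how each quantity is computed.

```python
def current_density(current_pa: float, capacitance_pf: float) -> float:
    """Return current normalized to cell capacitance (pA/pF)."""
    if capacitance_pf <= 0:
        raise ValueError("capacitance must be positive")
    return current_pa / capacitance_pf


def fold_change(before_pa: float, after_pa: float) -> float:
    """Return the ratio of the current after a treatment to the current before it."""
    if before_pa == 0:
        raise ValueError("baseline current must be non-zero")
    return after_pa / before_pa


# Hypothetical example values (assumptions for illustration only).
control = {"before_pa": 100.0, "after_pa": 300.0, "capacitance_pf": 2.0}
znpy = {"before_pa": 100.0, "after_pa": 150.0, "capacitance_pf": 2.0}

for label, cell in (("control", control), ("ZnPy", znpy)):
    fc = fold_change(cell["before_pa"], cell["after_pa"])
    density = current_density(cell["after_pa"], cell["capacitance_pf"])
    print(f"{label}: fold change = {fc:.2f}, density = {density:.1f} pA/pF")
```

      Because the measured capacitance covers membrane that may not carry the channel, the density value is best read as a rough normalization, whereas the fold change is computed within one cell and does not depend on the capacitance estimate.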

      We added sentences to the result section as follows:

      “We also measured Slo3 current using perforated patch-clamp recordings in spermatozoa treated with ZnPy, before and after the addition of NH<sub>4</sub>Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 currents were recorded before and after the application of 10 mM NH<sub>4</sub>Cl (Fig. 1L-N; Fig. 1—figure supplement 2A, B). Slo3 current in sperm tended to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy (Fig. 1L, M). There was a decrease in the fold change calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl of ZnPy treated group compared to the control group (Fig. 1N). Taken together, these results confirmed that intracellular zinc indeed inhibits alkalinization-induced hyperpolarization in mouse sperm.”

      (2) Lack of Controls in Non-Capacitated Sperm

      The claim that zinc is exported from sperm during capacitation needs stronger experimental validation.

      The authors did not include a control group of non-capacitated sperm in key fluorescence imaging experiments, making it difficult to confirm that the observed zinc decrease is capacitation-specific rather than a general zinc redistribution process.

      To strengthen this conclusion, experiments should be performed in non-capacitating conditions to determine whether intracellular zinc levels remain unchanged.

      We added the control group of non-capacitated sperm in key fluorescence imaging experiments, as integrated in Figure 1B.

      The following changes in the Results and Figure Legend sections are revised and added:

      “We observed that there was a gradual and significant decrease in fluorescence intensity in both regions (Fig. 1B), particularly prominent in the flagellum (Fig. 1C). This decline suggests the active release of intracellular zinc from sperm flagellum occurs during capacitation. In contrast, the fluorescence intensity of the control group of non-capacitated sperm remained unchanged (Fig. 1B).”

      Figure Legend 1B was modified accordingly.

      (3) Unclear Role of Zinc in Physiological Capacitation

      The study clearly demonstrates zinc inhibition of Slo3 but does not sufficiently establish how this affects capacitation at a functional level.

      Additional motility and capacitation markers should be analyzed to confirm that zinc influences sperm behavior beyond Slo3 inhibition.

      We thank the reviewer for this valuable comment. We fully agree that zinc can influence sperm physiology through multiple mechanisms and that its overall effects on capacitation are complex. However, the main goal of our study is to investigate the mechanism and to determine whether intracellular Zn<sup>2+</sup> directly inhibits Slo3. Our results from both the heterologous expression system and the sperm membrane potential recordings consistently support this conclusion.

      For these reasons, we believe that adding such assays would not clarify the role of Slo3 in capacitation but rather risk confounding interpretation. Instead, we have expanded the Discussion to explicitly acknowledge these limitations and to emphasize that future studies combining genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to fully define its physiological impact.

      We added sentences to the discussion section in the revised manuscript as follows:

      “Although these results support a mechanistic link between zinc and Slo3 activity, future studies that combine genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to define its physiological impact in more detail. Within this context, this study highlights the potential importance of intracellular zinc in the regulation of sperm capacitation.”

      (4) Insufficient Data on Zinc-Slo3 Specificity

      The authors should consider using quinidine, a known washable Slo3 inhibitor, to confirm that zinc acts specifically on Slo3 channels rather than other endogenous ion channels.

      The study would benefit from including washout controls in the inside-out patch-clamp recordings, as seen in Figure 3-Supplement 1, to confirm that zinc inhibition is reversible or long-lasting.

      We thank the reviewer for raising the point regarding the need to confirm that the current observed in our recordings indeed represents Slo3 current by using a specific blocker such as quinidine, as there is a possibility that endogenous currents might also be present and that zinc could act on those endogenous currents. Performing experiments with quinidine would indeed be crucial to demonstrate the specificity of Slo3 current in our patch-clamp recordings.

      However, in our current experimental protocol, we apply ramp pulses multiple times and require a long series of recordings within a single session in one patch as described in the materials and methods as well as Figure 2I, Figure 4—figure supplement 1C, Figure 5B (pH 8.0 → 100 µM zinc → pH 8.0, to observe the washout effect). Incorporating quinidine into this sequence would make the protocol even longer (pH 8.0 → quinidine → washout → pH 8.0 → 100 µM zinc), which increases the likelihood of patch loss before completing the full set.

      Furthermore, we have ensured that the recorded current corresponds to Slo3 by using appropriate experimental conditions, specifically the suitable voltage range for activation, a high intracellular pH (pH 8.0), and high-potassium solutions in our recordings.

      (5) Missing Discussion of Zinc's Role in CatSper Regulation

      The study focuses solely on Slo3 but does not mention CatSper, the principal Ca<sup>2+</sup> channel essential for sperm capacitation.

      Zinc has been reported to inhibit CatSper activity, which could significantly impact sperm function.

      The discussion should address whether zinc's effect on Slo3 represents a broader regulatory mechanism influencing multiple ion channels during capacitation.

      Thank you for the comment. To the best of our knowledge, there have been no reports showing that CatSper activity is directly regulated by zinc ions.

      Furthermore, in our patch-clamp recordings with NH<sub>4</sub>Cl and ZnPy, we observed that the normal CatSper current increased even in the presence of ZnPy, which makes it challenging to conclude whether zinc directly affects CatSper channel activity.

      We added sentences to the discussion section in the revised manuscript as follows:

      “In addition to that, to date, there are only few reports on the effect of zinc on other sperm ion channels, and none have been reported in mouse sperm. One important study was reported by (Jeschke et al., 2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help preventing premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence for zinc acting directly on CatSper (Jeschke et al., 2021).”

      Final Assessment

      This work presents important findings on zinc regulation of Slo3 channels, supported by strong electrophysiological and molecular analyses. However, the physiological relevance of these findings remains unclear due to missing controls, and needs additional functional assays. Addressing these issues would significantly enhance the manuscript's scientific rigor and impact.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Most of the specific comments and suggestions are in the public review. Minor additional comments primarily focused on presentation and textual errors are here.

      (1) There is something strange happening in Figure 6D in the -100ish range. I think it's likely related to the reversal potential of K+.

      Thank you for pointing this out. Yes, the plot in Figure 6D behaves unusually in the range around -100 mV. As the reviewer suggests, we also think that this is related to the reversal potential of potassium ions.

      (2) There are a number of errors in the text that make following it difficult. For instance, multiple times the authors say "In consistent" (line 120 as an example) when I think they mean consistent with.

      We replaced “in consistent” with “consistent with” throughout the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      The authors provide well-described experiments, particularly those examining the effects of intracellular zinc on Slo3 channels using inside-out patch-clamp recordings. However, some experimental designs intended to assess the physiological relevance of these findings during capacitation require additional controls and data before the authors' claims can be fully supported.

      Comments

      Major Concerns & Suggested Improvements

      Line 65: "In the present study, we find that intracellular zinc is exported during capacitation, indicating that zinc dynamics in spermatozoa play an important role in fertilization."

      This claim requires additional experimental data to be fully supported.

      Thank you for pointing it out. We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Line 79: "Intracellular zinc is exported from sperm during capacitation."

      The authors should include controls in non-capacitated conditions to determine whether zinc export is specific to capacitation or a general process in sperm cells.

      Again, we have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Figures - General Comment:

      In all figures, please replace SEM (Standard Error of the Mean) with Standard Deviation (SD) for consistency and a more accurate representation of variability.

      SEM (Standard Error of the Mean) has been replaced with SD (Standard Deviation) in all figures (main figures and supplements) and in the corresponding numerical descriptions.
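      For readers comparing the two error measures, the relationship is SEM = SD / √n. The following is a minimal sketch using only the Python standard library. The data values and helper names are hypothetical and do not come from the manuscript.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of at least two measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(values)  # sample SD, using the n - 1 denominator
    sem = sd / math.sqrt(n)        # SEM shrinks as n grows; SD does not
    return sd, sem


def sd_from_sem(sem, n):
    """Recover the SD from a reported SEM and the sample size n."""
    return sem * math.sqrt(n)


# Hypothetical measurements (assumption for illustration only).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sd, sem = sd_and_sem(values)
print(f"n = {len(values)}, SD = {sd:.3f}, SEM = {sem:.3f}")
print(f"SD recovered from SEM: {sd_from_sem(sem, len(values)):.3f}")
```

      SD describes the spread of the individual data points, which is what the reviewer requested, whereas SEM describes the precision of the estimated mean.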

      Figure 1

      Panel B:

      Include a non-capacitating media control to confirm that the observed decrease in zinc-sensitive dye fluorescence is not due to artifact/photobleaching.

      We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Perform an experiment with capacitating media supplemented with a higher concentration of zinc. If intracellular zinc export is a real effect, added extracellular zinc should prevent or reduce this phenomenon.

      We appreciate the reviewer’s suggestion; however, we believe that supplementing the medium with high concentrations of zinc is unsuitable for validating the export phenomenon due to confounding physiological factors. Our preliminary tests demonstrated that increasing extracellular zinc triggers a drastic increase in intracellular zinc as well (Author response image 5). Furthermore, the high concentration of BSA in the capacitation medium acts as a potent zinc buffer, precluding precise control over free Zn<sup>2+</sup> levels. Therefore, the inherent difficulty in maintaining defined extracellular and intracellular Zn<sup>2+</sup> gradients makes the interpretation of such data highly problematic. Future studies will focus on identifying the specific zinc transporters involved and characterizing their molecular mechanisms.

      Author response image 5.

      Zinc addition

      Clarify whether the "n" value represents different cells or multiple recordings from the same cell.

      The n value represents different cells.

      Supplemental Figure 1:

      Incorporate Δ (delta) comparison between 10 min and 2 hours under control conditions and in the presence of TPEN.

      Here we provide data:

      Author response image 6.

      Δ comparison between control and TPEN

      Provide statistical analysis for these comparisons to make the effects of capacitation clearer.

      We performed the calculation and statistical analysis; however, there was no statistically significant difference, owing to the high variability of the individual data, as shown in Author response image 6.

      Figure 2

      Panel C:

      Incorporate inhibition at pH 7.4 and 6.0 for direct comparison.

      Recording the inhibitory effect of zinc at pH 6.0 is not possible because there would be no current to begin with, as mSlo3 is gated by both voltage and alkaline pH.

      Panel D:

      Include a washout control, similar to what is shown in Panel A.

      We included a washout control trace to Figure 2D.

      Panel E:

      Provide a longer reference trace in the absence of zinc to clearly visualize the control condition. The current reference segment is too short to properly assess baseline activity.

      Although we do not have a longer reference trace in the absence of zinc for Figure 2E, we instead show the trace recorded under the application of 0.1 µM zinc in Figure 2—figure supplement 1A to illustrate the current behavior.

      Panels G-H:

      Include inside-out patch-clamp traces and quantification of zinc washout effects.

      Inside-out patch traces are shown in Figure 2G, for which we applied a step-pulse protocol. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Panels I-K:

      Provide additional traces. In Panel I, the inhibition by zinc is clear, but in Panel J, the reduction appears less distinct and could be due to rundown or an artifact. Additional controls should clarify this.

      Figure 2K presents the most representative trace among five recorded cells. The apparent reduction is less distinct, likely due to an artifact caused by a bubble in the rapid perfusion system during solution exchange. However, at the end of zinc application (t = 50 s), the current amplitude was clearly reduced compared with that at t = 0–10 s.

      Figure 3

      Panel D:

      Include additional data showing the transition to pH 6 and washout with pH 7.5, similar to the experimental design in Panels A and B.

      We included additional data showing a raw trace of the application of pH 6.0 in Figure 3D, and we also included the transition to pH 6 and washout with pH 7.5 in Figure 3E.

      Figure 3-Supplement 1:

      Include zinc washout experiments. This approach is one of the best ways to evaluate the reversibility of zinc inhibition on the channel.

      As mentioned above, in this recording we recorded step pulses up to +180 mV. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Figure 6

      Zinc Inhibition Specificity:

      The authors should use quinidine, a known washable Slo3 inhibitor, to assess Slo3 activity before and after zinc injection.

      This experiment would confirm that zinc specifically inhibits Slo3, rather than affecting other endogenous channels.

      We sincerely thank the reviewer for this valuable suggestion. However, given the technical difficulty of these experiments, which involve lengthy VCF recordings and manual zinc injections that significantly compromise oocyte health, it is not feasible to apply quinidine at this stage.

      Moreover, we observed voltage-dependent fluorescence changes around the VSD, and this change was influenced by the application of zinc, confirming that zinc specifically inhibits Slo3 rather than affecting other endogenous channels.

      Discussion - Key Revisions Needed

      Line 308: "Our results demonstrated that intracellular zinc is exported from spermatozoa during capacitation."

      This claim needs to be supported by experiments using non-capacitated conditions.

      Additionally, measuring maximum and minimum zinc concentrations under different conditions would improve the interpretation of fluorescence intensity changes.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 309: "We further discovered that intracellular zinc regulates alkalinization-induced hyperpolarization in mice spermatozoa, mediated by Slo3 channel."

      Additional controls are needed to substantiate this claim.

      At this stage of the study, we do not have access to Slo3 knockout (KO) mice; therefore, performing additional experiments is not feasible.

      Line 316: "Using FluoZin3-AM for zinc imaging, we confirmed the presence of intracellular zinc in sperm (Fig. 1A), which is consistent with previous findings (Henkel et al., 1999). Our observations revealed that treatment with capacitation medium induced a decrease in zinc fluorescence intensity (Fig. 1B, C), suggesting that zinc levels are dynamic during capacitation."

      This statement must be supported by negative controls, including non-capacitated sperm conditions.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 327: "We also observed that zinc chelator significantly affected the sperm motility only after, but not before, capacitation (Fig. 1-figure supplement 1)."

      Data presentation should be revised to highlight the effects of capacitation itself.

      The discussion should specify which motility parameters were affected and why others were not.

      In the text we mentioned that:

      “We incubated the isolated spermatozoa with cell permeable Zn<sup>2+</sup> chelator N,N,N',N'-Tetrakis(2-pyridylmethyl)ethylenediamine (TPEN) and measured the motility parameters before and after capacitation. We found that VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity) were influenced by the TPEN treatment only after the capacitation, as shown in Fig. 1—figure supplement 1. These results demonstrate that the dynamics of zinc levels during capacitation potentially contributes to sperm motility, highlighting the importance of zinc action in sperm physiology.”

      Indeed, we observed that zinc chelator significantly affected the sperm motility specifically in VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity) only after, but not before, capacitation (Fig. 1—figure supplement 1). Of note, it has been recently reported that all these motility parameters (VAP, VCL, and VSL) are reduced by Slo3-specific inhibitors in human sperm (M. Lyon et al., 2023). These findings are consistent with the idea that endogenous zinc dynamics control sperm motility through Slo3 during the capacitation process.

      Figure legend is revised accordingly.

      Line 369: "Structural determinants of zinc inhibition in the mSlo3 channel."

      The authors should include an analysis of the evolutionary conservation of the mutated sites across Slo1, Slo2, and Slo3.

      If Slo3 has a unique regulatory mechanism, these sites should show high sequence variability compared to other Slo channels.

      If these sites are highly conserved, the authors should explain how Slo3 differs functionally from Slo1 and Slo2 despite this conservation.

      We thank the reviewer for the valuable suggestions regarding the inclusion of additional discussion points on the structural determinants of zinc inhibition in the mSlo3 channel. We performed sequence alignment by using ClustalO between mSlo3, mSlo1, and mSlo2.2. It is worth noting that only human and frog variants of Slo2.1 sequence are available in the database, so we included only Slo2.2 subtype, as our focus was on Slo3 in mouse sperm.

      Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, whereas E205 (mSlo3 numbering) is not. To date, there have been no reports examining the residues corresponding to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their zinc sensitivity. This might be because the zinc-binding sites in both channels are already well defined: they are located in the RCK1 domain for Slo1 (Hou et al., 2010) and the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified binding site in Slo2.2 is conserved in Slo2.1 but not present in Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. However, this does not rule out the possibility that regions surrounding E191 or E176 could provide additional insights into zinc regulation in these channels, which could be of interest for future studies.

      Interestingly, in contrast to E169, E205 is not conserved across the Slo family, making this residue unique to the mouse Slo3 channel and potentially a determinant of zinc sensitivity in mSlo3. Given that E205 is located in the S4 domain, and that our VCF results show that zinc inhibition influences the motion of the voltage-sensing domain of mSlo3, E205 represents an important residue to be explored in future studies. Furthermore, because this residue is unique to Slo3, it highlights the distinct functional properties of Slo3, such as its gating by both membrane voltage and alkalinization, with a voltage range of activation different from that of mSlo1 (Li et al., 2024) and with ligands and gating mechanisms distinct from those of Slo2 (J. Zhang et al., 2023).

      We added the sequence alignment results to Figure 5—figure supplement 1F.
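      The kind of conservation check described above can be sketched as follows: given a gapped multiple sequence alignment, locate the alignment column of a numbered residue in a reference sequence and test whether all sequences share the same residue in that column. The sequences below are short toy strings invented for illustration, not Slo channel sequences, and the function names are hypothetical.

```python
def column_of_residue(aligned_seq: str, residue_number: int) -> int:
    """Return the 0-based alignment column holding the given 1-based residue."""
    seen = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":  # gaps do not count toward residue numbering
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds sequence length")


def is_conserved(alignment: dict[str, str], reference: str, residue_number: int) -> bool:
    """Return True if every sequence has the same non-gap residue at that position."""
    column = column_of_residue(alignment[reference], residue_number)
    residues = {seq[column] for seq in alignment.values()}
    return len(residues) == 1 and "-" not in residues


# Toy aligned sequences of equal length (assumption for illustration only).
toy_alignment = {
    "ref":  "MK-ELDA",
    "seqA": "MKTELNA",
    "seqB": "MR-ELQA",
}

for position in (3, 5):
    status = "conserved" if is_conserved(toy_alignment, "ref", position) else "not conserved"
    print(f"ref residue {position}: {status}")
```

      In practice the alignment would be read from the output of an alignment tool; the dictionary here stands in for that step.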

      We revised the results section as follows:

      “Additionally, we performed sequence alignment by using ClustalO between mSlo3, mSlo1, and mSlo2.2. It is worth noting that only human and frog variants of Slo2.1 sequence are available in the database, so we included only Slo2.2 subtype, as our focus was on Slo3 in mouse sperm. Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, while in contrast E205 (mSlo3 numbering) is not. (Figure 5—figure supplement 1F).”

      We revised the discussion section as follows:

      “Based on sequence alignment, E169 (mSlo3 numbering) is conserved among Slo family channels in mice, whereas E205 (mSlo3 numbering) is not (Fig. 5—figure supplement 1F). To date, no studies have examined the corresponding residues to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their potential zinc sensitivity, likely because the established zinc binding sites in these channels are located in the RCK1 domain for Slo1 (Hou et al., 2010) and the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified zinc binding site in Slo2.2 is conserved in Slo2.1 but is absent in both Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. Although regions surrounding E191 or E176 may still provide additional insights into zinc regulation and could be of interest for future investigation, E205 stands out because, unlike E169, it is not conserved across the Slo family, making it unique to mSlo3 and potentially a specific determinant of zinc sensitivity in this channel.”

      Figure legend is revised accordingly.

      Line 392: "Physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm."

      The authors should mention the effects of zinc on CatSper channels, as CatSper is also crucial for capacitation.

      Slo3 inhibition may represent only one component of zinc's broader regulatory role during capacitation.

      We thank the reviewer for raising this important point regarding the physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm. We agree that we should have also discussed the effect of zinc on CatSper channels, as this channel is crucial for capacitation. To date, there are only a few reports on the effect of zinc on CatSper channels, and none in mouse sperm. One important study was reported by Jeschke et al. (2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help prevent premature activation of CatSper in the ejaculate and act as a dilution sensor, which facilitates the escape of sperm into the female genital tract (Jeschke et al., 2021). Taking this into consideration, as the reviewer pointed out, zinc inhibition of Slo3 may represent only one component of zinc’s broader regulatory role during capacitation.

      We added a sentence to the discussion section in the revised manuscript as follows:

      “In addition to that, to date, there are only few reports on the effect of zinc on other sperm ion channels, and none have been reported in mouse sperm. One important study was reported by (Jeschke et al., 2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help preventing premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence for zinc acting directly on CatSper (Jeschke et al., 2021).”

      The study presents valuable insights into the role of intracellular zinc in sperm capacitation and Slo3 channel function. However, the physiological impact of these findings remains unclear due to insufficient controls and missing key experimental data. The suggested revisions would strengthen the validity of the claims made by the authors and improve the overall scientific rigor of the manuscript.

      Key Areas for Improvement:

      Control experiments in non-capacitated conditions.

      Increased statistical rigor in figure analyses.

      More detailed experiments to confirm specificity of zinc action on Slo3.

      Expanded discussion of zinc's role beyond Slo3, including CatSper regulation.

      The authors should measure these effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.

      By addressing these concerns, the manuscript will provide a more robust foundation for understanding zinc's regulatory role in sperm physiology and capacitation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents valuable findings of an optimized E. coli cell-free protein synthesis (eCFPS) system that has been simplified by reducing the number of core components from 35 to 7; furthermore, the findings communicate a simplified 'fast lysate' preparation that eliminates the need for traditional runoff and dialysis steps. This study is an advance towards simplifying protein expression workflows, and the evidence provided is solid, starting with nanoluc, a protein that expresses readily in many systems, to applications to more challenging proteins like the functional self-assembling vimentin and the active restriction endonuclease BsaI. Data on the underlying mechanisms and efficiency of the presented system in terms of protein yield relative to other known cell-free systems would greatly enhance the findings' significance and the strength of the evidence. The paper remains of interest to scientists in microbiology, biotechnology and protein synthesis.

      We thank the editors for the positive assessment of our optimized E. coli cell-free protein synthesis (eCFPS) system and the "fast lysate" preparation.

      As suggested, we have significantly strengthened the evidence by adding:

      (1) Mechanism data: We have integrated a detailed analysis of the endogenous metabolic pathways (amino acids and nucleotides) into the Discussion section, supported by literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      (2) Efficiency comparisons: We have added quantitative comparisons of absolute protein yields between our simplified 7-component system and the conventional 35-component system (now in Figure S3 E-F), demonstrating that our system matches or exceeds traditional titers.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors only provided the data for optimization, leaving the underlying mechanism that explains the phenomena unexplained.

      We appreciate this feedback. To address the mechanism of how protein synthesis persists without exogenous additives, we have expanded the Discussion to explain how the "fast lysate" retains active endogenous enzymes. By omitting runoff and dialysis, our system preserves the metabolic capacity to synthesize amino acids (e.g., Cys and Trp from Ser) and nucleotides from residual precursors, as supported by the literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      Reviewer #2 (Public review):

      The production of the lysate requires special instrumentation, limiting accessibility. While the strengths of the study are well-emphasized, the limitations are not mentioned.

      We thank the reviewer for this point. While a high-pressure homogenizer is common in many molecular biology labs, we acknowledge it may be a barrier for some. We have now included a dedicated Limitations paragraph in the Discussion addressing accessibility and the inherent challenges of prokaryotic systems in producing complex human proteins requiring post-translational modifications.

      Reviewer #3 (Public review):

      (1) Clarification on "highly efficient" and the lack of comparison with typical high-yield systems.

      We have clarified "highly efficient" as a holistic balance of high yield, robustness, and simplified preparation. Crucially, we added absolute yield data (sfGFP standard curve) to Figure S3E-F demonstrating that our 7-component system performs comparably to or better than traditional high-yield protocols.

      (2) How did the authors ensure chemical composition only affected translation and not transcription?

      This is a key distinction. We performed new experiments using pre-transcribed mRNA templates (Figure S3G) to isolate translational effects. While translation efficiency slightly decreased in the simplified buffer, the overall protein yield increased significantly due to a dramatic boost in transcription efficiency, confirming the system's net performance gain.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are specific concerns that need to be addressed:

      (1) On page 4, lines 103-109, the authors speculate that protein synthesis persists even in the absence of amino acids like arginine, cysteine, and tryptophan. They suggest that this is likely due to residual amounts of these amino acids present in the cell lysate. Yokoyama et al. demonstrated that these amino acids are generated from other amino acids by endogenous amino acid metabolic enzymes in the cell lysate (J. Biomol. NMR 48, 193, (2010), doi: 10.1007/s10858-010-9455-3.). Cysteine and tryptophan can be derived from serine. In this context, asparagine and glutamine can be disregarded because they are synthesized from aspartate and glutamate, respectively. A more in-depth analysis is required to interpret the results accurately.

      We thank the reviewer for this insightful comment and for pointing us toward the relevant literature. We agree that the persistence of protein synthesis in the absence of exogenous amino acids like Arg, Cys, and Trp is driven by the robust metabolic capacity of our "fast lysate."

      Unlike conventional protocols, our "fast lysate" procedure deliberately omits runoff and dialysis steps, ensuring the maximal retention of active endogenous metabolic enzymes and residual small-molecule pools. As demonstrated by Yokoyama et al. (2010), E. coli cell extracts retain functional enzymes capable of synthesizing acid-sensitive amino acids from precursors or more stable amino acids. We have integrated a detailed mechanistic analysis of these endogenous metabolic pathways into the Discussion section and have cited Yokoyama et al. (2010) to support this interpretation.

      (2) On page 4, lines 111-115, the authors demonstrated that protein synthesis could occur even in the absence of CTP or UTP, provided ATP and GTP are present. This phenomenon can also be attributed to the analogous complementary actions of metabolic pathways.

      We agree with the reviewer's assessment. The ability of the optimized eCFPS to function without exogenous CTP/UTP relies on the same principle of endogenous metabolic conversion mentioned above. The omission of dialysis ensures that the lysate retains not only residual nucleotide pools but also the full suite of nucleotide metabolic enzymes. Powered by our optimized energy regeneration system, these enzymes maintain sufficient levels of CTP and UTP to support transcription and translation. This explanation has been added to the Discussion section to clarify the robustness of our system.

      (3) On Figure 3A, protein synthesis kinetics are presented in a stair plot instead of the commonly used scatterplot. Is there a specific reason for choosing the stair plot?

      We chose the stair plot representation to more clearly visualize the cumulative process of protein synthesis and its stabilization over discrete time intervals. Given that sampling occurred every 10 minutes, a stair plot effectively highlights the "plateau" phases and the incremental nature of accumulation, which can sometimes be obscured by dense scatter plots.

      (4) On Figure 3C. It is unclear which system is referred to as the "initial" system in Figure 3C. Which data point on Figures 3A and 3B corresponds to this "initial" system?

      We apologize for the lack of clarity. In Figure 3C, "initial" refers to the traditional 35-component system prior to our streamlining process. Figures 3A and 3B characterize the performance of the final optimized system alone. To resolve this ambiguity, we have updated the legend for Figure 3 to explicitly define the "initial" system as the pre-optimization control.

      (5) In Figure 5D, previously reported eCFPS and the system using "fast lysate" were compared. The only difference between the two systems seems to be the type of lysate used, according to the Supplementary table. Optimal concentrations for the components are the same for both lysates, or is there still room for optimization for "fast lysate"?

      The "fast lysate" primarily differs from conventional lysates in its preparation speed and the retention of endogenous cofactors/enzymes. While the optimal salt and energy concentrations remained consistent across both lysates in our tests, the "fast lysate" provides a higher baseline signal due to the endogenous T7 RNA polymerase and metabolic factors. We believe this demonstrates the robustness of the optimized reaction buffer across varying lysate preparation qualities.

      (6) The study suggests that the removal of DTT didn't negatively affect protein expression. However, based on my experience, certain proteins, especially those with cysteine residues on their surface, tend to aggregate without DTT. Did the authors attempt to express such proteins, or did they draw this conclusion based on the limited number of proteins tested?

      This is a valid concern. We based our conclusion on the functional expression of BsaI and vimentin, two proteins that are inherently prone to aggregation and misfolding. Their successful synthesis suggests that the intrinsic reducing capacity of the lysate (e.g., glutathione and thioredoxin systems) is sufficient for many targets (Prinz et al. 1997). However, we acknowledge that specialized cysteine-rich proteins may still require exogenous DTT. We have addressed this in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 77-78 "we iteratively evaluated the contribution of individual constituents through luciferase reporter assays" - where is all the data? Please use an appropriate figure citation. Figure 1 cherry picks some components, but I think all data should be included.

      We have structured the data presentation to show dispensable components in Figure 1 (where removal does not inhibit reaction) and essential components in Figure 2 (where 0-concentration results in zero activity). This ensures a logical flow of the "streamlining" narrative. All raw data for these screenings have been included in the Source Data files.

      (2) Line 127 typo "concentrations".

      We thank the reviewer for pointing out this error. The typo "concentrations" has been corrected.

      (3) Figure 2: "protein expression levels" measured how?/what is the unit of the vertical bar on the right? I'm assuming that this experiment was conducted for discrete concentrations and thus generated discrete data points. However, the graph makes it seem as if this is continuous data. Kindly change the type of graphing to indicate that this is discrete data, showing each data point.

      We appreciate the reviewer's suggestion. Protein expression levels were measured using the Nanoluciferase (NLuc) reporter gene assay. We used heatmaps/contour plots because our data are bivariate, representing the simultaneous optimization of two concentrations (e.g., Mg<sup>2+</sup> and K<sup>+</sup> in Figure 2A). For such matrix-based screenings, heatmaps convey synergistic trends and optimal reaction landscapes more effectively than scatter plots. The same visualization approach for discrete biochemical optimization data was used by the Ban lab in their recent study on translation system optimization (Bothe and Ban 2024). The vertical color bar on the right represents the relative expression ratio, normalized to the maximum yield. We have provided a scatter plot of the discrete data for reference (see Author response image 1), but the high density of data points makes it visually cluttered and obscures the overarching trends. To maintain transparency, the discrete concentration points tested are reflected by the axis ticks, and all raw discrete data are available in the Source Data files.

      Author response image 1.

      (4) Also, for all figures: the way the units are presented (DTT/mM) is confusing to me; it could just be something like [DTT] (mM).

      We have revised all figures and tables to follow the standard format (e.g., [Component] (unit)) as suggested.

      (5) Do the sucrose gradient sedimentation data have replicates? If so, please indicate statistics.

      The sucrose gradient data provided (Figure 5C) is intended as qualitative evidence that the "fast lysate" method preserves intact 70S ribosomes across different preparation batches. This experiment has been performed independently multiple times with consistent results, demonstrating the high reproducibility of our preparation method. While we did not perform a quantitative comparative analysis of ribosome concentration, the consistency of the peaks confirms the integrity of the translational machinery.

      (6) Line 457: fix the red line.

      We thank the reviewer for pointing this out. The formatting issue has been resolved in the revised manuscript.

      (7) Please mention the limitations of this study in the discussion.

      We thank the reviewer for this suggestion. We have added a paragraph to the Discussion addressing the limitations of prokaryotic systems regarding complex eukaryotic post-translational modifications and chaperone requirements.

      (8) Please include all uncropped gels in the source data, alongside the raw data, as you have already done.

      As requested, we have provided all original, uncropped gel images in the Source Data files, alongside the raw data, to ensure full transparency and compliance with the journal's data sharing policies.

      Reviewer #3 (Recommendations for the authors):

      (1) The study lacks a comparison of protein levels with a typical cell-free protein synthesis system.

      We have performed new quantitative experiments (now included in Figure S3 E-F) to measure absolute protein yields. Our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while utilizing significantly fewer components. We have also clarified in the text that "highly efficient" refers to the synergistic balance of high yield, low cost, and simplified preparation time.

      (2) What do the authors mean by "highly efficient", often used in the manuscript?

      We thank the reviewer for the opportunity to clarify our terminology. We have performed new quantitative experiments (now included in Figure S3) to measure absolute protein yields, demonstrating that our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while utilizing significantly fewer components.

      In the context of this manuscript, we use the term "highly efficient" as a holistic descriptor that encapsulates three key dimensions of the system:

      (1) Performance Superiority: Achieving higher expression levels and faster kinetics compared to conventional 35-component systems.

      (2) Functional Robustness: The ability to efficiently synthesize challenging targets, such as cytotoxic proteins (BsaI) and aggregation-prone proteins (vimentin), which often fail in simplified systems.

      (3) Practical Utility: A drastic reduction in preparation time and cost through the "fast lysate" protocol and the removal of 28 auxiliary components, thereby lowering the barrier to adoption.

      This definition aligns with the study's core objective: developing a system where efficiency is measured not only by final yield but by the synergy between high performance and extreme ease of use.

      (3) In this article, the term 'optimisation' is used as a synonym for 'simplification'. In biochemistry, optimisation commonly refers to an increase in yield, or the same yield achieved more easily or at a lower cost. In this case, however, we have no idea how this new system compares to a conventional expression system in terms of yield.

      We thank the reviewer for this conceptual clarification. We agree that in biochemistry, "optimization" typically implies an improvement in yield or cost-effectiveness. In our study, we use the term to describe the process of achieving a superior balance between system simplicity and protein production. To address the reviewer's concern regarding the lack of a direct yield comparison, we have added new data in Figure S3. This figure provides a side-by-side comparison of protein yields between our simplified 7-component system and the conventional 35-component system. The results demonstrate that our system not only matches the performance of the traditional setup but frequently exceeds it in terms of final protein titer, while significantly reducing the reagent cost and preparation complexity. Thus, the simplification achieved in this work represents a true biochemical optimization of the cell-free synthesis process.

      (4) The levels of transcripts of the proteins studied were not determined in any of the experiments performed. Therefore, it is unknown whether the effects of different experimental conditions on NLuc, GFP or other protein expression are due to an effect on transcription, translation, or both.

      This is an excellent point. We performed a new set of experiments using mRNA templates instead of DNA to isolate the effects on translation (Figure S3G). Our results indicate that while the system's overall boost in NLuc expression is partially attributable to enhanced transcription efficiency, the translation machinery remains highly robust. We have updated the Results and Discussion to reflect this distinction.

      References

      Bothe, Adrian, and Nenad Ban. 2024. “A Highly Optimized Human in Vitro Translation System.” Cell Reports Methods 4 (4): 100755.

      Kigawa, T., T. Yabuki, Y. Yoshida, M. Tsutsui, Y. Ito, T. Shibata, and S. Yokoyama. 1999. “Cell-Free Production and Stable-Isotope Labeling of Milligram Quantities of Proteins.” FEBS Letters 442 (1): 15–19.

      Prinz, W. A., F. Aslund, A. Holmgren, and J. Beckwith. 1997. “The Role of the Thioredoxin and Glutaredoxin Pathways in Reducing Protein Disulfide Bonds in the Escherichia Coli Cytoplasm.” The Journal of Biological Chemistry 272 (25): 15661–67.

      Yokoyama, Jun, Takayoshi Matsuda, Seizo Koshiba, and Takanori Kigawa. 2010. “An Economical Method for Producing Stable-Isotope Labeled Proteins by the E. Coli Cell-Free System.” Journal of Biomolecular NMR 48 (4): 193–201.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Bobola et al reports single-nucleus expression analysis with some supporting spatial expression data of human embryonic and fetal cardiac outflow tracts compared to adult aortic valves. The transcription factor GATA6 is identified as a top regulator of one of the mesenchymal subpopulations, and potential interacting factors and downstream target genes are identified bioinformatically. Additional bioinformatic tools are used to describe cell lineage relationships and trajectories for developmental and adult cardiac cell types.

      Strengths:

      The studies of human tissue and extensive gene expression data will be valuable to the field.

      Weaknesses:

      (1) The expression data are largely confirmatory of previous studies in humans and mice. Thus, it is not clear what novel biological insights are being reported. While there is some novelty and impact in using human tissue, there are extensive existing publications and data sets in this area.

      (2) Major conclusions regarding spatial localization, differential gene expression, or cell lineage relationships based on bioinformatic data are not validated in the context of intact tissues.

      (3) The conclusions regarding lineage relationships are based on common gene expression in the current study and may not reflect cellular origins or lineage relationships that have previously been reported in genetic mouse models.

      (4) An additional limitation is the exclusive examination of adult aortic valve leaflets that represent only a subset of outflow tract derivatives in the mature heart. The conclusion, as stated in the title regarding adult derivatives of the outflow tract, is not accurate based on the limited adult tissue evaluated, exclusive bioinformatic approach, and lack of experimental lineage analysis of cell origins.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Leshem et al. presents a transcriptomic analysis of the developing human outflow tract (OFT) at embryonic and fetal stages using snRNAseq and spatial transcriptomics. Additionally, the authors analyze transcriptomic data from the adult aortic valve to compare embryonic and adult cell populations, aiming to identify persistent embryonic transcriptional signatures in adult cells. A total of 15 clusters were identified from the embryonic and fetal OFT samples, including three mesenchymal and four endothelial clusters. Using SCENIC analysis on the embryonic snRNAseq data, the authors identified GATA6 as a key regulator of valve precursor cells. Spatial transcriptomic analysis of four fetal OFT sections further revealed the spatial distribution of mesenchymal nuclei, smooth muscle cells, and valvular interstitial cells. Trajectory analysis identified two distinct developmental origins of fetal mesenchymal cells: the neural crest and the second heart field. Finally, the authors used snRNAseq data from the adult aortic valve to propose that embryonic transcriptional signatures persist in a subset of adult cells.

      Strengths:

      (1) The study offers a rich and detailed dataset, combining snRNA-seq and spatial transcriptomics in human embryonic and fetal OFT, which are challenging to obtain.

      (2) The use of SCENIC and trajectory analysis adds mechanistic insight into cell lineage and regulatory programs during valve development.

      (3) This study confirms GATA6 as a key regulator of valve precursor cells.

      (4) Comparison between embryonic/fetal and adult datasets represents a novel attempt to trace persistence of developmental transcriptional programs.

      Weaknesses:

      (1) A major limitation is the lack of experimental validation to support key conclusions, particularly the claim of persistent embryonic transcriptional signatures in adult cells.

      (2) The manuscript would benefit from a clearer discussion of how these results advance beyond previous studies in human heart and valve development.

      (3) The comparison between embryonic and adult data is interesting, but would be more convincing with additional evidence supporting the proposed persistence of embryonic transcriptional signatures in adult cells.

      Reviewer #3 (Public review):

      Leshem et al have generated a transcriptional cell atlas of the human outflow tract at two developmental timepoints and its adult valvular derivatives. This carefully performed study provides a useful resource for the study of known genes implicated in outflow tract defects and potentially also for discovering new disease genes. The authors reveal neural crest and mesodermal contributions to different outflow tract components and show that GATA6, known to play a role in arterial valve development, controls a set of genes expressed in endocardium-derived cells during valve development. Interestingly, the results suggest lineage persistence of expression of certain genes through to the adult timepoint, a main new finding of this study.

      The following points should be addressed to reinforce the conclusions and emphasize the novel features of this study.

      (1) It would be helpful to clarify how these new findings confirm or diverge from what is known from analysis of neural crest and mesodermal lineage contributions to different cell populations in the mouse heart. Did the authors identify any human-specific populations of cells, such as the LGR5 population reported by Sahara et al?

      (2) The authors should clarify in the introduction and results that they consider the endocardium to be on the SHF trajectory as indicated in Figure S4C. Please add a reference for this point.

      (3) The GATA6 results are interesting and support this experimental approach. The paper would be reinforced if the authors could provide any functional validation (in addition to their GATA6 genomic occupancy data) that the designated target genes are regulated by GATA6. This might involve looking at mutant mouse embryos or cultured cells. Do the authors consider that GATA6 may regulate the endocardial to mesenchymal transition during the early stages of valve development? Or the valve interstitial cell versus fibroblast fate choice?

      (4) Do the new findings reveal whether human valves have a direct SHF to VIC trajectory (ie, without transiting through endocardium) as has been recently shown in the murine non-coronary valve leaflet? Relevant to this point, Figure 5E appears to show contributions to a single adult aortic valve leaflet - this should be explained, or corrected.

      We sincerely thank the Editor and the Reviewers for their constructive and insightful comments. We have carefully addressed the majority of the points raised and believe the revisions have substantially strengthened the manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Overall, the reviewers felt that integrating these datasets with prior snRNAseq datasets on human OFT (de Bono et al, 2025) would enhance analyses and provide broader context.

      Several human fetal heart single-cell datasets have been published, including De Bono et al, 2025. We carefully considered whether integrative analyses with these datasets would further strengthen our study. However, there are substantial differences in anatomical scope: most published datasets encompass broad cardiac regions, whereas our study specifically targets the OFT, enabling higher-resolution characterization of OFT-specific cell states. Integration across datasets with markedly different regional compositions would likely be driven by large-scale anatomical differences rather than yield additional OFT-specific insight. In addition, cross-study integration requires batch correction. When datasets differ in anatomical scope, developmental timing, and experimental protocols, stronger correction may be needed, increasing the risk of overcorrection and loss of biologically meaningful OFT-specific signals.

      Importantly, our dataset has been deposited in the Human Cell Atlas and is fully available for future comparative analyses. We therefore believe that broader cross-dataset integration is best undertaken within such harmonized frameworks as more closely matched datasets become available.

      Overall, cluster annotations should be more rigorous, which may be facilitated by comparisons with earlier studies.

      We have clarified all the points raised by the reviewer regarding cluster annotation. Specifically: (1) the “cardiac” cluster has been renamed “cardiac muscle” to more accurately reflect its transcriptional identity; and (2) we now explicitly state that mesenchymal populations not resolved in the initial global analysis (across all samples) were subsequently defined through dedicated subclustering analyses performed separately for the adult and developmental datasets. These clarifications have been incorporated into the revised manuscript.

      Citation of other spatial transcriptomics studies on human OFT would be useful.

      We apologise for missing these contributions. They have now been added to the text.

      Can the authors identify a human-specific population of cells, such as the LGR5 population reported by Sahara et al?

      While our dataset does not reveal a novel single-gene marker comparable to the human-specific LGR5 marker described by Sahara et al., it does identify a distinct GATA6-enriched embryonic mesenchymal population that functions as a human valve progenitor lineage. Using regulatory network analysis, RNA velocity, lineage tracing and spatial transcriptomics, we show that this GATA6-driven program is specifically associated with semilunar valve morphogenesis and that its transcriptional signature persists in fetal and adult VIC populations. Thus, the novelty of our study lies in defining this human GATA6-regulated valve progenitor population and its lineage trajectory, rather than in the identification of previously unreported single marker genes.

      “….Although we have not defined a novel single-gene marker (analogous to LGR5 [Sahara et al]), our identification of a GATA6 network highlights…..”

      Further investigation of the specific role of GATA6 would strengthen findings.

      FISH studies would indicate whether GATA6 is involved in EMT or fibroblast versus valve interstitial cell fate choice.

      We have added a panel to Fig. S2 (D), showing that GATA6 expression is not restricted to specific outflow tract populations. In CS16-17 embryos, GATA6-expressing nuclei are detected across all embryonic clusters. Given this broad expression pattern, FISH analysis would not distinguish whether GATA6 functions in EMT or in fibroblast versus valve interstitial cell fate specification. While we cannot exclude the possibility that GATA6 contributes to EMT, we observe that its expression levels are highest in cluster 4 (post-EMT) cells. This suggests that GATA6 activation is more likely a consequence of the transition rather than its initiating cause (shown in Fig. S2D).

      Functional validation of some proposed GATA6 targets would strengthen findings.

      To our knowledge, there are currently no publicly available datasets defining the GATA6 regulatory network in human OFT cells or valvular fibroblast progenitors. Existing datasets focus primarily on cardiomyocytes, which arise from a distinct developmental lineage. Given the well-established cell-type and context dependence of transcription factor activity, these datasets are unlikely to provide meaningful insight into regulatory relationships within the valvular lineage examined here.

      As noted in the original submission, we previously leveraged published mouse GATA6 ChIP-seq data from E11.5 OFT (DOI: https://doi.org/10.7554/eLife.31362) as independent support for the GATA6 regulon identified in our human dataset. In this revised version, we have extended this analysis by formally quantifying the overlap between the cluster 4 GATA6 regulon and genes bound by GATA6 in the mouse OFT dataset. Using a hypergeometric enrichment test, we found that the observed overlap is approximately two-fold greater than expected by chance and highly significant (p = 1.2 × 10<sup>-33</sup>). This statistical analysis strengthens our original interpretation and provides quantitative support that the identified regulon is strongly enriched for bona fide GATA6-bound targets in a closely related developmental context.

      In addition, we examined the spatial expression pattern of the GATA6 regulon gene set and found that it specifically localizes to the semilunar valves (OFT derivatives), consistent with GATA6 activity in this developmental context. This new analysis has been incorporated into Figure 2F of the revised manuscript.

      Collectively, the cross-species binding enrichment and valve-specific expression pattern provide orthogonal support for the biological relevance of the identified GATA6 regulon and strengthen the mechanistic interpretation of GATA6 function in OFT and valve development.
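The hypergeometric enrichment test mentioned in this response can be illustrated with a minimal Python sketch. It computes a fold enrichment and an upper-tail p-value for the overlap between a regulon and a set of bound genes. The function name, the background size, and all gene counts below are hypothetical placeholders chosen for illustration; they are not the values behind the reported result, and the authors' actual implementation is not described in the response.

```python
from math import comb


def hypergeom_enrichment(population, bound, regulon, overlap):
    """Fold enrichment and P(X >= overlap) under a hypergeometric null.

    population: number of genes in the background (assumed universe)
    bound:      background genes bound by the factor (e.g. from ChIP-seq)
    regulon:    number of genes in the regulon being tested
    overlap:    regulon genes that are also bound
    """
    # Overlap expected by chance when drawing `regulon` genes at random.
    expected = regulon * bound / population
    total = comb(population, regulon)
    upper = min(bound, regulon)
    # Exact upper tail: sum the probability mass for every overlap >= observed.
    tail = sum(
        comb(bound, k) * comb(population - bound, regulon - k)
        for k in range(overlap, upper + 1)
    )
    return overlap / expected, tail / total


# Hypothetical counts for illustration only (not the study's data).
fold, p_value = hypergeom_enrichment(
    population=20000, bound=2000, regulon=300, overlap=60
)
print(f"fold enrichment = {fold:.2f}, p = {p_value:.3g}")
```

The choice of background (here an assumed 20,000 genes) strongly affects the p-value, so in practice it would need to match the set of genes that could have been detected in both datasets.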

      As GATA6 has been previously identified in mouse studies, can the authors identify novel transcription factors potentially involved in OFT development?

      To identify additional transcription factors potentially involved in OFT development and to define regulators that may confer specificity to GATA6 activity, we compared the GATA6 regulon with the regulons of other cluster 4 transcription factors identified by SCENIC (SOX4, GLI3, RARG, ETV1, GLIS3, BACH2, ZNF423, FOXO3, ZBTB20).

      While all cluster 4 regulators share some downstream targets, the GLI3 regulon showed approximately twice the degree of overlap with the GATA6 regulon compared to the other factors. This suggests a potential functional interaction between GATA6 and GLI3 in OFT-associated mesenchyme. Consistent with this, cooperation between GATA6 and GLI3 has been reported in mouse limb development. These findings have now been incorporated into the Results section, and co-expression of GATA6 and GLI3 in CS16-17 populations is shown in Figure S2D-E.

      Although GATA6 has previously been implicated in OFT development, SCENIC analysis provides mechanistic insight by defining the downstream gene programs active in specific human embryonic lineages. Thus, the novelty of our findings lies not in re-identifying GATA6, but in characterizing its regulon in human OFT- and valve-associated mesenchyme and identifying potential cooperating regulators such as GLI3.
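The regulon comparison described in this response can be sketched as a simple set-overlap ranking. The Python example below is a minimal illustration under stated assumptions: the overlap metric (fraction of the reference regulon shared with each other regulon) is an assumption, since the response does not specify how overlap was quantified, and the transcription factor labels and gene names are placeholders rather than SCENIC output.

```python
def overlap_fraction(reference, other):
    """Fraction of genes in `reference` that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


def rank_by_overlap(regulons, reference_tf):
    """Rank every other regulon by its overlap with the reference regulon."""
    reference = regulons[reference_tf]
    scores = {
        tf: overlap_fraction(reference, genes)
        for tf, genes in regulons.items()
        if tf != reference_tf
    }
    # Highest overlap first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy regulons with placeholder names (hypothetical, not the study's data).
toy_regulons = {
    "TF_ref": {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"},
    "TF_a": {"g1", "g2", "g3", "g4", "g20"},
    "TF_b": {"g5", "g6", "g21", "g22"},
    "TF_c": {"g7", "g8", "g23"},
}
ranking = rank_by_overlap(toy_regulons, "TF_ref")
for tf, score in ranking:
    print(f"{tf}: {score:.2f}")
```

A raw overlap fraction does not correct for regulon size, so a larger regulon will tend to share more targets by chance; a size-aware measure such as an enrichment test may be preferable when regulons differ greatly in size.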

      Embryonic signatures in adult valve cells are an interesting finding that should be further explored by pseudotime trajectories, which may also indicate whether SHF cells have a direct trajectory to VIC (without transiting endocardium), as recently shown in mice.

      We included all embryonic populations, including cardiac progenitor cells (SHF), in the pseudotime trajectory analysis. However, we did not observe evidence of a direct trajectory from SHF cells toward VIC. In contrast, the same analysis consistently identified a trajectory linking endocardial cells to VIC, supporting an endocardial origin in our dataset.

      Reviewer #1 (Recommendations for the authors):

      (1) Major conclusions regarding cell lineages and derivatives are based on common gene expression patterns and bioinformatic tools. Thus, these conclusions are not based on empirical data, and assumptions regarding lineages based on gene expression may not be accurate. The language related to lineage analysis, derivative, and longitudinal gene expression is not supported by data. For example, studies in mice have shown that aortic valve interstitial cells from endocardial cushions and neural crest-derived lineages have overlapping patterns of ECM gene expression and cannot be easily distinguished in adults. Thus, it is not possible to determine derivation and cell origins based on gene expression alone.

      While we fully acknowledge that gene expression-based analyses provide correlative rather than direct lineage-tracing evidence, the Reviewer’s statement that “it is not possible to determine derivation and cell origins based on gene expression alone,” and the example cited in support, appear to equate global transcriptional similarity with the distinct embryonic transcriptional signatures that underpin our analysis.

      As the Reviewer notes, a given differentiated cell type can derive from different embryonic progenitors. Due to functional convergence, differentiated cells often exhibit highly similar expression profiles that reflect their shared function rather than developmental origin. Consequently, discriminating embryonic origins based on global expression profiles, or even for highly distinctive genes of differentiated cells, is very challenging. The example cited by the Reviewer - overlapping ECM gene expression in aortic valve interstitial cells derived from endocardial cushions and neural crest - illustrates precisely this point.

      However, our analysis does not rely on global transcriptional similarity or on markers of mature differentiated cells. Instead, we specifically identified gene sets that are highly distinctive of embryonic clusters prior to the onset of differentiation. These signatures are enriched for transcription factors and signaling molecules that define developmental identity, rather than functional effector genes associated with mature cell states. We have shown that these embryonic signatures persist in fetal cells (which already express differentiated markers but are developmentally closer to the embryonic stage relative to adult cells) and remain detectable, albeit attenuated, in adult cells. It is these distinctive embryonic transcriptional signatures, rather than global or shared functional gene expression, that we have used to infer potential lineage relationships.

      We fully acknowledge that this constitutes correlative evidence rather than direct lineage tracing, which is not feasible in human studies. However, the persistence of embryonic regulatory signatures into fetal and adult stages provides a biologically plausible link to developmental origin. This persistence most plausibly reflects partial retention of ancestral embryonic transcriptional programs in descendant cells, rather than de novo activation later in life of embryonic genes that were never previously expressed in that cell’s lineage.

      (2) Most of the findings related to cell composition, gene expression, and cell lineages seem to be largely confirmatory of previous reports. Novel findings should be emphasized and validated in the tissues.

      We agree that several aspects of our dataset reproduce and extend findings from previous human and animal studies, which we regard as an important validation of the atlas. However, our study also provides multiple novel insights that are directly supported by our spatial data. Specifically, we (i) identify a GATA6-enriched embryonic mesenchymal valve progenitor population, (ii) delineate its GATA6 transcriptional regulon and direct targets implicated in OFT and valve disease, and (iii) trace its embryonic transcriptional signature into fetal and adult valve interstitial cell populations. These findings are strengthened by our spatial transcriptomic data, which map the GATA6 regulon and key targets to the semilunar valves and adjacent arterial root, providing in situ validation of both cell identity and gene expression patterns (see Fig. 3 and the newly added Fig. 2F). We have revised the Discussion to more explicitly highlight these novel aspects and their spatial validation in the final paragraph:

      “In summary, our work goes beyond confirming previously reported cell types by (i) defining a GATA6-regulated human valve progenitor lineage and its descendants, (ii) establishing distinct embryonic origins for smooth muscle and valvular fibroblasts, and (iii) demonstrating persistence of embryonic signatures in adult valve cell populations. These findings are directly supported in tissue by our spatial transcriptomics data, which map these lineages and regulatory programs to defined anatomical domains within the human OFT and semilunar valves.”

      (3) The developing outflow tract of the heart contributes to more than just the aortic valve leaflets in adults. Additional conotruncal structures need to be evaluated in order to define adult derivatives of the developing outflow tract as described in the title.

      The title has been changed to reflect that only adult aortic valves were examined.

      (4) Major conclusions regarding the GATA6 regulatory network and downstream target genes are not validated in the context of the developing outflow tract or adult valves. Is GATA6 expression restricted to specific outflow tract populations? Is GATA6 binding or responsive gene expression detected for the indicated target genes?

      We performed additional analyses that further reinforce the relationship between GATA6 and its target genes and support the biological relevance of GATA6 downstream targets in arterial valve development. Below, we address the specific questions raised by the reviewer.

      (1) Is GATA6 expression restricted to specific outflow tract populations?

      GATA6 expression is not restricted to specific outflow tract populations. In CS16-17 embryos, GATA6-expressing cells are detected across all embryonic clusters; however, expression levels are highest in cluster 4 (valve precursor cells).

      Despite this broad expression pattern, SCENIC identifies GATA6 activity (i.e., a GATA6 regulon) specifically in cluster 4. This apparent restriction of GATA6 regulatory activity to cluster 4 may be explained, at least in part, by its elevated expression levels within this cluster. Alternatively, given that transcription factors often act in a combinatorial manner, GATA6 may co-regulate its target genes in cluster 4 together with additional cluster-specific regulators. To explore this possibility, we compared the GATA6 regulon with the regulons of other cluster 4 transcription factors identified by SCENIC (namely SOX4, GLI3, RARG, ETV1, GLIS3, BACH2, ZNF423, FOXO3, ZBTB20) in order to identify potential co-regulatory modules. As expected, since these regulons are sampled from the subset of genes enriched in cluster 4, all regulators share a substantial proportion of downstream targets with GATA6. However, GLI3 stands out, showing approximately twice the degree of overlap compared to the other factors. This suggests a functional interaction between GATA6 and GLI3, consistent with previously reported cooperation in mouse limb development. These results have been incorporated into the Results section, and the expression of GATA6 and GLI3 in CS16-17 cell populations is shown in Fig. S2DE.
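
      The regulon comparison described above can be sketched as a simple set-overlap calculation. This is a minimal illustration only: the gene names are hypothetical placeholders rather than SCENIC output from this study, and scoring overlap as the fraction of GATA6 targets shared with each other regulator is an assumption about how such a comparison might be done.

```python
def overlap_fraction(reference: set[str], other: set[str]) -> float:
    """Fraction of genes in `reference` that also appear in `other`."""
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


def rank_partners(regulons: dict[str, set[str]], tf: str = "GATA6"):
    """Rank all other regulators by how much of `tf`'s regulon they share."""
    ref = regulons[tf]
    scores = {
        name: overlap_fraction(ref, genes)
        for name, genes in regulons.items()
        if name != tf
    }
    # Highest overlap first; ties keep their original insertion order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical regulons with placeholder target names (illustration only).
regulons = {
    "GATA6": {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"},
    "GLI3": {"g1", "g2", "g3", "g4", "g9"},
    "SOX4": {"g1", "g2", "g10"},
    "RARG": {"g3", "g5", "g11"},
}

ranked = rank_partners(regulons)
```

      In this toy input the second-listed regulator shares twice as many GATA6 targets as the others, which mirrors the kind of pattern reported for GLI3 without reproducing the actual data.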

      (2) Is GATA6 binding or responsive gene expression detected for the indicated target genes?

      We were unable to find public data describing the GATA6 regulatory network or its downstream targets in the specific human cell types examined here (OFT cells; valvular fibroblast progenitors). Available datasets focus primarily on cardiomyocytes, which arise from a distinct lineage, and because transcription factor function is highly cell-type and context dependent, these datasets are unlikely to be helpful in inferring regulatory relationships in the valvular lineage.

      The strongest validation for the GATA6 regulon identified in this study comes from the mouse GATA6 occupancy data (included in the original manuscript). Although derived from a different species, GATA6 binding has been profiled in a highly related developmental context, the OFT. To assess the relevance of these data to our human findings, we performed a hypergeometric test comparing the GATA6 regulon identified in cluster 4 (this study) with genes bound by GATA6 in E11.5 mouse OFT ChIP-seq data (DOI: https://doi.org/10.7554/eLife.31362). The observed overlap is substantially greater than expected by chance: it is approximately twice the expected value, and the enrichment is highly significant (p = 1.2 × 10⁻³³). Biologically, this strongly supports the interpretation that many genes within the GATA6 regulon are likely to be direct GATA6 targets, or at minimum are strongly associated with GATA6 binding, rather than representing a random gene set. This analysis has been added to the revised manuscript.
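
      For readers unfamiliar with this kind of enrichment test, the calculation can be sketched as follows. This is a minimal sketch, not the analysis code used in the study: the function name and all gene counts are hypothetical, and the choice of gene universe (here a round number) is an assumption that strongly affects the resulting p-value.

```python
from math import comb


def hypergeom_enrichment(n_universe, n_bound, n_regulon, n_overlap):
    """Return (expected overlap, fold enrichment, upper-tail p-value).

    n_universe: number of genes considered in total (assumed background)
    n_bound:    genes bound by the factor in the ChIP-seq data
    n_regulon:  genes in the regulon being tested
    n_overlap:  regulon genes that are also bound
    """
    expected = n_regulon * n_bound / n_universe
    fold = n_overlap / expected
    total = comb(n_universe, n_regulon)
    upper = min(n_bound, n_regulon)
    # P(X >= n_overlap) when drawing n_regulon genes without replacement.
    tail = sum(
        comb(n_bound, k) * comb(n_universe - n_bound, n_regulon - k)
        for k in range(n_overlap, upper + 1)
    )
    return expected, fold, tail / total


# Hypothetical counts chosen so that the overlap is twice the expected value.
expected, fold, p_value = hypergeom_enrichment(
    n_universe=20000, n_bound=4000, n_regulon=300, n_overlap=120
)
```

      Exact integer arithmetic is used for the binomial coefficients so that very small tail probabilities are not lost to rounding before the final division.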

      In this revised version of the manuscript, we also overlaid the expression of GATA6 regulon genes onto our fetal spatial transcriptomics data. The GATA6 regulon was identified in embryonic cluster 4, whose expected trajectory is fetal valvular fibroblasts (cluster 12). Remarkably, GATA6 regulon genes are expressed in both the aortic and pulmonary valves, and their expression pattern aligns closely with HAPLN1-positive valvular fibroblasts (cluster 12), further supporting the biological relevance of this gene set. These new data have been added to Fig. 2F.

      Together, the strong enrichment of GATA6 regulon genes among GATA6-bound targets in the OFT, and the specific expression of this gene set within the arterial valves (cluster 4 descendant cells), support the biological relevance of GATA6 downstream targets in arterial valve development and disease. In addition, we identify GLI3 as a potential GATA6 co-binding partner.

      (5) What are "cardiac" cell types in the embryonic single cell clustering? Are these cardiomyocytes? Cardiac is an ambiguous term if the cells being analyzed are all in the heart.

      Thank you for highlighting this ambiguity. The “cardiac” population refers specifically to cardiac muscle cells. We have updated the labels in Fig. 1E, 1F, and Fig. S3A to make this explicit.

      (6) The methods and analytical tools seem fairly standard for single nuclear gene expression and spatial genomics studies. What are the new tools and resources being reported? The "novel lineage tracing algorithm" mentioned in the methods is not well described. A Cellxgene VIP app is mentioned, but is not described in detail. Also, it seems to be housed on a local server, which is not optimal.

      The description of the lineage tracing algorithm has been expanded in the Methods section of the paper.

      The data have been submitted to the Human Cell Atlas, a coordinated global effort to systematically map human cell types using standardized, interoperable formats. Public access via CELLxGENE enables interactive visualization, gene-level queries, and cross-dataset comparisons without requiring advanced computational expertise. This broad accessibility enhances reproducibility, facilitates integration with complementary single-cell and spatial datasets, and maximizes the visibility, transparency, and long-term impact of our work.

      (7) Only adult aortic valves from females were included in the study.

      The rationale for using female tissues has been explained in the Results section:

      We collected female samples to mitigate individual variability and maximise the possibility to analyse healthy aortic valves, justified by the lower incidence and severity of aortic disease in females versus males.

      (8) In many of the figures, the font size of the text is too small to read.

      We have increased the font size in all figures where this was compatible with the layout. For the larger plots, additional enlargement would necessitate scaling the panels beyond the allowable page dimensions, and therefore could not be implemented.

      (9) "CAT" is not a commonly used abbreviation for congenital heart anomalies related to persistent truncus arteriosus.

      CAT is now the preferred term for PTA as latinised terms are no longer used.

      Reviewer #2 (Recommendations for the authors):

      Overall, this study is thoughtfully conducted and offers valuable observations that contribute to our understanding of valve morphogenesis. However, my main concern is the lack of experimental validation to support the findings, particularly the conclusion regarding the persistence of transcriptional signatures in adult cells, which is not sufficiently substantiated or clearly argued. It is unclear how this study advances beyond previous research in humans.

      Major points:

      (1) Several recent studies have applied spatial transcriptomics to human embryonic and fetal hearts, including OFT (Asp et al., 2019; Queen et al., 2023; Farah et al., 2024; De Bono et al., 2025). It is disappointing that the authors did not acknowledge these important contributions.

      We apologise for missing these contributions. They have now been added to the text.

      (2) The present study used snRNAseq to explore the transcriptional signature of the fetal OFT. A similar approach was used by De Bono et al. (2025) to analyze fetal hearts. Integrating these complementary snRNAseq datasets could enhance the current analysis and provide broader context for the findings.

      The reviewer suggested that integrating our datasets with prior snRNA-seq datasets on the human OFT (De Bono et al., 2025) could enhance the analyses and provide broader context. While several fetal heart datasets have been published (e.g., Sahara et al.), our study focuses specifically on the OFT. These other studies do not perform cross-dataset comparisons. We therefore do not see a strong rationale for integrating ours, especially given that those datasets cover much larger regions of the heart.

      (3) Figure 1 presents 18 distinct clusters identified through unsupervised clustering. The authors classify three of these clusters broadly as mesenchymal cells. However, the term "mesenchymal cells" lacks precision. The authors should clarify why these clusters were not more specifically defined as fibroblasts or myofibroblasts based on marker expression.

      Clustering of the full dataset does not provide sufficient resolution to distinguish all mesenchymal cell types. The clusters broadly annotated as mesenchymal comprise heterogeneous populations, including both undifferentiated embryonic mesenchymal cells and more differentiated fetal mesenchymal cells. These mesenchymal clusters were therefore further subclustered, and the resulting cell identities are described in detail in the Results sections corresponding to Fig. 2 and Fig. 3.

      (4) The authors used SCENIC on their snRNAseq datasets to infer key cell fate regulators and identified GATA6 as a top regulator of embryonic mesenchymal cluster 4. However, the rationale for focusing on GATA6, which is already known to be associated with CHD in humans, is not fully convincing. Why not investigate a transcription factor whose role in valve development remains unexplored?

      There are two key outcomes from a SCENIC analysis: (1) the identification of major transcriptional regulators driving the differentiation of a given cluster, and (2) the identification of their regulons (the downstream gene programs they control). While GATA6 is indeed already known to be associated with CHD in humans, including valve malformations and major OFT defects, its downstream targets in the relevant human developmental lineages have not been defined. Understanding these targets is essential for clarifying the molecular basis of GATA6-mediated CHD. Thus, the significance of our result does not lie in the rediscovery of GATA6 as a CHD-related factor, but in identifying the genes it regulates in embryonic OFT- and valve-associated mesenchyme. These GATA6-controlled genes in the OFT and valves represent biologically plausible candidate genes for human OFT defects, as disruption of GATA6 targets could similarly contribute to CHD.

      In this revised version we have performed a hypergeometric test showing that GATA6 regulon genes are significantly enriched among genes bound by GATA6 in the OFT. Biologically, this strongly supports the interpretation that many genes within the GATA6 regulon are likely to be direct GATA6 targets, or at minimum are strongly associated with GATA6 binding in the OFT, rather than representing a random gene set.

      We have also mapped the expression of the GATA6 regulon to the semilunar valves. Collectively, these analyses demonstrate that the GATA6 regulon captures a biologically coherent and developmentally relevant program, offering new mechanistic insight into how GATA6 influences OFT and valve formation and how its disruption may contribute to CHD.

      (5) Several studies have already suggested a role for GATA6 in EMT. Do the authors propose that GATA6 regulates this process during embryonic valve development? Once again, validation using FISH would be important to support these findings.

      We do not propose that GATA6 directly regulates EMT during embryonic valve development. Rather, we make two independent observations: (1) cluster 4 derives from cluster 7 (likely through EMT); (2) GATA6 regulates cluster 4-specific genes.

      The first observation is supported by RNA velocity, which links cluster 7 to cluster 4. Supporting this interpretation, endothelial cluster 7 is enriched for genes associated with arterial valve development, and mesenchymal cluster 4 cells are identified as progenitors of fetal valve fibroblasts. Because cluster 7 is endothelial and cluster 4 is mesenchymal, this trajectory suggests an endothelial-to-mesenchymal transition.

      Second, SCENIC analysis identifies GATA6 as a regulator of cluster 4 genes. Additionally, the GATA6 regulon shows distinct localization to the formed valves in fetal cells (new data added to Fig 2F). Together these findings support the notion that GATA6 regulates a gene program specific to the cell populations that will give rise to the valves and that these genes remain selectively expressed in valve cells once the arterial valves have formed.

      While we cannot exclude the possibility that GATA6 contributes to EMT, we observe that GATA6 expression levels are highest in cluster 4 (post-EMT) cells, suggesting that its activation may be a consequence of the transition rather than its initiating cause (now shown in Fig S2D).

      For validation using FISH, please see our response to point 6 below.

      (6) I found it curious that the ST section was used to validate MECOM expression (Figure 2I), while ST had not yet been introduced at this point in the manuscript. Validation using FISH would have been a more appropriate approach.

      Thank you for drawing attention to this discrepancy. Spatial transcriptomics is now introduced before the MECOM analysis, in the Results section pertaining to Figure 2F:

      “…spatial transcriptomic analysis of a later stage (12pcw) OFT shows that GATA6 regulon is mainly restricted to the aortic and pulmonary valves (Fig 2F)”.

      With regard to this and the above comment concerning FISH, while RNA FISH/RNAscope would provide an additional orthogonal approach, the Visium-based spatial transcriptomics platform directly measures MECOM transcripts in tissue sections and, in our view, represents an appropriate and sufficiently sensitive method for validating its spatial distribution in the human OFT. We have therefore relied on the spatial transcriptomics dataset to confirm and validate gene expression patterns, rather than performing additional FISH experiments. We now explicitly state that this approach serves as an independent in situ validation of gene expression, including MECOM.

      (7) "Spatial resolution of mesenchymal nuclei in the OFT" section: It is unclear which cluster the authors are referring to in this section.

      As mentioned in the text, we “mapped the five fetal mesenchymal clusters to distinct structures in the OFT” and used distinctive markers to confirm spatial assignments.

      (8) The authors should justify their choice to use Cell2location instead of a deconvolution method.

      We selected cell2location because it provides a probabilistic, hierarchical Bayesian framework that explicitly models technical variability across both single-cell reference data and spatial transcriptomics platforms. Rather than relying on predefined marker genes or simple linear regression, cell2location leverages the full transcriptomic profile of reference single-cell data and incorporates a factor analysis-based framework to model shared transcriptional signatures and latent structure across cell types. This approach improves discrimination between closely related cell states and reduces sensitivity to gene selection bias. Additionally, the probabilistic formulation yields uncertainty estimates for inferred cell abundances, enhancing interpretability and statistical rigor. Together, these features make cell2location particularly well suited for resolving complex cellular composition in our fetal human tissue spatial transcriptomics data.

      (9) Figure 3: Cluster 9 is identified as endothelial, yet it includes markers such as MYH11 among its top genes, a gene more commonly associated with cells at the base of the aorta. This raises questions about the accuracy of the cluster annotation.

      We could not find the definition of cluster 9 as endothelial to which the reviewer refers. In Fig 3, both in the Results text and in the figure legend, cluster 9 is identified as smooth muscle, which is consistent with MYH11 expression. The endothelial cluster is shown in Fig S3C.

      (10) The approach used to trace embryonic signatures in adult cells, based on overlap with the top 100 genes in embryonic clusters, relies largely on gene expression similarity, without incorporating lineage inference tools such as RNA velocity or pseudotime analysis. This limits the ability to distinguish true developmental relationships from shared functional programs. I believe that the use of aggregated adult samples may mask individual variability. Validation in separate samples (AV1 and AV3) lacks statistical rigor. The observed lower expression of embryonic genes in adult cells further complicates interpretation, raising the possibility that these signatures reflect residual expression rather than persistent lineage markers.

      We thank the reviewer for the opportunity to clarify our approach.

      We fully agree that tools such as RNA velocity and pseudotime are powerful for capturing short-term dynamic transcriptional changes and inferring lineage trajectories within continuous developmental processes. Indeed, we applied RNA velocity and identified a transition between clusters 7 and 4 in embryonic cells (Fig 2). However, as noted in the Results section, “trajectory inference methods failed to establish lineage relationships between embryonic and fetal populations”. These methods assume temporal continuity and comparable transcriptional kinetics between cells. When comparing samples separated by large developmental intervals (e.g., embryonic versus adult tissues), these assumptions do not hold: RNA velocity vectors become unreliable and may even yield biologically meaningless directions. Therefore, rather than forcing a continuous trajectory across temporally distant datasets, we employed an anchoring approach designed to identify conserved transcriptional programs and potential lineage correspondences between embryonic and adult cell types.

      To address the concern about individual variability, we performed analyses both on aggregated adult samples and on individual replicates (AV1 and AV3). The results were highly consistent across both levels of analysis, and statistical significance was supported by very low p-values, indicating that the observed patterns are robust and reproducible. We therefore believe our analysis in independent samples is statistically sound.

      Finally, we agree that adult cells display lower expression of embryonic genes, and we acknowledge that these signatures may represent residual rather than persistent expression. This observation aligns with our intended interpretation: our goal was not to demonstrate enduring embryonic marker expression, but to highlight that adult cells retain transcriptional traces that connect them to their developmental origins.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify if MEIS1, JAG1, ROR1, PRDM6 have been previously implicated in neural crest cell development. Are these then new potential regulators of neural crest cells? The same applies to SOX6 for the mesodermal population.

      The main reason for selecting these genes (MEIS1, JAG1, ROR1, and PRDM6 in cluster 20, and SOX6 in cluster 4) is that they serve as distinctive markers of specific embryonic clusters. Because their expression remains restricted at later developmental stages, they allow reliable tracing of bona fide descendant cells originating from cluster 20 and cluster 4 into fetal and adult tissues. Importantly, MEIS1, JAG1, ROR1, and PRDM6 were not chosen as new potential regulators of neural crest (NC) cells. Since cluster 20 is, based on transcriptional profiles, the embryonic mesenchymal cluster most closely related to the NC lineage, these markers enable lineage tracing of NC-descendant cells. Nonetheless, these genes have all been linked to neural crest biology, either through known functional roles or through specific expression patterns associated with NC development.

      Similarly, SOX6 was selected for its restricted expression in cluster 4, a pattern that is preserved in its descendant populations, making it a suitable marker for tracking the mesoderm-derived lineage.

      (2) Please comment in the text whether any regional transcriptional differences (rather than cell type differences) were detected between the aortic and pulmonary regions.

      We have added the following text to the Results section related to Fig 3: “No molecular differences or distinguishing markers were identified between the aortic and pulmonary valves.”

      (3) There appear to be no myocardial cells in the adult valve tissue - the authors could discuss what the fate of myocardium is in the embryonic OFT. Are they only looking at a subset of derivatives of the embryonic OFT?

      Our adult dataset represents the aortic valve complex and adjacent arterial root tissue (a subset of outflow tract derivatives) rather than the entire outflow tract (this has now been specified in the title). Spatial transcriptomic analysis identified myocardial gene expression within the ventricular and outflow tract walls at CS16-19, but not within the valve leaflet cluster (Queen et al., 2023). This is consistent with previous observations that myocardium contributes to the arterial root and supports early cushion formation, but does not persist in mature valve tissue, which becomes predominantly fibrous and populated by valve interstitial cells. This explanation has been added to the analysis of cell populations in the valves.

      (4) Please equate Carnegie stages 13-23 to embryonic days or weeks of gestation in the first paragraph to help the general reader.

      We have added the suggested clarification and noted that this period spans four weeks of human development, rather than the three weeks previously indicated. The text has been updated accordingly.

      (5) I suggest rewriting the first sentence of the introduction using the plural, as there are many different types of CHD.

      The sentence has been changed accordingly.

      (6) It would be helpful to add the persistence of embryonic signatures into adult valve cell types in Figure 4E.

      We thank the reviewer for this helpful suggestion. To address this point, we have now added an analysis of the persistence of embryonic signatures in adult valve cell types to Figure 4E. Specifically, we selected 10 representative genes from the 100-gene embryonic signature lists of cluster 4 and cluster 20 and projected their expression onto the t-SNE shown in Figure 4E. The combined (module) expression of these 10 genes is now shown in Figure S6E, and the expression of the individual genes is presented in the newly added Figure S7.

      We would like to clarify that our statistical framework identifies potential descendant populations based on significant enrichment of an embryonic gene signature. Therefore, individual embryonic genes are not necessarily expected to be expressed exclusively or uniformly within a single adult population.
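
      The combined (module) expression mentioned above can be illustrated with a simple per-cell average. This is a sketch under stated assumptions: the function name, cell identifiers, and expression values are hypothetical, and a plain mean is used here, whereas established single-cell toolkits typically also subtract a background of control genes.

```python
def module_score(cells, signature):
    """Mean expression of the signature genes in each cell.

    cells:     mapping of cell id -> {gene: normalized expression}
    signature: list of genes defining the module
    Genes not detected in a cell are counted as zero expression.
    """
    scores = {}
    for cell_id, expr in cells.items():
        values = [expr.get(gene, 0.0) for gene in signature]
        scores[cell_id] = sum(values) / len(values) if values else 0.0
    return scores


# Hypothetical cells and genes, for illustration only.
cells = {
    "cell_1": {"geneA": 2.0, "geneB": 4.0},
    "cell_2": {"geneA": 1.0},
}
scores = module_score(cells, ["geneA", "geneB"])
```

      Averaging over the signature means that a population can score highly even when no single gene is expressed uniformly across it, which is consistent with the enrichment-based interpretation given above.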

      (7) Please explain how the 2-dimensional plot in 2J relates to the other plots.

      The plot originally shown in Fig 2J (now Fig 2K) was generated by applying RNA velocity exclusively to CS16-17 nuclei. Developmental nuclei (excluding adult samples) were subclustered as shown in Fig S2AB, resulting in the 5 clusters of embryonic nuclei analysed in Fig 2J: cardiac muscle (2, 17), endothelial (7), and mesenchymal (4, 20).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This thoughtful and thorough mechanistic and functional study reports ARHGAP36 as a direct transcriptional target of FOXC1…… Although this study largely represents a robust and near-comprehensive set of focused investigations on a novel target of FOXC1 activity, several significant omissions undercut the generalizability of the findings reported.

      (1) It is notable that the volcano plot in Figure 1a does not show evidence of canonical Hedgehog gene regulation, even though the subsequent studies in this paper clearly demonstrate that ARHGAP36 regulates Hedgehog signal transduction. Is this because canonical Hedgehog target genes (GLI1, PTCH1, SUFU) simply weren't labeled? Or is there a technical limitation that needs to be clarified? A note about Hedgehog target genes is made in conjunction with Table S1, but the justification or basis of defining these genes as Hedgehog targets is unclear. More broadly, it would be useful to see ontology analyses from these gene expression data to understand FOXC1 target genes more broadly. Ontology analyses are included in a supplementary table, but network visualizations would be much preferred.

      Space constraints precluded labelling the volcano plot with all 285 significantly differentially expressed genes (DEGs). So, rather than labelling only Hedgehog pathway members, we labelled the most dysregulated genes (those with at least a 4-fold change: log₂ fold change < -2 or > +2) and provided the full list of DEGs in the supplemental Excel file. We have added the suggested network analysis, and for additional rigor also included protein interaction partners of Gli1 and Arhgap36 (Fig. S12).
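
      The relationship between a 4-fold change and the log₂ cutoff of 2 can be made explicit with a short sketch. The gene names and fold-change values are hypothetical, and treating the cutoff as inclusive is an assumption, since the response does not state whether genes exactly at the threshold were labelled.

```python
import math


def strongly_dysregulated(fold_changes, min_abs_log2=2.0):
    """Select genes whose absolute log2 fold change meets the cutoff.

    fold_changes: mapping of gene -> linear fold change (treated / control)
    Returns a mapping of gene -> log2 fold change for the selected genes.
    """
    selected = {}
    for gene, fold_change in fold_changes.items():
        log2_fc = math.log2(fold_change)
        if abs(log2_fc) >= min_abs_log2:  # inclusive cutoff (assumption)
            selected[gene] = log2_fc
    return selected


# Hypothetical fold changes: 8x up, 8x down, 1.5x up, 4x up.
example = {"geneA": 8.0, "geneB": 0.125, "geneC": 1.5, "geneD": 4.0}
labelled = strongly_dysregulated(example)
```

      A linear fold change of 4 corresponds to a log₂ value of 2, and a 4-fold decrease (0.25) to a log₂ value of -2, so the symmetric cutoff captures both up- and down-regulated genes.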

      (2) Likewise, the ChIP-seq data in Figure 2 are under-analyzed, focusing only on the ARHGAP36 locus and not more broadly on the FOXC1 gene expression program. This is a missed opportunity that should be remedied with unbiased analyses intersecting differentially expressed FOXC1 peaks with differentially expressed genes from RNA-sequencing data displayed in Figure 1.

      We agree that genome-wide analysis of ChIP-seq data from Foxc1 over-expression is worthwhile, not least for diverse malignancies where FOXC1 is over-expressed. We chose to restrict the focus of this paper in order to define, as comprehensively as we could, the FOXC1-ARHGAP36 relationship. Our ChIP-seq and RNA-seq datasets are freely available to other researchers via GEO (GSE297865/GSE297719). A future manuscript will integrate ChIP-seq and RNA-seq with ATAC-seq: replicate ATAC-seq experiments permit rigorous characterization of genes transcriptionally regulated by Foxc1 as well as Foxc1’s pioneering abilities. However, these additional assays, and particularly validation of findings, take significant time and so lie beyond the scope of the current manuscript.

      (3) RNA-seq and ChIP-seq data strongly suggest that FOXC1 regulates ARHGAP36 expression, and the authors convincingly identify genomic segments at the ARHGAP36 locus where FOXC1 binds, but they do not test if FOXC1 specifically activates this locus through the creation of a luciferase or similar promoter reporter. Such a reagent and associated experiments would not only strengthen the primary argument of this investigation but could serve as a valuable resource for the community of scientists investigating FOXC1, ARHGAP36, the Hedgehog pathway, and related biological processes. CRISPRi targeting of the identified regions of the ARHGAP locus is a useful step in the right direction, but these experiments are not done in a way to demonstrate FOXC1 dependency.

      We agree and undertook the suggested luciferase reporter assays. The results demonstrate that transcriptional activity is dependent on Foxc1 and abrogated by mutation of the predicted Foxc1-binding motifs (Fig. S8).

      (4) It would be useful to see individual fluorescence channels in association with images in Figure 3b.

      The figure has been revised to provide individual fluorescence channel data, as suggested.

      (5) Perhaps the most significant limitation of this study is the omission of in vivo data, a shortcoming the authors partly mitigate through the incorporation of clinical outcome data from pediatric neuroblastoma patients in the context of ARHGAP36 expression. The authors also mention that high levels of ARHGAP36 expression were also detected in "specific CNS, breast, lung, and neuroendocrine tumors," but do not provide clinical outcome data for these cohorts. Such analyses would be useful to understand the generalizability of their findings across different cancer types. More broadly, how were high, medium, and low levels of ARHGAP36 expression identified? "Terciles" are mentioned, but such an approach is not experimentally rigorous, and RPA or related approaches (nested rank statistics, etc) are recommended to find optimal cutpoints for ARHGAP36 expression in the context of neuroblastoma, "specific CNS, breast, lung, and neuroendocrine" tumor outcomes.

      The issue of analyzing in vivo data for neuroblastoma is addressed in more detail below, as it is also raised by the other reviewers. The neuroblastoma data represent the initial findings after the Foxc1-Arhgap36 link was defined. There is vastly more that could and should be undertaken to determine the mechanism(s) for ARHGAP36’s beneficial association with this tumor’s survival. This is the ongoing focus for the lab.

      The original text omitted details of the cancer expression datasets surveyed that revealed high levels of ARHGAP36 expression were also detected in "specific CNS, breast, lung, and neuroendocrine tumors". This oversight has been corrected – when submitting, we omitted to upload a supplemental file (Table S4) that provided these data, which were derived from the following four sites (TCGA, TARGET, PCAWG and CCLE). However, these excellent online resources infrequently provide clinical outcome data.

      The three independent neuroblastoma cohorts were analyzed identically. Each was stratified into an ordered dataset for ARHGAP36 expression, and then divided into three equal-sized groups [terciles]. Stratification into smaller subgroups [quartiles/quintiles] would have been equally feasible. The same methodology is used by the UCSC Xena browser for Kaplan-Meier survival analysis, and offers the advantage of avoiding a priori assumptions; it is thus agnostic regarding the data. We agree that there is scope for additional approaches, including recursive partitioning analyses, but suggest it may be better to reserve these for the future, not least in analyses that test the reported ARHGAP36-survival association in additional neuroblastoma datasets.
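      The tercile stratification described above can be sketched as follows. This is a minimal illustration only, assuming a simple mapping of sample IDs to expression values; the function name, sample IDs, and numbers are hypothetical and are not taken from the cohorts or from any browser implementation.

```python
def assign_terciles(expression):
    """Rank samples by expression and split them into three near-equal groups."""
    # Sample IDs ordered from lowest to highest expression.
    ranked = sorted(expression, key=expression.get)
    n = len(ranked)
    names = ("low", "medium", "high")
    labels = {}
    for rank, sample in enumerate(ranked):
        # Integer arithmetic keeps group sizes within one sample of each other.
        labels[sample] = names[rank * 3 // n]
    return labels


# Hypothetical expression values, for illustration only.
example = {"s1": 0.2, "s2": 5.1, "s3": 1.7, "s4": 9.3, "s5": 0.9, "s6": 3.4}
groups = assign_terciles(example)
```

      Because the cut points come only from rank order, no expression threshold is chosen in advance; the resulting group labels would then be passed to a survival comparison.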

      Reviewer #2 (Public review):

      FOXC1 is a transcription factor essential for the development of neural crest-derived tissues and has been identified as a key biomarker in various cancers. … Together, these findings uncover a novel FOXC1-ARHGAP36 regulatory axis that modulates Hh and PKA signaling, offering new insights into both normal development and cancer progression.

      The main strengths of the study are:

      (1) Identification of a novel signaling pathway involving FOXC1 and ARHGAP36, which may play a critical role in both normal development and cancer biology.

      (2) Mechanistic investigation using RNA-seq, ChIP-seq, and functional assays to elucidate how FOXC1 regulates ARHGAP36 and how this axis modulates Hh signaling.

      (3) Clinical relevance demonstrated through analysis of neuroblastoma patient datasets, linking ARHGAP36 expression to improved 5-year overall survival.

      The main weaknesses of the study are:

      (1) Lack of validation in neuroblastoma models - the study does not directly test its findings in neuroblastoma cell models, limiting translational relevance.

      We agree that it is important to define the mechanisms by which increased ARHGAP36 levels are protective. Despite experiments over many months manipulating ARHGAP36 expression, which induce quite rapid death of neuroblastoma cells in vitro, the precise mechanism(s) remain unresolved. Currently, we are endogenously labelling multiple neuroblastoma lines with Histone 2B-mCherry to facilitate live cell imaging and differentiate effects on proliferation and apoptosis. In the interim, we believe publication of the current dataset allows other researchers to independently test our findings for this pediatric malignancy. We are also establishing collaborations to access patient tissue samples, which will facilitate investigation of non-cell-autonomous mechanisms mediated via the tumor microenvironment.

      (2) Incomplete mechanistic insight into PKA regulation - the study does not fully elucidate how FOXC1-ARHGAP36 regulates PKAC activity at the molecular level.

      Other laboratories elegantly demonstrated that ARHGAP36’s effect on Hedgehog output is mediated by one motif blocking PKAC activity and the targeting of PKAC for degradation [PMIDs 25024229, 27713425, 30598432]. With these effects well-established, we limited experiments to confirming that Foxc1-induced Arhgap36 reduced PKAC and pT197 PKAC levels to those seen with ectopic Arhgap36 expression.

      (3) Insufficient discussion of clinical outcome data - while ARHGAP36 expression correlates with improved survival in neuroblastoma, the manuscript lacks a clear interpretation of this unexpected finding, especially given the known oncogenic roles of FOXC1, ARHGAP36, and Hh signaling.

      ARHGAP36 expression may influence neuroblastoma survival via multiple mechanisms. Considering just canonical Hedgehog, possibilities include: cell cycle modulation, symmetric vs asymmetric cell division, maintenance of cancer stem cells, EMT, metastasis… Others include Hedgehog’s anti-apoptotic roles and the diverse mechanisms by which PKA influences cell function and survival. Faced with such diversity, we focused the discussion on what the presented data demonstrate.

      Reviewer #3 (Public review):

      Summary:

      The focus of the research is to understand how transcription factors with high expression in neural crest cell-derived cancers (e.g., neuroblastoma) and roles in neural crest cell development function to promote malignancy. The focus is on the transcription factor FOXC1 and using murine cell culture, gain- and loss-of-function approaches, and ChIP profiling, among other techniques, to place PKC inhibitor ARHGAP36 mechanistically between FOXC1 and another pathway associated with malignancy, Sonic Hedgehog (SHH).

      Strengths:

      Major strengths are the mechanistic approaches to identify FOXC1 direct targets, definitively showing that FOXC1 transcriptional regulation of ARHGAP36 leads to dysregulation of SHH signaling downstream of ARHGAP36 inhibition of PKC. Starting from a screen of Foxc1 OE to get to ARHGAP36 and then using genetic and pharmacological manipulation to work through the mechanism is very well done. There is data that will be of use to others studying FOXC1 in mesenchymal cell types, in particular, the FOXC1 ChIP-seq.

      Weaknesses:

      Work is almost all performed in NIH3T3 or similar cells (mouse cells, not patient or mouse-derived cancer cells), so the link to neuroblastoma that forms the major motivation of the work is not clear. The authors look at ARHGAP36 levels in association with the neuroblastoma patient survival; however, the finding, though interesting and quite compelling, is misaligned with what the literature shows about FOXC1 and SHH, their high expression is associated with increased malignancy (also maybe worse outcomes?). Therefore, ARHGAP36 expression may be more complicated in a tumor cell or may be unrelated to FOXC1 or SHH, leaving one to wonder what the work in NIH3T3 cells, though well done, is telling us about the mechanisms of FOXC1 as an oncogene in neuroblastoma cells or in any type of cancer cell. Does it really function as an SHH activator to drive tumor growth? The 'oncogenic relevance' and 'contribution to malignancy' claimed in the last paragraph of the introduction are currently weakly supported by the data as presented. This could be improved by studying some of these mechanisms in patient-derived neuroblastoma cells with high FOXC1 expression. Does inhibiting FOXC1 change SHH and ARHGAP36 and have any effect on cell proliferation or migration? Alternatively, does OE of FOXC1 in NIH3T3 cells increase their migration or stimulate proliferation in some way, and is this dependent on ARHGAP36 or SHH? Application of their mechanistic approaches in cancer cells or looking for hallmarks of cancer phenotypes with FOXC1 OE (and dependent on SHH or ARHGAP36) could help to make a link with cellular phenotypes of malignant cells.

      The manuscript stems from the lab’s findings that Foxc1 influences cilia-mediated signaling (Hedgehog and PDGFRalpha), offering an explanation for FOXC1’s pleiotropic phenotypes. Due to FOXC1’s largely unexplained roles in malignancy, the effects on Hedgehog prompted investigation of differential gene expression in NIH3T3 cells when Foxc1 was over-expressed. This identified Arhgap36 as a prime candidate for the Hedgehog pathway alterations, and most of the paper reports the characterization of this relationship. The final, small component of the paper tests the relevance in neural crest-derived cells, where Foxc1 has key roles. Neuroblastoma’s frequent lethality has created a network of highly supportive researchers with shared datasets, and these survival data were assayed. This in turn revealed that high levels of ARHGAP36 expression were associated with a favorable survival outcome.

      Defining the underlying molecular mechanisms for this novel association is clearly important. As outlined above, one challenge reflects the diversity of potential mechanisms, coupled with the requirement to validate those identified from 2-D culture in patient-derived tumor explants as well as immuno-deficient model organisms. Such experiments take significant time, and our present focus is on manipulating ARHGAP36 expression directly, rather than by altering FOXC1 expression, which inevitably has even more diverse effects.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The study would be strengthened by validating key findings, such as the resistance to Hh inhibition, in neuroblastoma cell lines to enhance disease relevance.

      Planned future experiments include in vitro evaluation of PKA antagonists and agonists on neuroblastoma survival.

      The authors show that FOXC1/ARHGAP36 reduces PKAC protein levels; however, it is unclear whether this regulation occurs at the transcriptional level. Assessing PKAC mRNA expression would help explain the mechanism. Additionally, if PKAC is transcriptionally downregulated, overexpression of PKAC can be used to test whether it reverses the FOXC1/ARHGAP36-induced activation of Hh signaling.

      The RNA-sequencing data exclude this possibility at the transcriptional level, since PKA is not significantly differentially expressed (Table S1). Instead, Figures 1&3 support Foxc1 inducing Arhgap36 expression, with elevated Arhgap36 protein levels reducing those of PKAC and catalytically active pT197 PKAC, in both the cytoplasm and adjacent to the basal body.

      The Discussion should address the potential effects of ARHGAP36 overexpression on other signaling pathways-particularly Hh and PKA signaling and PKA in neuroblastoma. These effects may help interpret the observed association between ARHGAP36 expression and clinical outcomes in patients. Of note, it has been reported that Hh may correlate with better survival in neuroblastoma (Cancers, 2021 Apr 15;13(8):1908; J Pediatr Surg. 2010 Dec;45(12):2299).

      Both Hedgehog signaling and protein kinase A have broad effects on normal cell biology, and these are likely more extensive in malignant cells. Consequently, although it is tempting to propose why ARHGAP36 overexpression is associated with enhanced survival, it may be better to wait until the causative mechanisms have been defined.

      If treatment information for the patient cohorts is available, it should be included as it may enhance the interpretability of the survival analyses.

      This is an excellent suggestion, although at present this information is not available to us. As the manuscript moves forward to publication, we will be liaising with the corresponding authors of the three datasets [GSE49711, E-MTAB-178191 and TARGET] to explore such additional clinical possibilities.

      The 'A' label in Figures S9 and S10 should be removed, as neither figure contains sub-panels.

      This has been corrected, as suggested.

      Reviewer #3 (Recommendations for the authors):

      Other comments:

      (1) Figure 5A, B: Unclear how meaningful the inhibitor experiments are in the absence of SHH (presumable none in the media or made by NIH3T3 cells?), other than as a control for the FOXC1 OE treated with Smo antagonists. A potentially better experiment could be to take malignant cells with high FOXC1 and high SHH signaling and put on Smo inhibitors.

      Figure 5A demonstrates Foxc1’s induction of GLI1 expression is not dependent on Hedgehog ligand. While certainly feasible to repeat in malignant cells strongly expressing FOXC1, doing this comprehensively would require testing lines from many or all of the ~15 malignancies where FOXC1 has a defined contribution.

      (2) Figure 6: the Gli2-mGFP seem to have higher levels of ciliary Sufu, they also have higher levels of Gli1 (see Figure 1C), does the Gli2-mGFP expression change SHH signaling? What controls have the authors done to test if this is a serious confound in their studies? They use it for most experiments, this is important to address.

      Although Gli2-mGFP expression affects Hedgehog signaling, in the absence of Gli2 (e.g. untransformed NIH3T3) Foxc1 induces Arhgap36 expression. The scope for interaction between Foxc1 and Gli2 represents an additional motivation for the ATAC-seq experiments described above to better determine if these two transcription factors have synergistic effects.

      (3) Figure 3B: (1) Please use color-blind friendly LUTs for the signals (same comment for other figures), (2) The Gli2-mGFP line with the current color scheme is confusing; it looks like only 647 and 555 secondaries were used, did they not image with the mGFP? Why not? (3) What is the evidence that these are basal bodies? (4) Why did the authors use cycloheximide in these IF experiments? Was this also done in other methods? The reasoning behind this is missing.

      For now, we have included separate channels for Figure 3. In future manuscripts we will adopt the suggestion of moving to either magenta and green, or cyan and magenta combinations for depicting immunofluorescence.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth.

      Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative Metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex, and suggest a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge with interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are some inconsistencies in some patterns that underscore the complexity of metabolic systems.

      Thank you for reviewing the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. In the revised version of the manuscript, the authors present convincing evidence that MMV1028806 targets the mitochondrial electron transport (ETC) chain of the parasite (although they don't identify the actual target in the ETC). The revised manuscript also nicely addresses my other criticisms of the original version. Overall, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors. In addition to insights into candidate bradyzoite inhibitors, the study also provides new insights into the physiological role of the mitochondrial electron transport chain of bradyzoites, and raises a host of interesting questions around the functional roles of mitochondria in this stage of the parasite.

      Weaknesses:

      In the revised manuscript, the authors have included additional oxygen consumption rate data that indicate that MMV1028806 targets the mitochondrial electron transport chain (ETC). These data are convincing. On line 481, the authors state that "treatments with ATQ, BPQ, MMV1028806, and antimycin A resulted in substantially reduced oxygen consumption levels relative to the DMSO control and suggest indeed a blockage of the mETC consistent with the inhibition of the bc1-complex." The OCR assay the authors use is still only an indirect measure of bc1 activity. Given that most OCR-inhibiting compounds in T. gondii are bc1 inhibitors, it is possible (and perhaps likely) that MMV1028806 is targeting this complex. However, the data cannot rule out that it is targeting another component of the ETC (or potentially even a TCA cycle enzyme). Without a direct test that MMV1028806 inhibits bc1 complex activity, the authors should be more cautious in their interpretation (e.g. by acknowledging the limitations of their conclusion, or acknowledging other possible targets). Similarly, the conclusion on line 622 that "... we confirmed the bc1-complex as a target" is overstating the findings. The phrasing on lines 683-695 is more appropriate: "... suggesting that it also targets complex III or a functionally linked site within the mitochondrial electron transport chain."

      We are grateful for the thorough review of the updated manuscript and the identification of the minor issues. We addressed all of them as detailed below. We also tempered our conclusions regarding the identification of the bc1-complex as a target in line 616:

      “In addition to abundance data, monitoring the incorporation of <sup>13</sup>C and <sup>15</sup>N stable isotopes from glucose and glutamine, respectively, into TCA cycle and pyrimidine biosynthesis intermediates suggest the bc1-complex as a target”

      Reviewer #3 (Public review):

      Summary:

      The authors described an exciting 400-drug screening using a MMV pathogen box to select compounds that effectively affect the medically important Toxoplasma parasite bradyzoite stage. This work utilises a bradyzoites culture technique that was published recently by the same group. They focused on compounds that affected directly the mitochondria electron transport chain (mETC) bc1-complex and compared with other bc1 inhibitors described in the literature such as atovaquone and HDQs. They further provide metabolomics analysis of inhibited parasites which serves to provide support for the target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite-stage in culture and thus have the potential to battle chronic infection is needed. The further metabolic characterization provides indirect target validation and highlight different metabolic outcome for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, that is demonstrated to have a similar metabolic outcome to buparvaquone. Furthermore, the authors evaluated the importance of ATP production in tachyzoite and bradyzoite stages and under atovaquone/HDQs drugs.

      Thank you for reviewing the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thanks for making appropriate updates. I believe it makes the report stronger. Just please double-check proof-reading in newly added text: for example "integration" is misspelled in Figure 4 legend (C, E).

      Typos have been corrected throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      I congratulate the authors on an excellent study. I have several minor comments for the authors to consider before publication.

      Line 99. Schistosoma –

      Corrected

      Line 123. What was the pH of the bicarb-free RPMI medium?

      Added “at pH 7.2”

      Line 218 (and again on line 687). "RHku80" - are these just standard RH strain parasites? Or do the authors mean to imply that the ku80 gene has been knocked out in this line? If the latter, RH∆ku80 may be a better way to describe this line.

      We harmonized all mentions of this strain to RH∆ku80.

      Line 225. "Parasites were incubated in medium with one of the following treatments ..." How long were the parasites incubated in the different treatments before the plate was read? Was there any preincubation? I think not, but it would help to state this so the reader can appreciate that the effects of the compounds on OCR is likely an immediate (rather than a secondary) effect.

      This is indeed a good suggestion. There was no pre-incubation, and we changed the text to: “Parasites were incubated in medium with one of the following treatments immediately before measurement: … “

      Figure S2A. Check the spelling of Toxoplasmosis.

      Done, we corrected this sentence.

      Figure S2B. do you mean 'tachyzoidal' or 'tachyzocidal'? 'bradyzoidal' or 'bradyzocidal'?

      We clarified the formulation of the legends for Fig S2.

      Figure S2D. The "Tachyzoite lowest cytotoxicity" and "Bradyzoite lowest cytotoxicity" columns are, I think, depicting compound toxicity in host cells. Would it be clearer to rename these columns relative to the host cells being tested? e.g. "HFF/KD3 myotube lowest cytotoxicity"

      Good suggestion and we changed the designation accordingly.

      Line 369. "We found that tachyzocidal, bradyzocidal and dually active compounds possess a statistically significantly higher lipophilicity and this trend appeared more accentuated for bradyzocidal and dually active compounds." Significantly higher than what? Need to be clearer about the comparison being made: i.e. to non-active compounds.

      You are correct and we corrected this sentence accordingly.

      Line 500. "we attribute these changes to inhibition of host mitochondria (Fig. 5A)." The reason for referencing Figure 5A here isn't clear. Do the authors mean to point out that host mitochondrial membrane potential is affected by compound treatment? This could be stated more clearly.

      We deleted the reference to Fig 5A. We did not systematically measure the effect of the inhibitors on the membrane potential of the host mitochondria. We also changed the sentence to emphasize the speculative nature of this assertion: “we attribute these changes to potential inhibitory effects on host mitochondria”.

      Line 840. 'hurdling mechanisms'. The authors don't explain what they mean by this expression.

      We truncated the figure title to: “Untargeted metabolomic analysis of bradyzoites treated with bc1-complex inhibitors shows an energy imbalance.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      Thank you very much for your valuable suggestions. H3K4me3 data for humans and mice have been published, and our previous data revealed the unique pattern of H3K4me3 in early human embryos and oocytes (Science. 2019 Jul 26;365(6451):353-360.). This study therefore focuses mainly on the localization of H3K4me2 in mouse oocytes and preimplantation embryos, how it is erased and re-established during the mammalian parental-to-zygotic transition, and its function. The combined analysis of H3K4me2 and H3K4me3 is not our main work, but we do not rule out that new discoveries may emerge from comparing these two histone modifications. Our data so far tend to show that H3K4me2 not only acts as a precursor of H3K4me3, but also plays its role independently.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      Thank you for pointing this out, and we thank the reviewer again for the important suggestion. We found that a previous study (Binda C, Valente S, Romanenghi M, Pilotto S, Cirilli R, Karytinos A, Ciossani G, Botrugno OA, Forneris F, Tardugno M, Edmondson DE, Minucci S, Mattevi A, Mai A. Biochemical, structural, and biological evaluation of tranylcypromine derivatives as inhibitors of histone demethylases LSD1 and LSD2. J Am Chem Soc. 2010 May 19;132(19):6827-33.) indicated that TCP was a non-reversible inhibitor of LSD1 and LSD2 (Human LSD2/KDM1b/AOF1 Regulates Gene Transcription by Modulating Intragenic H3K4me2 Methylation, Mol Cell. 2010 Jul 30; 39(2): 222–233.). However, according to our data, the content of LSD1 was very low in the early stages of mouse embryos, so TCP mainly inhibited the function of LSD2.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      We thank the reviewer for the insightful comments. The H3K4me2 antibody was purchased from Millipore (cat. 07030). Figure 2A shows the specific enrichment area of H3K4me2 in promoter and distal regions. Some batches of H3K4me2 antibody are known to cross-react with H3K4me3, but the H3K4me2 antibody we used in our CUT&RUN seems to have low cross-reactivity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      Thank you for pointing this out, and we will add references to support the statement in the paper as suggested.

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

      Thank you for your kind comments. We will revise the expression in the manuscript later.

      Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, the published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Would it be a reason that H3K4me2 has fewer H3K4me2 peaks in MII oocytes than GV oocytes?

      First of all, thank you for your valuable advice. The published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016), which is interesting. We think we may have used different acquisition parameters during confocal imaging. We used the same parameters to image every stage continuously from the GV stage to the blastocyst stage. If we had imaged only the fertilized egg and the 2-cell stage, we think we might also have seen weak fluorescence at the 2-cell stage under different parameters. We will refer to this reference and discuss it in the resubmitted version.

      Moreover, you asked whether MII oocytes have fewer H3K4me2 peaks than GV oocytes because the MII oocyte has expelled the first polar body. There is no problem with this logic. However, the first polar body expelled at the MII stage remains within the zona pellucida, and we also collected the polar body in the CUT&RUN experiment; therefore, compared to GV, the DNA content of MII samples is not halved. After further discussion, we believe that the reduction of H3K4me2 peaks in the MII stage compared with the GV stage may be closely related to oocyte maturation. Stage-specific histone modifications allow chromatin structure to change appropriately across the different stages of meiosis. At present, it has been confirmed that H3K4me3 gradually decreases from the GV to the MII stage during the maturation of human oocytes, whereas H3K27me3 did not change from the GV to the MII stage.

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).

      Thank you very much for your suggestion. Further analysis of which regions show parental allele-specific H3K4me2 in preimplantation embryos will make the study more interesting. We will discuss this in depth in the resubmitted version.

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development.

      The authors should have cited the paper and described the role of KDM1a in early embryos.

      In our analysis of this experiment, we describe the expression of KDM1A in early mouse embryonic development as low relative to that of KDM1B. The transcriptome data we cite likewise show that KDM1A is expressed at elevated levels during oocyte maturation and fertilization compared with immature oocytes. We did not discuss the effects of loss of maternal KDM1a on embryonic development. We believe that the absence of maternal KDM1b blocks embryonic development, and we will cite and discuss the references in the revision.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      After re-checking the data, we found that there was a problem with the normalization method of our heat map, and we will re-make the heatmap and submit it in the modified version. With reference to Figure 4A, the content of Kdm1b is indeed higher than that of Kdm1a.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      We will further mine the transcriptome data to confirm the specificity of TCP for KDM1b. In addition, in this study TCP was applied to the whole fertilized egg, which increased the H3K4me2 content, and the resulting delay in embryo development was more pronounced than that of embryos obtained by crossing maternal KDM1B-deficient females with normal males.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.

      Thank you for your question. In the experimental group, the H3K4me2 fluorescence intensity in IF increased 1,000-fold (Figure 4E), whereas in the CUT&RUN (CR) data the H3K4me2 signal was up- or down-regulated at a total of 445 genes (Figure 6A). In our opinion, immunofluorescence is a semi-quantitative method and cannot be compared directly with the quantitative CR analysis, because the two rely on different analysis models and threshold settings.

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

      Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In the manuscript by Li et al., the authors perform a comprehensive study on the template and cofactor determinants of the SARS-CoV-2 nsp13 protein. They find that, alongside the classical processive unwinding ability of helicases driven by ATP consumption, other chaperone-like and ATP-independent functions exist for this enzyme. By testing DNA and RNA oligos in several conformations, the authors show that these functions are highly dependent on template identity, but also on the ratio of ATP to divalent cations. Ultimately, it is suggested that these distinct mechanisms of action are employed by nsp13 to orchestrate viral replication.

      Overall, this study provides some novel insights into the functionality of a central and conserved enzyme of a relevant human pathogenic virus. While the approach is important and adds to the field, particularly by characterizing the chaperoning activities and adding G-quadruplexes as templates, previous studies have already identified several determinants of nsp13 template binding and processing in vitro (Sommers et al., 2023, JBC; Park et al., 2025, JBC). In addition, some issues regarding experimental design need to be addressed to increase the cogency and biological relevance of the study.

      We thank the reviewer for recognizing the novelty of our work, particularly the ATP-independent chaperone-like activities and G-quadruplex remodeling. We also appreciate the opportunity to clarify the conceptual distinction between our study and the prior work by Sommers et al. (2023) and Park et al. (2025). We fully agree that those studies systematically defined the canonical ATP-driven motor mechanism of Nsp13. Our results on 5′→3′ polarity, DNA preference, and tail/ATP/Mg<sup>2+</sup> dependence align with these benchmarks, confirming the reliability of our platform.

      However, the core novelty of our work lies in revealing that Nsp13 functions as a multifaceted nucleic acid remodeler, integrating motor and non-motor activities within a single protein, a functional regime absent from the JBC papers. Specifically, we uncover three novel layers: (1) Mg<sup>2+</sup>-activated, ATP-independent remodeling of short duplexes and G-quadruplexes; (2) bidirectional remodeling on duplexes in the Mg<sup>2+</sup>-primed state; and (3) intrinsic chaperone functions, including strand annealing and stem-loop restructuring.

      Thus, our work fundamentally expands the biochemical model of Nsp13 from a simple ATP-driven motor to a multifunctional, mode-switchable remodeler. We will highlight these distinctions in the revised Discussion. Below, we respond point-by-point to the specific experimental design issues.

      (1) Generally, low concentrations of monovalent cations (20 mM), as used throughout this study, may influence helicase activity and artificially enhance protein binding/oligomerization, which could favor the observed chaperoning activity (Venus et al., 2022, Methods). In contrast, some helicases, such as HCV NS3, are inhibited by higher K+ concentrations (Gwack et al., 2004, FEBS). Thus, the influence of higher concentrations of monovalent cations should be tested in relevant assays, as intracellular K+ levels are usually >100 mM. Additionally, this could significantly affect template stability. For instance, in some G4 assays, the addition of the trap already leads to observable duplex formation (Figure 5), which may be due to low K+ conditions.

      We thank the reviewer for this critical comment regarding the ionic environment. We agree that monovalent cation concentrations are pivotal for both helicase activity and the structural stability of templates like G4s.

      First, we wish to clarify that the final NaCl concentration in our reaction is not 20 mM; that figure refers only to the unwinding buffer. Our protein dilution buffer contains 200 mM NaCl, and each 10 μL reaction includes 2 μL of protein, contributing ~40 mM NaCl. With 20 mM from the reaction buffer, the final concentration reaches ~60 mM. We will clarify this in the Methods.
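
      The dilution arithmetic above can be checked with a short calculation. This is a minimal sketch, assuming that the 20 mM from the reaction buffer is already expressed as a final concentration (as the response states) and that the protein dilution buffer is the only other source of NaCl; the function and variable names are illustrative and do not come from the manuscript.

```python
def diluted_contribution_mM(stock_mM: float, added_uL: float, total_uL: float) -> float:
    """Concentration an aliquot of stock contributes to the final mix (C1*V1 = C2*V2)."""
    return stock_mM * added_uL / total_uL


# Values quoted in the response: 200 mM NaCl dilution buffer,
# 2 uL of protein added to a 10 uL reaction.
protein_buffer_nacl = diluted_contribution_mM(stock_mM=200.0, added_uL=2.0, total_uL=10.0)

# Assumption: the 20 mM from the unwinding buffer is already a final concentration.
reaction_buffer_nacl = 20.0

final_nacl = protein_buffer_nacl + reaction_buffer_nacl
print(f"protein buffer: {protein_buffer_nacl:.0f} mM, final: {final_nacl:.0f} mM")
```

      If the 20 mM figure instead described a concentrated buffer stock, its contribution would need the same dilution step before being summed.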

      Second, our choice of ionic strength is guided by established literature. A survey of 27 published nsp13 studies (Author response table 1) shows that the majority use 20–50 mM monovalent cations, with 20 mM being most common. Mickolajczyk et al. (2021) showed that nsp13 activity is highest at low salt and declines at higher concentrations. Thus, low salt conditions are routinely used to capture nsp13’s intrinsic catalytic activity. The intracellular environment is far more complex, with crowding and interacting proteins that likely modulate helicase behavior. The low-salt conditions are therefore a deliberate simplification to isolate and define enzyme function.

      Planned experiments: We fully agree that higher salt concentrations should be tested. In the revision, we will perform key assays such as ATP-independent duplex unwinding and G4 unfolding at ≥100 mM NaCl or KCl to verify that the observed activities persist under more physiological ionic conditions.

      (2) As in most publications that focus strictly on helicase (or other enzymatic) functions, the activity of the isolated protein is examined. However, particularly in the case of nsp13, core functions rely on other factors, such as nsp7/8 and other components of the replication-transcription complex (RTC). The overall structure and oligomerization state of nsp13 are altered within the complex (Chen et al., 2022, NSMB). The inclusion of such factors in key experiments would greatly improve the biological relevance of the findings.

      We agree that examining Nsp13 within the context of the RTC is essential for establishing the biological relevance of our findings. The structural reorganization of Nsp13 upon binding to Nsp12 and Nsp7/8 (Chen et al., 2022) suggests that its enzymatic "mode" may be regulated by its protein partners.

      Planned experiments: To address this, we will include the following biochemical characterizations:

      (1) Nsp13/12 and Nsp13/7/8 sub-complexes will be examined to dissect the individual contributions of the polymerase and the primase-like factors to Nsp13’s multifaceted activities.

      (2) The core RTC (Nsp13/12/7/8) will be used to evaluate how the full assembly modulates the functions of Nsp13, particularly on complex templates such as G4s and pseudoknots.

      (3) In Figure 4, the authors claim that Mg2+ concentration inhibits RNA unwinding. While this is likely considering previous findings, it must be validated that duplex stabilization is not the primary cause for the observed lower dissociation rates. As the template is only 12 bp long with extensive overhangs, higher ion concentrations may significantly stabilize base pairing by reducing fraying effects. Similarly, in Figure 6, template-dependent effects of Mg2+/ATP should be ruled out.

      We thank the reviewer for this insightful suggestion. We agree that it is critical to determine whether the observed inhibition of RNA unwinding at higher Mg<sup>2+</sup> concentrations is due to physical stabilization of the RNA duplex rather than to a direct effect on the enzyme.

      Planned experiments: To address this, we will perform the following characterizations:

      (1) We will measure the Tm of the RNA duplex used in Figure 4 across a range of Mg<sup>2+</sup> concentrations (0, 0.5, and 1.0 mM). This will allow us to quantify the extent to which divalent cations stabilize the duplex RNA. These data will provide a more rigorous interpretation of the Mg<sup>2+</sup>-dependent unwinding in Figure 4.

      (2) Similarly, we will perform thermal melting analyses for the various DNA and RNA templates used in Figure 6 under different Mg<sup>2+</sup>/ATP conditions to rule out the template-dependent effects of Mg<sup>2+</sup>/ATP.

      (4) It is not entirely clear to me by which principle the templates were chosen. In my opinion, it would improve the overall comparability of the experimental results if, for instance, the blunt-ended duplex had the same sequence as the oligos with overhangs, since factors such as length, G/C content, Tm, etc., may play a significant role in binding and unwinding. Similarly, the oligos for binding and unwinding should be kept somewhat comparable, e.g., the G4 for the binding assay has 3 stacks, whereas RG1 has only 2. This discrepancy could make a significant difference. Thus, key experiments should be repeated using comparable sequence pairs.

      We fully agree with the reviewer that maintaining sequence consistency across different assays is essential for a rigorous comparison of nsp13 activities. We apologize for the ambiguity in the initial presentation of our sequences in Table S1.

      Planned revisions and experiments:

      (1) We wish to clarify that several key substrates were sequence-matched. For unwinding assays, the 12-bp 3′-overhang DNA and blunt-ended DNA share the identical duplex sequence, and the 16-bp 5′-overhang and 3′-overhang DNA substrates are also sequence-matched. For annealing assays, the duplex regions for all DNA substrates (3′, 5′, blunt, and fork) are identical, and the same internal consistency was maintained for all RNA annealing substrates. To make this clear, we will reorganize Table S1 to explicitly group these sequence-paired substrates.

      (2) The reviewer also notes discrepancies between binding and unwinding substrates (e.g., the difference in G4 stacks). To ensure direct comparability, we will perform additional experiments: complete binding assays for RG-1 (the 2-stack G4 used in unwinding) to match the functional data, and systematically measure binding affinities for all key unwinding substrates, including 3′-overhang, 5′-overhang, blunt-ended DNA, and the RNA fork.

      (5) Moreover, in the initial characterization of the binding abilities (Figure 1), the authors should include blunt-ended controls (duplex/hairpin) and, importantly, a pseudoknot (PK), as these structures are crucial for multiple steps in the viral life cycle (frameshifting, replication). Specifically, the PK in the 3'UTR (Sola et al., 2011, RNA Biology) may be an interesting target structure for unwinding assays, as it recruits the RTC, and, to my knowledge, no studies are available regarding nsp13 function at a PK. This would be particularly interesting in combination with nsp7/8 (Ohyama et al., 2024, JACS Au).

      We thank the reviewer for this insightful and inspiring suggestion. Incorporating pseudoknot (PK) structures into our analysis—particularly the well-characterized PK in the 3'UTR (Sola et al., 2011)—represents a significant opportunity to bridge our biochemical findings with the viral life cycle. To address this, we have designed a 3'UTR PK substrate based on recently reported scaffolds (Ohyama et al., 2024).

      Planned experiments:

      (1) We will expand our initial binding assays (Figure 1) to include blunt-ended duplexes, hairpins, and the 3'UTR PK. This will establish a baseline for how Nsp13 recognizes these structurally distinct and physiologically critical templates.

      (2) We will perform unwinding assays to determine whether Nsp13, in its isolated state, possesses the mechanical capability to resolve the complex tertiary interactions within a pseudoknot.

      (3) Following the reviewer's insight, we will examine whether the addition of nsp7/8 is required to facilitate the unfolding of the 3'UTR PK.

      Together, these experiments will allow us to assess whether Nsp13 is capable of managing one of the most challenging structural obstacles in the SARS-CoV-2 genome.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to broaden the understanding of SARS-CoV-2 Nsp13 activity to show that a single viral protein can accomplish multiple functions. Additionally, they try to show that helicase function is not limited to ATP-driven, unidirectional unwinding.

      Strengths: The consistent application of statistics to triplicate experiments is a strength of the manuscript. The ToPif1 control in Figure S12 is a good control.

      We thank the reviewer for the insightful assessment and for highlighting the rigor of our experimental design, particularly our reliance on triplicate data with robust statistical validation and the inclusion of the ToPif1 control.

      We are especially grateful for the detailed comments provided by the reviewer. We fully recognize that addressing these specific points is essential for strengthening the cogency of our conclusions and improving the overall rigor of the manuscript. These suggestions have provided us with a clear roadmap for further refining our experimental evidence and clarifying our mechanistic interpretations. Below, we respond point-by-point to the specific issues.

      Weaknesses:

      (1) All the experiments except the one in Figure S2 use N-terminally His-tagged Nsp13. Because the N-terminal tag is known to have large effects on Nsp13 activity, this calls into question virtually all of the results in this manuscript.

      We thank the reviewer for raising this important concern regarding the potential influence of the N-terminal His tag on nsp13 activity. We have carefully considered this issue and provide the following lines of evidence to address it.

      (1) We have generated a tag-free nsp13 variant and our preliminary characterization (Author response image 1) shows that it retains all key activities: ATP hydrolysis (comparable to His-tagged nsp13), both ATP-independent (Mg<sup>2+</sup>-activated) and ATP-dependent unwinding, as well as chaperone activity to remodel stem-loops. These results demonstrate that while the His tag may modulate enzymatic efficiency, it does not create or abolish any specific biochemical function.

      (2) We conducted a systematic survey of 27 published studies on SARS-CoV/SARS-CoV-2 nsp13 (Author response table 1). The results show that 17 out of 27 studies (63%) used affinity-tagged nsp13 without tag removal, including His, MBP, GST, and Strep tags.

      (3) The only study that systematically compared different affinity tags (Adedeji et al., 2012) reported that GST-tagged nsp13 exhibited ~520-fold higher ATPase activity than His-tagged nsp13, demonstrating that the choice of affinity tag can affect enzymatic efficiency. However, both tagged versions retained all core enzymatic activities, including ATP hydrolysis and duplex unwinding. Importantly, no study has compared the full functional spectrum between His-tagged and tag-free nsp13. Our preliminary data suggest that the His tag may affect efficiency but does not alter the presence or absence of any specific activity.
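
      The survey proportions quoted in this response can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the counts stated in the text (17 of 27 studies with an unremoved affinity tag, and 20 of 27 studies using DNA substrates, the latter quoted further below); the helper name is illustrative.

```python
def percent(count: int, total: int) -> int:
    """Share of studies, rounded to the nearest whole percent."""
    return round(100 * count / total)


TOTAL_STUDIES = 27  # number of published nsp13 studies surveyed, per the response

tagged_without_removal = percent(17, TOTAL_STUDIES)
used_dna_substrates = percent(20, TOTAL_STUDIES)

print(f"affinity tag retained: {tagged_without_removal}%")
print(f"DNA substrates used:   {used_dna_substrates}%")
```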

      Planned experiments:

      We fully agree with the reviewer that a more systematic comparison would strengthen the conclusions. In the revision, we will include additional characterization of tag-free nsp13: (i) quantitative nucleic acid binding affinity, (ii) G4 unfolding efficiency, (iii) strand annealing activity. These experiments are currently underway.

      In summary, while we acknowledge that the His tag may influence enzymatic efficiency, our key conclusions are supported by experiments with tag-free nsp13. We will add a discussion of these points and include additional tag-free nsp13 data in the revised manuscript.

      (2) The ATP-independent, bidirectional duplex unwinding shown for short duplex substrates is reminiscent of the trapping of thermal fraying intermediates that have been reported for other helicases. Because they are only observed on short duplexes, do not require ATP, and are bidirectional, this does not suggest strand displacement as suggested in the manuscript. Instead, it suggests trapping of partially melted intermediates.

      We thank the reviewer for this insightful perspective. While the passive trapping of thermal fraying intermediates is a well-established model for non-catalytic protein-nucleic acid interactions, several lines of evidence suggest that nsp13 employs a more active, allosteric mechanism for ATP-independent remodeling.

      (1) If nsp13 were merely a passive trap, increasing duplex stability should decrease unwinding. However, as shown in Figure S3, raising Mg<sup>2+</sup> from 0 to 5 mM increases the DNA duplex Tm by ~10°C, yet nsp13’s remodeling activity is markedly enhanced under the same conditions (Figure 2). This positive correlation between cation-induced substrate stabilization and protein activation supports an active, protein-centered mechanism that overcomes the increased energetic barrier.

      (2) The observed bidirectionality in ATP-independent remodeling does not simply imply a lack of polarity; rather, it can reflect nsp13’s intrinsic chaperone function. In the absence of ATP, nsp13 binds the ss/ds junction (Figure 2F) and, in a Mg<sup>2+</sup>-dependent manner, may use its binding energy to actively intercalate into the duplex. This mechanism is inherently symmetric for 3′ and 5′ overhangs, explaining bidirectional remodeling, while the absence of activity on blunt-ended substrates confirms the requirement for a pre-existing junction.

      (3) The lack of activity on 24-bp substrates does not negate this remodeling mode but defines its energetic boundary. The binding energy released upon nsp13-nucleic acid interaction is sufficient to overcome the lower unwinding barrier of 12-16 bp duplexes, but insufficient to counteract the high stability and rapid re-annealing of a 24-bp duplex without the continuous mechanical power of ATP hydrolysis.

      Planned Revision:

      We thank the reviewer for prompting us to refine our mechanistic model. In the revision, we will add a dedicated discussion explicitly comparing the model of allosterically activated, binding-driven strand intrusion with the passive trapping model, incorporating the Tm data to strengthen our conclusions.

      (3) Results that may be artifacts of unusual in vitro conditions are interpreted as if similar results will occur in the cell, where ATP is likely always present. Along those same lines, SARS-CoV-2 replicates in compartments of the endoplasmic reticulum, which would limit the ability of Nsp13 to access DNA substrates.

      We thank the reviewer for raising this important concern regarding the physiological relevance. We fully agree that in vitro conditions do not entirely recapitulate the complex intracellular environment, and we have been careful not to over-interpret our findings. Below we address the two specific issues raised:

      (1) Regarding the ATP-independent activity, we acknowledge that ATP is abundant in healthy, actively replicating cells. However, during rapid viral replication, local ATP concentrations can fluctuate because the RTC has a high energy demand on templates that contain extensive secondary structures, which may lead to transient ATP depletion. Under such energy-limited conditions, Yu et al. (2025) demonstrated that ADP-bound nsp13 exhibits chaperone activity that destabilizes nucleic acid structures without ATP hydrolysis, and Dumm et al. (2025) reported that SARS-CoV-2 nsp13 resolves RNA stem-loops in an ATP-independent manner.

      Even when ATP is abundant, the ATP-independent mode may enable rapid, local structural adjustments that bypass the kinetic delay of ATP binding and hydrolysis. As shown in Figure 1D, nsp13 exhibits high binding affinity for structured nucleic acids. In this scenario, nsp13 functions not as a processive motor but through a binding-driven mechanism, using the free energy of protein-nucleic acid interaction to transiently destabilize short duplexes or resolve local secondary structures such as G4s and stem-loops in an energy-efficient manner.

      (2) Regarding DNA substrates, we fully agree that RNA is the physiological substrate for nsp13. However, DNA is a validated and widely accepted surrogate for mechanistic studies because it is more stable and easier to manipulate than RNA. A systematic survey of 27 published nsp13 studies (Author response table 1) shows that 20 out of 27 (74%) used DNA substrates for at least some of their experiments. In our study, we used DNA primarily as a mechanistic probe and a stable control, and we validated all key conclusions on physiological RNA substrates, as shown in Figures 4, 5, 6, S7, S8, S10, S11 and S12.

      Planned revisions: To address the reviewer’s concerns more directly, we will revise the manuscript to include a discussion paragraph explicitly stating that the ATP-independent activity was observed under optimized in vitro conditions and may represent a latent remodeling capability that could be relevant under energy-limited conditions such as local ATP depletion during rapid replication. We will also clarify that DNA substrates were used as mechanistic probes and controls, and that all key findings were validated on physiological RNA substrates. We thank the reviewer for prompting us to strengthen the discussion of these important points.

      (4) There is no evidence to support the conclusion that "Duplex DNA supports bidirectional remodeling via both ATP-dependent and ATP-independent mechanisms." 3'-5' duplex melting is limited to short duplexes and is ATP-independent, suggesting it may be due to trapping of thermal fraying intermediates by the ssDNA binding Nsp13. The ATP-dependent and ATP-independent melting on the substrates with the 3'-overhang are the same, suggesting that ATP-dependent melting does not occur on this substrate, which would indicate that bidirectional ATP-dependent translocation does not occur.

      We are grateful to the reviewer for this critical evaluation of our mechanistic claims. We agree that our initial statement regarding bidirectional ATP-dependent remodeling was imprecise and not fully supported by the data. As the reviewer correctly notes, the similar unwinding efficiency on 3′-overhang substrates regardless of ATP presence indicates that ATP hydrolysis does not drive 3′→5′ translocation, which is consistent with nsp13’s known 5′→3′ motor polarity. The observed 3′→5′ activity is therefore more accurately described as an ATP-independent remodeling event, not ATP-dependent unwinding.

      We will revise the Discussion and relevant Results sections to clarify the nature of this bidirectional activity. Specifically, the sentence:

      "Duplex DNA supports bidirectional remodeling via both ATP-dependent and ATP-independent mechanisms..." will be corrected to: "Duplex DNA supports bidirectional remodeling via ATP-independent mechanisms."

      We will also explicitly state that while nsp13 requires ATP for long-range, processive 5'→3' helicase activity, its remodeling/chaperone function is inherently bidirectional and powered by the free energy of binding to the ss/ds junction, rather than by ATP-driven mechanical work.

      (5) The description of ATP-independent unwinding as having "limited processivity" is likely not accurate. These experiments were multi-turnover reactions with very high Nsp13 concentrations and no protein trap to ensure single-turnover conditions. Because the reactions were multi-turnover, no information about the processivity of Nsp13 can be obtained. On the contrary, it seems likely that the product formed over the 30-minute reaction with a vast excess of Nsp13 is due to binding and dissociation of multiple Nsp13 molecules instead of processive translocation by a single enzyme.

      We thank the reviewer for this important correction. We fully agree that our use of the term "processivity" was technically imprecise. Processivity strictly defines the distance a single enzyme translocates during one binding event, which our multi-turnover assays (with high nsp13 concentrations and no protein trap) were not designed to measure. Our results specifically demonstrate that the ATP-independent remodeling mode is highly sensitive to duplex length, with efficiency declining sharply as the duplex lengthens. To reflect the experimental data more faithfully, we have replaced "processivity" with more accurate descriptors throughout the manuscript.

      Planned revisions:

      (1) Original: "The ATP-independent unwinding mode, however, has limited processivity." Revised: "The ATP-independent unwinding mode, however, exhibits a steep decline in efficiency as the duplex length increases."

      (2) Original: "...an ATP-independent, cation-activated mode with limited processivity." Revised: "...an ATP-independent, cation-activated mode specialized for localized structural remodeling"

      (3) Original: "...primes Nsp13 for basal strand remodeling but supports only limited processivity." Revised: "...primes Nsp13 for basal strand remodeling but is insufficient for the sustained unwinding of extended duplexes."

      (4) Original: "...primes Nsp13 for low-processivity strand displacement." Revised: "...primes Nsp13 for short-range strand displacement rather than long-range processive unwinding."

      We believe these changes clarify that the ATP-independent mode acts as a molecular chaperone for local obstacles (like G4 or short stems) rather than a motor for long-range translocation. We thank the reviewer for helping us improve the precision of our description.

      (6) G4s are much more stable at cellular K+ concentrations than they are at 20 mM K+. As such, Nsp13's ability to unfold a G4 in the absence of ATP may be diminished or eliminated at a physiological K+ concentration.

      We thank the reviewer for this critical point regarding physiological ion concentrations. We agree that K<sup>+</sup> significantly stabilizes G4 structures, which may raise the energy barrier for ATP-independent remodeling.

      Planned experiments:

      To address this, we will perform salt titration assays (up to 150 mM KCl) to evaluate the robustness of nsp13’s G4 unfolding activity under more physiological ionic conditions. We will also measure the melting temperature of our G4 substrates across this K<sup>+</sup> range to correlate structural stability with enzymatic efficiency.

      Author response image 1.

      Preliminary characterization of tag-free Nsp13 enzymatic activities. (A) Comparison of ATPase activity between His-tagged and tag-free Nsp13 in the presence of ssRNA or RNA G4. (B) Raw fluorescence data from stopped-flow FRET analysis of ATP-dependent unwinding (16-bp fork DNA, 2 mM Mg<sup>2+</sup>, 2 mM ATP). F/F<sub>0</sub> represents FAM fluorescence normalized to initial DNA intensity. (C) ATP-independent DNA duplex remodeling (data reproduced from Figure S2). (D) Chaperone activity of tag-free Nsp13 on DNA and RNA stem-loops.

      Author response table 1.

      Summary of affinity tags, monovalent salt concentrations, and substrate types used in 27 published SARS-CoV/SARS-CoV-2 nsp13 studies

      References:

      (1) Ivanov KA, Thiel V, Dobbe JC, van der Meer Y, Snijder EJ, Ziebuhr J. Multiple enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase. J Virol. 2004 Jun;78(11):5619-32.

      (2) Lee NR, Kwon HM, Park K, Oh S, Jeong YJ, Kim DE. Cooperative translocation enhances the unwinding of duplex DNA by SARS coronavirus helicase nsP13. Nucleic Acids Res. 2010 Nov;38(21):7626-36.

      (3) Adedeji AO, Marchand B, Te Velthuis AJ, Snijder EJ, Weiss S, Eoff RL, Singh K, Sarafianos SG. Mechanism of nucleic acid unwinding by SARS-CoV helicase. PLoS One. 2012;7(5):e36521. doi: 10.1371/journal.pone.0036521.

      (4) Adedeji AO, Lazarus H. Biochemical Characterization of Middle East Respiratory Syndrome Coronavirus Helicase. mSphere. 2016 Sep 7;1(5):e00235-16.

      (5) Jia Z, Yan L, Ren Z, Wu L, Wang J, Guo J, Zheng L, Ming Z, Zhang L, Lou Z, Rao Z. Delicate structural coordination of the Severe Acute Respiratory Syndrome coronavirus Nsp13 upon ATP hydrolysis. Nucleic Acids Res. 2019 Jul 9;47(12):6538-6550.

      (6) Jang KJ, Jeong S, Kang DY, Sp N, Yang YM, Kim DE. A high ATP concentration enhances the cooperative translocation of the SARS coronavirus helicase nsP13 in the unwinding of duplex RNA. Sci Rep. 2020 Mar 11;10(1):4481.

      (7) Shu T, Huang M, Wu D, Ren Y, Zhang X, Han Y, Mu J, Wang R, Qiu Y, Zhang DY, Zhou X. SARS-Coronavirus-2 Nsp13 Possesses NTPase and RNA Helicase Activities That Can Be Inhibited by Bismuth Salts. Virol Sin. 2020 Jun;35(3):321-329.

      (8) Mickolajczyk KJ, Shelton PMM, Grasso M, Cao X, Warrington SE, Aher A, Liu S, Kapoor TM. Force-dependent stimulation of RNA unwinding by SARS-CoV-2 nsp13 helicase. Biophys J. 2021 Mar 16;120(6):1020-1030.

      (9) Chen J, Wang Q, Malone B, Llewellyn E, Pechersky Y, Maruthi K, Eng ET, Perry JK, Campbell EA, Shaw DE, Darst SA. Ensemble cryo-EM reveals conformational states of the nsp13 helicase in the SARS-CoV-2 helicase replication-transcription complex. Nat Struct Mol Biol. 2022 Mar;29(3):250-260.

      (10) Yazdi AK, Pakarian P, Perveen S, Hajian T, Santhakumar V, Bolotokova A, Li F, Vedadi M. Kinetic Characterization of SARS-CoV-2 nsp13 ATPase Activity and Discovery of Small-Molecule Inhibitors. ACS Infect Dis. 2022 Aug 12;8(8):1533-1542.

      (11) Corona A, Wycisk K, Talarico C, Manelfi C, Milia J, Cannalire R, Esposito F, Gribbon P, Zaliani A, Iaconis D, Beccari AR, Summa V, Nowotny M, Tramontano E. Natural Compounds Inhibit SARS-CoV-2 nsp13 Unwinding and ATPase Enzyme Activities. ACS Pharmacol Transl Sci. 2022 Apr 1;5(4):226-239.

      (12) Lu L, Peng Y, Yao H, Wang Y, Li J, Yang Y, Lin Z. Punicalagin as an allosteric NSP13 helicase inhibitor potently suppresses SARS-CoV-2 replication in vitro. Antiviral Res. 2022 Oct;206:105389.

      (13) Yue K, Yao B, Shi Y, Yang Y, Qian Z, Ci Y, Shi L. The stalk domain of SARS-CoV-2 NSP13 is essential for its helicase activity. Biochem Biophys Res Commun. 2022 Apr 23;601:129-136.

      (14) Grimes SL, Choi YJ, Banerjee A, Small G, Anderson-Daniels J, Gribble J, Pruijssers AJ, Agostini ML, Abu-Shmais A, Lu X, Darst SA, Campbell E, Denison MR. A mutation in the coronavirus nsp13-helicase impairs enzymatic activity and confers partial remdesivir resistance. mBio. 2023 Aug 31;14(4):e0106023.

      (15) Yu J, Im H, Lee G. Unwinding mechanism of SARS-CoV helicase (nsp13) in the presence of Ca2+, elucidated by biochemical and single-molecular studies. Biochem Biophys Res Commun. 2023 Aug 6;668:35-41.

      (16) Sommers JA, Loftus LN, Jones MP 3rd, Lee RA, Haren CE, Dumm AJ, Brosh RM Jr. Biochemical analysis of SARS-CoV-2 Nsp13 helicase implicated in COVID-19 and factors that regulate its catalytic functions. J Biol Chem. 2023 Mar;299(3):102980.

      (17) Maio N, Raza MK, Li Y, Zhang DL, Bollinger JM Jr, Krebs C, Rouault TA. An iron-sulfur cluster in the zinc-binding domain of the SARS-CoV-2 helicase modulates its RNA-binding and -unwinding activities. Proc Natl Acad Sci U S A. 2023 Aug 15;120(33):e2303860120.

      (18) Marx SK, Mickolajczyk KJ, Craig JM, Thomas CA, Pfeffer AM, Abell SJ, Carrasco JD, Franzi MC, Huang JR, Kim HC, Brinkerhoff H, Kapoor TM, Gundlach JH, Laszlo AH. Observing inhibition of the SARS-CoV-2 helicase at single-nucleotide resolution. Nucleic Acids Res. 2023 Sep 22;51(17):9266-9278.

      (19) Inniss NL, Rzhetskaya M, Ling-Hu T, Lorenzo-Redondo R, Bachta KE, Satchell KJF, Hultquist JF. Activity and inhibition of the SARS-CoV-2 Omicron nsp13 R392C variant using RNA duplex unwinding assays. SLAS Discov. 2024 Apr;29(3):100145.

      (20) Sales AH, Fu I, Durandin A, Ciervo S, Lupoli TJ, Shafirovich V, Broyde S, Geacintov NE. Variable Inhibition of DNA Unwinding Rates Catalyzed by the SARS-CoV-2 Helicase Nsp13 by Structurally Distinct Single DNA Lesions. Int J Mol Sci. 2024 Jul 19;25(14):7930.

      (21) Soper N, Yardumian I, Chen E, Yang C, Ciervo S, Oom AL, Desvignes L, Mulligan MJ, Zhang Y, Lupoli TJ. A Repurposed Drug Interferes with Nucleic Acid to Inhibit the Dual Activities of Coronavirus Nsp13. ACS Chem Biol. 2024 Jul 19;19(7):1593-1603.

      (22) Hao W, Hu X, Chen Q, Qin B, Tian Z, Li Z, Hou P, Zhao R, Balci H, Cui S, Diao J. Duplex Unwinding Mechanism of Coronavirus MERS-CoV nsp13 Helicase. Chem Biomed Imaging. 2024 Dec 19;3(2):111-122.

      (23) Park J, Jeong YJ, Chauhan K, Koh HR, Kim DE. ATPase-dependent duplex nucleic acid unwinding by SARS-CoV-2 nsP13 relies on facile binding and translocation along single-stranded nucleic acid. J Biol Chem. 2025 Jul;301(7):110373.

      (24) Yu J, Im H, Cho H, Jeon Y, Lee JB, Lee G. A novel ADP-directed chaperone function facilitates the ATP-driven motor activity of SARS-CoV helicase. Nucleic Acids Res. 2025 Jan 24;53(3):gkaf034.

      (25) Dumm AJ, Zheng AY, Butler TJ, Kulikowicz T, George JC, Bombard PT, Sommers JA, Ding J, Brosh RM Jr. SARS-CoV-2 point mutations are over-represented in terminal loops of RNA stem-loop structures that can be resolved by Nsp13 helicase in a unique manner with respect to nucleotide dependence. Nucleic Acids Res. 2025 May 22;53(10):gkaf447.

      (26) Castro JM, Slack RL, Ong YT, Zhang H, Gifford LB, Courouble VV, Aiken RM, Shankar V, O'Leary TR, Griffin PR, Lan S, Du Y, Fu H, Sarafianos SG. Stalling the Enemy: Targeting Nsp13 for Next-Generation SARS-CoV-2 Antivirals. Int J Mol Sci. 2026 Mar 11;27(6):2587.

      (27) Mingroni MA, Enney BM, Malsick LE, Geiss BJ. Motif V is an allosteric couple between the SARS-CoV-2 nsp13 nucleotide triphosphatase and helicase active sites. J Biol Chem. 2026 Mar;302(3):111198.

    1. Author response:

      eLife Assessment

      This useful study presents an improved protocol for long-term in vitro culture of Schistosoma mansoni that enables progression toward sexually dimorphic stages, representing a meaningful advance for studying parasite development and reducing reliance on animal models. The findings show that host-specific culture conditions support essential developmental and metabolic functions required for parasite maturation, although development remains delayed compared to in vivo conditions. The evidence is solid overall, but limited pairing efficiency and the absence of egg production indicate that the system does not yet fully recapitulate complete reproductive development.

      On behalf of the co-authors, we thank the three reviewers and the editors for their complimentary remarks as well as their major and minor comments and concerns. Addressing these concerns has led to revisions that improved the manuscript. In particular, further analyses have generated updated Figures 3 and 4 and Supplementary Tables S1 and S4-S6.

      Public Reviews:

      Reviewer #1 (Public review):

      Pichon, Rémi et al. describe an in vitro method for transforming Schistosoma cercariae into mature adult worms. The authors show that human serum (HS) supports parasite growth and differentiation more effectively than fetal bovine serum (FBS). They also observed differences in parasite growth and activity, with worms cultured in HS efficiently digesting human red blood cells (hRBC). Cultured worms were able to pair with ex vivo adult worms and produce eggs, indicating functional maturation suitable for downstream applications such as drug screening. While the experimental approach is comprehensive and supports the advantage of HS culture conditions, the pairing efficiency was low (≈7%) and required long culture periods (70-80 days), highlighting limitations that may affect reproducibility.

      We thank the reviewer for the positive highlights. Regarding the low in vitro pairing efficiency, we have now edited the manuscript to clarify a misleading statement related to the 7% value. We decided to remove this value, which corresponds to the percentage of experiments in which couples were observed, because it does not accurately represent the actual number of observed worm pairs and is potentially misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

      We also agree with the reviewer that the extended culture periods required to obtain fully sexually dimorphic parasites remain a limitation. As elaborated in Discussion (see below), key factors, probably derived from the host, are missing in the in vitro system explaining both the slow in vitro development and low rate of spontaneous pairing between in vitro developed, sexually dimorphic male and female worms. This was discussed as follows (lines 340-343): “That said, while our system was highly efficient in producing sexually dimorphic worms, spontaneous pairing between male and female parasites was extremely rare, mainly in aged in vitro cultures (from 80 to 100 days in culture) indicating that other factors, e.g., cholesterol, may be missing[35].”

      A major strength of the study, in particular, is that the authors clearly differentiate the effects of FBS versus HS on developmental progression. The conversion rate observed in HS cultures is significant and consistent with previously published data.

      While the study has several strengths, some aspects of the work are not fully explored. In particular, the role of hRBC supplementation requires further clarification. Although HS-cultured worms were shown to digest hRBC more readily, the implications of this observation remain unclear. Specifically, it would be useful to understand whether hRBC supplementation influences (1) long-term culture stability, (2) molecular pathways associated with development and differentiation, or (3) the pairing capacity of the worms. While addressing these questions may not be the main objective of the study, further discussion of these points would strengthen the manuscript.

      We agree that deciphering the role of human red blood cell (hRBC) supplementation is critical. Regarding the influence of hRBCs on long-term culture stability in parasite development, it has been well established for more than four decades that schistosomes need red blood cells to grow in culture [Basch, P. F. Cultivation of Schistosoma mansoni in vitro. II. production of infertile eggs by worm pairs cultured from cercariae. J Parasitol 67, 186-190 (1981); Basch, P. F. Cultivation of Schistosoma mansoni in vitro. I. Establishment of cultures from cercariae and development until pairing. J. Parasitol. 67, 179-185 (1981)]. The molecular pathways that underlie development, sexual differentiation and pairing, and that are modulated by hRBCs in culture, are currently being investigated by our team. We decided not to include these data and analyses in the current manuscript, as they fall outside its scope.

      The manuscript is clearly written and represents a valuable contribution to the field. Overall, the experimental approach is sound, and the results support a useful methodological framework for the in vitro culture of Schistosoma worms and the attainment of sexual maturity, particularly for adult male worms.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Reviewer #2 (Public review):

      Summary:

      The authors perform confirmation studies of Paul Basch's seminal schistosome work from 1981, demonstrating the development of transformed schistosomules into sexually dimorphic adult parasites, albeit without successful egg production. In addition to the findings from Basch's earlier work, the authors add some new molecular data in the form of an analysis of proliferative cells in in-vitro-derived animals.

      Strengths:

      The authors successfully confirm experimental results from earlier schistosome researchers, providing a potential new tool for studying schistosome biology without the need for vertebrate hosts.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      The display of data from the authors is sometimes difficult to follow/understand where it comes from. For example:

      (1) Line 136: The authors claim that parasites in HS and FBS conditions have substantially different mortality rates (11.3 +/- 2.7 vs 5 +/- 2.3) but a quite high p-value (0.8). Analyzing the raw data myself, I obtained a mean of 8.2 +/- 1.7% vs 4.8% +/- 4.3% with a p-value of 0.15. Either the data are not clearly presented, and I did not follow them, or the data presented in the text do not match the raw data in the supplemental files.

      We thank the reviewer for pointing this out; we have now edited Supplementary Tables S1 and S6 by turning them into a long format for the sake of clarity. Accordingly, Results, Methods sections, and indicated supplementary tables were edited as follows:

      Results, lines 142 ff.:

      “No morphological differences were observed between parasites cultured either in FBS or HS within the first week in culture; in both conditions most parasites were classified as early schistosomula [category 1: 76% ± 30 (average ± SD) in FBS and 73% ± 29 (average ± SD) in HS] with few lung (category 2) and early liver schistosomula (category 3) (Figure 1B, week 1; Supplementary Figure S1). The mean mortality (category 0) at week 1 was slightly higher, but not statistically significant (P= 0.42), in worms cultured in HS [9.75% ± 2.76 (average ± SD)] compared to the mortality registered in FBS-cultured parasites [5.52% ± 5.18 (average ± SD), Supplementary Table S6], consistent with previous findings[39].”

      Methods, lines 463-465:

      “To evaluate differences in mortality between HS- and FBS-cultured parasites, data from 5 experiments were combined and analysed using a Shapiro-Wilk normality test to test normality of the data and a non-parametric Wilcoxon rank sum exact test (Supplementary Tables S1 and S6).”

      Supplementary Tables:

      Supplementary Table S1. “Raw counts of parasites within each developmental stage category. Each row corresponds to a picture of parasites in culture medium containing FBS or HS. Each column corresponds to the raw parasite counts at indicated stage development (categories 0 to 5), time in culture (Time in days - D), and experimental condition.”

      Supplementary Table S6. “Summary of all statistical tests employed in this study. 1. Statistical tests of parasite mortality and the raw data table used for this test. 2. Statistical tests for worm size comparisons (correspond to Figure 2). 3. Statistical tests for worm black gut comparisons (correspond to Figure 3). BG: Black gut. 4. Statistical tests for EdU positive cells comparisons (correspond to Figure 4). Replicate code: E, M and L correspond to day 2, 8 and 15 respectively; R and W correspond to the presence (R) or absence (W) of RBCs added 13 days after transformation.”

      For clarity, in Author response image 1 we provide the R script used to perform the statistical tests on the data shown in Supplementary Table S6 (column Raw count of parasite developmental category per image and experiment)

      Author response image 1.
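
      As a rough illustration of the kind of test named in the Methods text above (and not a reproduction of the authors' R script), the following minimal Python sketch computes a two-sided exact Wilcoxon rank-sum p-value by enumerating every way the pooled ranks could be split between the two groups. The function names and the example percentages are hypothetical; the numbers are not taken from Supplementary Table S6.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_rank_sum_test(x, y):
    """Two-sided exact rank-sum test; returns (rank sum of x, p-value)."""
    ranks = midranks(list(x) + list(y))
    n, total = len(x), len(x) + len(y)
    observed = sum(ranks[:n])
    expected = n * (total + 1) / 2  # mean rank sum of x under the null
    deviation = abs(observed - expected)
    hits = count = 0
    # Under the null, every choice of n ranks for group x is equally likely.
    for idx in combinations(range(total), n):
        count += 1
        w = sum(ranks[i] for i in idx)
        if abs(w - expected) >= deviation - 1e-9:
            hits += 1
    return observed, hits / count


# Hypothetical per-experiment mortality percentages (illustration only).
hs_mortality = [9.1, 12.4, 7.3, 13.0, 6.9]
fbs_mortality = [2.0, 11.5, 1.2, 8.8, 3.4]
w, p = exact_rank_sum_test(hs_mortality, fbs_mortality)
print(f"rank sum = {w}, exact two-sided p = {p:.3f}")
```

      Full enumeration is only practical for small samples such as five experiments per condition. Note also that R's `wilcox.test` reports its statistic as this rank sum minus n(n+1)/2, so the statistic printed here will differ from R's by that constant.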

      (2) Line 187/Figure 4: Though it is not clearly stated, it appears that the authors treat their EdU counts as an ordinal data set of 61 steps (from 0 to >60) rather than a continuous measure of EdU+ cells per animal. In this author's opinion, the graph strongly suggests a continuous data set, and the fact that this reviewer had to dig through poorly-labeled raw data to discover the nature of the data is problematic. The authors should either switch to a continuous data set or make it explicit that the data shown are ordinal. If counting EdU+ cells is too arduous, the authors could consider comparing the amount of EdU+ area to the amount of DAPI+ area in maximum intensity projections of their confocal images, as this would roughly approximate the amount of proliferative cells in the animals.

      As the reviewer correctly pointed out, the data were treated as ordinal because counting EdU+ cells in worms with more than 60 of them became extremely difficult and highly inaccurate. Therefore, we decided to group all worms showing more than 60 EdU+ cells in a single category, “60 EdU+ cells”. We have now updated Figure 4, where medians are shown instead of mean values; Supplementary Table S5, to provide more comprehensive access to the raw counts; and Supplementary Table S6, to indicate that the data for EdU+ cells per worm were considered ordinal. Accordingly, we have revised the corresponding sections as follows:

      Results, lines 211 ff.:

      “HS-cultured schistosomula showed higher numbers of proliferating stem cells, with a median of >48 and >60 EdU+ cells per worm at days 8 and 15, respectively (Figure 4). On the other hand, most FBS-cultured parasites displayed no more than an average of 20 EdU+ cells per worm (Figure 4).”

      Methods, lines 520 ff.:

      “EdU+ cells per parasite were counted for an average of 100 parasites across three independent experiments (Supplementary Table S5). Worms were grouped based on the number of cells per individual, but all those showing ≥ 60 EdU+ cells were counted in the same group named ‘60 EdU+ cells’. Therefore, the data were considered ordinal data. Statistical analysis was performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 considered significant (Supplementary Table S6).”

      Figure 4 legend, lines 830 ff.:

      “A. Violin plots showing the number of EdU+ cells per worm at indicated time points (2, 8, and 15 days post cercarial transformation) in parasites cultured either in Foetal Bovine Serum (FBS, blue) or Human Serum (HS, light brown). Human Red Blood Cells (hRBCs) were added in the culture at day 13 post cercarial transformation. The small black dots indicate individual worms, and the big black point indicates the median of EdU+ cells per worm. All worms showing ≥ 60 EdU+ cells were counted and clustered together in the group named ‘60 EdU+ cells’. Hence, the data were treated as ordinal and statistical analysis performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 (*) considered significant (Supplementary Tables S5 and S6).”
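
      The capping and rank-based comparison described in the revised Methods and legend can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' analysis: the helper names are hypothetical, the p-value uses the usual chi-square approximation, and the Dunn post-hoc step is not included.

```python
import math
from statistics import median

CAP = 60  # counts at or above this value are all recorded as 60


def cap_counts(counts, cap=CAP):
    """Collapse every count above the cap into the cap category."""
    return [min(c, cap) for c in counts]


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, with the standard tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        t = j - i + 1                      # size of this run of tied values
        rank_of[pooled[i]] = (i + j) / 2 + 1
        tie_term += t ** 3 - t
        i = j + 1
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    rank_sq = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = (12 / (n * (n + 1)) * rank_sq - 3 * (n + 1)) / correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical EdU+ counts per worm for three conditions (illustration only).
raw = {"A": [3, 8, 12, 15], "B": [30, 48, 71, 55], "C": [62, 90, 60, 58]}
capped = {name: cap_counts(vals) for name, vals in raw.items()}
for name, vals in capped.items():
    print(name, "median =", median(vals))
print("H, p =", kruskal_wallis(list(capped.values())))
```

      Because every worm above the cap receives the same value, the capped group contributes a large block of tied ranks. A rank-based test with a tie correction and a median summary remain meaningful under this capping, whereas a mean would be biased downward.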

      We thank the reviewer for the very interesting suggestion to quantify cell proliferation by calculating the ratio of EdU+ area to DAPI+ area in maximum intensity projection images. Measuring the fluorescence area for each worm in maximum projection is an excellent idea; however, due to the number of EdU+ cells present in some samples, we think this technique would not provide additional information or produce more detailed data compared with our analysis when the number of EdU+ cells exceeds 60 per worm. We will certainly consider this approach for future studies.
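
      For readers unfamiliar with the area-based proxy the reviewer proposed, a minimal Python sketch is given below. It assumes each channel is a z-stack stored as nested lists of pixel intensities and that a fixed intensity threshold separates signal from background; the function names, thresholds, and toy images are hypothetical, and this analysis was not performed in the study.

```python
def max_projection(stack):
    """Collapse a z-stack (list of equally sized 2D slices) pixel-wise by maximum."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(s[r][c] for s in stack) for c in range(cols)]
            for r in range(rows)]


def positive_area(image, threshold):
    """Number of pixels whose intensity exceeds the threshold."""
    return sum(1 for row in image for px in row if px > threshold)


def edu_to_dapi_ratio(edu_stack, dapi_stack, edu_thr, dapi_thr):
    """EdU-positive area divided by DAPI-positive area in the projections."""
    edu_area = positive_area(max_projection(edu_stack), edu_thr)
    dapi_area = positive_area(max_projection(dapi_stack), dapi_thr)
    if dapi_area == 0:
        raise ValueError("no DAPI-positive pixels above threshold")
    return edu_area / dapi_area


# Toy 2-slice, 2x2-pixel stacks (illustration only).
edu = [[[0, 5], [0, 0]], [[0, 0], [9, 0]]]
dapi = [[[8, 8], [8, 0]], [[0, 0], [0, 8]]]
print(edu_to_dapi_ratio(edu, dapi, edu_thr=4, dapi_thr=4))
```

      In a projection, overlapping nuclei merge into a single positive region, so the ratio saturates when labelled cells are dense. This is consistent with the authors' concern about worms with more than 60 EdU+ cells.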

      There are some minor issues as well:

      (1) Line 122: It is perhaps incorrect to refer to humans as "the" definitive host of schistosomes, as S. japonicum is primarily considered a zoonotic infection with water buffalo/cows being the primary definitive host.

      We thank the reviewer for pointing this out; we have now replaced “schistosomes” with “Schistosoma mansoni” (current line 131).

      (2) Line 185/298: The authors refer to EdU pulse-chase experiments, but the experiments described here are EdU pulse experiments.

      This is a very good point. We thank the reviewer for raising it and have replaced “EdU pulse-chase” with “EdU pulse” in lines 37, 204, and 321.

      Reviewer #3 (Public review):

      Summary:

      This study is significant as it established a protocol for the long-term culture of Schistosoma mansoni newly transformed cercariae, which developed in vitro into sexually dimorphic forms. The impact of two different sera, Fetal Bovine Serum (FBS) and Human Serum (HS), added to the culture medium supplemented with human red blood cells was evaluated. The authors demonstrated that HS-cultured parasites were able to digest red blood cells, a critical step for long-term parasite development. Furthermore, while most FBS-cultured parasites did not progress beyond an early liver stage, sexual dimorphism was clearly evident in the HS-cultured worms, albeit delayed compared to in vivo development.

      Strengths:

      This study could contribute to further in vitro studies for a better understanding of the unique sexual biology of Schistosoma mansoni and for screening novel schistosomicidal compounds. By increasing parasite development in in vitro studies, this protocol could have a positive impact on the principles of the 3Rs (Replacement, Reduction and Refinement) for animal research.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      As the authors mentioned, "pairing between male and female parasites was rare. Pairing was observed in approximately ~7% of the experiments, usually after day ~ 80 in culture." Egg production was also not achieved with this protocol.

      Following the reviewer’s point, we have removed the value of 7%, which corresponds to the percentage of experiments in which couples were observed. This value does not accurately reflect the actual number of observed worm pairs and is potentially misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNcil). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNcil) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNcil) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      We thank the reviewer for highlighting the importance of genetic and biological replication. An additional patient-derived iPSC line was included in the manuscript; our study therefore includes two independent nGD patient-derived iPSC lines, GD2-1260 (GBA1<sup>L444P/P415R</sup>) and GD2-10-257 (GBA1<sup>L444P/RecNcil</sup>), both of which carry severe mutations associated with nGD. These two lines represent distinct genetic backgrounds and were used to demonstrate the consistency of key disease phenotypes (reduced GCase activity, elevated substrate levels, impaired dopaminergic neuron differentiation, etc.) across MLOs from different patients. Major experiments (e.g., GCase activity assays, substrate measurements, immunoblotting for the DA marker TH, and therapeutic testing with SapC-DOPS-fGCase and AAV9-GBA1) were performed using both patient lines, with results showing consistent phenotypes and therapeutic responses (see Figs. 2-6 and Supplementary Figs. 4-5). To ensure clarity and transparency, a new Supplementary Table 2 summarizes the characterization of both the GD2-1260 and GD2-10-257 lines.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      Biological replication was ensured in our study by conducting experiments in at least 3 independent differentiations per line, and technical replicates (multiple organoids/fields per batch) were averaged accordingly. We have clarified biological replicates and differentiation in the figure legends.

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      In the revision, we have clarified the biological replicates and differentiations in the figure legends for Fig. 1E; Fig. 2B, 2G; Fig. 3F, 3G; Fig. 4B-C, E, H-J, M-N; Fig. 6D; and Fig. 7A-C, I.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was not feasible because GBA1 overlaps with a highly homologous pseudogene (PGBA), which makes precise editing technically challenging. Consequently, only the L444P mutation was successfully corrected, and the resulting isogenic line retains the P415R mutation in a heterozygous state. Because Gaucher disease is an autosomal recessive disorder, individuals carrying a single GBA1 mutation (heterozygous carriers) do not develop clinical symptoms. Therefore, the partially corrected isogenic line, which retains only the P415R allele, represents a clinically relevant carrier model. Consistent with this, our results show that GCase activity was restored to approximately 50% of wild-type levels (Fig.4B-C), supporting the expected heterozygous state. These findings also make it unlikely that the remaining differences observed are due to clonal variation or epigenetic effects.

      (e) The authors tested week 3, 4, 8, 15, and 28 old organoids in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNAseq.

      We agree that a systematic analysis of maturation stages is essential for validating the MLO model. Our data integrated a longitudinal comparison across multiple developmental windows (Weeks 3 to 28) to characterize the transition from progenitors to mature/functional states for nGD phenotyping and evaluation of therapeutic modalities: 1) DA differentiation (Wks 3 and 8 in Fig. 3): qPCR analysis demonstrated the progression of DA-specific programs. We observed a steady increase in the mature DA neuron marker TH and in ASCL1. This was accompanied by a gradual decrease in the early floor plate/progenitor markers FOXA2 and PLZF, indicating a successful differentiation path from progenitors to differentiated/mature DA neurons. 2) Glycosphingolipid substrate accumulation (Wks 15 and 28 in Fig. 2): To assess late-stage nGD phenotypes, we compared GluCer and GluSph at Week 15 and Week 28. This comparison highlights the progressive accumulation of substrates in nGD MLOs, reflecting the metabolic consequences of the disease at different stages of maturation. 3) Organoid growth dynamics (Wks 4, 8, and 15 in new Fig. 4): The new Fig. 4 tracks physical maturation through organoid size and growth rates across three key time points, providing a macro-scale verification of consistent development between WT and nGD groups. By comparing these early (Wk 3-8) and late (Wk 15-28) stages, we confirmed that our MLOs transition from a proliferative state to a post-mitotic, specialized neuronal state, satisfying the requirement for comparing distinct maturation stages.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work.

      (g) Suggested fixes / experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers).

      MLOs derived from an additional iPSC line, GD2-10-257, were included in the manuscript. This was addressed above [see response to Weaknesses (1)-a].

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was unsuccessful because a pseudogene (PGBA), located 16 kb downstream of GBA1, shares 96–98% sequence similarity with GBA1 (Ref #1, #2), which complicates precise editing. GBA1 (~7.6 kb) is longer than PGBA (~5.7 kb). The primary exonic difference between GBA1 and PGBA is a 55-bp deletion in exon 9 of the pseudogene. As a result, the isogenic line we obtained carries only the P415R mutation; L444P was corrected to the normal sequence. We have included this limitation in the Methods as “This gene editing strategy is expected to also target the GBA1 pseudogene due to the identical target sequence, which limits the gene correction on certain mutations (e.g., P415R)”.

      References:

      (1) Horowitz M., Wilder S., Horowitz Z., Reiner O., Gelbart T., Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics (1989). 4, 87–96. doi:10.1016/0888-7543(89)90319-4

      (2) Woo EG, Tayebi N, Sidransky E. Next-Generation Sequencing Analysis of GBA1: The Challenge of Detecting Complex Recombinant Alleles. Front Genet. (2021). 12:684067. doi: 10.3389/fgene.2021.684067. PMCID: PMC8255797.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      This was addressed above [see response to Weaknesses (1)-b, (1)-c].

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes / experiments - Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      We thank the reviewer for these valuable suggestions. We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work. Importantly, the primary conclusions of our manuscript, that GBA1 mutations in nGD MLOs resulted in nGD pathologies such as diminished enzymatic function, accumulation of lipid substrates, widespread transcriptomic changes, and impaired dopaminergic neuron differentiation, which can be corrected by several therapeutic strategies in this study, are supported by the evidence presented. The suggested experiments represent an important direction for future research using brain organoids.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      We agree with the reviewer. Because this is a proof-of-principle study, the treatment was designed within a short time window. Long-term studies with more comprehensive outcome assessments will be conducted in future work.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      We appreciate the reviewer’s concerns. This study was intended to demonstrate the feasibility and initial response of MLOs to AAV therapy. A comprehensive evaluation of AAV biodistribution will be considered in future studies.

      The penetration and distribution of SapC-DOPS have been extensively characterized in prior studies. In vivo biodistribution of SapC-DOPS coupled to CellVue Maroon, a fluorescent cargo, was examined in mice bearing human tumor xenografts using real-time fluorescence imaging; CellVue Maroon fluorescence in the tumor persisted for 48 hours (Ref. #3: Fig. 4B, mouse 1), 100 hours (Ref. #4: Fig. 5), and up to 216 hours (Ref. #5: Fig. 3). Uptake kinetics were also demonstrated in cells: flow cytometry quantification showed that SapC-DOPS nanovesicles coupled to fluorescent cargo were incorporated into human brain tumor cell membranes within minutes and remained stably incorporated for up to one hour (Ref. #6: Fig. 1a and Fig. 1b). Building on these findings, the present study focuses on evaluating the restoration of GCase function rather than reexamining biodistribution and uptake kinetics.

      References:

      (3) X. Qi, Z. Chu, Y.Y. Mahller, K.F. Stringer, D.P. Witte, T.P. Cripe. Cancer-selective targeting and cytotoxicity by liposomal-coupled lysosomal saposin C protein. Clin. Cancer Res. (2009) 15, 5840-5851. PMID: 19737950.

      (4) Z. Chu, S. Abu-Baker, M.B. Palascak, S.A. Ahmad, R.S. Franco, and X. Qi. Targeting and cytotoxicity of SapC-DOPS nanovesicles in pancreatic cancer. PLOS ONE (2013) 8, e75507. PMID: 24124494.

      (5) Z. Chu, K. LaSance, V.M. Blanco, C-H. Kwon, B. Kaur, M. Frederick, S. Thornton, L. Lemen, and X. Qi. Multi-angle rotational optical imaging of brain tumors and arthritis using fluorescent SapC-DOPS nanovesicles. J. Vis. Exp. (2014) 87, e51187, 1-7. PMID: 24837630.

      (6) J. Wojton, Z. Chu, C-H. Kwon, L.M.L. Chow, M. Palascak, R. Franco, T. Bourdeau, S. Thornton, B. Kaur, and X. Qi. Systemic delivery of SapC-DOPS has antiangiogenic and antitumor effects against glioblastoma. Mol. Ther. (2013) 21, 1517-1525. PMID: 23732993.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      Including inactive fGCase would confound the assessment of fGCase in MLOs by immunoblot and immunofluorescence; therefore, saposin C–DOPS was used as the control instead.

      We agree that assessment of off-target expression and potential cytotoxicity for AAV is important; this will be included in future studies.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      To address this comment, we have added a new table (Supplementary Table 2) comparing the four therapeutic modalities and summarizing their respective outcomes. While this study focused on short-term responses as a proof-of-principle, future work will explore long-term therapeutic effects.

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      We appreciate the reviewer’s suggestions. The therapeutic testing in patient-derived MLOs was designed as a proof-of-principle study to demonstrate feasibility and the primary response (rescue of GCase function) to the treatment. A comprehensive, long-term therapeutic evaluation of AAV and SapC-DOPS-fGCase is indeed important for a complete assessment; however, this represents a separate therapeutic study and is beyond the scope of the current work.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      For the AAV-treated experiments, we agree that measuring AAV copy number and GFP expression would provide additional information. However, the primary goal of this study was to demonstrate the key therapeutic outcome, rescue of GCase function by AAV-delivered normal GCase, which is directly relevant to the treatment objective.

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      As noted above [see response to Weakness (3)-c], using inert GCase would confound the assessment of fGCase uptake in MLOs; therefore, it was not suitable for this study. See response above for the distribution and uptake kinetics of SapC-DOPS [see response to Weaknesses (3)-b].

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      We have added a new table (Supplementary Table 2) providing a head-to-head comparison of the treatment effects.

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      We agree that the absence of microglia and vasculature in midbrain-like organoids represents a limitation, as we have discussed in the manuscript. In this revision, we highlighted this limitation in the Discussion section and clarified that it may contribute to incomplete phenotyping and phenotypic rescue observed in our therapeutic experiments. Additionally, we have outlined future directions to incorporate microglia and vascularization into the organoid system to better recapitulate the in vivo environment and improve translational relevance (see 7th paragraph in the Discussion).

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that certain abnormalities, such as patterning defects observed during early differentiation, likely reflect developmental consequences of GBA1 mutations rather than degenerative processes. Conversely, phenotypes such as substrate accumulation, lysosomal dysfunction, and impaired dopaminergic maturation at later stages are interpreted as degenerative features. We have updated the Results and Discussion sections to avoid conflating developmental defects with neurodegenerative mechanisms.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      The manuscript has been revised to avoid overstatements.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      The manuscript now includes further plans to address the incorporation of microglia and vascularization, described in the last two paragraphs of the Discussion. A pilot study of microglia incorporation will be reported when it is completed.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      We have clarified the biological replicates and differentiations in the figure legends [see response to Weaknesses (1)-b, (1)-c].

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated the Statistical analysis section of the Methods as described below:

      For comparisons between two groups, data were analyzed using unpaired two-tailed Student’s t-tests when the sample size was ≥6 per group and normality was confirmed by the Shapiro-Wilk test. When the normality assumption was not met or when sample sizes were small (n < 6), the non-parametric Mann-Whitney U test was used instead. For comparisons involving three or more groups, one-way ANOVA followed by Tukey’s multiple comparison test was applied when data were normally distributed; otherwise, the nonparametric Dunn’s multiple comparison test was used. Exclusion of outliers was made based on cutoffs of the mean ±2 standard deviations. All statistical analyses were performed using GraphPad Prism 10 software. Exact p-values are reported throughout the manuscript and figures where feasible. A p-value < 0.05 was considered statistically significant.
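
      As an illustration only, the decision rule described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the GraphPad Prism analysis itself: the function names are hypothetical, the boolean normality flags stand in for the outcome of a Shapiro-Wilk test run elsewhere, and the use of the sample standard deviation for the outlier cutoff is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def exclude_outliers(values, k=2.0):
    """Keep values within mean +/- k standard deviations.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


def choose_two_group_test(n_a, n_b, normal_a, normal_b, min_n=6):
    """Pick the two-group test from group sizes and normality flags.

    normal_a / normal_b are hypothetical inputs: True when the group
    passed a normality test (e.g., Shapiro-Wilk) performed elsewhere.
    """
    if min(n_a, n_b) >= min_n and normal_a and normal_b:
        return "unpaired two-tailed Student's t-test"
    # Small samples (n < 6) or non-normal data fall back to the
    # non-parametric alternative.
    return "Mann-Whitney U test"


def choose_multi_group_test(all_normal):
    """Pick the procedure for comparisons of three or more groups."""
    if all_normal:
        return "one-way ANOVA with Tukey's multiple comparison test"
    return "Dunn's multiple comparison test"


if __name__ == "__main__":
    print(choose_two_group_test(4, 8, True, True))
    print(choose_multi_group_test(False))
    print(exclude_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 30]))
```

      Note that with very small groups a mean ±2 SD cutoff rarely excludes anything, because a single extreme value inflates the standard deviation itself.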

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      In this work, quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM. This multilevel averaging approach minimizes bias from regional heterogeneity within organoids and accounts for variability across differentiations. Representative confocal images shown in the figures were selected to accurately reflect the quantified data. We believe this standardized quantification strategy ensures robust and reproducible results while appropriately representing the 3D architecture of the organoids.

      In the revision, we have clarified the method used for image analysis of sectioned MLOs as below:

      Quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed using ImageJ (NIH) on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM.
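
      The multilevel averaging described above can be sketched as follows. This is a hedged illustration rather than the ImageJ workflow itself: the nested-dictionary layout, the function name, and the example counts are hypothetical, and averaging organoid-level means within each differentiation before computing the group mean is our reading of the text, stated here as an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def group_mean_sem(data):
    """Collapse FOV counts to a group mean and SEM.

    data maps differentiation -> organoid -> list of per-FOV values
    (hypothetical layout). FOVs are averaged within each organoid,
    organoid means within each differentiation (assumed step), and the
    SEM is computed across differentiations (the biological replicates).
    """
    diff_means = []
    for organoids in data.values():
        organoid_means = [mean(fovs) for fovs in organoids.values()]
        diff_means.append(mean(organoid_means))
    n = len(diff_means)
    if n < 2:
        raise ValueError("SEM needs at least two independent differentiations")
    return mean(diff_means), stdev(diff_means) / sqrt(n), n


if __name__ == "__main__":
    # Made-up counts: 3 differentiations x 3 organoids x 3 FOVs.
    example = {
        "diff1": {"o1": [9, 10, 11], "o2": [10, 10, 10], "o3": [8, 10, 12]},
        "diff2": {"o1": [11, 12, 13], "o2": [12, 12, 12], "o3": [10, 12, 14]},
        "diff3": {"o1": [13, 14, 15], "o2": [14, 14, 14], "o3": [12, 14, 16]},
    }
    m, sem, n = group_mean_sem(example)
    print(f"mean = {m:.2f}, SEM = {sem:.2f}, n = {n} differentiations")
```

      Computing the SEM across differentiations rather than across individual FOVs keeps n equal to the number of biological replicates and avoids treating fields from the same organoid as independent observations.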

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      The RNA-seq data are from the same batch. The mapping rate is >90%. The GEO accession will be active upon publication. These details were included in the Methods.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      We have revised the figure legends to include replicates for each figure and statistical tests [see response in weaknesses (1)-b, (1)-c].

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      The statistical analysis method is provided in the revision [see response in Weaknesses (5)-b].

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      We validated MLO identity using two markers: 1) FOXG1 and 2) EN1. FOXG1 was barely detectable in Wk8 75.1_MLO but highly expressed in ‘age-matched’ cerebral organoids (CO), suggesting that our culturing method is midbrain-oriented. In nGD MLOs, FOXG1 expression was significantly higher than in 75.1_MLO, indicating aberrant anterior-posterior brain specification, consistent with the transcriptomic dysregulation observed in our RNA-seq data.

      To further confirm midbrain identity, we examined the expression of EN1, an established midbrain-specific marker. Quantitative RT-PCR analysis demonstrated that EN1 expression increased progressively during differentiation in both WT-75.1 and nGD2-1260 MLOs at weeks 3 and 8 (Author response image 1). In WT-75.1 MLOs, EN1 reached levels 34-fold and 373-fold higher than in WT-75.1 iPSCs at weeks 3 and 8, respectively. In nGD MLOs, although EN1 expression showed a modest reduction at week 8, the levels were not significantly different from those observed in age-matched WT-75.1 MLOs (p > 0.05, ns).

      Author response image 1.

      qRT-PCR quantification of midbrain progenitor marker EN1 expression in WT-75.1 and GD2-1260 MLOs at Wk3 and Wk8. Data were normalized to WT-75.1 hiPSCs and are presented as mean ± SEM (n = 3-4 MLOs per group). ns, not significant.

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      We quantified TH expression at both the mRNA level (Fig. 3F) and the protein level (Fig. 3G/H) from whole-organoid lysates, which provides a more consistent and integrative measure across samples. These TH expression levels correlated well with the corresponding extracellular (medium) dopamine concentrations for each genotype. In contrast, TH+ neuron counts may not reliably reflect total cellular dopamine levels because the number of cells captured on each organoid section varies substantially, making normalization difficult. Measuring intracellular dopamine is an alternative approach that will be considered in future studies.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      Potential off-target effects were analyzed during gene editing; the likelihood of off-target editing is low, based on the off-target scores ranked by MIT Specificity Score analysis. The related method was also updated as stated below:

      “The chance to target other off-targets is low due to low off-target scores ranked based on the MIT Specificity Score analysis (Hsu, P., Scott, D., Weinstein, J. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013). https://doi.org/10.1038/nbt.2647).”

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      Lipid levels were normalized to the total protein of the organoid lysate. This was clarified in the Methods section in the revision as stated below:

      “The GluCer and GluSph levels in MLO were normalized to total MLO protein (mg) that were used for glycosphingolipids analyses. Protein mass was determined by BCA assay and glycosphingolipid was expressed as pmol/mg protein. Additionally, GluSph levels in the culture medium were quantified and normalized to the medium volume (pmol/mL).”
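
      As a small worked illustration of this normalization, the two unit conversions can be written out as below. The function names and the numbers are hypothetical and are not taken from our data; the sketch only shows the arithmetic behind pmol/mg protein and pmol/mL.

```python
def per_mg_protein(lipid_pmol, protein_mg):
    """Glycosphingolipid amount normalized to lysate protein (pmol/mg)."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return lipid_pmol / protein_mg


def per_ml_medium(lipid_pmol, volume_ml):
    """Glycosphingolipid amount normalized to medium volume (pmol/mL)."""
    if volume_ml <= 0:
        raise ValueError("medium volume must be positive")
    return lipid_pmol / volume_ml


if __name__ == "__main__":
    # Hypothetical values: 50 pmol GluCer measured in a lysate that
    # contains 0.25 mg protein by BCA assay.
    print(per_mg_protein(50.0, 0.25), "pmol/mg protein")
    # Hypothetical values: 3 pmol GluSph in 1.5 mL of culture medium.
    print(per_ml_medium(3.0, 1.5), "pmol/mL")
```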

      Representative LC-MS chromatograms for both normal and GD MLOs have been included in a new figure, Supplementary Figure 2.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      This was addressed above [see response to Weaknesses (1)-b and (5)-b].

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      The title was revised to: Patient-Specific Midbrain Organoids with CRISPR Correction Recapitulate Neuronopathic Gaucher Disease Phenotypes and Enable Evaluation of Novel Therapies

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      This was addressed above [see response to Weaknesses (1)-a, (1)-b, (1)-c].

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      As addressed above [see response to Weaknesses (2)], the suggested experiments in b) and c) would provide additional insights into this study and we will consider them in future work.

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      This was addressed above [see response to Weaknesses (3)-a to e].

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

      This was addressed above [see response to Weaknesses (1)-b, (5)-b].

      Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

      We thank the reviewer for the supportive remarks.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      We observed reduced dopaminergic neuron marker TH expression in GBA1 L444P/RecNciI (GD2-10-257) MLOs, suggesting that this line also exhibits defects in dopaminergic neuron differentiation. These data are provided in a new Supplementary Fig. 4E, and are summarized in new Supplementary Table 2 in the revision.

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH+ neurons, GFAP+ glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      All cell types in wild-type MLOs are expected to express GBA1, as it is a housekeeping gene broadly expressed across neurons, astrocytes, and other brain cell types. Its lysosomal function is essential for cellular homeostasis and is therefore not restricted to any specific lineage. (https://www.proteinatlas.org/ENSG00000177628GBA1/brain/midbrain).

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

      We appreciate the reviewer’s suggestion; however, we respectfully prefer to retain the current order of Figures 2 and 3, as we believe this structure provides the clearest narrative flow. Figure 2 establishes the core biochemical hallmarks: reduced GCase activity, substrate accumulation, and global transcriptomic dysregulation (1,429 DEGs enriched in neural development, WNT signaling, and lysosomal pathways), which together provide essential molecular context for studying the specific cellular differentiation defects presented in Figure 3. Presenting the broader disease landscape first creates a coherent mechanistic link to the subsequent analyses of midbrain patterning and dopaminergic neuron impairment.

      To enhance readability, we have added a brief transitional sentence at the start of the Figure 3 paragraph: “Building on the molecular and transcriptomic hallmarks of GCase deficiency observed in nGD MLOs (Figure 2), we next investigated the impact on midbrain patterning and dopaminergic neuron differentiation (Figure 3).”

      Recommendations for the authors:

      Reviewing Editor Comments:

      Your paper has been reviewed by three expert reviewers in the GBA field. Although they appreciate the work and its novelty, they raise several concerns. We suggest that you address these concerns in the next version.

      Reviewer #1 (Recommendations for the authors):

      Statistical and presentation issues

      (1) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (1)- b].

      (2) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated the Methods to describe the statistical analysis in detail [see response to Reviewer 1 Weaknesses (5)-b].

      (3) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (5)- c].

      (4) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      Our RNA-seq data were generated from a single batch of MLOs, with mapping rates exceeding 90%. The GEO accession will be made publicly available upon publication.

      Reviewer #2 (Recommendations for the authors):

      Please consider the following suggestions for revisions:

      (1) Line 86: A bit more explanation/justification for the focus on midbrain-like organoids would be helpful, including introducing the nature of the midbrain pathology to better put some of the MLO findings in context. Is the nGD pathology for the midbrain significantly different / out of proportion to other affected brain regions?

      nGD patients often display impaired vertical gaze and movement disorders. These symptoms correlate with midbrain involvement due to the sensitivity of this region to neuroinflammatory and degenerative processes (Ref #7, #8). Both human and mouse studies indicate that the midbrain exhibits prominent substrate accumulation compared to other brain regions, suggesting a predisposition for greater pathological involvement in the GD midbrain (Ref #8, #9, #10, #11). This rationale was added to Line 86 in the revision.

      References:

      (7) Goker-Alpan O, Ivanova MM. Neuronopathic Gaucher disease: Rare in the West, common in the East. J Inherit Metab Dis.(2024) 47(5):917-934. PMID: 38768609.

      (8) Burrow TA, Sun Y, Prada CE, Bailey L, Zhang W, Brewer A, Wu SW, Setchell KDR, Witte D, Cohen MB, Grabowski GA. CNS, lung, and lymph node involvement in Gaucher disease type 3 after 11 years of therapy: clinical, histopathologic, and biochemical findings. Mol Genet Metab. (2015) 114(2):233-241. PMID: 25219293.

      (9) Tamar Farfel-Becker, Einat B. Vitner, Samuel L. Kelly, Jessica R. Bame, Jingjing Duan, Vera Shinder, Alfred H. Merrill, Kostantin Dobrenis, Anthony H. Futerman. Neuronal accumulation of glucosylceramide in a mouse model of neuronopathic Gaucher disease leads to neurodegeneration, Human Molecular Genetics, (2014). Volume 23, Issue 4, Pages 843–854.

      (10) E. Ellen Jones, Wujuan Zhang, Xueheng Zhao, Cristine Quiason, Stephanie Dale, Sheerin Shahidi-Latham, Gregory A. Grabowski, Kenneth D. R. Setchell, Richard R. Drake, and Ying Sun. High-Resolution MALDI Imaging Mass Spectrometry. SLAS Discovery (2017). Vol. 22(10) 1218–1228.

      (11) Xu YH, Xu K, Sun Y, Liou B, Quinn B, Li RH, Xue L, Zhang W, Setchell KD, Witte D, Grabowski GA. Multiple pathogenic proteins implicated in neuronopathic Gaucher disease mice. Hum Mol Genet. (2014) 23(15):3943-57. PMID: 24599400.

      (2) Lines 359-360: Please specify the carbon-chain length of the sphingoid base of the GluCer species analyzed. Also, is there a citation for the statement that 18:0 and 16:0 are "brain-enriched species"?

      The carbon-chain length analyzed ranges from 14:0 to 24:0. The sphingoid base for all GluCer species analyzed is d18:1. For example, the species referred to as GluCer 18:0 corresponds to GluCer(d18:1/18:0). Although both 16:0 and 18:0 are enriched in the brain, 18:0 is the most abundant species in the brain (Ref #12, #13). We revised “brain-enriched species” to “brain-predominant species (18:0)”.

      References:

      (12) Nilsson, O., and Svennerholm, L. Accumulation of Glucosylceramide and Glucosylsphingosine (Psychosine) in Cerebrum and Cerebellum in Infantile and Juvenile Gaucher Disease. Journal of Neurochemistry (1982) 39, 709–718.

      (13) Sun, Y., Zhang, W., Xu, Y.H., Quinn, B., Dasgupta, N., Liou, B., Setchell, K.D., and Grabowski, G.A. Substrate compositional variation with tissue/region and Gba1 mutations in mouse models--implications for Gaucher disease. PLoS One (2013). 8, e57560.10.1371/journal.pone.0057560.

      (3) Figure 2: It would be interesting to compare the MLO findings to prior gene expression data. Are there previously published transcriptome analyses from nGD brain tissue (or other tissues) that the transcriptome data obtained from MLOs may be compared with? What about transcriptome analyses of mouse GD models?

      We thank the reviewer for this valuable suggestion. To strengthen the biological context of our transcriptomic findings, we have added a new comparative table (new Supplementary Table 3) in the revised manuscript that summarizes key dysregulated pathways in our human nGD MLOs alongside previously published data from nGD mouse midbrain (Ref#14). The table highlights substantial overlap, including axon guidance, neuron differentiation, dopaminergic/glutamatergic/GABAergic synaptic signaling, lipid metabolism, apoptosis/cell death, and nervous system development, emphasizing the translational relevance of our model. We also note that our dataset uniquely reveals pronounced dysregulation of WNT signaling and anterior-posterior patterning (Fig. 2L and 2M), potentially reflecting human-specific early midbrain defects.

      We added the following sentence to Discussion: “Comparative analysis with prior transcriptomic data from nGD mouse midbrain showed consistent dysregulation in axon guidance, synaptic signaling, lipid metabolism, and nervous system development (new Supplementary Table 3), supporting the fidelity of our human MLO model.”

      Reference:

      (14) Dasgupta N, Xu YH, Li R, Peng Y, Pandey MK, Tinch SL, Liou B, Inskeep V, Zhang W, Setchell KD, Keddache M, Grabowski GA, Sun Y. Neuronopathic Gaucher disease: dysregulated mRNAs and miRNAs in brain pathogenesis and effects of pharmacologic chaperone treatment in a mouse model. Hum Mol Genet. (2015) 24(24):7031-48. PMID: 26420838.

      (4) Lines 402-405 & Figure 3D: Is it possible to include a merged image to better visualize the TH and FOXA2 co-staining / potential colocalization?

      The merged images of TH (red) and FOXA2 (green) are shown in Fig. 3E. Yellow arrows indicate TH and FOXA2 co-stained cells, which appear yellow in the merged images. The results demonstrate that the number of co-stained cells is reduced in GD2-1260 MLOs compared with WT-75.1 MLOs at both week 6 and week 8.

      (5) Lines 447-448 & Figure 4F, G, J: It would be helpful to provide a direct analysis/visualization of MLO size between the WT-75.1, GD2-1260, and iso-GD2-1260 genotypes (allowing direct comparison of WT and iso). Similarly, the same 3-way analysis would be valuable for assessing dopamine levels.

      We have included WT-75.1 in Fig. 4F, G, and J in the revision. All three genotypes (WT-75.1, GD2-1260, and iso-GD2-1260) are presented, with comparisons made against WT-75.1. In new Fig. 4F, MLO growth is shown by representative MLO images taken under wide-field microscopy at day 2, Wk4, and Wk8 of differentiation. In new Fig. 4G, MLO size was analyzed with NIS-Elements and is presented as the area (µm<sup>2</sup>) of the MLO in the image (mean ± SEM); N≥10 MLOs were analyzed for each genotype. In new Fig. 4J, dopamine levels were analyzed in culture medium from WT-75.1, GD2-1260, and iso-GD2-1260 MLOs at Wk12, cultured in 3 mL BGM medium for 72 hours. Data are presented as mean ± SEM (n = 5 per group). The statistical analysis applied is described in the legend.

      (6) Figure 4: What is the explanation/interpretation of the residual autophagy pathway dysfunction in CRISPR-corrected MLOs? nGD requires near-complete loss of GCase activity, so it is a bit curious that autophagic dysfunction would be observed with only ~50% GCase reduction? There is some discussion, but it doesn't fully capture the unexpected nature and implications of this result.

      This phenomenon may be explained by a threshold effect in lysosomal function. Gaucher disease is an autosomal recessive disorder. The carriers with heterozygous GBA1 mutation, who retain approximately 50% of normal GCase activity, do not develop disease. This suggests that even partial restoration of GCase activity can reduce glucosylceramide accumulation below a pathological threshold, thereby restoring lysosomal integrity and autophagic flux. In addition, improved GCase activity may help normalize the lipid composition of lysosomal membranes, facilitating the fusion events required for effective autophagy.

      (7) Lines 512-516 & Figure 5J: The data shown are inconclusive. Can these Western blot data be quantified, noting the number of replicates for each measurement? Without quantification and statistics, it is difficult to assess the claim that levels of LAMP1, LC3-I, LC3-II, 4E-BP1, and p-4E-BP1 in GD2-1260 treated with SapC-DOPS-fGCase are more similar to GD2-1260 treated following SapC-DOPS than to WT-75.1.

      We performed quantitative analysis relative to WT-75.1 and included the data in new Fig. 5J. The corresponding text was revised as follows:

      Analysis of protein levels showed that decreased LAMP1 expression in GD2-1260 MLOs was not altered following SapC-DOPS-fGCase treatment (Figure 5J). The elevated LC3-II levels, an indicator of impaired autophagic flux, were reduced upon treatment, suggesting enhanced autophagic activity (Figure 5J). Moreover, phosphorylated 4E-BP1 (Thr37/46), but not total 4E-BP1, was improved in SapC-DOPS-fGCase–treated MLOs, reflecting a decrease in mTOR hyperactivation (Figure 5J). We anticipate that a longer duration of SapC-DOPS-fGCase exposure in nGD MLOs may produce a more robust therapeutic effect in rescuing nGD-associated phenotypes, which will be evaluated in future studies.

      (8) Lines 518-520: The presented data support "effective restoration of GCase activity," but clarification is needed regarding "correction of GD-related disease phenotypes." Perhaps "selected molecular and biochemical phenotypes" would be more accurate. Data are not shown for several other phenotypes, including TH, FOXA2, and dopamine levels.

      This was revised to “selected molecular and biochemical phenotypes”.

      (9) Figure 5D-J: Please clarify whether all experiments were conducted 48 hours after treatment, as indicated for Figure 5C. If so, does this suggest that SapC-DOPS treatment exhibits only short-term effects? Were any data collected to evaluate the persistence of the treatment effect?

      The treatment duration is specified in the Fig. 5 legend. Fig. 5D–J represent experiments conducted after two weeks of treatment, whereas Fig. 5C reflects a 48-hour treatment. In both Gaucher disease lines, two-week treatment restored GCase activity to wild-type levels and reduced GluSph substrate accumulation. These findings were intended as proof-of-principle to demonstrate therapeutic feasibility; evaluation of treatment persistence beyond two weeks was beyond the scope of this study.

      Minor suggestions

      (1) Line 80: "A brain organoid derived from hiPSCs of a healthy individual with GBA1 knockout and α-synuclein overexpression exhibited some PD features23." I would suggest enumerating what "PD features" are to distinguish from "clinical features", which I don't think is the intended meaning.

      This was revised as “exhibited characteristic PD markers”.

      (2) Figure 2I: The reported number of downregulated DEGs is incorrect. It should be 765, not 1429.

      This was corrected in Figure 2I.

      (3) Line 359: change "enrich" to "enriched".

      This word was corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This foundational study builds on prior work from this group to reveal the complexities underlying ligand-dependent RXRγ-Nur77 heterodimer formation, offering a compelling re-evaluation of their earlier conclusions. The authors examine how a library of RXR ligands influences the biophysical, structural, and functional properties of Nur77. They find that although the Nur77-RXRγ heterodimer shares notable functional similarities with the Nurr1-RXRα complex, it also exhibits unique features, notably, both dimer dissociation and classical agonist-driven activities. This work advances our understanding of the nuanced behaviors of nuclear receptor heterodimers, which have important implications for health and disease.

      Strengths:

      (1) Builds on previous work by providing a comprehensive analysis that examines whether Nur77-RXRγ heterodimer formation parallels that of the Nurr1-RXRα complex.

      (2) Systematic evaluation of a library of RXR ligands provides a broad survey of functional outputs.

      (3) Careful reanalysis of previous work sheds new light on how NR4A heterodimers function.

      We thank the reviewer for recognizing our work as foundational. In the nuclear receptor field, current understanding of ligand-regulated nuclear receptor activity is based largely on ligand-dependent coregulator recruitment preferences; for example, agonists enhance coactivator recruitment to activate transcription. Building on our recent study of Nurr1-RXRα, the present work suggests that activation of the evolutionarily related NR4A-RXR heterodimer Nur77-RXRγ by RXR ligands is also consistent with a non-classical activation mechanism involving heterodimer dissociation.

      Weaknesses:

      (1) Some conclusions appear overstated or are not well substantiated by the work presented. It's unclear how the data support a non-classical mode of agonism, for example, based on the data shown.

      We thank the reviewer for this important point. We did not intend to claim that Nur77-RXRγ activation is explained exclusively by a non-classical mode of agonism. Rather, our interpretation was that the data are consistent with two possible, non-mutually exclusive mechanisms: (1) a classical pharmacological mechanism involving ligand-dependent coregulator recruitment; and (2) a non-classical mechanism involving ligand-binding domain (LBD) heterodimer dissociation, as we previously described for Nurr1-RXRα. This differs from our prior eLife study of Nurr1-RXRα, in which the data supported the LBD heterodimer dissociation model but not the classical pharmacological model.

      In our revised manuscript, we clarify two points that are important for interpreting the Nur77-RXRγ data. First, several experimental limitations of the Nur77-RXRγ studies reduced the extent to which the mechanism could be resolved as rigorously as in our earlier Nurr1-RXRα study. Second, and more importantly, the currently available ligand set lacks Nur77-RXRγ-selective agonists. This limits our ability to determine whether LBD heterodimer dissociation is the sole or principal mechanism of activation, or instead one of several contributing mechanisms.

      Taken together, these results support LBD heterodimer dissociation as a plausible and experimentally observable component of Nur77-RXRγ activation and, therefore, as a candidate shared activation mechanism for NR4A-RXR heterodimers. At the same time, because the quantitative evidence is less definitive than in the Nurr1-RXRα system, we agree that conclusions regarding Nur77-RXRγ should be stated more cautiously. This caution is reflected in both the title of our manuscript (“Towards a unified mechanism…”) and the language used throughout the text.

      (2) Some assays have relatively few replicates, with only two in some cases.

      We thank the reviewer for their attention to experimental rigor. For some assays, the findings were reproduced in two independent experiments, which we considered sufficient to confirm the presence and reproducibility of the effects observed in those particular assay formats. In the original manuscript, we used a general statement in the figure legends (“representative of two or more independent experiments”) across all assay data. In the revised manuscript, we now specify the number of independent experimental replicates for each assay in the corresponding figure legends to improve transparency.

      Reviewer #2 (Public review):

      Summary:

      This study explores the mechanisms by which binding of the nuclear receptor RXRg regulates its heterodimeric partner Nur77. Previously, this group made the interesting discovery that ligand-dependent activation of RXRg bound to a related partner, Nurr1, does not occur through a classical pharmacological mechanism but through agonist-dependent dissociation of the complex through disruption of their ligand binding domain (LBD) interactions. Here, they revisit this paradigm with Nur77. In contrast to Nurr1, the authors do not have the reagents to clearly support a role for LBD dissociation. Following the model of partial ligand-dependent dissociation of the LBD heterodimer, the experimental data (NMR, ITC, SEC) are interesting and quite complex.

      Strengths:

      The authors do a rigorous job of describing the data and providing possible interpretations and caveats. Revisiting the analysis of Nurr1, they identify the crucial role that selective Nurr1-RXRg agonists played in supporting the LBD dissociation model; without analogous compounds for the Nur77-RXRg complex, it is difficult to invoke this mechanism. Interestingly, treatment with the Nurr1-RXRg selective agonist HX600 suggests it can induce some LBD dissociation. Therefore, there may be some similarities between the regulation of Nurr1 and Nur77 by RXRg.

      We thank the reviewer for this thoughtful and balanced summary of our work. We appreciate the reviewer’s recognition of both our prior findings in the Nurr1-RXRα system and the interesting, but more complex, experimental behavior observed here for Nur77-RXRγ. We agree that the absence of Nur77-RXRγ-selective agonists currently limits how definitively the contribution of LBD dissociation can be resolved, and we have revised the manuscript to make this point more explicit and to further temper our conclusions accordingly.

      Weaknesses:

      Despite evidence supporting a partial role for RXRg LBD dissociation as a mechanism to activate Nur77, other data demonstrate that a fundamentally different regulatory mechanism likely exists in the Nur77-RXRg complex that involves the RXRg disordered NTD. The decision to describe further study of this as outside the scope of this work is unfortunate, as it closed off an avenue that could have provided fruitful data informing the apparently distinct regulatory mechanisms of the Nur77-RXRg complex. Given the uncertainty in the importance of the partial roles of the pharmacological mechanism, LBD dissociation, and the RXRg NTD, this study may have limited impact on the field.

      We thank the reviewer for this thoughtful point. We agree that the RXRγ NTD likely contributes to regulation of Nur77-RXRγ transcription, and that our truncation data suggest that regions outside the LBD can influence transcriptional output. At present, however, the effect of RXRγ NTD truncation is not sufficiently mechanistically resolved to distinguish among several plausible explanations.

      For example, the RXRγ NTD has been implicated in phase separation and biomolecular condensate formation in cells (PubMed ID 40392852, 40420113, 33971237, 31881311), and perturbing these properties (via RXRγ NTD truncation) could indirectly affect Nur77-RXRγ transcriptional activity. In addition, NTDs of nuclear receptors can participate in coactivator or corepressor interactions (PubMed ID 24284822), raising the possibility that removal of the RXRγ NTD alters transcription by changing recruitment of regulatory factors rather than by directly informing the LBD-centered mechanism examined here. We will clarify in the revised manuscript that these possibilities remain unresolved and represent important directions for future study.

      We also agree that defining how multiple RXRγ domains contribute to Nur77-RXRγ regulation would be valuable for the field. However, the focus of the present study is narrower: to test whether, as in our previous eLife study of Nurr1-RXRα, RXR ligands can influence heterodimer function through effects on LBD-LBD interactions. Because the available data do not yet allow a mechanistic dissection of the RXRγ NTD contribution, we believe that a definitive analysis of this question would require a separate set of experiments beyond the scope of the present work. We have revised the manuscript to better acknowledge this limitation and to frame the conclusions accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, this is a compelling body of work. Additional summary statements and clearer transitions would be helpful throughout.

      Here are some points that should be addressed or at least discussed by the authors:

      (1) It is unclear in the luciferase assays whether the truncated proteins are functional or not. Were there Western blots or other assays run to confirm protein concentrations?

      We thank the reviewer for this point. We did not perform Western blotting or other assays to confirm equivalent expression levels of the truncated RXRγ constructs, and we agree that this is a limitation of the luciferase assay data. As a result, the transcriptional effects observed with the truncation constructs should be interpreted cautiously.

      With that said, the increased transcriptional activity observed upon deletion of the RXRγ NTD/AF-1 region suggests that this region may exert a repressive effect on Nur77-RXRγ transcription. This effect could reflect multiple, non-mutually exclusive mechanisms, including altered phase separation or condensate-related properties of RXRγ, or altered recruitment of transcriptional coregulators through the NTD. Because our truncation strategy does not distinguish among these possibilities, we do not believe these data allow a definitive mechanistic interpretation of the NTD contribution.

      We have revised the manuscript to clarify this limitation. We also note that the primary focus of the present study is the role of ligands in modulating Nur77-RXRγ function through LBD-mediated interactions, in direct comparison with our previous Nurr1-RXRα study. A more complete mechanistic dissection of how RXRγ domain architecture influences Nur77-RXRγ transcription will require future work.

      (2) Why does the Nur77 construct lacking the NTD show increased luciferase activity?

      Please see our response above to Reviewer 2’s Public Review, which also addresses this point.

      (3) A case is made for the Nur77 LBD driving the activity, but it also could be inferred that the DBD is driving based on the data shown in Figure 1.

      We thank the reviewer for this point. We agree that the Nur77 DBD is required for binding to NBRE response elements, and we did not intend to suggest otherwise. The experimental approach in Figure 1 was not designed to dissect the relative contributions of Nur77 domains, since Nur77 was tested only in its full-length form. Instead, the purpose of this experiment was to examine how truncation of RXRγ domains affects Nur77-RXRγ transcriptional activity, in direct comparison with our prior eLife study of Nurr1-RXRα, where RXRα domain truncations helped define the importance of RXR-LBD-mediated regulation. We will revise the text to clarify that Figure 1 does not distinguish whether Nur77 DBD-dependent DNA binding is necessary, but instead addresses whether the pattern of RXRγ domain dependence is consistent with an LBD-centered mechanism of ligand-regulated heterodimer function.

      (4) It is stated that the HX600 coactivator recruitment requires further study. Why wasn't it studied here?

      We thank the reviewer for this point. The primary focus of this study was to determine how RXR ligands influence Nur77-RXRγ heterodimer activity, particularly in relation to ligand-dependent effects on heterodimer function. A more detailed analysis of HX600-dependent coactivator recruitment would require a broader mechanistic investigation of RXRα and RXRγ homodimer pharmacology and RXR-specific coregulator interactions, which extends beyond the central scope of the present manuscript. We agree that this is an important question and view it as a valuable direction for future work.

      (5) Figure 3B, the shifts in monomer populations, error bars aren't shown, the biggest shift is from 0.2 to 0.6, is that statistically meaningful?

      We thank the reviewer for this point. The reviewer is correct that error bars were not shown for Figure 3B. These NMR measurements were performed once (n=1), and therefore the shifts in monomer populations shown in Figure 3B cannot be assessed statistically. Because these studies required substantial NMR instrument time and isotopically labeled protein at high concentration, we were not able to perform experimental replicates for this dataset. We have revised the figure legend to explicitly state that these data were collected from a single experiment and have tempered the corresponding language in the manuscript accordingly.

      (6) Some ligands are shown in the figures but don't appear to be discussed in the text (at least that I can find), such as SR11237.

      We thank the reviewer for pointing this out. We used a panel of 14 commercially available RXR ligands with different pharmacological properties to probe Nur77-RXRγ function, as in our previous Nurr1-RXRα study. In the text, we emphasized ligands that were most informative for the mechanistic conclusions, rather than discussing every compound individually. SR11237, for example, behaved similarly to the broader group of RXR agonists and was therefore shown as part of the full ligand panel but not specifically highlighted in the text. We will clarify this in the revised manuscript.

      (7) There is a sentence in the discussion that says "these observations implicate that although RXRg LBD provides the protein-protein interaction interface to bind Nur77...." the authors did not show enough data to support this claim. It should be bolstered.

      We thank the reviewer for this point. We agree that this statement was stronger than was warranted by the data presented. Our intent was not to claim that the present study definitively establishes the RXRγ LBD as the sole or fully defined protein-protein interaction interface for Nur77 binding. Rather, based on the domain truncation data together with our prior Nurr1-RXRα study, we intended this statement as a working interpretation consistent with an LBD-centered mechanism. In our revised manuscript, we have softened this language to avoid overstating the conclusion and clarified that the current data support, but do not definitively prove, a role for the RXRγ LBD in mediating functionally relevant interaction with Nur77.

      Reviewer #2 (Recommendations for the authors):

      Even though this study is not able to make definitive claims about the mechanism(s) of activation of Nur77 in the Nur77-RXRg complex, the work presented here is rigorous and solidly interpreted. Identifying differences between Nurr1 and Nur77 regulation is important, and the work here shows that selective agonists are essential for supporting the non-canonical mechanism they identified before. Although they address potential implications of NTD regulation in the discussion, it feels like a lot of insight into Nur77 regulation is being missed. However, it is clear that addressing this experimentally would require substantially more work. I don't have any specific recommendations. Given current limitations on funding, I think it's fine to focus on the work completed with the acceptance that it likely limits the impact of the work on the field.

      We thank the reviewer for this thoughtful and balanced assessment of our work. The goal of this manuscript was to test whether the LBD heterodimer dissociation mechanism that we previously reported for Nurr1-RXRα may represent a conserved feature of NR4A-RXR heterodimers by extending these studies to Nur77-RXRγ. We agree that understanding the role of the RXRγ NTD in Nur77-RXRγ regulation is important and potentially highly informative. At the same time, resolving that question experimentally would require a distinct and more extensive set of studies beyond the scope of the present work. We have therefore chosen to focus this manuscript on the completed LBD-centered studies, while acknowledging that this narrower scope may limit the broader impact of the work.

      Minor points:

      (1) Without page and line numbers, it is not easy to point out specific text. On the bottom of page 6 of the document, there are two references to Figure 3a, and the arrows that help illustrate RXRg LBD-dependent CSPs; the second figure callout should describe the blue arrow, I believe.

      Thank you, we made this change.

      (2) Bottom of page 8, "...revealed two compounds [that] standout..."

      Thank you, we made this change.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We truly appreciate all the effort that the reviewer put into reading and understanding our work. With a total of 37 excellent questions, this is one of the most thorough reviews that we have received in a long time.

      R1.0: Summary:

      In this study, the authors propose a "unifying method to evaluate inter-areal interactions in different types of neuronal recordings, timescales, and species". The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice over 7 days, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 1 monkey over 5 days of recording. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).

      Strengths:

      (1) Use of existing data.

      (2) Addresses an interesting question.

      R1.1: Unfortunately, the method falls short of the state of the art: both generalized linear models (GLMs), which have been used in similar contexts for at least 20 years (see the many papers, both theoretical and applied to neural population data, by e.g. Simoncelli, Paninski, Pillow, Schwartz, and many colleagues dating back to 2004), and the extension of Granger causality to point processes (e.g. Kim et al. PLoS CB 2011). Both approaches are substantially superior to what is proposed in the manuscript, since they enforce non-negativity for spike rates (the importance of which can be seen in Figure 2AB), and do not require unnecessary coarse-graining of the data by binning spikes (the 200 ms time bins are very long compared to the time scale on which communication between closely connected neuronal populations within an area, or between related areas, takes place).

      First, a few points of clarification.

      (i) We worked with two-photon calcium imaging data (mice), and with the envelope of multi-unit activity (monkeys). While both of these types of signals are strongly correlated with spikes, neither of them can be truly considered to be a point process.

      (ii) The reviewer points to Figure 2AB. The signals that we worked with can be negative. The black traces are the actual signals and show clear negative bouts, especially noticeable in the middle panel in Figure 2B. Of course, this does not mean that there are negative spike rates. This has to do with the way the data are normalized and not with the specific prediction method. However, the reviewer is correct in stating that the method that we used could also yield negative values even for non-negative spike rates.

      (iii) We did not bin the macaque data into 200-ms time bins, but rather 25-ms time bins (line 548, Figure 1B legend). Additionally, we have now performed additional analyses with different window sizes, showing that the conclusions still hold (see Supplemental Figure 4 and lines 139-143).
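      The window-size point above can be illustrated with a minimal sketch of averaging a continuously sampled signal into non-overlapping bins of a chosen width. The function name `rebin_mean`, the 1 kHz sampling rate, and the toy ramp signal are assumptions made for illustration only; they are not taken from the manuscript's analysis code.

```python
import numpy as np

def rebin_mean(signal, fs_hz, bin_ms):
    """Average a signal (time on axis 0) into non-overlapping bins of bin_ms milliseconds.

    Hypothetical helper: samples that do not fill a final complete bin are dropped.
    """
    samples_per_bin = int(round(fs_hz * bin_ms / 1000.0))
    if samples_per_bin < 1:
        raise ValueError("bin_ms is shorter than one sample period")
    n_bins = signal.shape[0] // samples_per_bin
    trimmed = signal[: n_bins * samples_per_bin]
    # Reshape to (n_bins, samples_per_bin, ...) and average within each bin.
    return trimmed.reshape(n_bins, samples_per_bin, *signal.shape[1:]).mean(axis=1)

# Toy example: 2 s of a ramp signal sampled at an assumed 1 kHz.
toy_signal = np.arange(2000, dtype=float)
binned_25ms = rebin_mean(toy_signal, fs_hz=1000.0, bin_ms=25.0)
binned_200ms = rebin_mean(toy_signal, fs_hz=1000.0, bin_ms=200.0)
```

      Running the same downstream prediction on `binned_25ms` and `binned_200ms` is one way to check whether a conclusion depends on the bin width.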

      To further address the reviewer’s question, we implemented a Poisson GLM enforcing non-negativity on macaque MUAe data (without spontaneous activity subtraction, ensuring strictly positive values; lines 135-139, Supplemental Figure 1M). The model did not improve predictions over ridge regression, confirming our methodological choice. This method is not directly applicable to mouse calcium data, since the activity after baseline subtraction can be negative.
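      For readers unfamiliar with the metric, the ridge-regression baseline referred to above can be sketched as a cross-validated explained variance (EV) computed per target neuron. This is a minimal illustration on synthetic data, not the manuscript's pipeline: the function names, the penalty `alpha`, the fold count, and the simulated populations are all assumptions. A Poisson GLM would replace the linear fit with a log-link model and is not shown here.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def explained_variance(Y_true, Y_pred):
    """Per-neuron EV = 1 - Var(residual) / Var(signal)."""
    resid = Y_true - Y_pred
    return 1.0 - resid.var(axis=0) / Y_true.var(axis=0)

def cross_validated_ev(X, Y, alpha=1.0, n_folds=5):
    """Predict each held-out fold from a model fit on the remaining folds."""
    n = X.shape[0]
    Y_pred = np.empty_like(Y, dtype=float)
    for test_idx in np.array_split(np.arange(n), n_folds):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        W, b = ridge_fit(X[train_mask], Y[train_mask], alpha)
        Y_pred[test_idx] = X[test_idx] @ W + b
    return explained_variance(Y, Y_pred)

# Synthetic "source" and "target" populations (assumed sizes, illustration only).
rng = np.random.default_rng(0)
n_bins, n_source, n_target = 2000, 30, 5
X = rng.standard_normal((n_bins, n_source))
W_true = rng.standard_normal((n_source, n_target))
Y = X @ W_true + 2.0 * rng.standard_normal((n_bins, n_target))
Y[:, -1] = rng.standard_normal(n_bins)  # last target neuron is unrelated to the source

ev = cross_validated_ev(X, Y, alpha=10.0)
```

      In this toy setting the first four target neurons are largely predictable from the source population, while the unrelated neuron has a held-out EV near zero. Note also that a linear decoder of this kind can produce negative predictions even for non-negative signals.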

      We did not use Granger or any other causality methods. The question of causality is certainly important, and there are multiple methods developed to assess causality in neural signals. We do not make any claims about causality in our study. A rigorous evaluation of causality is an interesting line of research for future work.

      R1.2: In terms of analysis results, the work in the manuscript presents some expected and some less expected results. However, because the monkey data are based on only one monkey (misleadingly, the manuscript consistently uses the plural ‘monkeys’), none of the results specific to that monkey, nor the comparison of that one monkey to mice, are supported by robust data.

      We have now added data from 2 additional monkeys, including:

      (i) A second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights off condition (lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplemental figures 1-6, 8, 11, 12, and 13; Table 2).

      (ii) We collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Ponce lab (lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplemental figures 1-2, 4, 6, 9, 11, and 12; Table 2). The new data include responses to the same checkerboard and gray screen images as the original dataset, along with responses during lights-off conditions.

      R1.3: One of the main results for mice (bimodality of explained variance values, mentioned in the abstract) does not appear to be quantified or supported by a statistical test.

      We have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test, applied to neurons with EV>0.4. The test confirmed significant bimodality in two of the three mice (MP031 and MP032: p<0.001; MP033: p=0.687). These results are now included in the Results section (lines 307-311) and shown in Supplemental Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings), the same test yielded non-significant results (e.g., p=0.994), confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.4: Moreover, the two data sets differ in too many aspects to allow for any conclusions about whether the comparisons reflect differences in species (mouse vs. monkey), anatomy (L2/3-L4 vs. V1-V4), or recording technique (calcium imaging vs. extracellular spiking).

      We also agree with this comment. Our goal is not to provide any direct quantitative comparison between the two species. We emphasize (lines 494-497) that the experiments in the two species differ along multiple dimensions, including: (i) differences in recording modalities (calcium vs. electrophysiology), (ii) associated differences in temporal resolution, neuronal types, and SNR, (iii) cortical targets (layers vs. areas), (iv) sample size, (v) stimuli, and (vi) task conditions. In the revised manuscript, we also emphasized that the aim of this work is to investigate inter-areal interactions within each species rather than to draw quantitative comparisons between species (lines 497-499).

      Reviewer #1 (Recommendations for the authors):

      R1.5 In the analysis of directionality, you stated that subsampling was done randomly. Presumably, there could be multiple subsamples that fulfill the control of split-trial r. Are you only showing results from one subsample or multiple subsamples?

      We show the median from 10 subsample permutations. This is now clarified in line 621.
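
      As an illustration of this summary step, the sketch below draws 10 random subsamples of predictor neurons, fits a ridge regression for each, and reports the median held-out explained variance. The function names (`ridge_ev`, `median_ev_over_subsamples`), the penalty `alpha`, and the simple first-half/second-half split are assumptions made for the example. The sketch also omits the matching of split-trial r described in the manuscript, so it should be read as a schematic rather than the actual pipeline.

```python
import numpy as np

def ridge_ev(X_train, y_train, X_test, y_test, alpha=1.0):
    """Closed-form ridge fit; returns explained variance on held-out data."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # Solve (X'X + alpha * I) w = X'y for the ridge weights
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y_train - y_mean))
    y_hat = (X_test - x_mean) @ w + y_mean
    return 1.0 - np.var(y_test - y_hat) / np.var(y_test)

def median_ev_over_subsamples(X, y, n_keep, n_subsamples=10, seed=0):
    """Median held-out EV across random subsamples of n_keep predictor columns."""
    rng = np.random.default_rng(seed)
    half = X.shape[0] // 2  # hypothetical split: first half train, second half test
    evs = []
    for _ in range(n_subsamples):
        cols = rng.choice(X.shape[1], size=n_keep, replace=False)
        evs.append(ridge_ev(X[:half, cols], y[:half], X[half:, cols], y[half:]))
    return float(np.median(evs))
```

      Using the median rather than the mean keeps a single unlucky subsample from dominating the reported value.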

      R1.6 About the measurement 1-vs-rest r2. Understanding the definition is important for interpreting the results, but the definition was not clearly written. In lines 195-196, could you be more clear about whether the correlation is between the predicted neuron and other neurons in the predicted population or between the predicted neuron and the mean activity of the predictor population? Also, in line 212, why do you call this self-consistency? Isn't this a correlation between a neuron and the others?

      The 1-vs-rest r² value, or self-consistency, is a correlation calculated for each neuron i separately and does not involve other neurons. Let $r_i^{(t)}$ denote the response of neuron i during trial t (t = 1, ..., T, where T is the total number of trials). For a given trial t, we compute the average activity of the neuron excluding this trial:

      $$\bar{r}_i^{(\mathrm{rest})} = \frac{1}{T-1} \sum_{t' \neq t} r_i^{(t')}$$

      Throughout, the superscript (rest) means “all repetitions excluding repeat t”. The one-vs-rest correlation for the held-out repetition t is:

      $$\rho_i^{(t)} = \mathrm{corr}\left(r_i^{(t)}, \bar{r}_i^{(\mathrm{rest})}\right)$$

      We then average these correlations across all held-out repetitions:

      $$\rho_i = \frac{1}{T} \sum_{t=1}^{T} \rho_i^{(t)}$$

      We now clarify this in the text (lines 304-306 and lines 642-647).
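
      The definition above can be sketched in a few lines of Python. The array layout (repeats by stimuli), the use of a Pearson correlation across stimuli, and the function name `one_vs_rest_correlation` are assumptions made for this illustration. The function returns the averaged correlation, so any squaring would be a separate step.

```python
import numpy as np

def one_vs_rest_correlation(responses):
    """Mean one-vs-rest correlation for a single neuron.

    responses: array of shape (T repeats, S stimuli) for one neuron.
    """
    responses = np.asarray(responses, dtype=float)
    n_repeats = responses.shape[0]
    total = responses.sum(axis=0)
    correlations = []
    for t in range(n_repeats):
        # Average over all repeats except the held-out repeat t
        rest_mean = (total - responses[t]) / (n_repeats - 1)
        correlations.append(np.corrcoef(responses[t], rest_mean)[0, 1])
    return float(np.mean(correlations))
```

      Only the neuron's own repeats enter the computation, which is why the measure reflects self-consistency rather than a correlation with other neurons.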

      R1.7 In Figure 6 G and I. The "all" condition contains more neurons than either of the other two. In this case, is this comparison fair or meaningful?

      The reviewer is also correct here. The comparisons between the <10% and >80% groups contain the same number of predictor neurons, and those are fair comparisons. The “all” condition contains more predictor neurons, and, therefore, those comparisons are not fair. We clarified this point in lines 360-364.

      We included the “all” condition here because we think that it is an instructive sanity check in terms of reporting how EV changes with more neurons, and also in terms of understanding why the EV values in the other two conditions are lower. Expanding on this point with a little bit of philosophy, ultimately, when considering a neuron in area B (e.g., V4) and the contributions from neurons in another area A (e.g., V1), one would like to have access to all the inputs (e.g., all the neurons in V1 that are monosynaptically connected to the target neuron in area V4). We do not have access to this type of information, and we do not make any claims about monosynaptic connectivity, let alone exhaustive sampling of inputs to a given neuron. The “all” condition merely provides a quantitative illustration of the fact that EV increases with the number of predictor neurons. This observation may be considered to be somewhat trivial, but it should be pointed out that the conclusion relies on the input neurons sharing information with the target neurons (e.g., perhaps one may not be able to predict V4 activity very well from the responses of millions of neurons in the cerebellum).

      R1.8 I believe the results section can be improved by adding some interpretation after each finding.

      We thank the reviewer for the suggestion. We generally like to separate results from interpretation. However, to honor the suggestion, we added brief interpretations throughout the results section (lines 142-143, 171-173, 272-273, 279-281, 331-333, and 361-364) and expanded on the interpretations in the Discussion section.

      R1.9 Line 52 - 74: It would be better to be more specific about what kind of neuronal interactions, e.g., noise correlation, synchrony, etc.

      We added a clarification on the types of interactions we study in lines 68-73.

      R1.10 Line 81. Something seems to be missing after "5500". 5500 trials? Neurons?

      We thank the reviewer for pointing this out. The number refers to neurons (fixed in line 87).

      R1.11 Line 94. The readers would appreciate more explanation of the method.

      We have expanded on the explanation, as suggested (lines 106-107).

      R1.12 Line 104. The fraction of visually responsive neurons seems to be small. Is this typically for mouse V1? Would this fraction be higher if you also used the peak, as you did for macaque data in your SNR calculation (line 412)? And what is this number for the recorded L4?

      The reviewer correctly points out the small number of visually responsive neurons.

      We note that we now refer to the subset of neurons used for prediction analyses as visually reliable (VR) neurons (lines 115-116, 125-126, 178-179, 183-184, 211-212, 214-216, 217-226, 283-286), defined conservatively as neurons with SNR > 2 computed from the mean across all stimuli (not the peak to any one stimulus) and split-half reliability >0.8 (Methods, lines 569–590). This choice emphasizes neurons that are consistently informative over the full stimulus set.

      Regarding the question of how typical the number of responsive neurons in mice is, the fraction of “responsive” neurons in mouse V1 varies widely depending on the definition and stimulus set, but the fractions are substantially lower than those reported in monkeys (with different methods). For those of us more used to the macaque neurophysiology literature, this has been one of the biggest surprises coming from work in rodents. Many studies report a sizable group of non-responsive neurons in mouse V1 (e.g., as few as 37% of V1 neurons being responsive in at least 25% of the trials according to de Vries et al., Nat Neurosci, 2020). Our fraction of visually responsive neurons is small because it couples a conservative SNR metric with a high trial-reliability threshold.

      As the reviewer notes, a peak-based metric based on any stimulus would be a less conservative criterion that would increase the fraction of neurons labeled responsive.

      R1.13 Line 113. Why not also give an exact percentage number?

      We have given the exact percentage number (lines 125-126).

      R1.14 Line 128. Is this just because L2/3 has more neurons? If so, then isn't this trivial?

      Our intention was to illustrate the best prediction performance we could get in either direction, which means including all L2/3 neurons. We have reworded our text to clarify (lines 149-151).

      R1.15 Line 134. Isn't this expected? Since V1 have more units than V4?

      The reviewer is correct. As discussed in R1.7 in mice, we sought to report the best prediction performances in either direction. We have edited our text for clarity (lines 149-151).

      R1.16 Line 165-168. What's the logical connection between these two sentences? If the former is true, we should expect to see differences. Also, why the same population? Shouldn't you include non-visual neurons?

      The two sentences in question are: “The difference in predictability in the absence of a stimulus could in principle change according to the directionality in inter-laminar interactions.” and, “There was no statistically significant difference in the EV fraction between laminar directions (L4→L2/3 vs. L2/3→L4) using the same control population as in Figure 3B (Figure 5A-C and Figure Supplement 2H).”. The key point here was to control for similar reliability values in order to make fair comparisons. We have added an additional comparison between directionalities focusing on nonvisual neurons (SNR<2 & r<0.8), and again found no statistically significant difference between the two directions of predictability (Supplemental Figure 3A, right, lines 221-224).

      R1.17 Table 2. The information of which session corresponds to which experiment can be put in the table, which would be easier to read.

      We have added which sessions correspond to which experiments in Table 2.

      R1.18 Figure 1, Captions for panel c and d. I don't see any colored arrows in the figure.

      We removed the color descriptions (Figure 1C-D).

      R1.19 Figures 3, 4, and others. The annotations of "n.s." are very hard to see.

      We changed the color so that it is easier to see now (Figures 3, 4, 6, and Supplementary Figures 1-4, 6, and 8-10).

      R1.20 Figure 5, panel A. The legend is too small.

      We increased the legend size (Figure 5A).

      R1.21 Figure S5, panel D. Why are some of the data points connected?

      The paired connections are illustrated specifically in the highly predictable neurons to highlight the two separate distributions of neurons. One group, the highly predictable and highly reliable group, maintains its inter-laminar predictability after projecting out the “non-visual” activity (lines 327-330), whereas the highly predictable yet unreliable group shows a sharp decrease in inter-areal predictability, which corroborates the idea of non-visual components influencing neurons in mouse V1, as shown by Stringer et al. 2019b and consistent with our results.

      R1.22 l.91 "Ope" -> open?

      We fixed the typo (line 100).

      R1.23 Fig. 3C+D: Why is only one session used for this?

      One session was used to illustrate the distribution of split-half reliability values per area. Figure 3D contains information about all 5 stimulus sessions (see legend to Figure 3D).

      R1.24 "Even without controlling for the number of predictors or their respective split-half correlation values (627-688 sites in V1, 86-115 sites in V4), we found better predictability in the V1 to V4 direction than the reverse ( 𝑝 < 0.001, Figure Supplement 2I)." -> What does "even" mean here? Isn't this simply the null result if there is no true difference and the real reason the authors controlled for size?

      The reviewer’s understanding is correct. We have edited our text for clarity (lines 157-160).

      R1.25 "We could predict V1 and V4 activity across all stimulus types ( 𝑝 < 0.001, paired permutation test of prediction vs. shuffled frames prediction)." -> better than chance? For all neurons on average? What does this mean? Isn't it trivial and 100% expected that neural activity in the visual cortex is above chance related to the visual input?

      We stated that sites in V1 and V4 could predict each other across all stimulus types before describing the differences between them. We agree that this observation is to be expected and indicated so now in the text (lines 185-186).

      R1.26 "The predictability was the highest in both directions for neuronal activity in response to a full field checkerboard images (Figure 4D). In the V1 → V4 direction, the EV fraction was higher when predicting a slow-moving small thin bar compared to a fast-moving large thick bar (Figure 4D, left), whereas the opposite was true for the V4 → V1 direction (Figure 4D, right)." -> What does this mean? Is this expected or not? Under what theories of cortical processing?

      The differences between EV prediction directions (V1→V4: slow thin bars > fast thick bars; V4→V1: fast thick bars > slow thin bars) could be because V4 responses are more reliable for the slow thin bars whereas V1 responses are more reliable for the fast thick bars (Supplemental Figure 5H–I). To account for this possibility, we controlled for differences in target-related properties by regressing out covariates such as SNR, split-half correlation, and variance. In monkey L, after regressing out these reliability/drive covariates within each direction, the V4→V1 difference between slow thin bars and fast thick bars was not significant, and the difference in the V1→V4 direction was reduced (Supplemental Figure 5K, lines 198-203). This suggests that the asymmetry primarily reflects stimulus‑dependent reliability of the target population rather than a strong directional selectivity.

      To the best of our knowledge, there are no clear predictions that match these observations from existing theories of visual cortical processing, especially given the paucity of computational models that include stimulus velocity when describing the responses in area V4. There has been extensive work on theories of surround suppression, but it seems unlikely that the thick bars would elicit surround suppression given the size of the V4 receptive fields. Many current computational models that aim to fit the responses of neurons in the visual cortex use neural networks that take an image as visual input and yield activations. Most of these models do not incorporate stimulus movement, and even those that do incorporate stimulus dynamics map only indirectly onto interlaminar or between-area stimulus transformations. We hope that the results in this manuscript will help inspire and constrain better models of visual cortical processing.

      R1.27 Shouldn't all the predictability analysis be done conditioned on the stimulus in order to tell us more than the trivial "both V1 and V3, or L2/3 and L4, are driven by visual inputs"? (The spontaneous activity analyses are essentially that, for a small subset of the stimuli.)

      The key goal of this study is to quantify inter-areal interactions both under visual input and without visual input. This type of analysis is important because inter-areal interactions may depend both on visual inputs and on neuronal inputs that are not triggered by visual signals. For example, extensive work in mice has now shown that neuronal responses in V1 depend on an animal’s running speed, independently of any visual input. Even within the visual input conditions, we present analyses where we shuffle trial order (e.g., Figure 7, Supplementary Figure 11) to estimate the contribution of trial-by-trial variations that are independent of visual inputs and other analyses where we project out non-visual activity (e.g., Supplementary Figure 7).

      R1.28 "In visually responsive neurons, there was a significant reduction in EV during gray screen compared to visual stimulus presentation" -> perfectly expected. But the report-worthy result here is how much is left, not whether EV is decreased!

      We have changed the wording on the results to highlight the sustained predictability (lines 211-212). It is important to note that, although the reduction in EV during gray screen may be expected, this observation does not hold for all neurons. In fact, there are some neurons for which the EV during visual presentation is comparable to that during gray screen (Figure 5B,C,E: neurons that lie on the diagonal line).

      R1.29 "Similar to the conclusions drawn from the mouse data, the predictability of neuronal activity was higher in response to stimulus presentation than to gray screen presentations" -> Really? Conditioned on stimulus, or explainable by the well-known fact that both V1 and V4 are visually driven?

      As discussed in R1.28, in mice, there are many neurons where the EV during gray screen is comparable to that during stimulus presentation. In monkeys, most sites were visually driven. As the reviewer points out, we expected that EV during stimulus presentation would be higher than during gray screen; this observation is a reasonable sanity check. The difference between unshuffled trials and shuffled trials (Figure 7, Supplementary Figure 11) provides an estimate of the interactions that are not purely explained by visual inputs alone in monkeys.

      R1.30 "Unlike the mouse, macaque correlation of visual predictability between stimulus presentation and spontaneous activity was high across all types of spontaneous conditions" -> Why? Is this simply explainable by a lower mean response in the spontaneous condition in the mouse? Are these mouse and monkey experiments truly comparable? Isn't it surprising that spontaneous activity in the monkey visual cortex compared to evoked activity is higher than in the mouse?

      With respect to the question of whether spontaneous activity (or stimulus-evoked activity) in monkeys is higher than in the mouse, it is difficult to make these comparisons. We emphasize in the text the multiple differences between the experiments in both species. Our goal is not to perform any quantitative comparison across species (see R1.4). We changed the wording to remove any inference of comparison between species (lines 248-250).

      R1.31 Occasionally imprecise presentation. Ex "To further examine the non-stimulus driven component, we reasoned that if the shared information between areas were strictly driven by the visual stimulus, then using the activity of a stimulus presentation repeat to one specific image could be used to predict the responses to any other stimulus repeat of the same image. On the other hand, if the shared activity does not have any stimulus-response information, then the prediction model would not work when considering responses across repeated presentations of identical stimuli in different trials. To test these two opposing ideas, we compared the inter-areal prediction EV fractions using unshuffled versus shuffled trials." -> Sets up two extreme strawmen (100% driven by stimulus vs 0% driven by stimulus). What does "model would not work" mean? EV=0? Hypotheses not ideas.

      Our intent was to set up two extreme hypotheses, not to claim that neurons must fall exclusively into one or the other. The two extremes help better interpret the results.

      The reviewer indicates that these are straw-man hypotheses. This may well be the case. But note the responses to R1.12, R1.27, R1.28, and R1.29. The reviewer seems to assume that all or most neurons in the visual cortex should be mostly or exclusively driven by visual stimuli.

      We also replaced “ideas” with “hypotheses”, as suggested. We have expanded the discussion of these points in the manuscript (lines 480-493). Many neurons occupy intermediate positions between these two extreme hypotheses. We clarified that “model would not work” refers to prediction accuracy approaching chance (EV ≈ 0).
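
      To make the logic of the shuffled versus unshuffled comparison concrete, the toy simulation below builds two sites that share a stimulus-locked component and a trial-specific fluctuation. All quantities are synthetic and the names are hypothetical. As a simplification, a squared correlation with a single predictor stands in for the ridge-based EV, and a one-step rotation of the repeat order stands in for a random shuffle.

```python
import numpy as np

rng = np.random.default_rng(0)
n_repeats, n_stimuli = 20, 50

# Stimulus-locked drive: identical on every repeat of the same stimulus
stimulus_drive = rng.standard_normal(n_stimuli)
# Trial-specific fluctuation shared by both sites (not stimulus-locked)
shared_fluct = rng.standard_normal((n_repeats, n_stimuli))

predictor = stimulus_drive + shared_fluct
target = stimulus_drive + shared_fluct + 0.3 * rng.standard_normal((n_repeats, n_stimuli))

def squared_correlation(x, y):
    """Squared Pearson correlation, used here as a one-predictor stand-in for EV."""
    return float(np.corrcoef(x.ravel(), y.ravel())[0, 1] ** 2)

# Matched trials: stimulus drive and shared fluctuations both contribute
ev_unshuffled = squared_correlation(predictor, target)
# Mismatched repeats of the same stimulus: only the stimulus drive is shared
ev_shuffled = squared_correlation(np.roll(predictor, 1, axis=0), target)
```

      In this toy setting, the gap between `ev_unshuffled` and `ev_shuffled` reflects the shared trial-by-trial component, while `ev_shuffled` reflects what the stimulus alone supports. Neurons between the two extreme hypotheses would show both a nonzero shuffled value and a nonzero gap.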

      R1.32 "In both species and in both directions, inter-areal prediction EV fraction persisted (𝑝 < 0.001," Doesn't persist mean EV is unchanged? But the test is EV>0 or not in both cases.

      We meant that EV values remained significantly above chance, not that they were unchanged. The statistical test was indeed whether EV > 0 as the reviewer indicated. We have revised the text accordingly (lines 375-380).

      R1.33 "In mice, neurons showed a bimodal distribution in terms of their response predictability in shuffled and unshuffled trials" -> I don't see any bimodality in the figure, nor is there a statistical test provided for bimodality.

      In Figure 7C, one group of neurons lies essentially along the horizontal axis, whereas the other group is dispersed closer to the diagonal line. Specifically, the neurons that lie on the horizontal axis are also the ones whose responses are best predicted during gray screen activity. We have changed the text to clarify this point (lines 380-382).

      R1.34 "In the macaque V4 → V1 direction, there was a large proportion of neurons with peak EV when considering 25 ms to 50 ms offsets in the positive direction (i.e., V4 after V1, Figure 7I, right)." -> So what does this mean? Is this compatible with anything we know? This is the anti-causal direction so some kind of explanation would be warranted.

      In the V4→V1 panel, a positive offset means we use V4 at t+Δt to predict V1 at t (and conversely in the V1→V4 panel). Therefore, the fact that the peak EV occurs at +10–20 ms indicates that V1 leads V4 by ~10–20 ms: in other words, V1’s earlier response best predicts V4’s slightly later response. This observation is not anti-causal; rather, it is consistent with the canonical, largely feed-forward V1→V4 latency (e.g., Schmolesky et al., 1998, among many others). We clarified this in the text (lines 400-404).
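
      The sign convention can be checked with a small synthetic example. In the sketch below, a hypothetical V4 trace is a copy of a V1 trace delayed by two time bins, and the predictor at t+Δt is correlated with the target at t for a range of offsets. A plain correlation is used in place of the ridge-based EV, and the traces, noise level, and function name are assumptions made for the illustration.

```python
import numpy as np

def offset_correlations(predictor, target, max_offset):
    """Correlation between predictor[t + k] and target[t] for each offset k."""
    n = len(target)
    result = {}
    for k in range(-max_offset, max_offset + 1):
        if k >= 0:
            p, q = predictor[k:], target[: n - k]
        else:
            p, q = predictor[: n + k], target[-k:]
        result[k] = float(np.corrcoef(p, q)[0, 1])
    return result

rng = np.random.default_rng(0)
v1 = rng.standard_normal(500)
# Synthetic V4 follows V1 with a delay of 2 bins, plus a little noise
v4 = np.roll(v1, 2) + 0.1 * rng.standard_normal(500)

corrs = offset_correlations(predictor=v4, target=v1, max_offset=5)
best_offset = max(corrs, key=corrs.get)
```

      Because V4 lags V1 in this simulation, the best offset when predicting V1 from V4 is positive, which matches the reading that a positive peak in the V4→V1 panel indicates V1 leading V4.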

      R1.35 L. 307: "In monkeys," plural!?

      While this was not correct in the original version, we have now added data from two more monkeys.

      R1.36 L. 313: "we observed an approximately bimodal distribution of neuronal responses, with a large subset of neurons that do not show reliable responses to visual stimuli both in L4 and L2/3" -> where?

      The bimodal distribution can be appreciated in Figure 6B (1-vs-rest r2, third panel, note neurons along the y-axis, see also R1.33) and Supplementary Figure 7B (lines 307-312). Additionally, as stated in R1.3, we have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test (lines 310-313); see also Supplementary Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings) the same test yielded non-significant results, confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.37 Random subsampling to control for population size done with how many subsamples? How are they combined? Variability across subsamples interpreted how?

      We performed 10 permutations and used the median distributions across permutations (line 621).

      Reviewer #2 (Public Review):

      R2.0: “Summary:

      In this work, the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have a significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.”

      R2.1: To test this, the authors analyzed previously collected and publicly available datasets. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of this data is that it includes simultaneous recordings of hundreds of neurons across cortical layers or areas. However, the weaknesses of calcium dynamics (which has lower temporal resolution and misses some non-linear dynamics in cortical activity) and multi-unit envelope activity (which reflects fluctuations in population activity rather than the variance in individual unit spike trains), underestimate the variability of individual neurons. The authors deploy a regression model that is appropriate for addressing their hypothesis, and their analytic approach appears rigorous and well-controlled.

      We agree with these points, and we discuss these specific limitations in capturing the variability of individual neurons in the Discussion section (lines 500-504). We have now also added analyses based on local field potentials (LFP). LFPs do not directly reflect the activity of individual neurons either.

      R2.2: From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison with some minor differences between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli is not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in the cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. However, the source of these predictive signals and their impact on function is only explored in a limited fashion, largely due to limitations in the datasets. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent of the sensory signals being studied.

      We agree that these datasets do not lend themselves well to directly separating and quantifying all the different sources of the predictive signals. We expand on this point in the Discussion section (lines 509-511).

      R2.3: The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons.

      We also agree with this comment. We emphasize that our goal is not to attempt a direct quantitative comparison across species (lines 497-499).

      R2.4: Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well-designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons. The mechanistic contribution of known sources or correlates of shared variability (eye movements, pupil fluctuations, locomotion, whisking behaviors) were not considered, and these could be driving or a reflection of much of the predictability observed and explain differences in spontaneous and visual activity predictions.

      We have expanded on the Discussion section to explicitly state the points raised by the reviewer (lines 494-509).

      In mice, we have now also analyzed a separate dataset in which behavioral measurements were available, including running speed and facial motion (FaceMap SVDs). We used these to build behavioral-only and combined models to predict neural activity. We found that behavioral variables explained a modest but consistent portion of the variance across both spontaneous and stimulus conditions (Supplementary Figure 10A,C, lines 268-273).

      For the macaque data, pupil size was the only available behavioral measure. We focused specifically on the “resting state, eyes open” condition, where both neural activity and pupil measurements were available. Using ridge regression, we assessed the extent to which pupil size predicted neural activity in V1 and V4. Pupil size alone explained only a small fraction of the variance (Supplementary Figure 10E, lines 274-276).

      R2.5: Previous work has explored correlations in activity between areas on various timescales, but this work only considered a narrow scope of timescales.

      Without knowing the specific timescales the reviewer has in mind, it is hard to address this point fully. As the reviewer noted in R2.1, the mouse data analyzed here do not lend themselves to evaluating predictability on scales of tens of milliseconds. In the macaque data, we have now conducted additional analyses where we binned the activity across a range of bin sizes (10 ms to 200 ms). The new analyses are shown in Supplementary Figure 4, and described in lines 140-143, 160-163.
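
      For readers who want to see what re-binning involves, the sketch below averages a trace in non-overlapping bins of several sizes. The 1 ms sampling of the example trace, the particular bin sizes within the 10 ms to 200 ms range, and the function name `bin_activity` are assumptions for illustration only.

```python
import numpy as np

def bin_activity(signal, bin_size):
    """Average a 1-D signal in consecutive, non-overlapping bins of bin_size samples."""
    signal = np.asarray(signal, dtype=float)
    n_bins = len(signal) // bin_size
    # Drop trailing samples that do not fill a complete bin
    trimmed = signal[: n_bins * bin_size]
    return trimmed.reshape(n_bins, bin_size).mean(axis=1)

# Hypothetical trace sampled at 1 ms, re-binned at several window sizes (in ms)
trace = np.arange(1000, dtype=float)
binned = {size: bin_activity(trace, size) for size in (10, 25, 50, 100, 200)}
```

      Larger bins average out fast fluctuations, so repeating the prediction analysis on each entry of `binned` probes progressively slower timescales.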

      R2.6: The observation that there is some degree of predictability is not surprising, and it is unclear whether changes in observed predictability with analysis conditions are informative of a particular mechanism or just due to differences in the variance of activity under those conditions. Some of these issues could be addressed with further analysis, but some may be due to limitations in the experimental scope of the datasets and would require new experiments to resolve.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7–13).

      Second, we note that our mouse preprocessing standardized responses by spontaneous mean and SD per neuron, controlling baseline scale across conditions (lines 535-538). Because of this standardization, spontaneous traces have unit scale (mean = 0, SD = 1).
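
      A minimal sketch of this kind of standardization is given below, assuming arrays of shape (timepoints, neurons). The function name and the use of the population standard deviation are choices made for the example and may differ from the preprocessing code.

```python
import numpy as np

def standardize_by_spontaneous(evoked, spontaneous):
    """Z-score evoked and spontaneous activity with per-neuron spontaneous statistics.

    Both inputs have shape (timepoints, neurons).
    """
    mu = spontaneous.mean(axis=0)
    sd = spontaneous.std(axis=0)
    return (evoked - mu) / sd, (spontaneous - mu) / sd
```

      By construction, the standardized spontaneous traces have mean 0 and SD 1 per neuron, while the evoked traces are expressed in units of spontaneous SD.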

      To test whether differences in variance underlie our findings, we calculated the variance for both species. For mice, we computed variance across repeats (visual) and across timepoints (lines 286-291). For the macaque moving-bar sessions, we computed variance across the concatenated held-out samples pooling timepoints, repeats, and bar identities (lines 291-292).

      The V4 population showed a higher overall variance distribution compared to the V1 population (Supplementary Figure 2I-J), and L2/3 variance was also overall higher than L4 (Supplementary Figure 2D-E). We also see a modest monotonic relationship between EV fraction and this variance (mouse visual: Spearman ρ = 0.43–0.52, p < 0.001; macaque stimulus responses: ρ = 0.50–0.56, p < 0.001; macaque gray-screen responses: ρ = 0.38, p < 0.001, Figure 6A,D), indicating variance contributes to (but is not the primary driver of) EV prediction fraction. We then adjusted for variance by fitting, within each stimulus condition, a linear regression of EV on variance (excluding shuffled-control rows) and conducted all comparisons on the resulting residual EV values, thereby isolating effects not attributable to variance (see Supplementary Figure 3E-G, lines 165-171).
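
      The variance adjustment can be illustrated as follows: within one stimulus condition, EV is regressed linearly on response variance and the residuals are kept for later comparisons. The function name `residual_ev` and the use of `numpy.polyfit` are assumptions for this sketch, which is not the analysis code itself.

```python
import numpy as np

def residual_ev(ev, variance):
    """Residual EV after removing the linear dependence of EV on variance."""
    ev = np.asarray(ev, dtype=float)
    variance = np.asarray(variance, dtype=float)
    # Ordinary least-squares line: ev ~ slope * variance + intercept
    slope, intercept = np.polyfit(variance, ev, deg=1)
    return ev - (slope * variance + intercept)
```

      The residuals are linearly uncorrelated with variance by construction, so differences that survive in the residual EV cannot be attributed to a linear effect of variance.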

      Reviewer #2 (Recommendations for the authors):

      R2.7 Overall I found this manuscript to be very clearly written and the results compelling, although I found myself wanting a little more. I believe these datasets also include information about eye movements, pupil diameter, and maybe locomotion and whisking in the rodent work. I think it could be informative to ask the degree to which the predictability, particularly during the spontaneous activity, is attributable to these other known sources of variance in trial-by-trial measures. My concern is that during visual stimulation, the space of cortical responses is limited to a very narrow scope (observing a visual stimulus during fixation) whereas spontaneous activity includes a broader range of possibilities (different states of arousal, eye movement).

      We analyzed the role of behavioral variables that could explain the neural activity in mouse V1 (including the variables suggested by the reviewer, such as running speed and facemap SVDs). The authors of the open dataset cautioned against using pupil size because the measurements were not accurate in the dark. In terms of the contribution to the predictability of mouse V1 activity, these behavioral variables showed a weak yet significant contribution (Supplementary Figure 10A,C, lines 260-270).

      R2.8 By controlling for eye movements or pupil diameter during spontaneous measurements, would you improve your measure of predictability?

      When predicting neural activity in the lights-off eyes open condition, combining neural data of the predictor population with information of pupil size did not result in a statistically significant increase in EV fraction when predicting the target population (Supplementary Figure 10E, lines 276-278).

      R2.9 Also, there is work that shows feed-forward correlations between V1 and higher visual areas are observed in higher frequency activity, whereas feedback is associated with lower frequency activity. If you compared your predictability measure over bandpasses with different timescales, would you find the direction of V1-V4 interactions changes consistent with this previous work?

      To address this question, we extended our analyses to the local field potential signals (LFPs) in monkeys, using band-limited LFP power (2–12, 12–30, 30–45, 55–95 Hz). We reran the lag sweep analyses (10-ms steps; 200-ms windows slid every 10 ms) in both directions. The gamma band showed a feed-forward signature in the early evoked period: the V1→V4 predictability peaked at negative offsets (∼10–30 ms; V1 leads), and the V4→V1 predictability peaked at positive offsets, consistent with previous findings. The results for low and beta frequency bands are also presented in the text (Supplemental Figure 13, lines 412-423).

      Reviewer #3 (Public review):

      R3.0: Neural activity in the visual cortex has primarily been studied in terms of responses to external visual stimuli. While the noisiness of inputs to a visual area is known to also influence visual responses, the contribution of this noisy component to overall visual responses has not been well characterized.

      In this study, the authors reanalyze two previously published datasets - a Ca++ imaging study from mouse V1 and a large-scale electrophysiological study from monkey V1-V4. Using regression models, they examine how neural activity in one layer (in mice) or one cortical area (in monkeys) predicts activity in another layer or area. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of non-stimulus-related downstream activity on neural responses. These findings can inform future modeling work of neural responses in the visual cortex to account for such non-visual influences.

      R3.1: "A major weakness of the study is that the analysis includes data from only a single monkey. This makes it hard to interpret the data as the results could be due to experimental conditions specific to this monkey, such as the relative placement of electrode arrays in V1 and V4."

      We have now added the second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights-off condition. In addition, we collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Carlos Ponce lab (monkey A: see lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplemental Figures 1-6, 8, 11, 12, and 13; monkey D: see lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplemental Figures 1-2, 4, 6, 9, 11, and 12). The conclusions for the new monkeys are qualitatively similar to the ones reported previously. The main quantitative differences are due to the very large difference in the number of predictor sites (Table 2, lines 127-134).

      R3.2: The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. However, the comparison of stimulus types (Figure 4) raises a potential concern. It is not clear if the differences reported reflect an actual change in predictive influence across the two conditions or if they stem from fundamental differences in the responses of the predictor population, which could in turn affect the ability to measure predictive relationships. The authors do control for some potential confounds such as the number of neurons and self-consistency of the predictor population. However, the predictability seems to closely track the responsiveness of neurons to a particular stimulus. For instance, in the monkey data, the V1 neuronal population will likely be more responsive to checkerboards than to single bars. Moreover, neurons that don't have the bars in their RFs may remain largely silent. Could the difference in predictability be just due to this? Controlling for overall neuronal responsiveness across the two conditions would make this comparison more interpretable.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7-13).

      In Figure 4, differences in target-population responsiveness could influence predictability across stimulus types, as the reviewer points out. We therefore controlled for this by modeling EV as a function of the following neuron properties: split-half r, SNR, one-vs-rest r^2, and response variance. The regression was performed within each direction, and the residuals were then used for inference. When comparing residuals, the predictability of checkerboard responses remained statistically higher than the predictability of the responses to moving bars (p<0.001, permutation test, Supplementary Figure 5K, lines 196-203), suggesting that the differences in predictability cannot be exclusively attributed to differences in the target population neuronal properties.
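      For readers unfamiliar with permutation testing on residuals, the comparison can be sketched as below. This is a generic two-sample permutation test on the difference of means, given as an assumption for illustration; the exact permutation scheme used in the analysis (for example, paired versus unpaired shuffling) is not specified here, and the simulated residuals are hypothetical.

```python
import numpy as np

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle group labels
        diff = perm[:a.size].mean() - perm[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive
    return observed, (count + 1) / (n_perm + 1)

# Simulated residual EV values (hypothetical) for two stimulus types
rng = np.random.default_rng(2)
resid_checkerboard = rng.normal(0.05, 0.02, size=100)
resid_moving_bars = rng.normal(0.00, 0.02, size=100)

diff, p_value = permutation_test(resid_checkerboard, resid_moving_bars)
```

      Because the smallest attainable p-value is 1 / (n_perm + 1), the number of permutations bounds the precision with which small p-values can be reported.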

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study provides the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention underlying inhibition of return, using an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to demonstrate that attentional orienting modulates semantic- and response-level conflict processing. Although the empirical evidence is compelling, clearer justification of the experimental logic, more cautious framing of behavioral and regional interpretations, and greater transparency in reporting and presentation are needed to strengthen the conclusions. The work will be of broad interest to researchers investigating visual attention, perception, cognitive control, and conflict processing.

      We appreciate the positive reception to our manuscript. In the revised manuscript, we have further clarified the logic underlying the task design, adopted a more cautious tone in interpreting the behavioral and neuroimaging results, and enhanced the transparency of reporting and presentation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding sheds new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Weaknesses:

      While this study addresses an important theoretical question and presents compelling neuroimaging findings, a few additional details would help improve clarity and interpretation. Specifically, more information could be provided regarding the experimental conditions (SI and RI), the justification for the criteria used for excluding behavioral trials, and how the null condition was incorporated into the analyses. In addition, given the non-significant interaction effect in the behavioral results, the claim that the behavioral data "clearly isolated" distinct semantic and response conflict effects should be phrased more cautiously.

      We thank the reviewer for these helpful comments. In the revised manuscript, we have provided additional clarification regarding the SI and RI conditions (page 29), expanded the justification for the behavioral trial exclusion criteria (page 32), and clarified how the null condition was modeled and incorporated into the analyses (page 29). In addition, we have revised the description of the behavioral results to adopt more cautious wording, particularly given the absence of a significant interaction effect. For detailed responses to these specific points, please refer to the "Recommendations for the Authors" section below.

      Reviewer #2 (Public review):

      Summary:

      This study provides evidence for the integration-segregation theory of an attentional effect, widely cited as inhibition of return (IOR), from a neuroimaging perspective, and explores neural interactions between IOR and cognitive conflict, showing that conflict processing is potentially modulated by attentional orienting.

      Strengths:

      The integration-segregation theory was examined in a sophisticated experimental task that also accounted for cognitive conflict processing, which is phenomenologically related to IOR but "non-spatial" by nature. This study was carefully designed and executed. The behavioral and neuroimaging data were carefully analyzed and largely well presented.

      Weaknesses:

      The rationale for the experimental design was not clearly explained in the manuscript; more specifically, why the current ER-fMRI study would disentangle integration and segregation processes was not explained. The introduction of "cognitive conflict" into the present study was not well reasoned for a non-expert reader to follow.

      We thank the reviewer for raising these important points. In the revised manuscript, we have further clarified the rationale of the experimental design and the motivation for introducing cognitive conflict.

      First, we clarified that previous neuroimaging studies relied primarily on SOA-based contrasts, which capture the temporal dynamics of attentional orienting but do not directly distinguish the functional processes of integration and segregation. We therefore established the direct comparison between cued and uncued targets in the long SOA as the critical test required by the theory, as these conditions are hypothesized to engage integration and segregation processes, respectively (pages 6-7, “The Challenge of Neural Verification”). Crucially, to successfully implement this comparison, we highlighted the specific methodological advantage of our study: the use of a Genetic Algorithm (GA) to optimize the stimulus sequence. We explained how this design maximizes statistical power specifically for contrast detection (i.e., cued vs. uncued) while maintaining high estimation efficiency, thereby directly overcoming the power constraints that had likely obscured these subtle neural signatures in prior ER-fMRI work (pages 7-8).

      Second, we clarified that the manipulation of cognitive conflict was introduced with the additional aim of examining IOR expression mechanisms, specifically investigating how spatial attention modulates ongoing cognitive processing after target onset, rather than the generation of IOR itself. We have now provided a clearer rationale for embedding a modified Stroop task within the cue-target paradigm, and explained how this design allows us to dissociate semantic and response conflicts while avoiding methodological confounds present in previous studies (page 8).

      The presentation of the results can be further improved, especially the neuroimaging results. For instance, Figure 4 is challenging to interpret. If "deactivation" (or a reduction in activation) is regarded as a neural signature of IOR, this should be clearly stated in the manuscript.

      We thank the reviewer for pointing out the interpretational challenges in Figure 4. To address this, we have revised Figure 4 and provided a clearer and more precise interpretation of these interaction effects in the manuscript.

      First, we have added explicit panel titles to Figure 4 (page 17). Panel A is now clearly labeled as the “Effect of IOR on Semantic Conflict”, while Panel B is labeled as the “Effect of IOR on Response Conflict”. We hope this visual labeling helps readers clearly identify the IOR modulation effects specific to each conflict type.

      Second, we have revised the figure caption to explicitly define the interaction contrasts used to quantify these modulations, providing specific formulas (e.g., [Uncued-RI – Uncued-SI] > [Cued-RI – Cued-SI] for response conflict) to ensure transparency.

      Finally, regarding the reviewer’s comment on “deactivation”, we realized that our original figure terminology (e.g., “IOR effect under...”) might have caused confusion by mixing the interaction effect with the IOR effect itself. We have clarified that Figure 4 specifically illustrates the “Effect of IOR on the Semantic Conflict and the Response Conflict” (i.e., interaction effect between IOR and cognitive conflict). To interpret this interaction, we further examined the simple effects of conflict under each cueing condition. Specifically, we analyzed the neural signatures of semantic conflict (SI minus NE) and response conflict (RI minus SI) separately for the cued and uncued targets. Importantly, regarding the nature of the IOR effect itself (as displayed in Figure 3, page 14), it is not simply a uniform deactivation. Instead, by directly comparing the cued and uncued conditions for the neutral words, we observed neural changes in two directions: some specific regions exhibited an increased activation (Cued > Uncued), while others showed a reduced activation (Uncued > Cued). These differential patterns involved distinct brain networks and corresponded to the distinct integration and segregation mechanisms, respectively, rather than a global loss of activation (pages 20-21).
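      The response-conflict interaction contrast given in the revised caption, [Uncued-RI – Uncued-SI] > [Cued-RI – Cued-SI], can be written as a weight vector over the six Cue Validity × Congruency conditions. The sketch below is illustrative only: the condition ordering, the helper function, and the beta values are hypothetical and do not come from the fitted model.

```python
import numpy as np

# Assumed ordering of the six condition regressors (hypothetical)
conditions = ["Cued-NE", "Cued-SI", "Cued-RI",
              "Uncued-NE", "Uncued-SI", "Uncued-RI"]

def contrast(weights_by_condition):
    """Build a contrast vector; conditions not named get weight 0."""
    return np.array([weights_by_condition.get(c, 0.0) for c in conditions])

# [Uncued-RI - Uncued-SI] > [Cued-RI - Cued-SI]
response_conflict_by_ior = contrast(
    {"Uncued-RI": 1, "Uncued-SI": -1, "Cued-RI": -1, "Cued-SI": 1}
)

# Made-up parameter estimates for one voxel, in the order of `conditions`
betas = np.array([1.0, 1.2, 1.5, 1.0, 1.3, 1.9])
effect = response_conflict_by_ior @ betas  # (1.9 - 1.3) - (1.5 - 1.2)
```

      A positive value indicates a larger response-conflict effect for uncued than for cued targets; the weights sum to zero, as expected for a difference-of-differences contrast.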

      Reviewer #3 (Public review):

      Summary:

      This study aims to provide the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention - a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is sound, with rigorous preprocessing, appropriate modeling, and analyses that converge across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (PHG, STG). The findings advance the field by supplying much-needed neural evidence for the integration-segregation framework and by clarifying how IOR modulates conflict processing.

      Weaknesses:

      Some interpretive aspects would benefit from clarification, particularly regarding the dual roles ascribed to dACC activation and the circumstances under which PHG and STG are treated as a single versus separate functional clusters. Reporting conventions are occasionally inconsistent (e.g., statistical formatting, abbreviation definitions), which may hinder readability. More detailed reporting of sample characteristics, exclusion criteria, and data-quality metrics-especially regarding the global-variance threshold-would improve transparency and reproducibility. Finally, some limitations of the study, including potential constraints on generalization, are not explicitly acknowledged and should be articulated to provide a more balanced interpretation.

      We thank the reviewer for the positive and constructive assessment of our study. In response to the concerns raised, we have carefully revised the manuscript and addressed all points in detail below. In brief, we have clarified key interpretation issues in the Discussion section, including the complementary roles of dACC activation and the distinction between statistical clustering and functional interpretation of PHG and STG activations (pages 20-21). We have also improved transparency and reporting throughout the manuscript by providing more detailed sample characteristics, clarifying exclusion criteria and global variance computation, adding illustrative supplementary figures, and standardizing statistical reporting and abbreviations (pages 28, 33). Finally, we have added a concise paragraph on limitations of the study to provide a more balanced interpretation of the findings (pages 26-27). Detailed, point-by-point responses to all specific comments are provided below (see the “Recommendations for the authors” Section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) The figure caption contains an unclear sentence (lines 195-196): "The target was a 450-ms colored Chinese character presented 600 ms after the fixation cue onset at the two target locations with equal probabilities." This description is ambiguous and should be revised for clarity.

      Thanks for pointing this out. In the revised manuscript, we have rephrased the figure caption to improve clarity as follows (pages 9-10):

      “Each trial started with a 150-ms non-informative cue presented at one of the two peripheral boxes. After a 150-ms interstimulus interval (ISI), a 150-ms fixation cue was presented at the central fixation box. Following a further 450-ms ISI, the target, a colored Chinese character, appeared at one of the two target locations with equal probabilities and remained on the screen for 450 ms. The trial ended with a variable intertrial interval (ITI) of 850, 1050, 1250, or 1450 ms (with equal probabilities).”
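      The timing in the revised caption can be checked with simple arithmetic. The sketch below only restates the durations listed above; the event labels and variable names are hypothetical, and the cue-to-target interval is derived from those durations rather than quoted from the manuscript.

```python
# Event durations in ms, in presentation order, as listed in the caption
events = [
    ("peripheral cue", 150),
    ("ISI 1", 150),
    ("fixation cue", 150),
    ("ISI 2", 450),
    ("target", 450),
]

def onsets(events):
    """Return each event's onset (ms from trial start) and the total duration."""
    t = 0
    out = {}
    for name, duration in events:
        out[name] = t
        t += duration
    return out, t

onset, pre_iti_duration = onsets(events)

# Target onset relative to fixation-cue onset: 150 + 450 = 600 ms
fixation_to_target = onset["target"] - onset["fixation cue"]
# Derived interval from peripheral-cue onset to target onset
cue_to_target = onset["target"] - onset["peripheral cue"]
# Total trial length for each possible ITI
trial_durations = [pre_iti_duration + iti for iti in (850, 1050, 1250, 1450)]
```

      The 600-ms interval between fixation-cue onset and target onset matches the value stated in the original caption.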

      (2) Please provide a more detailed and clearer description of the SI and RI experimental conditions in the Methods section.

      Thanks for this helpful suggestion. We have revised the Methods section to provide a more detailed description of the SI and RI conditions. Specifically, we have further described the stimulus-response mapping and clarified how the SI and RI conditions are defined based on whether the ink color and the character meaning fell into the same or different response categories under this mapping. In addition, we have added a clarification in the Methods section to make it clearer that the SI trials involved semantic conflict without response conflict, whereas the RI trials involved both semantic and response conflicts (page 29).

      (3) As the data were collected across two research centers, please clarify the number of participants enrolled at each site.

      Thanks for this suggestion. We have now explicitly stated in the Apparatus and Data Acquisition section that 16 participants were enrolled at each site. The revised text reads (page 31):

      “The imaging data were acquired at two research sites following comparable protocols, with equal numbers of participants scanned at each site (n = 16 per site).”

      (4) In the behavioral data analysis, please provide the rationale or justification for the criteria used to exclude trials.

      Thanks for this comment. In the revised manuscript (page 32), we have clarified that reaction times (RTs) shorter than 150 ms were excluded as anticipatory responses, and RTs longer than 1,300 ms were excluded to limit the influence of unusually slow responses. These exclusion criteria are commonly adopted in RT research and were applied consistently across all conditions (Ratcliff, 1993; Whelan, 2008).
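      The exclusion rule can be expressed as a simple filter. This is a minimal sketch, assuming RTs are stored in milliseconds; the constant names, function name, and example values are hypothetical.

```python
import numpy as np

RT_MIN_MS = 150    # shorter RTs are treated as anticipatory responses
RT_MAX_MS = 1300   # longer RTs are treated as unusually slow responses

def valid_rt_mask(rt_ms):
    """Boolean mask of trials retained under the RT exclusion criteria."""
    rt_ms = np.asarray(rt_ms, dtype=float)
    return (rt_ms >= RT_MIN_MS) & (rt_ms <= RT_MAX_MS)

# Example RTs (made up): the first and last trials fall outside the window
rts = np.array([120, 150, 480, 1300, 1420])
mask = valid_rt_mask(rts)
kept = rts[mask]
```

      Applying the same mask to every condition keeps the criteria consistent across the design.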

      (5) Given that the behavioral interaction effect was not statistically significant, the conclusion on lines 236-237, "These data clearly isolated the two distinct conflict effects in the Stroop effect, namely the semantic conflict (SI-NE difference) and the response conflict (RI-SI difference)" appears overstated and should be softened accordingly.

      We thank the reviewer for this important comment. We have clarified that our original statement was intended to highlight the successful isolation of conflict types based on the significant main effects of congruency (validating the task design), rather than implying a significant interaction effect. However, we agree that the original phrasing appeared unclear in this context. We have therefore revised the sentence to adopt a more cautious tone in the revised manuscript (page 12):

      “These data demonstrated typical Stroop interference effects (Veen & Carter, 2005) in both the semantic (SI-NE difference) and response conflicts (RI-SI difference).”

      (6) The statement on lines 281-282, "Although the IOR effect showed no effect on either the semantic conflict difference (SI-NE) or the response conflict difference (RI-SI) in the behavioral performance" lacks supporting statistical evidence. Please report the relevant test statistics.

      We appreciate the reviewer’s careful reading and note that the relevant statistical evidence was missing from the original manuscript. This has now been added in the revised version. Specifically, we examined the interactions between cue validity and semantic conflict (SI vs. NE) as well as between cue validity and response conflict (RI vs. SI). Neither interaction was significant (see revised Results for full statistics on page 12), supporting our original statement that cue validity did not modulate either conflict component in behavioral performance.

      (7) The manuscript mentions that a null condition (with no Chinese character presented) was included to increase statistical power for detecting differences across conditions. However, it is unclear how this null condition was actually used in the data analyses. Please clarify the role of the null condition in both the behavioral and neuroimaging analyses.

      Thanks for this comment. We regret that this was not sufficiently clear in the original manuscript. The null condition was included for neuroimaging purposes and was not used in the behavioral analyses, as no response was required in these trials. In the fMRI analyses, null trials served as the implicit baseline and were not modeled as regressors of interest. Task-related activities for all experimental conditions were therefore estimated relative to this null baseline, facilitating estimations of task-related responses in randomized event-related designs (Burock et al., 1998; Friston et al., 1999; Liu, 2004). We have clarified this point in the revised manuscript (page 29).

      References

      Burock, M. A., Buckner, R. L., Woldorff, M. G., Rosen, B. R., & Dale, A. M. (1998). Randomized event-related experimental designs allow for extremely rapid presentation rates using functional MRI. NeuroReport, 9(16), 3735-3739. https://doi.org/10.1097/00001756-199811160-00030

      Friston, K. J., Zarahn, E., Josephs, O., Henson, R. N. A., & Dale, A. M. (1999). Stochastic designs in event-related fMRI. NeuroImage, 10(5), 607-619. https://doi.org/10.1006/nimg.1999.0498

      Liu, T. T. (2004). Efficiency, power, and entropy in event-related fMRI with multiple trial types: Part II: design of experiments. NeuroImage, 21(1), 401-413. https://doi.org/10.1016/j.neuroimage.2003.09.031

      Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114(3), 510-532. https://doi.org/10.1037/0033-2909.114.3.510

      Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3), 475-482. https://doi.org/10.1007/BF03395630

      Reviewer #2 (Recommendations for the authors):

      (1) The paper is a bit too lengthy, with a lot of information that is hard for non-experts to grasp.

      We thank the reviewer for this comment. We realized that the Introduction was the most challenging section for general readers. In the revision, we refined the text in the Introduction for a better structure and more reader-friendly wording to improve readability. In addition, following the reviewer’s suggestion (Recommendation 4 below), we have added short subsection titles to the Introduction, Results, and Discussion sections to better organize the content and highlight the main ideas. We hope these revisions make the manuscript more accessible and easier for a broader audience to follow.

      (2) Please double-check the stats, as some of the results presented in the main text do not align well with the figures. Take Figure 2 as an example.

      We appreciate the reviewer’s concern and have double-checked all statistics. All the results are consistent between the figures and the main text. Taking Figure 2 as an example (page 12), the perceived discrepancy probably arose because the descriptive values reported in the main text are marginal means for the main effects (i.e., the overall average of one factor, collapsed over the other factor), whereas Figure 2 shows the mean for each Congruency × Cue Validity condition (i.e., simple effects).
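      The distinction between marginal means and condition means can be illustrated with a small example. The RT values below are made up for illustration and are not the study's data; the sketch also assumes equal trial counts per cell, so that marginal means are simple averages of cell means.

```python
import numpy as np

# Hypothetical cell means (ms): rows = Congruency (NE, SI, RI),
# columns = Cue Validity (cued, uncued)
cell_means = np.array([
    [620.0, 600.0],   # NE
    [650.0, 630.0],   # SI
    [690.0, 660.0],   # RI
])

# Marginal means for Congruency: average over Cue Validity
congruency_marginals = cell_means.mean(axis=1)

# Marginal means for Cue Validity: average over Congruency
validity_marginals = cell_means.mean(axis=0)
```

      A figure that plots the six cell means will therefore not show the same numbers as text that reports the marginal means, even though both are computed from the same data.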

      (3) The reasoning that the neuroimaging findings support the dissociation between integration and segregation needs to be improved.

      We thank the reviewer for this important comment. In the revised Discussion (pages 19-21), we have strengthened the reasoning linking our neuroimaging findings to the dissociation between the integration and segregation processes. Specifically, we make it clear how the distinct activation patterns observed for the cued and uncued targets map onto the different functional demands proposed by the integration-segregation theory. The cued targets were theorized to recruit the frontoparietal attentional control networks, consistent with the re-engagement of an existing object file (integration). On the other hand, the uncued targets should engage the medial temporal and temporal association regions responsible for novelty detection and episodic encoding, consistent with the creation of a new object file (segregation). We hope the reviewer finds that the revision offers a clearer explanation of how the observed neural patterns are consistent with a dissociation between the integration and segregation processes.

      (4) Please use short section titles to organize the introduction, results, and discussion sections. For instance, the discussion section is a long chunk of text (almost 9 pages) and is pretty dense, making it hard to quickly grasp the ideas the authors want to convey.

      Thanks for this helpful suggestion. Following the reviewer’s recommendation, we have now added short subsection titles to the Introduction and Discussion sections to improve structure and readability. For the Results section, we have maintained and further refined the existing subheadings to ensure consistent organization.

      Reviewer #3 (Recommendations for the authors):

      I found this manuscript to be a timely and substantive contribution to the study of attention and cognitive neuroscience. To my knowledge, it provides the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention, a framework that has been influential in behavioral work for more than two decades but has lacked clear neural support. The study is conceptually well motivated, methodologically solid, and generally clearly reported. The findings differentiate neural substrates associated with integration and segregation processes and further show how inhibition of return (IOR) interacts with semantic and response conflicts at the neural level.

      The manuscript is well organized, the writing is mostly clear, and the progression from theory to hypotheses and methods is easy to follow. The combination of IOR with a modified Stroop paradigm is a clever choice that extends the theoretical scope of exogenous attention research. The use of an optimized event-related fMRI design based on a genetic algorithm is also a strength and reflects careful attention to design efficiency.

      The main results are internally consistent and theoretically meaningful. Integration-related activity in the fronto-parietal attention network (including FEF, IPS, TPJ, and dACC) and segregation-related activity in medial temporal areas (PHG and STG) fit well with the proposed framework, and the pattern of activations is coherent across analyses.

      Overall, I think this is a carefully executed study that offers much-needed neural evidence bearing on the integration-segregation theory of exogenous attention. I would recommend the following revisions.

      Suggestions:

      (1) In the Discussion (pp. ~17-18), dACC activation is described both in terms of general cognitive control demands and as reflecting a possible inhibitory bias toward the cued direction. It would help the reader if you could briefly indicate whether you see these as complementary (e.g., dual roles within the same region) or as more competing interpretations.

      We thank the reviewer for this helpful comment. We have clarified in the revised manuscript that the dACC's contribution to general cognitive control and its biasing against the cued direction are complementary rather than competing interpretations. Specifically, we describe how the dACC is involved in both the cognitive control required for target integration and the inhibitory bias toward the cued location, thereby highlighting its dual roles within the same region. The revised section reads as follows (page 20):

      “Furthermore, the observed increase in the left dACC activity under the cued relative to the uncued condition likely reflected the engagement of cognitive control mechanisms (Botvinick et al., 2004; Chung et al., 2024; Mayer et al., 2012; Veen & Carter, 2005), particularly in resolving the conflict between the task-driven requirement of target integration and the reduced accessibility of the cue-initiated representation. In this context, the heightened activation of dACC may also reflect its role in fulfilling the inhibitory bias toward the cued location (Mayer et al., 2004) and discouraging inefficient integration attempts at a location marked as less relevant.”

      (2) In the Discussion, you could consider adding a short paragraph explicitly acknowledging a few limitations and how they might constrain generalization of the findings. A concise reflection of this kind would give a more balanced picture without undermining the main conclusions.

      We appreciate this helpful suggestion. In the revised manuscript, we have added a concise paragraph explicitly addressing a key limitation of the present study (pages 26-27). Specifically, we acknowledge that the absence of behavioral interactions alongside clear neural effects requires cautious interpretation. We discuss how this dissociation may reflect differences in measurement sensitivity between behavioral and neural indices, consistent with prior findings (Chen et al., 2006; Wilkinson & Halligan, 2004). We also note that the use of a GA-optimized sequence, while improving statistical efficiency, may have introduced unintended regularities in event order that could influence behavioral strategies.

      (3) Since the dataset is hosted on GitHub, adding a short note in the Data Availability section about whether the repository will also include analysis scripts or future replication data would further enhance transparency and long-term usefulness.

      Thanks for this helpful suggestion. We have revised the Data Availability section (page 35) to clarify that the GitHub repository contains the processed data used in the final analyses. Analysis scripts and additional materials for replication are available from the authors upon reasonable request.

      (4) In the Results section, the formatting of statistics is not fully consistent. For example, some reports use spaces around symbols (e.g., "η<sup>2</sup> = 0.301") whereas others do not (e.g., "p< .001"). It would be good to standardize this (e.g., "p < .001", "η<sup>2</sup> = .30") across the manuscript.

      Done as suggested.

      (5) A few abbreviations appear before they are defined; for instance, SPC (superior parietal cortex) shows up in the Results (response conflict section) before the full name is given. Ensuring that each abbreviation is defined at first mention would help readers who may be less familiar with all of the regional acronyms.

      Thanks for this comment. We have conducted a thorough check of the manuscript and ensured that all abbreviations are defined upon their first occurrence.

      (6) The text sometimes refers to "PHG/STG" as a combined cluster, while at other points, PHG and STG are described separately. It would be useful to clarify under what circumstances they are treated as a single functional cluster versus distinct regions of interest, and to keep the nomenclature as consistent as possible between the main text and the tables.

      Thanks for raising this point. In the revised manuscript, we have clarified this issue by distinguishing between statistical clustering and functional interpretation. In the whole brain analysis, activations in the left hemisphere formed a single continuous cluster spanning the PHG and STG; therefore, this cluster is labeled as “PHG/STG” in Table 1. We have explicitly noted the continuous nature of this cluster in the Results section (page 15) to ensure clarity:

      “Notably, in the left hemisphere, these activations formed a continuous cluster spanning both regions (labeled as PHG/STG in Table 1).”

      (7) It would be helpful to provide a bit more detail about the sample characteristics (e.g., age range, handedness, and inclusion/exclusion criteria) and to state explicitly how many participants, if any, were excluded from the analyses and for what reasons. This would help readers better evaluate data quality and generalizability.

      Thanks for this helpful suggestion. We have revised the Participants section (page 28) to provide the full details regarding our sample:

      “32 healthy participants with normal or corrected-to-normal vision and normal color vision were recruited. All participants were right-handed and reported no history of neurological or psychiatric disorders. Data from three participants were excluded due to excessive head movements and high global variances (see fMRI Data Analysis), leaving 29 participants for analysis (18 female, 11 male; aged 18-30 years, M = 22.69, SD = 2.58).”

      Furthermore, we have provided a clearer description of the exclusion criteria in the Data Analysis section (pages 33-34) as follows:

      “Runs with motions exceeding one voxel length in any direction were excluded (resulting in the exclusion of two runs) …Runs with global variance equal to or over 0.1% were excluded, resulting in the exclusion of eight runs (see Supplementary Information for details). Ultimately, three participants were excluded because neither run met the quality criteria. All remaining participants retained both runs, except for three individuals who each contributed only one valid run.”

      (8) Given that participants were excluded based on global variance exceeding 0.1%, it would be very informative to include, in the Supplementary Materials, an illustrative figure showing the signal time series (or global signal variance over time) for excluded participants.

      We appreciate this valuable suggestion. In the revised Supplementary Materials, we have included a new figure (Figure S2) that plots the global signal time series for the excluded runs to illustrate the signal patterns that led to their exclusion based on global variance.

      (9) Relatedly, it may help to more explicitly describe how global variance was computed (e.g., over which time window, after which preprocessing steps, and whether it was calculated on whole-brain signal or within specific masks). A concise clarification would make the exclusion criterion easier to interpret.

      Thanks for this helpful suggestion. We have now clarified in the manuscript how global variance was computed (page 33) and have also provided a more detailed description of the computation procedure in the Supplementary Materials (page 4). Specifically, after the standard preprocessing (slice timing correction, 3D motion correction, spatial smoothing, linear trend removal, and high-pass temporal filtering), the global signal was computed for each run as the mean signal across voxels with intensity values greater than 100 in each volume. Global variance was then quantified as the temporal variance of this run-wise global-signal time course across all volumes, providing a quality-control index of signal stability.
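      The computation described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the array layout (x, y, z, time), the function name, the per-volume reading of the intensity mask, and the use of the population variance are all assumptions. How the variance is scaled for comparison with the 0.1% threshold is not specified in the response, so that step is left out.

```python
import numpy as np


def global_signal_variance(run, intensity_threshold=100.0):
    """Return (global_signal, temporal_variance) for one preprocessed run.

    run: 4D array with hypothetical layout (x, y, z, time).
    intensity_threshold: voxels at or below this value are ignored
        (100 follows the description in the response).
    """
    run = np.asarray(run, dtype=float)
    n_volumes = run.shape[-1]
    voxels = run.reshape(-1, n_volumes)        # voxels x volumes

    # Assumption: the mask is evaluated separately for each volume.
    mask = voxels > intensity_threshold
    counts = mask.sum(axis=0)
    sums = np.where(mask, voxels, 0.0).sum(axis=0)

    # Mean signal across supra-threshold voxels, one value per volume.
    global_signal = np.full(n_volumes, np.nan)
    valid = counts > 0
    global_signal[valid] = sums[valid] / counts[valid]

    # Temporal variance of the run-wise global-signal time course
    # (population variance; volumes with no valid voxels are ignored).
    return global_signal, float(np.nanvar(global_signal))
```

      A run would then be flagged by comparing the returned variance, expressed in whatever units the exclusion criterion uses, against the chosen cutoff.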

      (10) Rather than only reporting a single overall exclusion rate (e.g., 5.52% of total trials), it would be informative to break this down by source, reporting separately the proportion of trials excluded as RT outliers and the proportion excluded due to response errors. This would further improve transparency regarding the behavioral preprocessing pipeline.

      Thanks for this helpful suggestion. We have now broken down the overall exclusion rate by source in the revised manuscript. Specifically, we reported that 4.29% of trials were excluded due to incorrect responses, and 1.24% of trials were excluded as RT outliers (page 32).
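      A breakdown of this kind can be computed from per-trial flags, as in the sketch below. The function and argument names are hypothetical, the RT-outlier criterion itself is not specified here and is assumed to be precomputed, and a trial that is both an error and an RT outlier is assumed to be counted once, as an error.

```python
def exclusion_breakdown(is_error, is_rt_outlier):
    """Percentage of trials excluded, split by source.

    is_error, is_rt_outlier: equal-length sequences of booleans,
    one entry per trial (hypothetical input format).
    """
    n_trials = len(is_error)
    n_errors = sum(1 for err in is_error if err)
    # Count RT outliers only among correct trials, so no trial is counted twice.
    n_outliers = sum(
        1 for err, out in zip(is_error, is_rt_outlier) if out and not err
    )
    return {
        "error_pct": 100.0 * n_errors / n_trials,
        "rt_outlier_pct": 100.0 * n_outliers / n_trials,
        "total_pct": 100.0 * (n_errors + n_outliers) / n_trials,
    }
```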

      References

      Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: an update. Trends in Cognitive Sciences, 8(12), 539-546. https://doi.org/10.1016/j.tics.2004.10.003

      Chen, Q., Wei, P., & Zhou, X. (2006). Distinct neural correlates for resolving Stroop conflict at inhibited and noninhibited locations in inhibition of return. Journal of Cognitive Neuroscience, 18(11), 1937-1946. https://doi.org/10.1162/jocn.2006.18.11.1937

      Chung, R. S., Cavaleri, J., Sundaram, S., Gilbert, Z. D., Del Campo-Vera, R. M., Leonor, A., Tang, A. M., Chen, K.-H., Sebastian, R., Shao, A., Kammen, A., Tabarsi, E., Gogia, A. S., Mason, X., Heck, C., Liu, C. Y., Kellis, S. S., & Lee, B. (2024). Understanding the human conflict processing network: A review of the literature on direct neural recordings during performance of a modified stroop task. Neuroscience Research, 206, 1-19. https://doi.org/10.1016/j.neures.2024.03.006

      Mayer, A. R., Seidenberg, M., Dorflinger, J. M., & Rao, S. M. (2004). An event-related fMRI study of exogenous orienting: supporting evidence for the cortical basis of inhibition of return? Journal of Cognitive Neuroscience, 16(7), 1262-1271. https://doi.org/10.1162/0898929041920531

      Mayer, A. R., Teshiba, T. M., Franco, A. R., Ling, J., Shane, M. S., Stephen, J. M., & Jung, R. E. (2012). Modeling conflict and error in the medial frontal cortex. Human Brain Mapping, 33(12), 2843-2855. https://doi.org/10.1002/hbm.21405

      Veen, V. V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497-504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Wilkinson, D., & Halligan, P. (2004). The relevance of behavioural measures for functional imaging studies of cognition. Nature Reviews Neuroscience, 5(1), 67-73. https://doi.org/10.1038/nrn1302

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public reviews:

      Reviewer #1 (Public review):

      The sample size for the ex vivo electrophysiology conducted on the calb1+ lamina I projection neurons (Figure 5) is limited to a total of six recorded neurons. Given the difficulty and complexity of the preparation, this is understandable. Notably, since approximately 87% of lamina I projection neurons heavily innervated by Trpm8+ terminals are calb1+, these six recordings of such neurons in Figure 4E could also be calb1+.

      As noted in our initial resubmission, we fully accept that the sample size is limited. We have already toned down statements related to this, to say that our findings “strongly suggest” that the cells with dense Trpm8 input are cold-selective (both in the Abstract and Results).

      Reviewer #2 (Public review):

      In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors acknowledge that, technically, this is a very difficult preparation with very low yield as far as obtaining successful recordings. Moreover, the tissue needs to be maintained at room temperature which is obviously not ideal when characterizing cold thermoreceptors due to the unavoidable effects of low temperature on cold-activated receptors.

      Please see our response to Reviewer #1 (Public review).

      Reviewer #3 (Public review):

      The main limitation remains the relatively small number of neurons that could be recorded electrophysiologically. While understandable given the complexity of the preparation, this necessarily limits generalization.

      Again, please see our response to Reviewer #1 (Public review).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Line 609. The authors used the Trpm8Flp;RCE:FRT;Ai9 mice in some electrophysiological experiments. What is the function of the Ai9 allele (a Cre-dependent reporter) in this cross? Should there not be a Cre line as well?

      One of the mice used for electrophysiological experiments was Trpm8Flp;RCE:FRT;Ai9, and this animal received an injection of AAV encoding Cre into the caudal ventrolateral medulla, resulting in tdTomato expression in spinal projection neurons. This part of the Methods was inadvertently omitted from the resubmitted version (see next point). This has been corrected, and in addition, this information is shown in the cartoon in Fig 4A and is explained in the figure legend.

      (2) Line 860. Phrase is incomplete

      We apologise for this – 3 lines from the original version had been deleted inadvertently. This has now been corrected.

      (3) Line 103 "These results are therefore consistent with the transcriptomic findings described above (36,37)."

      I would revise the references used to support this claim. Reference 37 is a transcriptomic atlas of the brain. I could not find TRPM8 expression data in DRG in this reference.

      Figure S4 of reference 37 deals with the mouse peripheral nervous system and describes Trpm8 classes of primary afferent. More detail on these cells (including expression of VGLUT3, Tac1, Calca and Trpv1) can be found in the associated website: mousebrain.org/adolescent/genesearch.html. We have therefore left this reference as it is.

      (4) Line 242. "neurons with dense Trpm8 input had significantly lower sEPSC frequencies compared to those that lacked dense Trpm8 input".

      This is an interesting paradox because cold thermoreceptors (i.e. the presumed direct monosynaptic input to these projection neurons) are known to be spontaneously active at physiological skin temperatures. This is well characterized in trigeminal corneal endings (DOI: 10.1038/nm.2264). In fact, the decrease in this spontaneous activity can be used by mice to faithfully detect warm stimuli (DOI: 10.1016/j.neuron.2020.02.035). This reviewer would like to remark that this low spontaneous frequency may be due to the non-physiological temperature of this preparation, leading to partial adaptation/desensitization of the afferents. Perhaps it also influences the amplitude (e.g. release probability) of EPSPs (I do not expect you to do anything about my remark).

      These are interesting points, but we do not feel that we can add anything here.

      (5) Figure 3A. It would be useful to include orientation references (dorso-ventral, mediolateral) in the images. Same comment applies to Figure 5C.

      Since these are horizontal sections, the axes are medio-lateral and rostro-caudal. Corresponding orientation markers have been added to both figures.

      (6) Figure 3F. If I understood correctly, the light pulse used for optogenetic activation is delivered directly through the objective used for recording the cell. Thus, the distance between pre and postsynaptic neuron should be minimal. That being the case, I do not understand how a monosynaptic input can have a delay of 5 or 7 ms. Am I missing something?

      The relatively long latency is likely to reflect a slow rise time of depolarisation in the Trpm8 terminals, so that although channels will open very rapidly, there is a delay until the boutons reach action potential threshold. Hachisuka et al (2016) recorded from Nts<sup>Cre</sup>;Ai32 mice (i.e. coding for channelrhodopsin) and found typical latencies of >5 ms (Fig 5E in that paper). We believe that this delay is exacerbated by the low levels of expression of ChR2 that we were able to achieve with the neonatal i.p. injection approach. We have provided a brief explanation for this, and cited the reference in the Results section (lines 197-198).

      (7) Figures 4E/H. To be meaningful, the pie charts should include the n (total number of neurons). See, for example figure 5J.

      Numbers have been added to the pie charts.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Speed and mismatch between locomotion and visual stimulation.

      The authors do not clearly describe the definition of locomotion versus the resting state. The speed should, by itself, have an impact on neuronal responses, especially at the onset of locomotion. Several published studies show that the mismatch between a visual stimulus and the speed of the animal induces specific responses in V1, both in excitatory and subtypes of inhibitory neurons. The authors should address these points upfront in the manuscript, since it is likely a major variable explaining their results.

      We will clarify in the methods that a trial was considered as locomotion when an animal ran at a minimum of 3 cm/s for at least 80% of the 10 s stimulus presentation, and was considered rest when running under 3 cm/s during the same fraction of time. Trials with abrupt changes from locomotion to rest were rare and excluded following these criteria.
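      The trial criterion stated above can be written as a short classifier. This is a sketch under assumptions rather than the authors' code: it presumes a uniformly sampled speed trace covering only the 10 s stimulus window, treats the 3 cm/s boundary as inclusive for locomotion, and uses the hypothetical label "excluded" for trials meeting neither criterion.

```python
import numpy as np


def classify_trial(speed_cm_s, speed_threshold=3.0, min_fraction=0.8):
    """Label one trial as 'locomotion', 'rest', or 'excluded'.

    speed_cm_s: running speed samples (cm/s) during the stimulus,
        assumed to be uniformly sampled.
    """
    speed = np.asarray(speed_cm_s, dtype=float)
    frac_running = np.mean(speed >= speed_threshold)
    frac_still = np.mean(speed < speed_threshold)

    if frac_running >= min_fraction:
        return "locomotion"
    if frac_still >= min_fraction:
        return "rest"
    # Mixed trials (e.g. abrupt transitions) satisfy neither criterion.
    return "excluded"
```

      For example, a trial in which the animal runs for the first half of the stimulus and stops for the second half would be labelled "excluded" under these assumptions.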

      Locomotion speed and visuomotor mismatch can influence neuronal responses in V1, but in the large majority of our trials mice either ran continuously at a stable speed or remained still, i.e., locomotion onsets or offsets did not occur (see Hinojosa et al. 2026 for example running traces). Furthermore, sensitizing and depressing neurons were typically recorded simultaneously within the same field of view, experiencing identical locomotor behaviour. For these reasons, we think it is unlikely that differences in speed or mismatch alone can account for the different increase in amplitude observed between depressors and sensitizers.

      To directly address this point and further explore the role of speed on V1 neurons, we will quantify the relationship between running speed and amplitude increase in both PCs and interneurons, and include these analyses in the revised version of the manuscript.

      (2) Use of deconvolution with MLSpike.

      Some results (Figure 2) exclusively depend on the deconvolution of calcium signals into spikes (since the initial peak is not seen in calcium transients). The authors should validate this result either with electrophysiological recordings or with the use of another deconvolution method (e.g., CASCADE), emphasising the limitations of this approach and the limitations of the time resolution of calcium imaging.

      A similar initial increase in amplitude followed by fast depression has been observed previously with electrophysiological recordings in V1 (Chance et al., 1998; Jin & Glickfeld, 2020; Varela et al., 1997). We will further validate our results using an alternative spike inference method like CASCADE (Rupprecht et al., 2021), as well as expanding on the limitations of our approach.

      (3) The manuscript is centred around a specific increase in visual responses in sensitizing neurons during locomotion, both in the fraction of responsive neurons and response magnitudes.

      It is hard to tell whether this difference is due to a greater scaling effect of locomotion, a difference in responses during the resting state, or both. The manuscript should further explore and discuss the differences in responses between sensitizing and depressing neurons, both during the resting state and locomotion. Adding metrics and direct comparisons of the magnitudes of fast responses, slow responses, and time integrals between sensitizing and depressing neurons in resting and locomotion states would help to clarify this. Same for fractions of responsive neurons of each type in each condition. E.g., the slow phase is harder to judge from the plots, but the DeltaF/F integral shown in Figure 1G seems to suggest the difference in response magnitude between sensitizing and depressing neurons is largest in locomotion state, rather than resting state. How do these integrals look for inferred firing rates shown in Figure 2?

      We will further explore the response dynamics of adaptive types within the locomotion and resting state, highlighting the differences between calcium signals and inferred spikes. We will then include our findings in the new version.

      (4) There is something counterintuitive about how the changes in inhibition onto sensitizing and depressing neurons during locomotion explain the reported activity changes.

      Sensitizers receive reduced SST input and increased PV input during locomotion. If SSTs depress and PVs sensitize (and this is the main reason why sensitizers, which receive dominant input from SSTs sensitize, and vice-versa), how is it possible that this switch does not alter the sensitizing or depressing nature of these neurons' responses in locomotion? Are these changes insufficient to flip the dominant SST-PV drive? Figure 6D-E seems to show there is a flip, at least for sensitizers. How do authors explain this? Do authors think this is related to the narrowing of the adaptive index distribution shown in Figure 1C?

      This result is only counterintuitive if we consider exclusively the internal connections within V1. The PV:SST ratio changes from 0.9 during rest, dominated by SST-induced sensitization, to 1.2, dominated by PV depression. Although adaptation is strongly driven by the opposing inhibition of PV and SST in PCs during locomotion, its origin is more easily explained by an external input (SS) that targets VIPs, PVs and PCs. As a result, when locomotion increases the drive coming from SS input, it injects a source of sensitization that partly balances the shift in PV:SST ratio, preventing a switch in their adaptive properties which, although reduced, remain sensitizing. We will include these calculations in the revised version.

      (5) Presentation of the experimental data and the model.

      The manuscript introduces the results of interneuron recordings during the description of the model. Similarly, the results of optogenetic manipulations are presented inside the model's description. It would be clearer to present all experimental data first and introduce the model later, fitting it to all experimental evidence previously presented.

      We understand that a clear separation between experimental and modelling results is often preferred in papers that combine these approaches, but in our case modelling and experimental data are highly interdependent and we believe that an overlapping presentation makes it easier for the reader to appreciate the links. One example is Fig. 2G-L, which shows experimental results validating a key feature of the model: the use of average response dynamics for each population of interneuron. Similarly, the results in Fig. 3 validate the use of the VIP response dynamics as the template for the slow modulatory input to layer 2/3. Then the results of optogenetic experiments in Fig. 4 are used to narrow down fits to the model. For these reasons, we have chosen to present experimental results and the model in this more integrated manner.

      Reviewer #2 (Public review):

      In the model, they postulate that synapses within the 6-cell-type network - sensitizing, intermediate, and depressing E cells, and PV, SST, and VIP I cells - and from three sources of external input to each of the six types all change between rest and locomotion (except that connections between the E cells don't depend on their types). There are a lot of degrees of freedom, and this makes interpretation of the results difficult. I would have liked to have seen more efforts to constrain the degrees of freedom. For example, there seems to be very little difference between the three E cell types in any of the three types of external input received. Why not constrain them all to get the same external input and see if it significantly affects model fit? Or what if synapses from the three types of external input are left unchanged, and only change their strengths between rest and locomotion? How well could this do? During optimization, why not constrain the changes between rest and locomotion, for example, by putting an L1 penalty on the changes or the relative changes, trying to force them to be sparse, and see whether there are roughly equally good fits? And then, if the main changes are in a small set of synapses, can the authors isolate changes to that small set and do roughly equally well? What about looking at the principal components of the weight changes across models, to isolate patterns of change that are most important?

      To reduce the number of degrees of freedom and ease interpretation, we did limit the model fitting for adaptive subtypes by fixing the PC-PC weight (𝑤<sub>𝑃𝐶_𝑃𝐶</sub>) and restricting the external input weights (𝑤<sub>𝐹𝐹_𝑃𝐶</sub>, 𝑤<sub>𝑆𝑆_𝑃𝐶</sub>, 𝑤<sub>𝐹𝐵_𝑃𝐶</sub>) to changes of ±10%. We will explicitly explain these constraints in the methods and discuss their limitations.

      We thank the reviewer for their suggestions of testing different conditions to find those providing the best fit for sensitizing and depressing PCs. We tried an approach similar to that described by Dipoppa et al. 2018 by using the locomotion weights as initial conditions for the rest traces and introducing penalties at later stages. However, the local optimization algorithms failed to reach distant regions of parameter space containing minimum solutions for the rest condition. We finally opted for repeating the same process of initial condition searching for locomotion and rest, making the L1 penalty approach impracticable in our case. We believe this approach is effective because it has both allowed us to describe circuit changes during internal-state transitions (the present paper) and, more recently, it has made a series of predictions about different learning states that have been confirmed by optogenetic tests (Hinojosa et al., 2026). We will nevertheless explore this and other suggestions from the reviewer to further optimize the fitting in the revised manuscript.

      In terms of comparing to previous works, when optogenetic manipulations of SST and PV are done to test various hypotheses, I would like to see some discussion of what is already known from the authors' 2022 paper and what they are adding or testing that wasn't known or tested from that paper. And Dipoppa et al (2018) also found weight changes to account for the difference between rest and locomotion. They were looking at a fixed point of responses of neurons across retinotopic space to stimuli of various sizes with only one E-cell type, whereas they are accounting for trajectories across time considering 3 E-cell subtypes but without variation in stimuli or retinotopic position of neurons, so the efforts are somewhat different, but still, it would be good to see a bit more discussion of what is in agreement or in contradiction in the conclusions.

      Thanks for this prompt. We will add further discussion of this work in light of the Heintz et al. (2022) and Dipoppa et al. (2018) papers.

      (1) The main result is that sensitizers increase their responses with locomotion ~2X (for dF/F) or about 3.5X (for spikes) more than depressors. But there are other differences between sensitizers and depressors, for example sensitizers have smaller initial stimulus responses at rest, and depressors have larger. What if cells were divided into tertiles by initial stimulus response at rest? Would the authors see the same differences in the effects of locomotion? If so, can they establish whether the difference is really attached to the adaptation properties rather than to, for example, the initial responses, for example, by comparing the regression of response increase against AI vs the regression of response increase against initial resting response? And there might be other controls to be done for other features in which sensitizers and depressors differ.

      We will explore the possibility that initial response influences the increase in amplitude. Preliminary data suggest that initial amplitude is higher in depressors than in sensitizers.

      (2) Lines 103 and following: the authors refer to a "second notable change" which is the narrower distribution of adaptive effects, but I think this is trivial. The adaptive index is AI=(R1-R2)/(R1+R2), where R1 is response 0.5-2.5s after stimulus onset and R2 over 8-10s. But if the change is additive, as suggested by the dF/F figures (and I believe the distributions of AI here are based on dF/F measurements) -- adding the same constant to R1 and R2 will shrink |AI| without changing the sign of AI. So this would seem to just be a signature of a change that is primarily additive rather than multiplicative.

      Also, if the authors do decide that they are going to focus on spikes after showing the raw dF/F, then this analysis should be repeated for spikes.

      We agree with the reviewer and will change the text accordingly to highlight the additive nature of the change in amplitude. We will also show the analysis with spikes (this shows similar results as the calcium data).
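      The reviewer's argument can be checked with a small numerical sketch. With AI = (R1 - R2)/(R1 + R2), adding the same constant c to both responses gives (R1 - R2)/(R1 + R2 + 2c): the numerator is unchanged and the denominator grows, so |AI| shrinks while its sign is preserved. The function names and the example values below are illustrative assumptions, and the argument presumes positive responses and a non-negative offset.

```python
def adaptive_index(r1, r2):
    """AI = (R1 - R2) / (R1 + R2), with R1 the early and R2 the late response."""
    return (r1 - r2) / (r1 + r2)


def adaptive_index_with_offset(r1, r2, offset):
    """AI after the same additive change is applied to both response windows."""
    return adaptive_index(r1 + offset, r2 + offset)


# Illustrative values only: a depressing cell (R1 > R2) and a sensitizing cell (R1 < R2).
depressor_rest = adaptive_index(3.0, 1.0)
depressor_loco = adaptive_index_with_offset(3.0, 1.0, offset=1.0)
sensitizer_rest = adaptive_index(1.0, 3.0)
sensitizer_loco = adaptive_index_with_offset(1.0, 3.0, offset=1.0)
```

      In both illustrative cases the index moves toward zero without crossing it, which is the narrowing of the AI distribution expected from a purely additive change.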

      (3) Figure 2, F is supposed to be D minus E, but it doesn't look like it. For example, the initial response under locomotion is very similar in sensitizers and depressors, so the initial difference in F should be small, but it's not; and at rest, depressors initially have larger responses than sensitizers, whereas later depressors have smaller responses than sensitizers, yet the difference at rest is positive at all times. Something seems wrong here.

      We apologize for the confusion this has caused. Figure 2F does not represent the difference between sensitizing and depressing PCs from panels D and E. Instead, it shows the time-varying difference between locomotion and rest states of sensitizers (blue, in figure 2D) and depressors (green, in figure 2E). Thus, panel F shows within-population modulation by behavioural state, rather than differences between sensitizing and depressing neurons. We will amend the figure legend and main text to explain this point and avoid misinterpretation.

      Reviewer #3 (Public review):

      (1) Key concern is the usage of dF/F signals for all analyses, especially when comparing responses.

      (1a) Figure 1G: Comparison of sensitisers and depressors. It is important to consider what the baseline rates are when making these comparisons, especially when comparing the degree of effects between different cell types. For example, if baseline rates for sensitizers were overall higher, it would mean the difference in gain of response would be lower, and could affect the results in the opposite direction of what is claimed. One option to account for this would be to z-score the overall responses, using the same normalization for locomotion and rest. We also suggest plotting differences in sensitisers, intermediates, and depressors as a function of firing rate, matching for firing rate across each PC categorization and calculating delta AI for each matched firing rate bin.

      (1b) Figure 2A-F: The above is an even more significant issue when it comes to estimating spiking rates. The methods do not state how dF/F is calculated. If these are based on using the pre-stim as the reference, the algorithms for spike rate used might not be appropriate if this were used. Using pre-stimulus referencing could result in the estimate going into the wrong range in the calculation of the spike rate.

      (1c) In both cases above, it could be a problem if baseline firing rates are different between cell types, or states (locomotion/stationary). The latter is established to have effects on many cell types measured, and so needs to be accounted for very carefully.

      The DF/F0 trace was calculated using the mode of the whole trace as F0. While this approach is less sensitive to biases than subtracting the pre-stimulus, it does not consider noise levels like the z-score suggested by the reviewer. We will, therefore, normalize the calcium traces to z-score to further account for changes in the baseline. Spike inference using MLSpike, however, explicitly models baseline noise and subtracts its effect from that of the spikes calculated from the calcium signal (Deneux et al., 2016). This transformation preserved the difference in amplitude triggered by locomotion between depressing and sensitizing PCs while revealing their similar baseline activity (see Figs. 2D,E and F). These results indicate that the distinct changes in response amplitude between sensitizing and depressing PCs during locomotion are not driven by baseline differences. We will add this explanation to the methods section.
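
      The two normalizations discussed above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the bin width used to estimate the mode, and the toy trace are all assumptions made for the example.

```python
import statistics


def mode_baseline(trace, bin_width=1.0):
    """Estimate F0 as the centre of the most populated fluorescence bin.

    Binning is an assumption of this sketch: a continuous trace rarely
    repeats exact values, so the mode is taken over rounded bins.
    """
    bins = [round(f / bin_width) for f in trace]
    return statistics.mode(bins) * bin_width


def delta_f_over_f0(trace, bin_width=1.0):
    """Return (F - F0) / F0 with F0 taken as the mode of the whole trace."""
    f0 = mode_baseline(trace, bin_width)
    return [(f - f0) / f0 for f in trace]


def z_score(trace):
    """Centre the trace on its mean and scale it by its standard deviation."""
    mu = statistics.fmean(trace)
    sd = statistics.pstdev(trace)
    return [(x - mu) / sd for x in trace]


# Toy fluorescence trace (arbitrary units), not data from the study.
trace = [100.2, 99.8, 100.1, 100.4, 150.0, 130.0, 100.3, 99.9]
dff = delta_f_over_f0(trace)
z = z_score(dff)
```

      The mode-based F0 is insensitive to the transient itself, whereas the z-score additionally rescales each trace by its own variability, which is the property the reviewer asks for.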

      We will also plot the changes in activity with locomotion across cell types as a function of firing rate and add these results to the revised manuscript.

      (1d) It would be informative to see per-neuron comparison for adaptive indices during rest and locomotion states. This could be visualized using a scatter plot with AI-rest vs. AI-locomotion for Figures 1D- 1F and 2J- 2L.

      (1e) Are neurons more strongly modulated between locomotion and rest, also more likely to experience a shift in AI indices (i.e. delta AI). Is there a correlation between the change in firing rate between behavioral states and Delta AI (Loco-Rest)? If so, is this present for all neuron subtypes (e.g. VIP, SST, and PV)?

      Sorting was carried out separately on locomotion and rest data sets to capture the adaptive properties of the network under each condition. When assessing the change in adaptive index in individual cells, there was a weak but significant correlation (r = 0.10, p < 0.05), probably due to trial-to-trial stochasticity in the network, which has been shown to be present in V1 (Carandini, 2004; Lee et al., 2010). Although adaptation profiles of individual PCs are not fully conserved across rest and locomotion, the observed overlap exceeds that expected by chance, suggesting that stochastic fluctuations modulate an underlying, stable circuit organization. Despite including the stochastic component of the responses, the conclusions hold: sensitizers undergo a larger gain modulation than depressors. We will include this analysis and the correlation between change in firing rate and Delta AI in the revised version of the paper.

      (1f) Optogenetic inhibition of VIP neurons on average abolished the slow depressive effects of adaptation in SST (Figure 3). The strength and prevalence of this effect are unclear. Perhaps one can perform a bootstrap control and opto AI indices and calculate whether AI was significantly reduced following optogenetics inhibition, and if so, on average, how likely was this to occur for the recorded SST neurons? This is important in knowing that the average effects (Figure 3D) aren't driven by a portion of SST neurons, especially as this is later used to confirm the region of parameter space and affects the subsequent results in Figure 4.

      The strength and prevalence of the effect are reflected in the distribution of AI changes across SST neurons, which is centred at AI = -0.3 ± 0.3, indicating a consistent reduction in AI across the population instead of being driven by a small portion of SST neurons. To further clarify this, we will report the proportion of SST neurons showing a reduction in AI and include statistical analyses on the changes.
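
      One way to carry out the bootstrap the reviewer suggests is sketched below. It is only an illustration under stated assumptions: the per-neuron AI changes are invented numbers, and the helper names and the percentile method are choices made for this example rather than the authors' analysis.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample neurons with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def fraction_reduced(values):
    """Proportion of neurons whose AI decreased (change below zero)."""
    return sum(v < 0 for v in values) / len(values)


# Hypothetical per-neuron AI changes (opto minus control); not study data.
delta_ai = [-0.5, -0.2, -0.4, 0.1, -0.3, -0.6, -0.1, -0.35, 0.05, -0.45]
lower, upper = bootstrap_mean_ci(delta_ai)
prevalence = fraction_reduced(delta_ai)
```

      If the upper bound of the interval stays below zero, the mean reduction is unlikely to be a sampling artefact, and the prevalence value shows how many cells contribute to it.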

      (2) Statistics for the effects. There is a mention of Linear mixed models, but no information is given on the actual models being used and tested. This is particularly for the case of Figure 1G, where there is a composition of effect sizes between different populations. What precise significance test is being used? Are the stats on paired cells when considering locomotion and rest?

      We used linear mixed models to test for statistical significance between different conditions composed of hundreds of cells from several mice, i.e. nested analysis (cells nested within mice; see Judd et al., 2017). For analyses such as Fig. 1G, we considered locomotion state, adaptive type and their interaction (loco × adap) as fixed effects and mouse number as the random effect. The p-value depicted in the legend indicates the significance of the interaction between locomotion and adaptive type, i.e. the increase in amplitude during locomotion is significantly different in sensitizers compared to depressors with p < 0.0001. We will revise the method section and figure legends to explicitly describe the model and statistical test used.
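
      Written out, a model of the kind described above could take the following form. This is a sketch under assumptions: a random intercept per mouse and Gaussian residuals are assumed here, and the symbols are introduced only for illustration.

```latex
y_{ij} = \beta_0
       + \beta_1\,\mathrm{loco}_{ij}
       + \beta_2\,\mathrm{adap}_{ij}
       + \beta_3\,(\mathrm{loco}_{ij} \times \mathrm{adap}_{ij})
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      Here y_{ij} is the response amplitude of observation i in mouse j, and the reported p-value would correspond to the test of the interaction coefficient β3 against zero.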

      (3) Model parameters: It is acknowledged that there is a large range of parameters that can model the responses effectively, up to 11% of initial conditions. At 9000 initial conditions, this is around 1000. The parameter estimates are then considered as the mean of each parameter. This seems like a strange choice for a few different reasons:

      (3a) A mean solution might not be one of the solutions. Let's say the parameters range over a large dimensional space. They could occupy non-overlapping / discontinuous subspaces. In that case, the mean parameters do not necessarily fall within the solution subspaces. Therefore, this reduction to means might not be valid.

      (3b) Compare distributions rather than means. There are multiple distributions of parameters between conditions. All stats should be on the comparison of distributions rather than just the means.

      To test for the presence of subsets of solutions grouped around different parameter values, we plotted the distribution of each parameter across all the good solutions found. Most of the weights followed a Gaussian distribution centred around the mean and, most importantly, none of them had two peaks. Furthermore, after computing the mean weight values, we ran the model with them, and it rendered a good fit, as shown in the figures. We will include those distributions in the new version and base the overall comparison on these distributions.
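
      The reviewer's concern in (3a), and why a unimodal distribution answers it, can be illustrated with a toy calculation. The parameter vectors below are invented for the example and the helper names are hypothetical; the point is only that the mean of the solutions is itself close to a solution when they form one cluster, and far from every solution when they form two.

```python
import math
import statistics


def mean_solution(solutions):
    """Average each parameter across all accepted solutions."""
    return [statistics.fmean(column) for column in zip(*solutions)]


def distance_to_nearest(solutions, point):
    """Euclidean distance from `point` to the closest accepted solution."""
    return min(math.dist(s, point) for s in solutions)


# Toy two-parameter solution sets; not fitted weights from the study.
unimodal = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.0)]
bimodal = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (2.1, 2.0)]

d_uni = distance_to_nearest(unimodal, mean_solution(unimodal))
d_bi = distance_to_nearest(bimodal, mean_solution(bimodal))
```

      For the single cluster the mean coincides with an accepted solution, whereas for the two clusters it falls in the empty region between them, which is the case the reviewer warns about.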

      (4) Visualizing weight matrices: It is very challenging to interpret the weight matrices. Furthermore, it appears that the stationary and locomotion conditions are fit independently, and given the large parameter spaces, it is even harder to interpret. Can the fitting instead be done by fitting on one and using those as the initial conditions for the other state? Figure 7 shows an intuitive cartoon, but it is not clear how the matrices in Figures 5 and 6 lead to the summary shown in Figure 7. It is also not clear why the connections between inhibitory neurons are not shown in Figure 7. One option is to perhaps run some kind of dimensionality reduction on the parameter space to better interpret the data. When showing deltaWeights, was the model initialised with 'Rest' weights and allowed to change? It is not obvious what the difference is between 'relative change in connection weights' and 'relative change in synaptic weights'. This needs to be clarified.

      Thanks for raising this concern. First, we will try to make the weight matrices easier to interpret.

      Regarding the fitting of rest and locomotion conditions, we fitted the locomotion traces first and used those solutions as initial conditions for the rest traces. However, this rendered no good solutions as minimums in the parameter space were too far from the initial starting points. We opted, therefore, for repeating the same process of initial condition searching for locomotion and rest. This approach is less biased in satisfying our aim of finding solutions that fit the data and can explain their dynamics, which are different for each condition. We believe this approach is effective, as not only has it allowed us to describe circuit changes during internal-state transitions but has also made a series of predictions under different learning states that were confirmed by optogenetic tests (Hinojosa et al., 2026).

      We simplified Fig. 7 for clarity but we will make it more accurate and explain it more in detail in the legend, including connections between interneurons.

      Interpreting high-dimensional parameter spaces can be challenging. In this study, we focused on low-dimensional summaries of the parameter space (e.g., average connection weights and their distributions across populations), which revealed consistent and interpretable differences between sensitizing and depressing neurons. Importantly, our conclusions do not rely on individual parameter values, but rather on systematic differences across populations that are robust across solutions. Additionally, we ran clustering analysis and found that there is no parameter that can be removed. We focused, therefore, on the larger and more robust differences. We will explore additional dimensionality reduction approaches and include these results if they provide further insight beyond the current analyses.

      Finally, the change in weights was calculated with equation 4, in which the weight from locomotion and rest, obtained through independent fits, were used to calculate the relative change from rest to locomotion. These were either connection weights (equation 2) which consider the strength of the connection between cell j and i, or synaptic weights (equation 3) which express the weight of individual synapses by dividing connection weights by the number of presynaptic cells and probability of connection. This distinction arises because we used average traces from all the neurons imaged to fit the model, requiring considering the number of cells to know the strength of individual synapses. We will add this explanation in the results and methods sections.

      (4a) Model parameters were reduced differently for locomotion and rest (Figure 4). We suggest evaluating the results for locomotion and rest using the same chi-square value of 3 for both behavioral states (at least in controls).

      Thank you for this prompt; this is an important point that we tried to resolve during our analysis. We used the reduced chi-square to evaluate model fits within the locomotion and rest conditions independently. As defined in equation 12, reduced chi-square is inversely proportional to the standard error of the data, which is higher in the rest dataset. As a consequence, setting the same threshold across conditions would not correspond to an equivalent goodness-of-fit criterion, and would impose a disproportionately strict constraint on the condition with lower variability, where deviations between model and data are more heavily penalized. For this reason, we used condition-specific thresholds to ensure comparable fit quality relative to the noise level in each condition. In addition, to enable direct comparison across conditions independent of their noise levels, we used the RMSE as a complementary metric.
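
      The dependence on the noise level can be made concrete with a small sketch. It assumes the textbook definition of reduced chi-square (squared residuals scaled by the squared standard error, divided by the degrees of freedom), which may differ in detail from equation 12 of the manuscript; all numbers and names are illustrative.

```python
import math


def reduced_chi_square(data, model, sem, n_params):
    """Sum of squared, error-scaled residuals divided by degrees of freedom."""
    dof = len(data) - n_params
    chi2 = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sem))
    return chi2 / dof


def rmse(data, model):
    """Root-mean-square error, which ignores the measurement uncertainty."""
    return math.sqrt(sum((d - m) ** 2 for d, m in zip(data, model)) / len(data))


# Identical residuals evaluated under two hypothetical noise levels.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
model = [1.1, 1.9, 3.1, 3.9, 5.1]
low_noise = [0.05] * len(data)   # e.g. the less variable condition
high_noise = [0.10] * len(data)  # e.g. the more variable condition

chi2_low = reduced_chi_square(data, model, low_noise, n_params=1)
chi2_high = reduced_chi_square(data, model, high_noise, n_params=1)
fit_error = rmse(data, model)
```

      The same residuals give a larger reduced chi-square under the lower noise level, while the RMSE is unchanged, which is why a single chi-square threshold is not an equivalent criterion across conditions.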

      References

      Carandini, M. (2004). Amplification of trial-to-trial response variability by neurons in visual cortex. PLoS Biol, 2(9), E264. https://doi.org/10.1371/journal.pbio.0020264

      Chance, F. S., Nelson, S. B., & Abbott, L. F. (1998). Synaptic Depression and the Temporal Response Characteristics of V1 Cells. The Journal of Neuroscience, 18(12), 4785–4799. https://doi.org/10.1523/JNEUROSCI.18-12-04785.1998

      Deneux, T., Kaszas, A., Szalay, G., Katona, G., Lakner, T., Grinvald, A., Rózsa, B., & Vanzetta, I. (2016). Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nature Communications, 7(1), 12190. https://doi.org/10.1038/ncomms12190

      Dipoppa, M., Ranson, A., Krumin, M., Pachitariu, M., Carandini, M., & Harris, K. D. (2018). Vision and Locomotion Shape the Interactions between Neuron Types in Mouse Visual Cortex. Neuron, 98(3), 602–615.e608. https://doi.org/10.1016/j.neuron.2018.03.037

      Heintz, T. G., Hinojosa, A. J., Dominiak, S. E., & Lagnado, L. (2022). Opposite forms of adaptation in mouse visual cortex are controlled by distinct inhibitory microcircuits. Nature Communications, 13(1), 1031. https://doi.org/10.1038/s41467-022-28635-8

      Hinojosa, A. J., Dominiak, S. E., Kosiachkin, Y., & Lagnado, L. (2026). Distinct Disinhibitory Circuits Link Short-Term Adaptation to Familiarity and Reward Learning in Visual Cortex. bioRxiv, 2026.03.24.713929. https://doi.org/10.64898/2026.03.24.713929

      Jin, M., & Glickfeld, L. L. (2020). Magnitude, time course, and specificity of rapid adaptation across mouse visual areas. J Neurophysiol, 124(1), 245–258. https://doi.org/10.1152/jn.00758.2019

      Judd, C. M., Westfall, J., & Kenny, D. A. (2017). Experiments with More Than One Random Factor: Designs, Analytic Models, and Statistical Power. Annu Rev Psychol, 68, 601–625. https://doi.org/10.1146/annurev-psych-122414-033702

      Lee, J., Kim, H. R., & Lee, C. (2010). Trial-to-trial variability of spike response of V1 and saccadic response time. J Neurophysiol, 104(5), 2556–2572. https://doi.org/10.1152/jn.01040.2009

      Rupprecht, P., Carta, S., Hoffmann, A., Echizen, M., Blot, A., Kwan, A. C., Dan, Y., Hofer, S. B., Kitamura, K., Helmchen, F., & Friedrich, R. W. (2021). A database and deep learning toolbox for noise-optimized, generalized spike inference from calcium imaging. Nat Neurosci, 24(9), 1324–1337. https://doi.org/10.1038/s41593-021-00895-5

      Varela, J. A., Sen, K., Gibson, J., Fost, J., Abbott, L. F., & Nelson, S. B. (1997). A Quantitative Description of Short-Term Plasticity at Excitatory Synapses in Layer 2/3 of Rat Primary Visual Cortex. The Journal of Neuroscience, 17(20), 7926–7940. https://doi.org/10.1523/JNEUROSCI.17-20-07926.1997

    1. Author response:

      The following is the authors’ response to the original reviews

      Thank you very much for the positive and constructive feedback on our manuscript. We have revised the manuscript accordingly and have added a substantial number of additional experiments and have extended the data.

      The reviewers' questions focused mostly on mechanistic insight into organoid formation, touching on the following aspects of lens organoid formation presented in the manuscript:

      - Cellular arrangements/re-arrangements during the process of lens formation including potential contribution of differential adhesion-mediated cell sorting to the cellular arrangement in the organoid and characterization of individual contributions of lens- and retina- committed progenitors to this process.

      - Activity of BMP and FGF signaling pathways during organoid formation, namely identification of the tissues responding to the signaling within forming organoids.

      - Contribution of externally supplemented Matrigel to the differentiation process and cellular arrangements in ocular organoids. 

      To address those points in detail, we included additional experiments that are now presented in the revised version of the manuscript, namely in revised Figure 2-figure supplement 1 (addressing contribution of Matrigel); new Figure 4-figure supplement 1/Video S5 (addressing contribution of differential adhesion-mediated cell sorting); revised Figure 4/Video S6/Video S7 (addressing contribution of lens-committed progenitors); revised Figure 6 (addressing BMP and FGF signaling pathway activities).

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture with minimal growth factor-containing media containing HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+ cells appear in the interface area between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Prox1+ cells thus cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker, LFC, starts to be expressed. In addition, foxe3:GFP+ cells show EdU incorporation, indicating that foxe3:GFP+ cells have lens epithelial cell characteristics. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface area consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of retinal organoids to form an eye cup-like structure, although this "inside-out" morphogenetic mechanism differs from the cellular "outside-in" mechanism of eye cup formation in vivo. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from that of in vivo morphogenesis.

      Overall, the manuscript presentation is nice. However, some points remain obscure with respect to the underlying mechanism. My comments are shown below.

      Major comments

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by the outer cell sheet of retinal precursor cells. I wonder if the formation of this structure may be understood by differential cell adhesive activity or mechanical tension between lens core cells and the retinal cell sheet, just like the previous study done by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429 - 436 (2008)). Lens core cells may be integrated inside the retinal cell mass by cell sorting through the direct interaction between retinal cells and lens cells, or between lens cells and the culture media. After day 1, the movement of the lens core towards the surface of retinal organoids could also be understood if the adhesive/tensile force states of lens core cells change through secretion of extracellular matrix. I wonder if the authors measured physical properties, such as adhesive activity and solidness, of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they show the same patterning of an internal lens core covered by the outer retinal cell sheet?

      The question of whether differential adhesive activity is involved in cell sorting and lens formation is indeed very intriguing.

      To address this point, we included additional experiments in the revised manuscript. As proposed by the reviewer, we performed dissociation and re-aggregation experiments of day one organoids at the timepoint, when retinal cell fate is already established and first cells with early lens fate (Foxe3::GFP positive) start appearing (see new Figure 4-figure supplement 1).

      After dissociation we followed Foxe3::GFP cells over time and observed that initially equally dispersed GFP+ lens-committed cells gradually sort and establish contact with other GFP+ cells, ultimately resulting in the formation of a central GFP+ sphere within a retinal neuroepithelium (AcTub+) localized on the surface of the organoid (see new Figure 4-figure supplement 1e and new Video S5). These data show that differential adhesive properties of lens/retinal precursor cells can enable the formation of a spherical lens in the center of the organoid. This is now clearly stated in the revised version of the manuscript.

      (2) The optic cup is evaginated from the lateral wall of the neuroepithelium of the diencephalon. In zebrafish, cell movement occurs from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder if the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It is also interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      Looking into the details of how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details please see Zilova et al., 2021, eLife). We provide evidence that the formation of the cup-like structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid, with the retina on the surface and the lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the "cup-like" shape is acquired by an extrusion-like process of the lens from the center of the organoid.

      To address the cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4d, new Videos S6, S7 and S8). Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) established in the periphery display repeated short distance movements restricted to the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina. In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery. These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4d, new Videos S6, S7 and S8) in the revised version of the manuscript.

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation in day 1-2, whereas lens formation seems to be intact in the presence of FGF receptor inhibitor in day 0-1. I suggest that the authors examine which tissue is a target of FGF signaling in retinal organoids, using markers such as pea3, which is a downstream target of the ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      Assessing the activity of FGF signaling (cross-reference to Reviewer #3) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid is responding to FGF signaling. To do so, we analyzed the presence of phosphorylated ERK (pERK1/2) as an FGF signaling target in ocular organoids from day 1 to day 2. At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure 6b). Only half a day later (at day 1.5), a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure 6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the core size of the organoids, the dimensions of which were fully comparable to the control (please see Figure 6d).

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What kind of cell-type is occupied in the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is certainly an interesting question. We are aware of this population of cells. We currently do not have data that clarify the fate of those cells with the required certainty. Rather than speculating, we are currently following up on that question by scRNA sequencing; however, we consider that beyond the scope of the current manuscript.

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is the depth the thickness of Rx3 expressing cell sheet, which covers the central lens core in the organoids? If so, I wonder if total cell number of Rx3 expressing cell sheet may be different in each seeded-cell number, because thickness is the same across each seeded-cell number, but the surface area size may be different depending on underneath the lens core size. Please clarify this point.

      The referee is right, figure 5e indicates the thickness of the cell sheet expressing Rx3 positioned at the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid as stated in the submitted manuscript. We have taken care to remove ambiguities related to that point in the revised version of the manuscript.

      (6) Noggin application inhibits lens formation at day 0-1. BMP signaling regulates formation of lens placode and olfactory placode at the early stage of development. It is interesting to examine whether Noggin-treated organoid expands olfactory placode area. Please check forebrain territory markers.

      What tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question.

      To address this point, we labeled Noggin-treated organoids at day 2 and day 3 with forebrain and olfactory placode markers. We could identify an increase in the domains expressing Lhx2, HuC/D and Otx2 in Noggin-treated organoids, showing a shift towards preferential differentiation of neurons of anterior forebrain identity (see attached figure for reviewer). However, the available markers Lhx2, HuC/D and Otx2 found in the olfactory placode are also co-expressed in further neuronal cell types of the anterior forebrain. While the speculation is tempting, the shift in expression does not allow us to state conclusively that the olfactory placode is expanded.

      Author response image 1.

      Expression of forebrain and olfactory placode markers.

      I have no minor comments

      Referees cross-commenting

      I agree that all reviewers have similar suggestions, which are reasonable and provided the same estimated time for revision.

      Reviewer #1 (Significance):

      Strength:

      This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, pigment epithelium and optic stalk. However, retinal organoids seem to be simple and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, under the unconstrained embryo-free environment.

      Limitation:

      Description is OK, but analysis is not much profound. It is necessary to apply a bit more molecular and cellular level analysis, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement:

      The current study is descriptive. Need some conceptual advance, which impact cell biology field or medical science.

      Audience:

      The target audience of the current study is still within the ophthalmology and neuroscience community, maybe translational/clinical rather than basic biology. To reach beyond these specific fields, the authors need to formulate a general principle for cell and developmental biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process - an "inside-out" mechanism where the lens forms centrally and moves outward, rather than the normal "outside-in" embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      Looking into the detail of how the optic cup-like arrangement of ocular organoids is achieved on the cellular level is indeed highly interesting. In the revised manuscript we now provide evidence that the formation of cup-like structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at distinct regions of the organoid – retina on the surface and lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the cup-like shape is acquired by an extrusion process of the lens from the center of the organoid.

      To address cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4e, new Videos S6, S7 and S8).

      Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) display repeated short distance movements within the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina.

      In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery.

      These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4e, new Videos S6, S7 and S8) in the revised version of the manuscript.

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      We agree with the reviewer that this is a highly interesting question, and in the revised manuscript we followed the advice and dedicated a part of the discussion to this topic. We believe that the arrangement is due to the induction of central lens fates by signals emanating from the retinal epithelium, and we discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments addressing the target tissues of FGF and BMP signaling in the organoid have been provided in response to Reviewer #1. Interestingly, interfering with FGF signaling, which is essential for lens fiber cell differentiation, did not affect lens size, arguing against an immediate proliferative effect. Although an analysis of the respective proliferation rates at the surface or in the central region of the organoid might show some differences, we do not have any indication that the proliferation rate itself would be instructive or superior to the cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Lens formation is primarily dependent on the acquisition/specification of Foxe3-expressing lens placode progenitors. In the absence of Foxe3 expression, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens is formed in unperturbed conditions (measured by the expression of crystallin proteins). Organoids that do not have a lens do not contain Foxe3-expressing cells.

      In the absence of a lens, the organoid is composed of retinal neuroepithelium that does not form an optic cup-like shape (for details of such phenotypes please see Zilova et al., 2021, eLife). We took care to state that clearly in the revised manuscript.

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the pattern of expression seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock). How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      We thank the reviewer for pointing this out. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observation. Indeed, Matrigel is not required for the acquisition of lens fate, which can be demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells to form a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium indeed negatively affects the cellular organization and the overall lens structure.

      To clarify the contribution of Matrigel to the organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel affects the organization and thickness of the retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that the size of the lens is not affected by the absence of Matrigel (Figure 3-figure supplement 1d-e). Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions, Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of HEPES in lens formation is indeed very intriguing and currently under investigation. HEPES is mainly used to regulate the pH of the culture media, which on its own might have an impact on multiple cellular processes. It will require a significant time investment to address the potential HEPES-triggered molecular mechanisms affecting lens formation (cross reference with Reviewer #3), which goes beyond the scope of the current manuscript.

      Referees cross-commenting

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments.

      Reviewer #2 (Significance):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      The manuscript presents a beautiful set of high-quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      We followed the reviewer’s advice and have included a systematic analysis of the contribution of ECM (Matrigel) to the process of lens formation. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observation. Indeed, Matrigel is not required for the acquisition of lens fate, which can be demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells to form a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium in turn negatively affects the cellular organization and the overall lens structure.

      To clarify the contribution of Matrigel to the organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel affects the organization and thickness of the retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that the size of the lens is not affected by the absence of Matrigel (Figure 3-figure supplement 1d-e).

      Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions (with and without Matrigel supplementation), Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of HEPES in lens formation is indeed intriguing and currently under investigation. HEPES is mainly used to adjust the pH of the culture media, which on its own might have an impact on multiple cellular processes. It will require a significant time investment to address the potential HEPES-triggered molecular mechanisms affecting lens formation (cross reference with Reviewer #3), which clearly goes beyond the scope of the current manuscript.

      The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing to determine whether progenitors differentiate as placode and neuroepithelium first, to subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      We have taken care to show corresponding stages in embryo and organoid side by side. We provide additional data to highlight the expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens and lens placode) markers in earlier developmental stages. For the presumptive eye field within the region of the anterior neural plate (S16, late gastrula), Rx3 represents one of the earliest markers (see revised Figure 3-figure supplement 1). Already before an apparent lens placode is formed (see revised Figure 3d), Foxe3::GFP expression is detected within the presumptive lens ectoderm, demonstrating that Foxe3 is ideally suited as an early marker for placodal progenitors in medaka. The onset of the Rx3- and Foxe3-driven reporters is clearly early enough to support the claim about the separate origin of the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids, as now represented in the revised figures.

      The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Assessing the activity of BMP and FGF signaling (cross-reference to Reviewer #1) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid is responding to BMP and FGF signaling. To do so, we analyzed the presence of phosphorylated SMAD1/5/8 (pSMAD1/5/8) and phosphorylated ERK (pERK1/2) as readouts of BMP and FGF signaling, respectively, in ocular organoids from day 1 to day 2. BMP signaling activity was detected in the center of the organoid (the region where lens-committed progenitors are established, Figure 3e) at day 1 (see revised Figure 6a). At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure S6b). Only half a day later (day 1.5), a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure S6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the core size of the organoid, the dimensions of which were fully comparable to the control (please see Figure 6b).

      Regarding the presence of the corresponding ligands, we can state that they are indeed expressed in the organoids at the matching stages based on RNA-seq and RT-PCR analyses; however, we could not find them specifically localized. This may be due to widespread, ubiquitous expression or may simply relate to technical problems.

      While we can state with confidence that the ligands are present at the relevant time points and trigger the downstream pathways in a localized manner, the question of whether the response is due to a localized signal or localized competence remains to be addressed.

      The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Following the shift of the lens in vivo is indeed a very relevant suggestion, and we have taken care to address this in the revised manuscript.

      To clarify this process, we included additional experiments and followed the movements of lens cells (see new Figures 4c, 4d and 4e, new Videos S6 and S7). Foxe3::GFP lens progenitor cells were found to actively move over long distances from the center to the organoid periphery. This movement was accompanied by profound cell shape changes of lens progenitor cells, with the active extension of lamellipodia and filopodia strongly arguing for an active movement of lens cells to the organoid periphery (cross-reference with Reviewer #1 and Reviewer #2).

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and consideration of the manuscript. We have added new data to Figure 5 (Figure 5a) to address concerns regarding the conservation of the Hsp70 phosphorylation in yeast. Additionally, we have changed the title of the manuscript to “Hsp70 is phosphorylated in a conserved response to DNA damage and contributes to cell cycle control” to more accurately represent the conclusions we draw.

      Public Reviews:

      Reviewer #1 (Public review):

      The strength of evidence of the mechanistic and "conserved checkpoint" claims that this site is directly activated by DNA damage is inadequate and fundamentally incorrect.

      We respectfully disagree with the reviewer’s characterization of our conclusions. Our data demonstrate that DNA damage induces this phosphorylation in a cell-cycle–dependent manner. We do not claim to have defined the direct kinase or full mechanistic pathway; rather, we establish that site activation is damage-responsive and functionally linked to cell-cycle regulation. Consistent with this, phospho-mutants in yeast exhibit clear cell-cycle defects, supporting a conserved functional role. We address each of the reviewer’s specific concerns below.

      Specific comments:

      (1) Activation of T495:

      The authors' premise for the site being activated by DNA damage is Albuquerque et al, where PTMs on MMS-treated yeast are analyzed. T492 (the yeast equivalent of human T495) is observed as phosphorylated. However, the authors fail to note that there is no untreated sample analysis in this study, and it is likely that T492 phosphorylation is also present in untreated cells. This is also backed up by later evidence from the same lab (Smolka et al), where they do not identify T492 as being dependent on Mec1/Tel/Rad53 kinases.

      We agree with this assessment of the Albuquerque study. Accordingly, we used their data to generate the hypothesis that this site is phosphorylated, and we took it upon ourselves to more rigorously demonstrate phosphorylation with appropriate controls. The validated antibody that we had previously generated[1] to track pHsp70 was the enabling technology to directly track this phosphorylation event. We now directly show phosphorylation of this site (Figure 5a, lines 276-284). Of note, as Reviewer 1 suggested, there is a smaller amount of pHsp70 in the untreated cells, which corresponds with findings from Holt et al 2009 [2]. This could reflect a baseline role of Hsp70 phosphorylation for normal growth that is accentuated upon MMS insult.

      (2) The kinase(s) directly responsible for T495 phosphorylation are not identified. Instead, the authors show that knockdown or pharmacological inhibition of DNA-PKcs, ATM, Chk2, and CK1 attenuate pHsp70.

      We agree with reviewer 1 that identifying the direct kinase would be an exciting finding, and we believe our manuscript will provide the foundation for future studies to address these questions. While these findings will be impactful, we do not believe their lack detracts from the observations we have made.

      (3) ATM siRNA knockdown has no effect, while ATM inhibitors do, which the authors acknowledge but do not resolve. This discrepancy raises concerns about off-target drug effects.

      We agree with reviewer 1 that off-target drug effects are always a concern when employing pharmacological inhibitors. To that end, we tested structurally distinct inhibitors of ATM (Figure 3b) to decrease the likelihood of the same off-target effect. While complementing this with a genetic knockdown would be ideal, the discrepancies between pharmacological and genetic inhibition of ATM have been well reported (lines 214-216).[3,4] Parallel discrepancies in other kinases have been mechanistically explored by other groups.[5] The preponderance of pharmacological evidence in conjunction with RNAi suggests that the most likely interpretation of our data is that ATM is involved in signaling upstream of Hsp70 phosphorylation. Thus, our data compel future work to use more sophisticated genetic methods to determine more specifically how ATM connects with pHsc70.

      (4) No in vitro kinase assays, motif analysis, or phosphosite mapping confirming these kinases as direct T495 kinases are presented. Thus, the proposed signaling cascade remains speculative.

      We agree that we should carefully circumscribe our conclusions about the potential signaling cascade. To communicate our conclusions more clearly, we rewrote lines 223-226 to highlight that our findings implicate these kinases in upstream signaling rather than direct phosphorylation of Hsp70.

      (5) Smolka and many other labs characterized DDR sites as SQ/TQ motifs, and T492 doesn't fit that motif.

      We agree, and our response to comment 4 addresses this point. Briefly, we do not claim that Hsp70 is a direct target for DDR. Notably, the SQ/TQ motifs mentioned specifically pertain to ATM and DNA-PK[6], though we would like to note that several studies have demonstrated DNA-PK phosphorylation outside of these motifs.[7] Chk2 and CK1 do not prefer SQ/TQ motifs[9]. Additionally, Chk2 is known to phosphorylate non-consensus sequences as well[10].

      (6) No genetic tests in yeast (e.g., BER mutants) are used to connect Ssa1 T492 phosphorylation to BER in that system, despite the strong BER-centric model.

      We agree that it would be interesting to study BER mutants in yeast, and we believe this will be an exciting prospect for future studies to better establish the signaling cascade. We have included a Western blot (Figure 5a) showing that MMS treatment causes increased Hsp70 phosphorylation in yeast. MMS damage is repaired through BER in S. cerevisiae,[11] and the pathway itself is highly conserved.[12] Our experiments demonstrate that the phosphorylation of Hsp70 occurs as a conserved response to alkylation damage, which is the major conclusion of our paper.

      (7) Overexpression of MPG gives only a modest increase in pHsp70, while APE1 overexpression has no effect, and Polβ overexpression does not decrease pHsp70. These mixed results weaken the central claim that Hsp70 phosphorylation is a tuned sensor of BER burden.

      We appreciate this incisive question. Though not immediately intuitive, we do not believe these results are necessarily ‘mixed’. The lack of an effect of APE1 overexpression could be attributed to APE1 activity being necessary for the phosphorylation, but not rate-limiting. Regarding Polβ, it is important to note that it is not its binding but rather its dRP lyase activity that is rate-limiting in base excision repair.[13] As such, if binding sites are already saturated or nearly saturated, but the lyase activity remains slow, we may not observe a decrease in BER intermediates. While we do claim that phosphorylation of Hsp70 is triggered by BER intermediates (lines 193-194), we do not claim that pHsp70 is a tuned sensor of BER burden.

      (8) A major concern is that pHsp70 is only convincingly detected after very high, prolonged MMS (10 mM, 5 h) or 0.5 mM arsenite treatments. Other DNA-damaging agents (bleomycin, camptothecin, hydroxyurea) that robustly activate DDR kinases do not induce pHsp70. This suggests to me that the authors are observing a side effect of proteotoxic stress. This is likely (see Paull et al, PMID: 34116476).

      Our data indicate that pHsp70 specifically occurs downstream of base excision repair. Therefore, it is not surprising that drugs that do not activate BER (bleomycin, camptothecin, hydroxyurea) do not elicit the same response. While pHsp70 may arise due to DSBs generated through BER, the fact that we do not see phosphorylation after bleomycin treatment could be explained by the cell-cycle dependencies we report (Figure 4e). It is also important to note that MMS-induced pHsp70 occurs primarily in the nucleus, and Western blots of whole cell lysate will contain large amounts of cytosolic Hsp70 that could dilute the signal. Indeed, in our nuclear extraction (Figure 4d), we see a faint pHsp70 signal as soon as 1 h after treatment, though it increases in robustness as the time-course progresses. These data are concordant with a model in which high BER-induced lesion burden in mitosis leads to Hsp70 phosphorylation in late M/G1.

      We would like to add that, in the review article cited by Reviewer 1, the authors specifically cite studies implicating a loss-of-function in DDR pathways leading to increased proteotoxic stress (e.g. ATM-deficient cells producing higher levels of aggregated proteins compared to WT). However, we find that inhibition of DDR kinases decreases, rather than increases, Hsp70 phosphorylation. We thus believe that DNA damage rather than proteotoxic stress is the parsimonious cause of Hsp70 phosphorylation.

      (9) A recent study in Nature Communications (Omkar et al., 2025) demonstrates rapid phosphorylation of yeast T492 in a pkc1-dependent manner, diminishing the impact of these findings.

      We were excited to see this paper when it was published 3 months after we posted a preprint on bioRxiv, which was released three weeks after our submission to eLife. Rather than diminishing the impact of this paper, we believe that independent lines of evidence from different groups mutually reinforce the impact of the work. We have added a sentence to say that during the review of our work, this group independently observed this phosphorylation event in response to a different stress (lines 421-423). We believe in celebrating the scientific process arriving at consistent results, and the editorial policies of eLife reinforce that philosophy by offering ‘scoop protection.’

      We would also like to highlight several differences between the scope of our papers. The phosphorylation reported by Omkar et al. appears highly constrained to yeast as part of the Cell Wall Integrity pathway, whereas ours occurs as a more highly conserved response. Additionally, our paper provides additional biochemical insight into the consequences of this phosphorylation, which is lacking in Omkar et al. If anything, this paper highlights the important regulatory capacity of this residue on Hsp70, and suggests it may serve multiple functions in the cell.

      (2) Downstream Effects of T492/T495:

      (10) The manuscript's central conceptual advance is that pHsp70 is a cell-cycle-regulated brake on G1/S. Yet in mammalian cells, the authors show only that pHsp70 appears late, after cells have traversed mitosis, and that blocking CDK1 (G2/M) prevents its accumulation.

      We would like to clarify the central contribution of this study. Prior work identified this phosphorylation in yeast, but its existence and conservation in human cells had not been established. A primary advance of our study is demonstrating that this site is phosphorylated in mammalian cells and that its accumulation is cell-cycle regulated — coinciding with late M/G1.

      We further show that phosphorylation depends on cell-cycle progression, as CDK1 inhibition prevents its accumulation. While these data establish regulation, we agree that they do not by themselves define causality in mammalian cells. To address functional consequences, we leveraged the genetic tractability of S. cerevisiae. Phosphomimetic Ssa1 T492E increases the proportion of G1 cells in the absence of MMS and enforces a stronger G1 arrest following MMS treatment. Together, these findings support a conserved, cell-cycle–linked role for this phosphorylation and provide a foundation for future mechanistic work in mammalian systems.

      (11) There is no functional test in human cells: no knockdown/rescue experiments with T495A or T495E, no cell-cycle profiling upon altering Hsp70 phosphorylation state, and no demonstration that pHsp70 actually causes any delay in S-phase entry, rather than simply correlating with late damage responses. The strong conclusion that pT495 "stalls cell cycle progression" (e.g., Figure 6 model) is therefore not supported in the human system.

      We agree that we did not directly test the functional consequences of Hsp70 phosphorylation in human cells. Our intent was not to claim that we have demonstrated causality in the mammalian system, but rather to establish that this conserved phosphorylation exists in human cells and is cell-cycle regulated.

      We instead used S. cerevisiae to interrogate this due to its increased genetic tractability. In this system, phosphomimetic mutation increases the proportion of G1 cells under basal conditions and enhances G1 arrest following MMS treatment, mirroring the damage-associated phenotype observed in human cells. These findings support a conserved functional role for this modification, although we agree that direct mechanistic testing in mammalian cells will be important for future work.

      We intended the cartoon model to be a speculative illustration of what may be occurring in order to motivate future studies. However, we now see how this may lead to confusion, so to improve clarity, we have removed Figure 6 from the manuscript.

      (12) All functional conclusions rely on T492A/E point mutants at the endogenous SSA1 locus, usually in an ssa2Δ background, in a family of highly redundant Hsp70s. Without showing that this site is actually modified during their MMS treatments, the assignment of phenotypes to loss of a physiological phospho-switch is premature. The authors need to repeat their studies in an Ssa1-4 background, as in https://pubmed.ncbi.nlm.nih.gov/32205407/.

      Thank you for this feedback. We have added a Western blot to Figure 5 (Figure 5a) addressing this comment. Briefly, we show that, in yeast, Hsp70 phosphorylation increases upon MMS treatment and is not detectable in the point mutants in the ssa2∆ background. The latter data suggest that Ssa3-4 modification is negligible in our system.

      (13) The authors infer that T495E "locks" Hsc70 in a pseudo-open state based on reduced J-protein-stimulated ATPase activity, unchanged ATP binding, altered trypsin sensitivity, and retained tau binding. However, there is no direct comparison of phosphorylated vs T495E protein (e.g., via in vitro phosphorylation with LegK4 followed by side-by-side biochemical assays, or structural analysis). Thus, it remains unclear to what extent the glutamate substitution mimics a phosphate at this position.

      We previously showed that phosphorylation impacts the ATPase cycle of Hsp70.[1] In this paper, with the phosphomimetic mutant, we see an even greater decrease in activity. This is consistent with the incomplete phosphorylation yielded by in vitro phosphorylation with LegK4.[1] Due to this incomplete phosphorylation in vitro, we determined that the phosphomimetic mutant would be more useful for the assays we performed, as they rely on bulk readouts.

      (14) No client release kinetics, co-chaperone binding assays, or in vivo chaperone function tests are provided, yet the discussion builds a detailed model of a "pseudo-open" state that simultaneously resembles ATP-bound conformation and allows persistent substrate engagement.

      We have shown that the conformational cycle of Hsp70 (T495E) is uncoupled from nucleotide state, and that the overall conformation resembles ATP-bound Hsp70. This is consistent with prior studies on AMPylation of the same residue.[14] Additionally, we demonstrate that substrate engagement is similar between WT and T495E. This is consistent with our previously published work showing increased pHsp70 on polysomes,[1] as well as our observations that the phosphomimetic mutant in yeast exerts a phenotype even in the presence of the compensatory isoform SSA2. This dominant-like phenotype is consistent with those seen in mutations locking Hsp70 in a ‘closed’ conformation.[15] We agree that examining client release kinetics and co-chaperone binding would be useful in future structural studies validating and elaborating on our findings.

      Reviewer #2 (Public review):

      Weaknesses:

      The kinase(s) responsible for the phosphorylation have not been identified (and hence remain inaccessible to experimental i.e., genetic or pharmacological manipulation). The mechanistic links to DNA damage repair and the fitness benefits of this proposed adaptation remain obscure. Of greater concern, the data provided in the paper fail to exclude the trivial possibility that the phosphorylation event described (and characterized through biochemical proxies) is biologically neutral, reflecting nothing more than a bystander event in which kinase(s) activated by application of high concentrations of a powerful alkylating agent (MMS) phosphorylate, at meaninglessly low stoichiometry, an abundant protein (Hsp70) on a surface exposed residue. Failure to exclude this (plausible) scenario is this paper's weakness.

      We agree that we have not directly quantified the absolute stoichiometry of Hsp70 phosphorylation. However, several lines of evidence argue against the interpretation that this represents a biologically neutral, bystander modification.

      First, our pulse-chase experiment (Figure 4e) shows that, after MMS removal, pHsp70 levels increase as cells progress through the cell cycle. Notably, total Hsp70 levels remain constant. This indicates that the fraction of phosphorylated Hsp70 increases in a regulated, cell-cycle dependent manner, rather than through a bystander event during acute stress.

      Second, functional perturbation of the homologous site in yeast produces phenotypic consequences. The phosphomimetic Ssa1(T492E) mutant exhibits reduced growth, increased G1 accumulation, and impaired cell-cycle re-entry following MMS treatment (Figure 5). These phenotypes argue that the modification of this residue is functionally consequential.

      While the upstream kinase remains to be identified, the genetic and cell-cycle phenotypes observed upon site perturbation argue that this phosphorylation is functionally consequential.

      Reviewer #2 (Recommendations for the authors):

      (1) The biochemical characterization of the phosphomimetic mutation (T495E) is thorough, relying on ATPase assays and conformational analysis. Figure 1b demonstrates reduced J-protein-stimulated ATPase activity, and Figure 1d shows an ATP-like proteolysis pattern consistent with an open conformation. As the authors are well aware, Hsp70 chaperones act on their substrates via a dynamic cycle that includes binding, ATP hydrolysis, and conformational shifts. One wonders, therefore, at the relevance of the measurement shown in Figure 1f. While it is highly plausible that the T495E mutation mimics the phosphorylation event (BiP T518E mimics key aspects of AMPylation), the lack of a biochemical characterisation of Hsp70 with pThr495 is an important limitation of this paper. Even if such a preparation cannot be accomplished with the endogenous kinase(s) whose identity remains unknown, a characterisation of LegK4-phosphorylated Hsp70 should suffice.

      We agree with Reviewer 2 that the rationale for Figure 1f does not logically follow from the results of Figures 1b and 1d. Rather, this experiment was motivated by the prior finding that phosphorylation of Hsp70 by L.p. led to increased occupancy on polysomes[1] (lines 137-139). We sought to better understand the discrepancy between this finding and our own by assaying the capacity of the T495E mutant to bind substrate.

      Reviewer 2 raises a valid point in that phosphomimetic proteins do not necessarily behave the same as truly phosphorylated proteins. Previous work from our lab characterized the ATPase activity and in vitro folding capacity of Hsc70 that had been directly phosphorylated by LegK4[1] (lines 114-115). We were motivated to turn to a phosphomimetic mutant as LegK4 only phosphorylates around half of the Hsc70 present in solution[1] (line 116); this mixture of species makes batch analysis difficult. As we had previously published results with the in vitro phosphorylated Hsc70, we did not believe it necessary to include them along with our subsequent analyses.

      (2) As noted, the kinase(s) that phosphorylate T495 remain to be identified and is inaccessible genetically. The phenotypic consequences of impaired pThr495 are therefore assessed by a T495A mutation. This most certainly eliminates phosphorylation at that site however, Figure 5C shows quite clearly that the T/A mutation is not neutral. This is expected, given the role of an H-bond network centered upon the homologous residue in the ADP-bound configuration of Hsp70's. Importantly, the biochemical non-neutrality of the T/A mutation also compromises the interpretation of the associated phenotype, as this cannot be attributed solely to a loss of phosphorylation; it may reflect features of the T/A mutations exposed by MMS, but unrelated to the inability of the residue to undergo regulated phosphorylation.

      We appreciate this insightful critique. We agree that the alanine substitution may perturb the local H-bond network, and have added a sentence to our discussion to highlight this caveat (lines 379-381). That being said, our conclusions do not solely rely on the T to A mutant. The phenotypes observed in our phosphomimetic mutant overlap with the TA mutant (increased sensitivity to MMS; defects in cell cycle re-entry after MMS treatment) (Figure 5). While the alanine mutation may not represent a purely ‘loss-of-phosphorylation’ state, our findings do implicate the importance of this residue in cell cycle control after DNA damage.

      (3) It thus remains formally possible that pThr495 arises as an irrelevant side reaction due to activation of a kinase (with other relevant substrates).

      This dismal interpretation of the data would be dispelled somewhat if the stoichiometry of pThr495 were substantial, whereas very low stoichiometry of phosphorylation should leave one wary of the possibility that the surface-exposed Thr495 of ATP-bound Hsc70 is a physiologically irrelevant bystander target of a kinase activated in DNA-damaged cells.

      We have included a Western blot in Figure 5 showing pHsp70 in our yeast samples. Here we can see low abundance of Hsp70 phosphorylation in untreated WT yeast, with a clear increase in MMS treated yeast. Additionally, as mentioned in a previous response, Figure 4e shows the accumulation of pHsp70 in human cells even after MMS removal, indicating it is not simply the byproduct of over-activation of the DNA damage response.

      Unfortunately, the study does not quantify the stoichiometry of Hsp70 phosphorylation; detection relies on phospho-specific immunoblotting, leaving open the question of whether this modification occurs at physiologically significant levels. This worry is compounded by Figure 2a,f that suggests that phosphorylation occurs only under high-dose MMS or arsenite, raising concerns about physiological relevance.

      We agree that we did not quantify absolute phosphorylation stoichiometry. While a precise measurement would be informative, our conclusions are based on regulated dynamics and functional perturbations rather than magnitude alone. Specifically, our pulse-chase (Figure 4e) shows that total Hsp70 levels remain constant while pHsp70 increases in a cell-cycle dependent manner following MMS removal. This indicates a regulated modification rather than a side-effect of kinase over-activation during acute stress. Additionally, perturbation of the homologous site produces cell-cycle phenotypes (Figure 5) in yeast, supporting functional relevance.

      However, as mentioned in our response to Comment 3, our pulse-chase assay in Figure 4e indicates that the stoichiometry of pHsp70 increases after MMS removal in a cell-cycle dependent manner. Furthermore, as discussed in response to Reviewer 1 Comment 8, Figure 4d highlights a technical limitation with regard to detection of pHsp70 by Western blotting. Namely, as pHsp70 accumulates in the nucleus, signal appears to be diluted by unmodified Hsp70 in the cytosol when whole-cell lysate is probed, thereby reducing detection capacity. It is therefore possible that less stringent doses do lead to phosphorylation, but because the experiments were run in asynchronous cells and on whole-cell lysate, we failed to detect it.

      Reviewer #3 (Recommendations for the authors):

      Major Comments:

      (1) Figure 1e - Which antibody was used to probe this blot?

      Thank you for catching this omission. This was stained with Coomassie. We have edited the figure legend to reflect this.

      (2) Figure 1c- Do the authors have the data of the WT and T495E with DJA2?

      The assay was performed with increasing concentrations of DJA2 for both constructs (from 0 µM to 4 µM) (lines 118-119, Figure 1c).

      (3) Figure 2- The labeling of the right side of the immunoblots is missing.

      We apologize for the confusion. The labeling is on the left. The lines on the right are intended to demarcate blots that came from the same membrane (for easier comparison of loading controls).

      (4) Figure 2d- Does MMS treatment lead to a heat shock response?

      We have not directly tested this. However, we do not see the massive upregulation of HSPs that would be expected from a heat shock response.

      (5) Figure 4c and e - Total protein level of some of the phospho-proteins is missing.

      We used housekeeping proteins as loading control. We do not have antibodies for all the non-phospho proteins. For those we have, blots not included in the publication do not show any marked discrepancies between the non-phospho form and the housekeeping proteins.

      (6) Figure S1A- Although the authors suggest that the phosphorylation event is reversible, they have not integrated it into the final model in Figure 6.

      In line 403 we postulate that dephosphorylation may permit client release. In the interest of clarity, we have now removed the model figure.

      (7) Yeast genotype is missing.

      We used W303a yeast (line 612).

      (8) It is unclear which phosphatase inhibitor was used in their assay (Figure S1A).

      We repeated the experiment with both Halt Phosphatase Inhibitor Cocktail (Thermo Scientific 78440) and Roche PhosStop (Roche 04906837001) (lines 524-525).

      (9) Please add this most recent and up-to-date reference (PMID: 40976416) related to your study.

      We have now added that reference.

      (10) Can the authors speculate on whether Hsp70- T495E is expected to primarily reside in the nucleus?

      We have no data to indicate whether or not phosphorylation at T495 or a phosphomimetic mutation at this site would directly affect nuclear import or export. In cells expressing the Legionella kinase LegK4, pHsp70 exists in the cytoplasm,[1] indicating that the phosphorylation in and of itself does not force nuclear localization. We thus imagine that the nuclear localization seen in Figure 4d is more likely due to the location of the kinase rather than a consequence of the phosphorylation. In an over-expression system or in the case of a genomic mutation, we believe the protein is most likely to exist in both the cytoplasm and the nucleus, though we did not directly test this.

      References

      (1) Moss, S. M. et al. A Legionella pneumophila Kinase Phosphorylates the Hsp70 Chaperone Family to Inhibit Eukaryotic Protein Synthesis. Cell Host Microbe 25, 454-462.e6 (2019).

      (2) Holt, L. J. et al. Global Analysis of Cdk1 Substrate Phosphorylation Sites Provides Insights into Evolution. Science 325, 1682–1686 (2009).

      (3) Choi, S., Gamper, A. M., White, J. S. & Bakkenist, C. J. Inhibition of ATM kinase activity does not phenocopy ATM protein disruption. Cell Cycle 9, 4052–4057 (2010).

      (4) Menolfi, D. & Zha, S. ATM, ATR and DNA-PKcs kinases—the lessons from the mouse models: inhibition ≠ deletion. Cell Biosci. 10, 8 (2020).

      (5) Weiss, W. A., Taylor, S. S. & Shokat, K. M. Recognizing and exploiting differences between RNAi and small-molecule inhibitors. Nat. Chem. Biol. 3, 739–744 (2007).

      (6) Kim, S.-T., Lim, D.-S., Canman, C. E. & Kastan, M. B. Substrate Specificities and Identification of Putative Substrates of ATM Kinase Family Members. J. Biol. Chem. 274, 37538–37543 (1999).

      (7) Jette, N. & Lees-Miller, S. P. The DNA-dependent protein kinase: A multifunctional protein kinase with roles in DNA double strand break repair and mitosis. Prog. Biophys. Mol. Biol. 117, 194–205 (2015).

      (8) O’Neill, T. et al. Determination of Substrate Motifs for Human Chk1 and hCds1/Chk2 by the Oriented Peptide Library Approach. J. Biol. Chem. 277, 16102–16115 (2002).

      (9) Fulcher, L. J. & Sapkota, G. P. Functions and regulation of the serine/threonine protein kinase CK1 family: moving beyond promiscuity. Biochem. J. 477, 4603–4621 (2020).

      (10) Craig, A. et al. Allosteric effects mediate CHK2 phosphorylation of the p53 transactivation domain. EMBO Rep. 4, 787–792 (2003).

      (11) Xiao, W., Chow, B. L. & Rathgeber, L. The repair of DNA methylation damage in Saccharomyces cerevisiae. Curr. Genet. 30, 461–468 (1996).

      (12) Memisoglu, A. & Samson, L. Base excision repair in yeast and mammals. Mutat. Res. Fundam. Mol. Mech. Mutagen. 451, 39–51 (2000).

      (13) Srivastava, D. K. et al. Mammalian Abasic Site Base Excision Repair: Identification of the Reaction Sequence and Rate-Determining Steps. J. Biol. Chem. 273, 21203–21209 (1998).

      (14) Preissler, S., Rato, C., Perera, L. A., Saudek, V. & Ron, D. FICD acts bifunctionally to AMPylate and de-AMPylate the endoplasmic reticulum chaperone BiP. Nat. Struct. Mol. Biol. 24, 23–29 (2017).

      (15) Fontaine, S. N. et al. Isoform-selective Genetic Inhibition of Constitutive Cytosolic Hsp70 Activity Promotes Client Tau Degradation Using an Altered Co-chaperone Complement. J. Biol. Chem. 290, 13115–13127 (2015).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the extent to which phase-amplitude coupling (PAC) of respiratory and electrophysiological brain activity recordings was related to episodes of life-threatening apnoea in human newborns.

      Strengths:

      I want to commend the authors for acquiring unique and illuminating data; the difficulty in recording and handling these data has to be appreciated. As far as I can tell, Zandvoort and colleagues are the first to provide robust evidence for respiration-brain coupling in newborns. Their creative use of the phase-slope index for peripheral-central interactions is innovative and credible. If proven to be robust, the authors' findings have important implications well beyond the field of brain-body research.

      Weaknesses:

      While the analyses were overall competently conducted and well-justified, I was not entirely convinced by a few methodological choices, specifically i) the computation of PAC surrogates, ii) details of the linear mixed-effects model, and iii) the electrode selection for linking phase-amplitude coupling to apnoea frequency.

      Thank you for your kind comments and helpful review of our paper. We have now clarified computation of PAC surrogates, added further details of the linear-mixed effects models and calculated the link between the strength of the cortico-respiratory coupling (phase-amplitude coupling) and apnoea rate with data acquired at all electrodes. We provide further details for each of these in response to your ‘Recommendations for the authors’.

      Reviewer #2 (Public review):

      Summary:

      The author's central hypothesis was that the strength of cortico-respiratory coupling in infants is negatively associated with apnoea rate. To prove this, they first investigated the existence of cortico-respiratory coupling in premature and term-born infants, the spatial localisation of the cortical activity and its relationship with the phase of the respiratory cycle, and the directionality of coupling. 

      Strengths:

      The researchers used synchronised EEG and impedance pneumography to detect the phase amplitude coupling.

      They have studied a wide range of gestations, from 28 weeks to 42 weeks, including males and females. Their exclusion criteria ensured that healthy babies were studied and potential confounders of impaired respiratory activity were avoided. Their sequential approach in addressing the objectives was appropriate.

      Weaknesses:

      As a neonatal clinician and neuroscientist, I have commented based on my expertise. I have not commented on signal processing.

      I did not identify any major weaknesses in the study. Some minor weaknesses include:

      (1) Data relating to the cortical oscillations and the respiratory phase is given. However, whether this would lead to their hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate is unclear. What preceding data enabled the authors to link the strength of coupling to the rate of apnoea?

      (2) If we did not know of data showing the existence of cortico-respiratory coupling in newborn infants, then should it not be the first research question to examine?

      (3) What are the characteristics of the infants who contributed data to establish the cortico-respiratory coupling (Figures 2 and 3)?

      (4) Although it is the most plausible direction of the relationship, with neural activation driving respiratory muscle contraction, how can the authors prove this with their data? Given that they show coherence between signals, how do we know that the cortical signal precedes the respiratory muscle contraction?

      (5) Apgar score is an ordinal variable. The authors should summarise this as median (range).

      Thank you for your useful comments. We have revised the manuscript to address these comments and improve the clarity.

      (1) We agree that the preceding data leading to the hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate are limited. We have clarified in the introduction that adult studies have previously suggested that cortical motor activity may prevent the hypoventilation and apnoea seen in patient groups. We have also added further clarification to our hypothesis. In the introduction we now state:

      “In adults with congenital central hypoventilation syndrome or obstructive sleep apnoea, a respiratory-linked increase in cortical motor activity suggests that the motor cortex plays an important role in maintaining autonomous respiration, with the authors postulating that cortico-respiratory drive whilst participants are awake may prevent the hypoventilation/apnoea observed in these patients whilst they are asleep.”

      And later:

      “We hypothesised that cortico-respiratory coupling occurs in newborns and that the strength of cortico-respiratory coupling is negatively associated with apnoea rate (in line with the suggestions made from studies of adults with congenital central hypoventilation syndrome[6] and obstructive sleep apnoea[7]).”

      (2) We agree that this was the first research question we examined. We have clarified this in the introduction, now re-writing the hypothesis and aims to state “We hypothesised that cortico-respiratory coupling occurs in newborns and that the strength of cortico-respiratory coupling is negatively associated with apnoea rate (…). To this end, we first examined whether cortico-respiratory coupling exists in both premature and term infants.”

      (3) Figures 2 and 3 used the full dataset. We have clarified this in the Figure captions by stating: “For all panels, data included is from 68 infants (28-42 weeks postmenstrual age [PMA] at time of recording) on 104 recording occasions. See Table 1 for further clinical and demographic characteristics.”

      (4) We used a cross-frequency version of the phase-slope index to quantify the directionality and strength of information flow between cortical and breathing time series (Figure 3C,D). The phase-slope index investigates phase lags and how these change over narrow frequency ranges by examining the slope of the phase spectrum of their complex coherency. This indicates whether one signal leads or trails another signal (and thus indicating directionality). However, we agree (and as was also noted by Reviewer 3) that this analysis does not ‘prove’ directionality as other factors may influence the analysis. We have added the following to the text to address this point:

      “However, caution is needed in the interpretation of these results as signal processing techniques such as the phase-slope index estimate directionality but do not confirm causality. Rather, they show a statistical relationship which can be influenced by a multitude of factors (e.g., signal-to-noise ratio and preprocessing steps). Nevertheless, the results suggest that cortical activity may precede respiration in newborns. Future work is needed to confirm this association by, for example, employing other metrics to estimate directionality, such as the time-lagged cross-correlation and Granger causality and through direct mechanistic studies.”
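
      To illustrate what the phase-slope index measures, the following is a minimal sketch of the standard same-frequency form (summing the imaginary part of the product of conjugated coherency and coherency at the next frequency bin). It is not the cross-frequency variant used in our analysis, and the function names, the plain DFT, and the choice of frequency bins are assumptions made for illustration only.

```python
import cmath
import math


def dft_bin(x, k):
    """Discrete Fourier coefficient of sequence x at frequency bin k."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))


def coherency(x_segs, y_segs, k):
    """Complex coherency at bin k, averaged over paired segments."""
    sxy = 0j
    sxx = 0.0
    syy = 0.0
    for x, y in zip(x_segs, y_segs):
        fx, fy = dft_bin(x, k), dft_bin(y, k)
        sxy += fx * fy.conjugate()  # cross-spectrum
        sxx += abs(fx) ** 2         # auto-spectrum of x
        syy += abs(fy) ** 2         # auto-spectrum of y
    return sxy / math.sqrt(sxx * syy)


def phase_slope_index(x_segs, y_segs, k_lo, k_hi):
    """Sum of Im(conj(C(k)) * C(k + 1)) over bins k_lo .. k_hi - 1.

    A positive value indicates that the phase of the coherency increases
    with frequency over the band, i.e. x tends to lead y; a negative value
    indicates the reverse. This is a statistical estimate of lead/lag,
    not evidence of causality.
    """
    coh = [coherency(x_segs, y_segs, k) for k in range(k_lo, k_hi + 1)]
    return sum((c0.conjugate() * c1).imag for c0, c1 in zip(coh, coh[1:]))
```

      Swapping the two inputs flips the sign of the index, which is why it is read as a direction estimate. In practice the raw value would be normalised by an estimate of its standard deviation before interpretation.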

      (5) We have revised Table 1 so that Apgar scores are provided as median and interquartile range.

      Reviewer #3 (Public review):

      Summary:

      This is a strong and important report that presents a framework for understanding cortical contributions to neonatal respiration. Overall, the authors successfully achieved their goal of linking cortical activity to respiratory drive. Despite the correlational nature of this study, it is a crucial step in establishing a foundation for future work to elucidate the interaction between cortical activity and breathing.

      Strengths:

      (1) The introduction and use of workflows that establish correlational relationships between breathing and brain activity.

      (2) The execution of these workflows in human neonates.

      Weaknesses:

      Interpretations related to causal inference, confounds of sleep and caffeine, and the spatial interpretation of EEG data need to be addressed to ensure that the data appropriately support the conclusions.

      Thank you for your useful comments. We have now substantially revised the manuscript in relation to causal inference and our interpretations of the data, in particular adding further detail to the discussion with regards to the limitations of our approach and revising wording that has causal implications. We provide more detail in response to your ‘Recommendations for the authors’.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I want to elaborate on the three points of methodological criticism, and my apologies in case I have some misconceptions:

      (1) It seems like the surrogate distribution to determine PAC significance was computed by shuffling EEG segments and recomputing PAC each time. Surrogate computations are a difficult topic when handling signals as regular as respiration time series. However, random shuffling of data segments is almost always an overly liberal approach (except for trial-based data) since it destroys the temporal autocorrelation of the underlying signal. As the resting-state data in the present study were per sé continuous (and just segmented for analytical purposes), I am not convinced that random shuffling provides an adequate control. Could the authors either a) provide convincing evidence that the temporal autocorrelation of verum and surrogate time series did not differ from one another, or b) conduct additional control analyses based on an alternative approach, e.g., by constructing surrogate respiration phase vectors and recomputing PAC accordingly? We have had good experiences with the IAAFT approach (outlined in Kluger et al., Nat Comms 2023), but others certainly exist.

      Thank you for this important comment on the construction of surrogates. We agree that it is essential that any surrogate approach destroys the cross-signal coupling whilst preserving the signals’ internal structure (e.g., autocorrelation, spectral profile, and non-stationarities) as much as possible. We apologise for not describing this more clearly in the initial manuscript, but we want to clarify that in the surrogate analysis, we did not shuffle time points/segments within the EEG trials themselves. Instead, we permuted the trial order so that respiration trial T1 was paired with an EEG trial other than T1. This leaves the 4-sec segments used in the PAC analysis unaltered. This surrogate technique preserves the important internal properties of each signal (within-trial autocorrelation, auto-spectra and power distribution of the signals) while destroying the cross-signal alignment across trials, and thus the trial-wise phase locking (e.g., coherence) between respiration and EEG. We have clarified this in the manuscript as follows:

      “The surrogate analysis was performed by randomly permuting the trial (4-s segment) order of the EEG amplitude while leaving the respiration trial order unchanged (i.e., respiration segment S1 was paired with an EEG segment Sj, j ≠ 1). Importantly, no temporal samples were shuffled within segments. Thus, the full within-segment temporal structure, including autocorrelation and spectral profile (auto-spectra), was preserved for both signals. This permutation destroys trial-wise cross-signal phase alignment (and therefore coherence) while retaining the intrinsic dynamics of each signal.”
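
      The permutation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than our analysis code: it assumes the respiration phase and EEG amplitude have already been extracted per 4-s segment, it uses a pooled mean-vector-length statistic as a stand-in for the coupling measure, and it enforces that every segment is re-paired with a different segment. All function names are hypothetical.

```python
import cmath
import random


def pooled_pac(phase_segs, amp_segs):
    """Mean vector length pooled over all samples of all paired segments."""
    total = 0j
    count = 0
    for phase, amp in zip(phase_segs, amp_segs):
        for p, a in zip(phase, amp):
            total += a * cmath.exp(1j * p)
            count += 1
    return abs(total) / count


def derangement(n, rng):
    """Random permutation of range(n) with no fixed points (requires n >= 2)."""
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if all(i != j for i, j in enumerate(perm)):
            return perm


def surrogate_test(phase_segs, amp_segs, n_surrogates=200, seed=0):
    """Compare observed coupling against segment-order surrogates.

    Only the order of the EEG amplitude segments is permuted; samples within
    a segment are never shuffled, so each signal keeps its within-segment
    autocorrelation and spectrum.
    """
    rng = random.Random(seed)
    observed = pooled_pac(phase_segs, amp_segs)
    null = []
    for _ in range(n_surrogates):
        perm = derangement(len(amp_segs), rng)
        shuffled = [amp_segs[j] for j in perm]  # re-pair segments, keep contents
        null.append(pooled_pac(phase_segs, shuffled))
    # One-sided permutation p-value with the usual +1 correction.
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_surrogates)
    return observed, null, p_value
```

      Because only the pairing changes, any difference between the observed value and the surrogate distribution reflects cross-signal alignment rather than the internal structure of either signal.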

      (2) The LMEM approach is very sound, but it seems like ID was the only random effect included in the model. Could the authors clarify whether multiple sessions from individual neonates were considered or whether each ID was only represented once? In case of the former, one possibility would be to include 'session' as an additional random effect; otherwise, the group statistic could be biased. Many thanks in advance for providing insight on this.

      Thank you for this important point. Of the 68 infants included in the study, 49 had only a single session. The remaining 19 infants had between 2 and 5 sessions included. Given that most infants had only a single session, it is not possible to identify random effects of session reliably, and so we have not included session as a random factor. Moreover, postmenstrual age [PMA] (which is related to session order within a subject and is likely a more reliable indicator of variance given that sessions were not at fixed intervals) is already included as a factor in the analysis. Indeed, session ID is not a distinct source of clustering and will be indistinguishable from subject and PMA variance.

      In relation to this question, we carefully checked the analysis and realised that we had included infant as a random effect on both slope and intercept. Given that most infants have only one session, the random effect on slope cannot be estimated, and so we have now removed this from the analysis, leading to very minor changes in the results (and no changes in the interpretation). We have clarified in the manuscript that “Infant ID was included as a random effect acting on the intercept.”
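
      The two random-effects structures discussed above can be written out as a sketch. Here i indexes infants and j indexes sessions; the single covariate x (for example PMA) and all symbols are illustrative assumptions, since the full set of fixed effects is not listed in this response.

```latex
% Random intercept only (the structure retained in the revised analysis)
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0i} + \varepsilon_{ij},
\qquad u_{0i} \sim \mathcal{N}(0, \sigma_{u_0}^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Random intercept and random slope (the structure that was removed)
y_{ij} = \beta_0 + (\beta_1 + u_{1i})\, x_{ij} + u_{0i} + \varepsilon_{ij},
\qquad u_{1i} \sim \mathcal{N}(0, \sigma_{u_1}^2)
```

      When an infant contributes a single session, the term u_{1i} x_{ij} cannot be separated from u_{0i} and the residual, which is why the slope variance is poorly identified in a dataset where most infants have one session.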

      (3) It is not entirely clear to me why the authors selected the two electrodes with the strongest overall PAC for the analysis of apnoea frequency. Why not consider all electrodes individually? What is the worry/hypothesis regarding electrodes with low PAC - would one not expect simply to find no relationship with apnoea frequency, and would that information not be instructive? Again, I want to thank the authors in advance for their take on this comment.

      We initially included only the two electrodes with the strongest coupling as we would not expect a relationship with apnoea rate at those electrodes without significant coupling (as you say). For completeness, we have now included the relationships with all electrodes individually in Supplementary Figure S4. As expected, the relationship between apnoea rate and coupling (coherence) was not significant for the electrodes without strong coupling.

      Reviewer #3 (Recommendations for the authors):

      Major Comments:

      (1) Causal Language and Overinterpretation are evident throughout the manuscript. The manuscript repeatedly uses language suggesting causality (e.g., "cortical motor activity reduces apnoea"), despite the correlational nature of the findings.

      It is recommended that the authors reframe their claims in the abstract and discussion to clarify that the observed associations do not establish causal influence. For example, Abstract: "...revealing novel mechanistic insight....". This correlational observation does not reveal a mechanism but rather supports the concept of mechanistic interactions.

      Thank you for this important point. We have now rephrased the manuscript throughout, particularly in the abstract and results/discussion. We have also added the following sentences to the discussion to address the point on causation:

      “Nevertheless, it is important to recognise that a limitation of this analysis is that correlation does not imply causation, and future mechanistic studies are required to determine whether and how cortico-respiratory coupling plays a role in reducing apnoea in infants.”

      And later:

      “The limitations of our study need to be considered, and in particular, directionality of the cortico-respiratory coupling, improved spatial localisation, and a direct mechanistic link between cortico-respiratory coupling and apnoea rate, should be investigated in future work.”

      (2) Potential Confounding by Sleep State and Caffeine. Sleep state is a significant determinant of apnoea occurrence and EEG frequency composition, yet no objective sleep-state classification is incorporated. Similarly, caffeine, administered in ~50% of recordings, is a potent respiratory stimulant. A reanalysis of the data, incorporating sleep proxies (e.g., EEG spectral ratios, delta/theta dominance) and caffeine exposure as covariates or stratification factors in the PAC-apnoea models, should be performed.

      Sleep state: A limitation of our work is that we did not record sleep state and unfortunately, we do not have anyone trained to annotate sleep states from EEG recordings in our research group. We have added the following to the discussion to address this:

      “It is known that most apnoeas in infants occur during active sleep[6][30] and delta- and theta-band frequencies in EEG are strongly related to sleep state[31]. A limitation of our study is that we did not record the sleep state of the infant.”

      Caffeine: We agree that caffeine is a respiratory stimulant and, hence, it is important to consider this effect. Moreover, those infants prescribed caffeine are those who are at greatest risk of apnoea and so it is of interest to determine whether the relationship between PAC and apnoea rate occurs in those infants receiving caffeine treatment. We conducted a stratified analysis to address this point, now providing an additional Supplementary Figure.

      (3) Directionality Inference from Phase-Slope Index. While PSI suggests a lead-lag relationship, it does not confirm causality and may be influenced by signal-to-noise or preprocessing steps. Validation PSI findings using additional metrics (e.g., time-lagged cross-correlation or Granger causality) or, at a minimum, temper interpretations of cortical "driving" respiration.

      We agree that the PSI (and other metrics such as Granger causality) may be influenced by a range of factors. We have therefore changed the wording throughout and also added the following:

      “However, caution is needed in the interpretation of these results as signal processing techniques such as the phase-slope index estimate directionality but do not confirm causality. Rather, they show a statistical relationship which can be influenced by a multitude of factors (e.g., signal-to-noise ratio and preprocessing steps). Nevertheless, the results suggest that cortical activity may precede respiration in newborns. Future work is needed to confirm this association by, for example, employing other metrics to estimate directionality, such as the time-lagged cross-correlation and Granger causality and through direct mechanistic studies.”

      (4) Limited EEG Spatial Resolution. The attribution of CRC to "cortical motor areas" is overstated, given the use of only 8 EEG electrodes, which provides limited spatial coverage. Avoid overly precise interpretations regarding cortical localization unless source localization or higher-density EEG data are available.

      We have added the following to specifically address this limitation.

      “It is important to note that the number of electrodes in our montage is limited (with only 8 recording electrodes), and so source localisation was not possible; higher-density recordings are warranted to confirm whether the motor cortex plays a role.”

      We have also changed the wording in the summary paragraph and abstract to add this limitation and reworded throughout the manuscript to highlight the limitations of our study.

      Minor Comments

      (1) Consider color-coding individual points in Figure 4A by PMA or caffeine status to visually disambiguate potential age-related or pharmacological effects.

      We agree that this provides additional visual information and have colour-coded the points in Supplementary Figure S6 according to caffeine status.

      (2) Clearly define PAC versus CRC. These are used interchangeably. Readers may benefit from a more consistent and precise usage, especially in the abstract.

      Thank you for noticing this. We have revised the terms where necessary throughout, and the abstract and introduction to read:

      “Using simultaneous electroencephalography (EEG) and impedance pneumography we investigated interactions between cortical and respiratory activity (known as cortico-respiratory coupling) using phase-amplitude coupling.”

      “Recently, it was proposed that communication between the cortex and lungs, known as cortico-respiratory coupling, can be identified and quantified through phase-amplitude coupling. This functional coupling arises when the amplitude of cortical activity is modulated by the respiration phase, or vice versa. Phase-amplitude coupling is typically quantified using non-invasive recordings capturing respiratory and neural activity (e.g., magneto- or electroencephalography [EEG]).”
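
      To make the idea of phase-amplitude coupling concrete, the sketch below computes one common estimator, the mean vector length, on synthetic data. It is an assumption-laden illustration rather than the method used in the manuscript: the estimator choice, the helper names `analytic_signal` and `mean_vector_length`, the 0.5 Hz "respiration" and 10 Hz "cortical" frequencies, and the modulation depth are all hypothetical. Real signals would also need band-pass filtering before the phase and amplitude are extracted.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal computed via the FFT (same construction as a Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def mean_vector_length(slow, fast):
    """Coupling between the phase of `slow` and the amplitude envelope of `fast`."""
    phase = np.angle(analytic_signal(slow))
    amplitude = np.abs(analytic_signal(fast))
    return np.abs(np.mean(amplitude * np.exp(1j * phase)))

# Hypothetical signals: 60 s sampled at 100 Hz.
fs = 100.0
t = np.arange(6000) / fs
rng = np.random.default_rng(0)
resp = np.sin(2 * np.pi * 0.5 * t)       # slow "respiration" rhythm
carrier = np.sin(2 * np.pi * 10.0 * t)   # fast "cortical" rhythm

# Coupled case: the fast amplitude waxes and wanes with the slow rhythm.
coupled = (1.0 + 0.8 * resp) * carrier + 0.1 * rng.standard_normal(t.size)
# Uncoupled case: constant fast amplitude.
uncoupled = carrier + 0.1 * rng.standard_normal(t.size)

mvl_coupled = mean_vector_length(resp, coupled)
mvl_uncoupled = mean_vector_length(resp, uncoupled)
print(mvl_coupled, mvl_uncoupled)
```

      The coupled signal yields a clearly larger value than the uncoupled one. In practice the raw value is usually compared against surrogate data, because it also scales with the overall amplitude of the fast signal.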

      (3) Clarify the overlap with previously published datasets (line 358). Are any EEG-apnoea associations re-analyses of data published in Zandvoort et al., 2024?

      We have amended this sentence to explain that the previous study did not investigate respiration/apnoea. We now state:

      “Parts of this dataset have been reported earlier in Zandvoort et al. [33] to address a different research question (this study investigated the development of sensory-evoked potentials, which were also recorded in these infants; it did not explore respiration).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Joint Public Review:

      Summary:

      The authors investigate how stochastic and deterministic factors are integrated in cell fate decisions, using Dictyostelium discoideum as a model system. They show that cells in different cell cycle phases (a deterministic factor) are predisposed to different fates, albeit with deviations, when exposed to the same environmental stimulus. However, gene expression variability (a stochastic factor) enhances the robustness of cellular responses to environmental cues that disrupt the cell cycle.

      Using a simple, tractable mathematical model, the authors demonstrate that cell fate decisions in D. discoideum depend on a combination of deterministic and stochastic factors, i.e., cell cycle phase and gene expression variability, respectively. They then identify Set1 - a key regulator of gene expression variability - indicate the mechanism through which it modulates this variability, and link it to a phenotype in D. discoideum development. Finally, they confirm that gene expression variability contributes to the robustness of the cell's response to environmental disruptions that interfere with the cell cycle.

      Strengths:

      The authors are careful in the choice of their experiments and in measuring gene expression variability, using methods that account for expected trends with average gene expression.

      Weaknesses:

      However, in terms of mathematical modelling, it would be important to rule out sources of stochasticity (other than gene expression variability), and also to consider cases where stochastic factors are not necessarily completely independent of the deterministic ones.

      We thank you and the reviewers for the insightful comments that have helped clarify the findings presented. We have addressed all comments and feel that the revised manuscript is much improved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Minor typographical mistakes:

      (a) in the title: Linage -> lineage

      Corrected as suggested

      (b) on page 19: use a full stop in "...are biased towards the stalk fate, Use of the cell cycle position..."

      Corrected as suggested

      (c) on page 20: become -> becoming in "...(and end up biased towards become stalk)..."

      Corrected as suggested

      (d) on page 16: "mu = G p k". Perhaps it should be x instead of k?

      Corrected as suggested

      (2) Regarding the abstract:

      (a) This work tries to outline general principles (coordination/integration of deterministic and stochastic factors) in cell fate choice, especially when cells are faced with (near) identical environmental conditions. Perhaps the abstract, especially the first line, could be rephrased to reflect the generality of symmetry breaking and differentiation that is studied in this article/work. e.g., as was done in the first paragraph of the discussion.

      Corrected as suggested

      (b) It might be worthwhile clarifying what "this" is in the sentence "We suggest this represents an adaptive mechanism that increases developmental robustness against perturbations that affect deterministic signals." in the abstract.

      Corrected as suggested

      (3) Regarding the model:

      (a) The model tries to combine the stochastic and deterministic parts to explain the propensity for stalk fates. It is assumed that the cell cycle-associated factors (CCAF) provide the deterministic part while the cell cycle-independent factors (CCIF) provide the stochastic part. The net result is an addition of the two, which is then compared against a threshold to decide the propensity for stalk fates. However, another simple way to introduce stochasticity would be to make the CCAF decay stochastic. Reasons to consider this scenario would be: (i) the decay process (especially in the biological context) is generally stochastic, (ii) it would not be inconsistent with the fact that cell cycle dependent genes are also variable, and (iii) this way of introducing stochasticity would also provide expression level characteristics/plots similar to the ones outlined in Figure 1C, i.e. with a probability distribution of CCAF values for a given amount of time after mitosis. Would there be arguments or experimental evidence to rule this possibility out? For instance, would the results shown in Figure 7 contradict this model?

      We agree that there could be stochasticity in the CCAF decay process. In this scenario, the expected value of CCAF (which would reflect the mean of a noisy distribution) would show a deterministic pattern of decay through time, representing the average value of CCAF across cells that are in the same phase of the cell-cycle. The noisiness around such a pattern of deterministic decay in the mean value of CCAF (i.e., the residual variation) would then represent CCIF since it would be, by definition, cell-cycle independent. Hence, the present model is fully consistent with this possibility since it would still lead to some variation being cell-cycle associated and some variation being cell-cycle independent. Therefore, this scenario could be viewed as a different functional/biological process leading to the same ultimate distribution we model. To clarify this, we have added text justifying the hypothesis that the noisy distribution is due to gene expression differences, rather than decay itself:

      “Protein levels can vary widely between cells because they are regulated at multiple levels, including transcription, translation and stability. The position of the noisiest step in a pathway affects the overall noise dramatically, because each step usually amplifies noise in the previous steps (Alon 2007). Consistent with this idea, theory and single-cell experiments have shown that a major contributor to cell-cell variation is the bursty expression of low-copy mRNAs. We therefore hypothesized that this noisiness across cells arises from stochastic expression of a set of genes contributing to CCIF levels.”

      (b) On page 7, the formula for total CCIF variance assumes independence of the genes g_i. Is this a reasonable assumption?

      This concerns the argument that a set of stochastically expressed genes will yield an approximately Gaussian distribution of CCIF. Our results do not depend on the solution for the mean and the variance, only that noisy genes will generally yield such a Gaussian distribution. This is because independence is not strictly required for the central limit theorem to yield a Gaussian distribution. The distribution will still be Gaussian under a broad range of conditions (especially since gene expression is bounded, so there is no chance of the total ending up generating an infinite variance). The primary requirement is that the expression of any given gene is independent from that of most other genes. As a result, most of the variation in expression across genes is independent (even if any given gene is not independent from all other genes).

      The most likely pattern of non-independence will be the case in which gene expression is ‘modular’, where there are co-expressed blocks, meaning that non-independence is limited in scale so that genes within a co-regulated block show correlated expression, but their expression is uncorrelated to genes in other blocks. This pattern is functionally analogous to what is known as m-dependence in sequences of random variables (e.g., time series), where variables close together in sequence are correlated (but otherwise uncorrelated). Derivations of the central limit theorem have shown that the means (and hence the sum) of these sorts of variables still follow an approximately Gaussian distribution over a broad range of scenarios. In the case of non-independent gene expression, this means that we can view the independent random variable as being the expression value of a group of co-expressed genes (instead of individual genes). Hence, the means (or sums) of these values will still conform to the central limit theorem.

      This problem is addressed in:

      Diananda, P. H. 1955. The central limit theorem for m-dependent variables. Proc. Cambridge Philos. Soc. 51:92-95

      Hoeffding, W. & H. Robbins. 1948. The central limit theorem for dependent random variables. Duke Math. J. 15:773-780

      Orey, S. 1958. A central limit theorem for m-dependent random variables. Duke Math. J. 25:543-546

      Rosén, B. 1967. On the central limit theorem for sums of dependent random variables, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 7:48-82

      To clarify this, we have added the following text and references:

      “Although this derivation implicitly assumes that stochastically expressed genes are independent, this assumption is not strictly required for the distribution of CCIF to be approximately normal. If stochastically expressed genes show clustered co-expression owing to shared regulation, then the sum across these co-expressed blocks is still expected to be approximately normally distributed (as long as there are a reasonably large number of co-expressed clusters) (Diananda 1955; Hoeffding and Robbins 1948; Rosén 1967).”
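
      The argument above, that a sum over co-expressed blocks still tends towards a normal distribution, can be checked numerically with a small simulation. The sketch below is illustrative only and is not the authors' model: the number of blocks, the genes per block, and both probabilities are hypothetical. Genes within a block share an on/off state and are therefore correlated, while different blocks are independent.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_blocks, genes_per_block = 5000, 100, 5
p_block_on = 0.3    # hypothetical chance that a co-regulated block is active in a cell
p_gene_fires = 0.8  # hypothetical chance that a gene fires when its block is active

# One shared on/off draw per block and cell: genes in the same block are correlated.
block_on = rng.random((n_cells, n_blocks)) < p_block_on
gene_fires = rng.random((n_cells, n_blocks, genes_per_block)) < p_gene_fires
gene_on = block_on[:, :, None] & gene_fires

# Correlation between two genes in the same block versus two genes in different blocks.
within = np.corrcoef(gene_on[:, 0, 0], gene_on[:, 0, 1])[0, 1]
between = np.corrcoef(gene_on[:, 0, 0], gene_on[:, 1, 0])[0, 1]

# Total number of expressed genes per cell, summed over all blocks.
total = gene_on.sum(axis=(1, 2))

# For a normal distribution, skewness and excess kurtosis are both zero.
z = (total - total.mean()) / total.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0
print(within, between, skewness, excess_kurtosis)
```

      With these settings the within-block correlation is substantial, yet the skewness and excess kurtosis of the total stay close to zero. Using far fewer blocks makes the departure from normality more visible, in line with the stated requirement of a reasonably large number of clusters.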

      (4) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells":

      Regarding the statement: "We first determined the coefficient of variation (CV^2) of expression for all genes. As expected, this tends to decrease as average expression level increases (Supplementary Figure 2).":

      It would be good to specify how the "expected variation" was calculated exactly. For instance, it was hard to discern from Supplementary Figure 2 how CV^2 decreasing with average expression levels was used in the calculation of expected variation.

      This is described in the methods on page 38

      “A trend line was fitted to the data using non-linear least squares regression (Scran v1.15.9). Genes were defined as variable (2073 genes) based on a one-sided test assuming a normal distribution around the trend but one where deviation changed depending on the mean expression of a given gene (Scran v1.15.9 - modelGeneCV2) with a FDR of < 0.05.”
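
      Why CV^2 is expected to fall as mean expression rises can be illustrated with the simplest counting model. The sketch below assumes purely Poisson counts, for which CV^2 equals 1/mean. That is an assumption made here for illustration and is not the trend model fitted by Scran's modelGeneCV2; the mean values and cell number are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.5, 2.0, 8.0, 32.0])  # hypothetical mean counts for four genes
n_cells = 10000

# Simulated count matrix: rows are cells, columns are genes.
counts = rng.poisson(true_means, size=(n_cells, true_means.size))

observed_mean = counts.mean(axis=0)
cv2 = counts.var(axis=0) / observed_mean ** 2

for mean, value in zip(observed_mean, cv2):
    print(f"mean = {mean:6.2f}   CV^2 = {value:.3f}   1/mean = {1 / mean:.3f}")
```

      Genes whose CV^2 sits well above such a mean-dependent baseline are the ones a trend-based test would flag as variable.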

      (5) In section "Stochastically expressed genes are associated with cell fate determination"

      (a) For readers unfamiliar with the organism ‘Dictyostelium discoideum’, a short description of its life cycle with growth and development/differentiation phases would be useful to provide the right context.

      Corrected as suggested

      (b) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells", it was shown that cell cycle dependent genes are also highly variable (in other words, ‘stochastic’). It would, therefore, be useful to elaborate on the definitions of "stochastically expressed genes, cell cycle-associated genes, and non-variable genes", as used in this section. Admittedly, the distinction does get clearer towards the last section of Results, but some elaboration here would make the reading smoother.

      Corrected as suggested

      (c) If the "cell cycle associated genes" are the same as "cell cycle dependent genes", it would be good to use one term consistently.

      Corrected as suggested

      (d) The developmental index is divided into 10 bins from 0 to 1. Is there a rationale for the choice of a number of bins? Would this choice affect significance tests for "stochastic" vs others? (The same question may apply to the "Cell type index")

      Significance is robust to the number of bins chosen (e.g. 5-25). Of course, if there are too many bins (low number of genes) or too few bins (addition of noisy data) significance falls. In the case of developmental index, our choice of bins is also based on previous analyses (de Oliveira, et al 2019), which developed the index we used, and showed that a threshold of >0.9 can be used to identify ‘developmentally expressed genes’.

      (6) In Figure 5:

      (a) Does the statement "*** binomial test, p<0.01." (as seen in caption for part C) actually refer to part D?

      Corrected as suggested

      (b) Could the authors please specify what "mis-expressed" means in Figure 5D? Are these genes that are upregulated, downregulated, or both? From what set of genes was the random sampling done?

      Corrected as suggested

      (c) In Figure 5F, is the decrease in CV^2 explained entirely by the increase in mean (as shown in Figure 5E)?

      We appreciate the point made by the reviewer and recognise that disentangling changes in gene expression variation from changes in expression levels is extremely difficult (any changes in burst frequency will necessarily affect expression level). However, we do not think this affects our conclusions, which are supported by results with representative Set1 dependent reporter genes (Figure 5G and H). These suggest that, in these cases at least, it is the number of cells expressing the gene that is affected, rather than the expression level in each cell.

      (7) In Figure 6A: Could the authors please elaborate on the difference between the rows labelled "WT" and "set1-"? Are they two different types of chimera?

      Corrected as suggested

      (8) In Section "Cell cycle position and gene expression variation interact to control cell type proportioning":

      Is there a graph corresponding to the statement "However, the level of GFP expression in each responding cell did not significantly change."?

      Corrected as suggested

      (9) In section "Influence of stochastic variation on sensitivity to cell cycle perturbations" of the Supplementary text:

      (a) The model for cell cycle bias is not entirely clear. For instance, is the quantity N(t) = U(t) + Q_t U(t) also a probability distribution, like U(t) is? If so, there must be a normalization factor. It was difficult to understand the procedure behind this calculation. Perhaps some more elaboration (with words or a small schematic) on this model/method would help.

      The value of U(t) was originally being used to denote the uniform probability density function (for the uniform distribution), but for clarity this has been changed to follow the convention that U[a,b] denotes the uniform distribution over the interval from a to b (which, in this case would be U[0, 1]), while f(t) is now being used to make it clear that this is the probability density, where f(t) = 1 across the interval. Because the uniform distribution necessarily integrates to 1 over the defined range, it does not need to be normalised. The confusion here is perhaps due to the expression f(t) = 1 being interpreted as defining the probability of sampling a value of t (but in a continuous distribution we can only define the probabilities of sampling over an interval), instead of defining the probability density over the interval from a to b, where f(x) would be 1/(b – a), and hence over the interval of 0 to 1, f(x) would equal 1.

      To help clarify this issue, this section has been rewritten and a new figure (which appears as Supplementary Figure 12) has been added that illustrates the resulting probability density functions for biased sampling from the cell cycle.
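
      The distinction drawn above between a probability density and a probability can be written out explicitly. The block below only restates the standard properties of the uniform distribution in the notation of the response; the interval endpoints t_1 and t_2 are introduced here for illustration.

```latex
% Uniform density on [a, b]: it integrates to one, so no normalisation factor is needed.
f(x) = \frac{1}{b - a} \quad (a \le x \le b),
\qquad
\int_a^b f(x)\,dx = 1 .

% Special case U[0, 1]: the density equals 1 everywhere on the interval, and
% probabilities are defined only for intervals, for 0 \le t_1 \le t_2 \le 1.
f(t) = 1 ,
\qquad
P(t_1 \le t \le t_2) = \int_{t_1}^{t_2} f(t)\,dt = t_2 - t_1 .
```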

      (b) References to Figure 8A, B seem to be indicating Supplementary Figure 12 instead. 

      Corrected as suggested

      Reviewer #2 (Recommendations for the authors):

      This manuscript seems quite interesting, but many sections are so unclear that I cannot follow what has been done. I would suggest slowly going through the manuscript and carefully explaining things. This will probably considerably increase the size of the manuscript, but many sections are too terse to follow even after many, many readings of the Results and figure legend.

      Corrected as suggested

      Some specific comments (this is not at all comprehensive, but rather illustrative)

      Page 2 - 'genes strongly associated with fate choice' - can you explain this a bit more - genes associated with one cell type or another, or genes that somehow regulate the choice?

      Corrected as suggested

      Page 2 - this abstract is quite vague, I would suggest being more specific to reflect what is in the manuscript.

      Corrected as suggested

      Page 3 - 'exhibit bivalent H3K4me3..' please explain 'bivalent' a bit more.

      Corrected as suggested

      Page 7 - 'Bernoulli process with probability that (meaning that is scaled to the size of the temporal interval)' (non-copying symbols deleted) could be simplified.

      Corrected as suggested

      Page 7 - please define all variables/ equation components. What is N? What is x bar? What is s2? The middle paragraph is very difficult to follow.

      This paragraph has been rewritten and a definition of the distribution added for clarity.

      Page 7 - 'genes might logically vary in the value of pi, such variability does not impact our results.' Trying to decipher this paragraph, it seems that pi is a function of time, so this could affect the results.

      pi is the probability that a stochastically expressed gene is actually expressed in whatever interval is being considered for all genes. pi will necessarily increase if the time interval considered is increased. The key point is we are considering the probability that any given gene is expressed in the same time interval. In this case, genes could vary in pi, and thus some burst more often and others less often.

      Page 9 - '(it is 98.35 times more likely' there may be too many significant figures here.

      Corrected as suggested

      Page 10 - for the Area Under the Receiver Operating Characteristic Curve (AUROC), what are you classifying? AUROC is typically used for diagnostic tests to determine how well the test can discriminate between two completely different outcomes. What is the input, and what are the outcomes?

      Corrected as suggested

      Figures:

      What are the dashed lines in Figure S2A?

      Corrected as suggested

      What are the X-axes in Figure S3?

      Corrected as suggested

      I do not understand what you are showing in Figure S3.

      Corrected as suggested in results

      In Figure 2B, I cannot find in the text or figure legend any description or explanation of 'Group 1', 'Group 2', or 'Group 3'.

      Corrected as suggested

      Figure 3D needs a lot more explanation; I cannot understand this based on the text and the figure legend.

      Corrected as suggested

      The Set1 work should discuss the work in PMID: 39242621

      Corrected as suggested

      Figure 8 D needs a size bar

      Corrected as suggested

    1. Author response:

      The following is the authors’ response to the original reviews

      Public review:

      Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGN, and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions that are scant in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      We thank the reviewer for this comment. We have added more detailed information about the methods to clarify our procedure. In addition to the original description of our threat learning paradigm in humans, we added the following to pages 39-40:

      “Experimental procedure

      Threat learning: Please see the original description in the manuscript.

      Shock level: The shock intensity used in the fear learning paradigm was determined during a pre-experiment calibration. Electrodes were attached to the participant’s right hand, and stimulation began at a low level (0.1 mA), gradually increasing in small increments. After each increment, participants verbally rated their discomfort. The procedure continued until the participant identified a level they described as “highly annoying but not painful.” This individualized intensity was then used for that participant throughout the experiment. For safety and ethical reasons, the maximum intensity was capped at 20 mA, and no participant received a shock above this limit.

      Instructions to the participants: Each visual stimulus in our paradigm was first shown to participants for 6 seconds. This initial presentation served as habituation, allowing us to isolate the responses to genuinely new stimuli. Before the experiment began, participants were informed that they would see pictures illuminated with different colored lights, such as red or blue. During the experiment, some pictures might be paired with an electric shock, while others might not. Participants were instructed to pay attention to whether a specific color or pattern was associated with the shock. These instructions were adopted from previous studies in which our group developed this paradigm and found them highly effective for human learning. We therefore used the same approach in the current experiment. These instructions were provided throughout all phases of threat learning, and participants were informed that any shocks delivered would be at the same intensity determined on Day 1.”

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

      We thank the reviewer for this suggestion. We agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results in Figure 7). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across thalamic nuclei, rather than the distribution of individual data points per se. For this purpose, bar plots with standard errors provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample size (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimuli. Additionally, this changes with repeated trials, which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar subnuclei, which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The data has a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated the thalamus activity during fear conditioning and extinction, particularly with a large sample size. There is an independent replication of the pulvinar network structure (Figure 3), which suggests that the processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar - potentially a novel discovery, but that needs more validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g., sentence 1 in the Results section: "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "The anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      We acknowledge the limitations of fMRI studies and agree with the reviewer that causal claims cannot be made based on correlational neuroimaging evidence. Accordingly, we revised the text to reduce causal interpretations. Specifically, we reworded the sentence identified by the reviewer in the Results section and systematically updated language throughout the manuscript.

      Page 9: “At the block level, both the anterior pulvinar and MD showed increased activation to CS+ vs. CS− (anterior pulvinar: t(292) = 4.41, p = 0.00001, d = 0.25; MD: t(292) = 6.41, p = 5.83 × 10^-10, d = 0.37; Fig. 1b–c), suggesting a possible involvement of these regions in early associative threat learning.”

      Throughout the manuscript, we replaced terms such as “reflects” with “likely reflects” and “indicating” with “consistent with,” and introduced explicitly correlational phrasing where appropriate (e.g., “apparently,” “closely align,” and “seems to”). All revisions are highlighted in green in the revised manuscript.

      (2) Figure 1: The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      We thank the reviewer for highlighting this important pattern in Trial 3. The CS+ vs. CS− contrast in the third trial in the mediodorsal thalamus remained statistically significant after FDR correction and was correctly reported in the Supplementary Tables. However, we acknowledge that the statistical marker was inadvertently omitted from Figure 1. We have now corrected the figure to include the appropriate significance annotation.

      In addition, we now explicitly describe the attenuation of the CS+ vs. CS− difference by the third trial in the pulvinar but not in the mediodorsal thalamus (page 32):

      “The suggested rapid initial acquisition of the predictive value of the CS+ is thought to be pronounced during the first two trials. The attenuated CS+ vs. CS− differentiation on the third trial, specifically in the pulvinar, may reflect a decreased requirement for differential thalamic engagement once the initial association has been acquired or an initial survival fear reaction has been expressed. Notably, the MD sustained the BOLD response to the CS+ in the third trial, which may indicate involvement of this nucleus in the consolidation or stabilization of the learned association. This aligns with the well-established MD-PFC circuit involved in cognitive processes (Wolff and Halassa, 2024). Additionally, in a previous study using a similar paradigm, we also observed sustained CS+ vs. CS− differentiation on the third trial in the nucleus reuniens (Tuna et al., 2025). These findings suggest that trial-dependent learning dynamics may vary across thalamic nuclei rather than reflecting a uniform thalamic learning signal. Together, while our paradigm does not inherently distinguish between different stages of learning, such as early acquisition and stabilization, our findings are consistent with stronger associative learning–related engagement during the first two trials, with a reduced differential response by the third trial that may reflect the involvement of different neural processes.”

      (3) Figure 3: Could the observed network structure be due to anatomical proximity? Perhaps the authors should do an analogous analysis to what they did in Figure 2 for this intra-pulvinar analysis. This analysis doesn't take into account the indirect connections through corticothalamic and thalamocortical connections with the visual cortex and the pulvinar. There is an implicit assumption that there are interconnections between the pulvinar subnuclei, but there are few strong excitatory projections between these subnuclei to my knowledge. If visual areas are included in the graph, it would make things more complex, but would probably dramatically change the story. In this way, the message is somewhat constructed or arbitrary.

      We thank the reviewer for this insightful comment. We agree that the network analysis in Figure 3 does not provide a direct anatomical account of pulvinar connectivity and cannot distinguish between direct inter-nuclear interactions and indirect coupling mediated via corticothalamic and thalamocortical pathways, including visual cortex.

      Our intention with this analysis was to characterize functional statistical dependencies among pulvinar divisions during conditioning, rather than to infer monosynaptic anatomical connectivity. Accordingly, the observed network structure should not be interpreted as evidence for direct excitatory projections between pulvinar subnuclei.

      We agree that including visual cortical regions in the network would substantially increase model complexity and could alter the inferred network structure. However, doing so would require a trial-wise, multiregional modeling framework that goes beyond the scope of the present intra-pulvinar analysis.

      In response to this comment, we have now explicitly clarified the assumptions, interpretational limits, and alternative explanations of the network model in the Discussion (page 33):

      “Yet, these intrapulvinar relationships should be understood as a functional and computational model, reflecting statistical dependencies among pulvinar divisions during threat learning, rather than as evidence of direct monosynaptic anatomical connections. Because detailed inter-nuclear anatomical connectivity within the pulvinar remains incompletely characterized, our analysis does not presuppose strong direct excitatory projections between subnuclei. Instead, our findings are intended to highlight candidate functional relationships within the pulvinar during conditioning with different levels of data processing, rather than to provide a definitive anatomical map.”

      We also included the following in the Limitations and Future Directions section (page 36):

      “The observed relationships among pulvinar divisions during conditioning are purely functional and do not distinguish direct inter-nuclear interactions from indirect coupling mediated by corticothalamic and thalamocortical pathways, including visual cortical regions. Thus, the pulvinar model may reflect indirect cortical loops, weak or currently undocumented inter-nuclear interactions, or a combination of both.”

      Finally, we added this note to the legend of Fig. 3:

      “Note: The functional relationships among pulvinar divisions during threat learning should be interpreted as computational dependencies derived from statistical associations. These effects may reflect indirect interactions mediated by corticothalamic and thalamocortical pathways (e.g., via visual cortex), rather than direct inter-nuclear connectivity. Elucidating the underlying anatomical mechanisms will require future studies.”

      (3) In the results section describing Figures 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm differences.

      We thank the reviewer for this suggestion. In this study, each phase (conditioning, extinction, recall, and renewal) was analyzed separately to characterize thalamic function within that specific phase. Our primary conclusions focus on differences between CS+ and CS− within each phase, rather than comparisons across phases or sessions. Direct statistical comparisons across phases were therefore not performed, as they fall outside the scope of our main hypotheses.

      We have clarified this in the revised manuscript to make the rationale for our analytic approach explicit. Added to page 8:

      “The purpose of this study is to investigate thalamic function during each learning phase separately, focusing on CS+ vs. CS− differences within phases rather than comparing activation across phases. This phase-specific approach allows us to characterize thalamic functional dynamics within each stage of learning and memory, avoiding potential confounds arising from the distinct processes of conditioning, extinction, and recall.”

      (4) Figure 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei; rather, the functional connections are probably mediated through interconnections with cortical visual areas.

      We thank the reviewer for this point. Reciprocal connections between the visual cortex and the pulvinar are established, but the precise projections to specific pulvinar divisions remain unknown. We have added a note to the Figure 8a caption to clarify this (Figure 7a in the original version).

      “Note (panel a): Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      (5) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper, which may be better suited for a review or perspective piece.

      We thank the reviewer for this comment. We aimed to integrate our empirical findings within a broader conceptual framework to provide a complementary narrative, rather than presenting isolated observations without connecting them to theoretical context. This approach is intended to strengthen the interpretive value of the study while remaining grounded in primary data.

      (6) In the discussion, there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large-scale changes in overall activity. It's easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      We thank the reviewer for raising this important point. We fully agree that the fMRI BOLD signal reflects large-scale changes in population activity and may fail to capture more subtle or distributed neural representations. Accordingly, the absence of a CS+ vs. CS− BOLD difference should not be interpreted as evidence that a region is not involved in discriminating these stimuli. Rather, our inferences are limited to differences in aggregate activation at the spatial and temporal resolution of fMRI.

      To partially address this limitation, we analyzed anatomically defined thalamic subregions; however, we acknowledge that finer-scale subdivisions and cell-type–specific processing likely exist that are not currently resolvable in human fMRI. Such distinctions may be better investigated using invasive recordings or circuit-level approaches in rodents or non-human primates. This limitation has now been explicitly acknowledged in the Limitations section of the manuscript (page 36):

      “Pulvinar divisions, MD, and LGN each contain diverse neuron subtypes and finer anatomical subdivisions that may serve distinct functions. Importantly, the absence of CS+ vs. CS− differences in BOLD activity should not be interpreted as a lack of stimulus-specific processing, as such distinctions may occur without changes in overall activation detectable by fMRI…”

      (7) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall, most of the claims made in the discussion are better considered possible interpretations rather than proven findings - this is not a criticism, as these experiments and subject matter are extremely complex.

      We thank the reviewer for this constructive suggestion. In response, we have revised the discussion to present our interpretations as possible or plausible explanations, rather than definitive conclusions, to better reflect the strength of the current evidence. The changes are marked in green throughout the Discussion section.

      This study continues to validate the power and utility of this in human fear conditioning/extinction paradigm, and extends this paradigm to investigating fear learning beyond the traditional limbic system pathways. It's possible that their models for the pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

      Reviewer #3 (Public review):

      Summary:

      The present work was aimed at investigating the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, they examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to uncover whether distinct thalamic systems support different modes of learning - automatic survival mechanisms versus more deliberate processes - and to propose a hierarchical pulvinar model of fear conditioning. They also try to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in it.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      (a) fMRI data have low resolution (3 cubic mm), which certainly limits the examination of small nuclei such as the ones investigated here, and especially the examination of the LGN and inferior pulvinar.

      We thank the reviewer for raising this point. While the spatial resolution of fMRI (3 mm isotropic) does limit voxel-wise examination of very small nuclei, our analyses were not performed at the single-voxel level. Instead, signals were extracted using anatomically defined masks for each thalamic nucleus, which is a standard and widely used approach for studying small subcortical structures with fMRI. This strategy increases signal-to-noise ratio and mitigates partial-volume effects by aggregating activity across voxels belonging to the same anatomical region.

      (b) fMRI was normalized to standard space. Analyzing the data in individual-subject space would have given you the options of avoiding altering every participant's brain and of using a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      We thank the reviewer for pointing out the availability of specialized thalamic atlases. In our study we used the Automated Anatomical Labelling Atlas 3 (AAL3 atlas), which includes thalamic subdivisions (including pulvinar and other nuclei) among its 150+ whole-brain regions and is widely used for ROI extraction in normalized fMRI analyses. This choice allowed us to define consistent ROIs across the entire brain such as the amygdala and hippocampus within the same parcellation framework and to extract functional signals at the resolution of our preprocessed fMRI data.

      While histology-informed probabilistic atlases offer finer microanatomical segmentation of the thalamus, they are implemented primarily for structural segmentation pipelines (e.g., FreeSurfer) and do not change the fact that AAL3’s thalamic subdivisions are established and anatomically reasonable ROIs for functional studies at standard fMRI resolutions. AAL3 thus provides a practical and valid choice for our whole-brain activation and connectivity analyses.

      (c) On top of the two previous points, the authors decided to smooth the data to 6mm, which means that every single voxel within these small nuclei was blurred/mixed with the 2 immediately contiguous voxels (if they followed the standard SPM12 normalization resampling default which resamples, or upsamples the data in this case, to 2 x 2 x 2mm). Given the strong changes in structural connectivity and function that can occur, especially in the thalamus, on voxels of this size, this and the previous 2 decisions do not favor anatomical precision.

      We thank the reviewer for raising this concern regarding anatomical precision. The data were resampled to 2 × 2 × 2 mm resolution in SPM12, and a 6 mm FWHM Gaussian smoothing kernel was applied. Gaussian smoothing does not uniformly mix immediately adjacent voxels; rather, it applies distance-weighted averaging with a standard deviation of approximately 2.55 mm (FWHM = 2.355σ). At 2 mm resolution, this corresponds to ~1.3 voxels, meaning that signal contribution decreases smoothly with spatial distance rather than reflecting simple voxel averaging. Moreover, all statistical analyses were conducted at the ROI level using anatomically defined masks, rather than voxel-wise inference within nuclei.
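      The conversion between FWHM and σ quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch for illustration only, not part of our analysis pipeline; the function names are hypothetical, and the 6 mm kernel and 2 mm voxel size are the values stated above.

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel's FWHM to its standard deviation.

    Uses FWHM = 2 * sqrt(2 * ln 2) * sigma, i.e. roughly 2.355 * sigma.
    """
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance_mm: float, sigma_mm: float) -> float:
    """Weight of a point at the given distance, relative to the kernel centre (peak = 1)."""
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma_mm ** 2))


# Values taken from the text: 6 mm FWHM kernel, 2 mm isotropic voxels.
sigma_mm = fwhm_to_sigma(6.0)
sigma_vox = sigma_mm / 2.0

print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels")
for distance_mm in (2.0, 4.0, 6.0):
    weight = gaussian_weight(distance_mm, sigma_mm)
    print(f"relative weight at {distance_mm:.0f} mm: {weight:.3f}")
```

      By definition of the FWHM, the relative weight falls to one half at 3 mm from the kernel centre, so the contribution of neighbouring voxels declines gradually with distance.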

      To empirically assess whether smoothing may have introduced boundary-driven spillover effects, we divided the mediodorsal (MD) thalamus into medial and lateral divisions and examined the CS effect separately in each. The CS effect did not differ between subdivisions (MD subdivision × CS interaction: F(1, 292) = 0.50, p = 0.48).

      Additionally, across trials, the CS+ vs. CS− effect was observed in both subdivisions and showed comparable magnitudes (see Author response image 1). The effect sizes were also comparable across MD divisions, as presented in Author response table 1.

      Author response image 1.

      Mean activation in MD subdivisions during threat learning

      Author response table 1.

      Point estimates and 95% confidence intervals of effect sizes (Cohen’s d) for CS+ vs. CS− contrasts in MD, MDm, and MDl during early threat learning

      If smoothing had artificially driven the MD effect via boundary spillover, one would expect consistent asymmetry or substantially larger effects in one subdivision relative to the other. Instead, the CS effect was distributed across both medial and lateral MD, supporting the interpretation that the observed activation reflects intrinsic MD signal rather than smoothing-related contamination.

      (d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

      Our analyses are within-subject, so each participant serves as their own control, minimizing the impact of motion differences across conditions. Functional data were preprocessed with fMRIPrep 20.0.2, which estimates motion parameters. These motion estimates were included as regressors in the SPM12 GLM to account for residual motion-related variance. The connectivity analyses were conducted in CONN, which also includes these motion parameters as regressors and applies additional denoising steps to further reduce motion-related effects. Together, these procedures make it highly unlikely that motion systematically influenced the observed condition differences.

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      We thank the reviewer for this important comment. We have now explicitly reported the number of participants and trials contributing to each analysis throughout the manuscript, including the main text, figure captions, and supplementary materials.

      Specifically, under Materials and Methods (page 38), we now clarify the sample sizes for each learning phase:

      “We analyzed fMRI data from 293 participants during fear conditioning, 320 during extinction, 412 during extinction recall, and 312 during threat renewal.”

      In addition, all figure captions now report the corresponding sample sizes and trial numbers. For example, the caption to Figure 1 (pages 7–8) states:

      “…Block-level comparisons were assessed using paired t-tests, while trial-level effects were examined using a 2 × 2 repeated-measures ANOVA, followed by post hoc comparisons between CS+ and CS− across four trials. Multiple comparisons were controlled using false discovery rate (FDR) correction. Conditioning sample size: n = 293. Detailed statistical parameters are provided in Supplementary Tables 1–2.”

      (3) It is not clear either, why, given the large sample size, some of the results were not conducted using reproducibility strategies such as dividing the sample into 2 or 3 groups or using further cross-validation strategies.

      Cross-validation strategies were applied to the mediation analyses, which are regression-based and can be sensitive to extreme values or overfitting, to ensure that the observed effects generalize beyond the sample. In contrast, the repeated-measures ANOVA tests within-subject condition differences and is inherently robust to between-subject variability. For these inferential tests, cross-validation or sample-splitting is not typically applied.

      However, following the reviewer’s recommendation, we conducted a cross-validation analysis focusing on the anterior pulvinar and the mediodorsal thalamus, the primary regions of interest in this study. The full sample (N = 293) was randomly divided into three subsamples (n1 = 106, n2 = 91, n3 = 96). For each iteration, we conducted a repeated-measures ANOVA (RM-ANOVA) within one subsample and then examined the stability of the CS+ vs. CS− difference in the remaining two subsamples combined. The CS+ vs. CS− difference was statistically significant in most folds for both the mediodorsal thalamus and the anterior pulvinar. Importantly, effect sizes were comparable across folds within each nucleus, indicating stable estimates of the CS effect.

      Finally, we observed a comparable pattern of CS+ vs. CS− differences at the trial level in both the mediodorsal thalamus and the anterior pulvinar. Critically, the effect sizes of these differences were stable across most cross-validation folds.
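      The logic of this split-sample check can be sketched as follows. This is an illustrative simplification on synthetic data, not the analysis we ran: it assumes near-equal fold sizes (our subsamples were 106, 91, and 96), replaces the RM-ANOVA with a paired effect size (Cohen's d computed on the difference scores), and all function names and simulated values are hypothetical.

```python
import random
import statistics


def paired_cohens_d(cs_plus, cs_minus):
    """Cohen's d for paired data: mean difference divided by the SD of the differences."""
    diffs = [p - m for p, m in zip(cs_plus, cs_minus)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def split_sample_effects(cs_plus, cs_minus, n_folds=3, seed=0):
    """Randomly split participants into folds; for each fold, return the effect
    size within that fold and within the remaining folds combined."""
    rng = random.Random(seed)
    order = list(range(len(cs_plus)))
    rng.shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]  # near-equal fold sizes

    effects = []
    for k, fold in enumerate(folds):
        rest = [i for j, other in enumerate(folds) if j != k for i in other]
        d_fold = paired_cohens_d([cs_plus[i] for i in fold], [cs_minus[i] for i in fold])
        d_rest = paired_cohens_d([cs_plus[i] for i in rest], [cs_minus[i] for i in rest])
        effects.append((d_fold, d_rest))
    return effects


# Synthetic stand-in for per-participant activation estimates (N = 293).
data_rng = random.Random(42)
n_participants = 293
cs_minus = [data_rng.gauss(0.0, 1.0) for _ in range(n_participants)]
cs_plus = [m + data_rng.gauss(0.5, 1.0) for m in cs_minus]  # simulated CS+ > CS- effect

effects = split_sample_effects(cs_plus, cs_minus)
for k, (d_fold, d_rest) in enumerate(effects, start=1):
    print(f"fold {k}: d within fold = {d_fold:.2f}, d in remaining folds = {d_rest:.2f}")
```

      Comparable effect sizes within each fold and in the remaining folds are what we mean by stable estimates of the CS effect.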

      (4) Limited testing of alternative hypotheses. The results clearly seem to be a selection of the findings supporting the hypotheses that the authors sought to confirm (just one example: in the analysis reported in Figures 1-2, are there other correlations between the activation of the anterior pulvinar and MD with other pulvinar nuclei? Only the MD-anterior pulvinar correlation is reported).

      We thank the reviewer for raising this important point. We would like to clarify that the analyses were not limited to a single, selectively reported association. The relationship between the MD and the anterior pulvinar was evaluated while explicitly accounting for other pulvinar subdivisions, as well as for thalamic input outside the pulvinar.

      Specifically, potential contributions from other pulvinar nuclei were controlled by including them in the regression model (Fig. 2 in the manuscript), and the LGN was included as an additional control region. These analyses therefore test whether the MD–anterior pulvinar association is specific, rather than reflecting a more general thalamic or pulvinar-wide effect. With respect to hypothesis testing, the study was explicitly hypothesis-driven, grounded in functional evidence motivating a specific prediction about MD–anterior pulvinar interactions.

      Still, in response to the reviewer’s suggestion, we further examined pairwise relationships among thalamic subregions. Specifically, we assessed the association between the MD and each pulvinar subdivision using partial correlations, controlling for the remaining pulvinar subdivisions in each analysis. For example, the partial correlation between the MD and the lateral pulvinar was computed while controlling for the activation of the anterior, inferior, and medial pulvinar subdivisions.

      The partial correlation between the MD and the anterior pulvinar was consistent across all four trials of threat learning, whereas the other pulvinar subdivisions did not exhibit a consistent pattern. To evaluate the robustness of these effects, we applied a bootstrap procedure (10,000 resamples) to estimate 95% confidence intervals for each partial correlation. As presented in Figure 4b, only the anterior pulvinar–MD association remained robust, with confidence intervals that did not include zero. In contrast, the confidence intervals for most other pulvinar subdivisions included zero, indicating non-robust associations.
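      For readers who wish to reproduce this kind of check, the procedure can be sketched as below. This is a minimal illustration on synthetic data, not our analysis code: it assumes a percentile bootstrap and a residual-based partial correlation, uses 2,000 resamples to keep the example fast (we used 10,000), and all variable and function names are hypothetical.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Partial correlation of x and y after regressing both on the covariates."""
    design = np.column_stack([np.ones(len(x)), covariates])  # intercept + covariates
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def bootstrap_ci(x, y, covariates, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval, resampling participants with replacement."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        estimates[b] = partial_corr(x[idx], y[idx], covariates[idx])
    low, high = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(low), float(high)


# Synthetic stand-in data (N = 293): three "other" pulvinar subdivisions as covariates.
rng = np.random.default_rng(1)
n = 293
others = rng.normal(size=(n, 3))
anterior = others @ np.array([0.4, 0.3, 0.2]) + rng.normal(size=n)
md = 0.6 * anterior + rng.normal(size=n)  # simulated MD-anterior pulvinar association

r_obs = partial_corr(md, anterior, others)
ci_low, ci_high = bootstrap_ci(md, anterior, others, n_boot=2_000)
print(f"partial r = {r_obs:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

      An association is treated as robust in this sense when the resulting interval excludes zero.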

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is not an exception. Better to tell the limitations to the readers upfront so they can factor them into their evaluation of the relevance of the manuscript and reported evidence.

      We thank the reviewer for this constructive suggestion. While the original manuscript already discussed key limitations in the Discussion section (page 36; e.g., “Although distinct thalamic roles in threat learning have been proposed, fMRI data do not fully capture the complexity of this structure…”), we agree that these considerations would benefit from clearer organization and visibility.

      To address this point directly, we have now added a dedicated “Limitations and Future Directions” subsection to the manuscript. This subsection explicitly summarizes the principal limitations of the study—including methodological constraints of fMRI and anatomical resolution—and outlines specific avenues for future research to address them. This change makes the limitations more transparent and allows readers to more easily incorporate them into their evaluation of the findings.

      (6) Data should be made available to the scientific community. Code too. Even if you just used standard fMRI toolboxes, any code used to run analyses will be helpful to the community, or if someone decides to try to replicate your findings.

      We thank the reviewer for this important suggestion and fully agree with the value of data and code sharing for transparency and reproducibility.

      The data supporting the findings of this study are derived from a larger, actively used database that is currently involved in ongoing projects. For this reason, the full dataset cannot yet be publicly released. However, the data underlying the reported analyses are available upon reasonable request from the corresponding author, subject to standard data-use agreements.

      To facilitate reproducibility, all analysis scripts and pipelines used in this study, including preprocessing and analysis workflows implemented in SPM12 and CONN, are available upon request and can be shared with researchers seeking to replicate or extend the reported findings.

      We have clarified this data and code availability statement in the manuscript (page 46).

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field to start characterizing and conceptualizing the involvement of thalamic nuclei and their interactions with other brain regions in the associative threat learning circuitries. It also paves the road for further testing of the functional dynamics among these regions and circuitries, and modeling testing.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editors for this important note. Full statistical reporting, including test statistics, degrees of freedom, exact (raw and corrected) p-values, effect sizes, and 95% confidence intervals, is provided for all key analyses in Supplementary Tables 1–9. In addition, uncertainty estimates and major statistical tests are now explicitly reported throughout the main text, as recommended by the reviewers, irrespective of statistical significance.

      During this revision process, we conducted a comprehensive internal consistency check of all reported statistics and figure annotations. We identified and corrected minor discrepancies between some statistical annotations in the figures and the corresponding results reported in the Supplementary Tables. All figures have now been updated to ensure full consistency with the reported analyses. These corrections do not alter the results or conclusions of the study.

      Reviewer #1 (Recommendations for the authors):

      (1) What is the significance of using two different head coils? Were the data comparable from each coil? How did the authors determine this?

      We thank the reviewer for this important question. Data were acquired using two different receiver head coils across participants. Receiver coils primarily influence signal-to-noise ratio (SNR) and spatial sensitivity profiles, rather than the physiological basis of the BOLD response itself (Triantafyllou et al., 2011).

      Importantly, all analyses were based on within-subject contrasts (CS+ vs. CS−), which are robust to global signal scaling differences that may arise from coil sensitivity variations. In addition, standard preprocessing procedures—including intensity normalization, spatial normalization, and nuisance regression—further minimized potential coil-related variability.

      To empirically evaluate whether acquisition differences influenced our results, we conducted a repeated-measures ANOVA testing the Trial × CS × Site interaction (where Site reflects acquisition location and associated scanning setup, including receiver coil configuration) during fear conditioning (N = 293). As shown in Author response table 2, none of the thalamic nuclei demonstrated a significant interaction effect, and all effect sizes were negligible (partial η² ≤ .01).

      Author response table 2.

      Repeated-measures ANOVA results for the Trial × CS × Site interaction across all relevant thalamic nuclei during fear conditioning.

      (2) Why were the data smoothed? This could have a negative impact on the specificity of the signals averaged within the pre-defined thalamic ROIs.

      Spatial smoothing was applied to improve signal-to-noise ratio and statistical stability in small, deep thalamic subregions, which are particularly susceptible to noise. We acknowledge that smoothing can reduce spatial specificity. However, our analyses were based on anatomically predefined thalamic ROIs and focused on average activation within each region rather than voxel-wise localization. Under this approach, modest smoothing (i.e., a 6-mm full-width at half-maximum kernel, rather than the commonly used 8-mm kernel) primarily increases reliability. Any signal mixing across adjacent regions would be expected to reduce regional specificity and bias effects toward the null, rather than produce spurious differences.

      Additionally, we conducted robustness analyses to examine whether spatial smoothing artificially influenced our results. Specifically, we subdivided the mediodorsal thalamus into medial and lateral anatomical regions and compared activation across these subregions. The activation patterns were comparable across both subdivisions, indicating that the observed mediodorsal thalamus effect is unlikely to reflect boundary spillover resulting from smoothing. If smoothing had driven the effect, we would expect differential signal patterns across the subdivisions rather than comparable activation. (See full response to Weakness C, Reviewer 3, as well as Author response image 1 and Author response table 1 in our response).

      (3) Did the authors consider using any null models to determine whether the observed PPI results could have been observed by chance? E.g., block-resampling nulls scramble temporal order while preserving temporal autocorrelation, and can determine whether subtle differences in autocorrelation across regions can give rise to the observed signatures.

      We thank the reviewer for this thoughtful suggestion. All PPI analyses were conducted using the default CONN toolbox pipeline. In this framework, PPI effects are estimated within a GLM at the first level following standard denoising procedures that reduce motion- and physiology-related variance and apply temporal filtering. Importantly, PPI effects are modeled as subject-level contrast terms rather than computed from raw time-series correlations.

      Group-level inference was performed on these subject-level contrast estimates using paired t-tests with FDR correction across regions. To further assess whether the observed effects could arise by chance, we additionally performed 10,000 bootstrap resamples of the CS+ vs. CS− differences to evaluate the stability of the effects. While we did not implement explicit block-resampling null models that preserve temporal autocorrelation, the combination of first-level GLM modeling following denoising, large sample size (N ≈ 300), and convergent inferential and resampling procedures provides a rigorous and standard assessment of PPI effects. We have revised the manuscript to clarify these procedures and their rationale.

      We added this language to directly address the reviewer’s concern and revised the connectivity analyses section to clarify the workflow (page 44):

      “Following standard denoising procedures (including regression of motion- and physiology-related confounds and temporal filtering), condition-dependent connectivity effects were inferred from subject-level generalized psychophysiological interaction (gPPI) contrast estimates rather than from raw time-series correlations. This GLM-based framework reduces the likelihood that observed PPI effects reflect differences in temporal autocorrelation or spectral properties across regions rather than genuine task-dependent interactions.”

      (4) The authors may wish to report results in text, as there are currently many demonstrative statements that are not associated with requisite uncertainty estimates, making inference challenging.

      We thank the reviewer for this helpful suggestion. We have revised the Results section to explicitly report statistical outcomes in the main text for all key findings, including appropriate uncertainty estimates (e.g., test statistics, effect sizes, and p-values) alongside demonstrative statements. This ensures that all inferences in the text are directly supported by quantitative evidence.

      Additionally, the full statistical details, including test statistics, degrees of freedom, effect sizes, 95% confidence intervals, and both raw and FDR-corrected p-values, are provided in Supplementary Tables 1–9. These changes improve clarity and transparency while avoiding redundancy. Newly added text in the Results section is highlighted in green.

      (5) I could not find any information about the EBICglasso model in the Methods section, nor information about how the centrality measures were estimated. Given the lack of transparency, I recommend down-weighting the often overly-strong language regarding the conclusions of this analysis.

      We have added these details, along with other methodological details, to the Statistical tests section (pages 42-44):

      “Statistical tests

      All statistical tests were conducted using JASP versions 0.18.3 and 0.19.3 (JASP Team, 2024).

      Activation Differences across all phases of threat learning

      In each threat learning phase, we used paired t-tests to examine the differences in activation of the thalamic nuclei in response to CS+ vs. CS- at the block level (average activation across trials), and 2x2 RM-ANOVA to estimate the differences in activation at the trial-wise level. Assumptions of sphericity were checked, and Greenhouse-Geisser corrections were applied where necessary. This model was followed by post hoc tests to estimate the differences at the trial level, and false discovery rate (FDR) correction was applied for each question.

      Network analyses of the within pulvinar relationships during conditioning

      The network analyses examined functional relationships between pulvinar divisions. Nodes corresponded to block-level activation estimates of the CS+ minus CS− contrast for each pulvinar division, yielding four nodes (one per division). Networks were estimated using a Gaussian graphical model with EBICglasso (LASSO regularization) based on Pearson correlation matrices, with the EBIC tuning parameter set to γ = 0.5. Edge weights represent partial correlations.

      Three centrality measures were computed on the estimated weighted partial-correlation network: node strength, defined as the sum of the absolute edge weights directly connected to a node; closeness, defined as the inverse of the average shortest path length from a node to all other nodes; and betweenness, defined as the proportion of shortest paths between all pairs of nodes that pass through a given node. Shortest paths were computed using inverse edge weights, consistent with standard practice for weighted networks. Centrality indices were normalized.

      Network accuracy and centrality stability were assessed using nonparametric bootstrapping (10,000 iterations) to estimate confidence intervals for edge weights and centrality measures. All analyses were conducted in JASP (versions 0.18.3 and 0.19.3) using default settings unless otherwise specified, following the procedures described in Epskamp, Borsboom, and Fried (2018).

      Mediation analyses of within pulvinar relationships during conditioning

      Mediation models of the relationships between the activations in pulvinar divisions were estimated using the lavaan package (Rosseel, 2012) with maximum likelihood estimation. All variables were z-standardized prior to analysis. Block-level activation estimates from the inferior and lateral pulvinar were entered as predictors, activation in the medial pulvinar was specified as the mediator, and activation in the anterior pulvinar was specified as the outcome variable.

      To assess the robustness and generalizability of the mediation effects, we conducted 3-fold cross-validation. The full sample (N = 293) was randomly partitioned into three non-overlapping sub-samples (n = 91, 96, and 106). In each iteration, the mediation model was estimated in one sub-sample, while the remaining sub-samples were used to assess the stability of parameter estimates and indirect effects. This procedure resulted in six cross-validation iterations, allowing evaluation of whether the direction and magnitude of the indirect effect were consistent across independent subsets of the data. Mediation models were estimated using the lavaan package (Rosseel, 2012) with maximum likelihood estimation. Indirect effects were evaluated using bias-corrected percentile bootstrap confidence intervals based on 10,000 resamples, as recommended by Biesanz, Falk, and Savalei (2010). An indirect effect was considered significant when the 95% confidence interval did not include zero (p < 0.05).”
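
      The three centrality indices defined in the quoted Statistical tests text (strength, closeness, and betweenness, with shortest paths based on inverse edge weights) can be illustrated on a four-node network. This is a minimal sketch, not the JASP implementation: the edge weights are hypothetical, the brute-force path search is only practical for very small networks, and dividing betweenness by the number of node pairs is an assumed normalization.

```python
from itertools import permutations


def centrality(nodes, weights):
    """Strength, closeness, and betweenness for a small weighted network.

    weights maps frozenset({a, b}) to an edge weight (e.g., a partial correlation).
    """
    def dist(a, b):
        # Distance is the inverse of the absolute edge weight; missing edge = infinity.
        w = abs(weights.get(frozenset((a, b)), 0.0))
        return 1.0 / w if w > 0 else float("inf")

    def shortest(s, t):
        # Enumerate every simple path from s to t (fine for a handful of nodes).
        others = [n for n in nodes if n not in (s, t)]
        best, paths = float("inf"), []
        for k in range(len(others) + 1):
            for mid in permutations(others, k):
                path = (s, *mid, t)
                length = sum(dist(a, b) for a, b in zip(path, path[1:]))
                if length < best - 1e-12:
                    best, paths = length, [path]
                elif abs(length - best) <= 1e-12:
                    paths.append(path)
        return best, paths

    # Strength: sum of absolute weights of edges attached to the node.
    strength = {n: sum(abs(w) for e, w in weights.items() if n in e) for n in nodes}

    # Closeness: inverse of the average shortest-path length to all other nodes.
    closeness = {}
    for s in nodes:
        total = sum(shortest(s, t)[0] for t in nodes if t != s)
        closeness[s] = (len(nodes) - 1) / total

    # Betweenness: share of shortest paths between other node pairs passing through n.
    between = {n: 0.0 for n in nodes}
    pairs = [(s, t) for i, s in enumerate(nodes) for t in nodes[i + 1:]]
    for s, t in pairs:
        _, paths = shortest(s, t)
        for n in nodes:
            if n not in (s, t) and paths:
                between[n] += sum(n in p for p in paths) / len(paths)
    n_other_pairs = (len(nodes) - 1) * (len(nodes) - 2) / 2
    between = {n: b / n_other_pairs for n, b in between.items()}
    return strength, closeness, between


# Hypothetical edge weights among the four pulvinar divisions (not study results).
nodes = ["anterior", "medial", "lateral", "inferior"]
weights = {
    frozenset(("anterior", "medial")): 0.5,
    frozenset(("medial", "lateral")): 0.5,
    frozenset(("lateral", "inferior")): 0.25,
}
strength, closeness, betweenness = centrality(nodes, weights)
for n in nodes:
    print(n, strength[n], round(closeness[n], 3), round(betweenness[n], 3))
```

      In this toy chain, the medial node has the highest strength and lies on two of the three shortest paths between the other nodes, which is the kind of pattern the centrality indices are meant to capture.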

      (6) Bar plots are not effective ways to report group-level data. I recommend replacing all bar plots with visualisations that expose the distribution of the data, such as a violin plot or a raincloud plot.

      We thank the reviewer for this suggestion. In general, we agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across conditions, rather than the distribution of individual data points per se. For this purpose, bar plots provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      (7) The thought bubbles are atypical of scientific figures.

      The figure has been revised to remove the thought bubbles.

      (8) Figure 7 - there are many connections not shown in this figure, suggesting that it is sufficiently oversimplified as to be potentially misleading. For instance, the authors offer no anatomical connections between pulvinar and the cortical hierarchy; however, these connections are ample and (likely) highly important for the functionality assessed here. Similarly, there is no room in the figure for the integration of the shock stimuli (presumably via the spinothalamic tract) and the visual stimuli (via the retina/LGN).

      We agree that the pulvinar has extensive cortical and sensory input/output connections that are not depicted in Figure 7. Our intention was not to provide a complete anatomical wiring diagram, but rather a simplified functional model derived from observed statistical dependencies. We have revised the figure and added an explicit note to the legend clarifying that pulvinar–cortical and sensory pathways (e.g., retina/LGN and spinothalamic inputs) are intentionally omitted due to incomplete subnuclear-level anatomical characterization, and that their omission should not be interpreted as a lack of importance. We added this to Figure 7 legend:

      “Note (panel a):

      Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      Reviewer #2 (Recommendations for the authors):

      (1) It's somewhat confusing that Figures 1,4,5 D and E are not in the text until later in the results section. Perhaps these should be presented in the figures in the same order they are discussed in the text, although this is a stylistic issue.

      We thank the reviewer for this comment. To improve clarity and align the figures with the structure of the Results section, we reorganized the figures. Specifically, we added a new figure (Figure 7) that consolidates all connectivity analyses. Figures 1, 4, and 5 now focus exclusively on activation results, while Figure 7 presents connectivity results only. This reorganization allows the figures to follow the flow of the text more closely and makes the narrative of each figure clearer.

      (2) Stylistic: I would strongly recommend adding n numbers and describing the basics of statistical tests used and how multiple comparisons were accounted for in the legend for Figures 1,4, and 5.

      We thank the reviewer for this recommendation. We have added the sample sizes (n) and brief descriptions of the statistical tests used, including how multiple comparisons were handled, to the legends of Figures 1, 4, and 5. In addition, we direct the reader to the Supplementary Tables, which were submitted with the original manuscript and provide full statistical details, including test statistics (t, F), degrees of freedom, effect sizes, 95% confidence intervals, raw p values, and corrected p values. Finally, we further elaborated on the statistical tests on pages 42–44, as detailed in our response to Recommendation 5 (Reviewer 1).

      Reviewer #3 (Recommendations for the authors):

      As previously indicated, please note that no information is included in the manuscript about data and code availability. Although you mainly use toolboxes for data analyses, any script(s) that you have used to run things would be great to upload for reproducibility purposes.

      Also, it would be good to include a limitations subsection in the manuscript.

      Thank you for these recommendations. We added a limitations subsection to the manuscript. See our responses under Comments 5 and 6 (Reviewer 3, Public Review).

      In terms of data analyses:

      (1) It would be ideal if you quantify in-scanner motion for the different conditions to see if there were no differences in motion due to the task.

      Head motion was estimated at each time point as part of standard preprocessing, and motion parameters were included as nuisance regressors in all first-level models. Because motion estimates are defined per volume rather than per experimental condition, condition-specific motion metrics were not explicitly computed. Importantly, this approach removes motion-related variance uniformly across the time series and therefore controls for potential motion effects across all task conditions. Any residual motion would be expected to increase noise rather than systematically bias condition contrasts.
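
      For illustration only, a condition-wise motion summary of the kind the reviewer describes could be derived from the per-volume estimates. The sketch below assumes six rigid-body parameters per volume (translations in mm, rotations in radians), a framewise displacement measure with rotations projected onto a 50 mm sphere, and a hypothetical per-volume condition label; none of these choices are taken from the manuscript.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per volume from six rigid-body parameters.

    motion: list of [tx, ty, tz, rx, ry, rz]; translations in mm, rotations in radians.
    """
    fd = [0.0]  # the first volume has no predecessor
    for prev, cur in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        # Convert rotational change to arc length on a sphere of the given radius.
        rot = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], cur[3:]))
        fd.append(trans + rot)
    return fd


def mean_fd_by_condition(fd, labels):
    """Average framewise displacement over the volumes assigned to each condition."""
    sums, counts = {}, {}
    for value, label in zip(fd, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# Hypothetical motion parameters and condition labels for four volumes (assumption).
motion = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.001, 0.0, 0.0],
]
labels = ["CS+", "CS+", "CS-", "CS-"]
fd = framewise_displacement(motion)
mean_fd = mean_fd_by_condition(fd, labels)
print(mean_fd)
```

      The resulting per-condition means could then be compared within subjects, for example with a paired test, to check whether motion differed between conditions.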

      (2) You also may want to indicate if normalization followed the SPM 12 default and the data was resampled to 2 x 2 x 2 mm, or kept the same. It is not stated in the data preprocessing subsection of the methods.

      We thank the reviewer for this suggestion. We have now clarified this point in the manuscript (page 41):

      “In addition, spatial normalization was performed with data normalized to Montreal Neurological Institute (MNI) space and resampled to a 2 × 2 × 2 mm³ voxel grid, followed by spatial smoothing with a 6-mm full-width at half-maximum Gaussian kernel.”

      (3) It is important to indicate how many subjects went into each analysis. Also, it is not clear, based on the current methods section, how many observations per condition were used. That can be reported in the text or the figures.

      We thank the reviewer for this comment. This information has now been added to the Methods section and the relevant figure legends, as described in our response to Comment 2 (Reviewer 3, Public Review).

      References

      Triantafyllou C, Polimeni JR, Wald LL. 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55:597–606. DOI: https://doi.org/10.1016/j.neuroimage.2010.11.084, PMID: 21167946

    1. Author response:

      eLife Assessment

      This manuscript reports an important study in which the authors apply smFRET imaging to probe HIV-1 Env conformational dynamics in the presence of antibodies. Previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Through the cutting-edge application of smFRET imaging, the study provides convincing insights into the mechanisms of action of relevant antibodies.

      We appreciate this positive assessment and thank the reviewers for their time and constructive comments. We will make the following changes in the revised manuscript.

      (1) Clarify the distinction between suppression efficiency and functional cost.

      (2) Add controls: smFRET experiments in the presence of monovalent 10E8.4 and iMab individually and compare results with the bivalent 10E8.4/iMab that we currently have.

      (3) Increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling.

      (4) Add discussion on conformational populations probed by smFRET versus structural analyses, Env conformational heterogeneity, ligand effects, and how these approaches complement each other.

      (5) Further clarify the assignments of multiple conformational states by smFRET, the heterogeneity of Env spikes and virion morphology by cryoET, and the focus of the current smFRET-focused storyline.

      Please find below our provisional responses to the public reviews. We will provide detailed point-by-point responses upon submission of the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have considered a panel of antibodies that target epitopes at the gp120/gp41 interface (8ANC195 and PGT151), the fusion peptide in the gp41 domain (VRC34), and the MPER region of gp41 (DH511.2_K3 and VRC42). They also investigate 10E8.4/iMab, which is an engineered bispecific antibody that targets the MPER and the CD4 receptor. On a technical note, they have applied a double amber codon-readthrough strategy to incorporate the non-natural TCO*A amino acid, which gets labeled through click chemistry. This approach should result in less disruption of the native Env structure as compared to the peptide insertion previously used for smFRET imaging of Env. Furthermore, previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Altogether, through the cutting-edge application of smFRET imaging, the study provides novel insights into the mechanisms of action of interesting and clinically relevant antibodies.

      Thank you for the positive comments!

      In validating the functionality of the S401TAG/R542TAG Env, the authors performed infectivity assays and observed 20% infectivity as compared to wild-type (Figure S2A). However, the text equates this with "20% dual-amber suppression efficiency". This would benefit from some explanation. Why do the authors interpret infectivity as reporting on amber suppression efficiency, and not the functional cost of modifying Env, which is probably unavoidable? Or a combination of both? Is there data to suggest that 100% amber suppression would leave Env 100% functional? If so, this would be valuable to show. If not, the text should be clarified.

      We acknowledge this concern and will clarify the distinction between suppression efficiency and functional cost in the revision. The observed reduction in infectivity does not translate into functional loss; instead, it primarily reflects the efficiency of suppression (one of the critical limitations of applying genetic code expansion in mammalian cells), as evidenced by reduced Env expression and incorporation on virions (Fig. 1B). In support of the preservation of Env functionality, tag-free and dual-ncAA-incorporated Env virions exhibited similar dose-dependent neutralization sensitivity against trimer-specific neutralizing antibodies (Fig. 1D). We have previously discussed several limitations of amber suppression in mammalian cells combined with smFRET viral systems (PMID: 38232732; PMID: 40716060). In brief, orthogonal tRNA/aaRS pair–mediated amber suppression (reassigning/repurposing amber stop codons to non-canonical amino acids) of the introduced ambers in the target protein (Env in our case) must compete with the cellular translation system, particularly release factors that recognize amber codons and terminate translation. Readthrough of endogenous amber codons in virus-producing cells (in our case, HEK293T) can disrupt normal protein expression and virus production. Similarly, readthrough of preexisting amber codons in HIV-1 ORFs other than the targeted ambers in Env can disrupt virus assembly, which we addressed by generating an amber-free provirus (PMID: 38232732). Introducing two amber codons into Env further reduces efficiency, as dual suppression requires two sequential successful suppression events within the same Env molecule.
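
      The last point can be made concrete with simple arithmetic. The sketch below assumes that the two suppression events are independent and uses hypothetical per-site efficiencies; it is not a measurement from this study, and real readthrough at the two sites need not be independent or equal.

```python
import math


def dual_suppression_yield(p_site1, p_site2):
    """Fraction of Env molecules with both ambers read through, assuming independence."""
    return p_site1 * p_site2


def per_site_efficiency(dual_yield):
    """Per-site efficiency implied by a dual yield if both sites are equally efficient."""
    return math.sqrt(dual_yield)


# Hypothetical per-site efficiencies (assumption, not measured values).
single = 0.5
dual = dual_suppression_yield(single, single)
print(f"single-site efficiency {single:.2f} -> dual-site yield {dual:.2f}")
print(f"dual yield {dual:.2f} -> implied per-site efficiency {per_site_efficiency(dual):.2f}")
```

      Under these assumptions, even a fairly efficient single-site readthrough yields a markedly smaller fraction of full-length, dually modified Env.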

      The authors state that the contour plots in Figure 2E reveal "dynamic sampling" of the observed FRET states. Strictly speaking, as presented, the contour plots (and FRET histograms) provide no information on dynamics per se. They indicate only the relative thermodynamic stabilities of the FRET states; transitions between states are a matter of interpretation. The TDPs, shown later in Figure 5A, nicely display the dynamics. More importantly, interpretation of the contour plots is challenging, as some seem to suggest an evolution toward lower FRET states. This is especially evident in Figures 2F and 3D, which suggest that the system evolves into a stable 0.1-FRET state (CO) after about 3 sec. Unless the authors want to conclude something from this, I would suggest that they consider removing the contour plots, since their interpretations are fully supported by the FRET histograms alone.

      We agree and will remove the contour plots, as they do not add meaningful information beyond what the histograms show.

      The data indicating that Env conformation is manipulated by 10E8.4/iMab is interesting. If I understand correctly, 10E8.4/iMab is an engineered antibody with one Fab targeting MPER and the second Fab targeting CD4. In the absence of CD4, could the difference between 10E8.4/iMab and the other MPER antibodies be due to 10E8.4/iMab being monovalent with respect to MPER binding?

      We appreciate this question. To answer this, we will perform smFRET experiments in the presence of 10E8.4 and iMab individually and compare those with the bivalent 10E8.4/iMab.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Xu and co-workers unveil two distinct modes of neutralisation by gp41-targeted broadly neutralizing antibodies on HIV-1 Env. So far, it was unclear as to how the mechanism of neutralisation occurred for this subset of neutralising antibodies (that can target the fusion peptide or the membrane proximal external region of the gp41 subunit). Thanks to single-molecule FRET, the authors show that the majority of broadly neutralizing antibodies stabilize the closed Env conformation (named State 1 since the original work by Munro and colleagues PMID: 25298114). Interestingly, the bivalent 10E8.4/iMab stabilized in turn a CD4-bound open state of Env. The two modes of neutralization described for these antibodies show previously unknown allosteric mechanisms that stabilize closed and open Env conformation, stressing the importance of Env conformational dynamics and its efficiency during the process of fusion.

      Strengths:

      The article is well-written, and the figures fully depict the data in a convincing way. The authors have used smFRET, which is now established in the field as a good tool to assess Env dynamics.

      We appreciate these positive comments!

      Weaknesses:

      (1) The limited controls on how click chemistry affects Env (as labelled Env HIV virions were not evaluated).

      We agree. Our validation focused on ncAA-incorporated Env HIV-1 virions, but not the fluorescently labeled virions. To address this, we will increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling. We will attempt these experiments; however, we expect them to be technically challenging as an additional layer of functional validation in live cells, because of the additional handling time required for labeling, the centrifugation steps needed to remove free dyes (which can deform or disrupt viral membranes and degrade virions), and the low dual-amber suppression efficiency. On a related note, we have previously performed real-time tracking of single click-labeled Env virion internalization and trafficking in live cells (PMID: 38232732), supporting the retained functionality of click-chemistry-labeled Env.

      (2) Photobleaching of donor and acceptor molecules occurs right after 10 sec exposure.

      We acknowledge this limitation and will include it in the corresponding section.

      (3) Other limitations are well described in the corresponding section.

      We appreciate this comment.

    1. Author response:

      Many thanks to the three reviewers and the editors for their comments and review. These are fair, consistent (across positives and negatives), and largely expected comments. On behalf of my coauthors, I use this letter as a provisional response to indicate what we can and intend to change in a revised manuscript.

      (1) A major comment from all three referees is that our single-nucleus RNA-seq data should be validated. The reviewers differ in the detail of exactly what they think should be validated, but they refer, individually, to (1) the discovery of ‘cell types’ themselves, (2) pathways inferred from trajectory analysis, (3) differentially expressed genes in plucked vs control condition at four time points and/or (4) inferred ligand-receptor pairs from cell-cell communication analysis, across the same time course. 

      I think we’re actually on pretty good footing for 1-3, because of work we’ve published in the cichlid fish model.

      I tally that in references cited in the manuscript, and highlighted below (References 1, 10, 11, 29, 30, 31), we present 29 figures with 273 individual figure panels of histology, in situ hybridization and immunohistochemistry featuring genes expressed across stages of tooth development and replacement. These genes are markers of dental competency and regenerative potential.

      In addition, in multiple of these papers, we use pharmacology to manipulate the role of key pathways (Hh, BMP, Wnt, Notch) in cichlid tooth development and replacement. Identification and validation of cell types make use of these published data in cichlids (for markers matched to mouse), as well as an unbiased computational approach (SAMap) that draws homology between cichlid and mouse dental cell types, based on shared global patterns of gene expression.

      In short, experiments to validate cell types, gene expression and pathways active in cichlid teeth are published and referenced herein. I noticed that these references (some of which include Gareth Fraser as an author, when he was a postdoc in my group; for Reviewer 2) were cited in the Introduction and not the Rationale/Methods or Results section (such that reviewers may have missed them). We will be clearer about this in the revision. 

      We have neither validated nor functionally analyzed the ligand-receptor pairs inferred from cell-cell communication analysis across four time points of accelerated replacement. This work is beyond the scope of the current paper, and we will include a statement that these computational inferences represent hypotheses to be tested (although many of these ligand-receptor pairs have been noted in other ‘tooth’ publications that we cite).

      (2) The biggest weakness of our manuscript, noted by referees, is that we do not provide serial histology to accompany our snRNA-seq time course after plucking. We describe this as a limitation in the “Study limitations and future direction” section of the Discussion, but we can and will be stronger about why this is a weakness (e.g., we do not explicitly know the degree of damage done to tissue in the plucking paradigm). We do know that the jaw recovers quickly, but we do not know how different the plucked side is from the control side (which is also undergoing active replacement and remodeling). Uniting reviewer comments 1 and 2 here, the best future approach is a spatial transcriptomics reference at distinct stages of the plucking<>recovery paradigm, as we framed in the Discussion section, because this addresses simultaneously the state of dental/jaw tissue and the in situ expression of thousands of genes.

      (3) Reviewers asked about the presence of stromal cells in our snRNA-seq data. Because of this and another comment on the posted preprint version of our manuscript, we will take another look at the mesenchymal compartment of the snRNA-seq data and trajectories built from it.

      (4) Multiple (minor) suggestions for clarification in text and figures will be adopted. 

      Generally, I don’t think we’ll require reviewer re-engagement on the revision; editor review should be sufficient.

      References cited in the manuscript, highlighted here:

      (1) Fraser, G. J. et al. An Ancient Gene Network Is Co-opted for Teeth on Old and New Jaws. PLoS Biol. 7, e1000031 (2009).

      (10) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. Common developmental pathways link tooth shape to regeneration. Dev. Biol. 377, 399–414 (2013).

      (11) Bloomquist, R. F. et al. Developmental plasticity of epithelial stem cells in tooth and taste bud renewal. Proc. Natl. Acad. Sci. 116, 17858–17866 (2019).

      (29) Streelman, J. T., Webb, J. F., Albertson, R. C. & Kocher, T. D. The cusp of evolution and development: a model of cichlid tooth shape diversity. Evol. Dev. 5, 600–608 (2003).

      (30) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. A periodic pattern generator for dental diversity. BMC Biol. 6, 32 (2008).

      (31) Bloomquist, R. F. et al. Coevolutionary patterning of teeth and taste buds. Proc. Natl. Acad. Sci. 112, (2015).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Gao et al. describes the effect of capsaicin on the NRF2/KEAP1 pathway. The authors carried out a set of in vitro and in vivo experiments that addressed the mechanisms of the protective effect of capsaicin on ethanol-induced cytotoxicity.

      The authors conclude that capsaicin activates NRF2, which leads to the induction of cytoprotective genes, preventing oxidative damage. The paper shows that capsaicin may directly bind to KEAP1 and that it is a noncovalent modification of the Kelch domain.

      The authors also designed new albumin-coated capsaicin nanoparticles, which were tested for.

      I appreciate the authors' experimental efforts to strengthen the study's conclusions. However, in my opinion, the paper is still not fully technically sound, which weakens the strength of the evidence.

      Thank you very much for your constructive review. We are truly gratified by your recognition of our key findings: that capsaicin activates NRF2 by disrupting the KEAP1–NRF2 interaction, as conclusively demonstrated through multiple methods including Pull-down, Co-IP, CETSA, SPR, BLI, deuterium exchange MS, MS simulations and other target gene expression assays, and that albumin-coated capsaicin nanoparticles exhibit therapeutic effects in vivo. Your technical suggestions were particularly valuable. In this revised version, we have carefully and thoroughly addressed the points raised by you and the other reviewer by providing additional data, including nuclear-cytoplasmic fractionation assays performed with an alternative NRF2 antibody to strengthen and clarify the supporting evidence. We believe this revision has significantly enhanced the overall quality and rigor of the manuscript. We have also explained the limitation of the insufficient number of animals used in this study in the main text. We have made this revision with our utmost efforts, and we hope it meets your expectations.

      Reviewer #2 (Public review):

      Summary:

      In this paper the authors wanted to show that capsaicin can disrupt the interaction between Keap1 and Nrf2 by directly binding to Keap1 at an allosteric site. The resulting stabilization of Nrf2 would protect CAP-treated gastric cells from alcohol-induced redox stress and damage as well as inflammation (both in vitro and in vivo).

      Strengths:

      One major strength of the study is the use of multiple methods (CoIP, SPR, BLI, deuterium exchange MS, CETSA, MS simulations, target gene expression) that consistently show for the first time that capsaicin can disrupt the Nrf2/Keap1 interaction at an allosteric site and lead to stabilization and nuclear translocation of Nrf2.

      Moreover, efforts to show causal involvement of the Keap/Nrf2 axis for the made cellular observations as well as addressing potential off target effects of the polypharmacological CAP are appreciated.

      One point that still hampers a bit of full appreciation of the capsaicin effect in cells is that capsaicin is not investigated alone, but mostly in combination with alcohol only.

      Moreover, the true add-on value of the developed nanoparticles remains obscure.

      The partly relatively high levels of NRF2 in putatively unstressed cells question the validity of used models.

      The rationale for switching between different CAP concentrations is unclear/not entirely convincing.

      The language and introduction could be improved.

      Overall, the authors are convinced that capsaicin (although weakly) can bind to Keap1 and releases Nrf2 from degradation, with relevance for biological settings. With this, the authors provide a significant finding with marked relevance for the redox/Nrf2 as well as natural products/hit discovery communities.

      Thank you very much for your positive assessment of our work and for the constructive suggestions to make it better. We also believe that capsaicin (CAP) offers new insights into the activation of NRF2. In this revision, we have addressed the shortcomings with the following efforts:

      (1) We included a capsaicin (CAP)-only treatment group covering the same doses and time points as the ethanol co-treatment. This revealed that CAP alone can directly inhibit the KEAP1–NRF2 interaction (Figure 3d, 3e and Figure 4g) and promote the entry of NRF2 into the nucleus (Figure 2c), resulting in moderate NRF2 activation (Figure 3h and Figure 4d). However, this effect was significantly enhanced in the presence of ethanol. We attribute this result to the ROS-enriched environment generated by ethanol. Given that KEAP1 is a sensor highly susceptible to oxidative modification, the combination of CAP's allosteric regulation and ethanol-induced oxidative stress promotes a more robust and persistent dissociation of the KEAP1–NRF2 complex. These findings align fully with the established model in which KEAP1–NRF2 dissociation is markedly facilitated under oxidative stress conditions.

      (2) From a translational and industrial perspective, nanoparticle formulations offer improved palatability compared with CAP itself; based on firsthand experience, the nano formulation is more tolerable than CAP. When preparing pure CAP, the pungency often causes irritation, whereas HSA@CAP nanoparticles are milder and demonstrate superior safety in mice following oral gavage. Moreover, ELISA results indicate that HSA@CAP nanoparticles exhibit enhanced anti-inflammatory activity compared with CAP alone (Figure 8d). In light of these findings, we prefer to retain this part of the data.

      (3) Your question is highly professional and well taken. In GES-1 (Fig. 1i) and UC-MSC (Fig. 1l), the expression of NRF2 was low in unstressed conditions, and the transcription and translation of its downstream targets indicate no functional activation, supporting the validity of our model. Accordingly, the control groups in some experiments were suboptimal. We repeated these experiments with additional biological replicates and used cells with early-passage; the discrepancies likely relate to high passage numbers and serum batch effects, but they do not affect our main conclusions. We have replaced the relevant data in the revised manuscript (Fig. 2c and Fig. 3h) and hope this addresses your concern.

      (4) In GES-1 cells, 8 μM consistently yielded the optimal effect, and we therefore maintained this concentration in other experiments in the same cells. And for other experiments, we needed to co-transfect multiple plasmids. Transfection efficiency was poor in GES-1 cells, so we switched to the commonly used HEK-293T cell line. In 293T cells, 2 and 8 μM were suboptimal, so we ultimately used 32 μM (Figure 3h), consistent with other 293T experiments (Co-IP and Pull-down) that also used 32 μM. Therefore, 8 μM were insufficient in Fig. 2g as we repeated many times. This likely reflects cell line–specific differences and the experimental context in 293T cells, including transfection and overexpression of NRF2 and Ub-K48-Myc, which necessitated a relatively higher CAP concentration.

      (5) Thank you very much for noting that the language and Introduction could be further improved. We have rechecked the manuscript for grammar and style and revised the Introduction with a more comprehensive, up-to-date description of the NRF2 pathway. The main changes include rewriting the third and fourth paragraphs of the Introduction and consolidating or removing irrelevant sections for greater clarity and concision. We hope these updates meet your expectations.

      Figure 2C: It is still not clear why naïve (unstressed/untreated) cells already show rather high nuclear abundance of Nrf2 (shouldn't Nrf2 be continuously tagged for degradation by Keap1?)

      Thank you for your constructive comments. In response to the concern raised, we repeated the nuclear–cytoplasmic fractionation experiments in early-passage GES-1 cells and performed three independent replicates using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The results showed low nuclear NRF2 levels under basal conditions, consistent with the KEAP1-mediated continuous degradation mechanism. Accordingly, we have updated Figure 2C. NRF2 could still be detected in the control group, which is broadly consistent with the baseline NRF2 levels reported in GES-1 cells and other cell lines [1,2,3]. Therefore, this does not indicate a failure of model construction.

      References:

      (1) Wang, R. et al. Costunolide ameliorates MNNG-induced chronic atrophic gastritis through inhibiting oxidative stress and DNA damage via activation of Nrf2. Phytomedicine 130, 155581, doi:10.1016/j.phymed.2024.155581 (2024).

      (2) Li, Y. F. et al. Construction of Magnolol Nanoparticles for Alleviation of Ethanol-Induced Acute Gastric Injury. J Agric Food Chem 72, 7933-7942, doi:10.1021/acs.jafc.3c09902 (2024).

      (3) Li, M., Wang, J., Xu, Z., Lin, Y. & Dong, J. Atraric acid attenuates chronic intermittent hypoxia-induced brain injury via AMPK-mediated Nrf2 and FoxO3a antioxidant pathway activation. Phytomedicine 148, 157261, doi:10.1016/j.phymed.2025.157261 (2025).

      Figure 2G-H: Why switch to rather high concentrations?

      To validate ubiquitin-mediated degradation in Figure 2G-H, we needed to co-transfect multiple plasmids. Transfection efficiency was poor in GES-1 cells, so we switched to the commonly used HEK-293T cell line. In 293T cells, 2 and 8 μM were suboptimal, so we ultimately used 32 μM, consistent with other 293T experiments (Co-IP and Pull-down) that also used 32 μM. These choices reflect intrinsic cell line properties and protein overexpression in 293T, but do not affect our investigation of capsaicin’s mechanism.

      Figure 2I: in the pics of mitochondria the control mitochondria look way more punctuated (likely fissed) than the ones treated with EtOH or EtOH + CAP. Wouldn't one expect that EtOH leads to mitochondrial fission and CAP can prevent it?

      Thank you very much for your comments. We re-acquired and re-analyzed the mitochondrial morphology images using a Leica STELLARIS 5 confocal microscope, which was not available at our institution at the time of the original experiments. The earlier wide-field fluorescence images lacked sufficient magnification and resolution, which obscured details and may have caused confusion. In the revised manuscript, we have replaced them with confocal images showing that EtOH induced mitochondrial abnormalities, whereas CAP treatment restored the reticular network, as expected. We also added a CAP-only group, which shows no discernible effect on mitochondrial morphology.

      Figure 3H: High basal Nrf2 levels in unstressed/untreated HEK WT cells, why?

      Thank you for raising this concern. We repeated the experiments in HEK-293T (WT) cells in better condition, and validated the results using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The data consistently show relatively low NRF2 expression under basal conditions, in line with the KEAP1-mediated continuous degradation mechanism. We have corrected the corresponding figures accordingly.

      Figure 4a: Inclusion of an additional Keap1 binding protein (one with a ETGE motif) would have been desirable (to get information on specificity/risks of off-target (unwanted) effects of CAP).

      Thank you for this valuable suggestion. We have added CETSA experiments for DPP3, which contains an ETGE motif. Endogenous DPP3 expression was low in GES-1 cells, and DPP3 did not appreciably bind CAP in vitro: BLI experiments indicated a KD above 1 mM (Supplementary Figure 4h and 4i). Consistent with this, CAP does not thermally stabilize DPP3 at the cellular level. Therefore, the risk of off-target effects via binding to ETGE-containing proteins like DPP3 appears minimal.

      Figure 4D: Why is there no stabilization of Nrf2 by CAP in lane 2?

      Thank you for raising this concern. We repeated the experiment in GES‑1 cells and performed three independent replicates using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The data show that CAP alone increases NRF2 expression to some extent. We have updated Figure 4D accordingly.

      Figure 4f: 5% DMSO is a rather high solvent concentration, why so high (the solvent alone seems to have quite marked effects!)

      Thank you for raising this concern. Our original figure legend was misleading and has been corrected. Only the highest CAP concentration (500 μM) contained 5% DMSO as the vehicle; the other CAP concentrations, prepared by serial dilution in complete medium, did not contain such high solvent levels (e.g., 65.5 μM CAP contained 0.625% DMSO). This experiment used transient overexpression of NRF2-HA because purified recombinant NRF2 protein is prohibitively expensive (10 μg costs about 900 GBP from Abcam). We therefore conducted a preliminary assay by incubating purified Kelch-Flag protein with lysates of cells overexpressing NRF2-HA and measured NRF2 levels in the supernatant and pellet (Figure 4f). Nevertheless, the conclusion that CAP disrupts the NRF2–KEAP1 interaction is better supported by SPR (Figure 3d), Co-IP (Figure 3e) and Pull-down (Figure 4g).
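
      The vehicle arithmetic above can be sketched as follows. This is a minimal illustration, assuming that DMSO is diluted in exact proportion to CAP when the 500 μM / 5% DMSO solution is serially diluted in medium; the function names are hypothetical and not taken from the manuscript. Under that assumption, a vehicle level of 0.625% corresponds to a 1:8 dilution of the top concentration, i.e. 62.5 μM CAP.

```python
# Sketch only: assumes DMSO is diluted in exact proportion to CAP.
TOP_CAP_UM = 500.0        # highest CAP concentration quoted in the response
TOP_DMSO_PERCENT = 5.0    # vehicle content at that concentration


def dmso_percent(cap_um: float) -> float:
    """DMSO (% v/v) carried along at a given CAP concentration."""
    return TOP_DMSO_PERCENT * cap_um / TOP_CAP_UM


def cap_for_dmso(dmso_pct: float) -> float:
    """CAP concentration (uM) that carries a given DMSO percentage."""
    return TOP_CAP_UM * dmso_pct / TOP_DMSO_PERCENT


if __name__ == "__main__":
    # Hypothetical two-fold dilution series starting from the top concentration.
    cap = TOP_CAP_UM
    for _ in range(5):
        print(f"{cap:7.2f} uM CAP -> {dmso_percent(cap):.4f}% DMSO")
        cap /= 2
```

      The dilution factor of the series is not stated in the response, so the two-fold steps in the loop are an assumption made only for illustration.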

      Figure 6/7: not expert enough to judge formulations and histology scores. However, the benefit of the encapsulated capsaicin does not become entirely clear to me, as CAP and IRHSA@CAP mostly do not significantly differ in their elicited response.

      Thank you very much for the valuable suggestion. Although histopathology suggests only modest differences between the two treatments, the nanoparticle group showed markedly lower inflammatory cytokine levels than pure CAP: IL-1β, IL-6, TNF-α, and CXCL-1 were significantly reduced, while the anti-inflammatory cytokine IL-10 was significantly increased (Figure 8d). These changes are important for maintaining a healthy gastric environment and may better support digestive function in vivo. In addition, from a translational and industrial perspective, nanoparticle formulations offer improved palatability compared with capsaicin itself. Based on firsthand experience, the nano formulation is more tolerable than CAP. When preparing pure CAP, the pungency often causes irritation, whereas HSA@CAP nanoparticles are milder and demonstrate superior safety in mice following oral gavage.

      Figure 7: Rebamipide was introduced as positive control in the text with an activating effect on Nrf2, but there is no induction of hmox and nqo in Figure 7f, why? It does not look as the positive control was wisely chosen.

      Thank you for your insightful comment. We agree that this result was suboptimal and sincerely apologize for the oversight. We are currently facing significant constraints: the original cDNA is depleted, and funding cuts have severely limited our resources for reagents and animal studies. A full repetition of the rat experiment at the original scale and quality is not feasible in the short term. To ensure the scientific rigor of the paper, we have made the difficult decision to remove Figure 7f. We believe this is necessary to base our conclusions on the most robust evidence. We apologize for any inconvenience and hope this solution is acceptable. We have revised the manuscript accordingly and appreciate your understanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors did not provide data validating the NRF2 antibody for in vitro studies, particularly for IF data where there is no molecular mass indication for NRF2. The IF data suggest that NRF2 is primarily located in the cytoplasm under control conditions (Fig. 2A), whereas the WB data show a strong band in the nucleus (Fig. 2C). What is the reason for this inconsistency?

      We sincerely appreciate your valuable comments. Previously, we used an NRF2 antibody from Proteintech (Cat. No. 16396-1-AP). The vendor’s data show that shRNA knockdown in HeLa cells markedly reduces NRF2 at the expected molecular weight, and that IF in HepG2 cells shows a trace amount of cytoplasmic localization in controls and clear nuclear translocation after MG-132 treatment. This indicates that the antibody is suitable for immunofluorescence (IF) detection of the subcellular localization of NRF2, and our results in Figure 2A are in line with these expectations. In addition, to address the reviewer's concern, we purchased another well-validated NRF2 antibody (Cat. No. 12721, Cell Signaling Technology). In this version, we repeated the nuclear-cytoplasmic fractionation experiments and other important experiments using this antibody. Together, these data confirm the low basal level of NRF2 in the absence of stress and show that CAP increases NRF2 expression. We have corrected Figure 2C so that the WB and IF results are now consistent. We wish to reiterate our deep appreciation for the professionalism and rigor of your review.

      (2) Additionally, I could not find Supplementary Figure 4F-I, which concerns TRPV1. These figures are mentioned in the response to reviewers but are missing from the manuscript; please double-check.

      The supplementary figures were initially submitted as a compressed archive, and we recognize that there may have been an issue with the transfer of this file to the reviewers. As shown in Supplementary Figure 4f to 4i, we further examined TRPV1 and DPP3 to assess potential off-target effects of CAP. Capsazepine (CAPZ), a TRPV1 receptor antagonist, did not affect the protective effect of CAP on GES-1 cells (Fig S4f and S4g), which may indicate that CAP activation of NRF2 does not depend on TRPV1. We used BLI to measure the binding of CAP to DPP3, which contains an ETGE motif and can bind KEAP1. The KD between CAP and DPP3 was 1.653 mM (>100 μM), whereas CAP bound KEAP1 considerably more strongly, with a KD of about 31.45 μM (Fig S4h and S4i). This may indicate that the potential for off-target effects of CAP is low.
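
      The comparison of the two dissociation constants can be made explicit with a short calculation. This is only an illustrative sketch using the KD values quoted above; the function name is hypothetical, and a ratio of KD values is a rough guide to relative affinity rather than a full selectivity analysis.

```python
# Sketch only: ratio of the two KD values quoted in the response.
KD_CAP_KEAP1_UM = 31.45   # CAP-KEAP1, in uM
KD_CAP_DPP3_UM = 1653.0   # CAP-DPP3, 1.653 mM expressed in uM


def fold_weaker_binding(kd_off_target_um: float, kd_target_um: float) -> float:
    """How many times higher the off-target KD is than the target KD."""
    return kd_off_target_um / kd_target_um


ratio = fold_weaker_binding(KD_CAP_DPP3_UM, KD_CAP_KEAP1_UM)

if __name__ == "__main__":
    print(f"CAP binds DPP3 about {ratio:.1f}-fold more weakly than KEAP1")
```

      With these values the ratio is roughly 53, i.e. the measured affinity of CAP for DPP3 is about fifty-fold lower than for KEAP1.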

      (3) I am also somewhat unconvinced by the data obtained from NRF2 KO mice. First, it appears that some NRF2 KO mice respond to CAP treatment well while others do not, resulting in a high standard deviation. To strengthen the conclusions, it would be advisable to use a larger number of animals to confirm or exclude the effect. This is precisely why I still believe that three rats per group are insufficient for the in vivo studies. Please emphasize in the manuscript that a limitation of this study is the use of only three rats per group for the in vivo experiments.

      Thank you very much for your question and suggestions. For the rat experiments in Figure 7 and Figure 8, many other references are available, as noted in the introduction: “Recent experiments conducted in rats have demonstrated that red pepper/capsaicin (CAP) possesses significant protective effects on ethanol-induced gastric mucosal damage, and the mechanisms involved may relate to the promotion of vasodilation[6,7], increased mucus secretion[8] and the release of calcitonin gene-related peptide (CGRP)[9,10]. However, it is important to note that the specific role of the antioxidant activity of CAP has not been thoroughly investigated.” We therefore conducted extensive literature research and preliminary experiments to ensure that our formal experiment with 8 groups could yield relatively stable results. We acknowledge that using more rats in vivo would make the conclusions more reliable. Unfortunately, the project was delayed due to funding issues. We are currently facing significant resource constraints: reductions in research funding from the National Natural Science Foundation have severely limited our funding for reagents and animal experiments in this study. As a result, it is not possible to fully repeat all animal experiments at the original scale and quality in the short term. To supplement the NRF2 knockout animal experiments (n=6), we have already spent approximately 70,000 RMB (about 10,000 USD). We have made every effort to ensure the scientific rigor of the paper, and we sincerely apologize for any inconvenience caused. At the same time, we fully recognize the importance of increasing the sample size in animal experiments for this study. We have explicitly acknowledged this as a limitation of our work in the Discussion section and have revised the manuscript accordingly. We greatly appreciate your understanding.

      (4) Furthermore, please double-check the blot in Fig. 9D. Tubulin and P-p65 bands appear very similar, and tubulin disappears in response to EtOH and EtOH/CAP treatment in KO mice. Is it the case? I am not sure the quantitative data reflect the WB bands. Please verify that.

      We sincerely appreciate your valuable feedback on our manuscript. In our haste, we may have included bands that do not meet the required standard, and we are very grateful to you for pointing this out; it was a significant oversight on our part. We will pay closer attention to careful checking in the future. In response, we have repeated the experiments using the preserved tissue samples and have updated Figure 9d accordingly. Thank you for your insightful suggestions.

      Reviewer #2 (Recommendations for the authors):

      Presentation:

      The data with the encapsulated CAP appear a little as side arm that does not bolster your main message (maybe take out and elaborate on this topic more extensively in another manuscript)

      We sincerely thank the reviewer for this suggestion. However, based on the ELISA results demonstrating that nano-capsaicin exerts a significantly stronger anti-inflammatory effect than pure capsaicin (CAP), and considering its superior sensory profile for industrial applications (confirmed by our sensory evaluations), we believe these data provide valuable insights. Therefore, we would prefer to retain this section in the manuscript and hope for your understanding.

      Revise the introduction on the Nrf2 signaling pathway ...as it is written at the moment, someone outside the Nrf2 field might have trouble to understand

      Thank you for the valuable suggestion again. We have rewritten the introduction to the NRF2 signaling pathway to improve accessibility for readers outside the field.

      “The Kelch-like ECH-associated protein 1 (KEAP1)–Nuclear factor erythroid 2–related factor 2 (NRF2)–antioxidant response element (ARE) pathway is a core defense mechanism against oxidative and electrophilic stress[11]. Under homeostatic conditions, KEAP1 acts as a linker protein for the Cul3-E3 ubiquitin ligase complex, continuously promoting the ubiquitination and proteasomal degradation of NRF2, thereby maintaining NRF2 at basal levels[12]. When oxidative or electrophilic stress occurs, critical cysteine residues in KEAP1 are modified, or the interaction between the ETGE/DLG motifs on NRF2 and the Kelch domain of KEAP1 is disrupted, allowing NRF2 to escape degradation, accumulate, and translocate to the nucleus. There, NRF2 forms heterodimers with small Maf proteins and binds to ARE, inducing the expression of antioxidant and cytoprotective genes such as those involved in glutathione metabolism, NADPH regeneration, phase II detoxifying enzymes, and drug efflux transporters, thereby restoring redox balance within the cell and reducing oxidative damage[13].

      Classical NRF2 agonists, such as sulforaphane, are small molecules that bind to KEAP1 and covalently modify its cysteine residues, thereby altering the binding affinity between KEAP1 and NRF2 [14]. However, traditional covalent agonists may induce sustained overactivation of NRF2, leading to adverse side effects and limiting clinical application [15]. Consequently, recent efforts have shifted toward the development of non-covalent NRF2 agonists, which are generally associated with lower toxicity and greater translational potential, enabling more controlled enhancement of NRF2 activity and offering new insights and therapeutic opportunities in antioxidant-related interventions.”

      The authors should check and review extensively for improvements to the use of English to get rid of awkward phrases /wording.

      Thank you very much for this helpful comment. We sincerely appreciate the suggestion and have carefully re‑read and further polished the entire manuscript to remove awkward phrasing and improve the readability of expressions and phrases. We hope these revisions address your concern, and we remain grateful for your guidance.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Drosophila wing disc is an epithelial tissue whose study has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript the authors used state of the art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address a problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      Weaknesses:

      The previously noted weaknesses (vg expression, P compartment-specific effects, early vs late analysis of ap expression in mutants) have been thoroughly and satisfactorily addressed by the authors.

      We thank the reviewer for the positive assessment of our manuscript as well as for the many constructive comments during its revision.

      Reviewer #3 (Public review):

      In this manuscript, the authors use the Drosophila wing as a model system and apply state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development. The paper is subdivided into the following chapters/figures:

      (1) In the first couple of figures, the authors describe the methodology to genetically manipulate the apE enhancer (a cartoon summarizing all the previous work with this enhancer might help) and identify two well-conserved domains in the OR463 enhancer required for wing development (the m3 region, whose deletion phenocopies OR463 deletion: loss of wing, and the m1 region, whose deletion gives rise to AP identity changes in the P compartment).

      (2) In the following three figures, the authors characterize the m1 regulatory region, identify HOX and ETS binding sites, functionally validate their role in wing development and the activity of the genes/proteins regulating their activity (e.g., Hth and Pointed) by their ability to phenocopy (when depleted) the m1 loss-of-function wing phenotype. The authors conclude that Hth and Pointed regulate apterous expression through the m1 region.

      (3) In the last few figures, the authors perform similar experiments with the m3 regulatory region to conclude that Grn and Antennapedia regulate apterous expression through the m3 enhancer.

      My comments:

      Technically sound: As stated in my previous review, the work is technically excellent (the authors use state-of-the-art genetic engineering to manipulate the enhancer and combine it with genetic analysis through RNAi and CRISPR/Cas9 and phenotypic characterization to functionally validate their findings), figures are nicely done and cartoons are self-explanatory.

      We thank the reviewer for these positive comments.

      Poor paper writing: The paper is too long and difficult to read/understand, many grammatical mistakes are found, and formatting is in some cases heterodox.

      We thank the reviewer for this assessment. We have carefully revised the manuscript to improve clarity, readability, and consistency throughout. Specifically:

      (1) Streamlined several sections to improve narrative flow, especially the abstract, model and dCas9 sections.

      (2) Corrected grammatical issues across the manuscript. As the reviewer pointed out, there were many in the text. We are grateful that the reviewer was insistent on this point.

      (3) Harmonized formatting and terminology. Many small inconsistencies in the figure legends have now largely been corrected.

      We believe these changes substantially improve the accessibility and overall presentation of the work. However, we have not shortened the manuscript, as we want to convey the complexity of dissecting non-coding regions and to avoid oversimplifying the phenotypes obtained.

      Science:

      (1) The question of "who is locating the relative position of the AP and DV boundaries in the developing wing?" is not resolved. I would then change the intro or reduce the tone of this question. Having said that, I agree that these results shed light on the wing phenotypes of some apterous alleles related to AP identity and growth and, as such, I congratulate the authors.

      We appreciate this important point. We agree that our study does not fully resolve the upstream mechanisms that ultimately position the AP and DV boundaries. Our goal was instead to determine how the ap early enhancer (apE) contributes to the correct spatial relationship between these boundaries. To address the reviewer’s concern, we have revised the Introduction and Discussion to soften the framing of this question and to more clearly state the scope of our conclusions. We now emphasize that our work provides mechanistic insight into how apE function impacts DV/AP boundary organization, rather than claiming to fully resolve the upstream positioning mechanism.

      (2) Identification of two TFs (Grain and Antp) mediating the regulation of apterous expression is interesting but some contextualization might be required. Data on Antp is not as convincing as data on Grn. I wonder whether Antp data can be removed at all.

      We thank the reviewer for this thoughtful evaluation. We agree that the genetic evidence for Grain (Grn) is stronger and more direct than for Antennapedia (Antp). In response, we have revised the manuscript to more carefully calibrate the strength of our conclusions regarding Antp.

      Specifically, we have:

      Softened the language throughout to describe Antp as a candidate HOX input,

      Explicitly stated that direct binding to the m3 site remains to be demonstrated biochemically, and

      Clarified in the Discussion that our data support an early contributory role for Antp rather than establishing it as the definitive HOX factor acting at apE.

      We believe retaining the Antp data is important because:

      (1) The m3 site shows strong HOX dependency in vivo,

      (2) Early Antp depletion produces clear defects in ap expression, and

      (3) Recent literature supports an early requirement for Antp in wing development.

      Together, these observations provide a coherent working model while appropriately acknowledging current limitations. We hope the reviewer agrees that the revised framing now appropriately reflects the strength of the evidence.

      (3) I am not sure whether the term hemizygous is used properly

      We use the term hemizygous as in classical genetics, in which an individual carrying an allele opposite a chromosomal deletion is considered hemizygous at that locus (see for example the entry for the ap<sup>4</sup> mutant in the red book (Lindsley and Zimm, The Genome of Drosophila melanogaster)):

      “… ap4 /Df(2L) M4IA-54 hemizygote has nearly normal complement of bristles but otherwise resembles ap4 homozygote (Butterworth and King, 1965).”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, and likely to benefit the research community.

      Comments on revisions:

      The concerns raised have been addressed. The heparin separator-based cfDNA method described in this study is likely to benefit the research community. I have no further scientific concerns.

      We appreciate the encouragement and recognition.

      Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      (5) It is important to point out that depth-based CNV calling is just one of the CNV calling methods. Other CNV calling software that uses SNVs, paired reads, split reads, and coverage depth for calling CNVs, such as the software Conserting, would be severely affected by the low-quality WGS data. The authors need to evaluate at least two different software tools with specific algorithms for CNV calling based on current WGS data.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.
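
      The bin-size concern in point (4) can be illustrated with a back-of-the-envelope calculation. This is a minimal sketch, not the authors' or the reviewer's analysis: it assumes uniform ~5X coverage as quoted above, a read length of 150 bp (an assumption, not stated in the review), and purely Poisson counting noise, which is a lower bound on real-world variability. The function names are hypothetical.

```python
from math import sqrt

COVERAGE = 5.0          # ~5X low-coverage WGS, as quoted in the review
READ_LENGTH_BP = 150    # assumption: read length is not given in the review


def expected_reads_per_bin(bin_size_bp: int,
                           coverage: float = COVERAGE,
                           read_length_bp: int = READ_LENGTH_BP) -> float:
    """Mean number of reads in a bin under uniform coverage."""
    return coverage * bin_size_bp / read_length_bp


def poisson_cv(mean_count: float) -> float:
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / sqrt(mean_count)


if __name__ == "__main__":
    for bin_size in (1_000, 10_000, 50_000, 100_000, 250_000, 500_000):
        reads = expected_reads_per_bin(bin_size)
        print(f"{bin_size:>7} bp bins: ~{reads:,.0f} reads/bin, "
              f"Poisson CV ~{poisson_cv(reads):.1%}")
```

      Under these assumptions a 1 kb bin holds only a few dozen reads, so its read-depth estimate is far noisier than that of a 500 kb bin, which is why small CNVs are hard to call from depth alone at this coverage.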

      Comments on revisions:

      As suggested previously, the Pearson correlation analysis tends to be overstated; please replace it with Spearman correlation in the whole manuscript. Currently, the authors include both of them in the abstract, method, results, and graphics, all of which are required to be updated to only use Spearman correlation results.

      I don't have other concerns about the manuscript.

      We entirely agree and have removed all instances of Pearson correlation from the paper, including the abstract, methods, results, and graphics. Only Spearman’s correlation is now used.

      We appreciate your efforts and helpful comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines how different parts of the brain's reward system regulate eating behavior. The authors focus on the medial shell of the nucleus accumbens, a region known to influence pleasure and motivation. They find that nerve cells in the front (rostral) portion of this region are inhibited during eating, and when artificially activated, they reduce food intake. In contrast, similar cells at the back (caudal) are excited during eating but do not suppress feeding. The team also identifies a molecular marker, Stard5, that selectively labels the rostral hotspot and enables new genetic tools to study it. These findings clarify how specific circuits in the brain control hedonic feeding, providing new entry points to understand and potentially treat conditions such as overeating and obesity.

      We thank Reviewer 1 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      (1) Conceptual advance: The work convincingly establishes a rostro-caudal gradient within the medNAcSh, clarifying earlier pharmacological studies with modern circuit-level and genetic approaches.

      (2) Methodological rigor: The combination of fiber photometry, optogenetics, CRISPR-Cas9 genetic engineering, histology, FISH, scRNA-seq, and novel mouse genetics adds robustness, with complementary approaches converging on the central claim.

      (3) Innovation: The generation of a Stard5-Flp line is a valuable resource that will enable precise interrogation of the rostral hotspot in future studies.

      (4) Specificity of findings: The dissociation between appetitive and aversive conditions strengthens the interpretation that the observed gradient is restricted to feeding.

      We thank Reviewer #1 for their supportive feedback.

      Weaknesses and points for clarification

      (1) Role of D2-SPNs: Since D1 and D2 pathways often show opposing roles in feeding, testing or discussing D2-SPN contributions would provide an important control and context. Since the claim is that Stard5 is expressed in both D1- and D2-MSNs, it seems to contradict the exclusive role of D1R MSNs in authorizing food intake.

      We agree that D2-SPNs represent an important and relevant cell population in the context of our study. The Stard5-Flp line labels a mixed population of D1- and D2-SPNs, and we agree that dissecting the distinct contributions of Stard5<sup>+</sup> D1-SPNs and Stard5<sup>+</sup> D2-SPNs to feeding behavior would be both interesting and informative.

      Although we understand the point raised by the Reviewer, we do not entirely agree that the expression of Stard5 in both D1- and D2-SPNs contradicts the established role of D1-SPNs in authorizing food intake. In the medNAcSh, D1- and D2-SPNs do not exert opposing functions. D2-SPNs project densely to the ventral pallidum and more sparsely to the lateral hypothalamus and, like D1-SPNs, are predominantly reward-inhibited at the population level (Domingues et al. 2025; Pedersen et al. 2022).

      We added the following in the discussion: “Additionally, a new study showed that manipulation of D2-SPN cell bodies in the medNAcSh modulates reward preference, self-stimulation, and palatable food intake in a frequency- and context-dependent manner (Requejo-Mendoza et al., 2025). Together, these findings suggest that D1- and D2-SPNs within the medNAcSh play complementary rather than opposing roles in reward processing. Hence, the potential role of rostral and caudal medNAcSh D1- and D2-SPNs in food-related behaviors beyond the act of consumption could be addressed in future work.” We also acknowledge that not investigating rostro-caudal gradients of D2-SPN in reward and aversion processing “represents a limitation of this work”.

      We fully agree that disentangling the specific contributions of Stard5<sup>+</sup> D1- and Stard5<sup>+</sup> D2-SPNs is an important next step. We have now crossed the Stard5-Flp line with Drd1-Cre and A2a-Cre lines. In a pilot experiment (not shown), we injected Flp+,Cre+, Flp+,Cre- and Flp-,Cre+ mice with 4 different FlpOn-CreOn AAVs to determine whether any of them showed specific expression. However, all AAVs exhibited moderate to strong leaky expression of the Cre, preventing reliable cell-type-specific targeting. This was not seen with Flp-only or Cre-only AAVs. Such leakiness is a known challenge of FlpOn-CreOn AAVs and requires additional troubleshooting (e.g., reducing the titer). As this proved to be more challenging than anticipated, this work is ongoing and will be addressed in a future study rather than in the present revisions.

      (2) Behavioral analyses:

      (a) In Figure 2, group differences in consumption appear uneven; additional analyses (e.g., lick counts across blocks and session totals) would strengthen interpretation.

      The group differences in consumption that appear uneven likely reflect an overall lower total lick count per session in the Control group. We have now added analyses of average lick counts per block and session totals in the newly included Supplementary Figure S7, which support the results shown in Figure 2.

      Although we observe a difference in total lick count across the entire session between Control and Rostral ChrimsonR mice (Supplementary Figure S7d), we consider the comparison of total session lick counts less informative here. Instead, we argue that the laser-on epoch provides the most meaningful comparison. During this period, optogenetic activation had no effect on licking behavior in control mice, showed a nonsignificant trend toward reduced consumption in caudal ChrimsonR mice, and produced a significant reduction in lick counts when rostral medNAcSh D1-SPNs were activated (Figure 2g-i and Supplementary Figure S7c).

      We added in the discussion the following explanation:

      “In addition, comparison of licking behavior during the laser-off blocks revealed an interesting effect: following cessation of opto-stimulation, Rostral ChrimsonR mice licked more than Caudal ChrimsonR and Control mice, suggesting a possible compensatory overconsumption. One possible interpretation is that the optogenetic parameters used suppressed consummatory behavior without reducing the motivation to obtain the reward. Furthermore, consistent with the RTPPA results, activation of rostral D1-SPNs may be experienced as aversive and termination of the optogenetic stimulation could produce relief, which in turn reinforces the licking behavior. Further investigations are required to test these possibilities.”

      (b) The design and contribution of aversive assays to the main conclusions remain somewhat unclear and could be better justified.

      We appreciate the Reviewer’s comment regarding the design and contribution of the aversive assays. The rationale for including these experiments was to determine whether the rostro–caudal functional segregation observed for reward-related feeding also applies to aversive processing.

      First, using foot shock, we tested whether D1-SPNs in the rostral versus caudal medNAcSh respond differently to an aversive stimulus. In contrast to reward-related responses, both populations responded similarly, exhibiting excitation. Second, to ensure that this effect was not specific to a single stressor, we tested a second aversive stimulus (tail lift) and again observed comparable excitatory responses in rostral and caudal D1-SPNs. Third, we assessed whether optogenetic activation of these neurons is perceived as rewarding or aversive. Using a real-time place preference/aversion assay, we found that optogenetic stimulation of D1-SPNs in both subregions induced place aversion.

      Together, these experiments show that while D1-SPNs display region-specific effects on reward-related feeding behavior, their activity responses to aversive stimuli and the avoidance response to optogenetic activation are similar across rostral and caudal medNAcSh. This contrast strengthens our conclusion that the D1-SPN rostro-caudal gradient is specific to appetitive contexts.

      We added the following in the discussion:

      “Here, we further tested the existence of rostro-caudal gradients for aversion, asking whether D1-SPNs in the rostral vs. caudal medNAcSh respond differently to aversive stimuli. To ensure that any observed effects were not specific to a single stressor, we tested two distinct aversive stimuli (foot shock and tail lift). In both cases, we found no rostro-caudal differences, as D1-SPNs in both subregions responded with excitation. We also asked whether optogenetic activation of these neurons is perceived as aversive. Stimulation of D1- SPNs in both rostral and caudal medNAcSh promoted aversive behavioral responses in the RTPPA experiment. Hence, in contrast to the pharmacological inhibitions mentioned above, we did not detect differences in aversive behaviors according to the rostro-caudal medNAcSh site.”

      (c) The scope of behavior is mainly limited to consumption; testing related domains (motivation, reward valuation, and extinction) could broaden the significance.

      We thank the Reviewer for the suggestion to examine additional behavioral domains such as motivation, reward valuation, and extinction. We focused our efforts on consumption given the large body of literature demonstrating a very important role of the medNAcSh in reward consumption. However, we fully agree that feeding encompasses multiple phases, from appetitive and goal-directed behaviors to consummatory behavior, and that the NAc in general, and to some extent the NAcSh, is involved in behaviors across this spectrum. For instance, prior work has shown that the medNAcSh is involved in reward preference and that this follows a rostro-caudal gradient (e.g., Pedersen et al. 2022).

      While it would be informative to directly test motivational processes using operant paradigms (e.g., nosepoke or lever-press tasks), our current experimental setup did not allow for these assays. Instead, we performed exploratory experiments manipulating the animals’ internal state with food deprivation. As expected, under food deprivation, total licking increased robustly in control mCherry and Rostral ChrimsonR medNAcSh mice as compared to ad libitum feeding (25 min session with 5 alternating on-off blocks: ad libitum Control = 692 and Rostral ChrimsonR = 1280 average total licks per session, see Figure 2g-h and Supplementary Figure S7d; food-deprived Control = 2428 and Rostral ChrimsonR = 2390 total licks averaged for N = 9 Control, N = 12 Rostral). Moreover, similar to ad libitum feeding, optogenetic activation of rostral D1-SPNs suppressed licking in food-deprived mice, albeit to a lesser extent than under ad libitum feeding conditions (Figure 2).

      These preliminary observations suggest that internal state modulates the role of rostral D1-SPNs in reward consumption, potentially reflecting an interaction between homeostatic and hedonic feeding circuits. However, as this line of investigation was exploratory and not pursued further in the present study, these data are not included in the main manuscript.

      Author response image 1.

      In vivo optogenetic stimulation of rostral medNAcSh inhibits reward consumption to a lesser extent after overnight food deprivation. a. Quantification of the average lick count per 5 min block in mCherry control mice vs. ChrimsonR (rostral) mice, showing a lower lick count in rostral medNAcSh ChrimsonR mice during the opto-stimulation epoch. Blocks of 5 min with or without opto-stimulation were alternated (on/off/on/off/on) for a total of 5 blocks. b. Quantification of mean lick counts in the opto-stimulation vs. non-opto-stimulation epochs shows a significant decrease in lick counts following stimulation of rostral medNAcSh D1-SPNs and no significant difference in the control mice. 2-way RM-ANOVA (group x epoch). Main effects: epoch F (1, 28) = 6.027, p=0.0206; group F (2, 28) = 1.448, p=0.2520; group x epoch F (2, 28) = 8.123, p=0.0017. Sidak post-hoc opto-stimulation vs. non opto-stimulation: Control on vs. off t(28) = 1.856, p=0.2061; Rostral medNAcSh on vs. off t(28) = 3.054, p= 0.0147. N=9 for Control mCherry; N=12 for Rostral medNAcSh ChrimsonR. c. Pie charts showing % of mice showing food intake inhibition (mean Δlick counts non-opto/opto>0) in each group: 42% of ChrimsonR rostral medNAcSh mice, 20% of controls. Data is mean ± SEM. *p<0.05; **p<0.01; ***p<0.001.
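
      The per-epoch summary described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the lick counts are invented, the function names are hypothetical, and Δlick count is assumed to mean the mean count in laser-off blocks minus the mean count in laser-on blocks, so that a positive value indicates inhibition during stimulation.

```python
from statistics import mean

# Block order stated in the legend: five 5-min blocks, on/off/on/off/on.
BLOCK_PATTERN = ("on", "off", "on", "off", "on")


def epoch_means(block_licks):
    """Return (mean licks in laser-on blocks, mean licks in laser-off blocks)."""
    on = [n for n, state in zip(block_licks, BLOCK_PATTERN) if state == "on"]
    off = [n for n, state in zip(block_licks, BLOCK_PATTERN) if state == "off"]
    return mean(on), mean(off)


def fraction_inhibited(mice):
    """Fraction of mice whose off-minus-on difference is greater than zero."""
    deltas = []
    for block_licks in mice:
        on, off = epoch_means(block_licks)
        deltas.append(off - on)  # assumed sign convention: positive = inhibition
    return sum(d > 0 for d in deltas) / len(deltas)


# Toy lick counts per block for four hypothetical mice (not real data).
toy_mice = [
    [10, 40, 12, 38, 8],   # licks less during stimulation
    [30, 30, 30, 30, 30],  # no difference
    [50, 20, 45, 25, 55],  # licks more during stimulation
    [5, 10, 5, 10, 5],     # licks less during stimulation
]

inhibited = fraction_inhibited(toy_mice)  # 0.5 for this toy data
```

      Multiplying the returned fraction by 100 gives a percentage of the kind shown in the pie charts.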

      (3) Molecular profiling:

      (a) Stard5 expression is present in both D1- and D2-SPNs; comparisons to bulk calcium signals and quantification of percentages across rostral and caudal cells would be helpful. The authors should establish whether these cells also express SerpinB2, an established marker of LH projecting neurons.

      We thank the Reviewer for this relevant point. In the photometry experiments (Figure 7) using Stard5-Flp mice, we acknowledge that the recorded signals reflect a mixed population of D1- and D2-SPNs. Based on quantification in a separate set of brains, we estimate that Stard5 is expressed in a variety of cell types, of which 35% are D1-SPNs and 30% are D2-SPNs (Supplementary Figure S3). While Liu et al. 2024 reported no overlap between Stard5 and Drd2, the canonical marker for D2-SPNs, available transcriptomic data (Chen et al. 2021) and our own histological and RNA-based analyses (Figure 6 and Supplementary Figure S3) found Stard5 to be expressed in both D1-SPNs and D2-SPNs. Hence, Stard5 indeed labels a mixed population.

      We provide here the quantification of percentages of Stard5 expression across rostral and caudal cells: for instance, in the dorsal rostral medNAcSh, 79% of D1-SPNs and 76% of D2-SPNs express Stard5; in the ventral rostral medNAcSh the percentages are 47% and 55%, whereas the same percentages drop to 39% and 31% in the dorsal caudal medNAcSh and 15% and 20% in the ventral caudal medNAcSh.

      As suggested by the Reviewer, we also performed further analysis of the publicly available scRNA-seq dataset from Chen et al. 2021, which shows that 4.4% of all Stard5-expressing cells are also Serpinb2+, while 1.8% of all sequenced NAc cells are Stard5+/Drd1+/Serpinb2+ and 0.21% are Stard5+/Drd2+/Serpinb2+.
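
      The kind of co-expression percentages reported above can be computed from a per-cell count table, as in the following minimal sketch. It is an illustration only: the toy cells are invented, the function names are hypothetical, and a gene is assumed to count as expressed when its count exceeds zero, which may differ from the threshold used in the actual re-analysis of the Chen et al. 2021 dataset.

```python
def expresses(cell, gene, threshold=0):
    """True if the cell's count for `gene` exceeds `threshold` (assumed cutoff)."""
    return cell.get(gene, 0) > threshold


def pct_coexpressing(cells, base_genes, extra_genes):
    """Percent of cells positive for all `base_genes` that are also positive
    for all `extra_genes`. An empty `base_genes` list selects every cell."""
    base = [c for c in cells if all(expresses(c, g) for g in base_genes)]
    if not base:
        return 0.0
    both = [c for c in base if all(expresses(c, g) for g in extra_genes)]
    return 100.0 * len(both) / len(base)


# Toy count table: one dict of gene counts per cell (not real data).
toy_cells = [
    {"Stard5": 3, "Drd1": 5, "Serpinb2": 1},
    {"Stard5": 2, "Drd2": 4},
    {"Stard5": 1, "Drd1": 2},
    {"Drd1": 7},
    {"Stard5": 4, "Drd2": 1, "Serpinb2": 2},
    {"Drd2": 3},
    {"Drd1": 1},
    {},
]

# Share of Stard5+ cells that are also Serpinb2+.
stard5_serpinb2 = pct_coexpressing(toy_cells, ["Stard5"], ["Serpinb2"])

# Share of all cells that are Stard5+/Drd1+/Serpinb2+.
triple_positive = pct_coexpressing(toy_cells, [], ["Stard5", "Drd1", "Serpinb2"])
```

      The choice of denominator matters when reading such numbers: the first value is relative to Stard5-expressing cells, the second to all sequenced cells.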

      (b) Verification of the Stard5-2A-Flp line (specificity, overlap with immunomarkers) should be documented more thoroughly.

      We agree with the Reviewer that a more detailed characterization of the Stard5-2A-Flp mouse line would be relevant for the validation of the line.

      In our study, we identified Stard5 as a marker gene that enables selective targeting of the rostral medNAcSh, as it is strongly enriched in the rostral medNAcSh (Figure 5-7). Stard5-Flp mice injected with Flp-dependent AAV in rostral medNAcSh, NAc core and dorsal striatum show specific AAV expression only in the rostral medNAcSh (Figure 7).

      Moreover, we show that the line is specific as injection of a Flp-dependent AAV in a Stard5-Flp negative line does not lead to expression (Figure 7c).

      However, re-analysis of the published scRNA-seq dataset (Chen et al. 2021) indicates that Stard5<sup>+</sup> cells comprise a heterogeneous population, including D1-SPNs (~35%), D2-SPNs (~30%), local interneurons (~18%), glial cells (~12%), and other cell types (Suppl. Fig. S3).

      Together, these data validate the Stard5-2A-Flp line as a spatially specific genetic entry point for the rostral medNAcSh, while highlighting the cellular heterogeneity of Stard5-expressing cells. Given the limited brain material left, we were not able to add additional colocalization analyses with immunomarkers, but agree this would be important to include in future studies.

      (c) The molecular analysis is restricted to a small set of genes; broader spatial transcriptomics could uncover additional candidate markers. See also above.

      We thank the Reviewer for this suggestion. Broader spatial transcriptomic analyses would indeed be highly valuable for identifying additional candidate markers. Our aim for the present study was to identify molecular landmarks to selectively target the rostral medNAcSh, but in a future study, we would be highly interested in building on our initial findings and providing an exhaustive molecular characterization of the region using spatial transcriptomics. We would be particularly motivated to do so, given the important functional specificity of the rostral NAcSh identified in the present publication.

      Reviewer #2 (Public review):

      Summary:

      Marinescu et al. combine in vivo imaging with circuit-specific optogenetic manipulation to characterize the anatomic heterogeneity of the medial nucleus accumbens shell in the control of food intake. They demonstrate that the inhibitory influence of dopamine D1 receptor-expressing neurons of the medial shell on food intake decreases along a rostro-caudal gradient, while both rostral and caudal subpopulations similarly control aversion. They then identify Stard5 and Peg10 as molecular markers of the rostral and caudal subregions, respectively. Through the development of a new mouse line expressing the flippase under the promoter of Stard5, they demonstrate that Stard5-positive neurons recapitulate the activity of D1-positive neurons of the rostral shell in response to food consumption and aversive stimuli.

      We thank Reviewer 2 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      This study brings important findings for the anatomical and functional characterization of the brain reward system and its implications in physiological and pathological feeding behavior. It is a well-designed study, technically sound, with clear and reliable effects. The generation of the new Stard5-Flp line will be a valuable tool for further investigations. The paper is very well written, the discussion is very interesting, addresses limitations of the findings, and proposes relevant future directions.

      We thank Reviewer #2 for their supportive feedback.

      Weaknesses:

      At this stage, identification and characterization of the activity of Stard5-positive neurons is a bit disconnected from the rest of the paper, as this population encompasses both D1- and D2-positive neurons as well as interneurons. While they display a similar response pattern as D1-neurons, it remains to be determined whether their manipulation would result in comparable behavioral outcomes.

      We agree that this represents an important limitation of the current study. In our search for molecular markers of the rostral feeding hotspot, we identified Stard5 as a marker enriched in the rostral medNAcSh; however, Stard5 labels a heterogeneous population that includes D1- and D2-SPNs as well as other cell types. While Stard5<sup>+</sup> neurons display activity patterns similar to D1-SPNs, we acknowledge that whether their direct manipulation would produce comparable behavioral effects to D1-SPNs remains to be determined. Moreover, it remains to be determined how the activity and function of Stard5<sup>+</sup> neurons compare to those of D2-SPNs.

      To specifically isolate Stard5<sup>+</sup> D1-SPNs, we generated a Stard5-Flp;Drd1-Cre mouse line via breeding. However, the 4 CreON/FlpON AAVs which we tested exhibited leaky expression, including ectopic expression in Cre-positive but Flp-negative cells. This prevented reliable, cell-type-specific manipulation. We are actively working to overcome this common technical limitation of Flp/Cre AAVs, and these experiments will be addressed in a future study.

      Recommendations for the authors:

      Editor's note:

      Readers would also benefit from coding individual data points by sex and noting N/sex in the figure legends.

      We thank the editor for the note. We have now reported the N and sex of the mice in each figure legend.

      Reviewer #1 (Recommendations for the authors):

      (1) Integration of results: The manuscript reads as two partly disconnected halves (functional gradient vs. molecular profiling). A more precise articulation of how the molecular findings (Stard5, Peg10) directly relate to the functional data would improve coherence.

      We thank the Reviewer for raising this important point. We agree that clearer integration between the functional gradient and the molecular findings would strengthen the manuscript. In the present study, Stard5 and Peg10 are not introduced as mechanistic drivers of behavior, but as molecular landmarks that map onto the functional rostro-caudal organization of the medNAcSh.

      Stard5 expression is enriched in the rostral medNAcSh, where we identify a functional hotspot for reward-related feeding, whereas Peg10 marks more caudal territories. Thus, the molecular profiling provides an independent axis that aligns with and supports the functional gradient revealed by photometry and optogenetic experiments. Whether these genes themselves contribute causally to feeding or aversive behaviors remains an open and interesting question for future studies.

      To improve clarity, we have explicitly articulated this link in the Discussion:

      “Importantly, our results indicate that spatial organization also defines functional specialization in the medNAcSh, and that molecular markers such as Stard5 provide access to these spatially defined subterritories rather than labeling a single, homogenous neuronal subtype.”

      “Having established a robust functional dichotomy of D1-SPNs along the rostro-caudal axis in reward consumption, we next asked whether this functional organization is mirrored by differences in molecular composition across the medNAcSh. Using multiple anatomical techniques, we find strong differences in the molecular composition of the rostral vs. caudal medNAcSh, which in turn could explain behavioral differences between these brain subregions.”

      “This makes Stard5 a spatial molecular landmark that captures the cellular ensemble of the rostral feeding hotspot, rather than a marker defining a single functional cell class. It is interesting that Stard5, a START-domain protein implicated in cholesterol metabolism and cellular stress responses (Alpy and Tomasetto, 2005; Rodriguez-Agudo et al., 2012; Calderon-Dominguez et al., 2014), and Peg10, an imprinted gene with roles in embryonic development and cancer (Mou et al. 2025), mark distinct rostro-caudal domains of the medNAcSh. Whether these genes themselves causally contribute to appetitive and consummatory behaviors, or aversive processing in this region remains an important question for future studies.”

      (2) Injection site specificity: Given prior work on NAc manipulations, it is essential to ensure precise targeting. Representative images from both rostral and caudal placements, including verification of fiber/injection confinement, would increase confidence.

      We thank the Reviewer for this important point regarding injection site specificity. Optic fiber placement was validated by identifying the coronal section in which the fiber tip was centered and aligning it to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). To date, we have validated a total of 14 brains, shown in the newly added Supplementary Figure S10.

      The primary source of variability across animals could be the extent of the viral spread and the size of the optic implants, which were 400 μm for photometry experiments and 200 μm for the optogenetic studies. We acknowledge that this limits the spatial precision with which the individual subregions can be isolated. This limitation is explicitly discussed in the manuscript.

      Importantly, despite this limitation, we detected robust and reproducible differences between rostral and caudal medNAcSh in reward-consumption photometry and optogenetic assays. This argues against injection site proximity or fiber misplacement being a major confounding factor for the main conclusions. Nonetheless, this is a valid point, and in future studies we plan to establish targeting methods with reduced viral volumes and/or tapered optic fibers (Pisanello et al. 2017). This will allow finer spatial restriction and more precise dissection of medNAcSh subregions.

      (3) Minor clarifications:

      (a) Provide explicit definitions of "rostral" and "caudal" coordinates.

      We adjusted Figure 1 and added the coordinates.

      (b) Consider alternative wording to "gradient" since only two rostro-caudal positions are tested.

      RNA-seq and MERFISH data indicate that molecular markers in the NAcSh are organized along a continuous rostro–caudal gradient rather than discrete boundaries (Chen et al. 2021; Stanley et al. 2020). Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled two representative positions along this continuum.

      We added the following sentence in the discussion for clarification:

      “Of note, in this paper we decided to use the term “rostro-caudal gradient”, motivated by converging evidence from prior pharmacological studies (see below) and scRNA sequencing data (Chen et al., 2021; Stanley et al., 2020), which show continuous molecular and functional changes along the rostro-caudal axis of the medNAcSh rather than sharply defined boundaries. Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled only two representative positions along this continuum.”

      (c) Enhance representative images (e.g., stronger DAPI, zoom-ins, bregma coordinates).

      To improve clarity, we have adjusted Figure 1 by adding schematic representations including stereotaxic surgery coordinates, which facilitate interpretation of rostro–caudal targeting.

      (d) Report trial numbers in figure legends, injection site details (e.g., S1 mouse), learning curves, and rationale for low-pass filtering in photometry.

      We thank the Reviewer for these suggestions. The average number of successful trials is now reported in the figure legends (Figure 1 and Figure 7). Injection site details are described in the Methods and are now also illustrated in Figure 1a and validated in Supplementary Figure S10. In addition, we have added Supplementary Figure S8 showing the learning curves of the Drd1-Cre and Stard5-Flp mice included in this study.

      Regarding the low-pass filtering in photometry analysis: low-pass filtering (1 Hz) was applied to the signal to remove high-frequency noise and isolate slow calcium-dependent fluorescence fluctuations that reflect population-level neural activity, as we have done before (Labouesse et al. 2023, 2024). Low-pass filtering is commonly used in fiber photometry analysis and often yields a better artifact-corrected signal (Zhang et al. 2023; Keevers and Jean-Richard-dit-Bressel 2025).
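
      The effect of a 1 Hz low-pass filter on a photometry-like trace can be illustrated with a minimal sketch. The filter below is a generic single-pole (first-order) recursive filter chosen for simplicity; it is not necessarily the filter design used in the study, and the sampling rate, synthetic signal, and function name are assumptions made for illustration.

```python
import math


def low_pass(signal, sample_rate_hz, cutoff_hz=1.0):
    """First-order recursive low-pass filter (discretized RC filter)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1
    filtered = [signal[0]]
    for sample in signal[1:]:
        # Move the output a fraction `alpha` of the way toward the new sample.
        filtered.append(filtered[-1] + alpha * (sample - filtered[-1]))
    return filtered


# Synthetic trace (assumption): a slow 0.2 Hz component standing in for a
# calcium transient plus a fast 20 Hz component standing in for noise,
# sampled at a hypothetical 100 Hz for 10 s.
SAMPLE_RATE_HZ = 100
times = [i / SAMPLE_RATE_HZ for i in range(10 * SAMPLE_RATE_HZ)]
raw = [
    math.sin(2 * math.pi * 0.2 * t) + 0.5 * math.sin(2 * math.pi * 20 * t)
    for t in times
]
smoothed = low_pass(raw, SAMPLE_RATE_HZ, cutoff_hz=1.0)
```

      A first-order filter attenuates frequencies above the cutoff only gradually; analyses that need a sharper roll-off or zero phase shift typically use a higher-order design applied forward and backward.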

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) As mentioned, I find the part on Stard5-positive neurons a bit disconnected. Ideally, as mentioned in the discussion, the author could cross Stard5-Flp mice with D1-cre to selectively monitor and/or manipulate these neurons. Alternatively, do they have any data regarding D2-positive neurons of the rostral part to show whether they behave differently from D1-positive neurons?

      We thank the Reviewer for this suggestion and agree that selectively monitoring or manipulating Stard5<sup>+</sup> D1-SPNs using an intersectional approach would strengthen the link between the molecular and functional findings. We are pursuing this strategy by crossing Stard5-Flp mice with Drd1-Cre mice; however, as noted above, currently available CreON/FlpON viral tools exhibited leaky expression (a commonly known problem for such AAVs), preventing reliable cell-type–specific targeting. As a result, these experiments are ongoing (including reducing the titers) and will be addressed in a future study.

      At present, we do not have equivalent functional data for D2-SPNs in the rostral medNAcSh. Investigating whether rostral D2-SPNs behave differently from caudal D2-SPNs is an important and interesting question, which we hope to address in a future study. This limitation is acknowledged in the discussion.

      (2) Do the authors have any data on locomotor activity when they manipulate D1-expressing neurons? Lower food consumption as well as lower activity in the stimulated compartment - interpreted as aversion - could be related to diminished locomotor activity.

      We thank the Reviewer for the relevant point about locomotion. We ran new analyses of locomotor activity during the feeding task (operant boxes) using a machine-learning model. A small subset of frames (136 frames from 10 video recordings) was manually annotated to define the animal’s body center and nose, as well as the four corners of the operant box. These annotations were used to train a YOLO-based pose estimation model (Redmon et al. 2015). Locomotion metrics, such as total distance moved, were subsequently derived from the temporal integration of positional data and aligned to opto-on and opto-off epochs of the feeding task. During licking periods, the animal’s body center remains largely stationary, which could lead to an overestimation of immobility. Nevertheless, we quantified the total distance traveled in the entire operant box across epochs, shown in Supplementary Figure S9 a-b. In our proof-of-concept experiment (Figure 2c-e), locomotion was increased in rostral ChrimsonR mice compared to controls (Supplementary Figure S9a), similar to the effect seen with chemogenetic activation of D1-SPNs (Zhu, Ottenheimer, and DiLeone 2016). In our full experimental cohort, locomotion did not differ between control, rostral and caudal ChrimsonR mice across laser-on and laser-off epochs. These results indicate that reduced reward consumption during stimulation of rostral D1-SPNs is not due to decreased locomotor activity. Notably, whereas the inhibitory effect on consumption is specific to rostral D1-SPN activation, locomotor effects are similar for both rostral and caudal D1-SPN stimulation, indicating they are at least partly dissociated from one another.

      Moreover, in the RTPPA task, it is accepted that the percentage of time spent in the light-paired chamber reflects the preference or aversiveness to optogenetic stimulation. We additionally quantified total distance traveled (Supplementary Figure S9c). While optogenetic stimulation of both rostral and caudal D1-SPNs reduced time spent in the light-paired chamber (Figure 4), total distance traveled was unchanged, indicating that the observed aversion is not due to reduced locomotion.

      We added the following to the Results section: “To determine whether the reduced reward consumption observed in Rostral ChrimsonR mice could be explained by changes in locomotion, we quantified the total distance traveled during this task. Optogenetic stimulation led to an increase in locomotion in the small cohort of Rostral ChrimsonR mice in the reward consumption experiment shown in Figure 2d-e (Supplementary Figure S9a), while no change in locomotion was observed across epochs in mCherry controls, ChrimsonR Rostral and Caudal mice (Supplementary Figure S9b, related to Figure 2g-i)”

      And

      “Quantification of locomotion showed no reduction in distance traveled in the light-paired chamber (Supplementary Figure S9c), indicating that the avoidance was not driven by impaired locomotion. These data indicate that medNAcSh D1-SPNs generally promote aversion without affecting locomotion and without major differences along the rostro-caudal axis”

      Additionally, we added the following sentence to the Discussion: “Importantly, our behavioral effects of rostral D1-SPNs in the reward consumption and RTPPA assays could not be explained by reduced locomotor activity. Indeed, optogenetic stimulation of D1-SPNs during the reward consumption task did not reduce locomotion; instead, locomotion was either unchanged or increased in a small cohort of Rostral ChrimsonR mice. The increased locomotion likely reflected appetitive behavior and is consistent with past chemogenetic studies (Zhu et al., 2016). In the RTPPA no locomotion differences were detected.”

      (3) It would be useful to provide a schematic (or pictures) for the location of fiber implantation in all animals for both photometry and optogenetics.

      We validated optic fiber placement in 14 animals by identifying the coronal section in which the fiber tip was centered and aligning this section to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). Representative optic fiber placement and viral spread are shown in the newly added Supplementary Figure S10.

      Minor Comments:

      (1) Figure 6e and g seem mislabeled: "Drd1+ (D2-SPNs)".

      Yes, thank you. We corrected it.

      (2) Line 395-397: the authors mention minimal Flp leakage, but could it be low activity of the Stard5 promoter in the core and dorsal striatum that allows a little expression of the flippase, which could be sufficient for recombination?

      We thank the Reviewer for this insightful point. We cannot fully distinguish between these possibilities in the current study; however, the overall recombination outside the target region remains minimal, supporting the utility of the Stard5-Flp line for selective targeting of the rostral medNAcSh. Injection of a Flp-dependent AAV into the lateral shell, core and dorsal striatum showed no expression, therefore we think this is unlikely. Moreover, this aligns with Stard5 expression patterns derived from the scRNAseq data (Chen et al. 2021), Allen Brain Atlas quantifications (Figure 5) and our RNAscope analysis (Figure 6). Nevertheless, we acknowledge that histology alone cannot definitively exclude this possibility, and quantitative approaches such as qPCR would be required.

      References

      Alpy, Fabien, and Catherine Tomasetto. 2005. “Give Lipids a START: The StAR-Related Lipid Transfer (START) Domain in Mammals.” Journal of Cell Science 118(13):2791–2801. doi:10.1242/jcs.02485.

      Calderon-Dominguez, Maria, Gregorio Gil, Miguel Angel Medina, William M. Pandak, and Daniel Rodríguez-Agudo. 2014. “The StarD4 Subfamily of Steroidogenic Acute Regulatory-Related Lipid Transfer (START) Domain Proteins: New Players in Cholesterol Metabolism.” The International Journal of Biochemistry & Cell Biology 49:64–68. doi:10.1016/j.biocel.2014.01.002.

      Chen, Renchao, Timothy R. Blosser, Mohamed N. Djekidel, Junjie Hao, Aritra Bhattacherjee, Wenqiang Chen, Luis M. Tuesta, Xiaowei Zhuang, and Yi Zhang. 2021. “Decoding Molecular and Cellular Heterogeneity of Mouse Nucleus Accumbens.” Nature Neuroscience 24(12):1757–71. doi:10.1038/s41593-021-00938-x.

      Domingues, Ana Verónica, Tawan T. A. Carvalho, Gabriela J. Martins, Raquel Correia, Bárbara Coimbra, Ricardo Bastos-Gonçalves, Marcelina Wezik, Rita Gaspar, Luísa Pinto, Nuno Sousa, Rui M. Costa, Carina Soares-Cunha, and Ana João Rodrigues. 2025. “Dynamic Representation of Appetitive and Aversive Stimuli in Nucleus Accumbens Shell D1- and D2-Medium Spiny Neurons.” Nature Communications 16(1):59. doi:10.1038/s41467-024-55269-9.

      Keevers, Luke J., and Philip Jean-Richard-dit-Bressel. 2025. “Obtaining Artifact-Corrected Signals in Fiber Photometry via Isosbestic Signals, Robust Regression, and DF/F Calculations.” Neurophotonics 12(02). doi:10.1117/1.NPh.12.2.025003.

      Labouesse, Marie A., Arturo Torres-Herraez, Muhammad O. Chohan, Joseph M. Villarin, Julia Greenwald, Xiaoxiao Sun, Mysarah Zahran, Alice Tang, Sherry Lam, Jeremy Veenstra-VanderWeele, Clay O. Lacefield, Jordi Bonaventura, Michael Michaelides, C. Savio Chan, Ofer Yizhar, and Christoph Kellendonk. 2023. “A Non-Canonical Striatopallidal Go Pathway That Supports Motor Control.” Nature Communications 14(1):6712. doi:10.1038/s41467-023-42288-1.

      Labouesse, Marie A., Maria Wilhelm, Zacharoula Kagiampaki, Andrew G. Yee, Raphaelle Denis, Masaya Harada, Andrea Gresch, Alina-Măriuca Marinescu, Kanako Otomo, Sebastiano Curreli, Laia Serratosa Capdevila, Xuehan Zhou, Reto B. Cola, Luca Ravotto, Chaim Glück, Stanislav Cherepanov, Bruno Weber, Xin Zhou, Jason Katner, Kjell A. Svensson, Tommaso Fellin, Louis-Eric Trudeau, Christopher P. Ford, Yaroslav Sych, and Tommaso Patriarchi. 2024. “A Chemogenetic Approach for Dopamine Imaging with Tunable Sensitivity.” Nature Communications 15(1):5551. doi:10.1038/s41467-024-49442-3.

      Liu, Yiqiong, Ying Wang, Zheng-dong Zhao, Guoguang Xie, Chao Zhang, Renchao Chen, and Yi Zhang. 2024. “A Subset of Dopamine Receptor-Expressing Neurons in the Nucleus Accumbens Controls Feeding and Energy Homeostasis.” Nature Metabolism 6(8):1616–31. doi:10.1038/s42255-024-01100-0.

      Mou, Dachao, Shasha Wu, Yanqiong Chen, Yun Wang, Yufang Dai, Min Tang, Xiu Teng, Shijun Bai, and Xiufeng Bai. 2025. “Roles of PEG10 in Cancer and Neurodegenerative Disorder (Review).” Oncology Reports 53(5):1–9. doi:10.3892/or.2025.8893.

      O’Connor, Eoin C., Yves Kremer, Sandrine Lefort, Masaya Harada, Vincent Pascoli, Clément Rohner, and Christian Lüscher. 2015. “Accumbal D1R Neurons Projecting to Lateral Hypothalamus Authorize Feeding.” Neuron 88(3):553–64. doi:10.1016/j.neuron.2015.09.038.

      Pedersen, Christian E., Raajaram Gowrishankar, Sean C. Piantadosi, Daniel C. Castro, Madelyn M. Gray, Zhe C. Zhou, Shane A. Kan, Patrick J. Murphy, Patrick R. O’Neill, and Michael R. Bruchas. 2022. “Medial Accumbens Shell Spiny Projection Neurons Encode Relative Reward Preference.”

      Pisanello, Ferruccio, Gil Mandelbaum, Marco Pisanello, Ian A. Oldenburg, Leonardo Sileo, Jeffrey E. Markowitz, Ralph E. Peterson, Andrea Della Patria, Trevor M. Haynes, Mohamed S. Emara, Barbara Spagnolo, Sandeep Robert Datta, Massimo De Vittorio, and Bernardo L. Sabatini. 2017. “Dynamic Illumination of Spatially Restricted or Large Brain Volumes via a Single Tapered Optical Fiber.” Nature Neuroscience 20(8):1180–88. doi:10.1038/nn.4591.

      Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2015. “You Only Look Once: Unified, Real-Time Object Detection.”

      Requejo-Mendoza, Nikte, José-Antonio Arias-Montaño, and Ranier Gutierrez. 2025. “Nucleus Accumbens D2-Expressing Neurons: Balancing Reward and Licking Disruption through Rhythmic Optogenetic Stimulation” edited by J. M. Dominguez. PLOS ONE 20(2):e0317605. doi:10.1371/journal.pone.0317605.

      Rodriguez-Agudo, Daniel, Maria Calderon-Dominguez, Miguel Angel Medina, Shunlin Ren, Gregorio Gil, and William M. Pandak. 2012. “ER Stress Increases StarD5 Expression by Stabilizing Its MRNA and Leads to Relocalization of Its Protein from the Nucleus to the Membranes.” Journal of Lipid Research 53(12):2708–15. doi:10.1194/jlr.M031997.

      Stanley, Geoffrey, Ozgun Gokce, Robert C. Malenka, Thomas C. Südhof, and Stephen R. Quake. 2020. “Continuous and Discrete Neuron Types of the Adult Murine Striatum.” Neuron 105(4):688-699.e8. doi:10.1016/j.neuron.2019.11.004.

      Zhang, Yan, Márton Rózsa, Yajie Liang, Daniel Bushey, Ziqiang Wei, Jihong Zheng, Daniel Reep, Gerard Joey Broussard, Arthur Tsang, Getahun Tsegaye, Sujatha Narayan, Christopher J. Obara, JingXuan Lim, Ronak Patel, Rongwei Zhang, Misha B. Ahrens, Glenn C. Turner, Samuel S. H. Wang, Wyatt L. Korff, Eric R. Schreiter, Karel Svoboda, Jeremy P. Hasseman, Ilya Kolb, and Loren L. Looger. 2023. “Fast and Sensitive GCaMP Calcium Indicators for Imaging Neural Populations.” Nature 615(7954):884–91. doi:10.1038/s41586-023-05828-9.

      Zhu, Xianglong, David Ottenheimer, and Ralph J. DiLeone. 2016. “Activity of D1/2 Receptor Expressing Neurons in the Nucleus Accumbens Regulates Running, Locomotion, and Food Intake.” Frontiers in Behavioral Neuroscience 10. doi:10.3389/fnbeh.2016.00066.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      (1) LH levels were not measured in many mice or in robust temporal detail, such as every 30 or 60 min, to allow a more detailed comparison between the fine-scale timing of RP3V neuron activation with onset and timing of LH surge dynamics.

      Please see “Recommendations for Authors” below.

      (2) The authors report that the peak LH value occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation. However, it is likely, and indeed evident from the 2 example LH patterns shown in Figures 3A-B, that LH values start to increase several hours before the peak LH. This earlier rise in LH levels ("onset" of the surge) occurs much closer in time to the first RP3V kisspeptin neuron oscillatory activation, and as such, the ensuing LH secretion may not be as delayed as the authors suggest.

      Please see “Recommendations for Authors” below.

      (3) The authors nicely show that there is some variation (~2 hours) in the peak of the first oscillation in proestrus females. Was this same variability present in OVX+E2 females, or was the variability smaller or absent in OVX+E2 versus proestrus? It is possible that the variability in proestrus mice is due to variability in the timing and magnitude of rising E2 levels, which would, in theory, be more tightly controlled and similar among mice in the OVX+E2 model. If so, the OVX+E2 mice may have less variability between mice for the onset of RP3V kisspeptin activity.

      Please see “Recommendations for Authors” below.

      (4) One concern regarding this study is the lack of data showing the specificity of the AAV and the GCaMP6s signals. There are no data showing that GCaMP6s is limited to the RP3V and is not expressed in other Kiss1 populations in the brain. Given that 2ul of the AAV was injected, which seems like a lot considering it was close to the ventricle, it is important to show that the signal and measured activity are specific to the RP3V region. Though the authors discuss potential reasons for the low co-expression of GCaMP6 and kisspeptin immunoreactivity, it does raise some concern regarding the interpretation of these results. The low co-expression makes it difficult to confirm the Kiss1 cell-specificity of the Cre-dependent AAV injections. In addition, if GFP (GCaMP6s) and kisspeptin protein co-localization is low, it is possible that the activation of these neurons does not coincide with changes in kisspeptin or that these neurons are even expressing Kiss1 or kisspeptin at the time of activation. It is important to remember that the study measures activation of the kisspeptin neuron, and it does not reveal anything specific about the activity of the kisspeptin protein.

      Please see “Recommendations for Authors” below.

      (5) One additional minor concern is that LH levels were not measured in the ovariectomized females during the expected time of the LH surge. The authors suggest that the lower magnitude of activation during the LH surge in these females, in comparison to proestrus females, may be the result of lower LH levels. It's hard to interpret the difference in magnitude of neuronal activation between EB-treated and proestrus females without knowing LH levels. In addition, it's possible that an LH surge did not occur in all EB-treated females, and thus, having LH levels would confirm the success of the EB treatment.

      Please see “Recommendations for Authors” below.

      (6) This kisspeptin neuron peak activity is abolished in ovariectomized mice, and estradiol replacement restored this activity, but only partially. Circulating levels of estradiol were not measured in these different setups, but the authors hypothesize that the lack of full restoration may be due to the absence of other ovarian signals, possibly progesterone.

      Please see “Recommendations for Authors” below.

      (7) Recordings in several mice show inter- and intra-variability in the time of peak onset. It is not shown whether this variability is associated with a similar variability in the timing of the LH surge onset in the recorded mice. The authors hypothesized that this variability indicates a poor involvement of the circadian input. However, no experiments were done to investigate the role of the (vasopressinergic-driven) circadian input on the kisspeptin neuron activation at the light/dark transition. Thus, we suggest that the authors be more tentative about this hypothesis.

      Please see “Recommendations for Authors” below.

      Recommendations for the authors:

      (1) The study measured LH levels over time in just 5 female mice, a small sample size given the variability between mice. Having said that, n=5 is an OK starting point but the LH values are only shown for 2 mice, and there are no graphs or presentation of mean LH levels over time for all 5 mice. Figure 3 would greatly benefit from graphing and statistical analyses of the LH levels for all 5 mice (mean line graphs over time or similar). The authors report the mean "peak LH" level in the text, but it would be important to show and graph all the LH values over time (either by clock time or time relative to start of first RP3V oscillation or both), to allow the reader to compare the LH pattern to the RP3V kisspeptin neuron activity over time.

      We share the Reviewer’s frustration regarding the lack of detailed LH time points to correlate with the changes in GCaMP signal. Certainly, it was our intention to do better. However, with the benefit of actually being able to monitor surge progress through RP3V neuron activity in real time, we found that frequent blood sampling could often interfere with the normal dynamic of surge activity. On some occasions, the RP3V kisspeptin neuron oscillations would stop abruptly mid- or early-surge, while on others they would stop and then start again. Knowing that this was not the normal profile, we resorted to taking as few blood samples as possible, trying primarily to get what we thought might be the “peak” LH surge level. We acknowledge that this is not ideal, and leaves open the important question around the precise relationship of the beginning of RP3V kisspeptin oscillations with LH secretion. Although not answering the question directly, this was part of the motivation for the last figure, which emphasizes how the RP3V kisspeptin neuron activity and GnRH neuron dendron activity are essentially identical at the time of the surge. We have re-written the relevant section of the Discussion to be more circumspect.

      (2) The authors report and discuss that the peak LH value occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation but it is likely, and indeed evident from the 2 example LH patterns shown in Figs 3A-B, that LH values start to increase several hours earlier, well before the peak LH. Thus, the rise in LH levels during the surge starts much closer in time to the first RP3V kisspeptin neuron oscillatory activation, which the authors don't analyze. For example, the 2nd LH value for the 2 representative mice shown in Figure 3 is notably higher than the 1st LH value of those mice, even though the peak value has not yet been attained. Even with the LH levels only being measured here every couple of hours, this "first detected rise in LH" should at least be graphed and/or analyzed relative to the timing of kisspeptin neuron activity, and commented on in the Discussion.

      As above.

      (3) It is unclear if the variation (~2 hours) in the peak of the first oscillation in proestrus females is the same as in OVX+E2 females, or was the variability smaller or absent in OVX+E2 females versus proestrus? The variability observed in proestrus mice is likely due to variability in timing and magnitude of rising E2 levels, which may be more tightly controlled and similar among mice in the OVX+E2 model. If so, the OVX+E2 mice might display less variability for the timing of the RP3V kisspeptin activity "onset". This measure would be important to analyze here and to discuss, given that many labs around the world often use an OVX+E2 model.

      This is an interesting point given the dogma surrounding the role of the SCN in initiating the surge. Three of the five OVX+E2 mice exhibited clearly discernible GCaMP oscillations that started at approximately noon, 1pm and 2pm. While this sample is very small, it does suggest that the onset of RP3V kisspeptin neuron activity is variable as found in proestrous mice. We have indicated this cautiously given the sample size.

      (4) If looking at kisspeptin immunoreactivity is problematic, is it possible to look at Kiss1 RNA levels or to look at Cre-recombinase protein levels? While the Cre-recombinase would just be a proxy for Kiss1/kisspeptin, it may result in higher expression and better co-localization with the GCaMP6s.

      Yes, RNAscope would likely be the ideal method to settle this long running issue of apparently poor Kiss-cre targeting in the RP3V. Unexpectedly, however, we found that the mCherry probe bound to Kiss1 in our attempts at an RNAscope evaluation. The use of Cre as a proxy for identifying kisspeptin neurons would almost certainly generate better co-localization as Cre is being used to target GCaMP.

      Minor

      (1) It was not clear in the manuscript how many cells were counted or contributed to the neuronal activation data. Is it the entire population of RP3V Kiss1 cells? Just a subset? How much variability is there in the number of cells measured/counted between animals? Presumably, the brains were extracted to confirm the placement of the optic fiber. Were there neuroanatomical studies also done on these animals to confirm how many cells express GFP (GCaMP6) and the correct placement and specificity of the AAV? Is there any potential that cells in the BnST or even the ARC took up the virus and were included in these measurements?

      It is very difficult if not impossible to establish just how many RP3V kisspeptin neurons contribute to the GCaMP population signal using fibre photometry. This will depend on levels of AAV transfection, distance from the optic fibre, and the numbers of RP3V kisspeptin neurons actually involved in the surge mechanism. Of note, C-Fos data suggest that only around one-third of RP3V kisspeptin neurons are activated at the time of the surge. All fibre placements were subsequently shown to be running alongside GCaMP-expressing AVPV/anterior periventricular nucleus cells (now noted), but the numbers of transfected cells were not quantified. As shown in Fig.4, the GCaMP signal was very similar across all mice suggesting little variation in the relationship between transduction, fibre placement and distance.

      The RP3V region is approximately 4-5 mm from the ARN. We felt that the possibility that an AAV injection in the RP3V would spill over into the ARN was so remote that we did not assess GCaMP expression in ARN kisspeptin neurons. We have previously determined for the ARN that recordable GCaMP fluorescence only occurs if the optic fibre is within 0.5 mm from GCaMP-expressing neurons. Ultimately, proof that we are not recording from ARN kisspeptin neurons comes from the very different activity patterns reported here for RP3V neurons compared to the kisspeptin pulse generator. We did not see any GCaMP expressed in the BNST.

      (2) If it is possible to measure LH levels in the EB surge animals, it would be helpful, at least to confirm that they did surge and to support the proposed idea that LH surge levels are lower in that model.

      Unfortunately, as acknowledged in the original text we did not take blood samples from these mice so do not have the data. However, as noted, other studies undertaken by us using the same EB surge paradigm show that peak LH levels are much lower compared to proestrus. In retrospect we do agree that this would have been useful and particularly to establish whether each mouse did show a surge as two of the OVX+EB mice failed to show typical surge-associated oscillations. We have noted this in the Discussion.

      (3) For Figure 4F, please add a gray shaded box to the graph to denote the "dark" period (lights off), as was done for Figures 2 and 3. This is important because Figure 4F is making the point that there is a consistent 90-minute oscillation event right before lights off, so it would be helpful to denote the period of lights off on the graph.

      There was in fact a very light grey shade, but we have now added a grey bar to make the dark period clearer.

      (4) The Title of the paper should include the brain region because this is specifically the RP3V (or preoptic area "POA") kisspeptin neurons that are studied, not other kisspeptin cell populations.

      We have added “preoptic area” to the title to clarify.

      (5) The graphs in Figure 3C-D are from different mice and address a different question than the graphs in Figure 3A-B. This was a bit confusing, and it is recommended that the LH + RP3V kisspeptin activity experiment (Figures 3A-B) be its own figure, and the graphs looking at the detailed oscillatory patterns in Figures 3C-D be their own figure, as the latter are addressing a different question and don't have any LH data.

      We have split the figure as requested.

      (6) The font size of the X and Y axes of Figures 2 and 3 is very small and hard to read. Can this text please be increased in size a little? By comparison, the font size of the X and Y axes of Figure 4 is bigger and more legible.

      Changed.

      (7) In the methods for fiber photometry, there is a sentence saying "Twenty two-hour recordings were made..." This was confusing, as it read as if there were twenty 2-hour recordings, when in fact it was one 22-hour recording. The authors should reword or use "22-hour" in this sentence.

      Changed.

      (8) It's a bit hard to see the difference in color between proestrus 1 and proestrus 2 (both blues) in Figure 6, especially when they overlap. It might be helpful to select a different color for one of them.

      Changed.

      (9) Is the virus from Addgene or just the plasmid? Did Addgene insert the plasmid into the virus, or was that done elsewhere? For purposes of replication, it might be helpful to state the plasmid that was used and the virus that was used, and their origins (e.g., if made by Addgene or donated by another investigator). I was not able to find the virus based on the Addgene number in the manuscript and was getting plasmids with different Addgene #s.

      Apologies, the numbering was incorrect. We have now amended to 100842-AAV9 that was packaged by Addgene.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their study the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely and the manuscript is well written, albeit in some cases details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease of the DON content associated with deletion of FgDML1: Although some growth data are shown in figure 6 - indicating a severe growth defect - the DON production presented in figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to a decreased growth and specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation on the same conditions to the DON amount detected. Only then a conclusion as to an altered production in the mutant strains can be drawn.

      We greatly appreciate the time you spent on our paper and your helpful suggestions, and we have done our best to revise the manuscript accordingly. Our point-by-point responses to the reviewer’s comments are listed below.

      Comments to the revised manuscript:

      The authors carefully revised the manuscript and provided explanations for methods in several cases. However, there are still some problems - probably due to misunderstanding - that need revision.

      (1) A major problem of the first version of the manuscript was the lack of appropriate description of biomass analysis and the consideration of the respective results for evaluation of production of DON and other metabolites. Although the authors provide some explanation in the response to reviews, I could not find a corresponding explanation or description in the manuscript. It is not sufficient to explain the problem to me, but a detailed explanation and description of the method has to be provided in the manuscript along with the definition of one "unit of mycelium". It is still not entirely clear to me what such a "unit of mycelium" is.

      Please clarify this and any other uncertainties that were commented on by me and other reviewers in the manuscript, not only in the response to reviews. Also adjust the reference list accordingly.

      Thank you very much for your advice. We appreciate the reviewer’s continued attention to the potential impact of biomass differences on DON production, particularly in light of the reduced growth rate observed in the mutant strain.

      We acknowledge that the mutant exhibits slower growth compared to the wild-type strain. However, it is important to emphasize that the reduction in DON levels reported in this study cannot be attributed to decreased fungal biomass. In our experimental design, DON production was normalized to mycelial dry weight, and toxin levels are expressed as μg DON per g dry mycelium. Therefore, differences in total mycelial accumulation among strains were explicitly accounted for and eliminated during data analysis.

      By expressing DON production on a per-unit-biomass basis, the measured values reflect the intrinsic DON biosynthetic capacity of the mycelium rather than the overall growth rate or total biomass. Consequently, the observed reduction in DON content in the mutant indicates a genuine impairment in DON biosynthesis per unit of fungal biomass, rather than a secondary effect resulting from reduced mycelial growth.

      To avoid ambiguity, we have clarified this point in the revised manuscript by explicitly stating the normalization strategy and the definition of the mycelial unit in the Materials and Methods section, and by emphasizing in the Results/Discussion section that DON levels were compared on a biomass-normalized basis.

      We hope that this clarification adequately addresses the reviewer’s concern and clearly distinguishes growth-related effects from alterations in toxin biosynthesis.

      “DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018). Under toxin-producing conditions (28 °C, 145 rpm), fungal strains were cultured in TBI medium for 7 days. Cultures were initiated using freshly grown mycelia. After incubation, mycelia and culture filtrates were separated by filtration. The culture filtrates were collected for DON determination, while the mycelia were harvested for biomass analysis. The collected mycelia were washed with sterile distilled water and dried at 60 °C to constant weight. The dry weight of mycelia was recorded and used for normalization of DON production. One mycelial unit was defined as 1 g of dry mycelial biomass. DON concentration in the culture filtrates was quantified using an enzyme-linked immunosorbent assay (ELISA). Briefly, 50 μL of culture filtrate or DON standard solution was added to wells of a 96-well microplate pre-coated with DON antigen, followed by the addition of enzyme conjugate and antibody working solution according to the manufacturer’s instructions. After incubation and washing, color development was achieved using substrate solution and terminated by stop solution. Absorbance was measured at 450 nm using a microplate reader. A standard curve was generated using log<sub>10</sub>-transformed DON concentrations of the standards and the corresponding percentage absorbance values. DON concentrations in the samples were calculated based on the standard curve. Total DON production was calculated according to the culture volume (30 mL) and subsequently normalized to mycelial dry weight. DON production was expressed as μg DON per g dry mycelium. Each treatment group contains three biological replicates and three technical replicates.”
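
The calculation chain in the quoted Methods text (standard curve on log10-transformed concentrations, then total DON from the culture volume, then normalization to dry weight) can be sketched as follows. This is an illustration under stated assumptions, not the kit's or the authors' actual procedure: a simple linear least-squares fit of percentage absorbance against log10 concentration is assumed, concentrations are assumed to be in μg/mL, and all function names and numeric values are hypothetical placeholders rather than data from the study.

```python
import math

def fit_standard_curve(standard_concs, pct_absorbance):
    """Least-squares line: percentage absorbance = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in standard_concs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(pct_absorbance) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, pct_absorbance))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def conc_from_pct_absorbance(pct, slope, intercept):
    """Invert the fitted line to recover a concentration (same units as the standards)."""
    return 10 ** ((pct - intercept) / slope)

def don_per_g_dry_mycelium(conc_ug_per_ml, culture_volume_ml, dry_weight_g):
    """Total DON in the culture (ug) divided by mycelial dry weight (g)."""
    return conc_ug_per_ml * culture_volume_ml / dry_weight_g

# Hypothetical standards (ug/mL) and percentage absorbance values, for illustration only.
slope, intercept = fit_standard_curve([0.1, 1.0, 10.0], [80.0, 50.0, 20.0])
sample_conc = conc_from_pct_absorbance(50.0, slope, intercept)
# 30 mL culture volume as stated in the Methods; 0.5 g dry weight is a placeholder.
normalized = don_per_g_dry_mycelium(sample_conc, 30.0, 0.5)
```

Because the final value is a ratio of total toxin to dry weight, a strain that simply grows less but makes DON at the same per-biomass rate would yield the same normalized value, which is the distinction the reviewer asked the authors to make explicit.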

      (2) Another problem was, that the authors considered FgDML1 a regulator of DON production. As mentioned by me and reviewer 3, FgDML1 is crucial to numerous functions in F. graminearum and its lack causes a plethora of problems for fungal physiology. Hence, although it is clear that the lack of FgDML1 causes alterations in DON production, it is not appropriate to designate this factor as a "regulator".

      It seems to me that the authors are afraid that if FgDML1 would not be a "regulator" that this would decrease the value of their study, which is not the case. This is a matter of correct wording. Therefore, please revise the wording accordingly, starting with the title:

      ...FgDML1 impacts DON toxin biosynthesis...

      Moreover, for sure the manuscript might benefit from more detailed description of the whole cascade leading from FgDML1 to DON biosynthesis and production of the other metabolites that change upon deletion. Such explanation can help the reader grasp the relevance of FgDML for regulatory processes as well as on more general versus specific effects.

      Thank you very much for your advice. We fully agree that, given the pleiotropic functions of FgDML1 in F. graminearum and the broad physiological defects caused by its deletion, it is not appropriate to designate FgDML1 as a direct or specific “regulator” of DON biosynthesis.

      We acknowledge that the use of the term “regulator” in the previous version was imprecise. Following the reviewer’s suggestion, we have revised the wording throughout the manuscript to more accurately reflect the role of FgDML1. Specifically, we now describe FgDML1 as a factor that impacts or affects DON toxin biosynthesis rather than directly regulating it. The title has been revised accordingly to read:

      “Mitochondrial protein FgDML1 impacts DON toxin biosynthesis and cyazofamid sensitivity in F. graminearum by affecting mitochondrial homeostasis”

      Importantly, we would like to emphasize that our intention was not to overstate the specificity of FgDML1 in DON regulation, but rather to highlight its influence on secondary metabolism in the context of its broader biological functions. To address this more clearly, we have expanded the Discussion section to provide a more detailed and cautious interpretation of the potential cascade linking FgDML1 deletion to altered DON biosynthesis and changes in other metabolites.

      'Secondary metabolite biosynthesis is generally regarded as an energy-intensive process that is tightly coupled to cellular energy metabolism. ATP serves as the primary energy currency supporting enzymatic reactions, macromolecule synthesis, and subcellular organization required for secondary metabolism. Disruption of ATP generation has been shown to directly impair toxin biosynthesis: for example, silencing of ATP synthase subunit α (AtpA) significantly reduces ATP synthesis and inhibits the production of the TcdA and TcdB toxins (Marreddy et al., 2024). Similarly, in plants, ATP depletion leads to a metabolic shift in which growth and basic physiological processes are prioritized at the expense of energetically costly secondary metabolites, including toxins (Xiao et al., 2024). Together, these findings highlight ATP availability as a key determinant of secondary metabolite production across biological systems.

      In filamentous fungi, mitochondria play a central role in sustaining cellular ATP levels through oxidative phosphorylation and are therefore critical for biosynthetic and stress-adaptive processes. In F. graminearum, mutants defective in mitochondrial components, such as the voltage-dependent anion channel (mitochondrial porin), exhibit aberrant mitochondrial morphology, reduced ATP production, and markedly decreased DON accumulation and virulence (Han et al., 2022). These observations establish a direct link between mitochondrial energy metabolism and secondary metabolite output, supporting the notion that intact mitochondrial function and adequate ATP supply are prerequisites for robust DON production.

      Consistent with this energy-dependent framework, biosynthesis of the mycotoxin DON in F. graminearum requires substantial ATP input. In the present study, ATP content in the ΔFgDML1 mutant was significantly lower than in the wild-type PH-1 and the complemented strain ΔFgDML1-C, and DON production was concomitantly reduced (Fig. 4A). Importantly, DON levels were normalized to mycelial dry weight, indicating that the observed reduction reflects a decreased biosynthetic capacity per unit biomass rather than a secondary consequence of reduced fungal growth. This distinction demonstrates that impaired DON production in the ΔFgDML1 mutant arises primarily from metabolic limitations.

      At the cellular level, ATP depletion compromises multiple energy-dependent steps required for DON biosynthesis. The formation of toxisomes, which are specialized subcellular structures responsible for the spatial organization of DON biosynthetic enzymes, is essential for efficient mycotoxin production and is an ATP-dependent process. Reduced ATP levels disrupt toxisome assembly, and accordingly, the ΔFgDML1 mutant was unable to form functional toxisomes (Fig. 4C). In parallel, western blot analysis revealed a marked reduction in the abundance of the DON biosynthetic enzyme FgTri1 (Fig. 4D). In addition, ATP-dependent processes are directly involved in the biogenesis of the DON biosynthetic machinery: the ATPase activity of myosin I (FgMyo1) is required for efficient translation of key DON biosynthetic enzymes, and disruption of its ATPase function results in reduced DON production (Tang et al., 2018). These findings further underscore the dependence of DON biosynthesis on cellular energy status.

      DON production is also regulated at the transcriptional level by the TRI gene cluster, with Tri5 and Tri6 serving as core components of the biosynthetic pathway. Tri5 encodes trichodiene synthase, which catalyzes the first committed step of DON biosynthesis. In the ΔFgDML1 mutant, expression levels of FgTri5 and FgTri6 were significantly downregulated (Fig. 4B), suggesting that impaired energy metabolism indirectly affects transcription of DON biosynthetic genes. Although no direct regulatory role of DML family proteins in gene expression has been reported in Saccharomyces cerevisiae or Drosophila melanogaster, their established functions in cell division and microtubule organization raise the possibility that FgDML1 indirectly influences gene expression through effects on chromatin organization or cell-cycle progression (Schulze and Wallrath, 2007).

      In addition to reduced ATP levels, deletion of FgDML1 resulted in a significant decrease in acetyl-CoA content (Fig. 5C), a key precursor for trichothecene biosynthesis. Acetyl-CoA links central carbon metabolism with secondary metabolite production, and its depletion further constrains DON biosynthesis by limiting substrate availability. Broader metabolomic studies support this relationship, showing that perturbations in TCA cycle intermediates and central carbon metabolism are closely associated with altered DON production, reinforcing a mechanistic linkage between energy generation and toxin biosynthesis (Atanasova-Penichon et al., 2018).

      “Taken together, these results support a model in which FgDML1 influences DON production indirectly by maintaining mitochondrial energy metabolism. Reduced ATP availability in the ΔFgDML1 mutant restricts energy-dependent biosynthetic processes, disrupts toxisome formation, diminishes DON biosynthetic enzyme abundance and gene expression, and limits precursor supply, ultimately leading to a substantial reduction in DON biosynthesis that is independent of fungal biomass effects.” (in L284-350). In this revised discussion, we explicitly distinguish between general physiological effects caused by the loss of FgDML1 and more specific consequences on secondary metabolic pathways.

      We believe that this revised wording and the expanded mechanistic discussion more accurately reflect the biological role of FgDML1 and improve the conceptual clarity of the manuscript, without overstating its function as a dedicated regulator of DON production.

      Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be addressed and corrected.

      Comments on revisions:

      The author has addressed my questions.

      We very much appreciate the time you spent on our paper and the helpful suggestions you gave us.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole-to-pole oscillation whereby a time average minimum of the Min proteins at mid-cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports the biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing mechanistic insight into the mechanism of dynamic localization.

      Strengths:

      In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations were nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.

      Weaknesses:

      While the study shows that MinD in B. subtilis utilizes a different (MinE-independent) activation mechanism, it remains to be determined the extent to which MinJ and/or MinC play a role.

      Reviewer #2 (Public review):

      Summary:

      Feddersen & Bramkamp determined important characteristics of how MinD protein binds/dissociates to/from the membrane, and dimerizes in relation to its ATPase activity. The presented data clearly show the differences in function of MinD homologs from B. subtilis and E. coli.

      Strengths:

      The work presents well-executed experiments that lead to interesting conclusions and a new model of how the Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Outstandingly, most of the in vitro data are confirmed by single-molecule localization microscopy.

      Weaknesses:

      The authors immobilized liposomes, for which they used E. coli total lipids, to measure ATPase activity and liposome association and dissociation of B. subtilis MinD. For these experiments, it would be more suitable to use B. subtilis total lipids, as more biologically relevant data could be gained. Although the work is detailed and nicely compares the function of the B. subtilis Min system with the E. coli Min system, it lacks a comparison with Min system function in other rod-shaped Gram-positive bacteria. I would suggest including the complexity of other Min systems in the Discussion. This complexity is especially apparent in other rod-shaped spore formers such as Clostridial species, in which one or both of these Min systems are present: an oscillating E. coli-type Min system and a more static one as in B. subtilis.

      Reviewer #3 (Public review):

      Experimentally, this study provides sufficient data to support the authors' conclusion that MinD dimerization but not ATPase activity is both necessary and sufficient for concentrating it and its binding partner, the division inhibitor MinC, at cell poles. Biochemical data appears to be rigorously acquired and includes proper controls. Although cytological data are consistent with the authors' model, quantitative information on MinD localization in a statistically relevant set of cells is missing (e.g. Figure 2B).

      The study's other major conclusion, as outlined in their discussion, that a reaction-diffusion model explains MinD localization in wild-type cells, is unsubstantiated. If they would like to make this a major conclusion of the final manuscript, they will need to include modeling that takes into account biochemical and cytological data. From a presentation perspective, the manuscript is challenging to read and will require substantial rewriting and revision prior to publication.

      We thank the reviewers for their detailed and constructive comments on our work. We particularly acknowledge that the initial version of our manuscript was difficult to read and might have given the impression that the aim was to formulate a new mathematical model of Min dynamics in B. subtilis. However, our work aimed at providing solid (and first) biochemical evidence for the MinD ATPase cycle and the nature of the ATPase stimulation. Furthermore, we aimed at corroborating the in vitro findings with single-molecule microscopy data that provided a detailed in vivo picture of the Min dynamics in living cells. Together, this work combines for the first time in vitro and single-molecule in vivo data. During the revision, we generated a wealth of new data that aimed at unraveling the potential effects of MinC and MinJ on MinD dynamics. A major problem during the revision was the purification of MinJ. The membrane-integral MinJ has been shown to be highly susceptible to proteolytic decay during purification attempts. Despite various attempts, we did not succeed in purifying full-length MinJ. These efforts also led to the unusually long revision time. We therefore turned to the purification of the soluble part of MinJ, namely the PDZ domain. The revised work now contains in vitro data showing the impact of MinC and MinJ-PDZ on MinD ATPase activity and membrane binding. Furthermore, we now provide single-molecule tracking data of MinD in minC and minJ deletion mutant backgrounds. Importantly, the new data show that MinC has no effect on MinD activities, while the PDZ domain has a mild stimulating effect on MinD's ATPase activity. In summary, a detailed picture of how MinD dynamics function mechanistically in B. subtilis emerges.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is important to evaluate MinD ATPase activity, PL binding, and release in the presence of MinC and MinJ. In E. coli, MinD recruits MinC to phospholipids. The presence of MinC could change the on/off rates. It is unknown if MinC or MinJ could alter the ATPase rates or dynamics. Presuming that MinD alone drives the complete dynamic story because stimulation is observed in vitro with phospholipids, it follows that Michaelis Menten kinetics is insufficient. It is acknowledged that MinJ is difficult to purify, but one could test a small cytoplasmic subdomain or MinJ-enriched membranes for MinD recruitment and release.

      Indeed, it is unknown whether MinC or MinJ have an impact on the ATPase rates or protein dynamics of MinD in B. subtilis. To address the potential influence of MinC and MinJ on MinD’s ATPase activity and dynamics, we conducted a series of experiments. MinC was successfully purified, and subsequent BLI and ATPase assays revealed no significant impact on MinD activity in our system, except for a modestly reduced ATPase activity (Figure S 5).

      With regard to MinJ, multiple constructs and purification strategies were attempted. While full-length MinJ could not be purified, we isolated the C-terminal PDZ domain to probe potential interactions. In ATPase assays, the PDZ domain reproducibly increased MinD ATP hydrolysis rates, whereas BLI measurements did not reveal detectable changes in MinD membrane-binding kinetics under these conditions. We agree with the reviewer that membrane-integrated MinJ could exert additional effects on MinD recruitment or release that are not captured by the isolated PDZ domain, and we now discuss this limitation in the revised Discussion.

      Furthermore, we performed single-molecule localization and tracking analyses of MinD in ∆minC and ∆minJ backgrounds. These experiments, found in a newly added Results section and summarized in Fig. S 12, demonstrate that MinJ appears to play a role in maintaining dynamic MinD membrane cycling and preventing excessive confinement or aggregation, whereas MinC has no obvious effect on MinD dynamics.

      (2) It is important to show the reduced ATP hydrolysis by MinD mutant proteins (line 243). Stating that they are catalytically inactive without showing the data is presumptuous, and there may be differences between the mutants. Although I am sure that the authors evaluated activity with phospholipids, it should be shown.

      We have now quantified the ATPase activity for all MinD mutants from the respective EnzChek assay data. These experiments confirm that the G12V, K16A, and D40A mutations effectively abolish catalytic activity, yielding phosphate release rates that are essentially at the background detection limit of the assay. We have included these data in Figure S 7 C and updated the text to reflect these findings.

      (3) The shoulder on MinD-K16A suggests that it is capable of forming a dimer at low equilibrium. The suggestion that it is due to interaction with the inert SEC matrix (line 242) raises more concerns, although this is highly unlikely, given that G12V elutes as a single peak. The possibility of a dimer here also demonstrates the necessity of reporting precise ATPase rates for the mutants.

      Thank you for this comment. Since we shared some of your concerns, we made sure to gather enough evidence before making the respective claims. We conducted both in vivo experiments (single-particle tracking, widefield microscopy) and in vitro experiments with the K16A mutant of MinD. Most convincingly, K16A is completely catalytically inactive (see previous answer), while both positive and negative controls behave as expected. Both in vivo and in vitro experiments suggest that the protein still binds the membrane despite not being able to form dimers. Similar observations were made in a study conducted by colleagues in parallel (Bohorquez et al., 2024). Furthermore, K16A exchanges in other Walker motif-containing proteins, including E. coli MinD and RecA, and B. subtilis ParA/Soj, abolish dimer formation completely.

      There are many possible explanations for the shoulder observed during elution, which we did not spell out in the Results section. These include conformational heterogeneity, as the protein may adopt multiple stable or semi-stable conformations that differ slightly in hydrodynamic volume. It is also possible that the shoulder represents small protein aggregates from degradation products or proteolysis, which we indeed observe in the respective SDS-PAGE/blot (Fig. S6). As written in the text, interactions with the SEC column, e.g. through exposed hydrophobic patches, are not uncommon, as the surface charges of the mutant protein differ from those of the wild-type version. On the same note, the buffer may affect surface properties such as charge and hydrophobicity differently than for the wild-type protein, and thus alter its interaction with the column. In conclusion, we are confident that the orthogonal methods used point towards abolished dimerization in the K16A mutant of MinD, despite the small shoulder during SEC elution.

      (4) BLI data - were the kon and koff rates also determined without ATP, since it is assumed that MinD-K16A does not bind ATP, but has a strong Kd (Table 1). Does ATP modify Kd of wt MinD for PLs?

      Without ATP, MinD neither interacted properly with the sensor-bound liposomes nor followed regular binding kinetics. Therefore, kinetic constants could not be determined, as the curves could not be fitted. In addition to the respective figure (Fig. S8), we attached the graph of the raw/unfitted data in the supplement (Fig. S 13, MinD2 dataset).

      (5) Local MinJ interactions are proposed to alter the dynamic localization of MinD wt and variants in vivo (line 349-358), which could occur through regulation of ATP hydrolysis, PL binding, or release by MinJ or MinC. Localization dynamics should be measured in minC and minJ mutant strains.

      We thank the reviewer for this important suggestion. In response, we have now directly measured MinD localization dynamics in both ∆minC and ∆minJ backgrounds. We performed single-molecule localization microscopy (SMLM) and single-molecule tracking (SMT) of Halo-MinD expressed from its native locus in these mutant strains, using the same experimental and analytical pipeline applied throughout the study. These new experiments are presented in a newly added Results section and summarized in Figure S12, where we quantitatively compare MinD localization, mobility, diffusion states, and confinement between wild type, ∆minC and ∆minJ cells. The data show that deletion of minJ leads to a pronounced increase in the confined/static MinD fraction and reduced dynamic cycling, whereas deletion of minC causes only subtle changes in MinD dynamics. These findings support a specific role of MinJ in maintaining dynamic MinD membrane cycling in vivo, while MinC has a more modest modulatory effect. We have integrated these results into the Discussion to refine our model of how MinJ and MinC differentially influence MinD dynamics and localization.

      (6) Considering the single molecule population counting and a lack of error presented for the binning of tracks (confined/slow/fast); it is difficult to rationalize why G12V and K16A are defective. The relative proportions of confined/slow/fast between wt, G12V, and K16A seem quite similar (i.e., bubble plot). And the static localization in Fig. 2B does not seem dramatically perturbed. This seems to invoke other cellular regulators as critical for the system's operation in the cell, further pointing to important regulatory roles by MinJ and/or MinC.

      First, regarding the apparent lack of error estimates for the population binning, the uncertainties associated with the SMT-based population fitting are intrinsically very small and fall below the graphical resolution of the plots. This reflects the large number of tracks analyzed and the robustness of the fitting procedure, rather than an omission of error analysis.

      Second, we respectfully disagree that the diffusion-state distributions and static localization patterns of G12V and K16A are similar to those of the wild type. In the context of SMT data, the observed shifts in population sizes are substantial and biologically meaningful. Moreover, the static localization of these mutants is markedly altered: instead of forming a graded enrichment at poles and septa, both mutants display a uniform membrane distribution, similar to e.g. a membrane stain (also see Fig. 2 B). This indicates a loss of regulated recruitment, consistent with impaired interaction with MinJ. Importantly, our biochemical analyses, together with extensive data on conserved Walker-type ATPases carrying analogous G12V and K16A mutations, strongly support the conclusion that these variants are functionally defective despite retaining membrane association.

      Third, we agree about the importance of MinC and MinJ, and have now directly tested the contribution of these interactors by analyzing MinD dynamics in ∆minC and ∆minJ backgrounds. These new data, presented in a newly added Results section and summarized in Fig. S12, support our interpretation by showing that MinJ has a pronounced effect on MinD confinement and dynamic cycling in vivo, whereas MinC has a more modest influence. Together, these findings reinforce the conclusion that the defects of G12V and K16A arise not only from impaired regulatory cycling caused by the mutations, but also from impaired interaction with MinJ.
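
      The point about error estimates on the confined/slow/fast fractions can be illustrated with a back-of-the-envelope calculation. The sketch below is not the SMT population-fitting procedure used in the manuscript; it assumes, for illustration only, that each track is an independent draw assigned to one diffusion state, so that a fraction p estimated from n tracks has a binomial standard error of sqrt(p(1-p)/n). The function name, the fraction of 0.3, and the track counts are hypothetical.

```python
import math


def fraction_standard_error(p: float, n_tracks: int) -> float:
    """Binomial standard error of a population fraction p estimated from n_tracks.

    Assumes independent tracks, each assigned to exactly one diffusion state
    (an illustrative simplification, not the manuscript's fitting procedure).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_tracks <= 0:
        raise ValueError("n_tracks must be positive")
    return math.sqrt(p * (1.0 - p) / n_tracks)


# Hypothetical example: a state holding 30% of all tracks.
for n in (100, 1_000, 10_000):
    se = fraction_standard_error(0.3, n)
    print(f"n = {n:>6}: fraction = 0.300 +/- {se:.4f}")
```

      Under this simplified model the uncertainty shrinks with the square root of the number of tracks, which is one way error bars can fall below the size of a plotted symbol when many tracks are analyzed.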

      (7) Interesting that they stored the His-MinD protein at 4C for up to one week and not at -80C as it was in 10% glycerol. Was MinD inactivated by freezing? Did this contribute to the observed aggregation (line 695)?

      We thank the reviewer for raising this point. Prior to this comment, we routinely worked with freshly purified MinD and therefore had not systematically compared storage at 4 °C and -80 °C. In response to the suggestion, we have now directly compared the activity of MinD stored at 4 °C for one week with that of MinD stored at -80 °C for four weeks. We did not observe any significant difference in ATPase activity or overall biochemical behavior between the two storage conditions. These results indicate that freezing does not inactivate MinD and that the aggregation observed in some preparations is unlikely to be caused by storage at 4 °C. We have clarified this point in the materials and methods part of the manuscript and thank the reviewer for prompting this.

      (8) Line 109 - Typo. Change "component" to "components".

      (9) Page 4, line 52: change 'machinery' to 'machine'.

      (10) Page 13, line 248: change 'manifested' to 'displayed'.

      Thank you for pointing out these typos, which have all been corrected.

      Reviewer #2 (Recommendations for the authors):

      I suggest making changes to the sentence in Lines 60-62: "In rod-shaped model bacteria like Escherichia coli and Bacillus subtilis, division site selection is governed by two protein systems (15-17): nucleoid occlusion and the Min system." However, it was shown previously that upon deletion of both systems in B. subtilis, division site selection wasn't disturbed, and another mechanism was suggested to be involved.

      We agree that this information should be part of the introduction. Therefore, we included the following sentence at the indicated position:

      “However, it was previously shown that simultaneous deletion of both systems in B. subtilis did not disturb division site selection, suggesting additional mechanisms to be involved (Rodrigues and Harry, 2012).”

      I suggest changing the sentence in Lines 85-86: "Dimerized MinD recruits MinC and activates it to prevent FtsZ dynamics (46)". It would be more precise to say: "Dimerized MinD recruits MinC and activates it to inhibit FtsZ oligomerization (46)".

      Thank you, we agree and changed the sentence accordingly.

      In Figure S2, mark the elution volumes to which the two mentioned peaks (31 and 62 kDa) correspond.

      We thank the reviewer for this point. We ran the standards for this column again and fitted them to our peaks (see updated Fig. S2), now demonstrating that the shoulders are indeed not at a size where dimers would elute but rather around ~44.3 kDa. We note that both the Ni-NTA eluate and SEC fractions contain multiple His-tagged degradation products (see revised Fig. S2 and His-MinD blot in Fig. S1). Because the SEC run was performed with excess ADP to suppress ATP-dependent dimerization, we interpret the minor shoulder at ~44.3 kDa as arising from sample heterogeneity due to these degradation products, either by co-elution of fragments or by transient fragment:full-length MinD assemblies, rather than full-length MinD dimerization. This is now also described in the respective Results section.

      Reviewer #3 (Recommendations for the authors):

      The quality of the written manuscript is poor, making it difficult to read and appreciate. Specifically: The introduction is quite long. It takes almost three pages until the primary objective of the paper, identifying determinants of MinD localization in B. subtilis, is clearly stated. The introduction should be shortened to focus specifically on Min system function across species, i.e., preventing aberrant polar septation events. Three or four paragraphs should be sufficient. E.g. 1. Introduction to Min systems generally, 2. A summary of the mechanism underlying MinD oscillation in E. coli, 3. An explanation of similarities and differences between E. coli and B. subtilis, and 4. A paragraph outlining the specific questions to be addressed in this study.

      We have substantially revised the Introduction to address this concern. The revised version is considerably shorter and more focused, and now follows the structure proposed by the reviewer. As a result, the main objective of the manuscript is now stated much earlier, and the overall readability and clarity of the Introduction have been substantially improved.

      The results section is challenging to read, in part due to the inclusion of methods as well as some issues with organization. For example, this section begins with a single sentence describing the need to investigate MinD's ATPase cycle in vitro. This sentence is followed by a header and an entirely new section describing the methods used to purify MinD for biochemical analysis. These details should be in the methods section. Similarly, the first paragraph of the following section, which focuses on the ATPase activity of MinD in the presence and absence of liposomes, describes how the commercially available EnzChek phosphate assay works. This is, again, something that belongs in methods, not results.

      We have revised the Results section extensively in response to this comment. In the revised manuscript, we have removed or relocated substantial methodological detail from the Results to the Methods section and streamlined the overall organization. Descriptions of protein purification procedures and standard assay principles, including details of the EnzChek phosphate assay, have been condensed or moved to the Methods where appropriate.

      At the same time, we have retained limited methodological information in the Results where it is essential for understanding the interpretation of non-standard experimental setups or key controls, like SMLM. In these cases, brief methodological context is provided to ensure clarity without requiring frequent cross-referencing to the Methods section.

      Overall, the Results section has been substantially condensed and reorganized to improve readability, while additional experiments added in response to reviewer comments necessarily increase the scope of the section. We believe the revised structure now clearly separates experimental outcomes from methodological detail and improves the flow of the Results.

      The discussion section, at 7 pages, is overly long and includes substantial extraneous information. For example, it begins with a 2.5 page long paragraph that includes a summary of pattern formation during embryogenesis in animals, followed by a brief description of Turing's reaction-diffusion model, and finally, repeating parts of the introduction, a summary of the mechanism underlying MinCDE localization in E. coli. It is only in the middle of this paragraph - near the end of the second page - that the authors turn their attention back to MinD localization in B. subtilis, albeit with a focus on reaction-diffusion-based behaviors of other ParA homologues. A revised discussion section should focus on the primary conclusion of the authors, based on data presented in the results. If the authors would like to make the case that their data fit the Turing reaction-diffusion model, they will need to include mathematically based modeling that demonstrates this point in their results.

      We have substantially revised and condensed the Discussion in response to this comment. In the revised manuscript, we removed the extended introductory material on general pattern formation, embryogenesis, and Turing reaction-diffusion theory, as these topics extended beyond the scope of the present study. We also eliminated redundant summaries of the E. coli MinCDE system that overlapped with the Introduction. The revised Discussion now focuses on the primary conclusions supported by our experimental data, namely the biochemical and in vivo mechanisms governing MinD membrane binding, ATPase activity, and dynamic localization in B. subtilis, as well as the regulatory roles of MinJ and MinC. Importantly, we would like to clarify that we did not intend to claim that the B. subtilis Min system follows a Turing-type reaction-diffusion mechanism. References to general reaction-diffusion concepts were meant to provide contextual background and not to imply a specific mathematical framework for the system studied here. To avoid any possible ambiguity, we have removed these references from the Discussion.

      While the overall length of the Discussion is now comparable to the previous version, this reflects the inclusion of substantial new experimental data added during revision. Importantly, the structure and content of the Discussion have been streamlined to prioritize interpretation of the results rather than general background, resulting in a more focused and cohesive narrative.

      Experimental comments:

      Line 213: Please provide a rationale for the ATPase experiments. What is the expected result for each mutant and why?

      We have clarified the rationale for the ATPase experiments in the revised manuscript by briefly outlining the expected behavior of each MinD mutant. The anticipated ATPase properties of G12V, K16A, and D40A are based on well-established studies of conserved Walker-type ATPases and were implicit in the original experimental design, as they should all be catalytically inactive. To avoid any ambiguity, we now state these expectations explicitly in the manuscript.

      Line 243: ATPase data for the mutant proteins should be included in the supplement.

      We have now quantified the ATPase activity for all MinD mutants from the respective EnzChek assay data. These experiments confirm that the G12V, K16A, and D40A mutations effectively abolish catalytic activity, yielding phosphate release rates that are essentially at the background detection limit of the assay. We have included these data in Figure S 7 C and updated the text to reflect these findings.

      Figure 2B: Please include transverse section fluorescence data for all variants as well as quantitative data on average MinD positioning.

      The quantitative information requested is already provided by our single-molecule localization and tracking (SMLM/SMT) analysis of Halo-MinD and its variants (Fig. 4 A and now S 12 A). This approach represents the averaged spatial distribution of individual MinD localizations collected from dozens of cells per condition and provides substantially higher spatial resolution and quantitative precision than transverse fluorescence profiles obtained by conventional widefield microscopy.

      We therefore believe that the SMLM-based analysis is superior to transverse section fluorescence measurements and more accurately captures average MinD positioning across the cell population. To avoid redundancy, we have retained the SMLM analysis as the quantitative framework for MinD localization.

      Figure 2B: I am not convinced that punctate and membrane-associated are mutually exclusive. Quantitative data on protein localization from transverse fluorescent sections is necessary to make this point.

      Please see the answer above and Fig. 4 A.

      Figure 2B: It is impossible to assess the functionality of individual mutants without quantitative data on minicell frequency and cell length.

      We have addressed this point by quantitatively measuring both cell length and minicell frequency for all relevant strains. These analyses were performed on a minimum of n = 430 cells per strain and are now presented in Table S 5. The added data provide a quantitative assessment of mutant functionality, support the phenotypic interpretations shown in Fig. 2B, and are also integrated into the Results section.

      Other comments:

      Line 109: should read "components".

      Thank you, corrected.

      Line 135: Why is this sentence outside the major section of the results?

      It has now been integrated into the major section.

      Line 197: I am not sure I understand this sentence.

      We have revised this sentence to improve clarity and readability.

      Line 218: I do not understand this paragraph.

      We have also rephrased and rewritten this paragraph for clarity and readability.

      Line 223: To make this section focused on the results rather than the method, the authors could simply say "To determine the role of ATP mediated dimerization, we...." (If I am understanding this section correctly).

      We followed this suggestion and revised the text accordingly to focus on the experimental outcome rather than methodological detail.

      Line 273: "depicted" not depictured.

      Thank you, corrected.

      Figure 4: The single-cell data look good in the figure, however, the description of these results and their meaning are nearly impossible to follow in the text.

      We acknowledge that the single-molecule data presented in Fig. 4 are complex. While we have made minor clarifications to improve the flow and wording of the text, we did not substantially reduce the level of detail, as the description of the analytical framework is required for correct interpretation of the results.

      At the same time, we aimed to avoid repeating extensive methodological explanations that are already described in the Materials and Methods section, in line with other reviewer comments. We therefore retained a concise but technically accurate description in the Results to ensure that the biological conclusions drawn from Fig. 4 can be properly understood.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Strengths:

      (1) The use of chronic two-photon Ca<sup>2+</sup> imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca<sup>2+</sup> dynamics in the meninges.

      (2) The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca<sup>2+</sup> signaling properties. The identification of macrophage Ca<sup>2+</sup> activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      (3) By linking macrophage Ca<sup>2+</sup> responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Thank you for recognizing the strengths in our work.

      Weaknesses:

      (1) The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca<sup>2+</sup> signal interpretation.

      We acknowledge that PF4 is not an exclusively macrophage-restricted marker. Yet, among meningeal immunocytes, it is almost exclusively expressed in macrophages (1, 2). Furthermore, in the adult mouse meninges, PF4<sup>Cre</sup>-based reporter lines label nearly all dural and leptomeningeal macrophages and almost no other cells (3, 4). This Cre line has also been used to target border-associated macrophages (2, 4). Moreover, a recent study suggests that the bacterial artificial chromosome used to generate the PF4<sup>Cre</sup> line does not affect meningeal macrophage activity (4). Nonetheless, in the revised version, we discuss a potential limitation of the Pf4Cre-based labeling approach for studying meningeal macrophages’ Ca<sup>2+</sup> signaling, namely that a very small population of other meningeal immune cells may also be labeled.

      (2) The manuscript offers an extensive characterization of Ca<sup>2+</sup> event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca<sup>2+</sup> activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca<sup>2+</sup> dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      In our discussion, we indicated that “the exact link between the distinct Ca<sup>2+</sup> signal properties of meningeal macrophage subsets observed herein and their homeostatic function remains to be established”. In the revised discussion part, we acknowledge that this is primarily a descriptive study that provides a foundational landscape of Ca<sup>2+</sup> dynamics in meningeal macrophages.

      (3) The GLM analysis revealing coupling between dural perivascular macrophage Ca<sup>2+</sup> activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca<sup>2+</sup> manipulation).

      In the results section, we indicate that our data suggest that dural perivascular macrophages are functionally coupled to locomotion-driven dural vasomotion, either responding to it or mediating it. Furthermore, we discussed the possibilities that 1) macrophages sense vascular-related mechanical changes and 2) macrophage Ca<sup>2+</sup> signaling regulates dural vasomotion. Moreover, we explicitly state that studying causality will require an experimental approach that has yet to be developed, enabling selective manipulation of dural perivascular macrophages.

      (4) The authors conclude that synchronous Ca<sup>2+</sup> events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved.

      Thank you for this suggestion. In the revision, we indicate that further studies are required to resolve the exact source of synchrony.

      (5) A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca<sup>2+</sup> activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca<sup>2+</sup> suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. Minimally, the discussion should be more deeply engaged with possible interpretations and implications of this finding.

      While we propose that the decrease in macrophage Ca<sup>2+</sup> signaling following CSD could indicate that a hyperexcitable cortex dampens meningeal immunity, in the revised discussion, we indicate that further studies are needed to determine whether this reduction in meningeal macrophage Ca<sup>2+</sup> activity reflects altered viability or reduced immune function that could interfere with the macrophage’s ability to restore homeostasis and dampen local inflammation.

      (6) The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca<sup>2+</sup> increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions.

      Although n=3 is common in intravital imaging of the meninges, including experiments employing pharmacological manipulations such as RAMP1 inhibition (5-7), we acknowledge that a larger sample size would increase confidence in the results. We further acknowledge that our pharmacological data indicate only a potential role for RAMP1 signaling in meningeal macrophages and that CGRP/RAMP1 signaling in other meningeal immune or vascular cells may also play a role.

      Reviewer #2 (Public review):

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      Thank you.

      I have only minor comments.

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text.

      We did not measure the exact distance of the perivascular macrophages from the blood vessels, but defined them as such based on previous data showing that these cells reside along the abluminal surface and maintain tight interactions with mural cells (8). We now provide this information in the revised manuscript, along with the dextran tracer approach used to label the blood vessels.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

      We have added more background and the method for inducing CSD (i.e., a pinprick in the frontal cortex) in the Results section.

      Reviewer #3 (Public review):

      Strengths:

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Thank you for recognizing the strengths of our paper.

      Weaknesses:

      (1) The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included.

      Please see our previous responses regarding the specificity of the PF4Cre line for targeting macrophages. The specificity of the RAMP1 antagonist we used (BIBN4096, olcegepant) has been confirmed by its developer, Boehringer Ingelheim, and the compound has been used to target CGRP signaling in numerous studies, including those targeting meningeal macrophage and vascular signaling (2, 7). A section on the study’s limitations has been added.

      References:

      (1) H. Van Hove et al., A single-cell atlas of mouse brain macrophages reveals unique transcriptional identities shaped by ontogeny and tissue environment. Nat Neurosci 22, 1021-1035 (2019).

      (2) F. A. Pinho-Ribeiro et al., Bacteria hijack a meningeal neuroimmune axis to facilitate brain invasion. Nature 615, 472-481 (2023).

      (3) G. L. McKinsey et al., A new genetic strategy for targeting microglia in development and disease. Elife 9, (2020).

      (4) H. J. Barr et al., The circadian clock regulates scavenging of fluid-borne substrates by brain border-associated macrophages. bioRxiv, (2025).

      (5) T. L. Roth et al., Transcranial amelioration of inflammation and cell death after brain injury. Nature 505, 223-228 (2014).

      (6) M. V. Russo, L. L. Latour, D. B. McGavern, Distinct myeloid cell subsets promote meningeal remodeling and vascular repair after mild traumatic brain injury. Nat Immunol 19, 442-452 (2018).

      (7) K. L. Monaghan et al., Highly dynamic dural sinuses support meningeal immunity. Nature, (2026).

      (8) H. Min et al., Mural cells interact with macrophages in the dura mater to regulate CNS immune surveillance. J Exp Med 221, (2024).

    1. Author response:

      In the review, the critique was focused mainly on the functional results, which show that interpatch neurons in mouse V1 are more strongly modulated by locomotion than patch neurons. The anatomical results that patch and interpatch modules are recurrently connected in three interareal subnetworks were considered solid.

      We acknowledge the limitations of our work. Specifically, the number of recorded neurons could be higher, the mapping of neurons onto patch and interpatch modules could be more direct, and the asymmetric distribution of locomotion-modulated responses in layer 2/3 may be confounded by selective masking of GCaMP signals by surface blood vessels. In experiments that are not included in the manuscript, we found no systematic spatial relationship between the M2AChR pattern and the vascular marker CD31, ruling out that masking contributed to the imaging results. Unfortunately, we are unable to revise the manuscript to the extent recommended by the reviewers because the collaborators have left the lab, which closed in 2024.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models - including cell lines, PDXs, CDXs, and PDXOs - they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Rather than uncovering new resistance mechanisms, the study largely confirms known pathways. Several key conclusions are not supported by the data, and critical alternative explanations - such as additional mutations or increased KRAS expression - are not thoroughly investigated or ruled out. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance. The manuscript also has several errors, poor figure quality, and a lack of proper quantification. Additional experimental validation, data improvement, and text revisions are required.

      Acquired resistance to KRAS<sup>G12C</sup> inhibitors such as sotorasib or adagrasib remains a significant clinical challenge. Therefore, the identification of mechanisms of acquired resistance, along with the development of alternative therapeutic strategies, including combination therapies with KRAS inhibitors, represents an urgent unmet clinical need. The emergence of secondary KRAS mutations or new mutations in other oncogenic drivers has been observed as a primary cause of acquired resistance in a fraction of patients. No identifiable mutations were detected in more than half of the tumors from patients who developed acquired resistance after treatment with sotorasib or adagrasib.

      Using a discovery-based approach that integrated global proteomic and phosphoproteomic analyses in the TC303AR and TC314AR PDX models, we identified distinct protein signatures associated with KRAS reactivation, upregulation of mTORC1 signaling, and activation of the PI3K/AKT/mTOR pathway. These findings prompted further investigation into these mechanisms of resistance and evaluation of novel therapeutic combinations to overcome resistance. Notably, the combination of sotorasib with copanlisib (a PI3K inhibitor), or of sotorasib with AZD8055 or sapanisertib (dual mTORC1/2 inhibitors), demonstrated strong potential for future clinical use. These regimens effectively restored sotorasib sensitivity in both in vitro and in vivo models and produced robust, synergistic antitumor effects across various acquired resistance models.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      Whole-exome sequencing was performed on resistant cells or PDX models to confirm retention of the KRAS<sup>G12C</sup> mutation and to assess for potential secondary KRAS mutations. While our study focused on secondary KRAS mutations and the associated signaling pathways, we acknowledge that additional resistance mechanisms may be involved. These will be the focus of future investigations.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on the identification of the mechanisms involved in the acquired resistance to Sotorasib in non-small cell lung cancer (NSCLC) KRASG12C-mutant cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids, and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines, PDXs, and organoids) and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics, and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT, and MTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2 mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the current manuscript is very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Overall, the techniques and methodology seem to be performed in agreement with standard practice, and the results support most of the conclusions made by the authors. However, there are some points that, if addressed, would increase the value and relevance of the findings and further extend the impact of this work. Some of the recommendations for changes relate to the way things are explained and presented, which need some work. Other changes might require the performance of additional experiments or reanalysis of the existing data.

      Strengths:

      (1) One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs, and the fact that the authors have different clones for each, made this collection especially relevant, as they seem to show different mechanisms that the cells used to become resistant to Sotorasib. Although logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones used additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      (2) Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      (3) The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      (4) The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      (5) The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      (1) The abstract is rather long and gives details that are not usually included in one. This makes it very complicated to identify the most relevant findings of the work. The use of acronyms PDX, PDXO, and CDX without defining them makes it complicated for the non-specialist to know what the models are. Rewriting and reorganisation of the abstract would benefit the manuscript.

      We revised the abstract to ensure that the key findings and overall message are clearly communicated and easily understood by readers.

      (2) Expression, presentation, and grammar should be reviewed in all sections of the manuscript.

      This has been done in the revised version.

      (3) In the different parts of the Results section where the models shown in Figure 2 are described, the authors indicate "Whole-exome sequencing (WES) confirmed that XXX model retained the KRASG12C mutation with no additional KRAS mutations detected"; however, it is not indicated where these data are shown, and not all cases include an explanation of other possible modifications that might relate to mechanisms of resistance. This information should be included in the manuscript, and the WES made publicly available.

      WES was performed to check for additional secondary mutations in KRAS and to verify retention of the KRAS<sup>G12C</sup> mutation in these AR models. The WES data have been provided as supplements.

      (4) The way the proteomics analysis of the TC303 and TC314 parental and resistant PDX is described in the text is confusing. The addition of an experimental layout figure would facilitate the understanding. As it is written, it is not obvious that the parental PDX were also analysed. For instance, the authors say, "The global and phosphoproteomic analyses identified over 8,000 and 4,000 gene protein products (GPPs), respectively". Is this comparing only resistant cells, or from the comparison of the parental and resistant pairs? And where are these numbers presented in the figures? Also, there is information that seems more adequate for the materials and methods sections, i.e., "Samples were analyzed using label-free nanoscale liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) on a Thermo Fusion Mass Spectrometer. The resulting data were processed and quantified using the Proteome Discoverer 2.5 interface with the Mascot search engine, referencing the NCBI RefSeq protein database (Saltzman, Ruprecht). Two-component analysis is better named principal component analysis."

      The text has been revised accordingly.

      (5) While the presentation of the proteomics data could be done in different ways, the way the data is presented in Figure 3 does not allow the reader to get an idea of many of the findings from this experiment. Although it is indicated that a table with the data will be made available, this should be central to the way the data is presented and explained. A table (ie, Excel doc) where the raw data and all the analysis are presented should be included and referenced. Additionally, heat maps for the whole proteomes identified should be included. In the text, it is said, "Global proteomic heatmap analysis revealed unique protein profiles in TC303AR and TC314AR PDXs compared to their sensitive counterparts (Figure 3C)." However, this figure only shows the histogram of the differentially regulated cells. Inclusion of the histogram showing all the cells is necessary, and it might be informative to include the histogram comparing the two isogenic pairs, which could identify common mechanisms and differences between both sets. In Figure 3C, the protein names should be readable, or a reference to tables where the proteins are listed should be included.

      The raw data associated with the proteomic and global proteomic analyses have been added as supplements.

      (6) In Figure 3, the pathway enrichment tool and GO used should be mentioned in the text. The tables with all significant tables should also be provided. The proteomics data seems to convincingly identify mTOR as one of the pathways deregulated in resistant cells, but there is little explanation of what is considered a significant FDR value and if there are other pathways or networks that are also modified, which might not be common to both isogenic models. MS-based phosphoproteomics could help with the identification of differentially regulated pathways, but it is not really presented in the current manuscript. Most of the analysis of phospho-proteomics comes from the RPPA analysis, which is targeted proteomics. With the way the data is presented, the authors show evidence for a role of mTOR in the acquisition of resistance, but unfortunately, they do not discuss or allow the reader to explore if other pathways might also contribute to this change.

      The authors agree that other pathways may be involved, and this will be the subject of future study. The raw data has been added as supplements for the readers' interest.

      (7) Where is the proteomics data going to be deposited, and will it be made public to comply with FAIR principles?

      The data have been uploaded according to the journal guidelines.

      (8) The authors claim that the resistance shown for H23AR and H358AR cells is due to reactivation of KRAS signalling. This is done by looking at phosphorylation of ERK as a surrogate, as they claim, "KRAS inhibition is commonly assessed by evaluating the inhibition of ERK phosphorylation (p-ERK)". While this might be true in many cases, the data presented does not demonstrate that the increase in p-ERK is due to reactivation of KRAS. To make this claim, the authors should measure activation of KRAS (and possibly H- and NRAS) using GST-pull down or an image-based method.

      We agree that KRAS activation can be assessed through various methods. In this manuscript, which primarily focuses on mechanisms of resistance, pathway analysis revealed upregulation of KRAS signaling. This finding correlated with the incomplete inhibition of p-ERK by sotorasib in resistant cells. Notably, p-ERK status is widely recognized and routinely used as a surrogate marker for KRAS pathway activation.

      (9) The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and H358 parental cells. Is the increase shown in resistant cells shown in parental or is it exclusive for resistant cells only (and therefore acquired)? Experiment 4B should include this control. What is clear is that there is an increase in the expression of AKT and PI3K.

      H23 and H358 cells are highly sensitive to sotorasib, as demonstrated by the cell viability assays presented in Figure 2. As shown in Figure 3—figure supplement 3, sotorasib treatment led to complete inhibition of p-ERK in these parental cell lines. In contrast, p-ERK inhibition was incomplete in the resistant H23AR and H358AR cells, highlighting a distinct signaling behavior that prompted us to investigate the AR cells further. Moreover, these AR cells were continuously cultured under sotorasib pressure to maintain resistance.

      (10) The main point here is whether this is acquired resistance or the sensitivity to the drug is already there, and there was no need to do an omics experiment to find this. In some cases, it seems that the single treatment with PI3K inhibitors is as effective as Sotorasib treatment, promoting the death of the parental cells. This is in line with previous data in H23 and H358 that show sensitivity to PI3K inhibition (i.e., H358 10.1016/j.jtcvs.2005.06.051; H23 10.20892/j.issn.2095-3941.2018.0361). The data is clear, especially for copanlisib, but would it be the case that this treatment could be used for the treatment of NSCLC alone or directly in combination with Sotorasib and prevent resistance? The results shown in Figure 4C strongly support that a single treatment might be effective in cases that do not respond to Sotorasib. The data in Figure 4D-F (please correct typo "inhibition" in labels) seem to support that PI3K treatment of parental cells is as effective as in the resistant cells.

      We agree. Based on our in vitro (Figure 4) and in vivo (Figure 7) data, copanlisib was able to overcome sotorasib resistance, demonstrating either synergistic or additive effects depending on the specific model. These findings support the potential of combining PI3K inhibition with KRAS<sup>G12C</sup> inhibition as a promising strategy to address acquired resistance.

      (11) The experiments presented in Figure 7 show synergy between Sotorasib and copanlisib treatment in some of the resistant cells. But in Figure 7G, the single treatment of H23AR is as effective as the combination. Did the authors check the effect of this drug on the parental cells? As they do not include this control, it is not possible to know if this is acquired sensitivity to PI3K inhibition or if the parental cells were already sensitive (as indicated by the Figure 4 results).

      Both H23 and H23AR cells demonstrated high sensitivity to copanlisib, as shown in Figure 4. Combination index analysis for the copanlisib + sotorasib treatment (Figure 7A) revealed synergistic effects on cell viability at specific concentrations. However, in the in vivo experiment (Figure 7G), we did not observe a clear synergistic effect of the combination treatment against H23AR xenografts. This may be attributed to the dose of copanlisib used, which was potentially sufficient on its own to produce a strong antitumor response, thereby masking any additional benefit from the combination.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To strengthen the scientific rigor and overall presentation of the study, the authors should consider the following:

      (1) Perform additional functional validations, including reconstitution experiments after PI3K and 4E-BP1 knockouts, to more definitively demonstrate the role of these targets in mediating resistance.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. The acquired-resistant isogenic H23AR and H358AR cells overexpressed PI3K and 4E-BP1 proteins, whereas expression of these proteins was normal in the parental cell lines (H23 and H358). These two pairs of cell lines (H23 vs H23AR and H358 vs H358AR), along with multiple knockout clones from each cell line, were used in every functional assay, so that the assays covered cells or clones with normal expression, overexpression, and no expression of the target proteins (Figure 5B, D-F & Figure 6D-E). Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      (2) Improve experimental quantifications, particularly for western blot analyses, and ensure all key findings are supported by statistically significant comparisons.

      The changes observed on the Western blots were not subtle and were obvious without quantification.

      (3) Clarify enrichment analysis by directly comparing resistant and sensitive models and use appropriate FDR thresholds (<0.05) when claiming significant pathway activation.

      The mass spectrometry data were analyzed by the Department of Biostatistics, and the methodology for the statistical analysis is explained in the Methods section. The enriched pathways were identified by pre-ranked GSEA on the global proteomics and phosphoproteomics data, using a gene list ranked by log-transformed P values with the sign set to positive or negative for a fold change of >1 or <1, respectively. All the enriched pathways were ranked by their enrichment scores and considered significant at an FDR value <0.05. Each enrichment plot in Figure 2 is marked with its respective FDR q-value as well as its nominal p-value (Figure 2D-E). The Results section (page 14) has also been revised for clarification.
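
      As an illustration of the ranking step described above, the following is a minimal Python sketch, not the analysis code used for the manuscript. It assumes a base-10 logarithm and assigns a score of zero when the fold change is exactly 1; neither choice is stated in this response. The function names and the example values are hypothetical.

```python
import math


def signed_log_p(p_value: float, fold_change: float) -> float:
    """Return -log10(p), signed by the direction of change.

    Positive for fold_change > 1, negative for fold_change < 1.
    Assumption: base-10 logarithm; a fold change of exactly 1 scores 0.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if fold_change <= 0.0:
        raise ValueError("fold_change must be a positive ratio")
    magnitude = -math.log10(p_value)
    if fold_change > 1.0:
        return magnitude
    if fold_change < 1.0:
        return -magnitude
    return 0.0


def rank_genes(stats):
    """Rank genes for a pre-ranked enrichment analysis.

    stats maps gene name -> (p_value, fold_change).
    Returns (gene, score) pairs sorted from most up- to most down-regulated.
    """
    scored = [(gene, signed_log_p(p, fc)) for gene, (p, fc) in stats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical values for illustration only (not data from the study).
example = {
    "GENE_A": (0.001, 2.5),   # strongly up in resistant vs sensitive
    "GENE_B": (0.2, 1.1),     # weakly up
    "GENE_C": (0.0001, 0.4),  # strongly down
}
ranked = rank_genes(example)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

      The resulting ranked list is the kind of input a pre-ranked GSEA tool consumes; the enrichment scores and FDR q-values themselves come from the GSEA step, which is not reproduced here.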

      (4) Address alternative mechanisms of resistance, such as secondary mutations or KRAS overexpression, through deeper genetic and proteomic profiling.

      The authors agree that other pathways may be involved, and this will be the subject of future research. Our WES analysis of H23AR and H358AR cells, shown in Figure 2 Supplement 1, did not find any additional mutations in KRAS. There were some SNPs and indel mutations, but we considered these outside the scope of our current study. The KRAS signaling upregulation found in the gene enrichment analysis, shown in Figure 3D, was validated through ERK phosphorylation status in Figure 3-supplement 3.

      (5) Improve data presentation by enhancing figure quality, ensuring consistent labeling, and providing complete figure legends and descriptions.

      Revised

      (6) Revise and polish the manuscript text for clarity, accuracy, and consistency, paying special attention to avoiding contradictory statements and strengthening mechanistic interpretations.

      Revised

      Major Comments:

      (1) In Figure 1A, the authors state that "four PDX models were selected for evaluating sotorasib sensitivity based on their distinct co-mutation patterns," but it is unclear whether these patterns are common, clinically significant, or selected for another specific reason. Clarification is needed regarding the rationale for model selection.

      The models have co-mutations that are common in clinical specimens and are associated with drug resistance (Skoulidis, Ferdinandos, et al. "Co-occurring genomic alterations define major subsets of KRAS-mutant lung adenocarcinoma with distinct biology, immune profiles, and therapeutic vulnerabilities." Cancer Discovery 5.8 (2015): 860-877). Out of 11 PDX models with KRAS<sup>G12C</sup> mutations, 4 models were selected for in vivo evaluation of sotorasib sensitivity based on their distinct co-mutation status. Co-mutations in p53, STK11, or KEAP1 are the most common co-mutations in NSCLC and make treatment more challenging in the clinic. All four PDXs selected for the in vivo study harbor at least one of these co-mutations together with the KRAS<sup>G12C</sup> mutation.

      (2) Whole-exome sequencing (WES) results for TC303 AR and TC314 AR are mentioned but not shown in the supplementary material. These results should be included.

      Included as a figure supplement in Figure 1-figure supplement 1

      (3) In Figure 2 - Figure Supplement 1, H23 AR and H358 AR acquired multiple SNPs and indels compared to their sensitive counterparts. The authors need to address whether these genetic alterations could contribute to resistance.

      The authors agree that other pathways may be involved, and this will be the subject of future research. Our WES analysis of H23AR and H358AR cells, shown in Figure 2 Supplement 1, did not find any additional mutations in KRAS. There were some SNPs and indel mutations, but we considered these outside the scope of our current study. The KRAS signaling upregulation found in the gene enrichment analysis, shown in Figure 3D, was validated through ERK phosphorylation status in Figure 3-supplement 3.

      (4) In Figure 3D-E, in the enrichment analysis, the authors describe enrichment of mTORC1 signaling in resistant PDXs without sufficiently comparing with the sensitive counterparts. They need to clarify whether the enrichment is unique to resistant cells.

      The comparison is between sensitive and resistant models (Figure 3C). All enrichment data presented in Figure 3D-E were derived from global proteomic and phosphoproteomic analyses: Figure 3D compares the sotorasib-acquired resistant TC314AR PDX with its sensitive counterpart, TC314 PDX, and Figure 3E compares the combined sotorasib-acquired resistant TC314AR + TC303AR PDXs with their combined sensitive counterparts, TC314 + TC303 PDXs. We revised the text to make this clear.

      (5) In Figure 3F, the FDR values of 0.5 and 1.0 are too high to support conclusions of significant pathway activation. Similar issues exist for Figure 3 - Figure Supplement 2 (FDR q-values of 1.0, 0.989, and 0.813).

      We agree that the FDR values are high in the enrichment analysis of the phosphoproteomic data, although not in the proteomics data. However, these enrichment scores indicate pathway activation. The higher FDR is most likely due to the low number of phosphoproteins enriched in the designated pathways. Significant FDR values were found when the enrichment analysis was done on the global proteomics data.

      (6) In Figure 3H, PI3K upregulation is inferred from RPPA quantification. An independent validation, such as immunoblotting, should be provided.

      In addition to the sotorasib-acquired resistant PDX samples, PI3K upregulation was shown by immunoblotting in the sotorasib-resistant isogenic cell lines (H23AR and H358AR cells) in Figure 4B.

      (7) In Figure 4B, increased PI3K (p85) levels alone do not support pathway activation, as p-AKT levels remain unchanged. Functional downstream markers (e.g., p-S6, p-4EBP1) should be assessed.

      Agree, the status of other downstream markers, such as p-S6 and p-4EBP1, was shown in Figure 4H and Figure 5E & 5F.

      (8) In Figure 4D, PI3K inhibition does not reduce colony formation in AR cells relative to parental cells. The data do not support the conclusion that PI3K inhibition sensitizes AR cells.

      These experiments show that the drugs are equally effective in the presence or absence of resistance to sotorasib. The specific role of PI3K is shown in the knockout experiments (Fig. 5), as explained in the Results section on pages 18-19. H23AR and H358AR cells showed over 600- and 200-fold resistance to sotorasib compared with their sensitive counterparts (Figure 2A), with IC50 values of 20 µM and 6 µM, respectively. In contrast, copanlisib, a PI3K inhibitor, can significantly sensitize the AR cells, with IC50 values of 0.39 µM and 0.06 µM in H23AR and H358AR cells, respectively, which were as sensitive as the parental cells. PI3K signaling was significantly upregulated in AR cells, and inhibition of PI3K-AKT-mTOR signaling, either through CRISPR-Cas9 PI3K knockout (Figure 5) or through inhibition of PI3K or downstream molecules by copanlisib, everolimus, or AZD8055, sensitizes the AR cells either alone or synergistically with sotorasib (Figure 6H & Figure 7A).

      (9) In Figures 4D-F, single or combination inhibition of PI3K, AKT, and mTORC1 in H23/H23AR and H358/H358AR cells shows no significant difference in colony formation between resistant and parental lines. Therefore, the conclusion that PI3K inhibition sensitizes sotorasib-resistant cells is not supported by the data.

      See response to (8).

      (10) In Figure 4G, copanlisib does not significantly inhibit p-mTOR (S2448) in H23 AR cells, and total mTOR levels decrease slightly. Quantification should be added.

      Added as a supplement

      (11) In Figure 4G, western blot results for p-PDK and PDK are not quantified, and effects vary between H23^AR and H358^AR cells. Quantification needs to be added.

      Added as a supplement

      (12) In Figure 6H, cell viability curves for H23AR/PI3K KO 3-3 cells start from <60%, suggesting pre-existing poor cell health. This casts doubt on conclusions regarding dual drug effects.

      All cell viability remained at or close to 100% at the no-treatment control condition, and the cell viability at the starting point was lower than 100% only in the combination treatment group, where the cells were treated with at least one drug. Here, a fixed dose of AZD8055 (50nM or 100nM) was combined with different doses of sotorasib. The dual drug effects are assessed by the combination index, which takes viability factors into account. Combination effects were confirmed by in vivo experiments.
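
      For readers unfamiliar with the combination index, the sketch below shows one common formulation, the Chou-Talalay median-effect method. This is an assumption: the response does not state which method or software was used. All names and numbers (`dm`, `m`, the doses) are hypothetical placeholders, not values from the study.

```python
def fraction_affected(viability_percent: float) -> float:
    """Convert % viability (relative to untreated control) to fraction affected."""
    return 1.0 - viability_percent / 100.0


def dose_for_effect(fa: float, dm: float, m: float) -> float:
    """Median-effect equation: single-agent dose that yields fraction affected fa.

    dm is the median-effect dose (IC50-like) and m the slope of the
    median-effect plot for that drug alone.
    """
    if not 0.0 < fa < 1.0:
        raise ValueError("fa must be strictly between 0 and 1")
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(d1, d2, fa, dm1, m1, dm2, m2) -> float:
    """Chou-Talalay CI for doses d1 and d2 that jointly yield effect fa.

    CI < 1 suggests synergy, CI = 1 additivity, CI > 1 antagonism.
    """
    return d1 / dose_for_effect(fa, dm1, m1) + d2 / dose_for_effect(fa, dm2, m2)


# Hypothetical example: the combination leaves 50% of cells viable.
fa = fraction_affected(50.0)
ci = combination_index(d1=2.0, d2=0.02, fa=fa,
                       dm1=10.0, m1=1.0, dm2=0.1, m2=1.0)
print(f"CI = {ci:.2f}")
```

      Under this formulation, each combination point is compared with the single-agent doses needed for the same effect, so a curve that starts below 100% viability because of the fixed-dose partner drug is accounted for in the index rather than being read as poor baseline cell health.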

      (13) The manuscript claims that mTORC1 inhibition alone is insufficient to suppress resistance (page 23), yet earlier reports that the mTORC1 inhibitor everolimus significantly reduces colony formation (page 17). This inconsistency needs to be addressed.

      Revised. On p. 23, we are referring to 4E-BP1-mediated resistance.

      (14) In Figure 7G, since copanlisib alone appears as effective as combination therapy, the authors should revise the conclusion to emphasize the sufficiency of PI3K inhibition alone.

      Agreed. Copanlisib treatment appeared to be very effective in the H23AR xenograft model, most likely because the copanlisib dose used in this model produced a strong antitumor effect on its own that masked the combination effect. However, synergistic antitumor activity of copanlisib with sotorasib was found in the H358CDX and TC314AR PDX models (Figure 7D & I).

      (15) In Figure 7I, statistical comparisons (P-value) comparing combination therapy to copanlisib monotherapy are missing. Without statistical significance, the conclusion regarding the combination efficacy cannot be justified.

      Revised

      Minor Comments:

      (1) Figure 1D is not described in the main text.

      Revised

      (2) On page 12, "FigG" and "FigH" should be corrected to "Figure 2G" and "Figure 2H," respectively.

      Revised

      (3) On page 17, the section title "copanlisib modulates PI3K-AKT-mTOR signaling..." should capitalize the first word.

      Revised

      (4) In Figure 7, "sotorasib" and "AMG510" are used interchangeably but refer to the same drug; consistent labeling should be used to avoid confusion.

      Revised

      (5) In Figure 7 - Figure Supplement 2A-B, the rationale for switching from AZD8055 to sapanisertib, another dual mTORC1/mTORC2 inhibitor, is unclear and should be explained.

      Revised

      Reviewer #2 (Recommendations for the authors):

      Please review all the figures and labels, as there are many mistakes. Also, check the way that the figures are presented and, if necessary, increase the definition.

      Revised

      (1) Figure 2 seems to be squashed.

      Revised

      (2) RPPA experiment "PI3K-AKT-mTOR signaling pathway compared to their sensitive counterparts. Specifically, the expression levels of MEK1, p-MEK1, p-MAPK, PDK1, p-PRAS40, p-GSK-3β, p-4E-BP1, p-PI3K, p-Akt, p-PRAS40, p-p38-MAPK, p-AMPK, and p-MAPK were markedly increased in resistant TC303AR and TC314AR PDXs." Several of these proteins are not really part of the PI3K-AKT-MTOR pathway, as such, but the MAPK pathway, and this is masked by not mentioning this. It is also necessary to explain which proteins are called MAPK and why there are 2 p-MAPK.

      Revised

      (3) Figure 3 - Figure Supplement 3. The images seem saturated for some of the blots. Is there still a decrease in ERK activity in the resistant cells? Lower exposure blots should be included, and if possible, some quantification performed.

      Quantification added

      (4) Figure 4I, review the title of the left graph, as this is not only sensitivity to everolimus.

      Revised

      (5) The figure legends need extensive review and rewriting. For instance, in Figure 6, the times for how long the treatments were performed in the different graphs have to be specified. The figure legends must allow interpretation of the data without reading the material and methods or text.

      Revised

      Materials and Methods

      This section needs special attention for typos and style, for instance:

      (1) Correct "KRASG12G inhibitors including sotorasib, adagrasib," to G12C.

      Revised

      (2) Use appropriate symbols i.e., "3 ul sgRNA (30 uM), 0.5 ul Cas9 (20 uM), and 3.5 ul Buffer R were mixed"

      Revised

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate Reviewer #1’s very positive feedback. Incorporating the perspective of ‘incidental’ sensory signals is a valuable suggestion that aligns perfectly with our findings. We agree that this perspective significantly strengthens the impact of our paper.

      In the revised version, we will update the manuscript to bridge these perspectives (the functional role of “incidental” sensory signals and the role of retinal flow in navigation). In addition, we will elaborate on the potential predictions of the model and possible manipulations that might affect the integration between sensory evidence (curl signal) and the straight-ahead prior.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s feedback regarding the formalization of our reference frames. We agree that certain definitions were implicitly assumed rather than explicitly stated. We will revise the manuscript to provide all necessary self-contained information, ensuring that the geometry of the task response and the definition of heading are unambiguous. Also, we will address the gap between the task response (in world coordinates) and the functional role of the controller, as well as the other points raised by the reviewer.

      Major issues:

      (1a), (2a) Clarification of Reference Frames

      The reviewer asks: “To ‘directly estimate heading’ relative to what?”

      In our study, participants were instructed to report their “perceived direction of self-motion” by aligning a rotational encoder (steering wheel) with the direction they felt they were moving within the 3D simulated scene. Consequently, participants reported their instantaneous heading in a world-centered reference frame, from which the 3D trajectories were reconstructed. Since the reviewer had to infer this information, it should be clarified to ensure it is immediately evident.

      Participants were informed that the initial heading (i.e. θ<sub>0</sub> in our controller nomenclature) was oriented “straight ahead” relative to their body which was aligned longitudinally with the experimental room. We will modify Figure 1B and revise the Methods section to explicitly clarify this initial alignment and the instructions provided to participants.

      In the revised manuscript, we will clarify that while the participant’s report is world-centered, the retinal curl provides a gaze-relative heading signal. Although this was already mentioned, we will emphasize this point. In natural navigation toward a fixated target, a world-centered vector is often unnecessary; an error signal indicating heading relative to fixation is sufficient (as the reviewer also notes). However, the initial alignment of the heading within the 3D scene allows the brain to “calibrate” this internal controller, mapping the retinal curl signal onto the 3D world coordinates required for the task.

      The reviewer also asks how we can be certain that participants were reporting in world coordinates rather than an alternative frame, such as “heading relative to the fixation target.” We believe our “Cancelled Curl” (and over-cancelled) conditions provide the most compelling evidence to rule out this alternative. In these conditions, the physical position of the fixation target in the scene remained identical to the unaltered flow condition. If participants were simply reporting heading relative to the fixation target’s spatial location, the observed biases should have persisted regardless of the flow manipulation. Instead, the bias vanished when the curl was removed. This causal evidence proves that the bias is driven by the retinal motion signal (curl) rather than the spatial orientation of the eyes or the target’s position in the scene. Furthermore, the temporal evolution of the response supports a world-centered integration. For simulated straight paths, the perceived heading remains straight for the first few seconds (consistent with the initial world-centered alignment), with biases only emerging after approximately 3 seconds of integration (a point we elaborate on in our response to Reviewer #3). Had participants been responding based on a simple gaze-relative reference frame from the onset, these biases would have manifested significantly earlier. We will incorporate these points into the revised Discussion to better frame our findings alongside other cues, such as the Focus of Expansion (FOE), that contribute to heading estimation.

      (1b) The reviewer notes that we must be clear about the relationship between curl and heading (relative to fixation) and the variables that affect curl.

      Beyond the discrepancy between heading (θ) and gaze (ψ), curl is geometrically determined by translational self-motion speed (υ), eye height (h), and pitch (α). More specifically, curl = (υ sin ψ cos α)/h. The derivation will be included in the Supplementary Information. Since h = d sin α, where d is the 3D distance to the fixation point, we could express cos α as a function of distance. Certainly, there is no 1:1 mapping from the curl signal to heading relative to gaze (e.g. θ – ψ): participants would need to know υ and eye height, plus extra-retinal information. Frenz et al. (2003, Vis Res.) showed that people can estimate self-motion directly from optic flow across different simulated eye heights and gaze angles; extra-retinal information can, in addition, provide knowledge of ψ and α. It is then plausible that the visual system can transform the curl signal from a qualitative directional cue (i.e. steering left or right of fixation) into a quantitative steering command. By combining curl with knowledge of gaze orientation and eye height, the visual system can resolve ambiguities in the flow field and utilize curl as a more precise error signal for locomotor control. These aspects will be included in the new version.
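
      The relation stated above can be evaluated numerically as follows. This is only a sketch of the formula as written in this response, curl = (υ sin ψ cos α)/h with h = d sin α. The function names, the use of radians, and the sign conventions for ψ and α are assumptions made for illustration.

```python
import math


def retinal_curl(speed: float, psi: float, alpha: float, eye_height: float) -> float:
    """Curl at fixation: (v * sin(psi) * cos(alpha)) / h.

    speed      -- translational self-motion speed v (e.g. m/s)
    psi        -- gaze angle psi as used in the formula, in radians (assumed unit)
    alpha      -- gaze pitch alpha, in radians (assumed unit)
    eye_height -- eye height h (same length unit as speed)
    """
    if eye_height <= 0.0:
        raise ValueError("eye_height must be positive")
    return speed * math.sin(psi) * math.cos(alpha) / eye_height


def retinal_curl_from_distance(speed: float, psi: float, alpha: float,
                               distance: float) -> float:
    """Same quantity, with eye height recovered from h = d * sin(alpha)."""
    eye_height = distance * math.sin(alpha)
    if eye_height <= 0.0:
        raise ValueError("distance * sin(alpha) must be positive")
    return retinal_curl(speed, psi, alpha, eye_height)


# Hypothetical walking scenario (values chosen for illustration only).
left = retinal_curl(speed=1.4, psi=0.2, alpha=0.5, eye_height=1.6)
right = retinal_curl(speed=1.4, psi=-0.2, alpha=0.5, eye_height=1.6)
print(left, right)  # equal magnitude, opposite sign
```

      The sketch makes the ambiguity discussed above concrete: the same curl value can arise from different combinations of speed, eye height, and pitch, so curl alone fixes only the sign of ψ unless those other quantities are known.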

      (2b) Mismatch between task and controller

      We thank the reviewer for this point. We have addressed the alignment of the reference frames in our response to Issues 1a and 2a. Once the initial orientation (θ<sub>0</sub>) is established in the world frame, the controller model generates steering adjustments that directly translate into heading predictions within that same world reference frame. By treating the perceptual report as an output of the locomotor controller, we resolve the discrepancy between the steering task and the reported heading.

      (2c) No raw data provided

      We respectfully disagree with the reviewer’s interpretation regarding data smoothing. The thin lines in Figure 2 represent the mean 3D paths derived directly from the response variable (θ<sub>0</sub>) across trials of identical conditions for each participant (as detailed in the ‘Computation of Perceived Path’ section). No smoothing or filtering has been applied to these plotted trajectories other than computing the mean across trials. We also wish to remind the reviewer that the raw data and analysis code remain publicly accessible for further inspection. Regarding the visual representation: in earlier versions of the manuscript, we included shaded 95% Confidence Intervals (CIs) in Figure 2. However, this addition rendered the plot overly cluttered and obscured the individual trajectories. We therefore elected to present individual participant means (thin lines) alongside group averages (thick lines) to emphasize inter-subject variability. For clarity, the 95% CIs are explicitly displayed in Figure 3, where the data density is more conducive to shaded areas.

      (3) Difference with Matthis et al (2022)

      While Matthis et al. (2022) described the existence of retinal curl during walking and the gaze-relative information it can provide, our paper provides the causal link: by manipulating the curl in real time (the ‘cancelled’ and ‘over-cancelled’ curl conditions), we provide the critical evidence that perceived heading is affected by this signal.

      (4) Eye movements analysis

      We thank the reviewer for noting that retinal slip (velocity error) is a more critical metric than positional gaze error. We agree that tracking inaccuracies can introduce translational noise into the flow field. The 3° threshold was established based on the eye tracker’s specifications and the naturalistic setup (1-meter viewing distance without head stabilization). Across all participants, the mean positional error ranged from 1.016° to 1.5° (1 deg is 2.08 cm in our setup). We also calculated retinal slip values, which ranged from 0.12 to 0.27 deg/s (X dimension) and 0.12 to 0.23 deg/s (Y dimension). These values are comparable to natural oculomotor drift (Kowler et al., 1979) and are understandably small given the low velocity of the fixation target. Consequently, it is highly unlikely that retinal slip influenced the results. Furthermore, assuming that tracking error remained consistent across fixation conditions, any present retinal slip cannot explain why the bias followed the retinal curl manipulation as predicted by the controller. We therefore consider retinal slip to be an unlikely confounding factor.

      (5) The separate and joint fits

      We thank the reviewer for the opportunity to clarify the logic behind our modeling choices. We acknowledge that the “separate fits” are inherently less informative due to the high number of free parameters relative to the data. Our primary scientific goal was not to achieve perfect descriptive accuracy via 30 parameters, but to test a specific functional hypothesis through the “joint fit.”

      The Logic of the Joint Fit:

      We agree with the reviewer that the joint fit misses some paths in some conditions. The joint fit reflects a significant compromise. The “Gain” (the weighting of the curl signal) is likely not a static constant but is dynamically tuned based on task demands, confidence in the visual signal, simulated speed, and so on. By using a single Gain parameter, we intentionally ignore this contextual variability to see how much of the behavior can be explained by a “minimalist” controller. In this sense, the 2-parameter joint model is a deliberate attempt to test this limit. By forcing a single Gain parameter to account for all conditions across both straight and curved paths within one flow manipulation (e.g. unaltered flow), we are asking whether a single, fixed linear relationship between retinal curl and steering effort/gain can explain the results. We view the joint fit not as a “perfect” model, but as a stronger test of the curl-based control theory. The fact that a 2-parameter model can capture the direction and scale of biases across such a diverse set of conditions (straight/curved paths, five fixation eccentricities) suggests that retinal curl is a robust signal. Upon closer analysis, the discrepancies between the joint model and the data are most pronounced in the over-cancelled condition, in which sensory evidence becomes more ecologically inconsistent with the extra-retinal information (gaze direction). While the joint fit successfully demonstrates that a single parameter can capture the general functional role of curl, it fails to account for the complex sensory re-weighting that occurs in ecologically inconsistent conditions (like ‘over-cancelled’ flow). We will update the manuscript to discuss these limitations, framing the model as a parsimonious first-order approximation based on a minimal set of parameters rather than a complete description of human heading perception.

      (6) On the neural simulations

      We acknowledge that the presentation of the neural model requires more clarity regarding its objectives and its relationship to the behavioral data.

      We first wish to clarify the intended scope of the neural ring-attractor model. Our primary goal was not to provide a comprehensive account of behavioral performance across all conditions (which is the role of the controller model), but rather to demonstrate a biologically plausible mechanism that explains the emergence of the “Opposite-to-Gaze” bias. While the controller demonstrates that the bias follows a specific control law, the neural model shows how such a law can emerge from known primate neurophysiology, specifically, spiral-tuned MSTd neurons, gaze-contingent inhibition, and an egocentric “straight-ahead” prior.

      Why Straight Paths are Sufficient for this Objective. The reviewer asks why only straight paths were simulated. In our study, the straight-path condition with eccentric gaze is the purest test of the bias mechanism. Simulating the straight paths allowed us to isolate the interaction between foveal inhibition and the straight-ahead prior without the confounding variable of path-curvature flow. Given the complexity of the neural network’s parameter space, we focused on these conditions to provide a clear neuro-plausible explanation.

      Units: Pixels vs. Degrees. We acknowledge that the use of “pixels” in the plots of internal neural dynamics may appear awkward. Because the neural network operates on input stimuli defined by the pixel resolution of the videos used in the simulations, we used pixels as the native coordinate system to describe the movement of activity peaks within the network’s internal “map.”

      Behavioral Output (Meters): Importantly, the final heading estimates produced by the network are not left in pixels. We use a pinhole camera model to reconstruct the 3D trajectories from the neural activity. These results are expressed in meters, allowing for a direct comparison with the human behavioral data.

      Addressing Wild Oscillations and Smooth Paths. The oscillations observed in the instantaneous heading estimates reflect the stochastic nature of the population peak when tracking high-frequency sensory inputs. In our model, the synaptic time constant (τ) was kept relatively small to ensure a fast, low-latency response to changes in self-motion. While increasing τ would have produced smoother internal dynamics, it would also have introduced delays into the control loop. Instead, we chose to maintain this high sensory responsiveness and applied a temporal moving average later to the network’s decoding to reconstruct the 3D trajectories.

      In addition, the neural activity over time is shown in two ways. The first is the heatmap, which shows the neuron with preferred heading; here one can see more oscillations, especially when the fixation point is closer to the centre (eccentricities -2 and 2), due to larger competition between the sensory evidence and the straight-ahead prior. The second is the decoded heading. In the ring-attractor model, the decoded heading is not determined by a single neuron but is calculated using a population vector average (equation 19). By summing across the entire population, the decoder effectively integrates sensory evidence from many neurons simultaneously. One can appreciate (see e.g. Fig. 5B) that averaged decoding leads to a smoother resulting estimate (the white dashed line, whose visibility will be improved in the revised version). Behavioral work by Burr and Santoro (2001) suggests that global motion signals (divergence and rotation in optic flow) are integrated over much longer timescales (roughly 1000 ms to 3000 ms) than local motion units (~200 ms).
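
      To illustrate why a population read-out is smoother than tracking a single peak neuron, here is a minimal sketch of a standard population vector average on a ring of heading-tuned units. Equation 19 of the manuscript is not reproduced in this response, so the exact form used by the authors may differ. The tuning curve, the number of units, and all names below are hypothetical.

```python
import math


def population_vector_heading(rates, preferred_deg):
    """Decode heading (degrees) as the angle of the rate-weighted vector sum.

    rates         -- firing rate of each unit
    preferred_deg -- preferred heading of each unit, in degrees
    """
    x = sum(r * math.cos(math.radians(p)) for r, p in zip(rates, preferred_deg))
    y = sum(r * math.sin(math.radians(p)) for r, p in zip(rates, preferred_deg))
    if math.hypot(x, y) < 1e-12:
        raise ValueError("population vector has no defined direction")
    return math.degrees(math.atan2(y, x))


# Hypothetical ring of 36 units with preferred headings every 10 degrees.
preferred = [float(p) for p in range(-180, 180, 10)]

# Hypothetical bump of activity centred on a heading of 20 degrees.
kappa = 4.0  # tuning sharpness (assumed value)
rates = [math.exp(kappa * (math.cos(math.radians(p - 20.0)) - 1.0))
         for p in preferred]

decoded = population_vector_heading(rates, preferred)
print(f"decoded heading: {decoded:.2f} deg")
```

      Because every unit contributes in proportion to its rate, fluctuations in which single unit happens to be most active have a limited effect on the decoded angle, which is the smoothing property described in the paragraph above.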

      See also our comment on temporal integration in the responses to reviewer #3.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for the comments regarding the definition of heading at different time scales, the role of the gait cycle, and the temporal integration of the curl signal. They will help us refine the manuscript’s core arguments.

      We agree that “heading” must be precisely defined within the context of the differing temporal demands of balance and steering. While instantaneous retinal motion provides the high-frequency feedback necessary for momentary postural adjustments and balance, our study is concerned with heading as a gaze-relative signal used for the continuous control of a locomotor trajectory. As such, we will revise the manuscript to specify that the perceived heading measured in our task reflects a signal integrated over the gait cycle to filter out the oscillatory noise induced by head bob and sway.

      The reviewer correctly notes that gait-induced head bob and sway produce high-frequency oscillations in the curl signal, yet our behavioral results show smooth, slowly evolving biases. The visual system does not react to “instantaneous” curl, which would lead to jittery, unstable heading estimates. Instead, it integrates flow over a timescale roughly commensurate with a full gait cycle (~500–1000 ms), which implies a substantial temporal integration process. This is consistent with evidence (Burr and Santoro, 2001, Vis Res) indicating that optic flow signals (radial and rotational components) are integrated over windows of up to approximately 3 seconds to ensure perceptual stability. Neurally, this likely involves the projection from area MSTd to the Ventral Intraparietal area (VIP), a pathway where fast, eye-centered sensory inputs are transformed into stable, body-centered representations suitable for guiding long-term steering behavior (Chen et al. 2011, J Neurosci). By grounding our definition of heading in these specific temporal and neural constraints, we aim to clarify how the visual system exploits retinal curl for goal-directed action in natural, dynamic environments and relate our findings to recent studies addressing the role of retinal motion on balance (Powell et al. 2026, bioRxiv).

      In our implementation, we explicitly address the high-frequency noise introduced by gait dynamics by smoothing the retinal curl signals computed from the stimulus videos before they are fed into the controller. This temporal filtering allows the controller’s predictions to be fit to the response data while remaining robust to the rapid fluctuations of head bob and sway. In contrast, the neural ring-attractor model would not require an external smoothing step; instead, the integration is an emergent property of the system’s architecture that can be controlled with different parameters. The dynamics of the synaptic weights and the characteristic “leak” in the population activity naturally implement a leaky integration of sensory evidence, ensuring that the decoded heading reflects a sustained estimate rather than an instantaneous response to visual noise.
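
      The trade-off between responsiveness and smoothness can be illustrated with a first-order leaky integrator. This is only a sketch under stated assumptions, not the ring-attractor model itself: the update rule, the 2 Hz gait-like oscillation, the time step, and both time constants are hypothetical values picked for illustration.

```python
import math

def leaky_integrate(signal, dt, tau):
    """Euler integration of tau * dx/dt = -x + s(t), starting from x = 0."""
    x = 0.0
    alpha = dt / tau
    out = []
    for s in signal:
        x += alpha * (s - x)
        out.append(x)
    return out

def ripple(x, n_last=100):
    """Peak-to-peak amplitude over the last n_last samples."""
    tail = x[-n_last:]
    return max(tail) - min(tail)

dt = 0.01  # seconds per sample (assumed)
t = [i * dt for i in range(1000)]
# Constant curl bias of 1.0 plus a 2 Hz oscillation standing in for head bob and sway.
curl = [1.0 + 0.5 * math.sin(2.0 * math.pi * 2.0 * ti) for ti in t]

fast = leaky_integrate(curl, dt, tau=0.05)  # short time constant: responsive but jittery
slow = leaky_integrate(curl, dt, tau=1.0)   # long time constant: smooth but slower to settle
```

      With the longer time constant the output settles near the underlying bias and the gait-frequency ripple is strongly attenuated, at the cost of a slower response to genuine changes in the input.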

    1. Author response:

      Reviewer 1:

      Porte et al. investigate how observers form confidence judgments about the presence vs absence of near-threshold audiovisual stimuli. In two psychophysical detection experiments, human participants judged whether a stimulus (visual, auditory, or audiovisual) was present or absent, reported amodal confidence, and then gave modality-specific detection and confidence ratings using a bidimensional scale. The authors report that audiovisual (AV) stimuli are detected more accurately than unimodal stimuli, but that multisensory stimulation does not improve metacognitive efficiency. Participants are more confident in absence than in presence judgments. They extend a previously proposed model to an audiovisual setting, assuming evidence is available only for presence and that absence is inferred via counterfactual detectability. Detection is modeled with a disjunctive integration rule across modalities, while confidence is explained by a combination of conjunctive (for presence) and disjunctive/negation-of-disjunction (for absence) rules.

      We thank the reviewer for thoroughly evaluating our work.

      There are several points I wish to have clarified, outlined below:

      (1) Framing of bimodal vs unimodal detection

      On p.3, the introduction states that "Adults typically show higher detection rates and faster reaction times for bimodal than for unimodal stimuli." This is broadly consistent with the literature, but as written, it obscures the fact that these effects depend critically on experimenter-defined stimulus strengths. It is trivial to construct cases where a strong unimodal stimulus is more detectable than a bimodal stimulus made of two very weak unimodal stimuli. If "bimodal" is understood as the co-presentation of two unimodal components matched in detectability, then Bayes-rule-based arguments indeed predict better detection for the bimodal case; how much better is theoretically interesting, but not quantified in this paper. There is an entire literature on the combination of two unimodal stimuli, which is not touched on. For a pertinent reference, see Ernst & Banks 2002. I recommend clarifying that the statement assumes comparable unimodal intensities.

      We will clarify that when discussing bimodal stimuli, we mean the co-presentation of two unimodal stimuli of similar intensity. We will add references to the literature on discrimination tasks showing that multisensory cue combination follows Bayes-rule integration (e.g., Ernst & Banks, 2002; Battaglia et al., 2003; Alais & Burr, 2004) and clarify in which ways our work differs from this rich body of work and provides novel contributions.

      (2) Relationship to signal detection theory and counterfactual perceptibility

      In the introduction, the authors write, "If sensory evidence is only available for presence," motivating counterfactual perceptibility as a necessary ingredient to infer absence. However, standard signal detection theory (SDT) already provides a widely accepted framework in which a continuous internal response is present on both signal and noise (absent) trials, with absence corresponding to the noise distribution and decisions implemented by a criterion. Thus, there is no logical need to invoke counterfactual perceptibility simply to define absence; rather, the Mazor-style framework adds an explicit belief model about detectability and an optimal stopping policy. It would strengthen the paper to more clearly state how the proposed model goes beyond SDT conceptually, acknowledge that SDT can account for presence/absence decisions without counterfactuals, and position the counterfactual account as a hypothesis about how observers actually compute absence/confidence, not as a necessity.

      One of the central claims of the paper is that detection in the case of absence requires counterfactual reasoning. The authors should demonstrate whether or not an SDT-based generative model can describe these amodal and uni- and bi-modal stimulus decisions. In such a model, the noise distribution would be shared across conditions, and unimodal vs bimodal differences would be captured by changes in the mean or variance of the signal+noise distribution.

      We will clarify that our framework explains how absence judgments (and related confidence) are formed, and what it adds to SDT models, including the reproduction of reaction times and a normative explanation of criterion placement (results about RTs are available in the supplementary materials). We will also run additional model comparisons assessing how an SDT-based generative model performs compared to our Bayesian model based on counterfactual perceivability.
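
      For readers less familiar with the SDT baseline discussed under point (2), a minimal equal-variance sketch is given here. It assumes a single noise distribution shared across conditions and a fixed criterion; the signal means and the criterion are hypothetical values, not estimates from the study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_yes(mu, criterion, sigma=1.0):
    """Probability that the internal response exceeds the criterion."""
    return 1.0 - NormalDist(mu, sigma).cdf(criterion)

def d_prime(hit_rate, fa_rate):
    """Equal-variance sensitivity: z(hit rate) - z(false-alarm rate)."""
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(fa_rate)

criterion = 1.0                                   # hypothetical decision criterion
signal_means = {"A": 0.8, "V": 1.2, "AV": 2.0}    # hypothetical signal strengths

fa_rate = p_yes(0.0, criterion)                   # noise-only (absent) trials
for condition, mu in signal_means.items():
    hit_rate = p_yes(mu, criterion)
    print(condition, round(hit_rate, 3), round(d_prime(hit_rate, fa_rate), 3))
```

      In this sketch an "absent" response is simply an internal response below the criterion, so absence needs no separate evidence source; this is the property the reviewer contrasts with the counterfactual account.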

      (3) Confidence vs performance: is AV confidence special?

      The paper's central claims about multisensory confidence and metacognition would be stronger if the authors showed that AV confidence deviates from what is expected given performance alone. From the reported results, AV accuracy is around 80%, with visual and auditory at about 60% and 40%, respectively. Given that confidence typically monotonically scales with accuracy, the first question is whether AV confidence is entirely explained by improved performance, or whether there is an additional multisensory contribution. A simple, informative analysis would be for each subject, plot mean confidence vs per cent correct for AV, V, A, and absent conditions, and to test whether AV confidence lies above the trend predicted by accuracy alone.

      This is an excellent suggestion, and we will conduct the proposed analysis.

      (4) Metacognitive measures: logistic regression slopes vs meta-d′/d′

      In the "Multisensory effects on metacognitive performance" section, the authors define "metacognitive sensitivity" as the slope of a Bayesian logistic regression predicting accuracy from confidence. There is a substantial literature showing that logistic-slope measures of metacognitive sensitivity are criterion-dependent and can be affected by both task and confidence criteria (for one example, see Rausch & Zehetleitner, 2017). In contrast, meta-d′/d′ was specifically developed to provide a bias-invariant measure of metacognitive efficiency, though this, too, is dated (see Boundy-Singer et al., 2023). Given that the authors already estimate HMeta-d-based M-ratios, it is unclear why they rely on logistic regression slopes as their primary "metacognitive sensitivity" metric in Figure 4A. I suggest either replacing the logistic-slope metric with SDT-based measures (meta-d′, meta-d′/d′) or providing a clear justification for using logistic slopes, along with a discussion of their known limitations.

      Additionally, Figure 3 reports M-ratios without showing the corresponding d′ or meta-d′ for judge-present vs judge-absent conditions. Presenting these would help contextualize the metacognitive efficiency results and clarify whether differences are driven mainly by changes in metacognitive sensitivity, changes in task performance, or both. The d' values per condition could be added to Figure 2A.

      All typical measures of metacognitive sensitivity are influenced by metacognitive bias and task performance to some extent, and none of them is a pure measure of type-2 sensitivity (e.g., see Rahnev, 2025). Here, we chose logistic regression because it enables modeling interactions with other predictors in a factorial design with a limited number of trials.

      We will clarify the limitations of metacognitive sensitivity measures and better explain why we then used the M-ratio to estimate metacognitive performance while controlling for underlying task performance.

      Thank you for this suggestion. We will add the d’ values per condition to Figure 2A.
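
      The logistic-slope measure discussed under point (4) can be sketched as follows. This is a simplified maximum-likelihood fit by Newton's method, not the Bayesian regression with interaction terms used in the manuscript; the function name and the toy data are assumptions for illustration only.

```python
import math

def logistic_slope(confidence, correct, n_iter=25):
    """Fit P(correct) = logistic(b0 + b1 * confidence) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(confidence, correct):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += w               # entries of X' W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data: four confidence levels, ten trials each, accuracy rising with confidence.
confidence = [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10
correct = [1] * 5 + [0] * 5 + [1] * 6 + [0] * 4 + [1] * 8 + [0] * 2 + [1] * 9 + [0] * 1
intercept, slope = logistic_slope(confidence, correct)

# Control: accuracy unrelated to confidence should give a slope of zero.
flat_correct = ([1] * 5 + [0] * 5) * 4
_, flat_slope = logistic_slope(confidence, flat_correct)
```

      A steeper positive slope means confidence discriminates correct from incorrect trials more sharply; as noted in the exchange above, the slope also depends on task performance and on how confidence criteria are placed.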

      (5) Interpretation of confidence in absence vs presence

      The authors emphasise that it is surprising subjects are more confident in absence than in presence judgments, both at amodal and modality-specific levels. However, Figure 2B suggests that absent responses are very accurate: absent is reported as present only in about 10% of absent trials, implying a high correct rejection rate. If confidence tracks outcome probability, higher confidence for absence may be at least partly expected. Before attributing this asymmetry primarily to counterfactual reasoning, it would be important to explicitly relate confidence to accuracy for hits, misses, false alarms, and correct rejections and show whether absence confidence remains elevated relative to presence after controlling for accuracy differences across judgment types and conditions. Without this, the interpretation that higher absence confidence is inherently "unexpected" seems overstated.

      This higher confidence for absence judgments than for presence judgments was observed while controlling for response accuracy. We will clarify this in the main text.

      (6) Model: integration rules, confidence, and evidence strength

      The modeling section extends the Mazor et al. ideal observer to two modality-specific sensors, with disjunctive integration for detection and then disjunctive vs conjunctive integration rules for confidence. I have a few comments.

      First, the detection rule is disjunctive and is reported as a finding. However, the conclusion that detection relies on a disjunctive rule ("present if A or V") closely mirrors the task instructions: participants are explicitly told to respond "present" if they detect the stimulus in any modality. As such, this seems more like a sanity check than a novel empirical finding. Relatedly, the conjunctive detection is a weak null. The conjunctive rule ("present only if both A and V") is behaviorally implausible given the task instructions. A more informative baseline would be an SDT-style scalar-evidence model (see comment 2), rather than a conjunctive rule that participants would have to actively violate the instructions to follow.

      Second, confidence in the model is defined as the probability of being correct at the time of the detection decision. However, this implies a fixed amount of evidence at decision time unless additional mechanisms are invoked. This issue is well known in diffusion modeling (see Kiani et al. 2014) and deserves explicit discussion; otherwise, it is unclear how the model produces graded confidence from a bound-crossing rule alone.

      Third, the authors do not consider a straightforward evidence-strength account of confidence. When both modalities indicate presence, there is, on average, more total sensory evidence than in unimodal trials, making correct decisions more likely and, under most frameworks, confidence higher. Likewise, weak evidence in both modalities can be stronger evidence for absence than moderate in one and weak in the other. Many of the patterns that motivate the presence-conjunctive/absence-disjunctive mix could arise from a model where confidence simply reflects the amount of evidence for the chosen option, without positing distinct logical integration rules for presence vs absence. As the authors note, purely disjunctive or purely conjunctive confidence rules fail to capture the trends in confidence reports in Figure 7, leading them to adopt a combined presence-conjunctive/absence-disjunctive rule. A more parsimonious alternative, namely that confidence scales with evidence magnitude and cross-modal agreement, should be explicitly considered and, ideally, implemented as a competing model. Finally, if the model is intended as a good account of the data, it would be useful to report whether it also reproduces the metacognitive efficiency patterns (M-ratios) beyond the mean confidence patterns shown in Figures 7-8. At present, the model appears systematically over-confident, which should be acknowledged and quantified.

      Indeed, the disjunctive rule was expected, given our design; we will clarify this. As mentioned above, we will directly compare the results of our current model with those of a more traditional SDT-based generative model, as suggested by the reviewer.

      Contrary to a classical drift diffusion model, the model does not assume a fixed decision boundary, but derives an optimal stopping policy per time point and belief state. As a result, and depending on beliefs about perceptual evidence and the temporal discounting factor, optimal decision boundaries can be asymmetric and may collapse asymmetrically toward 0. Furthermore, given the asymmetry in the information value between sensor activations and inactivations, and differences in the information value of sensor activations of the two modalities, boundary crossing can lead to belief states that are far or close to the decision boundary, depending on the nature of the evidence. Together, even without an explicit modeling of post-decisional evidence, the model can account for variability in the total accumulated evidence at decision time.

      From our understanding, the proposed alternative is equivalent to our current model, in which confidence scales with evidence magnitude.

      The model was not fitted to confidence data, which could explain its overall overconfidence. To further test our model, we will assess its ability to reproduce patterns of metacognitive efficiency (M-ratios).
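
      The logical difference between the disjunctive and conjunctive rules discussed under point (6) can be made concrete with a small sketch. It treats the two modalities as independent detectors with fixed per-trial detection probabilities, which is a deliberate simplification and not the optimal-stopping model used in the manuscript; the function names and probabilities are hypothetical.

```python
def disjunctive(p_a, p_v):
    """P(at least one modality signals presence), assuming independence."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_v)

def conjunctive(p_a, p_v):
    """P(both modalities signal presence), assuming independence."""
    return p_a * p_v

# Two unimodal components, each detected on half of the trials.
p_auditory, p_visual = 0.5, 0.5
p_or = disjunctive(p_auditory, p_visual)    # 0.75
p_and = conjunctive(p_auditory, p_visual)   # 0.25
```

      Under these assumptions, an "A or V" rule raises the bimodal detection rate above either unimodal rate, whereas an "A and V" rule lowers it, which is why the two rules make clearly separable predictions.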

      (7) Confidence asymmetry index (CAI) and modality weighting

      The confidence asymmetry index (CAI) is defined as the difference between auditory and visual confidence on AV vs absent trials, and the authors report strong correlations between observed and simulated CAI across participants. They interpret this as evidence that subjects place different weights on auditory vs visual signals. Several questions arise. First, does CAI capture asymmetries beyond what is expected from accuracy differences between modalities and conditions? Second, because the simulated data are generated from model fits to the observed data, a correlation between observed and simulated CAI is expected: the model is built to reproduce the individual patterns it is then compared to. A stronger test would compare CAI from data simulated with modality-specific belief parameters versus CAI from data simulated with constrained equal belief parameters (same θs). Relatedly, the paper would benefit from a plot showing the distribution of θs for A- and V-present stimuli across subjects. These values could also be related to unimodal sensitivity measured in the calibration/training phases. A natural prediction is that higher unimodal sensitivity should correspond to higher belief parameters for presence.

      The model was not fitted to either the modality-specific responses or the confidence ratings, so the correlation between observed and simulated CAI was not expected and provides a good test of our model's ability to reproduce the observed patterns. We will test whether the same correlations hold when using the difference in accuracy instead of the confidence.

      We found that the best model is the one with the same belief across the visual and auditory sensors. Given this, we cannot investigate how modality-specific belief parameters are linked to unimodal sensitivity for each participant.

      Reviewer 2:

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1 . In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      This is an excellent suggestion. We will run this analysis and refit the d’ and criterion response using unequal-variance models to see whether we observe the same results.
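
      The zROC check suggested by the reviewer can be sketched as follows. The code fits a least-squares line through z-transformed hit and false-alarm rates; a slope different from 1 indicates unequal variances. The ROC points are generated from a hypothetical unequal-variance model (signal mean 1.5, signal SD 1.25) and are not data from the study.

```python
from statistics import NormalDist

def zroc_slope(hit_rates, fa_rates):
    """Least-squares slope of z(hit rate) against z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    zh = [z(h) for h in hit_rates]
    zf = [z(f) for f in fa_rates]
    mean_f = sum(zf) / len(zf)
    mean_h = sum(zh) / len(zh)
    sxy = sum((x - mean_f) * (y - mean_h) for x, y in zip(zf, zh))
    sxx = sum((x - mean_f) ** 2 for x in zf)
    return sxy / sxx

# Hypothetical unequal-variance observer: noise ~ N(0, 1), signal ~ N(1.5, 1.25).
noise = NormalDist(0.0, 1.0)
signal = NormalDist(1.5, 1.25)
criteria = [0.0, 0.5, 1.0, 1.5]                 # one ROC point per confidence criterion
fa_rates = [1.0 - noise.cdf(c) for c in criteria]
hit_rates = [1.0 - signal.cdf(c) for c in criteria]

slope = zroc_slope(hit_rates, fa_rates)         # ratio of noise SD to signal SD
```

      In real data the ROC points would come from cumulating responses across confidence bins, and the slope estimate would carry sampling error that this noiseless sketch ignores.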

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      We will clarify this point in the manuscript. In Experiment 2, the stimulus was absent on 50% of the trials. As a result, the 50% of stimulus present trials were split into the three possible conditions, resulting in a sixth of the trials being auditory, a sixth visual, and a sixth audiovisual; we will make these proportions clearer in the text.

      We did not have any a priori hypotheses about the response criteria for Experiment 1. The reviewer is right that the proportion of absent versus present trials can have an impact on response bias. In fact, one of the goals of Experiment 2 was to test whether the low frequency of absent trials compared to present ones could explain both the response bias and the higher confidence in absence observed in Experiment 1. We found that this was not the case, as we did not observe a difference between the two experiments. We will clarify this in our revision.

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      We will add individual data to Figure 2.

      Indeed, staircases failed to achieve 50% detection on average; participants for whom psychometric curves did not converge were excluded, as were those at floor level in one of the two modalities.

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that these data weren't produced by bots? There are recent high-profile publications in PNAS and Behavior Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7 ). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      Data were collected on the Prolific platform, which has been shown to yield high-quality data (Kay, 2025). However, we agree that this is a potential concern and will add a note of caution in the revised manuscript, even if the risk that the data do not come from humans but from bots is low (Huskey et al., 2026; Chetverikov, 2026).

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      We will clarify this aspect of the analysis. Data were binned into four bins using the quartiles of the overall distribution of confidence values across participants, following the binning used in the example in Fleming (2017). We will examine whether changing the number of bins changes the results (Dayan, 2023).
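
      A minimal sketch of quantile-based binning of continuous confidence ratings is given here, with the number of bins as a parameter so that the robustness check can be rerun with other values. The function name, the 1 to 100 rating scale, and the use of Python's default quantile method are assumptions; the HMeta-d example code may handle ties and bin edges differently.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def bin_confidence(values, n_bins=4):
    """Assign each rating to a bin 1..n_bins using quantile cut points."""
    cuts = quantiles(values, n=n_bins)   # returns n_bins - 1 cut points
    return [bisect_right(cuts, v) + 1 for v in values]

# Hypothetical pooled confidence ratings on a 1-100 scale.
ratings = list(range(1, 101))
quartile_bins = bin_confidence(ratings, n_bins=4)
counts = Counter(quartile_bins)

# Same ratings with a different number of bins, as in the planned robustness check.
six_bins = bin_confidence(ratings, n_bins=6)
```

      The binned ratings can then be tabulated per stimulus and response to obtain the count vectors that meta-d′ fitting procedures typically expect.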

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      Yes, logistic regression has been used to quantify metacognitive sensitivity before. We will add the relevant papers as references (e.g., Sandberg et al., 2010; Norman et al., 2011; Siedlecka et al., 2016; Wierzchoń et al., 2012; Faivre et al., 2018; Pereira et al., 2023).

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])." Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      The number of participants was determined using a Bayesian optional stopping rule, as preregistered. The reviewer is right that the delta values are very similar in the two experiments. Given that a difference was found in only one experiment, we decided not to draw conclusions from it.

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      We will examine how the proportion of miscategorisation changed throughout the task.

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      Our main goal was to investigate how participants form integrated, supramodal confidence judgments on the basis of multisensory sources of information. Therefore, the amodal confidence judgments are required here.

      Moreover, the model was fitted to response times that corresponded to the amodal judgment. Because we had no meaningful response times for the modality-specific judgment, we could not use them to fit the model.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      Indeed, the model does not capture the full behavioral effects reflecting multisensory interference in the modality-specific responses. We suppose that the model does not reproduce these interferences, as it is only fitted to amodal detection accuracy, and as the two sensors are completely independent from one another. We will clarify this aspect in the text.

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      In addition to the d’ and criterion comparison between the observed and simulated data, we will compare modality-specific d’ and the correlations between observed and simulated confidence.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      To further support this claim, we will add a quantitative comparison of the different confidence fits.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      Thank you for this remark. Indeed, the difference in metacognitive efficiency may be driven by differences in the d’ values, and so a lower d’ for auditory stimuli can lead to higher metacognitive efficiency for a similar metacognitive sensitivity.

      Reviewer 3:

      This study used a pre-registered novel behavioural paradigm and computational modelling to investigate multi-sensory influences on detection and confidence. Participants performed amodal detection of auditory and visual stimuli (indicating that a stimulus was there when either an auditory stimulus or a visual stimulus or both were present), followed by amodal and unimodal confidence ratings. Detection was higher when both stimuli were present, and the presence of one modality increased the confidence in the presence of the other modality. In contrast to previous detection studies, confidence was higher for absent than for present judgements, but metacognitive efficiency was higher for present judgements. Metacognitive sensitivity was higher for bimodal stimuli, but this was not the case for metacognitive efficiency, suggesting that the sensitivity might be driven by first-order performance. The computational model showed that both detection and confidence in absence followed a disjunctive evidence integration rule, while confidence in presence followed a conjunctive integration rule.

      We thank the reviewer for engaging with our work.

      Strengths:

      The paper has several major strengths. Firstly, it addresses a novel research question using an innovative and well-controlled paradigm. Furthermore, the paradigm and analyses were pre-registered, and all effects that were interpreted were replicated in two independent samples. Finally, the paper uses an advanced computational model to capture counterintuitive patterns in the data.

      Weaknesses:

      The major weakness of the paper is the narrative structure. It is not always clear how the different analyses relate to the main research question. Many different effects are reported in terms of detection accuracy, bias, confidence and metacognition, as well as cross-modal and unimodal versus bimodal effects. It would help readability if the paper were streamlined in terms of the research question that is being answered, which I believe is specifically about multimodal absence judgements. Relatedly, for a reader not intimately familiar with the metacognition literature, the difference between MRatio, metacognitive sensitivity and metacognitive efficiency is not obvious. It would be good to clarify this more in the manuscript.

      We will improve the narrative structure so that each result clearly relates to the research question.

      We will also add a clearer definition of the various metacognition metrics to improve readability.

      In general, the conclusions drawn by the authors seem to be supported by the results. However, I was missing quantitative model comparisons between the conjunctive and the disjunctive models and an explanation of why the models systematically overestimated the confidence ratings. Furthermore, the 'perceptual multisensory interference' section reports on very interesting effects, but these are not supported by statistical tests in the main text. It would help to assess the strength of the claims if the statistical evidence in favour of these claims were presented together in the main text.

      The model was not fitted to confidence data, which could explain its overall overconfidence. As stated in previous responses, we will perform additional analyses to evaluate the model’s ability to reproduce confidence ratings. As some of the results were not replicated across experiments, we decided to put all statistical results related to multisensory interference in the supplementary materials and to focus only on consistent results across experiments.

      One other concern is that in real-world multi-sensory perception, such as the mosquito example in the introduction, the auditory and visual signals have a strong natural association, which means that if you hear the auditory signal, you expect that you will see the visual signal soon and vice versa. As far as I understood, this association was not present in the current paradigm, which might influence the type of effects that one would expect to see.

      The relation here is indeed artificial; we tried to reinforce it as much as possible in the task instructions by telling participants that they had to “detect a mosquito” that could be present auditorily, visually, or both. We nevertheless acknowledge that the association between the visual and auditory stimuli is artificial, which may indeed influence our results.

      References

      Alais, D., & Burr, D. (2004). The Ventriloquist Effect Results from Near-Optimal Bimodal Integration. Current Biology, 14(3), 257–262. https://doi.org/10.1016/j.cub.2004.01.029

      Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. JOSA A, 20(7), 1391–1397. https://doi.org/10.1364/JOSAA.20.001391

      Chetverikov, A. (2026). Online behavioral studies are safe for now: Unusual RTs do not imply bots (A reply to Van der Stigchel et al., 2026) (Gjw5u_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/gjw5u_v1/

      Dayan, P. (2023). Metacognitive Information Theory. Open Mind: Discoveries in Cognitive Science, 7, 392–411. https://doi.org/10.1162/opmi_a_00091

      Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), Article 6870. https://doi.org/10.1038/415429a

      Faivre, N., Filevich, E., Solovey, G., Kühn, S., & Blanke, O. (2018). Behavioral, Modeling, and Electrophysiological Evidence for Supramodality in Human Metacognition. Journal of Neuroscience, 38(2), 263–277. https://doi.org/10.1523/JNEUROSCI.0322-17.2017

      Fleming, S. M. (2017). HMeta-d: Hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neuroscience of Consciousness, 2017(1), nix007. https://doi.org/10.1093/nc/nix007

      Huskey, R., Zhao, Z., Parry, D. A., & Fisher, J. T. (2026). An AI agent can complete the Attention Network Test with human-like behavioral signatures: Implications for the bot-or-not debate (T2jru_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/t2jru_v1/

      Kay, C. S. (2025). Why you shouldn’t trust data collected on MTurk. Behavior Research Methods, 57, 340. https://doi.org/10.3758/s13428-025-02852-7

      Norman, E., Price, M. C., & Jones, E. (2011). Measuring strategic control in artificial grammar learning. Consciousness and Cognition, 20(4), 1920–1929. https://doi.org/10.1016/j.concog.2011.07.008

      Pereira, M., Skiba, R., Cojan, Y., Vuilleumier, P., & Bègue, I. (2023). Preserved Metacognition for Undetected Visuomotor Deviations. Journal of Neuroscience, 43(35), 6176–6184. https://doi.org/10.1523/JNEUROSCI.0133-23.2023

      Rahnev, D. (2025). A comprehensive assessment of current methods for measuring metacognition. Nature Communications, 16(1), 701. https://doi.org/10.1038/s41467-025-56117-0

      Sandberg, K., Timmermans, B., Overgaard, M., & Cleeremans, A. (2010). Measuring consciousness: Is one measure better than the other? Consciousness and Cognition, 19(4), 1069–1078. https://doi.org/10.1016/j.concog.2009.12.013

      Siedlecka, M., Paulewicz, B., & Wierzchoń, M. (2016). But I Was So Sure! Metacognitive Judgments Are Less Accurate Given Prospectively than Retrospectively. Frontiers in Psychology, 0. https://doi.org/10.3389/fpsyg.2016.00218

      Wierzchoń, M., Asanowicz, D., Paulewicz, B., & Cleeremans, A. (2012). Subjective measures of consciousness in artificial grammar learning task. Consciousness and Cognition, 21(3), 1141–1153. https://doi.org/10.1016/j.concog.2012.05.012

    1. Author response:

      We sincerely thank the Reviewing Editor (Dr. Florent Ginhoux), Senior Editor (Dr. Satyajit Rath), and both reviewers for their thoughtful and constructive evaluation of our manuscript. We appreciate the recognition that our study provides a valuable observation regarding the TLR7-independent effects of imiquimod (IMQ) via the unfolded protein response (UPR) and Gelsolin in psoriasis-like dermatitis. Importantly, we acknowledge that the current framing may overemphasize direct relevance to human psoriasis. In the revised manuscript, we will reposition the study to focus on IMQ-induced skin inflammation as a model of chemical- and stress-induced inflammatory responses, rather than a direct representation of human plaque psoriasis. We also acknowledge that the mechanistic link between Gelsolin and skin inflammation remains incomplete, and we are committed to addressing the key concerns raised.

      Below, we outline our planned revisions in response to the public reviews. We will submit a revised version after performing the additional experiments and textual improvements.

      Reviewer #1 (Public review):

      We fully agree that the exclusive use of the IMQ model has limitations in fully recapitulating human plaque psoriasis, which is primarily driven by the IL-23/IL-17 axis involving Th17/Tc17 cells. We will substantially temper our claims regarding direct translational relevance to human psoriasis and clearly discuss the IMQ model as a tool to study innate immune-driven and chemical stress-induced inflammation in the skin (new Discussion section). In addition, we will strengthen the rationale for focusing on Gelsolin by incorporating available human data suggesting altered Gelsolin expression in inflammatory conditions.

      (1) We will add a dedicated paragraph in the Introduction and Discussion acknowledging the differences between IMQ-induced dermatitis and human psoriasis (citing key references such as PMID: 28945199).

      (2) For keratinocyte experiments, we will revise the text to avoid implying that keratinocytes stimulated with IMQ represent a psoriasis model, and instead position this system more conservatively. Specifically, we will treat keratinocytes as a system to assess AMP and chemokine induction rather than as a direct model of psoriasis. We will therefore incorporate stimulation with IL-17A (100 ng/ml) ± TNF-α (10 ng/ml) to establish AMP/chemokine induction, and additionally examine the effect of UPR activation by co-treatment with DTT (or other UPR inducers). This will allow us to determine whether UPR activation enhances IL-17A/TNF-α-driven AMP and chemokine expression.

      (3) We will expand the Methods section with full details on RNA-seq dataset selection, normalization, cross-species mapping, and statistical analysis, and re-evaluate key analyses where necessary to ensure robustness and reproducibility. Canonical psoriasis signature genes (e.g., S100A8/A9, IL-17C, IL-36g) will be validated by qRT-PCR in the revised manuscript.

      (4) Vehicle controls (including Aldara-specific effects) will be clearly described and shown in all relevant figures.

      Reviewer #2 (Public review):

      We thank the reviewer for recognizing the strengths in demonstrating TLR7-independent UPR induction and Gelsolin as an IMQ-binding protein.

      (1) To strengthen the mitochondrial Ca²⁺ signaling data (Fig. 1B), we will add an orthogonal approach (e.g., pharmacological inhibition or an alternative Ca²⁺ probe) in a new supplementary figure.

      (2) For Gelsolin-IMQ interaction specificity (Fig. 7E-G), we will perform additional experiments comparing IMQ versus RSQ (resiquimod) effects on the observed phenotypes, as recommended.

      We believe these revisions will substantially address the key concerns raised by the reviewers and strengthen the overall quality of the manuscript.

      We again thank the reviewers and editors for their time and valuable feedback, which will significantly improve the manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b, since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immuno-gold-TEM protocol and showcase co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in Figure 1. All of the images showcasing CDHR1 localization across species concentrate on the PNA-positive R/G cones. Larger fields of view were not collected because we prioritized the highest possible resolution and therefore imaged small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the western blot are consistent with our previous results (Piedade 2020), and we believe they arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures but a monolayer culture. We did not observe any differences in total cell number between the different transfections. As such, we do not feel that proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one-rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs per cell; we simply included all CP measurements in our analysis. All our CP measurements (actin, cdhr1a or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1 or PNA) to provide a landmark for measurement and to ensure association with the proper cell type. Our new Figure 7 now includes cone OS counterstaining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.
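
      To make the distinction raised by the reviewer concrete, the following minimal sketch contrasts a straight base-to-tip dimension with the curvilinear path length of a traced outer segment. The function names and the trace coordinates are hypothetical; this is an illustration of the two definitions, not the measurement pipeline used in the study.

```python
import math


def straight_length(points):
    """Straight-line distance from the first (base) to the last (tip) point."""
    return math.dist(points[0], points[-1])


def path_length(points):
    """Sum of segment lengths along the traced (curvilinear) outline."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical trace of a bent outer segment (arbitrary units).
trace = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]

straight = straight_length(trace)
curved = path_length(trace)

print(f"straight base-to-tip length: {straight:.1f}")
print(f"curvilinear path length:     {curved:.1f}")
```

      For a bent structure the straight dimension is always the shorter of the two, so using it consistently across genotypes keeps measurements comparable but can underestimate the length of strongly curved outer segments.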

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin, cdhr1a or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1 or PNA) to ensure proper measurement and association with the proper cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we analyzed n = 5 individual fish for each genotype and time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using ANOVA followed by Tukey's post hoc test to control for multiple pairwise comparisons. This new analysis did not substantially alter the statistical significance outcomes of the study.
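
      As a minimal sketch of the independence concern raised above (and not the analysis actually performed), individual OS or CP measurements could first be averaged per fish, so that the fish is the unit of analysis, before a one-way ANOVA is computed. The record layout, group labels, and values below are hypothetical, and the sketch stops at the F statistic rather than implementing the post hoc test.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(records):
    """Collapse (group, fish_id, length) records to one mean per fish."""
    by_fish = defaultdict(list)
    for group, fish_id, length in records:
        by_fish[(group, fish_id)].append(length)
    by_group = defaultdict(list)
    for (group, _fish_id), values in sorted(by_fish.items()):
        by_group[group].append(mean(values))
    return dict(by_group)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: two measurements per fish, three fish per genotype.
records = [
    ("wt", 1, 10.0), ("wt", 1, 12.0),
    ("wt", 2, 11.0), ("wt", 2, 13.0),
    ("wt", 3, 12.0), ("wt", 3, 14.0),
    ("mut", 1, 6.0), ("mut", 1, 8.0),
    ("mut", 2, 7.0), ("mut", 2, 9.0),
    ("mut", 3, 8.0), ("mut", 3, 10.0),
]

fish_means = per_fish_means(records)
f_stat, df_between, df_within = one_way_anova_f(fish_means)
print(fish_means)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

      The within-group degrees of freedom here reflect the number of fish rather than the number of measured cells, which is the point of the reviewer's comment.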

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree; this point has been addressed in our revised manuscript. Additionally, we have included data from 1- and 2-year-old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer and have revised the discussion accordingly.
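
      The reviewer's criterion for synergy (single-mutant effects in the same direction, with a double-mutant effect larger than their sum) can be written out as a minimal sketch. The function name, the category labels, and the numbers are hypothetical, and the sketch ignores measurement variability and statistical significance.

```python
def classify_interaction(wt, mut_a, mut_b, double):
    """Compare a double-mutant phenotype with the additive expectation."""
    effect_a = mut_a - wt
    effect_b = mut_b - wt
    effect_double = double - wt
    additive_expectation = effect_a + effect_b

    # Synergy requires both single effects and the double effect
    # to point in the same direction relative to wild type.
    same_direction = effect_a * effect_b > 0 and effect_a * effect_double > 0
    if same_direction and abs(effect_double) > abs(additive_expectation):
        return "synergistic"
    if same_direction:
        return "additive or sub-additive"
    return "not synergistic (effects differ in direction)"


# Hypothetical case 1: both singles slightly shorter, double much shorter.
case_same_direction = classify_interaction(wt=10.0, mut_a=9.0, mut_b=9.0, double=5.0)

# Hypothetical case 2, resembling the pattern the reviewer describes at 10 dpf:
# one single mutant longer than WT, the other unchanged, the double ~50% shorter.
case_mixed = classify_interaction(wt=10.0, mut_a=12.0, mut_b=10.0, double=5.0)

print(case_same_direction)
print(case_mixed)
```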

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that, like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs and closely appose PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. These data suggest a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used for quantitation of our data (please see our responses to Reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in cones, while in rods it may serve to regulate the release of newly formed discs (as previously proposed in mice). We plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immuno-gold-TEM protocol and provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. And whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants so unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol which differed from the Miles et al paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern which might be lost/hampered in the procedure used in Miles et al 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an overlooked error in figure preparation; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.
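
      The per-fish averaging that the reviewer asks about in comment (5) can be sketched as follows. This is a minimal illustration only: the fish identifiers, the measurement values, and the function name `per_fish_means` are hypothetical and do not come from the manuscript. It shows that the sample size for statistics becomes the number of fish rather than the number of cells.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(measurements):
    """Collapse (fish_id, value) cell measurements to one mean per fish."""
    by_fish = defaultdict(list)
    for fish_id, value in measurements:
        by_fish[fish_id].append(value)
    return {fish_id: mean(values) for fish_id, values in by_fish.items()}


# Hypothetical outer segment lengths (um) measured on individual cells.
cells = [
    ("fish1", 10.0), ("fish1", 12.0),
    ("fish2", 9.0), ("fish2", 11.0), ("fish2", 10.0),
    ("fish3", 8.0),
]

fish_means = per_fish_means(cells)
# Six cells were measured, but the statistical n is the three fish.
n_fish = len(fish_means)
```

      The per-fish means, rather than the pooled cell values, would then be passed to the ANOVA and an appropriate post-hoc test.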

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript. We have also clarified that CP measurements were made based on a counterstain for the cone/rod OS so that the actin signal was only CP associated. We have included the counterstain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have completed both more SIM as well as immuno-gold TEM to support our conclusions, see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the small length of rod CPs is most likely to represent their interaction with newly forming discs rather than connect with mature discs which are enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references where rod CPs have been found to be shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counterstain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript has included these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in Supplemental Figure 1 is distinct and represents a different in situ experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added to our discussion, highlighting that it is not possible to assess rod-cone dystrophy in the pcdh15b model as the mutation is lethal by 15 dpf, which is still before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin) and myo7aa all have either no expression in photoreceptors or very low levels, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      We have changed the pcdh15b color to blue.

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdh1b in cdh1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showcasing this data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bonafide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and travels along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on a co-stain for either rod or cone OSs.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      These data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) have been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies.

      (3) Missing references.

      We have added the missing references.

      (4) Indicate the mean or median for bar graphs.

      The Materials and Methods section now specifies that all of our graphs depict a mean value.

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separate from cones in the zebrafish retina and are therefore easily identified by location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. Author response:

      Thank you very much for your careful evaluation of our manuscript entitled “Cross-Species BAC Transgenesis Reveals Long-Range Regulation Drives Variation in Brain Oxytocin Receptor Expression and Social Behaviors.” We sincerely appreciate the insightful and constructive comments from both reviewers.

      We are particularly encouraged by the positive assessment that our study provides a useful experimental framework and resource for understanding how regulatory variation contributes to diversity in brain expression patterns and social behaviors. We have carefully considered all comments and outline below the key revisions we will implement in the revised manuscript.

      Conceptual clarification: We will clarify the conceptual framework of the study. While our initial aim was to test whether prairie vole regulatory elements could recapitulate vole-like Oxtr expression patterns in mice, the generation of multiple independent Koi lines revealed that such expression is not faithfully reproduced but instead varies across lines. This observation led us to refocus the study on how regulatory architecture gives rise to diverse expression patterns and their functional consequences. Accordingly, we will revise the manuscript to emphasize that the goal is not to reconstruct prairie vole circuits, but to test how variation in Oxtr expression distribution drives variation in social behaviors.

      Quantification of expression patterns: We will include quantitative analyses of Oxtr expression in both brain and mammary gland tissues. These additions will provide an objective basis for comparing tissue-specific expression and support the conclusion that brain expression is more variable, whereas mammary gland expression is broadly conserved. We will include qRT-PCR data to support mammary gland comparisons.

      Behavioral interpretation: We will clarify that the behavioral analyses are designed to assess how distinct Oxtr expression patterns influence social behaviors within a controlled mouse system, rather than to directly replicate prairie vole phenotypes. We will refine the manuscript to clearly distinguish between partial resemblance to prairie vole expression and the broader goal of linking regulatory variation to behavioral diversity.

      Technical clarification and limitations: We will revise the manuscript to more carefully interpret the roles of genomic integration site and transgene copy number, noting that while integration site likely plays a major role, contributions from copy number cannot be excluded. In addition, we will explicitly acknowledge that our analyses of 3D chromatin architecture are correlative in nature, and that establishing causality would require direct perturbation of chromatin structure, which is beyond the scope of the current study.

      Presentation improvements: We will improve figure clarity, include representative reference images from prairie vole brain to facilitate qualitative comparison, and refine descriptions in the Results and Methods sections to enhance clarity and readability.

      We thank the reviewers again for their insightful and constructive feedback, which we believe will significantly strengthen the manuscript. We look forward to submitting a revised version incorporating these improvements.

    1. Author response:

      General Statements

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous nonquantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      Point-by-point description of the revisions

      Reviewer #1:

      Significance

      At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons", "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered in the direct association between proteasome-dependent degradation and Leishmania differentiation.

      We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis will merit an independent, in-depth investigation. The current version has been substantially toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.

      Evidence, reproducibility and clarity

      The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      We agree with the reviewer and toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      We thank the reviewer for this observation. The SNPs exclusive to either one or the other stage are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. less than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either ‘lost’ (filtered out) or ‘gained’ (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely because they went through a transient monosomy at some point in their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.
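
      The effect of the cutoff described above can be illustrated with a short sketch. It assumes the cutoff is applied to the SNP frequency in each sample independently and that only SNPs above 10% are kept; the SNP names, the frequencies, and the function name `stage_exclusive_snps` are hypothetical and serve only to show how a small shift around the cutoff makes an SNP appear exclusive to one stage.

```python
def stage_exclusive_snps(ama_freqs, pro_freqs, cutoff=0.10):
    """Return the SNPs that pass the frequency cutoff in only one stage."""
    ama_pass = {snp for snp, freq in ama_freqs.items() if freq > cutoff}
    pro_pass = {snp for snp, freq in pro_freqs.items() if freq > cutoff}
    return {
        "ama_only": ama_pass - pro_pass,  # 'lost' in promastigotes
        "pro_only": pro_pass - ama_pass,  # 'gained' in promastigotes
    }


# Hypothetical frequencies: snpA sits near 50% in both stages, while
# snpB and snpC drift across the 10% cutoff between the two samples.
ama = {"snpA": 0.48, "snpB": 0.08, "snpC": 0.12}
pro = {"snpA": 0.52, "snpB": 0.11, "snpC": 0.07}

exclusive = stage_exclusive_snps(ama, pro)
```

      In this toy example snpB and snpC are present in both samples, yet each is reported for only one stage because its frequency falls below the cutoff in the other.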

      Chr26 revealed a striking contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      This analysis is based on normalized median read depth, taking somy variations into account. This is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than +/- 1.5-fold) as significant. We cannot explain the reversal of the signal for chr 26 in sample H-1 (but again, these fluctuations are minor and not observed at the mRNA level).

      The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/ gene module/cluster would be more appropriate terms.

      We thank the reviewer for this important comment. We replaced ‘regulon’ throughout the manuscript by ‘co-regulated, functional gene clusters’ (or similar).

      It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences of pseudouridine (Ψ) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'O-modifications (Rajan et al., Nature Com, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to consider the identification of stage-regulated snoRNAs in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms ‘ribosome biogenesis’, ‘rRNA processing’ and ‘RNA methylation’ shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      This control is now included in Fig S7 and we have added the corresponding description to the text.

      I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      (1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Rectified, thanks for pointing this out.

      (2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Rectified, thanks for pointing this out.

      (3) Fig.1D: "SNP frequency" is the preferred term in English.

      Corrected.

      (4) Fig.2A: not sure what "counts}1" mean.

      This figure has been replaced.

      (5) Ln 685: "Transcripts with FC < 2 and adjusted p-value > 0.01 are represented by black dots" > This sentence is inaccurate. The intended wording might be: "Transcripts with FC < 2 OR adjusted p-value > 0.01 are represented by black dots"

      We thank the reviewer and corrected accordingly.  
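
      The logic behind the reviewer's correction can be sketched in a few lines. The sketch assumes that a transcript is highlighted when it meets both thresholds (fold change of at least 2 and adjusted p-value of at most 0.01), so it is drawn in black when it fails either one; the function names, the "colored" label, and the treatment of fold change as an absolute magnitude are illustrative assumptions, not details taken from the manuscript.

```python
def is_significant(fold_change, adj_p, fc_min=2.0, p_max=0.01):
    """A transcript is highlighted only if it meets BOTH thresholds."""
    return fold_change >= fc_min and adj_p <= p_max


def dot_color(fold_change, adj_p):
    """Black dots are the complement: FC < 2 OR adjusted p-value > 0.01."""
    return "colored" if is_significant(fold_change, adj_p) else "black"


# Large fold change but weak statistical support: still a black dot.
high_fc_weak_p = dot_color(3.0, 0.5)
# Strong statistical support but small fold change: also a black dot.
low_fc_strong_p = dot_color(1.2, 0.001)
# Both thresholds met: highlighted.
both_met = dot_color(3.0, 0.001)
```

      With "AND" in the legend, the first two cases would be neither black nor highlighted, which is why "OR" is the accurate wording.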

      (6) Ln 698: Same as ln 685 mentioned above.

      We thank the reviewer and corrected accordingly.

      (7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10^-7 in the other scale?

      This was corrected in the figure and the legends were updated accordingly.

      (8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      (9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      This figure was removed and the legend modified.

      (10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Thank you, this is now rectified. To ensure readability, we reduced the number of protein kinase examples shown.

      Reviewer #2 Evidence, reproducibility and clarity:

      In the absence of riboprofiling the authors return to the RNA-seq to assess the levels of pre-Sno RNA (the role of the could be more explicitly stated).

      We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      The authors provide a clear and comprehensive description of the data at each stage of the results and this is woven together in the discussion allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated above (see responses to reviewer 1), mechanistic studies will be conducted in the near future to validate some of the regulatory interactions. These will be the subject of separate publications. As noted above (response to reviewer 1), we toned down the general discussion, suggest follow-up mechanistic analyses and emphasize those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      (1) As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. Minimum replicates required for inferential analysis is 3 bio-replicates and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      We agree with the reviewer and conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome, and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the minimum necessary. We thank the reviewer for this important comment: the new data not only confirm the previous ones (providing a high level of robustness to our data) but also allowed us to increase the number of identified stage-regulated snoRNAs, thus further supporting a possible role of ribosome modification in Leishmania stage development.

      (2) There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analysis investigating key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      (3) I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin, which has previously been applied to study the Leishmania proteasome (PMID: 15234661), rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information to the Results section.

      (4) If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      See response to comment 1 above.

      (5) As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Add treatment showing that it can differentiate cytostatic vs cytotoxic compound.

      This control has now been added to Fig S7.

      (6) It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      (7) All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      (8) Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      See response to comment 1 above.

      (9) Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      We agree and have repeated the RNAseq analysis using four independent biological replicates - see response to comment 1.

      (10) It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      (11) Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that a detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike other studies that use cultured parasites and thus have access to unlimited amounts, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, the use of animals is tightly controlled and only the minimal number of animals to obtain statistically significant results is tolerated (and necessary to obtain permission to conduct animal experiments).

      Regarding the number of biological replicates, we would like to emphasize that the use of four biological replicates is fully acceptable and used in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC–MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR<1%), (ii) robust phosphosite localisation probabilities (localisation probability >0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (between 73.2% and 83.4% of all the identified phosphopeptides among replicates of the same condition).

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and we have revised the Methods to avoid misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Notably, they were replaced by values in the neighborhood of the observed intensities, rather than by globally low, noise-like values.
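
      The inclusion rule described above can be sketched as follows. This is a minimal illustration only: the function names, the `min_observed` parameter and the example intensities are hypothetical, and the sketch reproduces neither the MLE imputation itself nor the actual processing pipeline. It only shows how a per-condition "at least one observed value" rule compares with a stricter cutoff.

```python
import math


def observed_count(values):
    """Count replicate intensities that were actually measured (not None/NaN)."""
    return sum(1 for v in values if v is not None and not math.isnan(v))


def eligible_for_imputation(site, min_observed=1):
    """Return, per condition, whether missing values may be imputed.

    `site` maps a condition name to its list of replicate intensities.
    A condition qualifies only if it has at least `min_observed` measured values.
    """
    return {cond: observed_count(vals) >= min_observed for cond, vals in site.items()}


nan = float("nan")
# Hypothetical phosphosite: seen once in amastigotes, never in promastigotes.
site = {"ama": [21.3, nan, nan, nan], "pro": [nan, nan, nan, nan]}

flags_lenient = eligible_for_imputation(site, min_observed=1)
flags_strict = eligible_for_imputation(site, min_observed=3)
print(flags_lenient, flags_strict)
```

      Raising `min_observed` trades sensitivity for stringency, which is the trade-off discussed in this response.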

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.

      (12) For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      We thank the reviewer for noticing this point. It is an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.

      Minor comments:

      Are prior studies referenced appropriately?

      Yes

      Are the text and figures clear and accurate?

      The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      We thank the reviewer for these positive comments. We have now included the information on replicates in the overview figure (Figure S1).

      As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0–100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach; see doi: 10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero. This normalization approach is comparable to the one previously published in the “Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome” section of the following paper (DOI: 10.1016/j.mcpro.2022.100428).
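
      The ratio-of-ratios idea can be made concrete with a short sketch. It assumes the comparison is made on a log2 scale, so that "no change relative to the protein" corresponds to zero; the function names and intensity values are hypothetical, and the statistical test (performed with limma in the study) is not reproduced.

```python
import math


def log2_fold_change(cond_a, cond_b):
    """log2 ratio of abundance in condition A over condition B."""
    return math.log2(cond_a / cond_b)


def normalized_phospho_change(pep_a, pep_b, prot_a, prot_b):
    """Phosphopeptide log2 fold change minus parent-protein log2 fold change.

    The result is a relative measure and is NOT an occupancy: it is not
    bounded between 0 and 1.
    """
    return log2_fold_change(pep_a, pep_b) - log2_fold_change(prot_a, prot_b)


# Phosphopeptide rises 4-fold while its protein rises 2-fold.
change = normalized_phospho_change(400.0, 100.0, 200.0, 100.0)
# Phosphopeptide simply follows its protein.
unchanged = normalized_phospho_change(400.0, 100.0, 400.0, 100.0)
print(change, unchanged)
```

      A value near zero means the phosphopeptide merely tracks its protein, whereas a positive or negative value points to a phosphorylation change beyond the protein-level change.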

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term “phosphostoichiometry” is imprecise and not correct in this context.

      In response, we (i) replaced the term “phosphostoichiometry” throughout the manuscript with a more accurate description, such as “normalized phosphorylation level”, or “relative phosphorylation change normalized to protein abundance”, and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording will improve conceptual accuracy without altering the validity or interpretation of the results.

      From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting pvalues thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      The references have been formatted.
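
      For readers unfamiliar with the multiple-testing step in the quoted Methods text, the following sketch shows the standard (non-adaptive) Benjamini-Hochberg adjustment. It is an illustration only: the study used the adaptive variant from the cp4p R package, which additionally estimates the proportion of true null hypotheses, and that estimation step is omitted here. The function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(adj)
```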

      Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour around the circle of each node to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      We thank the reviewer for this comment but for readability we opted to keep the figure as is. We now refer to Tables 8, 9, and 12 that allow the reader to link gene IDs to protein name and annotation (if available).

      It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein–protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, coexpression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. The information on stringency has been added to the Methods section and the Figure legends (adding the information on confidence score cutoff).

      We decided not to submit the Cytoscape files as they were generated with previous versions of Cytoscape and the STRING plugin. Based on the differential abundance data shown in the tables it will be very easy to recreate these networks with the new versions for any follow up study.

      The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      We apologize and have translated the text into English.

      I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      We have added the following statement to the legend: ‘Confidence values were derived as described in Supplementary Methods’.

      Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      We have added the following justification to the Results section (see also the response above to comment 3 of reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408 as the latter’s potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow up is done to identify if those proteins are necessary and/or sufficient for driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      We very much appreciate these very interesting suggestions, which will be considered for ongoing follow-up studies.

      In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      The links have now been rectified and validated. For those data that are still under quarantine, the reviewer access links are given below:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d 0

      RNAseq data: https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f4c06-b4bd-bc10f2dc0b14

      Proteomic data:  http://www.ebi.ac.uk/pride

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      Significance

      Strengths:

      (1) The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      We thank the reviewer for this positive assessment of our work.

      (2) The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. This is a strength of the work as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      We thank the reviewer for the acknowledgment of our relevant hamster system, for which we face many challenges (financial, ethical, administrative as protocols need to be approved by the French government).

      Limitations:

      Potential lack of appropriate replication (see above).

      See response to comment 1.

      Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      See response to comment 2 above.

      The study applies well established techniques without any particular technical step change. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      We thank the reviewer for these positive comments.

      This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:

      (1) Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships by more neutral expressions such as ‘coexpression’.  

      (2) Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      We agree with the reviewer and have replaced ‘regulon’ with ‘co-regulated gene clusters’ (or similar).

      (3) LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value < 0.01), 1534 of which were exclusively detected in either ama or pro." If a protein is exclusively detected in one stage, then by definition it should not be detected in that number of replicates at both stages. This apparent contradiction should be clarified.

      We fully agree with the reviewer: an LFQ intensity of 0 may result from various causes. We realize that our wording may have been ambiguous. For clarity, we have modified the original text to: ‘Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value < 0.01), and 1534 that were exclusively detected in either ama or pro (Figure 3A left panel, Table 6).’ We also modified the legend of Figure 3B. Concerning missing values that could be either missing not at random (MNAR) or missing completely at random (MCAR), rather than introducing potentially misleading imputed values, we chose to treat these missing values as genuine stage-specific differences (presence/absence): quantitative statistics are restricted to proteins with measurable LFQ in both stages, while proteins with consistent presence in one stage and non-detection in the other are reported as stage-restricted detections. We believe this strategy is transparent and minimizes modeling assumptions, while still highlighting robust stage-specific signals. Our approach is supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions.
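
      The presence/absence strategy described above can be illustrated with a small sketch. The category labels, the function name and the `min_detected` threshold (3 of 4 replicates) are assumptions made for illustration and may differ from the criteria actually applied in the study.

```python
def classify_protein(ama, pro, min_detected=3):
    """Assign a protein to a reporting category from replicate LFQ intensities.

    `ama` and `pro` are lists of LFQ values, with 0 meaning "not detected".
    """
    n_ama = sum(v > 0 for v in ama)
    n_pro = sum(v > 0 for v in pro)
    if n_ama >= min_detected and n_pro >= min_detected:
        return "quantified_in_both"  # eligible for fold-change statistics
    if n_ama >= min_detected and n_pro == 0:
        return "ama_restricted"      # reported as stage-restricted detection
    if n_pro >= min_detected and n_ama == 0:
        return "pro_restricted"
    return "unclassified"            # too sparse to support either claim


both = classify_protein([5.1, 4.9, 5.3, 5.0], [2.2, 2.0, 0, 2.4])
ama_only = classify_protein([5.1, 4.9, 5.3, 5.0], [0, 0, 0, 0])
sparse = classify_protein([5.1, 0, 0, 0], [0, 0, 0, 0])
print(both, ama_only, sparse)
```

      Under such a scheme a protein detected in a single replicate is neither tested nor reported as stage-restricted, which keeps stochastic dropout from being read as a stage-specific signal.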

      (4) L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical reasons - such as proteins being just below the detection threshold or due to stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.

      We thank the reviewer for this comment. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’ We also deleted the ‘infinity’ symbol from the Figure.

      Minor Comments:

      Methods

      L132: Typo: "A according" should be "according."

      The ‘A’ refers to RNase A. We added a comma for clarification (…RNase A, according to…)

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      We thank the reviewer for this comment. Aside from the already quite detailed description in Methods and the reference there to the paper describing the pipeline, we have now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). There the following explanation can be found: “The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the meancoverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics.” The description then goes into further detail.
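
      As a rough illustration of the normalization quoted above, the sketch below converts bin coverage into per-chromosome somy estimates. It is not the giptools implementation: the function name, the disomic baseline factor of 2, the use of a per-chromosome median and the coverage values are all assumptions made for this example.

```python
from statistics import median


def somy_scores(bin_coverage, baseline_ploidy=2):
    """Estimate somy per chromosome from binned sequencing coverage.

    `bin_coverage` maps a chromosome name to the mean coverage of its bins.
    Each chromosome's median bin coverage is divided by the median over all
    bins, then scaled by the assumed baseline ploidy.
    """
    all_bins = [c for bins in bin_coverage.values() for c in bins]
    genome_median = median(all_bins)
    return {
        chrom: baseline_ploidy * median(bins) / genome_median
        for chrom, bins in bin_coverage.items()
    }


# Hypothetical coverage: chr3 carries roughly 1.5x the coverage of the others.
coverage = {
    "chr1": [50.0, 52.0, 48.0],
    "chr2": [49.0, 50.0, 51.0],
    "chr3": [75.0, 76.0, 74.0],
}
scores = somy_scores(coverage)
print(scores)
```

      With these example values chr1 and chr2 come out close to disomic and chr3 close to trisomic. The approach presupposes that most bins belong to disomic chromosomes, which is the assumption the reviewer asks to be confirmed.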

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      We thank the reviewer for this comment. Unlike the paper cited above (which used long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Corrected to ‘validated’

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant {Ritchie, 2015; Smyth, 2005}.

      Results

      L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      We thank the reviewers for these suggestions and have reformulated the sentence to: ‘In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.’

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      These lines have been added to Figure 1C (not 3A).

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Thank you for the comment – we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      This has been corrected to ‘The discrepancies we observed in a sub-set of genes between….’.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      We added this information to the text (‘some of the most significantly enrichment terms included …’).

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      This statement was too speculative and has been removed. Instead, we added ‘Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner’.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      We deleted the term as suggested and reformulated to ‘….our results confirm the important role of protein degradation….’.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      Discussion

      L555: As noted in L494, reconsider using the word "unexpected."

      Removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      We thank the reviewer for this comment and have followed the advice, reformulating the sentence as suggested.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Corrected to ‘some of the most significantly enriched GO terms’.

      Referee cross-commenting

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      (1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      (2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      (3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      (4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      (5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      We thank the reviewer for this cross-comparison, which served us as guide to revise our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance):

      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.  

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to incite mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al., aim to address the central question of how these parasites establish and maintain distinct life cycle stages in mostly the absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From those parasites, they performed a genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analysis to reveal the changes the parasites undertook between the two life stages.

      The main conclusions stated by the authors are:

      - The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.

      - RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and posttranscriptional regulation itself.

      - Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      We thank the reviewer for the time and effort dedicated to our manuscript.

      Further details are organised by order of appearance in the text:

      Material and Methods: while the authors are indicating some key parameters, providing the codes and scripts they used throughout the manuscript would improve reproducibility.

      We thank the reviewer for this comment and added the URL for the codes to the data availability section.

      Why only 2 biological replicates for RNA while the other layers have 3 or 4?

      We agree with the other reviewers and have repeated this analysis to have statistically more robust results.

      Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable on longer culture derived promastigotes and sandfly derived promastigotes ?

      No, as published in Barja et al., Nature Ecol Evol 2017 (PMID: 29109466) and Bussotti et al., PNAS 2023 (PMID: 36848551), these minor fluctuations do not predict subsequent aneuploidies, either in long-term culture or in sand fly-derived promastigotes. This information has been added to the text.

      Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      We corrected this sentence to ‘Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2’.

      L349-355: have the mRNA presenting change in abundance between stages been normalised by their relative DNA abundance ? Said otherwise, can the wave patterns observed at the genome level explain the respective mRNA level ? Can the authors plot in a similar way the enrichment scores in regards to the position on the genome and can the authors indicate if there is a positional enrichment in addition to the functional one they observe ? This may affect the conclusion in L356-358.

      As noted above, we did not see any significant read depth changes at DNA level when comparing amastigotes and promastigotes. Thus there is no need to normalize the RNAseq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we have observed on DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some overrepresented tandem gene arrays where all gene copies share the same location and GO term).

      L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not suggest causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      We thank the reviewer for this comment and have corrected the statement accordingly.

      Figure 3B, could the authors clarify what are the "unique genes" that are on the infinite quadrants? It seems these proteins are identified in one stage and not the other. This implies that the corresponding missing values are missing not at random (MNAR). Rather than removing those proteins containing MNAR values from the differential expression analysis, the authors should probably impute those missing values. Methods of imputation of MNAR and MAR values can be found in the literature. Indeed, the level of expression in one stage of those proteins is now missing, while it could strongly affect the conclusions the authors are drawing in figure 4E regarding the proteins targeted for degradation and rescued in presence of the proteasome inhibitor.

      We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not-at-random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification.

      More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other.

      In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts.

      Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’
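      The selection rule quoted in the modified text can be written down as a small filter. The sketch below is illustrative only and is not the code used in the study: the `Gene` record, its field names, and the category labels are hypothetical, and the only value taken from the text is the p < 0.01 cutoff.

```python
from dataclasses import dataclass
from typing import Optional

P_CUTOFF = 0.01  # significance threshold quoted in the modified text


@dataclass
class Gene:
    # Hypothetical record; field names are assumptions made for this sketch.
    gene_id: str
    rna_p: float                # p-value for differential RNA abundance
    protein_p: Optional[float]  # None when no protein-level test was possible
    detected_ama: bool          # protein detected in amastigotes
    detected_pro: bool          # protein detected in promastigotes


def classify(gene: Gene) -> Optional[str]:
    """Return the category under which a gene is retained, or None if excluded."""
    # Both retained categories require a significant RNA change.
    if gene.rna_p >= P_CUTOFF:
        return None
    # Case 1: protein quantified in both stages and significantly changed.
    if gene.detected_ama and gene.detected_pro:
        if gene.protein_p is not None and gene.protein_p < P_CUTOFF:
            return "both_significant"
        return None
    # Case 2: protein detected in exactly one stage (presence/absence).
    if gene.detected_ama != gene.detected_pro:
        return "ama_only" if gene.detected_ama else "pro_only"
    # Protein detected in neither stage: nothing to compare.
    return None
```

      In this sketch, genes labelled `ama_only` or `pro_only` would correspond to the signals placed at the upper or lower edge of the graph, with no intensity value imputed for the stage in which the protein was not detected.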

      L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and context. It is too speculative and with little support by the data presented at this stage of the manuscript. It should be removed as Figure 3E or could be placed in the discussion and supplementary information.

      We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.  

      The authors present an elegant way to show stage specific degradation through the comparison of stage specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial" the term unexpected is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation / autophagy during differentiation. Furthermore the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants as we discussed above, and should be revised accordingly.

      We rephrased the conclusion to ‘In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.’ Please see the response to comment 9 regarding the unique proteins.

      L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      We agree with the reviewer and have toned this statement down by adding the statement ‘….or simply be a consequence of culture adaptation’.

      The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct link and feedback without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatase and DUBs, need to be validated for their respective change of protein activity in the direction that fits the model of the authors. Those discussions should be toned down.

      We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      A couple of typos:

      In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      L225 "...peptide match was disable." should be "...peptide match was disabled."

      Both corrected

      Reviewer #4 (Significance):

      While the emphasis on post-translational regulation of gene expression in kinetoplastid organisms is not very novel, the scale of the work presented here, which looks at 5 layers of potential regulation, is. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro promastigote differentiation systems, which may limit the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall outside the scope of this manuscript.

      We agree with the reviewer and have modified our manuscript throughout to remove any claims of causal relationships. Indeed, this work sets the stage for future investigations dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely nonfunctional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

      This is a key question, which was one of the original motivations of our work. Both hypotheses are possible: ‘abrupt switches’ (punctuated equilibria, corresponding to distinct specificities) and more gradual changes (smooth transitions, through intermediates that exhibit mixed or intermediate specificity).

      Many natural specificity-switching events have probably resulted from the need to adapt to environmental change and from selection for a different specificity, which can be compatible with an abrupt change in specificity. Others may reflect the gradual evolution of promiscuous ancestral sequences into more specialized ones, losing cross-reactivity. A molecular mechanism that could allow abrupt switching is gene duplication, a frequent mechanism for WW domain diversification, beyond standard mutation-driven evolutionary processes.

      As for the specificity-switching paths for WW domains found in this work, the presence of weakly responsive cross-reactive intermediates along the designed paths for I<->IV, and their absence in the I<->II path, suggests that promiscuous domains are hard to design (see also the related response to point 3 of Reviewer 2) and are generally not selected by natural evolution (as seen from the clear clustering of extant proteins in different specificity classes).

      For a small domain such as WW, mutations that favor some specificity classes are known to have detrimental effects on fundamental properties, such as folding kinetics and stability, see Ref [72]. It is possible that larger, less constrained protein domains could allow for more cross-reactive variants and smoother specificity switching. However, experiments on fluorescent proteins looking for interpolation between two wavelengths have shown that the switch was abrupt [Poelwijk et al. Nature Communications (2019)].

      Our aim was to achieve a functional switch (imposed by the two extant end-points) through a path of designed, functional intermediates and to correctly predict, with our RBM model, the location of the specificity transition and of the cross-reactivity region (which we expected only along the I-IV path). This aim was achieved, as demonstrated by the experiments.

      Reviewer #2:

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      We agree with Reviewer 2 that WW sequences are short and simple to handle from a computational point of view, and were chosen for this reason to test the design of full mutational paths (after benchmarking the approach on lattice-protein models, see Refs. [30] and [44]). Our work gives additional support to the effectiveness of generative models learned from sequence data. That said, from a biological point of view, WW is a highly constrained domain, see the comment by Reviewer 1 above and our answer.

      In longer and more complex proteins, we expect it will be more difficult to disentangle specificity-switching latent units; see Fernandez-de-Cossio-Diaz et al., Physical Review X 2023 for a discussion and a possible computational approach to this issue. Note that, while relating the latent units to specificity classes was convenient, it was not used to generate the paths themselves. Therefore, we believe that our method is quite robust and easily generalizable to more complex and longer proteins. As an illustration, we have recently used it to sample viral trajectories (more precisely, variants of the Receptor Binding Domain of the SARS-CoV-2 spike protein) capable of escaping antibody recognition, see Huot et al., PNAS 2026. In this recent work, we projected the paths onto the principal antigenic space, defined by the top two Principal Components of the viral variant binding affinities to 32 antibodies. In this representation, sampled paths displayed trends similar to natural paths, drawn from the sequences sampled during the pandemic. This finding supports the applicability and interpretation of our method for more complex proteins.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      We think that this finding, true for paths connecting classes I and IV, is not general. In a previous paper we benchmarked our path-designing approach on simple models of in silico lattice proteins and showed that indirect paths led to gains in the overall fitness (computed according to the ground-truth model) [Mauri, Cocco, Monasson, Physical Review E 2023, fig. 9-12].

      In general, we would expect that indirect paths could explore alternative mutations, important to compensate for transitory destabilizing mutations that could occur along the path. We speculate that, for indirect paths, these stabilizing mutations occur at the extremity near the class-I wild-type. A slight decrease in binding response to peptide C1 for the direct path is nevertheless observed (see Suppl Table 4), but our experimental detection, focused on binding response, is not tailored to directly detect a difference in stability. When approaching the class-IV anchoring point, we observe that paths interpolating between classes I and IV are very constrained and show limited diversity, going through a funnel in sequence space corresponding to the direct path. We agree with Reviewer 2 that a more exhaustive comparison with direct paths would be interesting, and will add a sentence in the Conclusion.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      Class I to Class IV paths and Class I to Class II paths fundamentally differ because the binding pocket in Class I WW domains is different from that of Class IV WW domains, while Classes I and II/III share the same binding region. This important difference may explain why class I specificity can switch to class IV specificity (steps 20-21) without completely losing affinity to the peptide of class I. To investigate whether the two binding regions are really independent, we have tested some additional specific mutations along the I-IV mutational paths. In our attempts to engineer cross-reactivity, we have observed that it is important to substantially lower affinity to the class I peptide to acquire class IV specificity, in agreement with previous studies [72]. Moreover, the I to IV path seems to go through a funnel-like part in the region with no natural sequences, with the same transition intermediates obtained in several designed paths. This indicates that the Class I to Class IV functional switch is more constrained than the Class I to II switch. Let us also emphasize that our assessment of class specificity is based on one peptide for each class. It would be interesting to test multiple WW-binding peptides with similar biochemical properties to acquire a more complete view of the specificities.

      (4) On page 12, it is stated that the temperature was chosen to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      Section 3.5 explains that RBM sampling can be biased toward high-scoring sequences by lowering the sampling temperature to 1/3; such sequences are more likely to be functional, as shown in [Russ et al., Science 2020]. We acknowledge (as also noted by Reviewer 1) that this section comes at the end of the manuscript, while differences in scores along the path are shown before, so the discussion of this important point is somewhat delayed. We will add a sentence earlier in Results to explain this point.
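      The effect of lowering the sampling temperature can be illustrated on a toy example. The sketch below is not the RBM sampler used in the work: it assumes that a score acts as a log-probability up to a constant, so that sampling at temperature T weights a sequence by exp(score / T), and the three scores are made-up values.

```python
import math


def boltzmann_probabilities(scores, temperature=1.0):
    """Normalized weights exp(score / T) over a finite set of candidates."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores for three candidate sequences (higher is better).
toy_scores = [-25.0, -23.0, -21.0]

p_unit = boltzmann_probabilities(toy_scores, temperature=1.0)
p_low = boltzmann_probabilities(toy_scores, temperature=1.0 / 3.0)

for score, a, b in zip(toy_scores, p_unit, p_low):
    print(f"score {score:6.1f}   P(T=1) = {a:.4f}   P(T=1/3) = {b:.4f}")
```

      Dividing the score by T = 1/3 triples every score gap, so probability mass concentrates on the highest-scoring candidates at the expense of diversity.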

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

      We agree with Reviewer 2 that the consensus sequence is an atypical sequence for an independent model with a large RBM score. We will update Figure 5 of the manuscript to show that this is also happening in our case. 

      We use Maximum Likelihood in ASR, but our ASR path corresponds to all internal nodes of the reconstructed tree joining the two extant sequences, not only to the most ancestral node. Overall, the ancestral sequences along the ASR paths are different from the consensus sequence (mean identity of 76% and 60% respectively). The most ancestral nodes in the paths are also different from the consensus, having 81% (paths between type I and IV domains) or 54% (paths between type I and II/III domains) similarity, and an RBM score of -21 or -58, respectively. We agree that some ASR internal-node sequences have a higher score than the natural wild-types (extant sequences). This is shown in Fig. 6: several points have a larger RBM score than the two anchoring points at the extremities of the path, possibly because natural sequences are not always the most stable ones. As discussed in the Conclusion, ASR nodes moreover generally have better scores than the sequences obtained by sampling an independent model. Phylogenetic reconstruction implicitly takes into account some degree of co-variation between sites in natural sequences, as shown by the success of using the phylogenetic distance of a mutated sequence to the wild-type to predict the fitness effect of these mutations [Laine, Mol. Biol. Evol. 2019].

      To better show this effect, we will update Figure 6 to also report the scores of the “scrambled” sequences, which do not respect the potential epistasis extracted by the RBM. It appears that ASR sequences generally have better scores than the scrambled sequences, and lower scores than RBM sequences (sampled at T=1/3). RBM models take into account multi-residue correlations, which could contribute to reaching better scores than ASR and BM models. Ongoing studies on larger proteins show that the score of sequences sampled from ASR reconstruction, including the Maximum Likelihood one, can still be improved according to the RBM score by a few mutations consistent with the ASR posterior probabilities (unpublished).

      Mistakes in the reference list will be amended in the updated version.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid and is further supported by the finding that restoring Miro1 to control levels normalizes proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity. Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; however, characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex.

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been evaluated to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and importantly in human cells. The subject provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small molecule MIRO1.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays are marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

      We thank the reviewers for their thoughtful and constructive feedback. We appreciate their recognition of our work’s value and the improvements made in this revised version.

      We are particularly grateful to Reviewer 3 for their detailed and insightful comments, which identified errors we (and other reviewers) had unfortunately overlooked. To address these concerns and ensure the manuscript meets the high standards of clarity and rigor we aim for, we have made additional corrections and refinements.

      As part of this process, we conducted a thorough review of the original source files. This was especially important given that the project spanned from 2018 to 2025, and many co-authors have since left their previous positions.

      We appreciate the opportunity to resubmit this manuscript and are confident that these updates fully address the concerns raised by the reviewer and the editorial team.

      Reviewer #3 (Recommendations for the authors):

      (1) I still do not see the data in WB 2G reflecting the quantification in 2H and 2I. Moreover, the authors state they performed 1 additional experiment, but it appears not to have been included in the analysis of 2H and 2I since the graphs remained the same from the last version of the manuscript.

      We apologize for this oversight. The additional experiment has now been incorporated into the analysis for Figures 2H and 2I, and the graphs have been updated accordingly. While we had uploaded the new blot, we inadvertently forgot to update the analysis graphs. Thank you for bringing this to our attention.

      (2) The authors talk several times about "supercomplexes 1 and 2" without testing their precise composition (there is a ton of literature about SC species in several mouse cell types, and separate BN-PAGE immunoblotting of individual MRC complexes would precisely define them in this context)

      We agree with the reviewer that this is an important point. However, structural differences between supercomplexes were outside the scope of this paper, and we did not perform such analyses. That said, examining the precise composition of supercomplexes could be a valuable direction for future work.

      (3) Steady-state levels of MRC subunits do not match the observations from BN-PAGE results. That might be potentially interpreted and explained by the possible accumulation of intermediates but this is not explored.

      We appreciate the reviewer’s observation. There is indeed a strong possibility that differences in the expression of structural components of mitochondrial complexes exist between WT and Miro1 -/- cells. However, in this study, we chose to focus on assessing potential differences in the enzymatic activities of the complexes rather than examining their structural composition. Exploring the accumulation of intermediates and structural differences could be an interesting avenue for future investigations.

      (4) Citrate synthase normalization of kinetic enzyme activities is claimed, yet it is not shown in any graph and no description of the method is provided.

      We sincerely thank the reviewer for pointing out this discrepancy. Upon careful review, we realized that our statement regarding citrate synthase normalization of kinetic enzyme activities in the last revised version was made in error. This was a miscommunication between co-authors, and we did not perform citrate synthase normalization. Instead, the normalization was performed against protein concentration, determined by the BCA assay as described in the manuscript. We regret this oversight and appreciate the opportunity to clarify this.

      (5) Complex I activity is still wrongfully described as NADPH oxidation in the methods

      We corrected this error.

      (6) The authors state 'Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV'. I do not understand this, I find this justification insufficient and not substantiated by any experimental evidence. What buffer has been used for isolation? There are hundreds of protocols for isolation of intact mitochondria and MRC complexes. Also, DDM and digitonin are the gold-standard detergents for MRC complexes isolation and separation via BN-PAGE.

      We thank the reviewer for raising this important point. We have revised the response to clarify the exact experimental conditions and to provide supporting data.

      For BN-PAGE, mitochondrial fractions purified from cultured VSMCs or aortic tissue were prepared using a standard protocol (now explicitly detailed in the Methods). Briefly, mitochondria were resuspended in 6-aminocaproic acid (ACA) buffer containing 750 mM ACA, 50 mM Bis-Tris (pH 7.0), and protease inhibitors. Forty micrograms of mitochondrial protein were solubilized with 1.5% digitonin, using a final detergent-to-protein ratio of 8:1, and incubated on ice for 20 minutes prior to clarification by centrifugation at 16,000 g for 30 minutes at 4°C. Thus, consistent with established standards, digitonin—one of the gold-standard detergents for MRC complex solubilization and BN-PAGE—was used throughout.
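
      As a rough check on the solubilization arithmetic, the sketch below assumes that the 8:1 detergent-to-protein ratio is by mass and that the 1.5% digitonin solution is w/v. Neither assumption is stated explicitly above, so the resulting volume is illustrative only.

```python
def digitonin_for_sample(protein_ug, detergent_to_protein, percent_wv):
    """Return (detergent mass in ug, solution volume in uL).

    Assumes a mass-based detergent-to-protein ratio and a w/v stock,
    where 1% w/v = 10 mg/mL = 10 ug/uL.
    """
    detergent_ug = protein_ug * detergent_to_protein
    stock_ug_per_ul = percent_wv * 10.0
    return detergent_ug, detergent_ug / stock_ug_per_ul

# Values quoted in the response: 40 ug protein, 8:1 ratio, 1.5% digitonin.
mass_ug, volume_ul = digitonin_for_sample(40.0, 8.0, 1.5)
print(f"digitonin needed: {mass_ug:.0f} ug in about {volume_ul:.1f} uL")
```

      Under these assumptions, 40 µg of mitochondrial protein corresponds to 320 µg of digitonin, or roughly 21 µL of a 1.5% solution.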

      Despite using these widely accepted conditions, we found that detection of fully assembled Complex IV by BN-PAGE was inconsistent, a limitation that has been reported by others and is known to be sensitive to mitochondrial source, tissue type, and solubilization efficiency. To address this directly and avoid over-interpretation, we assessed Complex IV integrity by examining core subunits. As shown in Figure 6—figure supplement 1 (panels B and C), expression levels of MTCO1 and MTCO2, both essential core components of Complex IV, do not differ significantly between WT and Miro1-/- cells, supporting the conclusion that Complex IV abundance is not altered.

      We have revised the manuscript to clarify these methodological details and to explicitly state that conclusions regarding Complex IV are based on subunit analysis rather than BN-PAGE visualization alone.

      (7) Complex V IGA also does not seem to reflect its quantification.

      Thank you for highlighting this concern. To address it, we will include the numerical data alongside the figures to ensure clarity and alignment with our findings. We hope this will provide a more comprehensive understanding and resolve any ambiguity.

      (8) Figure 6 supplement 1, the authors state 'we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants'. I do not understand: what background is being used? What mutants are being expressed? All the figures refer to Miro1 -/- which is, according to standard genetic nomenclature, a loss-of-function allele (KO).

      Thank you for your comment. To clarify, we first infected MIRO1fl/fl VSMCs with an adenovirus expressing the DNA recombinase Cre or a control adenovirus. Cells infected with the adenovirus expressing Cre are labeled as MIRO1-/- cells. In these MIRO1-/- cells, we then introduced MIRO1 wild type (WT) and MIRO1 mutants via adenoviral expression.

      The mutants include one lacking the transmembrane domain (MIRO1-ΔTM), and another in which the two EF hands of MIRO1 were point-mutated (MIRO1-KK). MIRO1-WT is denoted as Ad WT, the mutant MIRO1-KK as Ad KK, and MIRO1-ΔTM as Ad ΔTM in the figures. We hope this explanation clarifies the experimental background and nomenclature used.

      (9) Figure 6 supplement 1B, no normalization is provided (e.g. VDAC, TOM20 etc.). Interestingly, VDAC is then used to normalize the data in C-D-E-F-G. Also, why is MIRO1 detected in lane 4? Is the mutant stable or not? There is zero signal in A.

      Thank you very much for pointing out that the immunoblot for VDAC1 was missing in Figure 6—Supplement 1B. This figure has been reviewed several times, and unfortunately, this error was not detected. We sincerely apologize for this oversight. We have now revised the figure to include the immunoblot for VDAC1 to address this issue.

      Regarding the detection of MIRO1 in lane 4, we confirm that the "mutant" is not stable. To generate MIRO1 knockout cells, aortic smooth muscle cells from MIRO1fl/fl mice were isolated and cultured, followed by infection with an adenovirus expressing Cre. As these are primary cells and the deletion was induced by Cre expression, the recombination efficiency can vary, which is reflected in the variability observed in lanes 2 and 4 of the immunoblot.

      (10) Why are COX4 levels so low in the 2nd replicate in 7A? The authors state: 'We also performed anti-VDAC immunoblots on the same membranes as alternative loading control (see image below)'. I could not find the image.

      Thank you for your comment. The second pair of samples in Figure 7A is from a different preparation of mitochondria. In our experimental design, a control sample and a MIRO1 knockdown sample were processed side by side and run next to each other on the immunoblot.

      Regarding the anti-VDAC immunoblot, the image was included in our response to reviewers during the previous revision, as we did not believe it altered the message conveyed by the COX4 blot. However, to ensure clarity and address your concern, we have now included the anti-VDAC immunoblot directly in the figure. We hope this addition resolves any ambiguity and provides further confidence in the data presented.

      (11) The proposed interaction between MIRO1 and NDUFA9 is very difficult to reconcile, as the two proteins reside in distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane (OMM), with its functional domains facing the cytosol, whereas NDUFA9 is a matrix-facing accessory subunit of mitochondrial Complex I, positioned at the interface between the N- and Q-modules.

      We appreciate the reviewer’s comment and agree that MIRO1 and NDUFA9 occupy distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane with cytosol-facing domains, whereas NDUFA9 is a matrix-facing accessory subunit of Complex I at the N/Q-module interface.

      Our data do not suggest a stable, constitutive interaction within intact mitochondria. Rather, the observed association likely reflects an indirect, transient, or context-dependent interaction, potentially occurring during mitochondrial stress, remodeling, or turnover. Such associations may be mediated by multi-protein complexes spanning mitochondrial membranes, dynamic contact sites, or post-lysis interactions detected under experimental conditions. Increasing evidence supports functional coupling between outer mitochondrial membrane proteins and inner membrane or matrix pathways without direct physical binding.

      Additional comments:

      (12) All the raw data should be provided to the readers (uncropped and annotated WB, IHC images, numerical data with statistics applied).

      We agree with the reviewer and appreciate the emphasis on transparency. In accordance with eLife submission requirements, we have provided all raw data. The Source Data files associated with each figure now include uncropped and annotated immunoblots, as well as the numerical source data for all quantified analyses.

      During the compilation of these materials, we were unable to locate the original source files for Figure 2A. The control experiment depicted in the previous version, which demonstrates in vitro recombination, was performed in 2018. However, this experiment was repeated several times throughout the project. Therefore, to ensure the manuscript remains complete, we have replaced this panel with a representative immunoblot from a similar experiment. Additionally, during our review, we discovered a labeling error in Figure 3D and G. We have corrected these figures to ensure accuracy.

      All source files have been provided and carefully labeled to facilitate independent evaluation.

    1. Author response:

      Point-by-point description of the revisions

      Reviewer #1:

      Thank you very much for considering that our manuscript evaluates an important question and that the reagents used are well prepared and characterized. We also much appreciate that you consider the information generated as potentially useful for those studying HIV infection processes and strategies to prevent infection.

      (1) While a single particle tracking routine was applied to the data, it's not clear how the signal from a single GFP was defined and whether movement during the 100 ms acquisition time impacts this. My concern would be that the routine is tracking fluctuations; although these are related to single particle dynamics, it appears from the movies that the density of the GFP-tagged receptors in the cells is too high to allow clear tracking of single molecules. SPT with GFP is very difficult due to bleaching and relatively low quantum yield. Current efforts in this direction that are more successful include using SNAP tags with very photostable organic fluorophores. The data likely do mean something is happening with the receptor, but they need to be more conservative about the interpretation.

      Some of the paradoxical effects might be better understood through deeper analysis of the SPT data, particularly investigation of active transport and more detailed analysis of "immobile" objects. Comments on early figures illustrate how this could be approached. This would require selecting acquisitions where the GFP density is low enough for SPT and performing a more detailed analysis, but this may be difficult to do with GFP.

      When the authors discuss clusters of <2 or >3, how do they calibrate the value of GFP and the impact of diffusion on the measurement? One way to approach this might be single-molecule measurements of dilute samples on glass vs in a supported lipid bilayer to map the streams of true immobility to diffusion at >1 µm<sup>2</sup>/s.

      We fully understand the reviewer’s apprehensions regarding the application of these high-end biophysical techniques, in particular the associated complexity of the data analysis. We provide below extensive explanations on our methodology, which we hope will satisfactorily address all of the reviewer’s concerns.

      We would first like to emphasize that the experimental conditions and the quantitative analysis used in our current experiments are similar to the established protocols and methodologies applied by our group previously (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022; Gardeta et al. Frontiers in Immunol., 2022; García-Cuesta et al. eLife, 2024; Gardeta et al. Cell. Commun. Signal., 2025) and by others (Calebiro et al. PNAS, 2013; Jaqaman et al. Cell, 2011; Mattila et al. Immunity, 2013; Torreno-Pina et al. PNAS, 2014; Torreno-Pina et al. PNAS, 2016).

      As SPT (single-particle tracking) experiments require low-expressing conditions in order to follow individual trajectories (Manzo & García-Parajo Rep. Prog. Phys., 2015), we transiently transfected Jurkat CD4<sup>+</sup> cells with CXCR4-AcGFP or CXCR4<sup>R334X</sup>-AcGFP. At 24 h post-transfection, cells expressing low CXCR4-AcGFP levels were selected using a MoFlo Astrios Cell Sorter (Beckman Coulter) to ensure optimal conditions for SPT. Using Dako Qifikit (DakoCytomation), we quantified the number of CXCR4 receptors and found ~8,500 – 22,000 CXCR4-AcGFP receptors/cell, which corresponds to a particle density of ~2 – 4.5 particles/µm<sup>2</sup> (Author response image 1) and is similar to the expression levels found in primary human lymphocytes.

      Author response image 1.

      Purified AcGFP monomeric protein was immobilized on glass at various concentrations. Dependency of the distribution of particle components on particle density was calculated; >95% were monomeric single particles at 2.0-4.5 particles/µm<sup>2</sup>. This range of particle density was used to analyze the dynamics of CXCR4-AcGFP, or CXCR4<sup>R334X</sup>-AcGFP single particles on JKCD4 cells.

      These cells were resuspended in RPMI supplemented with 2% FBS, NaPyr and L-glutamine and plated on 96-well plates for at least 2 h. Cells were centrifuged and resuspended in a buffer with HBSS, 25 mM HEPES, 2% FBS (pH 7.3) and plated on glass-bottomed microwell dishes (MatTek Corp.) coated with fibronectin (FN) (Sigma-Aldrich, 20 µg/ml, 1 h, 37°C). To observe the effect of the ligand, we coated dishes with FN + CXCL12; FN + X4-gp120 or FN + VLPs, as described in material and methods; cells were incubated (20 min, 37°C, 5% CO<sub>2</sub>) before image acquisition.

      For SPT measurements, we use a total internal reflection fluorescence (TIRF) microscope (Leica AM TIRF inverted) equipped with an EM-CCD camera (Andor DU 885-CS0-#10-VP), a 100x oil-immersion objective (HCX PL APO 100x/1.46 NA) and a 488-nm diode laser. The microscope was equipped with an incubator and temperature control units; experiments were performed at 37°C with 5% CO<sub>2</sub>. To minimize photobleaching effects before image acquisition, cells were located and focused using the bright field, and a fine focus adjustment in TIRF mode was made at 5% laser power, an intensity insufficient for single-particle detection that ensures negligible photobleaching. Image sequences of individual particles (500 frames) were acquired at 49% laser power with a frame rate of 10 Hz (100 ms/frame). The penetration depth of the evanescent field used was 90 nm.

      We performed automatic tracking of individual particles using a very well established and common algorithm first described by Jaqaman (Jaqaman et al. Nat. Methods, 2008). Nevertheless, we would stress that we implemented this algorithm in a supervised fashion, i.e., we visually inspect each individual trajectory reconstruction in a separate window. Indeed, this algorithm is not able to quantify merging or splitting events.

      We follow each individual fluorescence spot frame-by-frame using a three-by-three matrix around the centroid position of the spot, as it diffuses on the cell membrane. To minimize the effect of photon fluctuations, we averaged the intensity over 20 frames. Nevertheless, to assure the reviewer that most of the single molecule traces last for at least 50 frames (i.e., 5 seconds), we provide the following data and arguments. We currently measure the photobleaching times from individual CD86-AcGFP spots exclusively having one single photobleaching step to guarantee that we are looking at individual CD86-AcGFP molecules. The distribution of the photobleaching times is shown below (Author response image 2). Fitting of the distribution to a single exponential decay renders a t<sub>0</sub> value of ~5 s. Thus, with 20 frames averaging, we are essentially measuring the whole population of monomers in our experiments. As the survival time of a molecule before photobleaching will strongly depend on the excitation conditions, we used low excitation conditions (2 mW laser power, which corresponds to an excitation power density of ~0.015 kW/cm<sup>2</sup> considering the illumination region) and longer integration times (100 ms/frame) to increase the signal-to-background for single GFP detection while minimizing photobleaching.

      Author response image 2.

      Single molecule photobleaching times measured directly from single molecule trajectories of CD86-AcGFP, considering only traces that exhibit single molecule photobleaching steps. The experimental data are shown in gray bars (n=273 trajectories over 3 independent experiments). The red line corresponds to a single exponential decay fitting of the experimental data, from which t<sub>0</sub> has been extracted.
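
      The estimation of a single-exponential decay time from photobleaching times can be sketched as follows. This is a simplified illustration and not the fitting routine used for the figure: it replaces the histogram fit with the maximum-likelihood estimate for an exponential distribution (the sample mean), ignores frame discretization and the finite movie length, and runs on synthetic data generated with an assumed decay time of 5 s.

```python
import random

FRAME_TIME_S = 0.1  # 100 ms/frame, as in the acquisition settings above

def fit_decay_time(bleach_times_s):
    """Maximum-likelihood estimate of t0 for p(t) = exp(-t / t0) / t0.

    For an exponential distribution this is simply the sample mean.
    """
    if not bleach_times_s:
        raise ValueError("need at least one photobleaching time")
    return sum(bleach_times_s) / len(bleach_times_s)

# Synthetic data only: 273 traces drawn with an assumed true t0 of 5 s.
rng = random.Random(0)
synthetic_times = [rng.expovariate(1.0 / 5.0) for _ in range(273)]

t0_hat = fit_decay_time(synthetic_times)
t0_frames = t0_hat / FRAME_TIME_S
print(f"estimated t0: {t0_hat:.2f} s ({t0_frames:.0f} frames)")
```

      With real trajectories, the list of synthetic times would be replaced by the measured time to the single photobleaching step of each trace.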

      To infer the stoichiometry of receptor complexes, we also perform single-step photobleaching analysis of the TIRF trajectories to establish the existence of different populations of monomers, dimers, trimers and nanoclusters and extract their percentage. Some representative trajectories of CXCR4-AcGFP with the number of steps detected are shown in new Supplementary Figure 1.  

      The emitted fluorescence (arbitrary units, a.u.) of each spot in the cells is quantified and normalized to the intensity emitted by monomeric CD86-AcGFP spots that strictly showed a single photobleaching step (Dorsch et al. Nat. Methods, 2009). We have preferred to use CD86-AcGFP in cells rather than AcGFP on glass to exclude any potential effect on the different photodynamics exhibited by AcGFP when bound directly to glass. We have also previously shown pharmacological controls to exclude CXCL12-mediated receptor clustering due to internalization processes (Martinez-Muñoz et al. Mol. Cell, 2018) that, together with the evaluation of single photobleaching steps and intensity histograms, allow us to exclude the presence of vesicles in our data. Thus, the dimers, trimers and nanoclusters found in our data do correspond to CXCR4 molecules on the cell surface. Finally, distribution of monomeric particle intensities, obtained from the photobleaching analysis, was analyzed by Gaussian fitting, rendering a mean value of 980 ± 86 a.u. This value was then used as the monomer reference to estimate the number of receptors per particle in both cases, CXCR4-AcGFP and CXCR4<sup>R334X</sup>-AcGFP (new Supplementary Figure 1).
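
      The use of the monomer reference intensity can be illustrated with a minimal sketch. The rounding rule and the example spot intensities are assumptions made for illustration; the analysis described above may assign stoichiometry differently, for example from intensity histograms combined with photobleaching steps.

```python
MONOMER_MEAN_AU = 980.0  # mean monomer intensity quoted above (a.u.)

def receptors_per_particle(intensity_au, monomer_au=MONOMER_MEAN_AU):
    """Estimate receptors in a spot by dividing by the monomer intensity.

    Rounds to the nearest integer and reports at least one receptor,
    since a detected spot contains at least one fluorophore.
    """
    if intensity_au < 0:
        raise ValueError("intensity must be non-negative")
    return max(1, round(intensity_au / monomer_au))

# Hypothetical spot intensities in arbitrary units.
for spot_au in (1000.0, 2050.0, 4100.0):
    print(spot_au, "->", receptors_per_particle(spot_au), "receptor(s)")
```

      Because the monomer distribution has a finite width (± 86 a.u.), spots with intensities near the boundary between two integer values cannot be assigned unambiguously by this simple ratio alone.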

      (2) I understand that the CXCL12 or gp120 are attached to the substrate with fibronectin for adhesion. I'm less clear how the VLPs are integrated. Were these added to cells already attached to FN?

      For TIRF-M experiments, cells were adhered to glass-bottomed microwell dishes coated with fibronectin, fibronectin + CXCL12, fibronectin + X4-gp120, or fibronectin + VLPs. As for CXCL12 and X4-gp120, the VLPs were attached to fibronectin taking advantage of electrostatic interactions. To clarify the integration of the VLPs in these assays, we have stained the microwell dishes coated with fibronectin and those coated with fibronectin + VLPs with wheat germ agglutinin (WGA) coupled to Alexa647 (Author response image 3) and evaluated the staining by confocal microscopy. These results indicate the presence of carbohydrates on the VLPs and are, therefore, indicative of the presence of VLPs on the fibronectin layer.

      Author response image 3.

      Representative confocal images of microwell dishes coated with fibronectin (left panel) or fibronectin + VLPs (right panel) and stained with wheat germ agglutinin (WGA) coupled to Alexa647. Scale bar, 1 µm.

      Moreover, it is important to remark that the effect of the VLPs on CXCR4 behavior at the cell surface observed by TIRF-M confirmed that the VLPs remained attached to the substrate during the experiment.

      (3) Fig 1A - The classification of particle tracks into mobile and immobile is an overly simplistic description that goes back to bulk FRAP measurements and is not really applicable to single molecule tracking data, where it's rare to see anything that is immobile and alive. An alternative classification strategy uses sub-diffusion, normal diffusion and active diffusion (or active transport) as descriptions, and particles can transition between these classes over the tracking period. Fig 1B - this data might be better displayed as histograms showing distributions within the different movement classes.

      In agreement with the reviewer’s commentary, the majority of the particles detected in our TIRF-M experiments were indeed mobile. However, we also detected a variable, and biologically appreciable, percentage of immobile particles depending on the experimental condition analyzed (Figure 1A in the main manuscript). To establish a stringent threshold for identifying these immobile particles under our specific experimental conditions, we used purified monomeric AcGFP proteins immobilized on glass coverslips. Our analysis demonstrated that 95% of these immobilized proteins showed a diffusion coefficient ≤0.0015 µm<sup>2</sup>/s; consequently, this value was established as the cutoff to distinguish immobile from mobile trajectories. While the observation of truly immobile entities in a dynamic, living system is rare, the presence of these particles under our conditions is biologically significant. For instance, the detection of large, immobile receptor nanoclusters at the plasma membrane is entirely consistent with facilitating key cellular processes, such as enabling the robust signaling cascade triggered by ligand binding or promoting the crucial events required for efficient viral entry into the cells.
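
The thresholding logic in this paragraph can be illustrated with a short sketch. It assumes the cutoff is the 95th percentile of the diffusion coefficients measured for immobilized AcGFP, computed here by linear interpolation; the function names are hypothetical, and the actual software used by the authors is not specified in this response.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q between 0 and 100) of a non-empty sequence."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def immobile_cutoff(immobilized_d_values, q=95.0):
    """Cutoff below which q% of the immobilized-protein diffusion coefficients fall."""
    return percentile(immobilized_d_values, q)


def classify(d_values, cutoff):
    """Label each trajectory 'immobile' (D <= cutoff) or 'mobile' (D > cutoff)."""
    return ["immobile" if d <= cutoff else "mobile" for d in d_values]
```

With the cutoff reported above, `classify(d_values, 0.0015)` would separate trajectories measured in µm²/s into the two classes.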

      Regarding the mobile receptors (defined as those with D<sub>1-4</sub> values exceeding 0.0015 µm<sup>2</sup>/s), we observed distinct diffusion profiles derived from mean square displacement (MSD) plots (Figure V) (Manzo & García-Parajo Rep. Prog. Phys., 2015), which were further classified based on motion, using the moment scaling spectrum (MSS) (Ewers et al. PNAS, 2005). Under all experimental conditions, the majority of mobile particles, ~85%, showed confined diffusion: for example under basal conditions, without ligand addition, ~90% of mobile particles showed confined diffusion, ~8.5% showed Brownian-free diffusion and ~1.5% exhibited directed motion (new Supplementary Figure 5A in the main manuscript). These data have been also included in the revised manuscript to show, in detail, the dynamic parameters of CXCR4.
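
As a rough sketch of how a short-lag diffusion coefficient can be derived from an MSD curve, the code below computes a time-averaged MSD for a two-dimensional track and fits a line to the first four lags. It assumes that D<sub>1-4</sub> denotes the slope of that fit divided by 4 (from MSD = 4Dt + offset in two dimensions), which is the common convention but is not spelled out in this response. The function names are hypothetical, and the MSS-based classification into confined, Brownian and directed motion is not reproduced here.

```python
def msd(track, max_lag):
    """Time-averaged mean square displacement for lags 1..max_lag.

    track: list of (x, y) positions in µm, sampled at a constant frame interval.
    """
    if len(track) <= max_lag:
        raise ValueError("track must contain more points than max_lag")
    values = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        values.append(sum(squared) / len(squared))
    return values


def short_lag_diffusion(track, dt, n_lags=4):
    """Least-squares slope of MSD versus time over the first n_lags, divided by 4."""
    values = msd(track, n_lags)
    times = [dt * k for k in range(1, n_lags + 1)]
    t_mean = sum(times) / n_lags
    m_mean = sum(values) / n_lags
    covariance = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, values))
    spread = sum((t - t_mean) ** 2 for t in times)
    return covariance / spread / 4.0
```

A trajectory would then be treated as mobile when the returned value exceeds the 0.0015 µm²/s cutoff.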

      Due to the space constraints, it is very difficult to include all the figures generated. However, to ensure comprehensive assessment and transparency (for the purpose of this review), we have included below representative plots of the MSD values as a function of time from individual trajectories, showing different types of motion obtained in our experiments (Author response image 4).

      Author response image 4.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP detected by SPT-TIRF in resting JKCD4 cells showing different types of motion: A) confined, B) Brownian/free, C) directed transport.

      (4) Fig 1C,D - It would be helpful to see a plot of D vs MSI at a single particle level. In comparing C and D I'm surprised there is not a larger difference between CXCL12 and X4-gp120. It would also be very important to see the behaviour of X4-gp120 on the CXCR4 deficient Jurkat that would provide a picture of CD4 diffusion. The CXCR4 nanoclustering related to the X4-gp120 could be dominated by CD4 behaviour.

      As previously described, all analyses were performed under SPT conditions (see previous response to point 1). Figure 1C details the percentage of oligomers (>3 receptors/particle) calibrated using Jurkat CD4<sup>+</sup> cells electroporated with monomeric CD86-AcGFP (Dorsch et al. Nat. Methods, 2009). The monomer value was determined by analyzing photobleaching steps as described in our previous response to point 1.

      In our experiments, we observed a trend towards a higher number of oligomers upon activation with CXCL12 compared with X4-gp120. This trend was further supported by measurements of Mean Spot Intensity. However, the values are also influenced by the number of larger spots, which represents a minor fraction of the total spots detected.

      The differences between the effect triggered by CXCL12 or X4-gp120 might also be attributed to a combination of factors related to differences in ligand concentration, their structure, and even to the technical requirements of TIRF-M. Both ligands are in contact with the substrate (fibronectin) and the specific nature of this interaction may differ between both ligands and influence their accessibility to CXCR4. Moreover, the requirement of the prior binding of gp120 to CD4 before CXCR4 engagement, in contrast to the direct binding of CXCL12 to CXCR4, might also contribute to the differences observed.

      We previously reported that CXCL12-mediated CXCR4 dynamics are modulated by CD4 coexpression (Martinez-Muñoz et al. Mol. Cell, 2018). We have now detected the formation of CD4 heterodimers with both CXCR4 and CXCR4<sup>R334X</sup>, and found that these conformations are influenced by gp120-VLPs. In the present manuscript, we did not focus on CD4 clustering as it has been extensively characterized previously (Barrero-Villar et al. J. Cell Sci., 2009; Jiménez-Baranda et al. Nat. Cell. Biol., 2007; Yuan et al. Viruses, 2021). Regarding the investigation of the effects of X4-gp120 on CXCR4-deficient Jurkat cells, which would provide a picture of CD4 diffusion, we would note that a previous report has already addressed this issue using single molecule super-resolution imaging, and revealed that CD4 molecules on the cell membrane are predominantly found as individual molecules or small clusters of up to 4 molecules, and that the size and number of these clusters increases upon virus binding or gp120 activation (Yuan et al. Viruses, 2021).

      (5) Fig S1D - This data is really interesting. However, if both the CD4 and the gp120 have His tags they need to be careful, as poly-His tags can bind weakly to cells and increasing valency could generate some background. So, they should make sure the control is fair here. Ideally, a non-His-tagged version of sCD4 and gp120 would be needed, or they need a His-tagged Fab binding to gp120 that doesn't induce CXCR4 binding.

      New Supplementary Figure 2D shows that X4-gp120 does not bind Daudi cells (these cells do not express CD4) in the absence of soluble CD4. While the reviewer is correct to state that both proteins contain a Histidine Tag, cell binding is only detected if X4-gp120 binds sCD4. Nonetheless, we have included in the revised Supplementary Figure 2D a control showing the negative binding of sCD4 to Daudi cells in the absence of X4-gp120. Altogether, these results confirm that only sCD4/X4-gp120 complexes bind these cells.

      (6) Fig S4- Panel D needs a scale bar. I can't figure out what I'm being shown without this.

      Apologies. A scale bar has been included in this panel (new Supplementary Figure 6D).

      Reviewer #2:

      (1) This study is well described in both the main text and figures. Introduction provides adequate background and cites the literature appropriately. Materials and Methods are detailed. Authors are careful in their interpretations, statistical comparisons, and include necessary controls in each experiment. The Discussion presents a reasonable interpretation of the results. Overall, there are no major weaknesses with this manuscript.

      We very much appreciate the positive comments of the reviewer regarding the broad interest and strength of our work.

      (2) NL4-3deltaIN and immature HIV virions are found to have less associated gp120 relative to wild-type particles. It is not obvious why this is the case for the deltaIN particles or genetically immature particles. Can the authors provide possible explanations? (A prior paper was cited, Chojnacki et al Science, 2012 but can the current authors provide their own interpretation.)

      Our conclusion from the data is actually exactly the opposite. As shown in Figure 2D, the gp120 staining intensity was higher for NL4-3ΔIN particles (1,786 a.u.) than for gp120-VLPs (1,223 a.u.), indicating lower expression of Env proteins in the latter. Furthermore, analysis of gp120 intensity per particle (Figure 2E) confirmed that gp120-VLPs contained fewer gp120 molecules per particle than NL4-3ΔIN virions. These levels were comparable with, or even lower than, those observed in primary HIV-1 viruses (Zhu et al. Nature, 2006). This reduction was a direct consequence of the method used to generate the VLPs, as our goal was to produce viral particles with minimal gp120 content to prevent artifacts in receptor clustering that might occur using high levels of Env proteins in the VLPs to activate the receptors.

      This misunderstanding may arise from the fact that we also compared Gag condensation and Env distribution on the surface of gp120-VLPs with those observed in genetically immature particles and integrase-defective NL4-3ΔIN virions, which served as controls. STED microscopy data revealed differences in Env distribution between gp120-VLPs and NL4-3ΔIN virions, supporting the classification of gp120-VLPs as mature particles (Figure 2 A,B).

      Reviewer #3:

      We thank the reviewer for considering that our work offers new insights into the spatial organization of receptors during HIV-1 entry and infection and that the manuscript is well written, and the findings significant.

      (1) For mechanistic basis of gp120-CXCR4 versus CXCL12-CXCR4 differences. Provide additional structural or biochemical evidence to support the claim that gp120 stabilises a distinct CXCR4 conformation compared to CXCL12. If feasible, include molecular modelling, mutagenesis, or crosslinking experiments to corroborate the proposed conformational differences.

      We appreciate the opportunity to clarify this point. The specific claim that gp120 stabilizes a conformation of CXCR4 that is distinct from the CXCL12-bound state was not explicitly stated in our manuscript, although we agree that our data strongly support this possibility. It is important to consider that CXCL12 binds directly to CXCR4, whereas gp120 requires prior sequential binding to CD4, and its subsequent interaction is with a CXCR4 molecule that is already forming part of the CD4/CXCR4 complex, as demonstrated by our FRET experiments and supported by previous studies (Zaitseva et al. J. Leuk. Biol., 2005; Busillo & Benovic Biochim. Biophys. Acta, 2007; Martínez-Muñoz et al. PNAS, 2014). This difference makes it inherently complex to compare the conformational changes induced by gp120 and CXCL12 on CXCR4.

      However, our findings show that both stimuli induce oligomerization of CXCR4, a phenomenon not observed when mutant CXCR4<sup>R334X</sup> was exposed to the chemokine CXCL12 (García-Cuesta et al. PNAS, 2022).

      (1) CXCL12 induced oligomerization of CXCR4 but did not affect the dynamics of CXCR4<sup>R334X</sup> (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022). By contrast, X4-gp120 and the corresponding VLPs—which require initial binding to CD4 to engage the chemokine receptor—stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>.

      (2) FRET analysis revealed distinct FRET<sub>50</sub> values for CD4/CXCR4 (2.713) and CD4/CXCR4<sup>R334X</sup> (0.399) complexes, suggesting different conformations for each complex.

      (3) Consistent with previous reports (Balabanian et al. Blood, 2005; Zmajkovicova et al. Front. Immunol., 2024; García-Cuesta et al. PNAS, 2022), the molecular mechanisms activated by CXCL12 are distinct when comparing CXCR4 with CXCR4<sup>R334X</sup>. For instance, CXCL12 induces internalization of CXCR4, but not of mutant CXCR4<sup>R334X</sup>. Conversely, X4-gp120 triggers approximately 25% internalization of both receptors. Similarly, CXCL12 does not promote CD4 internalization in cells co-expressing CXCR4 or CXCR4<sup>R334X</sup>, whereas X4-gp120 does, although CD4 internalization was significantly higher in cells co-expressing CXCR4.

      These findings suggest that CD4 influences the conformation and the oligomerization state of both co-receptors. To further support this hypothesis, we have conducted new in silico molecular modeling of CD4 in complex with either CXCR4 or its mutant CXCR4<sup>R334X</sup> using AlphaFold 3.0 (Abramson et al. Nature, 2024). The server was provided with both sequences, and the interaction between the two molecules for each protein was requested. It produced a number of solutions, which were then analyzed using the software ChimeraX 1.10 (Meng et al. Protein Sci., 2023). CXCR4 and its mutant, CXCR4<sup>R334X</sup> bound to CD4, were superposed using one of the CD4 molecules from each complex, with the aim of comparing the spatial positioning of CD4 molecules when interacting with CXCR4.

      Author response image 5.

      CD4/CXCR4 complexes were superimposed with CD4/CXCR4 complexes (left panel) or CD4/CXCR4<sup>R334X</sup> complexes (right panels). Arrows indicate the CD4 molecule used as reference for the superimposing.

      As illustrated in Author response image 5, the superposition of the CD4/CXCR4 complexes was complete. However, when CD4/CXCR4 complexes were superimposed with CD4/CXCR4<sup>R334X</sup> complexes using the same CD4 molecule as a reference, indicated by an arrow in the figure, a clear structural deviation became evident. The main structural difference detected was the positioning of the CD4 transmembrane domains when interacting with either the wild-type or mutant CXCR4. While in complexes with CXCR4, the angle formed by the lines connecting residues E416 at the C-terminus end of CD4 with N196 in CXCR4 was 12°, for the CXCR4<sup>R334X</sup> complex, this angle increased to 24°, resulting in a distinct orientation of the CD4 extracellular domain (Author response image 6).

      Author response image 6.

      Comparison of the angle between the transmembrane domains of CD4 in CXCR4 WT and WHIM complexes. The angle between residues N196 from one CXCR4 molecule and E416 from the two CD4 dimer molecules was calculated for the CXCR4 WT (12°) and WHIM (24°) complexes to demonstrate the difference in CD4 positioning.
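
The geometric measurement described in this legend can be sketched as the angle at one vertex between two lines in three-dimensional space. The sketch assumes that the vertex is the N196 position in CXCR4 and that the two other points are the E416 positions of the two CD4 molecules; the function name is hypothetical, and real coordinates would have to be read from the AlphaFold models, which are not reproduced here.

```python
import math


def angle_deg(vertex, point_a, point_b):
    """Angle in degrees at `vertex` between the lines to `point_a` and `point_b`."""
    vec_a = [a - v for a, v in zip(point_a, vertex)]
    vec_b = [b - v for b, v in zip(point_b, vertex)]
    norm_a = math.sqrt(sum(c * c for c in vec_a))
    norm_b = math.sqrt(sum(c * c for c in vec_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("points must differ from the vertex")
    cos_theta = sum(a * b for a, b in zip(vec_a, vec_b)) / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

Applied to the relevant atom coordinates of each model, a function of this kind would yield the values compared above (12° for the wild-type complex and 24° for the mutant complex), provided the same reference atoms are used in both.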

      To further analyze the models obtained, we employed PDBsum software (Laskowski & Thornton Protein Sci., 2021) to predict the CD4/CXCR4 interface residues. Data indicated that at least 50% of the interaction residues differed when the CD4/CXCR4 interaction surface was compared with that of the CD4/CXCR4<sup>R334X</sup> complex (Author response image 7). It is important to note that while some hydrogen bonds were present in both complex models, others were exclusive to one of them. For instance, whereas Cys<sup>394</sup>(CD4)-Tyr<sup>139</sup> and Lys<sup>299</sup>(CD4)-Glu<sup>272</sup> were present in both CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes, the pairs Asn<sup>337</sup>(CD4)-Ser<sup>27</sup>(CXCR4<sup>R334X</sup>) and Lys<sup>325</sup>(CD4)-Asp<sup>26</sup>(CXCR4<sup>R334X</sup>) were only found in CD4/CXCR4<sup>R334X</sup> complexes.

      Author response image 7.

      Interacting residues at the CD4/CXCR4 interface. The panel displays the interface residues from the CXCR4 and CD4 oligomer. CD4 residues labeled with a red sphere show the interacting residues present in both CXCR4-WT and –WHIM hetero-oligomers. The continuous red lines represent a salt bridge, while the blue lines indicate a hydrogen bond and the dashed red lines represent non-bonded interactions. As illustrated in the figure, half of the interacting residues differ between the WT and WHIM models, indicating that the interacting surfaces are also distinct.

      These findings, which are consistent with our FRET results, suggest distinct interaction surfaces between CD4 and the two chemokine receptors. Overall, these results are compatible with differences in the spatial conformation adopted by these complexes.

      (2) For Empty VLP effects on CXCR4 dynamics: Explore potential causes for the observed effects of Env-deficient VLPs. It's valuable to include additional controls such as particles from non-producer cells, lipid composition analysis, or blocking experiments to assess nonspecific interactions.

      As VLPs are complex entities, we thought that the relevant results should be obtained by comparing the effects of Env(-) VLPs with gp120-VLPs. Therefore, we would first remark that regardless of the effect of Env(-) VLPs on CXCR4 dynamics, the most evident finding in this study is the strong effect of gp120-VLPs compared with control Env(-) VLPs. Nevertheless, regarding the effect of the Env(-) VLPs compared with medium, we propose several hypotheses. As several virions can be tethered to the cell surface via glycosaminoglycans (GAGs), we hypothesized that VLP-GAG interactions might indirectly influence the dynamics of CXCR4 and CXCR4<sup>R334X</sup> at the plasma membrane. Additionally, membrane fluidity is essential for receptor dynamics; therefore, VLP interactions with proteins, lipids or any other component of the cell membrane could also alter receptor behavior. It is well known that lipid rafts participate in the interaction of different viruses with target cells (Nayak & Hu Subcell. Biochem., 2004; Manes et al. Nat. Rev. Immunol., 2003; Riethmüller et al. Biochim. Biophys. Acta, 2006) and both the lipid composition and the presence of co-expressed proteins modulate ligand-mediated receptor oligomerization (Gardeta et al. Front. Immunol., 2022; Gardeta et al. Cell. Commun. Signal., 2025). We have thus performed Raster Image Correlation Spectroscopy (RICS) analysis to assess membrane fluidity through membrane diffusion measurements on cells treated with Env(-) VLPs.

      Jurkat cells were labeled with Di-4-ANEPPDHQ and seeded on FN and on FN + VLPs prior to analysis by RICS on a confocal microscope. The results indicated no significant differences in membrane diffusion under the treatment tested, thereby ruling out an effect of VLPs on overall membrane fluidity (Author response image 8).

      Author response image 8.

      VLPs treatment does not alter cell membrane fluidity. Diffusion values obtained by RICS from JKCD4X4 cells. (n = 3, with at least 10 cells analyzed per experiment and condition; n.s., not significant).

      Nonetheless, these results do not rule out other non-specific interactions of Env(-) VLPs with membrane proteins that could affect receptor dynamics. For instance, it has been reported that the C-type lectin DC-SIGN acts as an efficient docking site for HIV-1 (Cambi et al. J. Cell. Biol., 2004; Wu & KewalRamani Nat. Rev. Immunol., 2006). However, a detailed investigation of these possible mechanisms is beyond the scope of this manuscript.

      (3) For Direct link between clustering and infection efficiency - Test whether disruption of CXCR4 clustering (e.g., using actin cytoskeleton inhibitors, membrane lipid perturbants, or clustering-deficient mutants) alters HIV-1 fusion or infection efficiency.

      Designing experiments using tools that disrupt receptor clustering by interacting with the receptors themselves is challenging, as these tools bind the receptor and can therefore alter parameters such as its conformation and/or its distribution at the cell membrane, as well as affect some cellular processes such as HIV-1 attachment and cell entry. Moreover, effects on actin polymerization or lipid dynamics can affect not only receptor clustering but also other molecular mechanisms essential for efficient infection.

      Many previous reports have, nonetheless, indirectly correlated receptor clustering with cell infection efficiency. Cholesterol plays a key role in the entry of several viruses. Its depletion in primary cells and cell lines has been shown to confer strong resistance to HIV-1-mediated syncytium formation and infection by both CXCR4- and CCR5-tropic viruses (Liao et al. AIDS Res. Hum. Retroviruses, 2021). Moderate cholesterol depletion also reduces CXCL12-induced CXCR4 oligomerization and alters receptor dynamics (Gardeta et al. Cell. Commun. Signal., 2025). By restricting the lateral diffusion of CD4, sphingomyelinase treatment inhibits HIV-1 fusion (Finnegan et al. J. Virol., 2007). Depletion of sphingomyelins also disrupts CXCL12-mediated CXCR4 oligomerization and its lateral diffusion (Gardeta et al. Front. Immunol., 2022). Additional reports highlight the role of actin polymerization at the viral entry site, which facilitates clustering of HIV-1 receptors, a crucial step for membrane fusion (Serrano et al. Biol. Cell., 2023). Blockade of actin dynamics by Latrunculin A treatment, a drug that sequesters actin monomers and prevents its polymerization, blocks CXCL12-induced CXCR4 dynamics and oligomerization (Martínez-Muñoz et al. Mol. Cell, 2018).

      Altogether, these findings strongly support our hypothesis of a direct link between CXCR4 clustering and the efficiency of HIV-1 infection.

      (4) CD4/CXCR4 co-endocytosis hypothesis - Support the proposed model with direct evidence from live-cell imaging or co-localization experiments during viral entry. Clarification is needed on whether internalization is simultaneous or sequential for CD4 and CXCR4.

      When referring to endocytosis of CD4 and CXCR4, we only hypothesized that HIV-1 might promote the internalization of both receptors either sequentially or simultaneously. The hypothesis was based on several findings:

      a) Previous studies have suggested that HIV-1 glycoproteins can reduce CD4 and CXCR4 levels during HIV-1 entry (Choi et al. Virol. J., 2008; Geleziunas et al. FASEB J, 1994; Hubert et al. Eur. J. Immunol., 1995).

      b) Receptor endocytosis has been proposed as a mechanism for HIV-1 entry (Daecke et al. J. Virol., 2005; Aggarwal et al. Traffic, 2017; Miyauchi et al. Cell, 2009; Carter et al. Virology, 2011).

      c) Our data from cells activated with X4-gp120 demonstrated internalization of CD4 and chemokine receptors, which correlated with HIV-1 infection in PBMCs from WHIM patients and healthy donors.

      d) CD4 and CXCR4 have been shown to co-localize in lipid rafts during HIV-1 infection (Manes et al. EMBO Rep., 2000; Popik et al. J. Virol., 2002).

      e) Our FRET data demonstrated that CD4 and CXCR4 form heterocomplexes and that FRET efficiency increased after gp120-VLPs treatment.

      We agree with the reviewer that further experiments are required to test this hypothesis; however, we believe that this is beyond the scope of the current manuscript.

      Minor Comments:

      (1) The conclusions rely solely on the HXB2 X4-tropic Env. It would strengthen the study to assess whether other X4 or dual-tropic strains induce similar receptor clustering and dynamics.

      The primary goal of our current study was to investigate the dynamics of the co-receptor CXCR4 during HIV-1 infection, motivated by previous reports showing CD4 oligomerization upon HIV-1 binding and gp120 stimulation (Yuan et al. Viruses, 2021). We initially used a recombinant X4-gp120, a soluble protein that does not fully replicate the functional properties of the native HIV-1 Env. Previous studies have shown that Env consists of gp120 trimers, which redistribute and cluster on the surface of virions following proteolytic Gag cleavage during maturation (Chojnacki et al. Nat. Commun., 2017). An important consideration in receptor oligomerization studies is the concentration of recombinant gp120 used, as it does not accurately reflect the low number of Env trimers present on native HIV-1 particles (Hart et al. J. Histochem. Cytochem., 1993; Zhu et al. Nature, 2006). To address these limitations, we generated virus-like particles (VLPs) containing low levels of X4-gp120 and repeated the dynamic analysis of CXCR4. The use of primary HIV-1 isolates was limited, in this project, to confirming that PBMCs from both healthy donors and WHIM patients were equally susceptible to infection. This result using a primary HIV-1 virus supports the conclusion drawn from our in vitro approaches. We thus believe that although the use of other X4- and dual-tropic strains may complement and reinforce the analysis, it is far beyond the scope of the current manuscript.

      (2) Given the observed clustering effects, it would be valuable to explore whether gp120-induced rearrangements alter epitope exposure to broadly neutralizing antibodies like 17b or 3BNC117. This would help connect the mechanistic insights to therapeutic relevance.

      As 3BNC117, VRC01 and b12 are broadly neutralizing mAbs that recognize conformational epitopes on gp120 (Li et al. J. Virol., 2011; Mata-Fink et al. J. Mol. Biol., 2013), they will struggle to bind the gp120/CD4/CXCR4 complex and therefore may not be ideal for detecting changes within the CD4/CXCR4 complex. The experiment suggested by the reviewer is thus both challenging and complex. It would require evaluating antibody binding in two experimental conditions, in the absence and in the presence of oligomers. However, our data indicate that receptor oligomerization is promoted by X4-gp120 binding, and the selected antibodies are neutralizing mAbs, so they should block or hinder the binding of gp120 and, consequently, receptor oligomerization. An alternative approach would be to study the neutralizing capacity of these mAbs on cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> complexes. Variations in their neutralizing activity could then be extrapolated to distinct gp120 conformations, which in turn may reflect differences between CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes.

      We thus assessed the ability of the anti-gp120 mAbs VRC01 and b12, which were available in our laboratory, to neutralize gp120 binding on cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup>. Specifically, increasing concentrations of each antibody were preincubated (60 min, 37 °C) with a fixed amount of X4-gp120 (0.05 µg/ml). The resulting complexes were then incubated with Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> (30 min, 37 °C) and, finally, their binding was analyzed by flow cytometry. Although we did not observe statistically significant differences in the neutralization capacity of b12 or VRC01 for the binding of X4-gp120 depending on the presence of CXCR4 or CXCR4<sup>R334X</sup>, we observed a trend for greater concentrations of both mAbs to neutralize X4-gp120 binding in Jurkat CD4/CXCR4 cells than in Jurkat CD4/CXCR4<sup>R334X</sup> cells (Author response image 9).

      Author response image 9.

      Flow cytometry analysis of gp120 binding to Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> in the presence of different concentrations of the neutralizing anti-gp120 antibodies b12 (left panel) and VRC01 (right panel). AUC comparison by Welch’s t-test: p-values 0.2950 and 0.2112 for b12 and VRC01, respectively (n = 2).
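
The comparison named in this legend can be sketched in two steps: a trapezoidal area under each binding curve, followed by Welch's unequal-variance t statistic on the AUC values of the two cell lines. The function names are hypothetical and the authors' software is not stated. The sketch returns only the statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p-value needs a t distribution, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def auc_trapezoid(x, y):
    """Area under the curve y(x) by the trapezoidal rule (x sorted ascending)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of each mean
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof
```

With only two replicates per group, as here, the degrees of freedom are small and the test has limited power, which is consistent with the non-significant p-values reported.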

      These slight alterations in the neutralizing capacity of b12 and VRC01 mAbs may thus suggest minimal differences in the conformations of gp120 depending on the coreceptor used. We also detected that X4-gp120 and VLPs expressing gp120, which require initial binding to CD4 to engage the chemokine receptor, stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>, but FRET data indicated distinct FRET<sub>50</sub> values between the partners: 2.713 for CD4/CXCR4 and 0.399 for CD4/CXCR4<sup>R334X</sup> (Figure 5A,B in the main manuscript). Moreover, we also detected significantly more CD4 internalization mediated by X4-gp120 in cells co-expressing CD4 and CXCR4 than in those co-expressing CD4 and CXCR4<sup>R334X</sup> (Figure 6 in the main manuscript). Overall, these latter data and those included in Author response images 5, 6 and 7 indicate distinct conformations within each receptor complex.

      (3) TIRF imaging limits analysis to the cell substrate interface. It would be useful to clarify whether CXCR4 receptor clustering occurs elsewhere, such as at immunological synapses or during cell-to-cell contact.

      In recent years, chemokine receptor oligomerization has gained significant research interest due to its role in modulating the ability of cells to sense chemoattractant gradients. This molecular organization is now recognized as a critical factor in governing directed cell migration (Martínez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022, Hauser et al. Immunity, 2016). In addition, advanced imaging techniques such as single-molecule and super-resolution microscopy have been used to investigate the spatial distribution and dynamic behaviour of CXCR4 within the immunological synapse in T cells (Felce et al. Front. Cell Dev. Biol., 2020). Building on these findings, we are currently conducting a project focused on characterizing CXCR4 clustering specifically within this specialized cellular region.

      (4) In LVP experiments, it would be useful to report transduction efficiency (% GFP+ cells) alongside MSI data to relate VLP infectivity with receptor clustering functionally.

      These experiments were designed to validate the functional integrity of the gp120 conformation on the LVPs, confirming their suitability for subsequent TIRF microscopy. Our objective was to establish a robust experimental tool rather than to perform a high-throughput quantification of transduction efficiency. It is for that reason that these experiments were included in new Supplementary Figure S6, which also contains the complete characterization of gp120-VLPs and LVPs. In such experimental conditions, quantifying the percentage of GFP-positive cells relative to the total number of cells plated in each well is very difficult. However, in line with the reviewer’s commentary and as we used the same number of cells in each experimental condition, we have included, in the revised manuscript, a complementary graph illustrating the GFP intensity (arbitrary units) detected in all the wells analyzed (new Supplementary Fig. 6E).

      (5) To ensure that differences in fusion events (Figure 7B) are attributable to target cell receptor properties, consider confirming that effector cells express similar levels of HIV-1 Env. Quantifying gp120 expression by flow cytometry or western blot would rule out the confounding effects of variable Env surface density.

      In these assays (Figure 7B), we used the same effector cells (cells expressing X4-gp120) in both experimental conditions, ensuring that any observed differences should be attributable solely to the target cells, either JKCD4X4 or JKCD4X4<sup>R334X</sup>. For this reason, in Figure 7A we included only the binding of X4-gp120 to the target cells which demonstrated similar levels of the receptors expressed by the cells.

      (6) HIV-mediated receptor downregulation may occur more slowly than ligand-induced internalization. Including a 24-hour time point would help assess whether gp120 induces delayed CD4 or CXCR4 loss beyond the early effects shown and to better capture potential delayed downregulation induced by gp120.

      The reviewer suggests using a 24-hour time point to facilitate detection of receptor internalization. However, such an extended incubation time may introduce some confounding factors, including receptor degradation, recycling and even de novo synthesis, which could affect the interpretation of the results. Under our experimental conditions, we observed that CXCL12 did not trigger CD4 internalization whereas X4-gp120 did. Interestingly, CD4 internalization depended on the coreceptor expressed by the cells.

      (7) Increase label font size in microscopy panels for improved readability.

      Of course; the font size of these panels has been increased in the revised version.

      (8) Consider adding more references on ligand-induced co-endocytosis of CD4 and chemokine receptors during HIV-1 entry.

      We have added more references to support this hypothesis (Toyoda et al. J. Virol., 2015; Venzke et al. J. Virol., 2006; Gobeil et al. J. Virol., 2013).

      (9) For Statistical analysis. Biological replicates are adequate, and statistical tests are generally appropriate. For transparency, report n values, exact p-values, and the statistical test used in every figure legend and discussed in the results.

      Thank you for highlighting the importance of transparency in statistical reporting. We confirm that the n values for all experiments have been included in the figure legends. The statistical tests used for each analysis are also clearly indicated in the figure legends, and the interpretation of these results is discussed in detail in the Results section. Furthermore, the Methods section specifies the tests applied and the thresholds for significance, ensuring full transparency regarding our analytical approach.

      In accordance with established conventions in the field, we have utilized categorical significance indicators (e.g., n.s., *, **, ***) within our figures to enhance readability and focus on biological trends. This approach is widely adopted in high-impact literature to prevent visual clutter. However, to ensure full transparency and reproducibility, we have ensured that the underlying statistical tests and thresholds are clearly defined in the respective figure legends and Methods section.

      Reviewer #4:

      We thank the reviewer for considering that this work is presented in a clear fashion, and the main findings are properly highlighted, and for remarking that the paper is of interest to the retrovirology community and possibly to the broader virology community.

      We also agree that the clustering of CXCR4<sup>R334X</sup> by X4-gp120 is of interest, as it suggests that X4-gp120 binds through a mechanism different from that of the natural ligand CXCL12, an aspect that we are now evaluating. These data also indicate that WHIM patients can be infected by HIV-1 similarly to healthy people.

      (1) The observation that "empty VLPs" reduce CXCR4 diffusivity is potentially interesting. However, it is not supported by the data owing to insufficient controls. The authors correctly discuss the limitations of that observation in the Discussion section (lines 702-704). However, they overinterpret the observation in the Results section (lines 509-512), suggesting non-specific interactions between empty VLPs, CD4 and CXCR4. I suggest either removing the sentence from the Results section or replacing it with a sentence similar to the one in the Discussion section.

      In accordance with the reviewer's suggestion, the sentence in the Results section has been replaced with one similar to that found in the Discussion section. In addition, we have performed Raster Image Correlation Spectroscopy (RICS) analysis using the Di-4-ANEPPDHQ lipid probe to assess membrane fluidity by means of membrane diffusion, and compared the results with those of cells treated with Env(-) VLPs. The results indicated that VLPs did not modulate membrane fluidity (Author response image 8). Nonetheless, these results do not rule out other potential non-specific interactions of the Env(-) VLPs with other components of the cell membrane that might affect receptor dynamics (see our response to point 2 of reviewer #3).

      (2) In the case of the WHIM mutant CXCR4-R334X, the addition of "empty VLPs" did not cause a significant change in the diffusivity of CXCR4-R334X (Figure 4B). This result is in contrast with the addition of empty VLPs to WT CXCR4. However, the authors neither mention nor comment on that result in the results section. Please mention the result in the paper and comment on it in relation to the addition of empty VLPs to WT CXCR4.

      We would remark that the main observation in these experiments concerns the effect of gp120-VLPs: the results indicate that gp120-VLPs promoted clustering of CXCR4 and of CXCR4<sup>R334X</sup> and reduced their diffusion at the cell membrane. The Env(-) VLPs were included as a negative control in the experiments, to compare the data with those obtained using gp120-VLPs. However, once we observed some residual effect of the Env(-) VLPs, we decided to give a potential explanation, formulated as a hypothesis, that the Env(-) VLPs modulated membrane fluidity. We have now performed a RICS analysis using Di-4-ANEPPDHQ as a lipid probe (Author response image 9). The results suggest that Env(-) VLPs do not modulate cell membrane fluidity, although we do not rule out other potential interactions with membrane proteins that might alter receptor dynamics. We appreciate the reviewer’s observation and agree that this result can be noted. However, since the main purpose of Figure 4B is to show that gp120-VLPs modulate the dynamics of CXCR4<sup>R334X</sup> rather than to remark that the Env(-) VLPs also have some effects, we consider that a detailed discussion of this specific aspect would detract from the central finding and may dilute the primary narrative of the study.

      Minor comments

      (1) It would be helpful for the reader to combine thematically or experimentally linked figures, e.g., Figures 3 and 4.

      (2) Figures 3 and 4 are very similar. Please unify the colours in them and the order of the panels (e.g. Figure 3 panel A shows diffusivity of CXCR4, while Figure 4 panel A shows MSI of CXCR4-R334X).

      While we considered consolidating Figures 3 and 4, we believe that maintaining them as separate entities enhances conceptual clarity. Since Figure 3 establishes the baseline dynamics for wildtype CXCR4 and Figure 4 details the distinct behavior of the CXCR4<sup>R334X</sup> mutant, keeping them separate allows the reader to fully appreciate the specificities of each system before making a cross-comparison.

      (3) Some parts of the Discussion section could be shortened, moved to the Introduction (e.g., lines 648-651), or entirely removed (e.g., lines 633-635 about GPCRs).

      In accordance, the Discussion section has been reorganized and shortened to improve clarity.

      (4) I suggest renaming "empty VLPs" to "Env(−) VLPs" (or similar). The name empty VLPs can mislead the reader into thinking that these are empty vesicles.

      The term empty VLPs has been renamed to Env(−) VLPs throughout the manuscript to more accurately reflect their composition. Many thanks for this suggestion.

      (5) Line 492 - please rephrase "...lower expression of Env..." to "...lower expression of Env or its incorporation into the VLPs...".

      The sentence has been rephrased.

      (6) Line 527 - The data on CXCL12 modulating CXCR4-R334X dynamics and clustering are not present in Figure 4 (or any other Figure). Please add them or rephrase the sentence with an appropriate reference. Make clear which results are yours.

      (7) Line 532 - Do the data in the paper really support a model in which CXCL12 binds to CXCR4-R334X? If not, please rephrase with an appropriate reference.

      Previous studies support the association of CXCL12 with CXCR4<sup>R334X</sup> (Balabanian et al. Blood, 2005; Hernandez et al. Nat Genet., 2003; Busillo & Benovic Biochim. Biophys. Acta, 2007). In fact, this receptor has been characterized as a gain-of-function variant for this ligand (McDermott et al. J. Cell. Mol. Med., 2011). The revised manuscript now includes these bibliographic references to support this commentary. In any case, our previous data indicate that CXCL12 binding does not affect CXCR4<sup>R334X</sup> dynamics (García-Cuesta et al. PNAS, 2022).

      (8) Line 695 - "...lipid rafts during HIV-1 (missing word?) and their ability to..." During what?

      Many thanks for catching this mistake. The sentence now reads: “Although direct evidence for the internalization of CD4 and CXCR4 as complexes is lacking, their co-localization in lipid rafts during HIV-1 infection (97–99) and their ability to form heterocomplexes (22) strongly suggest they could be endocytosed together.”

    1. Author Response:

      We sincerely thank the reviewers for their insightful and constructive suggestions on our manuscript. We are encouraged by the positive recognition of our study’s conceptual significance, particularly the involvement of the mushroom body (MB) in nociceptive escape behavior and the utility of our ALTOMS behavioral platform.

      We fully agree with the reviewers’ assessments and have initiated several key revisions, additional experiments, and analytical refinements to strengthen the study.

      Below is a summary of our planned improvements:

      1. Experimental Revisions and Scope Expansion

      To address concerns regarding potential developmental compensation (Reviewers 1 and 2), we are performing new experiments using temporally precise manipulation tools to confirm the acute necessity of the identified circuits. Additionally, responding to Reviewer 3, we are conducting further behavioral assays to include necessary genetic controls (e.g., split-GAL4-only lines) and expanding our screen to cover all major MBON and DAN compartments using standardized lines to ensure a comprehensive functional map.

      2. Analytical Refinements and Methodological Transparency

      We are revising our quantitative and anatomical reporting to address several technical suggestions from all three reviewers. Specifically, we will implement a weighted “Behavioral Potency Level” that accounts for driver-specific expression intensity and specificity. Anatomical clarity will be enhanced by providing presynaptic expression patterns alongside trans-Tango signals and a neuron-centric data model for Figure 5. Furthermore, the Materials and Methods will be updated to explicitly detail habituation protocols, stimulation timing, and sample sizes, and the Discussion will incorporate a more nuanced treatment of the limitations of the tracing systems.

      We believe these revisions will significantly enhance the rigor and clarity of our manuscript. We look forward to submitting the revised version upon completion of these supplementary tasks.

    1. Author response:

      eLife Assessment

      This work presents a valuable new open-source tool for wirelessly controlling optogenetic stimulation in neuroscience experiments in behaving rodents. Evidence for its potential usefulness in different types of optogenetic experiments is solid, although some details and concerns were viewed as lacking or overlooked (e.g., system latency, battery weight). The work is expected to interest neuroscientists working with optogenetics and neuroengineers developing small-sized integrated devices for rodent experiments.

      We thank the eLife team for taking the time to consider and assess our manuscript. Please find below our provisional author responses accompanying the first version of the Reviewed Preprint.

      We would like to clarify an important error regarding the battery model reported in the manuscript. We mistakenly referred to the CP1254-A3 (1.8 g), whereas the battery used for all devices is the CP9440 A4X (0.8 g).

      Importantly, this correction reduces the total device weight by approximately 1 g compared to the value assumed by Reviewer #3. We believe this directly addresses the concern raised regarding battery weight in both the individual review and the overall eLife assessment.

      We will correct this error in the revised manuscript and clearly report the exact battery model and total device weight.

      For reference, the official VARTA CoinPower catalog is available here:

      https://www.varta-ag.com/fileadmin/varta/industry/downloads/products/lithium-ion-cells/VARTA_CoinPower_EN_digital_221124_A5_6p.pdf

      The battery used in BlueBerry is listed on the last line of page 2.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents a wireless device for closed-loop control of optogenetic stimulation based on behavioral triggers. The authors demonstrate the device through two behavioral experiments in mice, showcasing the device's capabilities and emphasizing open accessibility and using off-the-shelf components.

      Strengths:

      The paper presents a device that is open access and easily reproducible for wireless stimulation in a closed loop based on behavioral triggers. Other strengths of the device include the simultaneous use of multiple devices in parallel and the claimed ease of integration with existing frameworks. The paper shows two behavioral experiments on multiple mice along with some device validation results.

      We thank the reviewer for the statement.

      Weaknesses:

      The main weakness of the presented device lies in the lack of flexibility in stimulation power. For a device that is intended for stimulation only, having to physically change a component on the board to adapt stimulation power is a major downside. Reprogrammable stimulation current is not complex to implement and should really have been included on this device. Another weakness lies in the limited battery life of the device. While using a battery-powered device decreases spatial constraints, allowing for the maze experiment presented in the paper, it also means the lifespan of the device is limited compared to an inductively powered device, limiting its ability for long-term experiments.

      We thank the reviewer for these valuable comments. We did consider implementing programmable control of stimulation power, for example using a digital potentiometer. However, in our current design this approach was not sufficient because the output current supported by typical digital potentiometers is too low for the high-power LEDs used in our system. For this reason, we did not include programmable stimulation current in the present version. We agree that this is a limitation and that further work is needed to identify a suitable solution for adjustable stimulation power, which we plan to pursue in future versions of the device. We will revise the manuscript to make this limitation and future direction clearer.

      We also agree that the use of a battery-powered wireless system introduces an important trade-off. We will revise the manuscript to discuss this limitation more explicitly.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the bluetooth enabled head-mounted devices - they also have built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments, a continuous Y-maze, which elegantly demonstrates how transfected animals learn to sense optogenetic closed-loop stimulation to drive their choice behavior in a way that control-stimulated animals do not. While authors claim that the ~2m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      We appreciate the reviewer’s enthusiasm. Thank you.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      We thank the reviewer for this important suggestion. We agree that this is a missing characterization in the current manuscript. In the revised version, we will include a more detailed analysis of the system’s power budget, including the maximum stimulation power supported by the BlueBerry device, the corresponding output currents, and the contribution of the main integrated circuits to overall current consumption.
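      As a rough illustration of how such a power budget relates to battery life, the sketch below estimates runtime from battery capacity and average current draw, where the average is a baseline (microcontroller and radio) plus the LED current weighted by the stimulation duty cycle. The function name and all numerical values are hypothetical placeholders chosen for illustration; they are not measurements of the BlueBerry device.

```python
def estimate_runtime_minutes(capacity_mah, base_current_ma, led_current_ma, duty_cycle):
    """Estimate battery runtime in minutes for a duty-cycled LED load.

    All inputs are hypothetical illustration values, not device measurements.
    capacity_mah    : usable battery capacity in mAh
    base_current_ma : baseline draw (MCU + radio) in mA
    led_current_ma  : LED drive current while the LED is on, in mA
    duty_cycle      : fraction of time the LED is on, between 0 and 1
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    average_current_ma = base_current_ma + led_current_ma * duty_cycle
    if average_current_ma <= 0:
        raise ValueError("average current must be positive")
    # mAh divided by mA gives hours; convert to minutes.
    return capacity_mah / average_current_ma * 60.0


if __name__ == "__main__":
    # Placeholder values only: 10 mAh cell, 2 mA baseline, 20 mA LED current.
    for duty in (0.1, 0.2, 0.5, 1.0):
        minutes = estimate_runtime_minutes(10.0, 2.0, 20.0, duty)
        print(f"duty cycle {duty:.1f}: about {minutes:.0f} min")
```

      This simple model ignores voltage sag, converter efficiency, and message-rate-dependent radio current, which a measured power budget would capture.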

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      In the revised manuscript, we will clarify more explicitly that latency is a key limitation of the current system. We will also further investigate the source of this latency, including whether it can be reduced through additional configuration changes. In addition, we will include comparative latency measurements using different Arduino modules as the central BLE controller for the BlueHub device.
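      One possible way to reason about a latency floor is sketched below: a trigger arrives at a random time, waits for the next periodic radio event, and then incurs a fixed processing overhead. Under this toy model the latency can never drop below the fixed overhead, and its mean is the overhead plus half the polling interval. The 10 ms interval, the 20 ms overhead, and the function name are assumptions for illustration only, not measured properties of the BlueHub or BlueBerry devices.

```python
import random


def simulate_latency_ms(n_events, poll_interval_ms=10.0, fixed_overhead_ms=20.0, seed=0):
    """Simulate trigger-to-stimulation latencies (ms) under periodic polling.

    Assumed toy model: each trigger arrives at a uniformly random phase
    within a polling interval, waits until the next poll, and then incurs
    a constant processing overhead. Parameter values are hypothetical.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_events):
        phase = rng.uniform(0.0, poll_interval_ms)   # arrival time within the interval
        wait_ms = poll_interval_ms - phase           # time until the next poll
        latencies.append(wait_ms + fixed_overhead_ms)
    return latencies


if __name__ == "__main__":
    samples = simulate_latency_ms(10_000)
    mean_ms = sum(samples) / len(samples)
    print(f"min {min(samples):.1f} ms, mean {mean_ms:.1f} ms, max {max(samples):.1f} ms")
```

      In this model, shortening the polling interval narrows the spread of latencies, whereas only reducing the fixed overhead lowers the floor.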

      The programming process sounds quite complicated. It would be nice if they had OTA updates. But described and open source. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      We will investigate this matter and provide a simpler installation and configuration script to set up both the BlueHub and BlueBerry systems.

      It is unclear what the maximum number of devices that could be used without wireless interference is. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      Due to the current structure of the ArduinoBLE library used in BlueHub devices, each BlueHub unit can support active communication with a maximum of 3 BlueBerry units. We thank the reviewer for highlighting this and will clarify it in the next version of the paper.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

      In the revised submission, we will deposit all code used to program both the BlueHub and BlueBerry devices, together with the Gerber files required for PCB fabrication, alongside the paper.

      Reviewer #3 (Public review):

      Summary:

      This study presents a novel device for wireless control of optogenetic stimulation of the mouse brain, the Blueberry, using Bluetooth Low Energy (BLE) communication for parallel activation of up to 4 devices through an Arduino interface. The authors also present two types of brain implants for light delivery that can be connected to the Blueberry: one using uLEDs for surface cortical stimulation, and another using optical fibers for intra- or sub-cortical implants. The architecture of the system, including electronics, communication, and programming, is thoroughly described. Because the system was specifically designed to be integrated with existing software used for closed-loop neuroscience behavioral experiments, validation of the system is shown in two different scenarios: a learning task in an "infinite" Y-maze, where light delivery at precise locations conditions arm choice for navigation; and a social interaction analysis where 3 animals are simultaneously stimulated in order to alter social dynamics among the group.

      Strengths:

      (1) The full system can be built by individual labs with simple PCB printing, off-the-shelf components, and readily available hardware (Arduino) for widespread dissemination.

      (2) Four headstages can be controlled in parallel for simultaneous experiments with multiple mice.

      (3) Validation across different relevant behavioral tests, demonstrating the potential of integrating Blueberry in closed-loop setups.

      We thank the reviewer for the statement.

      Weaknesses:

      (1) Some details in the manuscript regarding system characterization (latency, battery life, etc) are included only in the supplementary materials.

      As the reviewer correctly notes, in the revised manuscript we will move the necessary quantifications from the supplementary materials to the main text.

      (2) The practical details of integration with other commercial and open-source software used for the closed-loop experiments, which could help third-party researchers interested in using the system, are lacking sufficient detail.

      We will clarify this point in the revised manuscript.

      (3) System range (3 meters reported) is limited for a BLE device.

      The reported system range is the range over which we consider communication to be reliable. In the revised manuscript, we will quantify this by reporting the Received Signal Strength (RSS) value for multiple BlueBerry devices across varying distances.
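      For context on how RSS is expected to vary with distance, the sketch below uses the standard log-distance path-loss model, in which RSS falls by 10·n dB for every tenfold increase in distance (n being the path-loss exponent). The reference RSS of -50 dBm at 1 m, the exponent of 2 (free-space-like propagation), and the function name are illustrative assumptions rather than values measured for BlueBerry devices.

```python
import math


def expected_rss_dbm(distance_m, rss_at_ref_dbm=-50.0, ref_distance_m=1.0,
                     path_loss_exponent=2.0):
    """Expected received signal strength (dBm) under log-distance path loss.

    RSS(d) = RSS(d0) - 10 * n * log10(d / d0)
    Default parameter values are hypothetical placeholders.
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return rss_at_ref_dbm - 10.0 * path_loss_exponent * math.log10(distance_m / ref_distance_m)


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(f"{d:.1f} m: {expected_rss_dbm(d):.1f} dBm")
```

      Fitting the reference RSS and the exponent to measured RSS values would give an empirical basis for stating a reliable range in a particular arena.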

      (4) Light output amplitude is not programmable, limiting the choice of stimulation protocols and LEDs used.

      That is indeed a limitation of our system. We will investigate the feasibility of integrating programmable stimulation protocols in the updated version of the BlueBerry device.

      (5) Thermal modeling of the cortical surface stimulator was not performed, and it is unclear if the brain implant for this purpose is within the safety limits.

      We thank the reviewer for this comment. In the revised manuscript, we will clarify that the thermal measurements reported here apply only to the specific superficial implant geometry and stimulation conditions used in this study. Because tissue heating depends strongly on implant design and on parameters such as optical power, pulse width, and stimulation frequency, a general safety statement cannot be made for all possible implant configurations. Since the primary goal of this work is to present the wireless device platform rather than to validate a particular implant design, thermal safety should be evaluated individually for each implant and stimulation paradigm.

      (6) The paper is missing a comparison with other state-of-the-art devices for wireless control of optogenetic stimulation in mice.

      In the revised manuscript, we will include a comparison table summarizing our system alongside currently available wireless optogenetic devices.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize conformational dynamics of human insulin-degrading enzyme (IDE). In the revised manuscript, the study now also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-close transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open- and closed-state structures of IDE have previously been solved using crystallography and cryo-EM, revealing a dimeric organization of IDE in which each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of the N-domains relative to the C-domains. The authors have used various complementary experimental techniques, such as cryo-EM, SAXS, size-exclusion chromatography and enzymatic assays, to characterize the structure and dynamics of IDE in the presence of the substrate insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion among the different domains and, consequently, the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated using cryo-EM in the originally submitted manuscript. A sixth cryo-EM reconstruction at 5.1 Å resolution, obtained using time-resolved cryo-EM experiments, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches that are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the groundwork for future exploration of protease structure-function relationships.

      Comments after first peer-review:

      The authors have addressed all my concerns, and have added new data and explanations in terms of time-resolved cryo-EM (Fig. 7) and upside simulations (Fig. 8) which in my opinion have strengthened the merit of the manuscript.

      We are grateful for the dedication and constructive feedback provided by the editors and reviewers. We have revised our manuscript according to the suggestions by both reviewers.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The new version of the manuscript reads exceedingly well and the corrections the authors have made during their revision made the manuscript much easier to read and digest than the first version. Below are minor details that may be corrected:

      Abstract:

      Line 45-47: "IDE is known to transition between a closed state, poised for catalysis, and an open state, able to release cleavage products and bind a new substrate." (consider adding a)

      Fixed

      Line 48-50: "Combining cryo-EM heterogeneity analysis with all-atom molecular dynamics (MD) simulations, we identified the structural basis and key residues for IDE conformational dynamics that were not previously revealed by IDE static structures." (consider adding previously)

      Changed

      Line 52-54: "Our small-angle X-ray scattering analysis and enzymatic assays of an R668A mutant indicate a profound alteration of conformational dynamics and catalytic activity." (consider adding analysis)

      Changed

      Line 54: Consider leaving out "Upside" in the abstract (to avoid confusion when reading the abstract) and leave it to be introduced in the introduction when Upside MD simulations are first mentioned.

      Changed

      Results:

      Figure 5D: There seems to be an error in the legend for Figure 5D. It says "... presence of varying amounts of insulin", but this must be Aβ1-40. Please add info on whether the replicates are technical or biological.

      The legend has been revised as suggested.

      Line 125: Consider switching the order of "here" and "we"

      “here” has been removed.

      Line 128: Replace "5" with "five"

      Changed

      Line 137: Replace "when insulin is present" with "in the presence of insulin"

      Changed

      Line 228: Replace "5" and "6" with "five " and "six"

      Changed

      Line 229: Consider adding the word "form": "First, the open subunits did not close to form a singular structure."

      We have adjusted the sentence to read “close to a singular consensus structure”

      Line 327: Replace "2" with "two"

      Changed

      Line 276: Consider replacing "Conversely" with a more suitable connecting term as it implies that the observation presented in the two sentences are reverse or rephrase what is being compared. Is it the fact there is a dose dependency or not between the substrates or is it the actual kinetic parameters that are described. I just don't think conversely is fair with the current formulation as "the R668A mutant did not exhibit a dose-dependent response to the presence of Aβ" not that the Ki is reduced for WT compared to the R668A construct when looking at Aβ.

      The connecting term has been removed completely, beginning the sentence with “When Abeta…”

      Line 359: Replace "6" with "six"

      Changed

      Consider getting rid of possessive apostrophes to keep a formal tone, e.g. lines 211 (cryoSPARC's), 259 (IDE's) and 382 (IDE's). Exception to this is Alzheimer's disease.

      All instances of possessive apostrophes, aside from Alzheimer’s, have been replaced with more formal wording.

      Figure 7 supplement 1: The color scheme for the local resolution is missing the unit (Å).

      This has been corrected.

      Finally, the supplementary videos illustrating IDE conformational dynamics are difficult to interpret and somewhat redundant in their current form. The transitions occur very rapidly, making it hard to appreciate the described motions, and the uniform coloring of IDE further limits visual clarity. I apologize for not including this point in my initial review. I recommend either removing the videos or re-rendering them to improve interpretability, for example by slowing down the motion and applying the same domain color scheme introduced in the new Figure 1 (and used in the MD trajectory video). This would greatly aid readers in connecting the descriptions in the text to the visual representations in the movies.

      Figure 3 videos 1-4 were slowed down, simplified, and recolored to improve clarity.

      Reviewer #2 (Recommendations for the authors):

      Comments after first revision for authors:

      Thanks a ton to the authors for the detailed explanation on my comments. I believe the discussions will help a large group of audience, especially the non-experts. Please address the minor comment below:

      Minor comment:

      Please update Supplementary file 1 (Cryo-EM data collection, refinement, and validation statistics) regarding the new volume obtained by time-resolved cryo-EM. Kindly also check line 47 in the abstract: "Here, we present five cryo-EM structures" , which may need an update (six structures and resolution 3.0-5.1 Å) or rephrase the sentences accordingly. If similar instances are found in the manuscript, where list of all the structures are mentioned together, please update accordingly if necessary.

      The cryo-EM statistics for the time-resolved cryo-EM are shown in supplementary file 2 to differentiate the two datasets. The abstract has been changed, as has line 149.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between male and female humans. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis on sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset where total thymocytes were sorted.

      They report participant ages and sexes, but not ethnicity or race, and provide no information about the HLA typing of individuals. The experiments are heroic, yet represent a relatively small sampling of diverse humans. They observed no differences between males and females in TCRbeta or TCRalpha usage, combinatorial diversity, the length of the CDR3 region, or amino acid usage in the CDR3aa region. Though they observed some TCRbeta CDR3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

      They also used computational approaches to compare their TCRbeta sequences against those annotated in existing databases as recognizing cancer, bacterial, viral, or autoimmune antigens. They found little overlap between their sequences and these annotated sequences (0.82-3.58% of sequences, depending on the individual). Among the overlapping sequences, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this is a generalizable finding in the human population.

      Strengths:

      It is a novel dataset that attempts to understand sex differences in the T cell repertoire in humans. Overall, the methodologies are sound and are the current state-of-the-art. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females. This is an important negative result.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. This reviewer recognizes the difficulty in obtaining samples for this experiment (which were from deceased donors), and this limitation was appropriately discussed. Their analysis was limited by the current availability of other TCR sequences. These weaknesses were appropriately discussed and considered.

      We thank this reviewer for his appreciation of our work.

      Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues. In particular, the majority of "autoimmunity-related TCRs" considered in this study are in fact specific to type 1 diabetes (T1D). Notably, T1D incidence is higher in males, which directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. Given this conceptual inconsistency, the evidence presented does not support the authors' conclusions.

      We disagree with the reviewer’s assertion that our findings create a conceptual inconsistency.

      Autoimmune diseases are multifactorial conditions in which multiple biological layers, including thymic selection, peripheral immune regulation, hormonal effects, environmental exposures, and tissue-specific vulnerability, contribute to disease incidence. These layers may influence sex ratios in different directions. Therefore, observing a higher frequency of TCRs annotated as T1D-associated in females does not imply that T1D incidence must also be higher in females.

      Actually, T1D incidence itself is not uniformly male-biased worldwide. Epidemiological analyses (reviewed in Qu and Hakonarson, Diabetes Obes Metab 2025) show that male predominance is mainly observed in high-incidence Northern European populations, whereas in several lower-incidence regions, including parts of East Asia and Africa, the sex ratio is balanced or even female-biased. Furthermore, another recent study highlights that T1D incidence and prevalence in women and men vary depending on the study period (PMC12544016).

      This heterogeneity indicates that disease incidence reflects context-dependent interactions between genetic load, environmental exposures, and sex-specific biological modifiers. Moreover, biological sex acts as a dynamic modifier of genetic risk and immune function in T1D, influencing central tolerance, peripheral immune activation, and β-cell intrinsic resilience (reviewed in Qu and Hakonarson, 2025). Experimental models further demonstrate estrogen-mediated protection of pancreatic β-cells (Kim et al., Biochem Biophys Res Commun 2025), indicating that disease incidence reflects the integration of immune, hormonal, and tissue-specific layers rather than central autoreactive TCR release alone. Sex hormones may exert distinct and sometimes opposing effects on thymic selection and on target-organ vulnerability, while environmental factors such as vitamin D status, infections, and microbiota composition further shape disease expression.

      Importantly, our study does not claim causality, nor does it aim to predict the epidemiology of any specific autoimmune disease. Our conclusions are limited to the observation that sex-dependent differences exist in thymic TCR selection.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      We agree with the reviewer’s comment. As already stated in the previous revision and the "Data Availability" section of the manuscript, all raw sequencing data have been deposited and are publicly available on NCBI (BioProject PRJNA1379632): https://www.ncbi.nlm.nih.gov/sra/PRJNA1379632.

      Weaknesses:

      I thank the authors for their detailed responses to my previous comments. Several concerns were addressed satisfactorily; however, important issues remain unresolved, and a new major concern has emerged from the revised manuscript.

      Major concerns:

      (1) Autoimmune specificity is dominated by T1D, contradicting the study's premise. Newly added supplementary Table 3 shows that the authors considered only 14 autoimmune-related epitopes, of which 12 are associated with type 1 diabetes (T1D) and 2 with celiac disease (CeD). (I guess this is because identification of particular peptide autoantigens is an extremely difficult task and was only successful in T1D and CeD.) Thus conclusions of this work mostly relate to T1D. However, the incidence of T1D is higher in males than in females (e.g. doi:10.1111/j.1365-2796.2007.01896.x; doi:10.25646/11439.2). This directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. As a result, the authors' conclusions (a) cannot be generalized to autoimmune disease as a whole as the authors only considered T1D and CeD antigens and (b) are internally inconsistent with the stated objective of the study.

      (2) By contrast, CeD does show a female bias (~60/40 female/male; doi: 10.1016/j.cgh.2018.11.013). However, the manuscript does not allow evaluation of how much the reported "autoimmune TCR enrichment" derives from T1D versus CeD. Despite my previous request, the authors did not provide per-donor and per-epitope distributions of autoimmune-specific TCR matches. I therefore explicitly request a table in which: each row corresponds to a specific autoimmune antigen; each column corresponds to a donor (with metadata available including sex); each cell reports the number of unique TCRs specific to that antigen in that donor. Without such data, the conclusions cannot be evaluated.

      (3) It is scientifically inappropriate to generalize findings to "autoimmune diseases" when only T1D and CeD were analyzed. Moreover, given that T1D and CeD show opposite directions of sex bias, combining them into a single "AID" category is misleading. All analyses presented in Figure 8 and Supplementary Figure 16 should be repeated and shown separately for T1D and CeD, rather than combined.
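
      The antigen-by-donor table requested in point (2) could be assembled along the following lines. This is only a minimal sketch: the function name, the donor and epitope labels, and the CDR3 strings are hypothetical placeholders, and the input is assumed to be a flat list of (donor, antigen, CDR3) database matches rather than the output of any particular pipeline.

```python
from collections import defaultdict


def antigen_by_donor_counts(matches):
    """Count unique matched CDR3s per (antigen, donor) pair.

    matches: iterable of (donor, antigen, cdr3) tuples (assumed input format).
    Returns {antigen: {donor: number_of_unique_cdr3s}}.
    """
    unique = defaultdict(set)
    for donor, antigen, cdr3 in matches:
        unique[(antigen, donor)].add(cdr3)  # a set keeps each CDR3 once

    table = defaultdict(dict)
    for (antigen, donor), cdr3s in unique.items():
        table[antigen][donor] = len(cdr3s)
    return {antigen: dict(row) for antigen, row in table.items()}


# Hypothetical toy records, for illustration only.
matches = [
    ("F1", "T1D_epitope_A", "CASSLGTEAFF"),
    ("F1", "T1D_epitope_A", "CASSLGTEAFF"),  # duplicate read, counted once
    ("F1", "CeD_epitope_B", "CASSPGQGYEQYF"),
    ("M1", "T1D_epitope_A", "CASSLGTEAFF"),
    ("M1", "T1D_epitope_A", "CASSQETQYF"),
]
table = antigen_by_donor_counts(matches)
```

      Each row of the resulting mapping corresponds to one antigen and each inner key to one donor, so T1D and CeD entries can be inspected separately and joined to donor sex afterwards.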

      We acknowledge that currently available antigen-annotated TCR databases remain limited. This reflects the considerable experimental difficulty of defining TCRs’ antigen specificities and is a widely recognized limitation in the field.

      In the curated database used here, the autoimmune-associated entries correspond primarily to type 1 diabetes (T1D) and celiac disease (CeD), two autoimmune contexts for which antigen-specific TCRs have been experimentally characterized. However, focusing on the number of antigens alone does not accurately reflect the breadth of the dataset.

      Specifically, our analysis is based on 48 epitopes and nearly 200 annotated TRB sequences, providing substantially broader antigenic representation than suggested by antigen count alone.

      Author response table 1.

      Importantly, our analytical framework does not attempt to interpret each epitope specificity individually. Instead, we examine whether TCRs annotated as autoimmune-associated are differentially represented between sexes at the level of thymic selection.

      In our dataset we observe a stronger CD8⁺ thymic selection of TCRs annotated as autoimmune-associated in females. We interpret this as evidence that central tolerance mechanisms may contribute to sex-dependent differences in autoreactive repertoire composition, rather than as a determinant of any specific autoimmune disease pathophysiology.

      (4) The McPAS database contains TCRs associated with other autoimmune diseases (e.g., multiple sclerosis, rheumatoid arthritis), although the exact autoantigens in these contexts are unknown. Why didn't the authors perform the search for such TCRs? I believe disease association even without particular known antigen could still be insightful.

      For multiple sclerosis, the only antigen present in the database is myelin basic protein (MBP). In our thymic repertoire dataset, we could not detect any CDR3 sequence matching the MBP-annotated CDR3s from the database.

      For rheumatoid arthritis, the database contains only a small number of TRA sequences without corresponding TRB chains. Because our specificity analysis is based on TRBs, these entries could not be used in our analyses.

      (5) Misuse of the concept of polyspecificity. I appreciate the authors' reference to Don Mason's work; however, the concept of polyspecificity discussed there is fundamentally different from the authors' usage. Mason, Sewell (doi:10.1074/jbc.M111.289488), Garcia (doi:10.1016/j.cell.2014.03.047), and others demonstrated that individual TCRs can recognize multiple peptides, possibly around 1 million. But importantly these peptides are not random but share some sequence motif. This is a general feature of TCRs, i.e. 100% of TCRs are polyspecific in this sense.

      In contrast, the authors define polyspecificity as TRB sequences annotated as specific to unrelated epitopes in TCR databases such as VDJdb. These databases are well known to contain substantial numbers of false-positive annotations (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract). The authors acknowledge that, under their definition, polyspecificity has been experimentally validated for only one (!) TCR (Quiniou et al.). In the absence of robust experimental validation, use of the term "polyspecificity" in this context is misleading. I strongly recommend removing all analyses and conclusions related to polyspecificity from the manuscript unless supported by independent functional validation.

      We agree with the reviewer that the concept of TCR polyspecificity is complex, controversial and not uniformly defined in the literature.

      For some, polyspecificity refers to the ability of individual TCRs to recognize multiple related peptides sharing structural motifs, as described by Mason, Sewell, Garcia, and others. With this definition, we agree that many/most TCRs exhibit some degree of cross-reactivity and would thus be defined as polyspecific.

      In contrast, our definition of polyspecificity arose from our observation, in large-scale repertoire analyses, that certain CDR3 sequences are repeatedly annotated across databases as recognizing distinct and unrelated antigenic categories. In our previous study (Quiniou et al.), we showed that these sequences display specific biochemical and repertoire features and may represent a particular class of TCRs involved in early or heterologous immune responses. Classic cross-reactivity based on shared structural motifs could not explain these results.

      We believe that the existence of such TCRs, rather than classic cross-reactive TCRs, has the potential to better explain why patients with extremely reduced TCR repertoires (around 3000 TCRs only) can respond well to various infectious challenges (https://doi.org/10.1073/pnas.97.1.274) or why there are T cells with memory phenotypes against viruses not previously encountered (https://pmc.ncbi.nlm.nih.gov/articles/PMC3626102/). We acknowledge that direct experimental validation of the function of such TCRs is currently limited; further work will help clarify the notion of polyspecificity and, we hope, improve understanding of the overlooked “heterologous immunity”.

      Of note, a recent paper in Nature Machine Intelligence (https://doi.org/10.1038/s42256-025-01096-6) described the in-silico generation of antigen-specific TCRs. Using our definition of polyspecificity (TCRs with higher generation probabilities, specific V/J gene preferences, shared CDR3s across individuals, and reactivity to multiple unrelated peptides), they showed that “multitask models preferentially sample polyspecific CDR3β sequences”. Therefore, we consider the debate on polyspecificity to be ongoing, and our discussion of polyspecificity in this paper to be part of this debate.

      (6) I agree that comparing specificity enrichment between sexes is meaningful. However, enrichment relative to the database composition itself is not biologically interpretable, as acknowledged by the authors in their response. I therefore recommend removing Supplementary Figure 15, which is potentially misleading.

      In the original manuscript, the comparison to the pooled database was intended as a descriptive assessment rather than as a biological enrichment analysis. Differences between an experimental thymic repertoire and a curated reference database are expected, given the structure and annotation biases inherent to the reference resource.

      The purpose of Supplementary Figures 15B and 15C was therefore twofold: (i) to provide a descriptive overview of how specificity categories are distributed in our thymic dataset relative to the curated database, and (ii) to evaluate whether deviations from database proportions were of similar magnitude in males and females, ensuring that database composition did not differentially bias one sex over the other. In addition, the donor-resolved representations demonstrate that these patterns are consistent across individuals and are not driven by a single donor.

      To avoid any potential misinterpretation, we have revised the manuscript to remove references to “enrichment” relative to database composition and eliminated quantitative comparisons to baseline database frequencies. The corresponding text and figure legends have been clarified to indicate that these analyses are descriptive and methodological in nature, while all biological interpretations rely exclusively on direct sex-specific comparisons within the thymic dataset.

      (7) In contrast, Supplementary Figure 16 represents the most convincing result of the study (keeping in mind that the AID group should be split into T1D and CeD, given that T1D and CeD have opposing directions of sex bias) and should be shown as a main figure, replacing Figure 8A-B, which is less convincing as it doesn't show the per-donor distribution.

      (8) The authors argue that applying mixed-effects modeling to Rényi entropy would require assuming a common sex effect across subsets. I do not find this assumption unreasonable. For example, if sex effects are mediated through AIRE-dependent negative selection, one would indeed expect a consistent direction of effect across subsets. The lack of statistical significance in Figure 3 may reflect limited sample size rather than true absence of the difference. Moreover, the title's phrasing "comparable TCR repertoire diversity" is vague: what is the statistical definition of "comparable"?

      The use of “comparable” in describing TCR repertoire diversity is indeed “soft”; it is intended to indicate that there are no obvious dissimilarities.
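
      For reference, the Rényi entropy discussed in point (8) has a standard closed form, H_α = log(Σ p_i^α) / (1 − α), which reduces to the Shannon entropy as α approaches 1. The sketch below illustrates that formula only; it assumes clonotype counts as input, the function name is hypothetical, and it is not a description of the authors' analysis pipeline.

```python
import math


def renyi_entropy(counts, alpha):
    """Rényi entropy (natural log) of a clonotype count vector.

    counts: iterable of non-negative clonotype counts (assumed input format).
    alpha:  order of the entropy; alpha == 1 is handled as the Shannon limit.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # drop empty clonotypes
    if alpha == 1:
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1 - alpha)


# Toy repertoires: a perfectly even one and a strongly skewed one.
even = [5, 5, 5, 5]
skewed = [97, 1, 1, 1]
profile_even = [renyi_entropy(even, a) for a in (0, 1, 2)]
profile_skewed = [renyi_entropy(skewed, a) for a in (0, 1, 2)]
```

      For an even repertoire the profile is flat at the log of the number of clonotypes, whereas for a skewed repertoire it decreases as α grows, because higher orders weight dominant clonotypes more heavily.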

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Available HLA typing data for selected donors should be included as a table in the manuscript.

      The available low-resolution HLA typing data for the donors included in this study have been compiled and added as Supplementary Table 1 in the revised manuscript.

      (2) The authors' explanation for why external validation of gene usage biases was not possible should be concisely incorporated into the Discussion.

      We have incorporated a concise explanation in the Discussion clarifying why independent validation of the TRBV6-5 bias in external thymic datasets is currently not feasible, due to the absence of publicly available cohorts combining sorted thymic subsets, balanced sex representation, and sufficient sequencing depth.

      (3) The clarification that considered sex-specific motifs are public should be included explicitly in the main text, not only figure legend and methods.

      We now explicitly state in the main Results section that only public motifs, defined as motifs containing CDR3 sequences shared by at least two individuals, were retained in the analysis.
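
      One way to read the public-motif criterion stated above is sketched here. The sketch assumes that a motif is retained when at least one of its member CDR3 sequences is observed in two or more donors; that reading, the function name, and the toy sequences are all illustrative assumptions rather than the procedure actually used.

```python
from collections import defaultdict


def public_motifs(motif_members, donor_repertoires, min_donors=2):
    """Return the motifs that contain a CDR3 seen in >= min_donors donors.

    motif_members:     {motif: set of member CDR3 sequences} (assumed format)
    donor_repertoires: {donor: set of CDR3 sequences}        (assumed format)
    """
    donors_per_cdr3 = defaultdict(set)
    for donor, cdr3s in donor_repertoires.items():
        for cdr3 in cdr3s:
            donors_per_cdr3[cdr3].add(donor)

    return {
        motif
        for motif, members in motif_members.items()
        if any(len(donors_per_cdr3[c]) >= min_donors for c in members)
    }


# Hypothetical toy data, for illustration only.
donor_repertoires = {
    "F1": {"CASSLGTEAFF", "CASSQETQYF"},
    "M1": {"CASSLGTEAFF"},
}
motif_members = {
    "motif_1": {"CASSLGTEAFF"},  # shared by F1 and M1, so public
    "motif_2": {"CASSQETQYF"},   # private to F1
}
kept = public_motifs(motif_members, donor_repertoires)
```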

      (4) The statement "Thymocytes expressing TCRs with insufficient or excessive avidity are eliminated (negative selection)" is strictly speaking incorrect. Thymocytes with insufficient avidity are eliminated by death by neglect during positive selection.

      We thank the reviewer for pointing out this imprecision. The statement has been corrected.

      (5) Figure 8C is unclear - what does "80% of unique polyspecific TCRs" mean? In any case, I strongly recommend removal of all polyspecificity-related analyses.

      We apologize for the lack of clarity in the axis label of Figure 8C. To clarify, this analysis represents the proportion of polyspecific CDR3aa sequences among all sequences with an assigned specificity within an individual’s repertoire. Specifically, it measures how many unique TCR sequences, previously identified as having a known specificity in reference databases, are also categorized as polyspecific.

      To address the reviewer’s concern, we have updated the Y-axis label of Figure 8C to: "Proportion of polyspecific CDR3aa among antigen-specific sequences (%)".
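
      The quantity on the revised Y-axis can be illustrated with a short sketch. It assumes that a CDR3aa counts as polyspecific when it is annotated with two or more specificity categories; that threshold, the function name, and the placeholder sequences are assumptions made for illustration only.

```python
def polyspecific_percentage(repertoire, annotations):
    """Percentage of polyspecific CDR3aa among annotated CDR3aa in one donor.

    repertoire:  iterable of CDR3aa sequences from one donor (assumed format)
    annotations: {CDR3aa: set of specificity categories}     (assumed format)
    Returns None when the donor has no annotated sequence.
    """
    annotated = {c for c in set(repertoire) if annotations.get(c)}
    if not annotated:
        return None
    # Assumption: two or more categories marks a sequence as polyspecific.
    poly = {c for c in annotated if len(annotations[c]) >= 2}
    return 100 * len(poly) / len(annotated)


# Hypothetical toy data: placeholder names stand in for CDR3aa sequences.
annotations = {
    "seq_A": {"viral", "autoimmune"},
    "seq_B": {"viral"},
    "seq_C": {"bacterial"},
    "seq_E": {"cancer", "viral"},
}
repertoire = ["seq_A", "seq_A", "seq_B", "seq_C", "seq_D", "seq_E"]
pct = polyspecific_percentage(repertoire, annotations)
```

      Because the calculation works on unique sequences, repeated reads of the same CDR3aa do not change the percentage, and unannotated sequences such as the placeholder "seq_D" are excluded from the denominator.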

      (6) "However, no significant sex-based differences were found in the usage of hydrophobic, hydrophilic, or neutral aa at the critical p109 and p110 positions in TRB" - this Discussion statement is inconsistent with the new analysis on Fig. 4C.

      We regret that the Discussion still contained wording from a previous version of the analysis. The text has now been corrected to reflect the updated results showing a significant increase in hydrophobic amino acid usage at positions p109/p110.

      (7) In the Discussion the authors write: "the absence of age-related clustering in repertoire features (data not shown)". What is the reasoning for not showing the data?

      We understand the reviewer's point. This exploratory clustering analysis was performed on the data presented in the heatmaps (Figure 2B and Supplemental Figures 10-13). However, as it revealed no distinct patterns or clustering based on the donors' age (with samples from different age groups being interspersed throughout the clusters), we chose not to add an extra layer of annotation to Figure 2B to maintain clarity.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      (a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      We agree with the reviewer that both can contribute to the different lifespan effect. In the original paper presenting flySAM1.0 and flySAM2.0 (Jia et al., 2018), the authors first tested how flySAM1.0 overexpression (OE) phenotypes compare to several VPR (CRISPRa) and UAS:cDNA OE lines. They found that flySAM1.0 reliably outperforms VPR (i.e., produces stronger OE phenotypes) in most cases, and produces OE phenotypes that are comparable (i.e., generally equivalent) to UAS:cDNA (Jia et al., 2018). After determining how flySAM1.0 performance compares to VPR and UAS:cDNA, the authors next tested whether flySAM2.0 also outperforms VPR; they found that, like flySAM1.0, flySAM2.0 outperforms VPR in most cases (Jia et al., 2018). In general, the data suggest that we should expect comparable overexpression phenotypes for our flySAM2.0 and UAS:cDNA lines.

      We chose to proceed with the DIP-β flySAM line for the climbing assays and snRNA-seq, as it gave a stronger lifespan effect and we thought it was likely to be the more robust OE line. While our glial cell-surface proteomics initially identified DIP-β isoform C as the candidate, it is possible that other DIP-β isoforms were also present (such as isoform F, which is identical in polypeptide sequence to isoform C) (FlyBase). Ultimately, we believe that the larger increases in lifespan observed for DIP-β flySAM are likely because flySAM targets all isoforms, whereas UAS:cDNA lines target only one isoform. Importantly, our UAS-DIP-β line was specific to DIP-β isoform C, which is the same isoform that was identified by our proteomics.

      We have made clarifications in the manuscript to address these comments.

      (b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that a larger increase in lifespan is due to the already reduced viability of these flies?

      This is a good point. The flySAM lines do exhibit a shorter baseline lifespan compared to the traditional UAS lines. This is likely due to the specific genetic background of the flySAM transgenic insertions, or a low level of "leaky" expression, as previously noted in the literature (Jia et al., 2018).

      However, we believe that the lifespan extension we observed for DIP-β flySAM is a robust biological effect, rather than an artifact of reduced viability, for the following reasons. First, by utilizing the GeneSwitch (GS) system, we can compare the lifespan of flies with the exact same genetic background (+/- RU-486). This ensures that the extension we report is specifically due to the induction of the transgene, rather than a comparison between disparate lines with different basal fitness levels. Second, if the lifespan extension merely represented a recovery from lower baseline viability, we would expect to see similar improvements across other flySAM lines in our screen. However, DIP-β was the only candidate across our screen that significantly increased lifespan in both sexes (Extended Data Figs. 7 & 8). Third, the lifespan-extending effect of DIP-β was independently confirmed using a traditional UAS-cDNA line, which importantly does not share the same baseline viability issues as the flySAM lines.

      (c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      We have updated the figure legends for Figure 2 to include the missing statistical details and sample sizes.

      Specifically, for Fig. 2A: The reviewer is correct that with only two replicates of each time point (5d vs. 50d) in the initial proteomic screen, traditional p-value calculations lack the necessary power for meaningful interpretation. We have revised the legend to clarify that this panel represents a discovery-based screen. Candidates were selected based on biological relevance and specific enrichment thresholds to narrow the 872 proteins down to the 48 top candidates for screening (we were initially aiming to identify approximately 50 candidate genes for screening). For Fig. 2B: We have updated the legend to detail the parameters used for the Gene Ontology (GO) enrichment analysis.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      We thank the reviewer for this careful observation regarding the expression pattern of the GSG3285-1 line and acknowledge that the overlap between this driver and the Repo-positive cells is not absolute.

      Our selection of this specific GeneSwitch line was based on several critical experimental considerations: 1) To minimize background toxicity. We initially tested multiple Repo-GeneSwitch lines; however, we found they exhibited significant, genotype-dependent lifespan reductions upon RU486 administration, even in control crosses. This baseline toxicity confounded the interpretation of any potential lifespan effects. GSG3285-1 was chosen for this study because it provided a robust control baseline and did not show lifespan effects with RU486 treatment in multiple control lines, which is essential for lifespan studies. 2) The driver breadth and specificity. As noted in its original characterization (Nicholson et al., 2008) and a later study (Catterson et al., 2023), GSG3285-1 is characterized as a pan-glial driver, though it may include a small population of sensory neurons. Furthermore, while Repo is a standard glial marker, its antibody does not label all glial subtypes with equal intensity. The "non-overlapping" signal observed in Figure 3A may reflect this staining bias. 3) The expression mosaicism. The fact that some glial cells do not show GFP expression suggests a degree of mosaicism, which is common to many GeneSwitch lines (Osterwalder et al., 2001). While we acknowledge this means our manipulations may target a broad subset of glia rather than every single glial cell, the fact that we still observed significant lifespan effects across two independent platforms (UAS and CRISPRa) suggests that the targeted population is sufficient to mediate these systemic effects.

      We have added a clarifying statement to contextualize the choice of the GSG3285-1 driver and its relationship to the Repo population.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      (a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      We agree that the sex-specific effects observed in our lifespan screen are one interesting aspect of this study. We have added a dedicated section to the Discussion exploring these differences from both a technical and biological perspective.

      On the technical side, the GeneSwitch inducer, RU486, can have sex-specific effects on metabolism and lifespan, depending on the nutritional environment (Dos Santos & Cocheme, 2024). Specifically, RU486 has been shown to counteract the lifespan-shortening effects of mating in females, an effect that is less pronounced in males (Landis et al., 2015; Tower et al., 2017). While we optimized our media and used the GSG3285-1 line to minimize these baseline effects, it remains possible that certain genotypes exhibited a sex-specific sensitivity to the inducer itself. Beyond the technical considerations, sex differences in aging are well-documented in Drosophila and other organisms (Regan et al., 2016; Austad & Fischer, 2016). Male and female flies exhibit distinct transcriptional trajectories and metabolic shifts as they age. Furthermore, recent studies have highlighted that glial function and the neuroinflammatory landscape can differ significantly between sexes, which may dictate how a specific genetic manipulation impacts the aging process in a sex-dependent manner (PMID: 40951920). While our screen identifies DIP-β as a rare candidate that extends lifespan in both sexes, the prevalence of female-specific hits in our data suggests that the female "aging program" may be more plastic or responsive to the specific glial pathways we targeted. These observations provide a valuable foundation for future studies into the mechanisms of sex-specific neuroprotection.

      (b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      It is a mixture of half male and half female flies. This information has been added to the main text, Fig. 1, and to the methods section.

      (c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      Agreed, this would be a great idea for future studies.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the adult brain, which does not include the nerve cord, where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors, including learning, circadian rhythms, etc.

      We thank the reviewer for this insightful point. While our initial proteomic screen focused on the adult central brain, our behavioral validation used a pan-glial driver, which targets glia throughout the entire nervous system, including the ventral nerve cord (VNC). We have addressed the reviewer's comment as follows:

      Additional behavioral data: As suggested, we performed Drosophila Activity Monitoring (DAM) assays to evaluate circadian locomotor rhythms in 50-day-old DIP-β overexpression flies compared to negative controls. Interestingly, we did not detect significant changes in circadian activity at this time point.

      The difference between our climbing and circadian results highlights the complexity of age-related decline. In Drosophila, locomotor performance (i.e., climbing) and circadian coordination often decouple. For example, specific isoforms of human Tau (hTau) can induce severe cognitive and neurodegenerative deficits without affecting lifespan or motor coordination in the same manner (Sealey et al., 2017). Furthermore, motor-specific defects can emerge independently of systemic lifespan changes, as seen in certain SOD1 models of ALS (Hirth, 2010). It is possible that the 50-day timepoint represents a specific window where motor coordination is improved by DIP-β, while circadian circuits — governed by distinct glial-neuronal interactions — remain largely unaffected, or require a different temporal window for observation.

      We agree that identifying the specific glial populations (central brain vs VNC) responsible for the improved climbing would be highly informative. While the current study establishes the pro-longevity effect of DIP-β, future work utilizing in-situ proteomics on the fully intact CNS (including the VNC), or on the VNC specifically, will be essential to map the stereotyped progression of these effects across the peripheral and central nervous systems.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and indirectly influencing the transcriptomes? Instead of DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

      We agree that the observed changes likely represent a combination of direct cell-cell interactions and a broader, more indirect maintenance of a "younger" physiological state.

      Direct: Among the DIP family, DIP-β exhibits some of the strongest and most promiscuous binding affinities, interacting with a wide array of partners including Dpr6, 8, 9, 15, and 21 (Cosmanescu et al., 2018; Sergeeva et al., 2020). This biochemical flexibility allows DIP-β to potentially interface with a much broader range of neuronal subtypes than other DIP family members, such as DIP-δ, which exclusively binds Dpr12 and did not extend lifespan in our screen. It is possible that by overexpressing DIP-β, we may be partially compensating for the global downregulation of CAMs that typically occurs during aging, thereby preserving essential glial-neuronal communication integrity.

      Indirect: By maintaining these primary glial functions and communication activities, DIP-β overexpression likely delays the overall "aging" of the brain. This preservation of neural health can have downstream effects on systemic physiology, such as the improved glia-fat body communication we observed in 50-day-old flies. In this model, the broad transcriptomic shifts are not necessarily all direct targets of DIP-β, but rather a signature of a brain that has successfully avoided the catastrophic breakdown of homeostasis typically seen in aged wild-type flies.

      We have expanded the Discussion to clarify this distinction, adding that DIP-β likely acts as a “scaffold” or “bridge” for maintaining a younger brain state, which in turn preserves multi-organ communication.

      Reviewer #2 (Public review):

      This manuscript presents an ambitious and technically innovative study that combines in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing to uncover glial factors that influence aging in Drosophila. The authors identify DIP-β as a glial protein whose overexpression extends lifespan and report intriguing sex-specific differences in lifespan outcomes. Overall, the study is conceptually compelling and offers a valuable dataset that will be of considerable interest to researchers studying glia-neuron communication, aging biology, and proteomic profiling in vivo.

      The in-situ proteomic labeling approach represents a notable methodological advance. If validated more extensively, it has the potential to become a widely used resource for probing glial aging mechanisms. The use of an inducible glial GeneSwitch driver is another strength, enabling the authors to carefully separate aging-relevant effects from developmental confounds. These technical choices meaningfully elevate the rigor of the study and support its central conclusions. The discovery of new candidate genes from the proteomics pipeline, including DIP-β, is intriguing and opens new avenues for understanding glial contributions to organismal lifespan. The observation of sex-specific lifespan effects is particularly interesting and warrants further exploration; the study sets the stage for future work in this direction.

      At the same time, several areas would benefit from clarification or additional analysis to fully support the manuscript's claims:

      (1) The manuscript frequently refers to "improved" or "increased" cell-cell communication following DIP-β overexpression, but the meaning of this term remains somewhat vague. Because the current analysis relies largely on transcriptomic predictions, it would be helpful to define precisely what metric is being used, e.g., increased numbers of predicted ligand-receptor interactions, enrichment of specific signaling pathways, or altered expression of communication-related components. Strengthening the mechanistic link between DIP-β, cell-cell communication, and lifespan extension, potentially through targeted validation of specific glial interactions, would substantially reinforce the interpretation.

      We agree that a more precise description of “improved” or “increased” cell-cell communication is necessary.

      Our conclusion that DIP-β overexpression is associated with “increased” cell-cell communication is based on the quantification of our CCC scores, which was performed using FlyPhoneDB2, a computational tool used to estimate cell-cell signaling from single-cell RNA-sequencing data (Liu et al., 2021; Qadiri et al., 2025). To infer cell-cell signaling, FlyPhoneDB2 and its predecessor, FlyPhoneDB, calculate “interaction scores,” comparing the expression levels of a curated list of ligand-receptor pairs between cell types (Liu et al., 2021; Qadiri et al., 2025). For example, if we detect a ligand in cell type A and its receptor in cell type B in DIP-β overexpression flies, but do not detect both the ligand and the receptor in control flies, the CCC score is increased by 1. FlyPhoneDB2 additionally enables users to estimate signaling activity by also taking into consideration the expression of downstream reporter genes (Qadiri et al., 2025).

      “Improved cell-cell communication” is our interpretation based on the CCC analysis. It is important to note that the metric being used here (increased CCCs) is the number of predicted ligand-receptor interactions, and that our CCC analysis was based entirely on inferences from snRNA-seq data. We have added further clarification to our manuscript, which now further expands on the results of our CCC analysis (i.e., the increased expression for 61% and decreased expression for 39% of ligand-receptor pairs we observed in our DIP-β overexpression group, compared to our negative control), which ultimately led us to conclude that DIP-β overexpression is associated with improved cell-cell communication.
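      The counting logic in the worked example above (a ligand detected in cell type A and its receptor detected in cell type B) can be sketched as follows. This is a minimal illustration only: the gene names, cell types, and the `predicted_interactions` helper are hypothetical, detection is treated as binary, and the sketch does not reproduce FlyPhoneDB2's actual interaction score, which is computed from expression levels.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical ligand-receptor pairs (placeholder names, not a curated database).
LR_PAIRS: List[Tuple[str, str]] = [
    ("ligand_1", "receptor_1"),
    ("ligand_2", "receptor_2"),
]

Interaction = Tuple[str, str, str, str]  # (sender, receiver, ligand, receptor)


def predicted_interactions(
    detected: Dict[str, Set[str]], lr_pairs: List[Tuple[str, str]]
) -> Set[Interaction]:
    """Return every (sender, receiver, ligand, receptor) combination in which
    the ligand is detected in the sender and the receptor in the receiver.
    Sender and receiver may be the same cell type (an assumption of this sketch)."""
    hits: Set[Interaction] = set()
    for sender, sender_genes in detected.items():
        for receiver, receiver_genes in detected.items():
            for ligand, receptor in lr_pairs:
                if ligand in sender_genes and receptor in receiver_genes:
                    hits.add((sender, receiver, ligand, receptor))
    return hits


# Hypothetical binary detection calls per cell type in each condition.
control = {"glia": {"ligand_1"}, "neuron": {"receptor_1"}}
overexpression = {
    "glia": {"ligand_1", "ligand_2"},
    "neuron": {"receptor_1", "receptor_2"},
}

control_hits = predicted_interactions(control, LR_PAIRS)
oe_hits = predicted_interactions(overexpression, LR_PAIRS)

gained = oe_hits - control_hits  # predicted only in the overexpression condition
lost = control_hits - oe_hits    # predicted only in the control condition
print(f"gained: {len(gained)}, lost: {len(lost)}")
```

      Each interaction in `gained` corresponds to the "+1" case described above; because the inputs are transcript detections, the resulting counts are predictions rather than measurements of signaling.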

      (2) The lifespan screen is central to the paper, and clearer visualization and contextualization of these results would significantly improve the manuscript's impact. For example, Figure 3D is challenging to interpret in its current form. More explicit presentation of which manipulations extend lifespan in each sex, along with effect sizes and significance values, would provide clarity. Including positive controls for lifespan extension would also help contextualize the magnitude of the observed effects. The reported effects of DIP-β, while promising, are modest relative to baseline effects of RU feeding, and a discussion of this would help appropriately calibrate the conclusions.

      We appreciate the reviewer’s suggestion to improve the clarity of the lifespan screen results. We have significantly revised Figures 3D, 3E, and 3F to provide a more intuitive summary of the candidate gene manipulations. Figures 3D and 3E now explicitly include the effect sizes and p-values for each candidate gene, broken down by sex. We also added a new Figure 3G with a visual layout that has been streamlined to allow for quick identification of manipulations that successfully extended lifespan.

      The reviewer raises an important point regarding the use of positive controls to calibrate the magnitude of lifespan extension. We carefully considered adding a standard control (such as Rapamycin treatment); however, we opted against it for several methodological reasons:

      As noted in the literature, the magnitude of lifespan extension from standard controls can vary drastically depending on genetic background and lab environment. For instance, Rapamycin-induced extension ranges from ~10% (Schinaman et al., 2019), to over 80% (Landis et al., 2024). We felt that adding a single positive control might provide a false sense of "calibration" rather than a true universal benchmark.

      To ensure the robustness of our findings, we instead employed a dual-validation strategy. We confirmed the lifespan-extending effects of our candidates using both traditional UAS:cDNA and CRISPR-based overexpression. The fact that two independent genetic systems yielded consistent results provides strong internal evidence for the reported effects.

      We acknowledge that the effects of DIP-β are modest when compared to the baseline impact of RU486 feeding. We have added a section to the Discussion addressing this. While the effects are subtle, their reproducibility across different overexpression platforms suggests they are biologically relevant, even if they do not reach the dramatic shifts seen in some caloric restriction or drug-based models.

      We have further addressed this in the results section.

      (3) Several figures would benefit from improved labeling or more detailed legends. For instance, the meaning of "N" and "C" in Figure 1D is unclear; Figure 3A should clarify that Repo is a glial marker; and Figure 5C appears to have truncated labels. Reordering certain panels (e.g., moving control data in Figure 4A-B) may also improve narrative flow. These refinements would greatly aid reader comprehension.

      We have modified and improved the labeling of these figures to increase the clarity. For Fig. 1D, we added the explanation to the Figure legends. In brief, in the Tandem Mass Tag (TMT) isobaric labeling system, 128N is one of many channels (126, 127N, 127C, 128N, 128C, etc.) used to index and compare up to 18 samples simultaneously, improving throughput and reducing missing values.

      Fig. 3A has been updated to clarify that Repo is the glial marker. Fig. 4A-D have been reordered so that the DIP-β lifespan results are presented before the control lifespan, which hopefully improves the narrative flow of this figure. The Fig. 4 references in the manuscript have also been updated to match these changes. Additionally, Fig. 5C has been updated to include the truncated x-axis and y-axis labels.

      (4) A few claims would be strengthened by more specific references or acknowledgment of alternative interpretations. Examples include the phenoxy-radical labeling radius, the impact of H₂O₂ exposure, and the specificity of neutravidin. Additionally, downregulation of synapse-related GO terms may reflect age-related transcriptional changes rather than impaired glia-neuron communication per se, and this possibility should be recognized. The term "unbiased" to describe the screen may also be reconsidered, given the preselection of candidate genes.

      These are good suggestions. We have added references for the phenoxy-radical labeling radius (Durojaye, 2021), the impact of H₂O₂ exposure (J. Li et al., 2021), and the binding specificity of neutravidin (J. Li et al., 2021). We have also removed the term “unbiased” from our manuscript.

      Regarding the request to further address the downregulation of synapse-related GO terms, we believe this indicates a lack of clarity on our part. We did not intend to suggest that our GO analyses, which were based on our proteomics data, were necessarily indicative of impaired neuron-glia communication. Our conclusions regarding altered neuron-glia communication come from our later snRNA-seq data and analyses. Prompted by this comment, we agree that our differential gene analysis may reflect age-related transcriptional changes rather than impaired glia-neuron communication. We have added this alternative interpretation.

      (5) Clarifying the rationale for focusing on central brain glia over optic-lobe glia would be useful. 

      Agreed! As the intended focus of this study was the more general changes occurring during normal brain aging, we chose to focus on the central brain for our glial cell-surface proteomics, which is responsible for most of the brain’s higher order functions, including learning and memory, signal integration, behavior, etc. As the optic lobes account for approximately half of all neurons in the adult Drosophila brain and are specialized to process visual stimuli (Robinson et al., 2025), we were concerned that including the optic lobes in our glial cell-surface proteomics could strongly bias our findings towards age-related changes in visual function, rather than the more general changes we intended to focus on. Such clarification has been added to the results section (Quantitative comparison of young and old proteomes).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 62: Can the authors expand on "several changes"?

      We have added a sentence expanding upon this in the manuscript draft.

      (2) Line 137: Can the authors provide a reference for the phenoxyl radical half-life?

      Thanks for catching this. We’ve added our reference for the phenoxyl radical half-life.

      (3) Figure 1B: The authors state that neutravidin stained glia; however, there is no glial marker (e.g., anti-Repo) in this panel.

      We acknowledge the reviewer’s point. The lack of anti-Repo staining in Figure 1B is due to the requirements of the Neutravidin-Alexa 647 detection method. Because this procedure bypasses traditional primary and secondary antibody incubation to preserve the biotin signal, co-staining with Repo was not technically feasible. Nevertheless, we utilized the Repo-GAL4 driver to express UAS-CD2-HRP; since this driver is well-documented and specific to glial cells, the Neutravidin signal serves as a functional readout of the targeted glial population.

      (4) Line 254: There is no Figure 2D.

      We’ve corrected this to Fig. 2C.

      (5) Lines 390-396: No reference to the respective figures.

      We’ve made a couple of corrections to reference all the respective figures.

      (6) Figure 5C: The X-axis is cut off.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Minor inconsistencies (e.g., figure references-line 254 references "Figure 2D" where none exists) should be corrected.

      We’ve corrected this to Fig. 2C.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      We thank the reviewers and editors for the second round of peer review. Following the editorial assessment and specific review comments, we now present new results comparing EDS and IDS behavior, and use the standard convention for reporting statistics. We also request to simplify the manuscript title to ‘Locus coeruleus modulation of prefrontal dynamics during attentional switching in mice’.

      Public Reviews:

      Reviewer #1 (Public review):

      In their response to reviewers, the authors say "We report p values using 2 decimal points and standard language as suggested by this reviewer". However, no changes were made in the manuscript: for example, "P = 4.2e-3" rather than "p = 0.004".

      We apologize for this misunderstanding. We initially interpreted this comment as a request to report two non-zero digits in p values. We have now corrected this in the revision. We also follow the editorial recommendation and use a standard convention to report statistics (e.g., p = 0.03, t(7) = -2.8).
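      As an illustration of the reporting change discussed above (for example, "p = 0.004" rather than "P = 4.2e-3"), a small formatting helper could look like the sketch below. The function names, the choice of three decimal places, and the "p < 0.001" floor are assumptions made for this example; they are not taken from the manuscript or from any journal style guide.

```python
def format_p(p: float, decimals: int = 3) -> str:
    """Format a p value in fixed decimal notation instead of scientific notation.
    Values below the smallest representable step are reported as an inequality."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    floor = 10 ** -decimals
    if p < floor:
        return f"p < {floor:.{decimals}f}"
    return f"p = {p:.{decimals}f}"


def format_t(df: int, t: float) -> str:
    """Format a t statistic with its degrees of freedom, e.g. t(7) = -2.8."""
    return f"t({df}) = {t:.1f}"


print(format_p(4.2e-3))    # fixed-point form of the reviewer's example
print(format_p(1e-4))      # below the floor, reported as an inequality
print(format_t(7, -2.8))
```

      Note that fixed three-decimal output renders 0.03 as "0.030"; trailing zeros would need to be trimmed if the shorter form is preferred.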

      In their response to the reviewers, they wrote: "Upon closer examination of the behavioral data, we exclude several sessions where more trials were taken in IDS than in EDS." If those sessions in which EDSIDS. Most problematic is the fact that the manuscript now reads "Importantly, control mice (pooled from Fig. 1e, 1h, Supp. Fig. 1a, 1b) took more trials to complete EDS than IDS (Trials to criterion: IDS vs. EDS, 10 ± 1 trials vs. 16 ± 1 trials, P < 1e-3, Supp. Fig. 1c), further supporting the validity of attentional switching (as in Fig. 1c)" without mentioning that data has been excluded.

      The editor raised a similar concern. We apologize for this oversight, which was due to miscommunication within the lab. We have now revised the manuscript to include all data points without any exclusion in Fig. 1e, 1h, and Supp. Fig. 1a-c. With all data pooled and no exclusions, control mice still took more trials to complete EDS than IDS, supporting the validity of attentional switching (Trials to criterion: IDS vs. EDS, 11 ± 1 trials vs. 15 ± 1 trials, p = 0.006, Supp. Fig. 1c).

      The exclusion we initially meant to perform was to exclude sessions where task performance in IDS was beyond the 95% threshold inferred from the naïve control group (15 trials, Fig. 1c). Exclusions are now explicitly described. Of note, including or excluding these sessions does not change any of the conclusions presented in our manuscript. We have added this analysis as Supp. Fig. 1d, and the results remain robust. This panel could be removed if deemed unnecessary by the reviewers.

      Reviewer #3 (Public review):

      The authors overall do a nice job of addressing reviewer comments, and I believe the manuscript is significantly improved. Congratulations!

      We thank you for this positive assessment.

      Weaknesses are mostly minor, but there are some caveats that should be considered. First, the authors use a DBH-Cre mouse line and provide histological confirmation of overlap between HM4Di expression and TH immunostaining. While this strongly suggests modulation of noradrenergic circuit activity, the results should be interpreted conservatively as there is no independent confirmation that norepinephrine (NE) release is suppressed and these neurons are known to release other neurotransmitters and signaling peptides. In the absence of additional control experiments, it is important to recognize that effects on mPFC activity may or may not be directly due to LC-mPFC NE.

      We agree with this comment, and now further discuss this limitation in the Discussion, lines 255-259:

      “However, it is important to note that LC-NE neurons can co-release other neurotransmitters, such as dopamine and neuropeptides[73,75,76]. In the absence of further control experiments to confirm the suppression of NE release, the observed effects on mPFC may or may not be directly due to NE. Future studies are needed to better delineate the involvement of specific neurotransmitters, cell types and receptors in flexible decision making.”

      Another caveat is that the imaging analyses are entirely from the extradimensional shift session. Without analyzing activity data from the intradimensional shift (IDS) session, one cannot be certain that the observed changes are to some feature of activity that is specific to extradimensional shifts. Future experiments should examine animals with LC suppression during the IDS as well, which would show whether the observed effects are specific to an extradimensional shift and might explain behavioral effects.

      We also agree with this comment and have considered it carefully. Technically, IDS has low trial numbers, especially incorrect trials, limiting the power of statistical comparisons. Conceptually, since EDS is always the last stage in our paradigm, comparing neural signals in EDS with previous stages may be confounded by the order of learning. That is, it would be unclear whether the observed differences in mPFC activity were due to mPFC responding to different rules or to changes in mPFC responses over time and learning. We now discuss this point in the Discussion, lines 291-295:

      “Another limitation in the current study is that neurophysiological analyses were entirely from EDS. Without comparing with other task stages (e.g., REV, IDS), it is uncertain to what extent the observed neuronal changes are specific to EDS. Future experiments should examine the behavioral and neurophysiological effects with LC inhibition to determine the specificity of LC-NE modulation of the mPFC during attentional switching.”

      We are also actively collecting additional data to address this point, which requires considerable effort. We hope to report our findings in a follow-up study.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Genetically encoded fluorescent proteins expressed in specific cell types allow recognising them in vivo and, if the protein is a functional indicator, as in the case of genetically encoded calcium indicators (GECIs), to record activity from the same cellular ensemble. Ideally, if proteins (fluorophores) have perfectly distinct spectral properties, signals can be distinguished from as many cell types as the number of employed fluorophores. In practice, fluorescent proteins have non-negligible crosstalk both in absorption and emission bands. In addition, fluorescence contribution of each fluorophore normally varies from cell to cell and therefore spectral properties of cells expressing two or more proteins are different. The work of Phillips et al. addresses this challenge. The authors present an approach defined as "Neuroplex", allowing identification of up to nine cell types from the same number of fluorophores. The fingerprint of each cell is then associated with functional fluorescence from the GECI GCaMP, allowing recording calcium activity from that specific cell. The method is implemented in vivo using head-mounted miniscopes.

      The authors used a mouse line expressing GCaMP in cortical pyramidal neurons and developed an experimental pipeline. First, they injected the nine AAV viruses, causing expression of fluorophores in a different brain area. The idea was not to image that area, but a non-infected medial prefrontal cortex (mPFC) section where neurons could be infected by their axons projecting in an injected area, in this way being identified by their targeting region(s). A GRIN lens, allowing spectral analysis, was mounted in the mPFC section, and GCaMP fluorescence was then recorded during behavioural tasks and analysed to identify regions of interest (ROIs) corresponding to neuron somata. After functional imaging, the head of the mouse was fixed, spectral analysis was performed, and after necessary correction for chromatic distortions, the fluorophore contribution was determined for each ROI (neuron) from where GCaMP signals were detected. Notably, the procedures for estimation and correction of chromatic aberration and light transmission (described in Figure 2) were a major challenge in their technical achievements. The selection of the nine fluorophores was another big effort. This was done by combining computer simulations and direct measurement of spectra from individual proteins expressed in HEK293 cells. It is important to say that the authors could simulate arbitrary combinations of two or more different fluorophores and evaluate the ability of their algorithm to detect the correct proteins against wrong estimations of false-negative (absence of an expressed protein) or false-positive (presence of a non-expressed protein). Not surprisingly, this ability decreases with the level of GCaMP expression. The authors underline that most errors were false-negatives, which have a milder impact in terms of result interpretation, but the rate of false positives was, nevertheless, relevant in detecting a second fluorophore from a cell expressing only one protein. 
      The experimental profiles of fluorophores were dependent both on the specific fluorescent protein and on the projecting area, and the distribution of double-labelled neurons did not match anatomical evidence. This result should be taken as the limitation of the present pioneering experiments, presented as proof-of-principle of the approach, but Neuroplex may provide far improved precision under different experimental conditions.

      In my view, the work of Phillips et al. represents a significant advance in the state-of-the-art of the field. The rigorous analysis of limitations in the use of Neuroplex must be considered an important guideline for future uses of this approach.

      We appreciate the reviewer’s positive evaluation and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript introduces Neuroplex, a pipeline that integrates miniscope Ca²⁺ imaging in freely moving mice with multiplexed confocal and spectral imaging to infer projection identities of recorded neurons. This technical approach is promising and could broaden access to projection-resolved population imaging. However, the core quantitative analyses apply a winner-take-all single-label assignment per neuron even when multiple fluorophores exceed threshold, with additional labels treated descriptively as "secondary hits." While the authors acknowledge and simulate dual labeling, the extent to which this single-label decision rule affects subtype fractions and behavioural comparisons remains uncertain without a multi-label (or probabilistic) sensitivity analysis and propagation of classification uncertainty.

      We thank Reviewer #2 for the careful statistical perspective and focus on assignment strategy and uncertainty. Importantly, we emphasize that Neuroplex is presented as a methodological proof-of-principle, not as a definitive quantification of projection convergence.

      Strengths:

      (1) Conceptual advance and practicality: Decoupling acquisition from identity readout constitutes an innovative approach that is, in principle, applicable in laboratories currently using single-color miniscopes.

      (2) Engineering thoroughness: The manuscript offers detailed consideration of GRIN optics, spectral libraries, registration procedures, and simulations that address signal-to-noise ratio, background, and class imbalances.

      (3) Immediate community value: If demonstrated to be robust, the pipeline could enable projection-resolved analyses without reliance on specialized multicolor miniscopes.

      Weaknesses:

      (1) Single-label assignment in the main analyses: When multiple fluorophores exceed threshold for a neuron/ROI, the workflow applies a winner-take-all rule and assigns a single label (the fluorophore with the largest standardized beta), while additional above-threshold fluorophores are retained only as "secondary hits." This is a reasonable specificity-first choice, but because cortical excitatory neurons can collateralize, collapsing dual-threshold ROIs to one identity may under-represent dual-projecting cells and could bias estimated subtype fractions and behavioural comparisons.

      We thank the reviewer for raising this important conceptual point.

      We agree that cortical excitatory neurons frequently collateralize and therefore may legitimately express more than one retrograde fluorophore. Our use of a winner-take-all (WTA) rule in the primary analyses was an intentionally conservative methodological choice designed to prioritize specificity over sensitivity in this proof-of-principle study.

      As demonstrated in our simulations (Supp. Fig. 5–6), under realistic background and noise conditions, secondary assignments are more susceptible to false-positive errors than primary assignments. For this reason, we chose to assign a single primary identity for quantitative behavioral stratification while retaining additional above-threshold fluorophores as “secondary hits” and reporting their distribution separately (Supp. Fig. 7).

      We did not intend to imply that projections are exclusive. Rather, the WTA strategy provides a conservative lower-bound estimate of subtype proportions and avoids inflation of dual-label rates under conditions where spectral separability is imperfect.

      We agree that this rationale should be stated more explicitly in the manuscript, and that the potential impact of assignment strategy on subtype fractions and behavioral comparisons should be acknowledged clearly as a methodological trade-off rather than a biological claim.

      Importantly, the biological analyses presented in this manuscript are illustrative demonstrations of functional stratification capability and do not depend on exclusivity of projection identity. We have revised the manuscript to clarify this framing as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a winner-take-all rule. We emphasize that this assignment strategy does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal Cell Type and Behavior)

      “Cortical pyramidal neurons frequently collateralize to multiple downstream targets, and accordingly some ROIs exceeded threshold for more than one fluorophore. In this proof-of-principle implementation, we adopted a specificity-first winner-take-all assignment rule for primary analyses to minimize false-positive multi-label calls under realistic noise conditions. This strategy likely underestimates the true prevalence of dual-projecting neurons and should therefore be interpreted as a conservative stratification approach rather than a statement of projection exclusivity.” (Discussion)
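
      The winner-take-all rule described in the revised Methods text can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `assign_identity`, the fluorophore names, and the threshold value are hypothetical placeholders chosen only to show how a primary identity and "secondary hits" would be separated for one ROI.

```python
from typing import Dict, List, Optional, Tuple


def assign_identity(
    z_betas: Dict[str, float], threshold: float
) -> Tuple[Optional[str], List[str]]:
    """Return (primary, secondary_hits) for a single ROI.

    z_betas maps a fluorophore name to its z-scored unmixing coefficient.
    All names and values here are illustrative assumptions.
    """
    # Keep only fluorophores whose z-scored beta exceeds the threshold.
    hits = {name: z for name, z in z_betas.items() if z > threshold}
    if not hits:
        return None, []
    # Winner-take-all: the largest z-scored beta becomes the primary identity.
    primary = max(hits, key=lambda name: hits[name])
    # Remaining above-threshold fluorophores are retained as secondary hits,
    # ordered from strongest to weakest, but do not change the primary label.
    secondary = sorted(
        (name for name in hits if name != primary),
        key=lambda name: hits[name],
        reverse=True,
    )
    return primary, secondary


# Hypothetical ROI with two fluorophores above a hypothetical threshold of 3.0.
roi = {"mCherry": 4.2, "tdTomato": 3.1, "EBFP2": 0.4}
primary, secondary = assign_identity(roi, threshold=3.0)
print(primary, secondary)
```

      In this toy case the ROI is stratified only by its primary label, which is why a dual-projecting cell would be counted once in the subtype fractions.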

      (2) Dual-label detection is acknowledged but remains descriptive in vivo: the manuscript explicitly discusses the possibility of dual projection, evaluates dual-fluorophore detection in simulations (including performance under realistic noise/background), and reports in vivo rates of secondary hits. However, these dual-threshold events are not incorporated as co-identities in the main statistical analyses, making it difficult to judge how robust the principal biological conclusions are to the single-label decision rule.

      We thank the reviewer for this important clarification request.

      We agree that dual-projection neurons are biologically plausible and that dual-threshold ROIs were detected in vivo. In this manuscript, however, our primary goal was to establish the feasibility of high-dimensional spectral assignment and projection-resolved stratification, rather than to provide a definitive quantification of projection convergence.

      For this proof-of-principle study, we chose a conservative winner-take-all (WTA) framework for primary behavioral analyses in order to minimize false-positive multi-label assignments under realistic noise and background conditions, as demonstrated in our simulations (Supp. Fig. 5–6). Secondary hits were retained and reported descriptively (Supp. Fig. 7), but not incorporated into the primary statistical comparisons to avoid overinterpretation of potentially ambiguous dual-label calls.

      Importantly, the principal biological conclusions presented in the manuscript are qualitative demonstrations that projection-defined stratification is feasible within a single animal. These conclusions do not rely on projection exclusivity or on precise quantification of dual-projecting fractions.

      We agree that this distinction should be made clearer in the manuscript, and we have revised the text as follows:

      “Although dual-threshold ROIs were detected in vivo, these secondary assignments were not incorporated as co-identities in the primary behavioral analyses. This decision reflects a conservative specificity-first framework designed to minimize false-positive multi-label calls under realistic noise conditions. Accordingly, dual-label rates reported here should be interpreted descriptively. The present study focuses on demonstrating the feasibility of projection-resolved stratification, rather than providing definitive quantification of projection convergence.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “We then stratified these neurons by projection target and examined behaviorally selective activity across cell types. These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Behavioral Analysis)

      (3) Uncertainty is not propagated: False-positive/false-negative rates from simulations and uncertainty from registration/segmentation are not carried forward into quantitative confidence bounds on subtype proportions or behaviour-by-subtype effects.

      We agree that formal propagation of classification and registration uncertainty into subtype proportions and behavioral comparisons would be appropriate in a study primarily focused on precise anatomical quantification. However, the central goal of the present manuscript is methodological and to demonstrate that high-dimensional spectral identity can be reliably linked to miniscope-recorded functional activity within a single animal.

      Our simulations under realistic noise, background, and class imbalance conditions (Supp. Fig. 5–6) show that errors are predominantly false negatives rather than false positives. However, behavioral analyses are presented as qualitative demonstrations of the feasibility of projection-resolved stratification rather than as definitive quantitative anatomical measurements.

      In the revised manuscript, we clarified that 1) subtype proportions and behavioral effects are assignment-dependent estimates, 2) simulation-derived error rates provide guidance for experimental design rather than formal confidence intervals, and 3) future studies centered on precise quantification of projection fractions would benefit from formal uncertainty modeling, as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “Because classification uncertainty was not formally propagated into these analyses, behavior-by-subtype comparisons should be interpreted as qualitative demonstrations of functional stratification rather than precise quantitative estimates.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)
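
      For readers who want to see what a simple form of uncertainty propagation could look like, the sketch below applies the standard Rogan-Gladen correction to an observed subtype fraction and sweeps it over a range of assumed error rates. This is an illustration only and was not part of the analyses described above; the sensitivity and specificity values are hypothetical placeholders, not numbers taken from the simulations in Supp. Fig. 5–6.

```python
from typing import Iterable, List, Tuple


def corrected_fraction(observed: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of the true positive fraction.

    observed    : fraction of ROIs called positive for a fluorophore
    sensitivity : assumed 1 - false-negative rate of the classifier
    specificity : assumed 1 - false-positive rate of the classifier
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed + specificity - 1.0) / denom
    # Clip to the valid range, since sampling noise can push it outside [0, 1].
    return min(1.0, max(0.0, estimate))


def sensitivity_sweep(
    observed: float, sens_values: Iterable[float], spec_values: Iterable[float]
) -> List[Tuple[float, float, float]]:
    """Corrected fraction for every combination of assumed error rates."""
    spec_list = list(spec_values)
    return [
        (se, sp, corrected_fraction(observed, se, sp))
        for se in sens_values
        for sp in spec_list
    ]


# Hypothetical example: 20% of ROIs carry a given primary label.
for se, sp, frac in sensitivity_sweep(0.20, [0.7, 0.8, 0.9], [0.95, 0.99]):
    print(f"sensitivity={se:.2f} specificity={sp:.2f} corrected={frac:.3f}")
```

      When false negatives dominate, as the simulations indicate, the corrected fraction is larger than the observed one, which is consistent with describing the reported fractions as conservative estimates.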

      Reviewer #3 (Public review):

      This manuscript presents Neuroplex, a technically rigorous and carefully validated pipeline that links miniscope calcium imaging in freely behaving animals with high-dimensional fluorophore-based cell-type identification using in vivo multiplexed spectral confocal imaging through the same implanted GRIN lens. The work overcomes a major practical limitation of head-mounted microscopy by enabling the identification of up to nine projection-defined neuronal populations within the same animal, without post-fixation histology. The approach is well motivated and supported by extensive calibration and simulation. While the biological results are primarily illustrative, the methodological contribution is clear and likely to be broadly useful.

      Major comments

      (1) The approach relies on the assumption that fluorophore identity assigned during anesthetized confocal imaging accurately reflects the identity of neurons recorded during prior behavioural sessions. While the use of the same GRIN lens and in vivo co-registration mitigates many concerns, the manuscript would benefit from a more explicit discussion, or empirical demonstration, if available, of the stability of fluorophore assignments across time. Even limited repeat spectral imaging in a subset of animals would strengthen confidence in longitudinal applicability.

      We thank the reviewer for highlighting this important conceptual assumption.

      Fluorophore identity in Neuroplex is genetically encoded via AAVretro delivery and therefore does not depend on transient physiological state. Spectral imaging is performed in vivo through the same GRIN lens and field of view used during behavioral imaging, and co-registration relies on anatomical landmarks. While repeat spectral imaging was not formally performed as a longitudinal experiment, the underlying fluorescent protein expression is stable over weeks, and there is no biological mechanism in this paradigm that would alter fluorophore identity across sessions.

      We revised the manuscript to explicitly state this assumption and clarify why identity stability is expected as follows:

      “…fluorophore signals and reduce unmixing fidelity, leading to an increased false positive rate. Fluorophore identity in this framework is genetically encoded via retrograde AAV delivery and is therefore expected to remain stable across behavioral and spectral imaging sessions. Because both functional and spectral data are acquired in vivo through the same GRIN lens and co-registered using anatomical landmarks, assignment stability is not expected to vary across time unless expression levels change substantially. While repeat spectral imaging was not performed as a formal longitudinal experiment in this study, the stability of fluorescent protein expression supports the assumption that fluorophore identity reflects a persistent cellular attribute.” (Discussion)

      (2) Fluorophore identity is determined using thresholding of linear unmixing coefficients relative to an empirically defined baseline, followed by a second adaptive pass for over-represented fluorophores. While this heuristic is extensively validated via simulations, it remains ad hoc from a statistical perspective. The authors should more explicitly justify this choice and discuss its limitations relative to probabilistic or likelihood-based classifiers, particularly with respect to uncertainty estimation at the single-ROI level.

      We agree that the dual-pass thresholding approach is heuristic rather than fully probabilistic. More formal probabilistic classifiers are possible but would introduce additional modeling assumptions and training requirements beyond the scope of this proof-of-principle study.

      We revised our manuscript to clarify this as follows:

      “The current classification framework relies on linear unmixing followed by empirically defined thresholding rather than full probabilistic inference. This approach provides transparency and practical robustness under realistic noise and background conditions but does not generate single-ROI posterior uncertainty estimates.” (Discussion)

      (3) Identifiability of fluorophores is demonstrated empirically, but the manuscript does not explicitly quantify spectral separability (e.g., similarity metrics between basis spectra or conditioning of the unmixing matrix). A brief analysis of spectral independence or sensitivity of beta estimates to noise would provide mathematical reassurance, especially given the reliance on linear regression in a high-dimensional feature space.

      We agree that spectral separability is conceptually important. In this manuscript, separability is demonstrated empirically through 1) in vitro fingerprint acquisition under identical optical conditions, 2) simulation under background and noise, and 3) successful in vivo classification across regimes. We did not compute formal matrix conditioning metrics, but we agree that the separability rationale should be described more explicitly. We revised our manuscript as follows:

      “While formal conditioning metrics were not explicitly computed, empirical fingerprint acquisition and simulation-based perturbation analyses demonstrate sufficient spectral independence for reliable linear unmixing under the tested regimes.” (Discussion)
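
      One of the similarity metrics the reviewer mentions, pairwise cosine similarity between basis spectra, can be computed in a few lines. The sketch below is an assumption-laden illustration rather than an analysis from the manuscript: the three spectra are invented toy vectors and the fluorophore names are placeholders.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two spectra sampled on the same channels."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("spectra must be non-zero")
    return dot / norm


def pairwise_similarity(
    spectra: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Cosine similarity for every unordered pair of basis spectra."""
    return {
        (m, n): cosine_similarity(spectra[m], spectra[n])
        for m, n in combinations(sorted(spectra), 2)
    }


# Toy basis spectra over five hypothetical emission channels.
basis = {
    "fluor_A": [0.9, 0.4, 0.1, 0.0, 0.0],
    "fluor_B": [0.1, 0.8, 0.5, 0.1, 0.0],
    "fluor_C": [0.0, 0.0, 0.2, 0.7, 0.9],
}
for pair, sim in pairwise_similarity(basis).items():
    print(pair, round(sim, 3))
```

      Values close to 1 flag spectrally adjacent pairs whose beta estimates are expected to be most sensitive to noise; values close to 0 indicate well-separated reporters.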

      (4) The spectral unmixing treats CNMF-derived ROIs as fixed supports. I wonder whether ROI boundaries, neuropil contamination, and partial overlap can introduce structured uncertainty that could bias spectral estimates. If so, the authors should acknowledge this dependency more explicitly and discuss how ROI quality or overlap might influence false negatives or false positives, particularly in densely labelled regions.

      We agree that ROI definition influences spectral extraction. Spectral fingerprints are derived by averaging all pixels within the ROI mask, and therefore neuropil contamination, partial ROI overlap, and dense labeling could influence beta estimates. In the revised manuscript, we have acknowledged these dependencies more explicitly:

      “Spectral unmixing operates on CNMF-derived ROI masks treated as fixed supports. Accordingly, segmentation quality, neuropil contamination, and partial overlap between neighboring cells can influence extracted spectral fingerprints and may contribute to false negatives or secondary assignments, particularly in densely labeled regions. These structured sources of uncertainty are expected to have the greatest impact under regimes of extreme class imbalance, low fluorophore brightness, strong neuropil signal, or pairing of spectrally overlapping reporters. Use of refined segmentation strategies or nuclear-localized reporters could reduce such structured uncertainty in future implementations.” (Discussion)

      (5) The manuscript reports meaningful rates of secondary fluorophore detection, but also nontrivial false-positive rates for secondary labels under realistic conditions. The authors appropriately caution against over-interpretation, but the Discussion should more clearly delineate when dual-label assignments are likely to be biologically interpretable versus methodologically ambiguous, and how experimental design (e.g., fluorophore pairing) should be optimized accordingly.

      We agree and have delineated interpretability boundaries explicitly:

      “Dual-label assignments are most reliable when fluorophores are spectrally well separated and when signal-to-noise ratios are high. In contrast, spectrally adjacent fluorophore pairs or densely labeled regimes increase ambiguity and false-positive risk. Experimental design should therefore prioritize pairing spectrally distant fluorophores when projection convergence is of primary interest.” (Discussion)

      (6) I suspect that Neuroplex will be most effective in certain regimes (moderate convergence, bright and spectrally distinct fluorophores) and less reliable in others. A more explicit discussion of best practices, anticipated failure modes, and experimental scenarios where the method may be inappropriate would increase the practical value of the paper for adopters.

      “More broadly, Neuroplex is expected to perform most robustly in regimes characterized by moderate projection convergence, balanced fluorophore representation, bright and spectrally distinct reporters, and adequate signal-to-noise ratio. Imaging directly within a projection target that has received dense retrograde labeling may introduce substantial class imbalance, which simulations predict will reduce detection sensitivity for the dominant fluorophore. In such cases, conservative assignment strategies, reduced spectral complexity, or refinement of ROI definition may improve interpretability. Careful fluorophore selection and pilot validation under intended imaging conditions are therefore recommended prior to large-scale application. Future implementations incorporating nuclear-localized reporters may further reduce segmentation-dependent ambiguity by constraining spectral signals to somatic compartments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should address a few points that are not clear.

      (1) At the end of the Results, the authors assess their approach using only four fluorophores and conclude that Neuroplex works "even" under reduced complexity. There is something I am missing. In my mind, lower complexity should be easier and should work better. As a researcher, I would first assess a four-fluorophores scenario and then step up with complexity, but the authors did the opposite. Also, I think that the present Supplementary Figure 9 should be in the main text; I don't understand why the authors decided to relegate a clear result to the bottom of everything. The authors should give some explanations.

      We agree that reduced spectral complexity should, in principle, improve separability and classification performance. Our original presentation order was intended to first demonstrate feasibility under the most challenging condition (nine fluorophores plus GCaMP), thereby establishing maximal multiplexing capacity. The reduced-complexity experiment was included to demonstrate scalability and generalizability under more typical experimental regimes. However, we agree that this rationale was not sufficiently clear and that the reduced-complexity results merit presentation in the main text.

      Accordingly:

      We have moved former Supplementary Figure 9 into the main Results (Fig. 6).

      We have clarified explicitly why the nine-fluorophore condition was presented first as follows:

      “To evaluate the performance of Neuroplex under more typical experimental regimes with reduced complexity, we applied the pipeline to two GCaMP transgenic animals injected with a subset of four fluorophores.”

      (2) The question of relative expression is crucial. Among the infected regions, there is the contralateral mPFC and I imagine that if they image there, the contribution of the expressed protein might dominate all other components, preventing detection of other fluorophores, including GCaMP. But is it the case, or would it be possible to detect projecting neurons in that region? I would be surprised that the authors never tried it; this test would simply imply mounting the GRIN lens on the other hemisphere.

      This is an important conceptual point.

      Our simulations (Supp. Fig. 5) explicitly model over-representation of a single fluorophore. These results show that heavy class imbalance primarily increases false negatives (due to baseline normalization) rather than false positives.

      In the revised manuscript, we discuss this limitation more explicitly.

      “Relative fluorophore representation within the imaged field of view influences classification robustness. As demonstrated in our simulations of class imbalance (Supp. Fig. 5g–h), extreme over-representation of a single fluorophore primarily increases false-negative rates due to baseline normalization effects. In the present study, we intentionally avoided imaging directly within heavily infected projection targets (e.g., contralateral mPFC) in order to maintain moderate fluorophore representation across ROIs. Imaging in a densely labeled region would represent a more challenging regime, and we would expect reduced sensitivity for the dominant fluorophore under such conditions.” (Discussion)

      (3) The possibility to utilise Neuroplex goes beyond the type of experiment presented as proof-of-concept in this technical paper. In the Discussion, the authors mention genetically defined subtypes and activity-tagged neurons. But, if one changes the pipeline, can it be used by expressing GECIs with different spectra, or GECIs and genetically-encoded voltage indicators (GEVIs)? I would be very interested in knowing what the authors think about this putative "shortcut".

      We thank the reviewer for this forward-looking and insightful question.

      In principle, the Neuroplex framework could be extended to incorporate spectrally distinct genetically encoded functional indicators, including multi-color GECIs or combinations of GECIs and GEVIs. However, it is important to distinguish this from the identity-assignment strategy implemented in the present study.

      Simultaneous multi-color functional imaging under a head-mounted miniscope is optically more demanding than assigning cell identity from single-color functional recordings followed by high-dimensional spectral readout. Multi-color GECI or GEVI imaging requires real-time excitation and emission separation during dynamic recording, increases optical complexity, and is particularly sensitive to chromatic aberration, photon efficiency, and signal-to-noise constraints imposed by GRIN lenses.

      In contrast, Neuroplex decouples functional acquisition from spectral identity determination. Functional activity is recorded using a single optimized channel, while spectral separation is performed separately under controlled confocal conditions with multiplexed excitation and emission sampling. This design substantially reduces optical burden during behavioral imaging.

      While integration of multiple functional reporters is conceptually feasible within this framework, successful implementation would require careful validation of brightness, spectral separability, and temporal stability for each reporter combination.

      Reviewer #2 (Recommendations for the authors):

      (1) Implement a principled multi-label calling mode for cells with >1 above-threshold fluorophore (e.g., per-fluorophore FDR control or Bayesian posteriors). Report cell-wise weights and re-run key results three ways: single-label, hard multi-label, and soft (probabilistic) assignments; state explicitly how conclusions change.

      We appreciate this suggestion and agree that multi-label or probabilistic calling frameworks are well motivated, particularly for studies in which projection convergence is the central biological question. In the current manuscript, however, our goal is to establish a practically deployable proof-of-principle pipeline for linking miniscope functional recordings to a high-dimensional spectral-identity readout. Consistent with this scope, we used a conservative winner-take-all (WTA) strategy for primary analyses to prioritize specificity under realistic noise and background conditions, and we treated multi-hit events descriptively. Importantly, the qualitative conclusions regarding projection-resolved functional stratification are unchanged when secondary-hit distributions are examined.

      In the revised manuscript, we explicitly stated that: (i) single-label assignment is a conservative analysis choice rather than a biological claim of exclusivity, and (ii) multi-label or probabilistic calling is a natural extension for future work, as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)
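
      The per-fluorophore FDR control suggested by the reviewer could, for example, take the form of a Benjamini-Hochberg procedure applied to one fluorophore's p-values across ROIs. The sketch below is hypothetical and was not implemented in the manuscript: it assumes that a p-value per ROI is available (for instance from comparing each beta against a null distribution), which the current thresholding approach does not provide.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return, per ROI, whether the fluorophore is called at FDR level q.

    p_values holds one hypothetical p-value per ROI for a single fluorophore.
    """
    m = len(p_values)
    # Rank ROIs from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Call every ROI ranked at or below the cutoff.
    called = [False] * m
    for idx in order[:cutoff_rank]:
        called[idx] = True
    return called


# Hypothetical p-values for four ROIs and one fluorophore.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], q=0.05))
```

      Running the procedure separately for each fluorophore would allow an ROI to receive more than one label (hard multi-label calling) while keeping the expected fraction of false calls per fluorophore bounded.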

      (2) Add ground truth for dual projectors in a subset (paired orthogonal tracers or staged injections) and provide a confusion matrix including dual-positives; use this to calibrate thresholds/priors.

      We agree that ground truth validation of dual projectors using orthogonal tracers or staged injections would be valuable, particularly for calibrating priors and enabling confusion-matrix-based evaluation. However, these experiments require additional cohorts and experimental design beyond the scope of the current proof-of-principle technical manuscript. Our goal here is to demonstrate the feasibility of multiplexed identification and projection-resolved stratification within a single animal, not to provide definitive anatomical quantification of collateralization.

      We have revised the manuscript to clearly state that dual-label in vivo observations are descriptive and that studies aimed at quantitative convergence mapping should incorporate orthogonal ground truth validation.

      “Accurate quantification of projection convergence would benefit from orthogonal ground-truth validation (e.g., paired tracers or staged injections) to establish confusion matrices for dual positives and to calibrate thresholds or priors.”
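
      A confusion matrix that treats dual positives as their own class can be tabulated as sketched below. The labels ("A", "B", "A+B", "none") and the example data are invented for illustration; no such ground-truth dataset exists in the present study.

```python
from collections import Counter
from typing import Counter as CounterType, Sequence, Tuple


def confusion_matrix(
    truth: Sequence[str], calls: Sequence[str]
) -> CounterType[Tuple[str, str]]:
    """Count (ground-truth label, assigned label) pairs across ROIs."""
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    return Counter(zip(truth, calls))


def recall(cm: CounterType[Tuple[str, str]], label: str) -> float:
    """Fraction of ROIs with a given true label that were called correctly."""
    total = sum(n for (t, _), n in cm.items() if t == label)
    return cm[(label, label)] / total if total else float("nan")


# Hypothetical ground truth from orthogonal tracers versus spectral calls.
truth = ["A", "A+B", "A+B", "B", "none"]
calls = ["A", "A", "A+B", "B", "none"]
cm = confusion_matrix(truth, calls)
print(cm[("A+B", "A")], recall(cm, "A+B"))
```

      The off-diagonal entry ("A+B", "A") counts dual projectors collapsed to a single label, which is the quantity needed to calibrate thresholds or priors for dual-label calling.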

      (3) Propagate uncertainty from simulations and registration/segmentation to subtype fractions and behavior effects (error bars or sensitivity analyses).

      We agree that formal uncertainty propagation is appropriate for studies focused on precisely quantifying subtype proportions or effect sizes. In this manuscript, subtype fractions and behavioral comparisons are presented primarily as demonstrations of the feasibility of projection-resolved functional stratification, rather than definitive anatomical measurements. Simulation analyses are included to characterize expected performance under defined noise and background regimes, but we did not propagate these uncertainties into downstream confidence bounds in this proof-of-principle work.

      We have revised the manuscript to clarify this explicitly as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (4) Mitigate sources of spurious multi-hits (neuropil handling, ROI mask erosion, nuclear-localized reporters, spectral basis choices) and quantify their impact on dual-label recovery.

      We agree that neuropil contamination, ROI boundary choices, and spectral basis selection can influence multi-hit rates. In the current manuscript, we already implement background subtraction and evaluate multi-hit behavior through simulations under realistic background and noise regimes. Quantitative evaluation of additional mitigation strategies (e.g., ROI erosion comparisons) would require new analyses beyond the current scope.

      We have revised the Discussion to include concrete best-practice recommendations (e.g., fluorophore pairing, conservative interpretation of multi-hits, and potential use of nuclear-localized reporters).

      “Multi-hit events can reflect true biological collateralization but may also arise from structured sources of ambiguity such as neuropil contamination, partial ROI overlap, or imperfect ROI boundaries. These factors may bias spectral estimates and contribute to secondary assignments, particularly in densely labeled regions. Practical mitigation strategies include conservative assignment rules, improved segmentation, and use of nuclear-localized reporters to reduce neuropil contribution.”

      (5) Clarify claims in the main text/figures wherever exclusivity is implied; label which panels use single-label vs multi-label/soft assignments.

      We agree and thank the reviewer for emphasizing clarity. We did not intend to imply projection exclusivity. We have revised the manuscript text and figure legends to explicitly state where single-label (winner-take-all) assignment is used, and to avoid language that could be read as claiming exclusive projection identity as follows:

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a conservative winner-take-all rule. This assignment reflects the strongest spectral contribution and does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.”
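
      As a minimal sketch of the winner-take-all rule described in the quoted passage (not the analysis code used in the study), the assignment could look like the following. The function name, the abundance values, and the threshold of 0.4 are all hypothetical and chosen only for illustration.

```python
def assign_primary_identity(abundances, threshold):
    """Winner-take-all labeling of ROIs from unmixed fluorophore abundances.

    abundances: list of per-ROI lists, one value per fluorophore (hypothetical input).
    threshold:  hypothetical minimum abundance for a fluorophore to count as present.
    Returns (labels, multi_hit). A label is None when no fluorophore reaches the
    threshold, otherwise the index of the strongest fluorophore. multi_hit flags
    ROIs in which more than one fluorophore reached the threshold.
    """
    labels, multi_hit = [], []
    for roi in abundances:
        n_above = sum(value >= threshold for value in roi)
        if n_above == 0:
            labels.append(None)
        else:
            # The strongest contribution wins, even if others also pass threshold.
            labels.append(max(range(len(roi)), key=lambda i: roi[i]))
        multi_hit.append(n_above > 1)
    return labels, multi_hit


# Hypothetical abundances for three ROIs and three fluorophores.
example = [
    [0.9, 0.1, 0.0],  # single clear identity
    [0.6, 0.5, 0.1],  # two fluorophores above threshold: labeled by the stronger one
    [0.1, 0.2, 0.1],  # nothing above threshold: unlabeled
]
labels, multi_hit = assign_primary_identity(example, threshold=0.4)
```

      Keeping the multi-hit flag alongside the single label makes it explicit that a winner-take-all label does not assert projection exclusivity.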

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study addresses a critical and timely question regarding the role of a subpopulation of cortical interneurons (Chrna2-expressing Martinotti cells) in motor learning and cortical dynamics. However, while some of the behavior and imaging data are impressive, the small sample sizes and incomplete behavioral and activity analyses make interpretation difficult; therefore, they are insufficient to support the central conclusions. The study may be of interest to neuroscientists studying cortical neural circuits, motor learning, and motor control.

      We thank the reviewers and the editors for the insightful comments. We are pleased to report that the raised issues with the manuscript can be addressed by improving clarity in our writing of specific sections and by providing additional analysis. Specifically, it was not clear in the manuscript text that although we show illustrative data with a lower number of animals, our conclusions are supported by data with a larger and sufficient sample size. Also, the description of our control experiments has been improved to clarify our proper treatment controls. We therefore clarify below that our study presents compelling and sufficient evidence to support our conclusions. We have responded to all the comments, explaining how each concern has been addressed. All line and figure numbers mentioned here refer to the numbering of the reviewed manuscript version. All references are cited as DOIs.

      Reviewer #1 (Public review):

      There are many major issues with the study. The findings across experiments are inconsistent, and it is unclear how the authors performed their analyses or why specific time points and comparisons were chosen. The study requires major re-analysis and additional experiments to substantiate its conclusions.

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions.

      (1a) Behavior task - the pellet-reaching task is a well-established paradigm in the motor learning field. Why did the authors choose to quantify performance using "success pellets per minute" instead of the more conventional "success rate" (see PMID 19946267, 31901303, 34437845, 24805237)? It is also confusing that the authors describe sessions 1-5 as being performed on a spoon, while from session 6 onward, the pellets are presented on a plate. However, in lines 710-713, the authors define session 1 as "naive," session 2 as "learning," session 5 as "training," and "retraining" as a condition in which a more challenging pellet presentation was introduced. Does "naive session 1" refer to the first spoon session or to session 6 (when the food is presented on a plate)? The same ambiguity applies to "learning session 2," "training session 5," and so on. Furthermore, what criteria did the authors use to designate specific sessions as "learning" versus "training"? Are these definitions based on behavioral performance thresholds or some biological mechanisms? Clarifying these distinctions is essential for interpreting the behavioral results.

      We agree that success rate is a more conventional measure than the number of successful prehensions per minute. We have changed all behavior quantifications to success rate. Note that all behavioral conclusions drawn before are still valid under the new quantification (see Figures 1, 4, and 5). Importantly, the terms “learning,” “training,” and “retraining” were defined based on task structure and prior literature on motor learning stages rather than predetermined behavioral performance thresholds. These labels reflect progression through the task design (initial acquisition, continued practice under stable conditions, and adaptation to altered task demands), not biologically distinct or threshold-defined phases. We have revised the Methods section to make these definitions and transitions explicit to avoid ambiguity in interpreting the behavioral results.

      (1b) Judging from Figures 1F and 4B, even in WT mice, it is not convincing that the animals have actually learned the task. In all figures, the mice generally achieve 10-20 pellets per minute across sessions. The only sessions showing slightly higher performance are session 5 in Figure 1F ("train") and sessions 12 and 13 in Figure 4B ("CLZ"). In the classical pellet-reaching task, animals are typically trained for 10-12 sessions (approximately 60 trials per session, one session per day), and a clear performance improvement is observed over time. The authors should therefore present performance data for each individual session to determine whether there is any consistent improvement across days. As currently shown, performance appears largely unchanged across sessions, raising doubts about whether motor learning actually occurred.

      As described in the Methods section “Single pellet prehension task”, the elevated plate slot for pellet delivery in our setup is at a challenging position, outside the slit and 2 cm to the right, which forces the mice to use the left paw. Mice therefore need to be trained at gradually harder positions, using a spoon to deliver the pellet instead of placing it directly at the plate slot. Because task difficulty increases gradually, the success rate curve remains flat, while the total number of attempts and the number of successful prehensions per minute increase (Figure 1 F-H). We therefore argue that motor learning did occur, with a relatively constant success rate on a gradually harder task. Further, the success rate and number of successful prehensions of our mice are within the levels previously reported for trained mice (10.3791/51238). We added the precise plate slot position to the Methods section to make clear why a delivery method of gradually increasing difficulty is needed.
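
      The difference between the two performance measures can be illustrated with a short sketch. The counts below are purely hypothetical and are not data from this study; they only show how success rate can stay flat while successful prehensions per minute increase.

```python
def session_metrics(successes, attempts, minutes):
    """Return success rate and per-minute counts for one session."""
    if attempts <= 0 or minutes <= 0:
        raise ValueError("attempts and minutes must be positive")
    return {
        "success_rate": successes / attempts,
        "successes_per_min": successes / minutes,
        "attempts_per_min": attempts / minutes,
    }


# Hypothetical early and late sessions of equal duration (10 minutes each).
early = session_metrics(successes=10, attempts=40, minutes=10)
late = session_metrics(successes=20, attempts=80, minutes=10)
```

      In this hypothetical case the success rate is identical in both sessions, while attempts and successes per minute double.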

      (1c) The authors also appear to neglect existing literature on the role of SST-INs in motor learning and local circuit plasticity (e.g., PMID 26098758, 36099920). Although the current study focuses on a specific subpopulation of SST-INs, the results reported here are entirely opposite to those of previous studies. The authors should, at a minimum, acknowledge these discrepancies and discuss potential reasons for the differing outcomes in the Discussion section.

      We thank the reviewer for pointing this out. This was not neglect, but a careful selection of previous literature that can be fairly compared with our findings. It is becoming increasingly clear, with mounting evidence from modern transcriptomic and connectomic studies, that the canonical “three-cardinal” interneuron populations (SST⁺, PV⁺, VIP⁺) represent oversimplified groupings that mask considerable heterogeneity. For example, in a comprehensive single-cell RNA-sequencing (scRNA-seq) study covering ~1.3 million cells from mouse cortex and hippocampus, the authors identified dozens of discrete GABAergic subtypes beyond the classical marker-defined classes, revealing continuous and graded variation in molecular identity across cortical and hippocampal regions (10.1016/j.cell.2021.04.021). Moreover, a recent study focusing on SST-expressing interneurons demonstrated that even within the SST class there are multiple subtypes with distinct laminar distributions, axonal projection patterns, and circuit connectivity: for instance, two different Martinotti subtypes vs. a non-Martinotti SST subtype targeting different pyramidal neuron types and dendritic compartments (10.1016/j.neuron.2023.05.032). Finally, developmental single-cell transcriptomics shows that interneuron diversity is already apparent at early postmitotic stages, indicating that these subtypes are pre-specified rather than being mere activity-dependent states (10.1038/s41467-018-07458-1). These findings argue strongly that the traditional SST⁺ / PV⁺ / VIP⁺ classification, while useful as a coarse heuristic, fails to capture the rich diversity in molecular, morphological, and functional phenotypes that likely underlie distinct roles in circuit computation and behavior.

      The consequence is that studies using any of these three markers must be interpreted cautiously, since in reality several quite different neuronal populations are studied at once, especially if no effort was made to tease out which of the participating populations (inside the “cardinal” population) contribute to the observed effects. Most likely, the reported results are based on a mixed population and, in the worst-case scenario, on populations with opposite effects. In any case, we have now included the role of SST-INs in motor learning and M1 circuitry in the Discussion section. We also respectfully disagree that our findings are the opposite of previous SST-IN studies. We show that increasing Ma2 excitability improved execution of an already learned movement, while 10.1038/nn.4049 showed that both activating (which is different from increasing excitability) and inhibiting SST-INs impaired the learning of a stereotyped movement. Similarly, 10.1016/j.neuron.2022.08.018 showed that increasing SST-IN excitability impairs motor learning, not execution of a previously learned movement. While we found that increasing excitability of Ma2 cells did not affect motor learning, note that Ma2 cells are a subset of Martinotti cells with homogeneous electrophysiological and morphological properties (10.1371/journal.pbio.2001392), and Martinotti cells themselves are a subset of SST+ cells (10.1016/j.neuron.2023.05.032). The discussion has been updated to include this reasoning.

      (2a) Calcium imaging - The methodology for quantifying fluorescence changes is confusing and insufficiently described. The use of absolute dF values ("detrended by baseline subtraction," lines 565-567) for analyses that compare activity across cells and animals (e.g., Figure 1H) is highly unconventional and problematic. Calcium imaging is typically reported as dF/F0 or z-scores to account for large variations in baseline fluorescence (F0) due to differences in GCaMP expression, cell size, and imaging quality. Absolute dF values are uninterpretable without reference to baseline intensity - for example, a dF of 5 corresponds to a 100% change in a dim cell (F0 = 5) but only a 1% change in a bright cell (F0 = 500). This issue could confound all subsequent population-level analyses (e.g., mean or median activity) and across-group comparisons. Moreover, while some figures indicate that normalization was performed, the Methods section lacks any detailed description of how this normalization was implemented. The critical parameters used to define the baseline are also omitted. The authors should reprocess the imaging data using a standardized dF/F0 or z-score approach, explicitly define the baseline calculation procedure, and revise all related figures and statistical analyses accordingly.
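
      The arithmetic in this comment can be reproduced with a minimal sketch. The function name is illustrative only; the values are the ones given in the comment above.

```python
def relative_change(delta_f, f0):
    """Return dF/F0 expressed as a percentage of baseline fluorescence F0."""
    if f0 <= 0:
        raise ValueError("baseline fluorescence F0 must be positive")
    return 100.0 * delta_f / f0


# Same absolute change (dF = 5) in a dim cell and in a bright cell.
dim_cell = relative_change(delta_f=5.0, f0=5.0)       # 100% change
bright_cell = relative_change(delta_f=5.0, f0=500.0)  # 1% change
```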

      The calcium imaging used here is 1-photon microendoscopic video data. To our knowledge, it is not possible to extract the true cell baseline over time from 1-photon data, since the background component includes signals from multiple sources, and usually has fluctuations larger than the neural signal itself. We agree that absolute dF values cannot be compared across cells, and that is not what we report here. The CNMF-E algorithm outputs the temporal activity of each neuron with the background component already removed (10.7554/eLife.28728) and therefore the baseline subtraction used in our study is already standardized (10.7554/eLife.38173). Note that although it is common in the literature to record 1-photon data and perform similar preprocessing (some form of baseline subtraction and/or normalization by noise std), referring to the resulting trace as dF/F, that is not entirely correct, since true F0 extraction is not possible. We thus chose to refer to the resulting preprocessed traces as what they actually are - dF detrended (raw trace with estimated background components removed). However, we agree that a better description of the process would be helpful in our manuscript, and that the nomenclature might be confusing to readers. We therefore expanded the methods section to better explain that we will now refer to F0 as the background component (and refer to our resulting traces as dF/F) and explain how it was determined. We also updated the example traces in Figure 1E to now show the raw traces, the estimated background components and the detrended traces.

      (2b) Figure 1G - It is unclear why neural activity during successful trials is already lower one second before movement onset. Full traces with longer duration before and after movement onset should also be shown. Additionally, only data from "session 2 (learning)" and a single neuron are presented. The authors should present data across all sessions and multiple neurons to determine whether this observation is consistent and whether it depends on the stage of learning.

      We agree that it would be beneficial to show longer traces as an example of prehension-related activity, so we expanded Figure 1I to show a longer trace for a single neuron. We added to Supplemental Figure 2 plots showing longer traces from all sessions including all neurons for both genotypes.

      (2c) Figure 1H - The authors report that chemogenetic activation of Chrna2 cells induces differential changes in PyrN activity between successful and failed trials. However, one would expect that activating all Chrna2 cells would strongly suppress PyrN activity rather than amplifying the activity differences between trials. The authors should clarify the mechanism by which Chrna2 cell activation could exaggerate the divergence in PyrN responses between successful and failed trials. Perhaps, performing calcium imaging of Chrna2 cells themselves during successful versus failed trials would provide insight into their endogenous activity patterns and help interpret how their activation influences PyrN activity during successful and failed trials.

      The reviewer is correct to assume that increasing excitability of Ma2 cells would suppress PC activity. As shown in Supplemental Figure 2I, that is exactly what we observe when considering only non-prehension related activity. Thus, it is very interesting that the opposite effect is seen for prehension-related activity. This finding also aligns with our results from the assembly analysis, which show that assembly activity is decreased within the prehension window compared to outside the prehension window. Unfortunately, imaging Ma2 cells would only add information on their influence on PCs if we imaged both populations simultaneously, which requires equipment and reagents we do not currently have. However, the endogenous activity patterns of Ma2 cells and the direct connectivity between Ma2 and pyramidal cells were previously investigated in detail (10.1371/journal.pbio.2001392). We therefore expanded the discussion to better explain that the differential changes in PC activity when increasing Ma2 excitability could be due to increased PC synchronization, since a single Ma2 cell connects to several PCs and, upon release from inhibition, all connected PCs fire synchronously.

      (2d) Figure 1H - Also, in general, the Cre+ (red) data points appear consistently higher in activity than the Cre- (black) points. This is counterintuitive, as activating Chrna2 cells should enhance inhibition and thereby reduce PyrN activity. The authors should clarify how Cre+ animals exhibit higher overall PyrN activity under a manipulation expected to suppress it. This discrepancy raises concerns about the interpretation of the chemogenetic activation effects and the underlying circuit logic.

      As explained above, increasing Ma2 excitability indeed decreased non-prehension related PC activity, and the proposed mechanism has been added to the discussion section. We also made clearer in the results section that we are referring to prehension-related PC activity, and emphasize that overall non-prehension related PC activity is decreased.

      (3) The statistical comparisons throughout the manuscript are confusing. In many cases, the authors appear to perform multiple comparisons only among the N, L, T, and R conditions within the WT group. However, the central goal of this study should be to assess differences between the WT and hM3D groups. In fact, it is unclear why the authors only provide p-values for some comparisons but not for the majority of the groups.

      We agree that a clearer description of the statistical analysis is warranted. We expanded the statistical analysis methods section to clarify, among other things, that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that only significant p-values are reported in the figures. The absence of a p-value for a comparison therefore means that the comparison is not significant.
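
      The reporting convention described above can be sketched as follows. This response does not state which correction method was used, so the Holm-Bonferroni procedure here is an assumption made purely for illustration, and the raw p-values and the 0.05 cutoff are hypothetical.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for all pairwise comparisons among three conditions.
raw = {"N vs L": 0.004, "N vs T": 0.030, "L vs T": 0.200}
adjusted = holm_adjust(list(raw.values()))

# Only comparisons that remain significant after correction would be drawn on a figure.
reported = {pair: p for pair, p in zip(raw, adjusted) if p < 0.05}
```

      In this hypothetical case, the comparison with a raw p-value of 0.030 is tested but not reported, because its adjusted value exceeds 0.05.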

      (4a) Figure 4 - It is hard to understand why the authors introduce LFP experiments here, and the results are difficult to interpret in isolation. The authors should consider combining LFP recordings with calcium imaging (as in Figure 1) or, alternatively, repeating calcium imaging throughout the entire re-training period. This would provide a clearer link between circuit activity and behavior and strengthen the conclusions regarding Chrna2 cell function during re-training.

      Unfortunately, it is not possible in our setup to perform calcium imaging and LFP recordings simultaneously, since the implants needed for the miniscope occupy the entire space above the animal’s cranium. Calcium imaging during the execution of learned movements is also impractical. If the animals were implanted before the training phase, the signal would likely be too degraded for recordings after the training sessions, since miniscope signal quality decreases over time and over successive miniscope attachments. If the animals were implanted between the training and retraining phases (as in the LFP group), the gap between training and retraining would be even larger, at least 28 days (as opposed to 16 days for the LFP group), which would affect performance in the task. Therefore, LFP recordings provide an understanding of the higher-level changes in neural activity when excitation is increased in Ma2 cells during the execution of learned movements. We respectfully disagree that the results from the LFP group cannot be interpreted in isolation, since we found that mice with increased excitability of Ma2 cells display increased low theta and gamma power during the prehension movement. As discussed in the manuscript, the increased high gamma band power when Ma2 cells are overexcitable, particularly for successful trials in the planning phase, suggests that Ma2 cells may have a role in influencing theta and gamma oscillations during motor performance (lines 1348-1355).

      (4b) It is unclear why CLZ has no apparent effect in session 11, yet induces a large performance increase in sessions 12 and 13. Even then, the performance in sessions 12 and 13 (30 successful pellets) is roughly comparable to Session 5 in Figure 1F. Given this, it is questionable whether the authors can conclude that Chrna2 cell activation truly facilitates previously acquired motor skills.

      We understand that a source of confusion for the behavioral data in the LFP group was the absence of data from sessions 1-7, together with the missing explanation about the task changing from spoon to plate (as explained in the answers to comments 1a and 1b). Since the animals retrieve pellets from the spoon in session 5 (easier) and from the plate in later sessions (harder), the fact that animals achieved the same performance on the plate as in the last spoon session indicates that they relearned the movement. To further clarify the training progression, we added the full set of sessions (1-13) to Supplemental Figure 7, indicating the spoon-to-plate switch after session 5 and the 16-day gap between sessions 7 and 8 (due to the viral injection and electrode implantation surgeries).

      (5) Figure 5 - The authors report decreased performance in the pasta-handling task (presumably representing a newly learned skill) but observe no difference in the pellet-reaching task (presumably an already acquired skill). This appears to contradict the authors’ main claim that Chrna2 cell activation facilitates previously acquired motor skills.

      We respectfully disagree that the results for the pasta-handling task conflict with the finding that increasing Ma2 excitability facilitates previously acquired movements. The pasta-handling task specifically measures forepaw dexterity (as outlined in lines 442-444), and therefore assesses forelimb function unrelated to learning. Mice perform a set of stereotyped movements to manipulate the pasta, so no learning is required (note that animals were habituated to the arena, followed by a single test session, with no training sessions). We do specifically mention in the results section that "we used the pasta handling task to assess forepaw dexterity that does not require learning" (lines 1137-1139). Our findings support our reported conclusion that "Ma2 cells may have a role in orchestrating precise forelimb movements that do not require previous specific training" (lines 1154-1156).

      (6) Supplementary Figure 1 - The c-Fos staining appears unusually clean. Previous studies have shown that even in home-cage mice, there are substantial numbers of c-Fos+ cells in M1 under basal conditions (PMID 31901303, 31901303). Additionally, the authors should present Chrna2 cell labeling and c-Fos staining in separate channels. As currently shown, it is difficult to determine whether the c-Fos+ cells are truly Chrna2+ cells.

      Our c-Fos staining works well, as we have refined this method across several of our projects. Unfortunately, we could not check the references mentioned in the comment, since the PMID points to a study that does not mention c-Fos (maybe an incorrect PMID code?). However, we found our images to have c-Fos levels in controls similar to those in other studies (for example 10.3389/fnana.2014.00013 Figure 1A and 10.1109/TBME.2024.3401136 Supplemental Figure 2C). Thus, we do find background c-Fos activity in both Cre+ and control mice, but the c-Fos stain appears clean because of the strong up-regulation and fluorescent signal in exogenously activated hM3Dq+ cells. We also noticed that the manuscript was missing a methods section for the c-Fos experiments, so we added a section detailing the hM3Dq activation validation (lines 487-498). Further, the figure now displays separate channels for hM3Dq+ cells (magenta) and c-Fos (cyan) for better clarity.

      (7) Overall, the authors selectively report statistical comparisons only for findings that support their claims, while most other potentially informative comparisons are omitted. Complete and transparent reporting is necessary for proper interpretation of the data.

      As explained above (comment 3), we expanded the statistical description in the methods to explain that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that omitted comparisons are non-significant.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure legends - The authors should provide more detailed information in the figure legends, such as N values. It is also not explained what the bold bars, as well as the highest and lowest bars, represent. Clear labeling is essential for proper interpretation of the data.

      We revised all figure legends to add n-numbers for all quantification plots, and expanded the Statistical analysis methods section to explain the labeling of all quantifications.

      (2) Presentation of plots - The authors need to improve the clarity and completeness of their figure presentations. For example:

      (a) In Figure 1F, it is unclear whether the results were obtained under chemogenetic activation, as this information is missing from both the figure and the legend. Currently, it could be a comparison of Cre+ mice with Cre- mice without any manipulations.

      (b) In Figure 1H, p-values are reported, but it is not specified which groups are being compared. As mentioned above, why are p-values only given to some comparisons? Does that mean the others are not significant?

      (c) In Figure 1D, a scale bar should be provided.

      (d) In Figure 1E, the y-axis (fluorescence) scale should be clearly indicated.

      We thank the reviewer for the attention to the figure details. We added the missing scale bars for Figures 1D-E. We also clarified in the results section that all miniscope recordings were performed under clozapine treatment. As answered above (comments 3 and 7), we expanded the methods section to state that although all comparisons were made and appropriately corrected for multiple comparisons, only significant comparisons were reported. As for the groups being compared, every significance bar clearly connects two groups, which are the ones being compared. We also expanded the Statistical Analysis section to state that “Significance bars without ticks represent pairwise comparisons, while significance bars with downward ticks represent an effect.”

      Reviewer #2 (Public review):

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions. Core findings of this paper, such as the lack of effect of Ma2 cell activation on motor learning, as well as the altered neuronal activity, rely on a sample size of n=3 mice per condition, which is likely underpowered to detect differences in behavior and contributes to the somewhat disconnected results on calcium activity, activity timing, and neuronal assembly activity.

      We understand that the source of confusion is the number of mice used for calcium imaging versus the number of mice used to assess the effect of increased Ma2 excitability on motor learning. The core finding that increased Ma2 excitability did not alter motor learning is supported by the data shown previously in Supplemental Figure 5 (now Figure 1F-H), with n=6 Cre+ and n=7 controls, which has enough statistical power to detect the effect of training session (F(3,33) = 9.254, power = 0.997) and should have enough power to detect the effect of group (estimated power of 0.835 for F(1,11)). The behavioral performance of the miniscope-recorded mice was shown in the previous version for transparency; however, no conclusion was drawn based on that data. To improve clarity, we now present data from the previous Supplemental Figure 5 as Figures 1F–H. This dataset clearly demonstrates that increased excitability of Ma2 cells did not affect motor learning. In addition, note that all quantification and conclusions drawn about neuronal activity are based on robust sample sizes: 1070 cells for controls and 403 for Chrna2-Cre+, or 70 assemblies for controls and 48 for Chrna2-Cre+. These sample sizes ensure sufficient statistical power, as demonstrated by the multiple significant effects and pairwise differences reported in our study. We reiterate that no underpowered tests were conducted in this study, and that no conclusions about behavioral outcomes were drawn from the n = 3 control and n = 3 Chrna2-Cre+ mice.

      More comprehensive analyses and data presentation are also needed to substantiate the results. For example, examining calcium activity and behavioral performance on a trial-by-trial basis could clarify whether closely spaced reaching attempts influence baseline signals and skew interpretation.

      We agree, and we performed a trial-by-trial analysis to verify the effect of adjacent prehensions on the trial signal. We found that only 17.7% of adjacent trials were affected by a previous trial. In addition, we selected only trials not preceded by another trial for at least 6 s, and evaluated whether activity immediately before the trial (-3 to -1 s) is different from the activity long before the trial (-5 to -3 s). The rationale is that if a trial affected the baseline, then activity immediately before the trial would differ from the activity long before the trial. In this analysis, we found no genotype- or session-related differences in baseline amplitude between epochs. Together, these results confirm that prehension-related activity does not systematically alter non-prehension epochs. The results are shown in Supplemental Figure 3.
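
      The trial selection and baseline comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study: the function names, the sampling interval, and the example onsets and trace are hypothetical, and only the 6 s gap and the two pre-trial windows come from the text above.

```python
def isolated_trials(onsets, min_gap=6.0):
    """Keep trial onsets (in s) not preceded by another onset within min_gap seconds."""
    onsets = sorted(onsets)
    kept = []
    for i, t in enumerate(onsets):
        if i == 0 or t - onsets[i - 1] >= min_gap:
            kept.append(t)
    return kept


def window_mean(times, trace, start, stop):
    """Mean of the trace samples whose timestamps fall in [start, stop)."""
    values = [v for t, v in zip(times, trace) if start <= t < stop]
    return sum(values) / len(values) if values else float("nan")


def baseline_difference(times, trace, onset):
    """Mean activity at -3 to -1 s minus mean activity at -5 to -3 s before onset."""
    near = window_mean(times, trace, onset - 3.0, onset - 1.0)
    far = window_mean(times, trace, onset - 5.0, onset - 3.0)
    return near - far


# Hypothetical onsets: the trial at 12 s follows the previous one by only 2 s.
kept = isolated_trials([10.0, 12.0, 30.0, 40.0])

# Hypothetical flat trace sampled every 0.5 s: no baseline shift is expected.
times = [i * 0.5 for i in range(100)]
trace = [1.0] * len(times)
differences = [baseline_difference(times, trace, onset) for onset in kept]
```

      A difference near zero for every isolated trial is what would be expected if a preceding prehension does not shift the baseline.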

      The study uses cre-negative mice as controls for hM3Dq-mediated activation, which does not account for potential effects of Cre-dependent viral expression that occur only in Cre-positive mice. This important control would be necessary to substantiate the conclusion that it is increased Ma2 cell activity that drives the observed changes in behavior and cortical activity.

      A control group of Cre+ mice injected with a Cre-dependent control vector carrying, for example, only a fluorescent reporter would add one more layer of certainty that the effects observed here are due to CLZ-induced hM3Dq activation. We do not agree, however, that it is necessary to confirm our findings. Cre-dependent expression alone has already been extensively shown to have no effect in studies comparing a DREADD activator to a vehicle treatment (for example 10.7554/eLife.38052, 10.1523/JNEUROSCI.0537-18.2018, 10.7554/eLife.67822). We also showed this for our LFP group (Figure 4), further confirming no effect of Cre-dependent hM3Dq expression alone.

      A nonspecific effect of clozapine, in which the treatment affects animals without the hM3Dq receptor, would be much more likely. We control for this by giving the same treatment to Cre+ and Cre- mice. Moreover, since we use a low dose of clozapine, a lack of hM3Dq activation would be more likely, which we also controlled for with the c-Fos experiment, as explained in our answer to Minor point 1. Nevertheless, we added to the discussion that, although we find it highly unlikely that the effects found here are due to Cre-dependent viral expression, we have not recorded Cre+ animals expressing control vectors instead of hM3Dq (lines 1360-1375).

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) One of the main findings in this paper is that Chrna2-Cre cell activation did not affect learning of the prehension task; however, the presented data do not convincingly support this claim. Looking at Fig.1F, Cre+ mice appear to have an overall lower number of successful prehensions compared to control mice. If this is not statistically significant, it is likely because n=3 mice for each group is underpowered. To better judge the behavior of these mice, it would be necessary to plot success rate and overall number of prehensions over the entire course of training, in addition to successes per minute. Given that n=3, plotting all individual data points would make more sense than showing a violin plot. Relatedly, in Supplemental Figure 5, there appears to be a clear effect on reduced success rates in Cre+ mice, which is stated in the figure legends, whereas the result section states: we found no effect of genotype on prehension success rates (lines 895-896). The authors should ensure that these behavior experiments are sufficiently powered to detect potential differences in learning between groups and present the complete data and statistical analysis.

      As explained in our response to Comment 1, the finding that increased Ma2 excitability did not alter motor learning is not based on the data in the previous Figure 1F (n=3 Cre+ and n=3 controls, shown for transparency). Instead, it is supported by the data in the previous Supplemental Figure 5, now Figures 1F-H, with n=6 Cre+ and n=7 controls, for which we found only overall effects of training session, but no effect of genotype, with no significant post-hoc pairwise comparisons. We agree that plotting the success rate, total number of prehensions, and successful prehensions per minute for all 6 sessions allows better evaluation of the mice's behavior. We moved Supplemental Figure 5 into Figure 1, plotting the three measures for the full set of sessions, with individual data points within the violin plots, and expanded the description of the statistical results in the main text. We reiterate that no underpowered tests were conducted in this study, and that no conclusions were drawn from the n = 3 control and n = 3 Chrna2-Cre+ mice.

      (2) The authors mention that a significant fraction of prehension trials overlapped with a preceding prehension attempt. Were those attempts excluded from the analysis? The stark differences in calcium signals at baseline before prehension onset in some sessions (Figure 1G, Supplementary Figure 2D) suggest that trials preceding closely in time might play a role and could skew the analysis and interpretation.

      Overlapping trials were not excluded from the previous analysis. As summarized in our response to Comment 2, and expanded in the results section (lines 876-894), we found that only 17.7% of adjacent trials were affected by a previous trial, and that, when selecting only trials not preceded by another trial for at least 6 s, there was no effect of prehension-related activity on the baseline preceding the trials.

      (3) Relatedly, to test the differences in calcium activity before and after prehension onset, it would be clearer to use a delta F/F measure where the 1 second before onset is used as baseline.

      Since a large proportion of neurons are more active before the onset (in the movement planning phase, Figure 2C), the activity 1 s before the movement onset cannot be considered as F0. Dividing the activity during the movement by the activity during the planning phase would generate a different measure, a form of execution/planning ratio. We performed this analysis as an additional measure and found a three-way interaction effect of genotype, session, and prehension accuracy, driven by genotype effects in early sessions, indicating that Ma2 activity might be involved in the planning/execution activity balance. Those results are now described in the results section and shown in Supplemental Figure 4.
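
      The distinction between the two measures can be illustrated with a short sketch. This is not the analysis code used in the study: the function names, the list-based trace, the sampling rate `fs`, and the 1 s window length are assumptions made for illustration only.

```python
from statistics import mean


def delta_f_over_f(trace, f0):
    """Conventional dF/F: (F - F0) / F0, with F0 a fixed background level."""
    return [(f - f0) / f0 for f in trace]


def execution_planning_ratio(trace, fs, onset, window=1.0):
    """Mean activity after movement onset divided by mean activity before it.

    `window` (seconds) is a hypothetical choice; the pre-onset window is
    treated as the planning phase, not as a baseline.
    """
    i_on = int(round(onset * fs))
    n = int(round(window * fs))
    if i_on - n < 0 or i_on + n > len(trace):
        raise ValueError("trace does not cover both windows")
    planning = mean(trace[i_on - n:i_on])
    execution = mean(trace[i_on:i_on + n])
    return execution / planning
```

      For a neuron that is most active during planning, the ratio falls below 1, whereas treating the same pre-onset window as F0 would misreport that neuron as suppressed during movement.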

      (4) For the experiments in which mice were trained prior to Ma2 cell activation (Fig.4), the behavior in sessions 8-10 does not seem to have reached a plateau yet, and the increase in successful prehensions in sessions 11-13 of Cre+ mice could just be a continuation of training. It would be more convincing to show the original training curve of those mice in sessions 1-7. Additionally, the authors should perform a two-way ANOVA test for the interaction of drug and genotype, rather than two separate one-way ANOVAs.

      We agree, and we now show the curve for sessions 1-7 in Supplemental Figure 7, showing that the success ratio for sessions 8-10 is similar to session 7. Also, a 2-way ANOVA was already performed, although the full report was missing from the manuscript. We switched from successful prehensions per minute to success ratio (see Reviewer #1 comment 1a) and now include the full report, in which we found an overall effect of session, and when grouping by genotype, we found an effect for Cre+ but not control mice (lines 1065-1072).

      Minor points

      (1) The validation experiment for the efficacy of hM3Dq is somewhat confusing. It is surprising that the few hM3Dq-mCherry expressing cells in the cre-negative mice did not show increased c-Fos staining since non-specific leaky hM3Dq expression would presumably still lead to a functional DREADD. The better control for validating the efficacy of hM3Dq-mediated Chrna2-Cre cell activation would be to show c-Fos staining in Cre+ mice with or without clozapine injection. This would control for non-specific c-Fos expression and neuronal activation purely by expression of the DREADD. In cre-negative control mice, the comparison should also be between mice with and without clozapine injection to control for non-specific neuronal activation regardless of hM3Dq expression.

      We thank the reviewer for raising this point and agree that validation of hM3Dq efficacy and specificity requires careful interpretation. In principle, any hM3Dq-expressing cell, including the few hM3Dq-mCherry+ cells observed in Cre– mice, could respond to clozapine. However, in practice, effective DREADD activation depends on sufficient receptor expression levels and on the pharmacodynamics of clozapine in the brain (Gomez et al., 2017, Science, 10.1126/science.aan2475). In our dataset, even in Chrna2-Cre+ mice, only ~76% of hM3Dq+ cells showed c-Fos induction after clozapine, indicating that receptor expression and/or ligand access is not uniform across cells. Consistent with this, the very sparse and weak hM3Dq expression observed in Cre- mice resulted in only 0.8% of hM3Dq+ cells showing c-Fos induction, which is in line with previous reports demonstrating that low-level “leaky” expression is insufficient to drive neuronal activation (e.g. 10.1038/s41467-019-12236-z; 10.1523/JNEUROSCI.0537-18.2018; 10.1523/ENEURO.0363-21.2021).

      The reviewer also suggests that an ideal validation would compare Cre+ mice with and without clozapine to control for any c-Fos induction driven purely by DREADD expression. We agree that such a comparison is informative, and note that in our experiments the c-Fos assay was designed specifically to test whether the low clozapine dose used (0.01 mg/kg) is sufficient to activate hM3Dq in Ma2 cells, rather than to assay baseline effects of viral expression.

      Importantly, non-specific effects of clozapine itself were controlled for throughout the study by administering the same clozapine dose to both Chrna2-Cre+ and Cre– mice in all behavioral and physiological experiments. Thus, any clozapine-driven neuronal activation independent of hM3Dq would be expected to appear in both groups.

      Together, these results indicate that (i) the clozapine dose used is sufficient to robustly activate hM3Dq-expressing Ma2 cells, (ii) sparse leaky expression in Cre– mice is not sufficient to drive measurable activation, and (iii) the effects reported in the manuscript are unlikely to be explained by non-specific clozapine actions or by viral expression alone.

      (2) The authors state in the methods section that "only neurons that displayed a significant change comparing the before onset and after onset phases" were included in the analysis. This appears to bias the data towards neurons that change their activity with the prehension movement. If this is the intention, the authors should clearly state this and their rationale in the results section and show what proportion of recorded neurons fall into this category.

      Yes, thank you for pointing this out; the explanation for this exclusion criterion was missing. We expanded the methods section “Neural activity around prehensions” to explain that, since we are evaluating the role of Ma2 cells in the prehension-related activity of pyramidal cells, we excluded neurons with no prehension-related activity. We also state in the expanded text that 15.97% of recorded neurons were excluded for this reason.

      (3) I don’t understand the peak PC activity latency shown in Figure 2D. How is it possible that there are negative peak latencies during the prehension phase, which is defined as >0sec, (upper right panel), and positive peak latencies in the before prehension phase, which is defined as <0sec, (lower right panel)?

      As stated in lines 939-941 and in the Figure 2C legend, neurons were sorted into "before prehension" or "during prehension" neurons according to their activity during successful prehensions. One of our main findings is that the temporal patterns of pyramidal cells were strongly affected by prehension accuracy (lines 941-944), meaning that a significant number of neurons shifted prehension phases during failed prehensions (as illustrated in Figure 2C; note how the temporal pattern is not preserved from successful to failed prehensions). That is why, for failed prehensions, there are negative latencies for neurons that were classified as "during prehension" and positive latencies for neurons classified as "before prehension" in successful trials. We expanded the sorting explanation in the results section (lines 944-950) to better highlight the latency change between different prehension accuracies.

      (4) Please specify how baseline subtraction (detrending) was performed for the calcium image analysis.

      We expanded the methods section “Neural signal extraction” to explain that we now refer to F0 as the background component (and to our resulting traces as dF/F), and to describe how it was determined (lines 614-619).

      (5) The authors state that they found a "dissociation between changes in neural activity and performance outcomes". Since they only analyzed motor performance by quantifying successful prehensions, this statement should be caveated with the notion that other aspects of the behavior (e.g., trajectories/speed) could be affected but were not measured.

      We agree, and we expanded the discussion section to acknowledge that we focused the behavioral analysis on success ratio, and that other measures not investigated here could also be affected (lines ????-????).

      (6) Are the differences in theta and gamma power specific to the prehension trials, or does Ma2 cell activation generally increase LFP activity in those bands?

      We thank the reviewer for the question, as we had not analyzed general LFP activity in the previous version. We performed the same analysis now including only LFP from epochs outside prehension windows across the full sessions. We found that Mα2 cell activation actually reduces LFP power across all bands specifically in Session 13 when no prehension is being performed. These findings are now included as Supplemental Figure 7.

      (7) Please define terms that might not be familiar to a typical reader in the field, such as "assemblies", when first introducing them in the text.

      We revised the introduction where we now define assemblies (lines 85-88).

      (8) Please specify the n-numbers for each figure throughout the manuscript. For example, in some figures, the number of trials or the number of neurons is used; however, it is not clear what this number is.

      We agree that although the n-numbers are stated in the text, it would be clearer to add them also to the figure legends. All figure legends now contain n-numbers for panels showing quantifications.

      (9) Relatedly, while the inclusion of supplemental tables with expanded statistical results is commendable, several statistical test details are missing, such as for Figure 5.

      We have fully revised the text to add any missing statistical details for the statements in the Supplemental Tables.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Nio and colleagues address an important question about how the cerebellum and ventral tegmental area (VTA) contribute to the extinction learning of conditioned fear associations. This work tackles a critical gap in the existing literature and provides new insights into this question in humans through the use of high-field neuroimaging with robust methodology. The presented results are novel and will broadly interest both the extinction learning and cerebellar research communities. As such, this is a very timely and impactful manuscript. However, there are several points that could be addressed during the review process to strengthen the claims and enhance their value for readers and the broader scientific community.

      (1) Reward Interpretation and Skin Conductance Responses (SCR)

      A central premise of the manuscript is that 'unexpected omissions of expected aversive events' are rewarding, which plays a critical role in extinction learning. The authors also suggest that the cerebellum is involved in reward processing. However, it is unclear how this conclusion can be directly drawn from their task, which does not explicitly model 'reward.' Instead, the interpretation relies on SCR, which seems more indicative of association or prediction rather than reward per se. Is SCR a valid metric of reward experienced during the extinction of feared associations? Or could these findings reflect processes tied more closely to predictive learning? Please, discuss.

      We thank the reviewer for raising this important point. We agree that skin conductance responses (SCRs) do not directly index reward. More generally, SCRs reflect autonomic arousal in response to salient or motivationally significant stimuli and are closely linked to expectancy and contingency awareness. In our study, SCRs served as a read-out of the participants’ expectation of a US, and were used to fit the hyperparameters of a reinforcement-learning-based deep learning model, which then provided per-trial estimates of prediction and prediction error values. These estimates capture predictive learning about the occurrence of the aversive US, rather than reward per se. The interpretation of unexpected US omissions as “reward-like” prediction errors relies on prior literature, particularly rodent studies showing that dopaminergic neurons in the VTA respond to omitted aversive stimuli and drive extinction learning via projections to the nucleus accumbens (Kalisch et al., 2019; Salinas-Hernández et al., 2018, 2023). We therefore interpret our cerebellar activations during unexpected omissions as being compatible with the processing of reward-like prediction errors, while acknowledging that this inference is indirect.

      To clarify this reasoning, we made revisions to the Introduction and Discussion to (i) state explicitly that SCRs do not directly measure reward but were incorporated into the reinforcement learning model as an index of autonomic arousal related to US expectancy and predictive learning, and (ii) consistently replace the term “reward prediction error” with “reward-like prediction error” throughout.

      (2) Reinforcement Agent and SCR Modeling

      The modeling approach with the deep reinforcement agent treats SCR as a personalized expectation of shock for a given trial. However, this interpretation seems misaligned with participants' actual experience - they are aware of the shock but exhibit evolving responses to it over time. Why is this operationalization useful or valid? It would benefit the manuscript to provide a clearer justification for this approach.

      This point is well taken. We did not collect trial-by-trial expectancy ratings, as frequent button-box responses would have induced cerebellar activations unrelated to fear (extinction) learning. Subjective expectancy was assessed only at the end of each experimental phase. Instead, as is common in the human fear conditioning literature, we used trial-by-trial SCR data (Lonsdorf et al., 2017). Although SCRs show correspondence with US expectancy ratings, they are inherently noisy and show substantial variability across trials and participants (Constantinou et al., 2021). Therefore, individual trial-by-trial responses cannot be used to directly infer US predictions. Accordingly, we used group-averaged SCR data to fit model hyperparameters in a grid search across parameter settings. The best-fitting hyperparameters were then applied to 100 randomly initialized agents, and their outputs were averaged to generate trial-wise estimates of predictions and prediction errors. These averaged values were used as parametric modulators in the fMRI analyses. We have revised the Introduction and Methods to make this procedure clearer.
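
      The fitting procedure can be sketched in simplified form. The sketch below substitutes a Rescorla-Wagner learner for the deep reinforcement-learning agent, so it illustrates only the grid-search and agent-averaging steps and not the model itself; the function names, the single learning-rate hyperparameter, the random initial expectation range, and the sign convention of the prediction error are all assumptions made for illustration.

```python
import random
from statistics import mean


def run_agent(outcomes, alpha, v0=0.0):
    """Stand-in learner: return per-trial predictions and prediction errors."""
    v = v0
    preds, errors = [], []
    for us in outcomes:          # us = 1 if the US occurred, else 0
        preds.append(v)          # expectation before the outcome
        delta = us - v           # assumed sign: negative on unexpected omission
        errors.append(delta)
        v += alpha * delta
    return preds, errors


def sse(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grid_search(outcomes, group_mean_scr, alphas):
    """Pick the alpha whose predictions best match the group-averaged SCR."""
    return min(alphas, key=lambda a: sse(run_agent(outcomes, a)[0], group_mean_scr))


def averaged_estimates(outcomes, alpha, n_agents=100, seed=0):
    """Average predictions and errors over randomly initialized agents."""
    rng = random.Random(seed)
    runs = [run_agent(outcomes, alpha, v0=rng.uniform(0.0, 0.1))
            for _ in range(n_agents)]
    n = len(outcomes)
    preds = [mean(r[0][t] for r in runs) for t in range(n)]
    errors = [mean(r[1][t] for r in runs) for t in range(n)]
    return preds, errors
```

      In this simplified setting the averaged trial-wise values play the role of the parametric modulators; with a deep network, averaging across random initializations additionally smooths out run-to-run variability in the learned weights.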

      (3) Clarity and Visualization of Results

      The results section is challenging to follow, and the visualization and quantification of findings could be significantly improved. Terms like 'trending' appear frequently - what does this mean, and is it worth reporting? Adding clear statistical quantifications alongside additional visualizations (e.g., bar or violin plots of group means within specific subregions within the cerebellum, or grouped mean activity in VTA and DCN) would enhance clarity and allow readers to better assess the distribution and systematicity of effects. Furthermore, the figures are overly complex and difficult to read due to the heavy use of abbreviations. Consider splitting figures by either phase of the experiment or regions, and move some details to the supplemental material for improved readability.

      We agree with the reviewer that the clarity of results can be improved and have revised the manuscript accordingly. Specifically:

      (1) We use “trend-level” to refer to uncorrected voxelwise t-maps at p < 0.05, and “significant” to refer to TFCE/FWE-corrected effects at p < 0.05. This distinction was not sufficiently clear in the original figures. To address this, uncorrected t-maps are now displayed with a grey striped background frame, and colorbar labels have been enlarged to emphasize whether TFCE/FWE-corrected or uncorrected t-values are shown.

      (2) We added a supplementary table (Table S7) reporting group-level summary statistics for all fMRI contrasts presented in the manuscript, including group means, standard deviations, effect sizes (Cohen’s d), and 95% confidence intervals for cerebellar cortex, cerebellar nuclei, and VTA VOIs. We hope that this helps with the interpretation of effect magnitude and variability across fMRI analyses.

      (3) To improve readability, we split overly complex figures: Figure 2 now separates CS-related prediction from US-related presentation contrasts (which are now revised Figures 4 and 5), and Figure 3 separates event-based and parametric modulation contrasts (which are now revised Figures 6 and 7).

      (4) We also reduced abbreviations in the figures and now provide full definitions and explanations, alongside the original abbreviations, in the main text and figure captions.

      We considered the suggestion to split figures further by region or by phase. However, we believe it is more informative to present the cerebellar cortex, nuclei, and VTA together for each contrast, and to keep all phases side by side, as this allows readers to directly assess commonalities across phases. We therefore chose to keep the same overall structure, but simplified the figures in other ways (e.g. splitting by contrast type) to improve overall readability. We hope that these changes address the reviewer’s concerns by simplifying the presentation, removing abbreviations, and providing clearer quantification of results.

      (4) Theoretical Context for Paradigm Phases

      The manuscript benefits from the comprehensive experimental paradigm, which includes multiple phases (acquisition, extinction, recall, reacquisition, re-extinction). This design has great potential for providing a more holistic view of conditioned fear learning and extinction. However, the manuscript lacks clarity on what insights can be drawn from these distinct phases. What theoretical framework underpins the different stages, and how should the results be interpreted in this context? At present, the findings seem like a display of similar patterns across phases without sufficient interpretation. Providing a stronger theoretical rationale and reorganizing the results by experimental phase could significantly improve readability and impact.

      We thank the reviewer for this constructive suggestion. We would first like to mention that the primary aim of this manuscript is not to analyze differences between phases, but rather to highlight the commonalities. Across different learning contexts, we consistently observed reward-like prediction error-related activations in the cerebellum and VTA. This consistency and connectivity between the cerebellum and VTA, despite phase-to-phase differences, is the most important finding of our study.

      We agree, however, that the manuscript did not sufficiently explain how each phase differs conceptually, which is important for readers to understand why the consistency of responses is notable. We therefore expanded the Introduction and Discussion to provide clearer theoretical context for each phase. More specifically, the phases can be understood as follows:

      Extinction (day 2): Because acquisition was conducted with a 100% reinforcement rate, unexpected US omissions during initial extinction trials maximize reward-like prediction errors and yield stronger, more uniform expectations across participants compared to a partial reinforcement rate. This phase should therefore provide the clearest opportunity to observe cerebellar-VTA contributions to the processing of reward-like prediction errors.

      Recall (day 3): Despite allowing for the consolidation of extinction learning, the recall test often still elicits conditioned fear responses to the CS+, that is, shows spontaneous recovery of the initial fear association (Bouton, 2002). In these trials, the non-occurrence of the US is unexpected. In this context, US omission-related activations reflect reward-like prediction errors during renewed fear responding in the presence of both a fear memory and an extinction memory. This contrasts with extinction training on day 2, where prediction errors arose primarily against the background of the recently acquired fear memory, without a competing extinction memory.

      Reacquisition (day 3): Unlike acquisition, reacquisition used a partial reinforcement rate, such that non-reinforced CS+ trials were interspersed between reinforced CS+ trials (similar to the partially reinforced phase used by Ernst et al., 2019). Because reacquisition occurs in the presence of savings, that is, the presence of a previously acquired fear memory, US expectancy increases rapidly following reinforced trials and relearning occurs faster (Bouton, 2004). Importantly, partial reinforcement maintains high US expectancy and therefore allows prediction errors to remain sustained across omission trials (Figure 9).

      Reextinction (day 3): Reextinction is an additional extinction phase but without a consolidation interval, and with an already established fear extinction memory. Because reextinction followed the partially reinforced reacquisition phase, prediction errors during early reextinction decayed more slowly than during extinction on day 2 (following the fully reinforced acquisition phase on day 1) (Figure 9). Together, reacquisition and reextinction were designed to maximize the number and persistence of unexpected US omissions, thereby providing additional opportunities to examine reward-like prediction-error signaling.

      By clarifying this framework, we aim to show that while the learning context and history differ across phases, the consistent cerebellum-VTA activation and connectivity related to unexpected US omissions underlines the robustness of the effect. We chose not to reorganize the Results by phase, as our central conclusion rests on similarities rather than differences. Instead, we have clarified the theoretical background in the revised manuscript to help readers interpret both the commonalities and the potential sources of variability.

      (5) Cerebellum-VTA Connectivity Analysis

      The authors argue that the cerebellum modulates VTA activity, yet they perform the PPI analysis in the reverse direction. Why does this make sense? In their DCM analysis, they found a bidirectional relationship (both cerebellum - VTA and VTA-cerebellum), yet the discussion focused on connectivity from the cerebellum to VTA. A more careful interpretation of the connectivity findings would be useful - especially the strong claims in the discussion on the cerebellum providing the reward signal to the VTA should be tempered.

      We thank the reviewer for highlighting this issue. In our primary analysis, we used the VTA as the PPI seed and observed trend-level connectivity with the cerebellum. When we reversed the analysis and used the cerebellar volume of interest (VOI) from the conjunction analysis as the seed, effects in the VTA were substantially weaker. We believe this reflects the broad connectivity profile of the cerebellar VOI (i.e., not specific to the VTA) as well as general limitations of PPI in our study, including the small number of unexpected omission trials and the lack of specificity to reward-like prediction errors (e.g., connectivity also appeared during US presentation). For transparency, we now report the cerebellar-seed PPI results in the Supplementary information (Figure S3). Given their limited robustness, we chose not to include the corresponding VTA maps in the main figures.

      Finally, we agree that our conclusions regarding cerebellum-VTA interactions should be framed more cautiously. While the DCM analyses support bidirectional connectivity, our original discussion placed disproportionate emphasis on cerebellum-to-VTA influences. We have revised the text to provide a more balanced interpretation that also considers VTA-to-cerebellum connectivity.

      Reviewer #2 (Public review):

      Summary

      Building upon the group's previous work, this study used a 3-day threat acquisition, extinction, recall, reextinction, and reacquisition paradigm with 7T imaging to probe the mechanism by which the cerebellum contributes to fear extinction learning. The authors hypothesize this may be via its connection to the VTA, a known modulator of fear extinction due to its role in reward processing. Using complementary analysis methods, the authors demonstrate that activity within the cerebellum, DCN, and VTA is modulated by predictions about the occurrence of the US, which shows regional specificity. They show trend-level evidence that there is increased functional connectivity between the cerebellum and VTA during all phases of the paradigm with unexpected omissions. They also present a DCM which indicates that the cerebellum could positively modulate VTA activity during extinction learning. This study adds to a growing literature supporting the role of the historically overlooked cerebellum in the control of emotions and suggests that an interaction between the cerebellum and VTA should be considered in the existing model of the fear extinction network.

      Strengths

      The authors address their research question using a number of complementary methods, including parametric modulation by model-derived expectation parameters, PPI, and DCM, in a logical and easily understood way. I feel the authors provide a balanced interpretation of their findings, presenting numerous interpretations and offering insight with regard to reward vs attention or unsigned prediction errors and the directionality of the interaction they identify. The manuscript is a timely addition to growing literature highlighting the role of the cerebellum in fear conditioning, and emotion generation and regulation more generally.

      Weaknesses

      Subjective and skin conductance responses do not completely support the success of the learning paradigm. For example, CS+/CS- differentiation in both domains persisted after extinction training. I do not feel that this negates the findings of this manuscript, though it raises questions about the parametric modulators used, and the interpretation of the neural mechanisms proposed if they do not strongly relate to updated subjective appraisals (the goal of extinction therapy). My interpretation of the manuscript suggests there are some key results based upon contrasts that have as few as three events; I am a little unsure about the power and reliability of these effects, though I await author clarification on this matter. There are a number of unaddressed deviations from the pre-registered protocol that I have asked the authors to elaborate upon.

      We thank the reviewer for the thoughtful and constructive evaluation of our work. We appreciate that the manuscript and methods were found to be clearly presented, and we welcome the suggestions for clarification and improvement. Below we address the specific concerns regarding extinction learning in behavioral measures, the reliability of event-based contrasts with few trials, and deviations from the preregistration.

      Extinction in self-reports and skin conductance responses (SCRs)

      The reviewer is correct that CS+/CS- differentiation persisted after extinction. Although there was no differentiation in SCRs at the end of extinction, post-extinction self-reports continued to show differentiation, albeit to a lesser degree, which is in line with previous literature on the dissociation of outcome measures during fear conditioning (Lipp et al., 2003). This residual subjective differentiation is also consistent with extinction forming an inhibitory memory trace that suppresses, rather than erases, the original fear association (Bouton, 2002; Milad & Quirk, 2012), and a single extinction session is often insufficient to eliminate differential responding (Craske et al., 2014; Vervliet et al., 2013). However, both measures showed significant effects of extinction learning.

      We included additional analyses of self-reports across phases. Importantly, CS+ ratings were significantly reduced during extinction and recall compared to acquisition (all p ≤ 0.001), whereas CS- ratings remained unchanged (all p > 0.532). This pattern demonstrates that the magnitude of the CS+/CS- difference was significantly reduced relative to acquisition, indicating that extinction learning did occur (Doubliez et al., 2025).

      For physiological responses, extinction learning was shown in pupil size responses (PSRs) but not conclusively in SCRs. PSRs showed a significant reduction of CS+ responses across extinction, while CS- responses remained unchanged. SCRs showed a reduction of CS+/CS- differentiation across extinction; however, this effect remained at trend level, as the Stimulus x Time interaction did not reach significance (p = 0.053). This pattern is consistent with early differentiation followed by rapid attenuation under the full reinforcement structure of the paradigm (100% reinforcement during acquisition and 0% during extinction). Under such conditions, participants rapidly learn that the US is no longer delivered during extinction, such that physiological responses are largely confined to the first few trials, leaving limited power to detect extinction effects in noisier measures such as SCRs. To address the lower robustness of SCR effects, and as recommended by the reviewer, we have included PSRs in the main Results section, where they provide converging physiological evidence for extinction learning.

      Of note, on day 3, both physiological measures and self-reports again showed CS+/CS- differentiation, consistent with spontaneous recovery, a well-established phenomenon reflecting the persistence of the original fear trace after consolidation (Bouton, 2002; Vervliet et al., 2013).

      Taken together, these findings demonstrate that the paradigm successfully induced both acquisition and extinction of conditioned fear, even though residual fear responses persisted.

      Reliability of event-based contrasts with three trials

      The initial decision to use three events for event-based contrasts was based on SCR and PSR data, which showed that differentiation between CS+ and CS- occurred almost exclusively in the first few trials of extinction and recall. Consistent with the full reinforcement described above, prediction errors were expected to be high in the very first extinction trials, and to decay rapidly. Thus, the usual half-block division (e.g., first eight trials) would have included many trials without meaningful prediction errors.
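
      The rapid decay of prediction errors under full reinforcement can be illustrated with a minimal sketch. It assumes a simple Rescorla-Wagner update, which may differ from the reinforcement-learning model actually used in the study; the learning rate, the number of trials per phase, and the function name are hypothetical values chosen only for illustration. In this sign convention, a negative error means the outcome was better than expected (an omitted US).

```python
def rescorla_wagner(outcomes, alpha=0.5, v0=0.0):
    """Return per-trial US predictions and prediction errors.

    outcomes: sequence of 1.0 (US delivered) or 0.0 (US omitted).
    alpha:    hypothetical learning rate (not taken from the study).
    """
    v = v0
    predictions, errors = [], []
    for outcome in outcomes:
        delta = outcome - v          # prediction error on this trial
        predictions.append(v)
        errors.append(delta)
        v = v + alpha * delta        # update the US expectation
    return predictions, errors


# Hypothetical schedule: 16 reinforced CS+ trials (100%), then 16 omissions (0%).
acquisition = [1.0] * 16
extinction = [0.0] * 16
preds, errs = rescorla_wagner(acquisition + extinction)

# Prediction errors on the extinction trials only.
ext_errs = errs[16:]
for trial, delta in enumerate(ext_errs[:5], start=1):
    print(f"extinction trial {trial}: prediction error = {delta:.3f}")
```

      With these illustrative settings the error magnitude halves on every omission trial, so most of the signal sits in the first few trials; a lower learning rate would spread it over more trials.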

      We acknowledge that contrasts based on three trials provide limited statistical power. To address this concern, we added a supplementary table showing summary statistics for contrast estimates in the cerebellar cortex, cerebellar nuclei, and VTA VOIs across all fMRI analyses (Table S7), including both the event-based and parametric modulation approaches. Importantly, the event-based contrasts showed moderate to strong effects despite being restricted to the first three unexpected omission trials. Moreover, the parametric modulation analyses, which incorporate all available trials, yielded results that were consistent with the three-trial event-based contrasts and with the patterns shown in the main figures. This convergence between event-based and parametric approaches strengthens our confidence that the observed effects are reliable.

      Deviations from preregistration

      We acknowledge that deviations from the preregistered protocol were not fully documented and have now added this information. The main deviation concerned our event-based analyses: while the preregistration planned early vs. late block comparisons, in practice the rapid decay of SCRs under our 100% and 0% reinforcement rates rendered later trials uninformative for prediction error analyses. We therefore focused on the first three trials, when prediction errors are expected to be present. These behavioral findings are also consistent with Doubliez et al. (2025), who used the same paradigm and observed similar rapid SCR decay. Other deviations, such as not reporting exploratory whole-brain DCM analyses, are now clearly stated for transparency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Point - Paradigm Details

      Providing additional details about the experimental paradigm in the main text (e.g., the nature of the visual stimuli associated with shocks) would enhance the manuscript's clarity. Some of the information currently in supplementary Figure 5 could be incorporated into the main text to enhance the understanding of the paradigm.

      We agree that the current structure reduces clarity, as the paradigm is only explained in detail after the results. To improve readability, we have moved parts of Figure 5 (illustrating the paradigm and scanner setup) to the beginning of the manuscript (now revised Figure 1). In addition, information from Figure 5, including details of the visual stimuli, is now added to the Introduction.

      Reviewer #2 (Recommendations for the authors):

      Methods

      Can the authors please clarify what part of the task went into [US post CS+ > no US post CS-] contrast? Is this the time immediately after the CS presentations, when the US has just occurred/not occurred, or rather more like the CS+>CS- contrast except including trials confounded by the US (i.e. [CS+/US > CS -])?

      The contrasts are based on an event-related separation of CS and US. The CS was presented for 6 seconds, with its onset modeled in the GLM as a zero-duration event (delta function). The CS offset coincided with either the delivery or omission of the US, which was likewise modeled as a zero-duration event. Thus, CS onset and offset were modeled separately. The no-US events were further distinguished by whether they followed a CS+ or a CS-. Accordingly, we analyzed both CS-related and US-related contrasts; for example, the CS+ > CS- contrast reflects CS-related differentiation at CS onset (0 s), whereas [US post CS+ > no US post CS-] reflects (no-)US-related activity at CS offset (6 s; US delivered from 5.9-6.0 s). We have added further clarification to the Methods section.
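
      As a rough illustration of this event separation, the sketch below builds a list of zero-duration events from a few trials. It is not the analysis code used in the study: the trial onsets, the label strings, and the function name are hypothetical, and only the 6 s CS duration and the zero-duration modeling are taken from the description above.

```python
CS_DURATION_S = 6.0  # CS presented for 6 s; outcome event placed at CS offset


def build_events(trials):
    """Turn trials into zero-duration GLM events.

    trials: list of (cs_onset_s, cs_type, us_delivered) tuples,
            with cs_type "CS+" or "CS-" (hypothetical input format).
    Returns a list of (onset_s, duration_s, label) tuples.
    """
    events = []
    for cs_onset, cs_type, us_delivered in trials:
        # CS onset, modeled as a delta function.
        events.append((cs_onset, 0.0, cs_type))
        # US delivery or omission at CS offset, also a delta function,
        # labeled by the CS type that preceded it.
        outcome = "US" if us_delivered else "noUS"
        events.append((cs_onset + CS_DURATION_S, 0.0, f"{outcome}_post_{cs_type}"))
    return events


# Hypothetical onsets in seconds, for illustration only.
example_trials = [(10.0, "CS+", True), (40.0, "CS-", False), (70.0, "CS+", False)]
events = build_events(example_trials)
for onset, duration, label in events:
    print(f"{onset:6.1f} s  dur={duration}  {label}")
```

      Keeping the CS onset and the (no-)US event as separate labels is what allows a CS-related contrast and a US-related contrast to be formed from the same trials.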

      I was a bit unclear on what this sentence of the methods meant: "Notably, all single trials comprised CS+ trials, with CS- trials also being modeled as single trials to facilitate paired analysis". Does this mean that some contrasts had 6 events in total, e.g., the first 3 unexpected omissions vs. 3 x CS-? If so, which CS- were selected for the comparison?

      We agree that this sentence was unclear and have revised it. Our intention was to describe that when CS+ trials were modeled as single trials in the GLM (e.g., each CS+ onset and its associated [no-]US event modeled as separate regressors), the CS- trials were modeled in the same way. This ensured that paired analyses would be possible if required.

      For reacquisition and reextinction, single-trial modeling was necessary, as the last unexpected omission of reacquisition is also the first unexpected omission of reextinction. Modeling trials separately allows us to examine the first three unexpected US omissions in each phase independently.

      The event-based contrasts for unexpected US omissions were defined in line with a previous study of our group. For example, during extinction we contrasted the first three unexpected US omissions following CS+ with all expected omissions following CS- (i.e. [first 3 no US post CS+ > no US post CS-], corresponding to 3 vs. 16 events). The weights of events were automatically scaled by SPM12 so that both sides of the contrast carried equal total weight (e.g. positive events weighted 1/3, negative events weighted -1/16). This procedure matches the approach in Ernst et al. (2019), where in partially reinforced acquisition 6 unexpected omissions after CS+ were contrasted with 16 expected omissions after CS-.
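
      The weighting described here can be checked with a short sketch. This is not SPM12 code; it only reproduces the arithmetic of a balanced contrast, and the function name is hypothetical.

```python
import numpy as np


def balanced_contrast(n_pos, n_neg):
    """Weights for n_pos positive and n_neg negative single-event regressors.

    Each positive event gets 1/n_pos and each negative event gets -1/n_neg,
    so both sides carry equal total weight and the contrast sums to zero.
    """
    pos = np.full(n_pos, 1.0 / n_pos)
    neg = np.full(n_neg, -1.0 / n_neg)
    return np.concatenate([pos, neg])


# First 3 unexpected omissions after CS+ vs. 16 expected omissions after CS-.
weights = balanced_contrast(3, 16)
print(weights[:3])    # positive side, each 1/3
print(weights[3:5])   # negative side, each -1/16
print(weights.sum())  # approximately zero
```

      Because each side sums to one in magnitude, the contrast estimates the difference between the mean response to the three unexpected omissions and the mean response to the sixteen expected omissions. The side with only three events still contributes a noisier mean.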

      More generally, can the authors please comment on the power and reliability of analyses that include only 3 events in a condition [e.g. the first 3 unexpected omissions]?

      It is not clear if the (US post CS+ > no US post CS-) phases were included. In your pre-registration you say "we will use a "no US post CS+ > no US post CS-" fMRI contrast, where "no US post CS+" designates unexpected omission events in early extinction, early recall (depending on behavioral data which might indicate a return of fear) and a volatile phase (where unexpected omissions occur in the first part of the volatile phase, i.e. reacquisition).", but my reading of the manuscript was that it included both early and late "see 1st level analysis = US post CS+, no US post CS+, no US post CS- separately for each phase; 2nd level = contrast included unexpected omission of the US (no US post CS+ > no US post CS-)". Please clarify and if necessary explain the deviation from preregistration.

      We agree that this point requires clarification. In the preregistration, we planned to divide phases into early and late blocks (no US post CS+ > no US post CS-). However, as already outlined in our response (Reviewer 2, public review response: Reliability of event-based contrasts with three trials), both our preliminary behavioral data and subsequent modeling analyses indicated that differentiation between CS+ and CS- declined extremely rapidly under the 100% reinforcement schedule, likely leaving little or no prediction error beyond the first few trials. Based on this, we adapted the event-based analyses to focus on the first three unexpected omission trials in extinction, recall, and reextinction, where prediction errors are expected to be present. In reacquisition, only three omission events occur by design (83% reinforcement), so this naturally constrained the analysis to three trials. We now explicitly describe this deviation from the preregistration in the revised manuscript.

      As outlined in the same response, we recognize that contrasts based on three trials provide limited statistical power. We addressed this point by providing additional summary VOI statistics of contrast estimates for both event-based and parametric modulation contrasts. These show moderate-to-strong effect sizes and convergence across methods, which we argue supports the reliability of using the first three trials (Reviewer 1, public review response, point (3) Clarity and Visualization of Results).

      Finally, with regard to the reviewer’s specific question: yes, US post CS+ > no US post CS- contrasts were examined for acquisition training, primarily to demonstrate US-related activation (see revised Figure 3).

      Results

      Page 5 + 6: Including the interaction effects for pupil size responses during extinction and reextinction in the SCR section seems unjustified. I appreciate that the SCR data does not significantly support the key claim that extinction learning towards the CS+ occurred, but I do not feel it is acceptable to draw from the other measure for this effect alone. If the PSR measure is of primary/significant importance to support the validity of your paradigm, please consider adding all of these results to the main manuscript.

      We agree with this point and have moved the PSR analysis to the main manuscript. In addition, the SCR Results section no longer includes the PSR analyses, and clearly states the absence of a significant Stimulus x Time interaction effect in extinction (p = 0.053). For completeness, we additionally report trend-level post hoc tests showing CS+/CS- differentiation during early extinction but not during late extinction, consistent with an initial differentiation that attenuates across extinction training.

      Subjective and (some) skin conductance responses do not completely support the success of the learning paradigm. For example, CS+/CS- differentiation in both subjective domains and SCRs persisted after extinction training. Can the authors comment on how this might influence the interpretation of their results more generally? What does it mean if these expectations do not appropriately translate to updated subjective appraisals in your participants, contrary to what the model from which the parametric modulators were derived would predict?

      The persistence of CS+/CS- differentiation in self-reports after extinction, and the return of CS+/CS- differentiation in both self-reports and physiological measures during the recall test, is not unexpected. For self-reports administered after extinction, such persistent CS+/CS- differences are commonly observed in the human fear extinction literature (Hermans et al., 2006; see also Lipp et al., 2003), and may reflect that initial extinction learning establishes a new inhibitory association that suppresses, but does not erase, the original fear memory (Bouton, 2002). At recall on day 3, the remaining differentiation in both self-reports and physiological responses is consistent with spontaneous recovery, a well-documented phenomenon in extinction research (Bouton, 2002). As noted earlier (Reviewer 2, public review response: Extinction in self-reports and skin conductance responses (SCRs)), additional analyses showed that ratings were significantly reduced after extinction and recall compared to acquisition. Thus, while residual differentiation in self-reports remained after extinction and recall, its magnitude was diminished, indicating that extinction learning occurred but was incomplete. This pattern is consistent with partial updating of subjective appraisals in accordance with the reinforcement-learning model used to derive the parametric modulators, rather than a failure of updating.

      Figures

      Figure 1: Please ensure that the summary of your results in the figure legend is consistent with the quantitative results reported. Example 1: "On day 2, there was a loss of differentiation during extinction training.", however, a significant effect of the stimulus, and time remained (but no interaction). Please tone down this interpretation, or make it clearer how the difference in the initial extinction trials was quantified. If the ANOVA-type analysis was only performed in the first half, this was not clear. Example 2: "During initial reacquisition, there were again differential responses to the CS+ and CS-, which decreased in reextinction and the unexpected US phase". I appreciate that you refer to the difference decreasing, rather than disappearing altogether, but the magnitude of this difference is not reported in the manuscript, and there does remain a significant difference in the amplitude.

      We thank the reviewer for this helpful feedback. We have revised the figure legends to tone down overly strong statements and ensure that all descriptions are in correspondence with the quantitative results. For clarity, we have also added significance markers for (trend-level) post hoc comparisons (CS+/CS- differentiation within early and late blocks for each phase) to revised Figures 2 and 3 displaying SCRs and PSRs.

      Figure 2, 3, 4: I found it quite confusing to have uncorrected and corrected results displayed in the same way in the same figure. E.g. Figure 2A which, as far as I can tell shows trend-level results for the cerebellum, and corrected results for the VTA. For Figures 2 and 3 it was also not immediately clear which colour bar related to which map. Figure 4A appeared to be missing colour bars. I suggest the authors consider (as much as possible) standardising the colour bar scales, such that the maps across figures/sub-plots are more directly comparable, and differentiate more clearly between corrected and uncorrected results. The 3D renders in Figures 2 and 3 are a little hard to see - would it be possible to make it not so transparent?

      We use “trend-level” to refer to uncorrected voxelwise t-maps at p < 0.05, and “significant” to refer to TFCE/FWE-corrected effects at p < 0.05. This distinction was not sufficiently clear in the original figures. In the revised figures, uncorrected t-maps are displayed with a grey striped background frame. Colorbar scales were not standardized, as different panels display different statistical quantities (TFCE values versus t-values), and scaling was chosen to visualize variation within each contrast rather than enforce comparability across panels, which would have reduced interpretability. In addition, the missing colorbar in Figure 8A (formerly Figure 4A) has now been added; it matches the colorbar shown in Figure 8B. See also Reviewer 1, public review response, point (3) Clarity and Visualization of Results.

      Is it possible to annotate significant effects on Figure 1 and Supplement Figure 1? The use of square markers makes it quite hard to tell the value of each point, which, given the small scale of the y-axis is quite important for interpretation. Could the authors consider remaking these plots with smaller dots?

      We have added post hoc significance markers to Figures 2 and 3 displaying SCRs and PSRs to facilitate interpretation. These markers reflect post hoc comparisons of CS+/CS- differentiation within early and late blocks. In cases where the Stimulus x Time interaction was not significant, the corresponding post hoc markers are still shown but are indicated in red to denote their trend-level status. In addition, the plots have been remade with smaller dots to make individual values clearer.

      Discussion

      The authors state "Because aversive stimulus presentation results in pronounced cerebellar activations, we were unable to separate cerebellar activation related to the unexpected (initial acquisition trials) and the expected (late acquisition trials) presentation of the US." Could the authors compare between early[CS+>CS-] and late[CS+>CS-] acquisition (which I believe were created in the event-based analysis but results not reported), or between the first 3[CS+ with US>CS-] and later [CS+ with US>CS-] to assess this?

      In our terminology, the suggested comparisons (early vs. late [CS+ > CS-] or first three vs. last three [CS+ > CS-]) reflect changes in US prediction rather than prediction error. The statement in the Discussion refers specifically to cerebellar activation during US presentation, where distinguishing between expected and unexpected presentations is complicated by the strong cerebellar activation elicited by the electrical US itself. Moreover, when comparing early “unexpected” US presentations with later “expected” ones, the relatively higher activity in early trials could reflect habituation of the US sensation (i.e., non-associative learning) rather than a prediction error, making interpretation difficult.

      Because the current manuscript focuses on reward-like prediction errors, we did not report these US prediction or presentation contrasts in detail. In brief, the suggested comparisons of early versus late CS-related differentiation (CS+ > CS-) revealed only limited trend-level activity. In contrast, US-related responses during acquisition showed robust activations in the cerebellar cortex, DCN, and VTA across the acquisition phase. Comparisons between the first three US presentations and later US presentations showed broadly distributed and stronger responses during early acquisition than during later US presentations. This pattern seems to be more consistent with non-associative effects, such as sensory habituation to the electrical stimulation, than with prediction-error–related processing. We have therefore not included these comparisons in the manuscript, but would be open to providing them in the Supplementary Information if the editor or reviewers consider them essential.

      General

      In your pre-registered analysis plan you state "we will explore the use of DCM in a larger network that encompasses known constituents of the fear extinction network, in addition to the cerebellum and VTA.". You have plenty of results to discuss in the current manuscript and adding this may complicate the narrative, but that being said, please either perform and include this analysis as you proposed or explicitly mention why this was not completed. You could also consider adding a whole-brain activation map for the key phases of the experiment. Please also double-check other pre-registered points, for example - the sample size justification is also different.

      We decided not to include whole-brain DCM analyses in this manuscript and not to report whole-brain activation results extensively, as the study was primarily hypothesis-driven with a focus on cerebellum-VTA interactions. While we recognize that whole-brain analyses are of interest and plan to explore them in future work, they were considered outside the scope of the current paper. This deviation from the preregistration is now explicitly noted in the revised manuscript.

      Regarding the sample size justification, the preregistration contained an error: the parameters were reported incorrectly. The correct sample size justification was already provided in the original 2019 grant application and is correctly reported in the current manuscript. The underlying power analysis was the same, but with different alpha levels depending on whether the study involved healthy participants (where larger samples are feasible) or rare patient populations (where stricter alpha levels are not practical). We have clarified this point in the manuscript under deviations from the preregistration.

      Additional changes made in manuscript by authors

      To provide a complete overview, we also note changes made independently of specific reviewer comments:

      Methods

      In the computational modeling section, “reextinction” was mistakenly mentioned where “reacquisition phase” was intended (the initial phase of the volatile phase before experience replay). This has been corrected.

      The term “trial sequence” is used in computational modeling, whereas counterbalancing in the fear conditioning methods used different terminology. We added a clarifying sentence in the modeling section to make this consistent.

      References in the pupil size analysis section (Jentsch et al. 2020; Mathôt et al. 2017) were misplaced and have now been moved earlier in the sentence.

      The citation for MRIcroGL software was updated to the current Nature Methods reference.

      We added a reference to Doubliez et al. 2025 which used the same three-day paradigm in a behavioral study showing similar physiological responses.

      Supplementary information

      During revision, we noted that the SCR statistics had been computed on an earlier preprocessed dataset version, whereas the finalized corrected dataset was already used for plotting and for estimating prediction and prediction-error values in the reinforcement-learning model. We therefore recomputed the SCR statistics on the finalized dataset for the sake of consistency; this did not change any main effects, interactions, or conclusions, with the only difference being an exploratory late-acquisition CS+/CS- post hoc comparison shifting from non-significant to p < 0.05 (interaction still non-significant). Updated statistics are reported in the Supplementary Information.

      Post hoc significant differences in Table S3 are now marked in bold, as the formatting was missing previously.

      To align behavioral analyses more closely with the event-based fMRI approach, we additionally examined physiological responses using a first three versus last three trial division within each phase. These analyses yielded patterns consistent with those obtained using the original early/late block division and are reported in the Supplementary Information.

      We added a new supplementary figure (Figure S4) showing the location of the cerebellar VOI on a SUIT flatmap and added a corresponding cross-reference in the Methods section (Volumes of interest (VOI) definition).

      References

      Bouton, M. E. (2002). Context, ambiguity, and unlearning: sources of relapse after behavioral extinction. Biological Psychiatry, 52(10), 976–986. https://doi.org/10.1016/S0006-3223(02)01546-9

      Bouton, M. E. (2004). Context and Behavioral Processes in Extinction. Learning & Memory, 11(5), 485–494. https://doi.org/10.1101/lm.78804

      Constantinou, E., Purves, K. L., McGregor, T., Lester, K. J., Barry, T. J., Treanor, M., Craske, M. G., & Eley, T. C. (2021). Measuring fear: Association among different measures of fear learning. Journal of Behavior Therapy and Experimental Psychiatry, 70, 101618. https://doi.org/10.1016/j.jbtep.2020.101618

      Craske, M. G., Treanor, M., Conway, C. C., Zbozinek, T., & Vervliet, B. (2014). Maximizing exposure therapy: An inhibitory learning approach. Behaviour Research and Therapy, 58, 10–23. https://doi.org/10.1016/j.brat.2014.04.006

      Doubliez, A., Köster, K., Müntefering, L., Nio, E., Diekmann, N., Thieme, A., Albayrak, B., Nicksirat, S. A., Erdlenbruch, F., Batsikadze, G., Ernst, T. M., Cheng, S., Merz, C. J., & Timmann, D. (2025). Dopaminergic drugs modulate fear extinction-related processes in humans, but effects are mild. Brain Communications, 7(5), fcaf333. https://doi.org/10.1093/braincomms/fcaf333

      Ernst, T. M., Brol, A. E., Gratz, M., Ritter, C., Bingel, U., Schlamann, M., Maderwald, S., Quick, H. H., Merz, C. J., & Timmann, D. (2019). The cerebellum is involved in processing of predictions and prediction errors in a fear conditioning paradigm. eLife, 8, e46831. https://doi.org/10.7554/eLife.46831

      Hermans, D., Craske, M. G., Mineka, S., & Lovibond, P. F. (2006). Extinction in Human Fear Conditioning. Biological Psychiatry, 60(4), 361–368. https://doi.org/10.1016/j.biopsych.2005.10.006

      Kalisch, R., Gerlicher, A. M. V., & Duvarci, S. (2019). A Dopaminergic Basis for Fear Extinction. Trends in Cognitive Sciences, 23(4), 274–277. https://doi.org/10.1016/j.tics.2019.01.013

      Lipp, O. V., Oughton, N., & LeLievre, J. (2003). Evaluative learning in human Pavlovian conditioning: Extinct, but still there? Learning and Motivation, 34(3), 219–239. https://doi.org/10.1016/S0023-9690(03)00011-0

      Lonsdorf, T. B., Menz, M. M., Andreatta, M., Fullana, M. A., Golkar, A., Haaker, J., Heitland, I., Hermann, A., Kuhn, M., Kruse, O., Meir Drexler, S., Meulders, A., Nees, F., Pittig, A., Richter, J., Römer, S., Shiban, Y., Schmitz, A., Straube, B., … Merz, C. J. (2017). Don’t fear ‘fear conditioning’: Methodological considerations for the design and analysis of studies on human fear acquisition, extinction, and return of fear. Neuroscience and Biobehavioral Reviews, 77, 247–285. https://doi.org/10.1016/j.neubiorev.2017.02.026

      Milad, M. R., & Quirk, G. J. (2012). Fear Extinction as a Model for Translational Neuroscience: Ten Years of Progress. Annual Review of Psychology, 63(1), 129–151. https://doi.org/10.1146/annurev.psych.121208.131631

      Salinas-Hernández, X. I., Vogel, P., Betz, S., Kalisch, R., Sigurdsson, T., & Duvarci, S. (2018). Dopamine neurons drive fear extinction learning by signaling the omission of expected aversive outcomes. eLife, 7, e38818. https://doi.org/10.7554/eLife.38818

      Salinas-Hernández, X. I., Zafiri, D., Sigurdsson, T., & Duvarci, S. (2023). Functional architecture of dopamine neurons driving fear extinction learning. Neuron, 111(23), 3854-3870.e5. https://doi.org/10.1016/j.neuron.2023.08.025

      Vervliet, B., Craske, M. G., & Hermans, D. (2013). Fear extinction and relapse: State of the art. Annual Review of Clinical Psychology, 9, 215–248. https://doi.org/10.1146/annurev-clinpsy-050212-185542

    1. Author Response:

      Reviewer #1 (Public review):

      Summary and Strengths:

      Shin et al. deepen our understanding of high-frequency oscillations in the frontal cortex during REM in a manner that sheds important light on the roles of these events. In particular, they reveal that cortical HFOs are modulated by theta oscillations, occur in chains and recruit cortical neuronal activation patterns in a manner that is distinct from other high-frequency events during non-REM or in the hippocampus. They also show that these events occur during increased oscillatory cross-talk between hippocampus and cortex and may protect cortical neurons from downregulation of firing during sleep. Overall, this is important work with several novel observations pointing towards an important role for these events that will become increasingly understood over time.

      I also wanted to comment that 2D is a beautiful illustration of separate and essentially exclusive communication channels used during HF events in NREM vs REM. They almost perfectly complement each other's frequencies.

      We thank the Reviewer for the positive comments and for highlighting the importance of our work, especially the distinct communication patterns during NREM and REM cortical high-frequency events.

      Weaknesses:

      I have only one major scientific critique: I believe we need to see quantification of how phasic REM theta waves with versus without HFOs differ. What do REM HFOs add to the "normal" theta oscillation? Without this comparison, it is more difficult to interpret the meaning of these events. Given that HFO chains have IEIs around the time of a theta cycle duration, are the repeating spiking activities stronger during HFO repeats than during adjacent theta waves without HFOs?

      We agree with the Reviewer that comparing activity during HFOs with activity during theta in the absence of HFOs is important for determining whether activity during HFOs reflects a unique state of information processing during REM sleep, or is redundant with theta oscillation signatures. We attempt to clarify this point in Figure S4I, where we examined PFC population activity during theta periods outside of HFOs. Here, we extracted REM theta periods at least 250 ms away from detected HFOs and split the theta cycles into quartiles based on the theta power at the preferred theta phase bin determined by theta-coupled HFOs (during that specific sleep session). We expect that using the preferred phase of HFOs is the most accurate choice for this comparison (compared to random phases). Lastly, we aligned PFC population activity to these theta phases and found that even in the highest theta power quartile, theta-modulated fluctuations in PFC population activity were absent without HFOs. This indicates that theta-associated HFOs are the primary driver or signature of the observed population activity patterns (Figures 1H, 3F, S4I). An explanation of this procedure can be found in the Methods section under “Control for periods of high theta power”.
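
      The selection and quartile split described above can be sketched as follows. This is a minimal illustration, not the analysis code from the manuscript: it assumes each theta cycle has already been reduced to one time stamp and one power value at the preferred phase bin, and the function names and example values are hypothetical.

```python
import numpy as np


def select_control_cycles(cycle_times, hfo_times, min_gap_s=0.25):
    """Boolean mask of theta cycles at least min_gap_s away from every HFO."""
    cycle_times = np.asarray(cycle_times, dtype=float)
    hfo_times = np.asarray(hfo_times, dtype=float)
    if hfo_times.size == 0:
        return np.ones(cycle_times.shape, dtype=bool)
    # Distance from each cycle to its nearest detected HFO.
    gaps = np.abs(cycle_times[:, None] - hfo_times[None, :]).min(axis=1)
    return gaps >= min_gap_s


def power_quartiles(power):
    """Assign each cycle a quartile index (0 = lowest, 3 = highest power)."""
    power = np.asarray(power, dtype=float)
    edges = np.quantile(power, [0.25, 0.5, 0.75])
    return np.digitize(power, edges)


# Hypothetical cycle times (s), HFO times (s), and per-cycle theta power.
cycle_times = np.array([0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
hfo_times = np.array([0.45, 2.1])
theta_power = np.array([3.0, 9.0, 1.0, 4.0, 8.0, 2.0, 6.0, 5.0, 7.0])

keep = select_control_cycles(cycle_times, hfo_times)
quartile = power_quartiles(theta_power[keep])
top_quartile_times = cycle_times[keep][quartile == 3]
print(top_quartile_times)
```

      Population activity would then be aligned to the retained cycles within each quartile, so that the highest-power theta cycles without HFOs can be compared against HFO-containing cycles.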

      Regarding the comment “what REM HFOs add to the "normal" theta oscillation”, we hypothesize that the generation of HFOs and associated population activity is the result of theta-mediated input from other brain regions that converge on PFC. CA1 is a candidate region, since we observed that theta frequency activity in CA1 leads PFC (Figure 4K, Phase slope index result). Additionally, the high concentration of acetylcholine and the high inhibitory tone in REM sleep are conducive to local suppression in response to external drive, as shown in the model and noted in the Discussion. Thus, we propose that HFOs delineate transient windows where sparse populations of PFC neurons are activated in the backdrop of overall suppression, potentially to link specific ensembles across PFC and other brain areas such as the hippocampus – a phenomenon that differs from baseline theta activity in REM.

      To address this point, we will provide additional analyses investigating PFC activity profiles during theta periods adjacent to HFOs. We will also reorganize the results and figures to highlight these important control analyses.

      What percentage of theta waves contain HFOs, and what is the firing rate during those theta waves with vs without HFOs? Is there differential firing rate modulation? The authors may even consider that all REM-HFO-specific quantifications should be shown as differential from phasic theta cycles without HFOs.

      To address these points, we will perform the requested analyses and explicitly quantify firing rate differences during HFO and non-HFO theta periods for further clarification.

      As a non-scientific comment on the manuscript itself: unfortunately, the paper is difficult to read and understand at times, requiring great effort by the reader. This is to an extent that communication is hindered. The paper is dense with changing methods, often from panel to panel. Unfortunately, the panel quantifications are not explained in the results section in a manner that readers can understand without going to read the methods, often for each individual panel. These measures should be explained in a way that lets readers understand the conclusions of each panel and what gross calculations were used to reach those. Instead, too much jargon is used rather than clear descriptions of the overall calculations being done for each panel.

      The point is well-taken and we apologize for the dense text and lack of methodological detail in the results section. We agree with the Reviewer that enhancing clarity and adding additional details about the quantitative methods within the main text and figure panels/legends would improve readability and make the manuscript more accessible for a wider audience.

      To address this point, we will include important details in the results section and legends to clarify the methods and calculations used. We will also reorganize the manuscript text and reorder some figure panels for readability, and update the Methods section to parallel the Results/Figure order to the extent possible.

      The authors mention in the discussion section that they see increased functional connectivity between mPFC and CA1, but most data suggesting this seems to be based on LFP rather than spiking. Functional connectivity is best defined by spiking-spiking relationships. And these authors have spiking data. So I believe either the descriptive language should be pulled back to something like "oscillatory coupling" or more analyses should be dedicated to showing spike-spike coordination across regions.

      To address this point, we will temper the claims of functional connectivity and replace all instances with “oscillatory coupling”.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigate high-frequency oscillations (HFOs) in the prefrontal cortex during REM sleep. They identify a specific pattern where these HFOs occur in "chains" that are phase-locked to theta oscillations, primarily during the "phasic" periods of REM. The study contrasts these events with isolated HFOs and NREM ripples, suggesting a unique role for these chains in coordinating activity between the prefrontal cortex and the hippocampus. Most notably, the authors report that a specific subset of hippocampal cells (those that co-fire with the prefrontal cortex during these HFOs) increase their firing rates over the course of sleep, suggesting a potential mechanism for selective memory consolidation.

      Strengths:

      The study addresses an under-explored area of sleep physiology: the fine-grained temporal coordination between the cortex and hippocampus during REM sleep. The identification of HFO "chains" and their association with higher theta power provides an interesting framework for understanding how the brain might organize information transfer outside of NREM sleep. The observation that specific hippocampal populations show differential firing rate changes based on their participation in these HFO events is a striking finding that warrants further investigation.

      We thank the Reviewer for finding our work interesting and for the positive comments regarding our manuscript.

      Weaknesses:

      The primary weakness of the study lies in the lack of a clear distinction between global brain states and the specific events being analyzed. Because the authors compare HFOs across different sleep stages (NREM, tonic REM, and phasic REM) without sufficient controls, it is difficult to determine if the observed differences are intrinsic to the HFOs themselves or simply a reflection of the different physiological states in which they occur.

      We appreciate this concern. We agree that the generation of these ripples/HFOs in NREM and REM sleep is inextricably linked to global brain state (e.g., cholinergic tone, as shown in the model), which results in differing patterns of activity across sleep states. However, we also show that activity associated with ripples and HFOs in NREM and REM sleep, respectively, delineates unique periods that underlie intra- and interregional interactions that differ from activity associated with other phenomena, such as spindles or baseline theta periods, in each respective sleep state. Regarding NREM PFC ripples, in our previous publication (Shin and Jadhav 2024), we show that PFC ripples are strongly associated with spindles and slow oscillations, but when PFC activity was assessed by aligning to each of these events separately, we observed significant differences in activity profiles (Shin and Jadhav 2024), indicating that NREM PFC ripples are indeed periods of differential PFC activity during which local reactivation is particularly strong. Similarly, here, in REM sleep, we see that PFC HFOs are strongly coupled with gamma oscillations and that these two frequency bands separately engage PFC neurons (Figures 2C, S3J, differences in phase locking preference of PFC neurons to gamma and HFO). While we observed strong theta-modulated neuronal population activity in response to HFOs (Figure 1H), we did not observe the same for gamma events that were uncoupled from HFOs (Figure S3L, right). However, we did observe the population activity suppression when examining gamma events that were coupled with HFOs, but the theta-modulated activity was largely absent (Figure S3L, left), indicating that, in terms of higher frequency oscillations, precise alignment to HFOs drives the theta-modulated activity. Furthermore, we provide a control for baseline theta periods outside of HFOs to demonstrate that the phasic, theta-modulated activity (Figures 1H, 3F) is due to association with HFOs, and not a common feature during baseline theta activity (Figure S4I). Together, these results demonstrate that the theta-modulated, phasic PFC activity that we report is primarily associated with the presence of HFOs.

      To address this point, we will provide a more detailed explanation for the theta controls that we performed, and conduct additional analyses to control for different baseline periods during REM sleep, similar to the response to Reviewer 1’s first comment.

      Furthermore, the evidence for "structured reactivation" is not yet convincing. The temporal alignment of these reactivation events appears inconsistent, with peaks occurring well before the HFO itself, and the analysis does not sufficiently control for pre-existing cellular assembly strengths.

      We thank the Reviewer for raising these important points. Regarding the temporal alignment of assemblies during REM HFOs, since gamma activity is linked to and precedes HFO activity in REM (Figure S3F,G), we posit that assembly activation preceding HFO alignment may be gamma frequency driven. Indeed, we do observe gamma-associated peaks in PFC population activity temporally adjacent to the start of HFO chains in REM (Figure S5F), which we propose is driving the assembly activation.

      Related to our response to Reviewer 1, the hypothesis that we have regarding this finding is that theta-mediated input to PFC, possibly from several brain areas including the hippocampus, converges and elicits cross-frequency activity spanning gamma and HFO bands. We hypothesize that these gamma and HFO oscillations work in concert to evoke the structured reactivation.

      Furthermore, as the Reviewer accurately points out, we are not able to determine whether the assembly patterns active during the REM HFOs pre-existed prior to their assessment during sleep. Since there was not enough REM sleep during the earlier sleep epochs, we were not able to investigate assembly activation patterns during REM in the first pre-task sleep session prior to W-Track exposure.

      To address these points, we will provide additional support for our claims, add clarification to major points, and expand on the methods used to assess structured reactivation. We will also analyze the spatial rate maps of assemblies during behavior on the W-Track and attempt to link these representations to assembly activity during REM HFOs. If sufficient controls cannot be provided, we will temper the claims of “reactivation” and replace all mentions with assembly “activation”.

      Additionally, some of the sleep architecture presented appears atypical, such as very short REM bouts and direct NREM-to-REM transitions that bypass standard progression, raising questions about the consistency of the sleep detection across animals.

      The Reviewer is presumably referring to the hypnograms in Figure S1H. In Figure S1H, we presented concatenated hypnograms across all 9 sleep sessions, regardless of whether they were included for analysis. Furthermore, these hypnograms illustrate the output of just the sleep scoring algorithm and do not take into account the secondary, manual inspection that is performed to confirm sleep epoch inclusion. Individual epoch sleep state plots (e.g. Figure S1B) were visually inspected to confirm robust increases in theta-to-delta ratio detected in the absence of movement – epochs where microarousals or persistent subthreshold fluctuations in animal movement induced noisy TD ratio increases, and thus inaccurate REM designation, were excluded. We also want to note that omitting the edge cases, which are a minor part of the REM sleep data, does not change any results.

      Another consideration is that these animals were running a strenuous learning task that required repeated traversal of multiple maze arms over multiple behavioral sessions, which likely increased sleep pressure and thus may have altered sleep state dynamics in a subset of animals (Leemburg et al. 2010; Yang et al. 2012).

      To address these points, we will provide updated hypnograms that explicitly highlight the epochs used in analysis to resolve ambiguities. We will also further demonstrate that our procedure for sleep state designation is accurate and consistent across animals with supporting materials, including additional sleep stage classification examples, and REM-specific sleep examples marking tonic and phasic REM.
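
      The scoring logic described in this response (candidate REM where the theta-to-delta ratio is elevated in the absence of movement, with short bouts discarded) can be sketched roughly as below. This is a minimal illustration under assumptions, not the scoring algorithm used in the study: the function name, the one-sample-per-second layout, and both threshold parameters are hypothetical, and the only value taken from the responses is the >10 s minimum REM duration. The secondary manual inspection described above is not modeled.

```python
def rem_bouts(td_ratio, speed, td_thresh, speed_thresh, min_len=10):
    """Return (start, end) index pairs of candidate REM bouts (end exclusive).

    td_ratio, speed: per-second theta-to-delta ratio and movement speed
    (hypothetical layout, one sample per second).
    td_thresh, speed_thresh: hypothetical thresholds; a sample counts as
    candidate REM when td_ratio is above and speed is below its threshold.
    min_len: bouts must last longer than this many seconds to be kept.
    """
    n = min(len(td_ratio), len(speed))
    bouts = []
    start = None
    for i in range(n):
        is_rem = td_ratio[i] > td_thresh and speed[i] < speed_thresh
        if is_rem and start is None:
            start = i                      # bout begins
        elif not is_rem and start is not None:
            if i - start > min_len:        # keep only bouts longer than min_len
                bouts.append((start, i))
            start = None
    if start is not None and n - start > min_len:
        bouts.append((start, n))           # bout running to the end of the record
    return bouts
```

      In this sketch a single movement sample splits a bout, so a brief microarousal in the middle of a marginal bout can push both halves below the duration criterion.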

      Finally, the study does not account for potential confounds like baseline firing rates when interpreting the behavior of "high-cofiring" neurons, which may simply be the most active cells in the population.

      When we compared low and high cofiring neurons in CA1, we did indeed compare baseline firing rates between the two groups and found no differences. We compared both mean firing rates across entire sleep sessions as well as mean firing rates restricted to REM sleep (Figure S7A). We apologize that this important control was not emphasized more clearly.

      To address this point, we will explicitly reference this figure in the main text as a standalone point.

      Reviewer #3 (Public review):

      Summary:

      Shin et al. examine hippocampal-prefrontal interactions during sleep using simultaneous CA1 and prefrontal cortex recordings in rats performing a spatial memory task. They identify high-frequency oscillation (HFO) events in PFC during REM sleep that occur in theta-modulated chains and are associated with increased CA1-PFC coherence and sequential, sparse reactivation of cortical ensembles. This pattern contrasts with the synchronous reactivation observed during NREM cortical ripples. Together with a simple cholinergic network model, the authors propose that REM HFO chains represent a distinct mechanism for hippocampal-cortical coordination that complements NREM ripple-mediated processing during sleep.

      Strengths:

      A major strength of the work is the extensive electrophysiological dataset, which includes simultaneous recordings of large neuronal populations in both hippocampus and prefrontal cortex across behaviour and subsequent sleep. The analyses linking high-frequency events to population dynamics, interregional coherence, and ensemble reactivation are technically sophisticated and provide an incredibly detailed description of REM-associated cortical activity patterns. In particular, the demonstration that REM HFOs occur in chains aligned to theta phase and organise sequential activation of cortical assemblies represents a potentially important advance in understanding the neural structure of REM sleep activity. The integration of experimental data with a computational model further provides a useful framework for interpreting the observed differences between REM and NREM network states in terms of neuromodulatory influences.

      We thank the Reviewer for finding our work important and for the positive comments regarding the manuscript.

      Weaknesses:

      While overall this study provides a highly valuable body of work, there are two primary limitations, which, if overcome, would provide substantially more significance to the overall characterisation of REM HFOs. Specifically:

      (1) Distinction from wake HFOs

      The results largely support the authors' claim that REM HFO chains represent a distinct pattern of neural coordination compared to NREM cortical ripples. The analyses consistently show differences between REM and NREM events in terms of neuronal modulation, ensemble structure, and interregional coupling. However, similar high-frequency events during wake are not examined. Since REM sleep shares several network features with wakefulness, including strong theta oscillations, evaluating whether comparable PFC HFOs occur during wake would provide clarity on whether these events are specific to REM sleep (and its associated functions) or represent a more general theta-associated phenomenon.

      We thank the Reviewer for this suggestion. Indeed, this is an important comparison to make, since electrophysiological patterns of activity are similar across wake and REM sleep states.

      To address this point, we will detect and analyze HFOs during running behavior on the W-Track to determine if they elicit similar, phasic population responses in PFC.

      (2) Link to memory consolidation

      The manuscript proposes throughout that REM HFO chains may contribute to memory consolidation by coordinating hippocampal-cortical reactivation, but the evidence for this functional role remains indirect. The authors do highlight this as a limitation of the study - the inability to link their findings to learning - but it is not clear why. Further details of the behaviour results should be included. If no learning occurred across the eight behavioural sessions, this should be reported. If learning did occur, but could not be linked to HFO events, this should also be reported.

      This point is well-taken and we will reduce emphasis on memory consolidation in the manuscript. We do want to note that the primary focus here was to investigate new cortical-hippocampal activity patterns during sleep states that are established to be important for memory consolidation, in this case, REM sleep. Indeed, several major discoveries of reactivation and cortical-hippocampal physiological patterns in rodent sleep and wake states thought to be important for memory consolidation were initially reported without a link to memory consolidation, e.g., NREM hippocampal reactivation and replay (Wilson and McNaughton 1994; Lee and Wilson 2002), cortical – hippocampal activity coordination in slow-wave sleep (Siapas and Wilson 1998; Ji and Wilson 2007), waking replay in hippocampus (Foster and Wilson 2006; Karlsson and Frank 2009), etc. As Reviewer 1 noted, we expect that an important role for these novel events reported here will become increasingly understood over time.

      The connection between learning and REM HFO activity is a line of investigation that we find very interesting. However, due to the experimental design and the rapid pace at which the animals learn this task (Shin, Tang, and Jadhav 2019), we were not able to robustly relate REM HFO activity to learning. Firstly, with our threshold criteria for REM sleep detection (>10 s) as well as a total REM sleep duration criterion for sessions, most of the sleep epochs included for analysis came from the later sessions when REM sleep was more abundant (Figure SF,G). Consequently, many of the sleep sessions following the earlier behavioral/learning sessions were excluded. Making a claim about the contribution of REM HFOs to the learning process requires the inclusion of REM sleep periods after each behavior session to examine incremental changes in response to learning. Furthermore, a comparison of these REM sleep periods to pre-task REM sleep (pre-task sleep session #1 prior to task exposure) is important to demonstrate that any changes are dependent on experience. However, we were unable to make this comparison due to lack of REM sleep in pre-task sleep session #1. It is likely that an investigation of the role of these novel events in memory consolidation may require rodent task designs that are known to require REM sleep, such as inference tasks (Abdou et al. 2024; Ellenbogen et al. 2007), motor learning (Nitsche et al. 2010), or emotional memory (van der Helm and Walker 2011; Cairney et al. 2015).

      To address this point, we will reinforce this as a limitation of our study, reduce emphasis on memory consolidation, and further clarify that we were not able to link REM HFO activity to learning. We will also include additional details about the behavioral results.

      References

      Abdou, K., M. Nomoto, M. H. Aly, A. Z. Ibrahim, K. Choko, R. Okubo-Suzuki, S. I. Muramatsu, and K. Inokuchi. 2024. 'Prefrontal coding of learned and inferred knowledge during REM and NREM sleep', Nat Commun, 15: 4566.

      Cairney, S. A., S. J. Durrant, R. Power, and P. A. Lewis. 2015. 'Complementary roles of slow-wave sleep and rapid eye movement sleep in emotional memory consolidation', Cereb Cortex, 25: 1565–75.

      Ellenbogen, J. M., P. T. Hu, J. D. Payne, D. Titone, and M. P. Walker. 2007. 'Human relational memory requires time and sleep', Proc Natl Acad Sci U S A, 104: 7723–8.

      Foster, D. J., and M. A. Wilson. 2006. 'Reverse replay of behavioural sequences in hippocampal place cells during the awake state', Nature, 440: 680–3.

      Ji, D., and M. A. Wilson. 2007. 'Coordinated memory replay in the visual cortex and hippocampus during sleep', Nat Neurosci, 10: 100–7.

      Karlsson, M. P., and L. M. Frank. 2009. 'Awake replay of remote experiences in the hippocampus', Nat Neurosci, 12: 913–8.

      Lee, A. K., and M. A. Wilson. 2002. 'Memory of sequential experience in the hippocampus during slow wave sleep', Neuron, 36: 1183–94.

      Leemburg, S., V. V. Vyazovskiy, U. Olcese, C. L. Bassetti, G. Tononi, and C. Cirelli. 2010. 'Sleep homeostasis in the rat is preserved during chronic sleep restriction', Proc Natl Acad Sci U S A, 107: 15939–44.

      Nitsche, M. A., M. Jakoubkova, N. Thirugnanasambandam, L. Schmalfuss, S. Hullemann, K. Sonka, W. Paulus, C. Trenkwalder, and S. Happe. 2010. 'Contribution of the premotor cortex to consolidation of motor sequence learning in humans during sleep', J Neurophysiol, 104: 2603–14.

      Shin, J. D., and S. P. Jadhav. 2024. 'Prefrontal cortical ripples mediate top-down suppression of hippocampal reactivation during sleep memory consolidation', Curr Biol, 34: 2801–11 e9.

      Shin, J. D., W. Tang, and S. P. Jadhav. 2019. 'Dynamics of Awake Hippocampal-Prefrontal Replay for Spatial Learning and Memory-Guided Decision Making', Neuron, 104: 1110–25 e7.

      Siapas, A. G., and M. A. Wilson. 1998. 'Coordinated interactions between hippocampal ripples and cortical spindles during slow-wave sleep', Neuron, 21: 1123–8.

      van der Helm, E., and M. P. Walker. 2011. 'Sleep and Emotional Memory Processing', Sleep Med Clin, 6: 31–43.

      Wilson, M. A., and B. L. McNaughton. 1994. 'Reactivation of hippocampal ensemble memories during sleep', Science, 265: 676–9.

      Yang, S. R., H. Sun, Z. L. Huang, M. H. Yao, and W. M. Qu. 2012. 'Repeated sleep restriction in adolescent rats altered sleep patterns and impaired spatial learning/memory ability', Sleep, 35: 849–59.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Here, the authors attempted to test whether the function of Mettl5 in sleep regulation was conserved in Drosophila, and if so, by which molecular mechanisms. To do so they performed sleep analysis, as well as RNA-seq and Ribo-seq in order to identify the downstream targets. They found that the loss of one copy of Mettl5 affects sleep, and that its catalytic activity is important for this function. Transcriptional and proteomic analyses show that multiple pathways were altered, including the clock signaling pathway and the proteasome. Based on these changes the authors propose that Mettl5 modulates sleep through regulation of the clock genes, both at the level of their production and degradation, possibly by altering the usage of the aspartate codon.

      Comments on revised version:

      The authors satisfactorily addressed my comments, even though the precise mechanism by which Mettl5 regulates translation of clock genes remains to be firmly demonstrated.

      Reviewer #3 (Public review):

      Xiaoyu Wu and colleagues examined a potential role in sleep of a Drosophila ribosomal RNA methyltransferase, mettl5. Based on sleep defects reported in CRISPR generated mutants, the authors performed both RNA-seq and Ribo-seq analyses of head tissue from mutants and compared to control animals collected at the same time point. A major conclusion was that the mutant showed altered expression of circadian clock genes, and that the altered expression of the period gene in particular accounted for the sleep defect reported in the mettl5 mutant. In this revision, the authors have added a more thorough analysis of clock gene expression and show that PER protein levels are increased relative to wild type animals at specific times of day, indicating increased stability of the protein. Given that PER inhibits its own transcription, the per RNA is low in the mutants. Efforts toward a more detailed understanding of how clock gene expression was altered in the mutants, as well as other clarification of sleep phenotypes throughout, are appreciated. As noted above, a strength of this work is its relevance to a human developmental disorder as well as the transcriptomic and ribosomal profiling of the mutant. However, there still remain some minor weaknesses in the manuscript. This reviewer is not in agreement with the interpretation of the epigenetic experiments. Specifically, co-expression of Clk[jrk] or per [01] with the mettl5 mutant recovered the nighttime sleep phenotype, but was additive to the daytime sleep phenotype such that double mutants showed higher sleep. This effect should be acknowledged and discussed. Overall, this is an interesting paper that indicates a molecular link between mettl5 and the circadian clock in regulation of sleep.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors misunderstood my original comment for Fig 1A. Please provide an explanation for the significance of the boxed region. There is little or no detail in the legend to help guide the reader.

      The information has been added to the figure legends for Figure 1A.

      Efforts toward improving analysis of circadian genes as well as sleep phenotypes (sleep onset time, rebound, etc) is much appreciated, thank you. However, Figure S1H and G panel labels are mixed up; please label in the order that they appear and that they correspond to the main text. Why is Figure S1H labeled "ZT 14"?

      Sleep latency is defined as the time from preparing to sleep to actually falling asleep. In this study, it specifically refers to the time taken for each individual fly to reach the sleep phenotype (i.e., 25 minutes of continuous sleep). We noted that this label was misleading, as the actual time to reach the sleep phenotype varied among individual flies. Therefore, in the revised figures, we have removed the ZT14 label. In addition, we have corrected the labeling of Figures S1G and S1H to ensure they appear in the correct order and correspond accurately to the descriptions in the main text.
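
      The per-fly latency measure described here can be illustrated with a short sketch. It is an assumption-laden example rather than the analysis used in the study: the function name and the per-minute 0/1 input are hypothetical, and latency is taken here as the time to the onset of the first 25-minute run of continuous sleep, which is one possible reading of "time taken to reach the sleep phenotype".

```python
def sleep_latency(asleep, run_len=25):
    """Minutes from the start of the record to the onset of the first run
    of run_len consecutive sleep minutes; None if no such run occurs.

    asleep: per-minute flags for one fly (1 = asleep, 0 = awake),
    starting at the reference time point (hypothetical layout).
    """
    run = 0
    for minute, flag in enumerate(asleep):
        run = run + 1 if flag else 0       # length of the current sleep run
        if run == run_len:
            return minute - run_len + 1    # index where that run began
    return None
```

      Because the onset differs from fly to fly, the result is a per-individual value rather than a single fixed time point.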

      Unfortunately, based on Fig S1A-C, I am not convinced that mettl5 localizes to neurons, as there are no cells that show double labelling. This figure does not support the statement: "we found expression in both neurons (colocalizing with ELAV staining: Figure S1A-C) (lines 91-92), and "Mettl5-Gal4 is expressed in distinct neurons and glia that appear crucial for sleep regulation." (line 297). What "distinct" sleep related neurons were labeled? The staining in Fig S1A shows a different distribution from that in Fig S1D, and so it's possible this was a technical issue. Is there a better example?

      Thank you for your careful review and valuable comments. We agree that the colocalization of METTL5 with the neuronal marker ELAV is relatively sparse. However, as indicated by the arrows in Fig S1A–C, we did observe a few cells showing clear double labeling. These examples support the presence of METTL5 expression in neurons, albeit at a low frequency.

      In Figure 4G-H, please indicate the time of day of tissue collection.

      In Figure 4G-H, the tissue was collected at ZT0. We have now indicated this time point in the figure and legend to clarify the experimental timing.

      As noted in the public comment, I remain in disagreement with the assessment that "the double mutant showed the similar phenotype as downstream genes". The striking significant increase in daytime sleep in the double mutants remains unexplained. No further experiments are necessary, but this should be acknowledged in the text. Instead of an epistatic effect, given that overall sleep is high in the double mutants, another possible explanation is that the flies are sick and so are less active and sleeping more.

      Thank you for your suggestion. This has been acknowledged in the text. “Genetic epistasis experiments further supported this model, with clock gene mutants modified Mettl5 mutant phenotypes that suggesting both Clock and Per downstream of Mettl5 (Figure 4I-N, Table 1). Secondary effect may exist for the significant increase in daytime sleep in the double mutants.”

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Fisher et al describes the molecular mechanism underlying how G beta gamma subunits engage with the beta 3 isoform of PLC. The paper used a combination of cryo EM, BRET assays, and biochemical assays of PLC beta activity. A key discovery is that G beta gamma is not sufficient to drive membrane binding by itself, and instead promotes G alpha activation. The work is important, but suffers slightly from some ambiguity in the actual interface that is present in their cryo EM model, as crosslinkers could stabilise a transient and non-native complex. This is somewhat abrogated by the careful mutational analysis, which shows that mutation of any of these three sites does somewhat block PLC beta G beta gamma activation. However, there could be some improvement in the presentation of this data, as well as possible mutant selection. Overall, this paper is a nice complement to the Falzone et al paper, which showed the membrane-bound complex of PLCB3; the present work builds on it, highlighting the importance this will have for our full understanding of PLC beta activation.

      Thank you for the positive feedback.

      Major concerns:

      My biggest concern is the potential that this interface is artefactual based on the crosslinking strategy utilised. Here are thoughts on how this could be better validated, presented in a more convincing way.

      (1) The authors' main claim is that there is a degree of plasticity of G beta gamma binding to the PLC beta 3 isoform, with three possible binding sites. The main complication of this is, of course, the possibility that the crosslinking stabilises a non-native complex, driven by a mutated cysteine.

      Because of this, any other additional details about this interface are going to be critical for the scientific audience to judge if this is accurate.

      What would greatly help Figure 1 is an evolutionary conservation analysis of the novel Gbg interface in PLC, to see how well this is conserved, and compare this to the conservation of the previously annotated sites. Conservation of these sites on both the G beta gamma and PLC side would help justify this as a native complex.

      This will also help orient the reader to the identity of the mutated residues assayed in Figure 3.

      We agree that crosslinking can result in the capture of a non-physiologically relevant interface. However, we do not observe any crosslinking between Gbg and a PLCb3 variant that retains a cysteine in the disordered region of the X–Y linker, nor crosslinking between PLCb3 and any other cysteine present in the Gbg heterodimer. The evolutionary conservation analysis is a great suggestion and will be included in the revision for both Gbg and PLCb.

      (2) The G beta gamma orientation is also different from what I have observed in previous G beta gamma effector structures. Is there any precedent for this as an effector interface? A supplemental figure comparing this structure to G beta gamma interfaces from other enzymes, for example the recent Tesmer structure with PI3K, would be helpful.

      Yes, this is not the more typically observed Gbg–effector interaction, which is mediated by the narrow face of the Gbg toroid. We are not aware of other structures in which Gbg interacts with a binding partner in the same way. A supplemental figure comparing this Gbg–PLCb interaction to the Gbg–PI3K and Gbg–GRK2 structures will be included in the revision.

      (3) The mutational analysis in Figure 2D-G seems to give some strange results, and I have some questions about why certain residues were chosen rather than others. Mutation of the Gbg side will be more complicated, as of course that can affect any of the three surfaces. My main question is that, from the way Figure 2A is oriented, the main salt bridge in their novel interface looks to me like R199-D228, with K183 being in the wrong orientation to E226, and D167 being far from any charged residues. Why did the authors not make the corresponding R199 to D or E mutation?

      Thank you for pointing this out. We are in the process of testing the PLCb3 R199E mutant in our assays and will include the results in the revised manuscript.

      (4) To help the reader's interpretation of Figure 2A, I would recommend a supplemental figure showing the density for interfacial residues, as that also would increase confidence in the interface.

      Thank you for the suggestion, this will be included in the revised manuscript.

      Reviewer #2 (Public review):

      In this manuscript, the authors dissect how Gβγ potentiates PLCβ3 signaling in cells. Using engineered crosslinking to stabilize a Gβγ-PLCβ3 complex, single particle cryo-EM, and cell-based functional assays, they identify and map multiple putative Gβγ interaction surfaces on PLCβ3, including a previously unrecognized binding mode. Structure-guided mutagenesis supports the functional relevance of these interactions and suggests that Gβγ potentiation is not primarily mediated by PLCβ3 membrane recruitment, but instead enhances PLCβ3 activity after the lipase is already at the membrane.

      Previous reconstitution work on the membrane surface (Falzone & MacKinnon, 2023) proposed a recruitment/partitioning-centric model in which Gβγ increases PLCβ3 output largely by elevating its membrane surface concentration, whereas Gαq primarily increases catalytic turnover; under those reconstitution conditions, the two inputs can combine approximately multiplicatively. In receptor-driven cellular signaling, however, PLCβ3 is robustly recruited to the plasma membrane upon Gαq activation, which raises the question of whether Gβγ contributes mainly through additional recruitment or through a post-recruitment mechanism once PLCβ3 is already at the membrane.

      This manuscript helps address that gap by using membrane-anchored PLCβ3 and complementary cellular readouts to separate "getting PLCβ3 to the membrane" from "boosting activity once PLCβ3 is already there." Their results argue that, in cells, membrane recruitment is largely dominated by Gαq·GTP, while Gβγ can further potentiate PIP2 hydrolysis after membrane association, consistent with a modulatory role at the membrane rather than primary recruitment.

      Overall, the work provides a structural and mechanistic framework for Gβγ-PLCβ3 cooperation and helps clarify the basis of Gq pathway amplification. The manuscript is generally strong, but some issues need to be addressed.

      Thank you for the positive comments.

      Major comments:

      (1) BMOE/BM(PEG)2 crosslinking may enforce a non-native docking geometry, potentially compromising the physiological relevance and precision of the Gβγ-PLCβ3 interface as described. Although a >50% 1:1 crosslinked complex is formed and remains active, the solution maps show lower local resolution for Gβγ, consistent with a dynamic, potentially heterogeneous, interface. One interface is captured via a single engineered cysteine pair (PLCβ3 E60C-Gβ C271), which could potentially bias the pose. It would be helpful if the authors could provide additional orthogonal support (e.g., alternative crosslinked sites) and bolster the clarification of its uniqueness and relevance.

      We did attempt to isolate other crosslinked complexes. PLCb3-D892 self-crosslinked under all reaction conditions, while PLCb3-D892 XY<sub>Cys</sub>, which retains an endogenous cysteine within the X–Y linker (C516), did not result in any crosslinked product when incubated with Gbg. Only the PLCb3-D892 E60C crosslinked to Gbg, as confirmed by SDS-PAGE and SEC. All experiments also used wild-type Gb, which contains two solvent-exposed cysteines in the effector binding site (C204 and C271). The greatest number of particles corresponds to crosslinking between Gb C271 and E60C in PLCb3-D892. Crosslinking between PLCb3-D892 E60C and other residues in Gbg is possible, but there are not sufficient particle numbers corresponding to these species for 2D classing and reconstruction. These observations, together with the high efficiency of crosslinking, are consistent with a stable and persistent interaction.

      (2) In the crosslinked structure, the authors report that Gβ D228 interacts with PLCβ3 R199 and K183. In Figure 2A, R199 appears closer to Gβ D228 than K183, yet only K183 is functionally tested. Testing R199 (e.g., R199E/R199A) would strengthen the structure-guided validation of this interface.

      We agree, and functional analysis of PLCb3 R199E will be included in the revision.

      (3) The mutagenesis strategy appears inconsistent across figures/assays, which makes it difficult to interpret phenotypes and directly link the functional data to the proposed interfaces. For example, in Figure 2E, we see R185L but R215E, while residue L40 is mutated to Gly in the IP accumulation assays but to Glu/Lys (L40E/K) in the BRET assays (Figures 3B/3D/3F). The authors should (i) clearly justify the rationale for each substitution (conservative vs charge-reversal, interface disruption, etc.) and (ii), where possible, test the same mutants across assays (or provide evidence that alternative substitutions yield consistent conclusions).

      The mutagenesis experiments were initially carried out independently in the Lambert and Lyon groups. As the study progressed, additional mutations were designed based on prior results. The L40G mutation is one such example. Given its modest impact on activity in the IP accumulation assay, the L40E and L40K mutants were designed to maximally disrupt the interface in the BRET experiments. The revision will include the rationale behind different substitutions and discussion of any potential differences.

      Reviewer #3 (Public review):

      Summary:

      PLCβ3 is activated by both Gαq and Gβγ subunits. This paper follows previous solutions and cryoEM studies of PLCβ3 / Gβγ, trying to understand the molecular details of activation using cellular BRET assays and cryoEM.

      Strengths:

      The authors find evidence for multiple binding sites on PLCβ3 for Gβγ and suggest that Gβγ is not a bona fide activator per se but enhances Gαq activation by positioning the catalytic site towards substrate, although this is not completely convincing. Although these sites may not naturally be operative, the authors might want to develop the potential role of these sites.

      The authors also find that this activation is not through recruitment of the enzyme to the membrane by Gβγ released upon G protein activation, in accord with other PLCβ enzymes, but not for PLCβ3, and again, the authors might want to develop this point further.

      Thank you for the suggestions.

      Weaknesses:

      (1) I'm confused as to why the authors feel that their mechanism is distinct from the two-state enzyme, the synergistic activation proposed by Ross in 2011, using a primarily thermodynamic argument. As written, the authors appear to be very reliant on structural and BRET studies that do not give the details that would disprove this interpretation. The main issue is that the authors' mechanism does not fully explain how Gβγ activation occurs for PLCβ2 in reconstituted systems in the absence of Gαq subunits.

      The reconstitution experiments rely on nM-mM concentrations of purified proteins and liposomes that contain up to 30% PI(4,5)P2. PLCb2 and PLCb3 show dose-dependent increases in activity with increasing concentrations of Gbg. PLCb enzymes that interact with the liposomes would encounter liposome-tethered Gbg subunits, which would in turn bind the lipase, tethering it to the membrane and helping position the active site for catalysis. While there is not yet experimental evidence that Gbg binding can displace the Ha2’ helix, it could facilitate interfacial activation given the net negative charge of PI(4,5)P2. In addition, PLCb2 is fundamentally different from the other PLCb isoforms in its sensitivity to heterotrimeric G proteins. Given its decreased sensitivity to Ga<sub>q</sub> and increased basal activity, it is possible that autoinhibition by the proximal CTD is weaker. PLCb2 is also abundantly expressed in neutrophils, along with more Gi-coupled receptors. Thus, it is possible that Gbg directly activates PLCb2 in these cells, but future experiments are required to definitively answer this question.

      (2) In a recent study, MacKinnon presents a model showing that Gαq and Gβγ activate PLCβ3 by two distinct pathways and that activation by Gβγ occurs through membrane recruitment. It is not surprising that the authors find that this is not true, since the pelleting method used by MacKinnon is subject to error. The authors should directly address the limitations of this previous work and the changes in proteoliposomes with sedimentation that alter partition coefficients. That said, the inability of Gβγ to drive membrane binding is in accord with the quantitative studies of Scarlata, which show that the affinity of PLCβ3 for Gβγ is fairly weak compared to the intrinsic membrane partition coefficient.

      Thank you for raising this point. The changes in composition, size, and structure when pelleting proteoliposomes may complicate data interpretation and will be discussed in the revision.

      (3) It was proposed many years ago that in signaling complexes Gαq - Gβγ may not have to fully dissociate when binding PLCβ, but rather shift their relative orientation when binding to PLCβ to allow activation. Is their model consistent with this? Is it possible that PLCβ3 keeps Gβγ from diffusing to enhance the rate of Gq / Gβγ re-association?

      The crosslinked complex is compatible with simultaneous binding of a Gbg –Gbg heterotrimer to the PLCb3 without disrupting the observed interface. It is possible that Gbg could interact with Gbg bound to the PH domain or the EF hands in the previously reported reconstruction. If so, the interaction would be mediated by the N-terminal helix of Gbg. Alternatively, the intrinsic GAP activity of PLCb3 may also prevent Gbg from diffusing to promote heterotrimer reassociation.

      (4) The authors find that Gβγ binds multiple sites, and it is clear that the PH domain site is the primary one in accord with previous work. Could these weaker sites be an artifact of the elevated concentrations used in cryoEM and BRET assays?

      Assuming the PH domain is the primary Gbg binding site, it is possible that the secondary EF hand site observed by Falzone and MacKinnon reflects high protein concentrations. However, it seems unlikely that we would reach these concentrations within cells. Our functional data are also consistent with the Gbg binding site in the EF hands playing a functional role in increasing PLCb activity.

      (5) Although their assays infer differences in binding affinities, it would strengthen the paper if the authors could estimate the association energies of these different binding sites. This estimation would also address the concern stated above.

      We appreciate this suggestion and will keep it in mind as we complete the revision.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study makes a fundamental contribution to our understanding of interocular suppression, particularly continuous flash suppression (CFS). Using neuroimaging data from two macaque monkeys, the study provides compelling evidence that CFS suppresses orientation responses in neurons within V1. These findings enrich the CFS literature by demonstrating that neural activity under CFS may prevent high-level visual and cognitive processing.

      Comments on revisions:

      The authors have addressed all my previous comments.

      Thanks for the very warm comments!

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to investigate the degree to which low-level stimulus features (i.e., grating orientation) are processed in V1 when stimuli are not consciously perceived under conditions of continuous flash suppression (CFS). The authors measured the activity of a population of V1 neurons at single neuron resolution in awake fixating monkeys while they viewed dichoptic stimuli that consisted of an oriented grating presented to one eye and a noise stimulus to the other eye. Under such conditions, the mask stimulus can prevent conscious perception of the grating stimulus. By measuring the activity of neurons (with Ca2+ imaging) that preferred one or the other eye, the authors tested the degree of orientation processing that occurs during CFS.

      Strengths:

      The greatest strength of this study is the spatial resolution of the measurement and the ability to quantify stimulus representations during CFS in populations of neurons preferring the eye stimulated by either the grating or the mask. There have been a number of prominent fMRI studies of CFS, but all of them have had the limitation of pooling responses across neurons preferring either eye, effectively measuring the summed response across ocular dominance columns. The ability to isolate separate populations offers an exciting opportunity to study the precise neural mechanisms that give rise to CFS, and potentially provide insights into nonconscious stimulus processing.

      Weaknesses:

      However, while this is an impressive experimental setup, the major weakness of this study is that the experiments don't advance any theoretical account of why CFS occurs or what CFS implies for conscious visual perception. There are two broad camps of thinking with regard to CFS. On the one hand, Watanabe et al. (2011) reported that V1 activity remained intact during CFS, implying that CFS interrupts stimulus processing downstream of V1. On the other hand, Yuval-Greenberg and Heeger (2013) showed that V1 activity is in fact reduced during CFS. By using a parametric experimental design, they measured the impact of the mask on the stimulus response as a function of contrast, and concluded that the mask reduces the gain of neural responses to the grating stimulus. They presented a theoretical model in which the mask effectively reduced the SNR of the grating, making it invisible in the same way that reducing contrast makes a stimulus invisible.

      In the first submission of the manuscript, the authors incorrectly described the Yuval-Greenberg & Heeger (2013) paper and Watanabe et al. (2011) papers, suggesting that they had observed the same or similar effects of CFS on V1 activity, when in fact they had described opposite results. Reviewer 1 also observed that the authors appeared to be confused in their reading of these highly relevant papers. In the revision, the authors have reworked this paragraph, now correctly describing these sets of opposing results. However, I still do not understand what the authors are trying to argue: "...these studies were not designed to quantify the pure effect of CFS on stimulus-evoked V1 responses." I do not understand what is meant by "pure" in this case.

      This is clarified as: “Nevertheless, these studies contrasted monocular and dichoptic masking conditions to equate stimulus input while manipulating perceptual visibility, which were not designed to quantify the pure effect of CFS on stimulus-evoked V1 responses, that is, the difference of BOLD signals between binocular masking and stimulus alone conditions.” (line 63)

      Regardless, it is clear that the measurements in the present study strongly support the interpretation of Yuval-Greenberg & Heeger (i.e., that V1 activity is degraded by CFS, 'akin' to a loss in the contrast-to-noise ratio of neural activity). It would be appropriate for the authors to communicate this clearly.

      We agree and added the following sentence in the text: “These results support the conclusion of Yuval-Greenberg and Heeger (2013) that V1 activity is degraded by CFS, ‘akin’ to a loss in the contrast-to-noise ratio of neural activity” (line 122)

      I continue to be of the opinion that this study is lacking an adequate model of interocular interactions that might explain the Ca2+ imaging. The machine learning results are not terribly surprising - multivariate methods, such as SVMs, are more sensitive than univariate approaches. So it is plausible that an SVM can support decoding of the coarse orientation information, even when no tuning is evident in the univariate analyses. However, the link between this result and the underlying neurophysiology is opaque. The failure to model the neural data with an explicit model is a missed opportunity.

      We agree and have put “An ocular-dominance-dependent gain control model” back into the text. Fig. 2D now shows the results of model fitting.

      (line 167)

      An ocular-dominance-dependent gain control model

      We developed an ocular dominance-dependent gain control model to account for the impact of CFS on V1 population orientation tuning. The model development followed two steps.

      Step I. Population orientation tuning functions before CFS

      The population orientation tuning functions due to monocular stimulation exhibited different amplitudes among OD groups (Fig. 2D, red curves), which could be simulated with Equation 1, an OD-weighted Gaussian basis function:

      where parameters A, σ, and B corresponded to the amplitude, standard deviation, and minimal response of the Gaussian basis function, respectively, and θ represented the preferred orientation of a bin of neurons relative to the actual orientation of the grating stimulus. The weight parameter w was the mean of linearly transformed ODIs of neurons in a neuronal group, which equated to (ODI + 1)/2 or 1 - (ODI + 1)/2, depending on contralateral or ipsilateral eye grating stimulation, and ranged from 0 to 1. Thus, a smaller w would indicate a higher preference for the eye seeing the grating, and a larger w would indicate a higher preference for the unstimulated eye (or the eye seeing the flashing masker under CFS). The value of w was 0.33, 0.50, and 0.67 in Monkey A, and 0.32, 0.50, and 0.68 in Monkey B, for the grating eye-preferring group, binocular group, and the masker eye-preferring group, respectively. The exponent s represented a nonlinear transformation.

      Equation 1 fitted the baseline data well (Fig. 2D, red curves), resulting in goodness-of-fit (R<sup>2</sup>) values of 0.94 and 0.95 for the two monkeys, respectively. This indicated that the equation captured the OD-dependent population orientation tuning characteristics of V1 neurons with monocular stimulation before CFS.
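
      As a reading aid, the verbal definitions in Step I can be sketched in code. Equation 1 itself is not reproduced in this response, so the functional form below, a Gaussian of amplitude A and width σ scaled by (1 − w)^s on top of a floor B, is an assumption chosen only to be consistent with the parameter descriptions; it should not be read as the authors' actual equation. The function names, the mapping of each ODI branch to an eye, and all parameter values are hypothetical.

```python
import numpy as np

def odi_to_weight(odi, grating_eye="contralateral"):
    """Map an ocular dominance index in [-1, 1] to a weight w in [0, 1].

    The text gives w = (ODI + 1)/2 or 1 - (ODI + 1)/2 depending on which eye
    sees the grating; which branch goes with which eye is assumed here.
    """
    w = (np.asarray(odi, dtype=float) + 1.0) / 2.0
    return w if grating_eye == "contralateral" else 1.0 - w

def monocular_tuning(theta_deg, w, A, sigma, B, s):
    """Assumed OD-weighted Gaussian: (1 - w)**s * A * exp(-theta^2 / (2 sigma^2)) + B."""
    theta = np.asarray(theta_deg, dtype=float)
    return (1.0 - w) ** s * A * np.exp(-theta**2 / (2.0 * sigma**2)) + B

theta = np.linspace(-90.0, 90.0, 13)   # preferred minus stimulus orientation (deg)
for w in (0.33, 0.50, 0.67):           # group weights quoted for Monkey A
    # A, sigma, B, and s are illustrative values, not fitted parameters
    r = monocular_tuning(theta, w, A=1.0, sigma=25.0, B=0.1, s=1.5)
    print(f"w={w:.2f}  peak={r.max():.3f}  floor={r.min():.3f}")
```

      Under this assumed form, a smaller w (stronger preference for the eye seeing the grating) yields a larger tuning amplitude, which matches the qualitative ordering of the OD groups described above.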

      Step II. The impacts of CFS

      In step II, the model introduced several binocular combination factors to account for population orientation tuning functions under CFS.

      To account for the OD-dependent changes of orientation tuning bandwidths under CFS, a w-dependent inhibition factor w<sup>t</sup> was introduced, which scaled the σ of the tuning functions, changing the monocular tunings R into R’:

      This allowed different groups of neurons to exhibit various degrees of orientation tuning function broadening, capturing the pattern in which neurons preferring the eye seeing the grating displayed a sharper population orientation tuning curve under CFS than those preferring the eye seeing the masker.

      Previous studies have shown that binocular neuronal responses can be modeled by incorporating interocular suppression and summation processes (Kato et al., 1981; Dougherty, Cox, Westerberg, & Maier, 2019; Zhang et al., 2024). Therefore, R’ was further normalized by the neural response to the flashing masker to simulate interocular suppression, which was the first component of Equation 3. Additionally, the neural response to the flashing masker was summed to simulate binocular summation, which was the second component of Equation 3. These two components, when summed, determined the final neural responses under CFS:

      where N was the empirical neural response to the monocularly presented flashing masker stimulation, a and b were scaling parameters, and k and m were nonlinearity parameters. The interocular normalization by masker response led to amplitude reduction of population orientation tuning functions for different groups of neurons, while the binocular summation with masker response elevated the minimal responses of tuning functions to their corresponding heights.

      During the step II model fitting, the parameters A, σ, and s were inherited from the monocular tuning fits derived from Equation 1 and served as inputs, while the parameters a, k, b, m, and t were optimized. The model captured the CFS modulation on population orientation tuning curves well, with R<sup>2</sup> = 0.99 and 0.98 for Monkeys A and B, respectively (Fig. 2D, red curves).
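
      The two-component structure described for Step II (divisive suppression by the masker response plus an additive masker term) can be illustrated with a minimal sketch. Equations 2 and 3 are not reproduced in this response, so the specific forms used here, R’ / (1 + a·N^k) for suppression and b·N^m for summation, are assumptions, and the width scaling of Equation 2 is not modeled at all: the broadened tuning R’ is simply taken as an input. The function name and all numerical values are hypothetical.

```python
import numpy as np

def cfs_response(r_prime, N, a, k, b, m):
    """Assumed two-component combination of tuning R' and masker response N.

    suppression: R' divided by (1 + a * N**k)   (assumed normalization form)
    summation:   b * N**m added to the result   (assumed additive form)
    """
    r_prime = np.asarray(r_prime, dtype=float)
    suppression = r_prime / (1.0 + a * N**k)
    summation = b * N**m
    return suppression + summation

theta = np.linspace(-90.0, 90.0, 13)
# Hypothetical broadened tuning R' with a floor of 0.1 (illustrative values only)
r_prime = 1.0 * np.exp(-theta**2 / (2.0 * 30.0**2)) + 0.1
r_cfs = cfs_response(r_prime, N=1.0, a=1.0, k=1.0, b=0.5, m=1.0)

print("amplitude before/after:", np.ptp(r_prime), np.ptp(r_cfs))
print("floor before/after:    ", r_prime.min(), r_cfs.min())
```

      With these illustrative values, the divisive term shrinks the peak-to-trough amplitude while the additive term raises the minimal response, which is the qualitative behavior the text attributes to the two components.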

      Reviewer #3 (Public review):

      Summary:

      In this study, Tang, Yu & colleagues investigate the impact of continuous flash suppression (CFS) on the responses of V1 neurons using 2-photon calcium imaging. They report that CFS substantially suppressed V1 orientation responses. This suppression happens in a graded fashion depending on the binocular preference of the neuron: neurons preferring the eye that was presented with the masker stimuli were most suppressed, while the neurons preferring the eye to which the grating stimuli were presented were least suppressed. Binocular neurons exhibited an intermediate level of suppression.

      Strengths:

      The imaging techniques are cutting-edge.

      Weaknesses:

      The strength of CFS suppression varies across animals, but the authors attribute this to comparable heterogeneity in the human psychophysics literature.

      Comments on revisions:

      The authors have addressed my comments from the previous round of review, and I have no further comments.

      Thanks!

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Recognition Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (i) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (ii) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping, across the whole genome, to ensure full understanding and clarity.

      In the revised manuscript, the authors have improved the presentation and analysis of their data, expanding the description of SNS-seq mapping across the genome, and more clearly assessing to what extent there is correlation between SNS-seq signal and previous mapping approaches to predict origins (by MFA-seq and ChIP-chip of ORC1/CDC6). With regard to the correlation between SNS-seq and ORC1/CDC6 ChIP-chip, it should be noted that the two datasets were generated in distinct strains of T. brucei (Lister 427 and TREU927, respectively), and it is unclear if the latter dataset can be accurately mapped to the strain used here. Notwithstanding this concern, these improvements clarify a number of aspects of the SNS-seq mapping: (1) the signal is more prevalent in the transcribed core of the genome than in the largely transcriptionally silent subtelomeres; and (2) whereas previous work revealed strong correlation between ORC1/CDC6 localisation and MFA-seq peaks at the ends of multigene transcription units, neither of these data show significant overlap with SNS-seq signal, which is not seen at transcription start or stop sites ('SSRs'; supplementary Fig.8D) and shows marked depletion at predicted ORC1/CDC6 sites (supplementary Fig.8C). To the authors' credit, they acknowledge this lack of correlation in the discussion.

      The authors have not provided any new data to substantiate their assertion that SNS-seq accurately detects origins in T. brucei, and therefore the work rests on a single experimental approach, without validation. As a result, the suggestion of abundant, previously undetected origins in the intergenic regions of multigene transcription units remains a prediction. One key untested limitation of the work lies in the observation that the very large majority of SNS-seq signal overlaps with previously mapped RNA-DNA hybrids; without an experimental test, the suggestion that the authors have 'disclosed for the first time a strong link between RNA:DNA hybrid formation and DNA replication initiation' remains conjecture.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of origins of replications. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Between the initial submission and this revision, the raised major concerns have not been resolved, and no additional validation has been provided.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript is concluded with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) There are substantial discrepancies between the origins identified here and those reported in previous studies. Given that the other studies precede this manuscript, it is the authors' duty to investigate these differences. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      We agree that orthogonal validation of origins detected by stranded SNS-seq is necessary and we are working on it.

      (2) I am concerned that up to 96% of all SNS-seq peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Upon request, the authors have performed a control, where randomly placed peaks were run through the same filtering process. Only approximately twice as many experimental peaks passed filtering compared to random peaks. While the authors emphasize reproducibility between replicates, technical artifacts from the protocol would also be reproducible. Moreover, in other SNS-seq studies, for example, Pratto et al. Cell 2021, Fig. 1B, + and − strand peaks always appear closely paired. This pattern contrasts strongly with Fig. 2A in this manuscript.

      The size and overlap of peaks depend on the length of the SNS. In our study, the width of the peaks corresponds to the size of the short nascent strands (0.5–2.5 kb) chosen as the starting material, whereas the width of the peaks in Pratto et al., Cell, 2021 is much larger (a few kb). This could be due to the longer SNS used in the Pratto et al. study. Consequently, the overlap of the longer SNS is more pronounced since the SNS fibres elongate in both directions: at the 3′ end by DNA polymerase and at the 5′ end by ligation of Okazaki fragments. Additionally, the genomic regions displayed in our Figure 2A and in Pratto et al., Figure 1B are presented at substantially different resolutions, with a roughly ten‑fold difference in scale.

      Further, I have some minor concerns that do not affect the main conclusions of the manuscript:

      - Fig 2C: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions. This data would be better presented with all regions stretched to the same size. This has not been addressed in the revision.

      As the reviewer suggested, we have produced scaled plots of the stranded SNS-seq origins over genic and intergenic regions (see Figure 3, which is attached along with the Reviewer #2 (Recommendations for the authors)). However, we would prefer to keep the unscaled versions in the manuscript and add a note in the text as part of the Version of Record, explaining that the origins are evenly distributed throughout intergenic regions rather than being centred within them.
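
      To make the distinction between unscaled and scaled plots concrete, the sketch below stretches every region to the same relative length before counting origins, so that region size alone cannot create a peak or valley at the region centre. This is a generic illustration and not the authors' pipeline: the function name, the half-open interval convention, and the toy coordinates are all hypothetical, and a single contig is assumed for simplicity.

```python
import numpy as np

def scaled_profile(regions, origin_positions, n_bins=10):
    """Count origins per relative-position bin after stretching each region to [0, 1)."""
    origins = np.sort(np.asarray(origin_positions, dtype=float))
    counts = np.zeros(n_bins, dtype=np.intp)
    for start, end in regions:
        # Select origins that fall inside the half-open region [start, end)
        lo = np.searchsorted(origins, start, side="left")
        hi = np.searchsorted(origins, end, side="left")
        # Relative position within the region, independent of region length
        rel = (origins[lo:hi] - start) / (end - start)
        bins = np.minimum((rel * n_bins).astype(np.intp), n_bins - 1)
        counts += np.bincount(bins, minlength=n_bins)
    return counts

# Toy example: one short and one long region on the same contig (made-up coordinates)
regions = [(0, 100), (1000, 3000)]
origins = [10, 50, 99, 1000, 2000, 2999]
profile = scaled_profile(regions, origins, n_bins=2)
print(profile)
```

      A flat scaled profile would be consistent with origins being evenly distributed across intergenic regions, whereas a central peak that persists after scaling would indicate genuine centring.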

      - Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length. This has not been addressed in the revision.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding potentially wider origins.

      We'll modify the text as part of the Version of Record.

      Are claims well substantiated?:

      The identification of origins via SNS-seq appears to be incompletely supported to me. All downstream analyses depend on the reliability of origin identification.

      Impact:

      This study has the potential to be valuable for two fields: In research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets. Further, this study would impact basic research analyzing DNA replication over the evolutionary tree, where T. brucei can be used as an early-divergent eukaryotic model organism.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as definitive markers of origin activity in T. brucei, nor do they validate each other.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      - Two other attempts to identify origins in T. brucei - ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154) - were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      - We also would like to emphasize that the ORC1/CDC6 dataset originally published (PMID: 22840408) is no longer available; only a re-analysis by TritrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      - MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between S- and G1-phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense, we disagree with McCulloch's group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq are located within broad MFA-seq regions. Neither the methodology used by McCulloch’s group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, making direct comparison difficult.

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long replication forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks indicate the size of selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage refer to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the filtered peak pairs corresponded to origins. Two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand and followed by the downstream peak on the plus strand.

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.
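
      The pairing rule and the random-placement control described in this response can be sketched as follows. This is a minimal illustration, not the authors' workflow: the `Peak` type, the function names, the adjacent-peak pairing strategy, the `max_gap` threshold and the toy coordinates are all assumptions introduced here, and a single chromosome is assumed.

```python
import random
from typing import NamedTuple


class Peak(NamedTuple):
    start: int
    end: int
    strand: str  # "+" or "-"


def pair_divergent_peaks(peaks, max_gap):
    """Keep adjacent peak pairs in which a minus-strand peak lies upstream of
    a non-overlapping plus-strand peak at most max_gap bp away."""
    ordered = sorted(peaks, key=lambda p: p.start)
    pairs = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        gap = downstream.start - upstream.end  # negative if peaks overlap
        if upstream.strand == "-" and downstream.strand == "+" and 0 <= gap <= max_gap:
            pairs.append((upstream, downstream))
    return pairs


def randomise_positions(peaks, chrom_length, rng):
    """Move each peak to a uniformly random start, keeping length and strand."""
    shuffled = []
    for peak in peaks:
        length = peak.end - peak.start
        start = rng.randrange(chrom_length - length + 1)
        shuffled.append(Peak(start, start + length, peak.strand))
    return shuffled


def fraction_retained(peaks, max_gap):
    """Fraction of input peaks that belong to a retained divergent pair."""
    if not peaks:
        return 0.0
    return 2 * len(pair_divergent_peaks(peaks, max_gap)) / len(peaks)


# Toy peaks, invented for illustration only.
observed = [
    Peak(100, 200, "-"), Peak(250, 350, "+"),      # divergent pair: retained
    Peak(1000, 1100, "+"), Peak(1200, 1300, "-"),  # convergent: rejected
]
control = randomise_positions(observed, chrom_length=10_000, rng=random.Random(0))

print("observed:", fraction_retained(observed, max_gap=500))
print("random control:", fraction_retained(control, max_gap=500))
```

      Repeating the randomisation many times and comparing the distribution of retained fractions with the observed value gives the kind of chance expectation the reviewer asked about.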

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that there are multiple origins in each broad peak; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA peaks. This pattern—broad MFA-seq regions containing multiple initiation sites—has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing showed numerous initiation sites within MFA-seq regions and additional numerous sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from methodological differences and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosome phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied by Foulk et al. (Genome Res., 2015). We used SNS isolation and enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C et al. Methods, 2012). Here, we enriched the SNS by size on a sucrose gradient and then subjected this SNS-enriched fraction to repeated treatments with high amounts of λ-exonuclease (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without the sucrose-gradient SNS enrichment that we applied, and then performed a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, the SNS-depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with lambda exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s. They are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We have added this point to the Discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Four broad issues need to be addressed.

      (1) The authors have attempted to test the overlap between ORC1/CDC6 (an ORC subunit) binding in the genome and SNS-seq. If there were an overlap, this would provide evidence that the SNS-seq signals represent origins. However, the analysis provided is inadequate: merely a statement that "we obtained an overlap of 4.2% between origins and ORC1/CDC6 binding sites within a window of ±2 kb and 6.2% in the window of ±3 kb". Nowhere are these data shown or properly discussed:

      a) The authors need to provide a diagram showing where in the genome the very small amount of overlapping SNS-seq and ORC1/CDC6 binding occurs, and to clearly show and state how many of the intergenic SNS-seq peaks are sites of ORC1/CDC6 binding. In the absence of such analysis, a key question is unanswered: is there any evidence of ORC1/CDC6 (or ORC more broadly) binding at the SNS-seq signals within the polycistronic transcription units?

      In the original version of the manuscript, these data were already presented as percentages in the text and as a metaplot (Supplementary Fig. 8C).

      We based our analysis on the set of 350 TbORC1/CDC6 binding sites available on TriTrypDB at the time of analysis. This dataset was a filtered subset of the originally reported TbORC1/CDC6 ChIP‑on‑chip peaks (personal communication, TriTrypDB). Since then, the unfiltered dataset has been made available. We therefore re‑analyzed the overlap using this dataset, to which we applied a filtering that yielded 990 binding sites, closely matching the 953 sites reported by the McCulloch group. We need to stress here that the original 953 sites reported by the McCulloch group (Tiengwe et al., 2012; PMID: 22840408) are no longer available and that the authors:

      - do not provide genomic coordinates for the 953 binding sites and

      - do not release any scripts or methodology that would allow independent reproduction of the 953 sites.

      A similar remark also applies to the MFA-seq data (see below).

      To address the reviewer’s request, we have now:

      (1) Recalculated the overlap using the updated TbORC1/CDC6 dataset (990 binding sites) from TriTrypDB.

      (2) Added the absolute number of overlapping SNS‑seq origins and TbORC1/CDC6 binding sites in the Results section for clarity.

      (3) Included the TbORC1/CDC6 binding sites in the chromosomal overview (newly added to Supplementary Fig. 8A), so that their genomic localization relative to SNS‑seq peaks is visually accessible.

      (4) Revised the metaplots of TbORC1/CDC6 distribution around SNS‑seq origins using the updated dataset (Supplementary Fig. 8C).

      With these improvements, we now find that:

      - Within ±2 kb, 12.9% (253) of SNS‑seq origins overlap with 25.6% of TbORC1/CDC6 binding sites.

      - Within ±3 kb, 18.8% (370) of SNS‑seq origins overlap with 37.4% of TbORC1/CDC6 binding sites.
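
      For illustration, window-based overlap counts of this kind could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: each origin and binding site is reduced to a single position, the function names are hypothetical, and the coordinates are invented toy values.

```python
from bisect import bisect_left, bisect_right


def count_with_neighbour(queries, targets, window):
    """Count query positions with at least one target within +/- window bp.

    Both arguments map a chromosome name to a list of positions (bp).
    """
    hits = 0
    for chrom, positions in queries.items():
        sorted_targets = sorted(targets.get(chrom, []))
        for pos in positions:
            lo = bisect_left(sorted_targets, pos - window)
            hi = bisect_right(sorted_targets, pos + window)
            if hi > lo:  # at least one target falls inside the window
                hits += 1
    return hits


def reciprocal_overlap(origins, sites, window):
    """Return (origins near a site, sites near an origin) as counts."""
    return (
        count_with_neighbour(origins, sites, window),
        count_with_neighbour(sites, origins, window),
    )


# Toy coordinates, invented for illustration only.
origins = {"chr1": [1_000, 5_000, 20_000]}
sites = {"chr1": [2_500, 30_000]}

for window in (2_000, 3_000):
    n_origins, n_sites = reciprocal_overlap(origins, sites, window)
    print(f"±{window} bp: {n_origins} origins near a site, {n_sites} sites near an origin")
```

      The two counts are normalised to different totals (all origins versus all binding sites), which is why a single window size yields two different percentages.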

      The updated metaplot shows a clear depletion of TbORC1/CDC6 signal at the origin center, with modest enrichment ~5 kb upstream and downstream. The underlying reason for this pattern remains unknown, and we agree that additional studies will be needed to understand it.

      b) Equally, the authors need to explain what they conclude from this analysis. They make a comparison with T. cruzi ORC1/CDC6 and SNS-seq overlap, which does not illuminate what the data tell us. For instance, if there is no or minimal overlap between ORC1/CDC6 binding and SNS-seq peaks within the polycistronic transcription units, do they conclude that the major SNS-seq signal they detail is evidence for ORC-independent DNA replication? If there is no overlap, what further evidence can they provide that these signals truly are origins?

      First, we would like to clarify that, to date, there is no evidence supporting ORC‑independent DNA replication in T. brucei, and—importantly—no published data demonstrating that TbORC1/CDC6 is universally required for DNA replication initiation. Because of this, we consider that it would be inappropriate to conclude that regions lacking detectable TbORC1/CDC6 signal undergo ORC‑independent initiation. We would prefer not to speculate in the absence of supporting evidence and would gratefully consider any reference the reviewer wishes to provide on this subject.

      Second, the low overlap between TbORC1/CDC6 binding sites and SNS‑seq origins does not, in our view, invalidate our mapping of replication initiation sites. Multiple factors contribute to this:

      (1) Low overlap between ORC1/CDC6 and origin‑mapping techniques has been repeatedly reported across kinetoplastids. For instance, in T. cruzi, 88.2% of origins detected by DNAscent nanopore sequencing showed no overlap with TcORC1/CDC6–Ty1 ChIP signal within ±3 kb, and only 11.7% co‑localized. This is strikingly similar to our observations in T. brucei. Thus, our data are consistent with the broader pattern in trypanosomatids rather than an exception.

      (2) The origin topology detected by stranded SNS‑seq is supported by several genomic characteristics found frequently in other eukaryotes, including:

      - A highly specific and polarized poly(dA)/poly(dT) sequence environment.

      - Strand‑specific G4 structures positioned around origin centers.

      - A conserved nucleosome‑depleted region flanked by well‑positioned nucleosomes.

      These features are absent from shuffled controls, appear at high significance, and recapitulate hallmark signatures of replication origins in other eukaryotes.

      Together, these findings give us confidence that the SNS‑seq peaks represent genuine origins - despite the incomplete overlap with TbORC1/CDC6 binding.

      Third, we fully agree with the reviewer that a definitive conclusion would require an additional, independent validation method.

      Given the lack of complete ORC subunit datasets and the unusual biology of trypanosomatid replication complexes, we believe that the cautious interpretation above is the most appropriate.

      c) The authors state (Discussion): "Validation of origins is generally a difficult task, particularly in trypanosomatids, where proteins involved in the initiation of DNA replication are difficult to determine. Few proteins have been described as potential ORC subunits (reviewed in 61), and none of them have been shown to be a specific marker that indicates the origins." There are two problems with the statement. First, most of the subunits of ORC have now been described in T. brucei; the authors should make this clear. Second, mapping of ORC1/CDC6 localisation, contrary to what the authors state here, shows precise correlation with the peaks of every MFA-seq signal described (see Tiengwe et al, Cell Reports, 2012); thus, ORC1/CDC6 binding provides evidence that MFA-seq is detecting origins, something that cannot be said for SNS-seq. The authors need to correct this misleading paragraph.

      As suggested, we have removed the paragraph from the Discussion to avoid confusion. However, we disagree with the reviewer's assessment and clarify below our position regarding the issues raised.

      First, we agree that five candidate ORC subunits have now been identified in T. brucei. Our intention was not to suggest the contrary, but rather to emphasize that, although candidate ORC components have been described, direct functional evidence for their roles in replication initiation is still limited. For this reason, we were cautious in referring to any ORC component as a definitive marker of replication origins.

      Second, regarding the reviewer’s statement that TbORC1/CDC6 binding “shows precise correlation with the peaks of every MFA‑seq signal”, we respectfully disagree based on several observations:

      (1) MFA‑seq does not identify individual origin centers, but rather broad replicated regions that often span hundreds of kilobases. By design, this method cannot define the number or position of discrete origins within each peak. For that reason, MFA-seq regions do not have the resolution required to validate TbORC1/CDC6 binding sites as individual origins.

      (2) In the published datasets (Tiengwe et al., Devlin et al.), no metaplots or locus‑wide quantification of the overlap between MFA‑seq peaks and TbORC1/CDC6 binding were provided. Neither the coordinates nor the approach used to define the discrete regions designated as origins within the broad MFA‑seq peaks have ever been described or made available, making it difficult to evaluate the claimed correspondence.

      (3) Notably, McCulloch’s group later reported that only 4.4% of the 953 TbORC1/CDC6 sites overlapped with their 42 MFA‑seq “origins”, underscoring that the degree of correspondence is in fact limited (PMID: 29491738).

      (4) Finally, as noted in our response to point (1b), low overlap between ORC1/CDC6 binding sites and origin‑mapping techniques is a consistent observation across kinetoplastids, including T. cruzi, where DNAscent‑mapped origins show only ~12% overlap with TcORC1/CDC6 ChIP signals. This suggests that the limited overlap we observe is not unique to our dataset.

      For these reasons, we are not convinced that the TbORC1/CDC6 binding sites have been shown to align precisely with MFA‑seq peaks, nor that these datasets definitively validate origin mapping in T. brucei. Nevertheless, to avoid over‑interpretation and potential confusion, we have removed the paragraph from the Discussion as requested. We hope this clarifies our position and improves the accuracy and neutrality of the manuscript.

      (2) Like for ORC1/CDC6 localisation, the authors' evaluation of the relationship between MFA-seq and SNS-seq mapping is inadequate, and the depth of the analysis and discussion needs to be improved:

      a) The authors state: "We found 28-42% stranded SNS-seq origins overlapped with early and 43-55% overlapped with late S-phase MFA-seq replicated regions (Supplementary Figure 8B)." This seems important and provides (limited) validation of both datasets, but cannot be discerned from the supplied figure. Please provide a metaplot of the two datasets centred on the MFA-seq loci, including the SNS-seq peak amplitude.

      We would like to emphasize that MFA‑seq is not a method designed to map individual origins, and this fundamentally limits the interpretability of metaplots centered on MFA-seq regions. MFA‑seq identifies broad replication‑enriched domains, typically spanning 100–500 kb, within which multiple origins may fire asynchronously across the cell population.

      This concern is reinforced by the original MFA‑seq publications (Tiengwe et al., 2012; Devlin et al., 2016), which:

      - do not provide positional data for the 42-47 MFA‑inferred origins,

      - do not describe the computational method used to derive individual origin coordinates from the broad peaks, and

      - do not release any scripts or methodology that would allow independent reproduction of the claimed origin positions.

      Because of this, it is not possible to reconstruct or validate how the 42 MFA‑seq “origin” sites were defined, nor to use those coordinates as anchors for metaplot analyses.

      Most importantly, we disagree with the underlying assumption that each MFA‑seq peak corresponds to exactly one origin. This assumption runs counter to the principle of the technique, which identifies regions of higher DNA content in replicating cells than in non-replicating cells; it is also contradicted by our stranded SNS‑seq data and by DNA combing measurements:

      - SNS‑seq detects multiple discrete origins within the same genomic regions that produce a single broad MFA‑seq peak.

      - DNA combing reveals inter‑origin distances of ~36–422 kb (median ~150 kb) (PMID: 26976742), which is far shorter than the ~400–600 kb replication domains identified by MFA‑seq.

      - Furthermore, the 42 origins detected by MFA-seq would not be enough to replicate the entire T. brucei genome within S phase. DNA combing has measured an average replication fork speed of 1.9 kb/min in procyclic forms (PMID: 26976742). Dividing the size of the Trypanosoma brucei brucei TREU927 genome (26.1 Mb) by 42 origins (PMID: 22840408) means that each origin would have to replicate about 621 kb during S phase. At 1.9 kb/min, this would take about 327 min (621 kb / 1.9 kb/min), or 5.45 hours. This exceeds the estimated length of S phase in T. brucei procyclic forms, which is 2.31 hours (138.6 minutes) (PMID: 32397111, 31811174, 28258618) or even less, 1.36 hours (PMID: 2190996, 10574712). Therefore, more than 42 origins are necessary to complete replication during the short S phase.

      This makes it unlikely that MFA-seq regions represent single functional origins. For these reasons, a metaplot centered on MFA‑seq “loci” may lead to misinterpretations and would not provide biologically meaningful information.
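
      The replication-time estimate above can be reproduced with a short calculation. The sketch below is only a back-of-the-envelope check using the figures quoted in this response; the function names and the `forks_per_origin` parameter are illustrative assumptions and are not part of the original analysis. The default of one fork per origin matches the simplification used in the text.

```python
import math

# Figures quoted in the response above.
GENOME_SIZE_KB = 26_100       # T. brucei brucei TREU927 genome, 26.1 Mb
N_ORIGINS = 42                # origins inferred by MFA-seq
FORK_SPEED_KB_PER_MIN = 1.9   # mean fork speed from DNA combing
S_PHASE_MIN = 138.6           # longer S-phase estimate (2.31 h)


def replication_time_min(genome_kb, n_origins, fork_speed, forks_per_origin=1):
    """Minutes needed for each origin to copy an equal share of the genome.

    forks_per_origin is a hypothetical knob (not in the original text):
    1 reproduces the calculation in the response, 2 models two forks
    moving away from each origin.
    """
    kb_per_origin = genome_kb / n_origins
    return kb_per_origin / (fork_speed * forks_per_origin)


def min_origins_needed(genome_kb, fork_speed, s_phase_min, forks_per_origin=1):
    """Smallest origin count that finishes within S phase under the same model."""
    return math.ceil(genome_kb / (fork_speed * forks_per_origin * s_phase_min))


t_one_fork = replication_time_min(GENOME_SIZE_KB, N_ORIGINS, FORK_SPEED_KB_PER_MIN)
t_two_forks = replication_time_min(GENOME_SIZE_KB, N_ORIGINS, FORK_SPEED_KB_PER_MIN, 2)
needed = min_origins_needed(GENOME_SIZE_KB, FORK_SPEED_KB_PER_MIN, S_PHASE_MIN)

print(f"42 origins, one fork each:  {t_one_fork:.0f} min")
print(f"42 origins, two forks each: {t_two_forks:.0f} min")
print(f"S phase: {S_PHASE_MIN} min; origins needed (one fork each): {needed}")
```

      Under this simplified model, both variants exceed the 138.6 min S-phase estimate, and on the order of 100 evenly spaced origins would be needed with one fork per origin.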

      We hope that the expanded explanation clarifies our interpretation of the relationship between these two complementary, but fundamentally different, methods.

      b) The authors state that "Our results showed that the origins are predominantly located in the intergenic regions within the PTUs (Figure 2C)'. This finding cannot be discerned from this figure, which does not show 'strand switch regions' (SSRs; transcription start/stop sites), where MFA-seq predicts all origins to localise. The authors need to acknowledge this difference and must show a comparison of SNS-seq data, including peak amplitude, around all SSRs (whether predicted by MFA-seq to act as origins or not, since all appear to bind ORC1/CDC6).

      We have now provided the metaplots showing the overlap between stranded SNS-seq origins and SSRs (see Supplementary Figure 8D). This difference has been acknowledged and discussed in the revised manuscript.

      c) Finally, the authors' interpretation that around 30-55% of SNS-seq peaks overlap with MFA-seq 'origins' is highly questionable. MFA-seq peaks are regions of increased DNA content in replicating cells relative to non-replicating cells, and so the entire region under the MFA-seq peak is not necessarily an origin, but is likely to be a more discrete locus (eg, the SSR, where ORC1/CDC6 mainly localises). They should correct the wording and discuss what significance they see in this overlap; for instance, do they think SNS-seq 'clusters' are more pronounced within the MFA-seq peaks and, if so, what might this mean, and why does it not correlate with ORC1/CDC6 localisation?

      As the reviewer notes, ‘MFA‑seq peaks are regions of increased DNA content, and so the entire region under the MFA-seq peak is not necessarily an origin but is likely to be a more discrete locus’. This is exactly why MFA‑seq is inappropriate for identifying discrete, individual origins: within these replicated domains, multiple origins can fire, as revealed by stranded SNS‑seq mapping.

      Regarding the overlap between SNS‑seq origins and MFA‑seq peaks, we agree with the reviewer that this overlap should not be interpreted as validating MFA‑seq “origin positions.” Instead, we now describe it more accurately as the proportion of discrete SNS‑seq origins that fall within broader MFA‑seq replication domains. This is expected, because SNS‑seq identifies individual initiation events, whereas MFA‑seq identifies S‑phase replication domains averaged across a population. Our stranded SNS‑seq data do not show enhanced origin accumulation within MFA-seq regions, and we find no correlation with TbORC1/CDC6 positions. This is now discussed.

      Regarding SSRs, we do not share the view that they should be considered privileged initiation sites. After remapping the TbORC1/CDC6 ChIP‑on‑chip dataset (see above) to the T. brucei Lister 427–2018 genome (Supplementary Fig. 8A), we observed that TbORC1/CDC6 binding is distributed throughout the chromosomes, not restricted to SSRs. To quantify this, we analyzed the overlap between TbORC1/CDC6 sites and all annotated SSR classes (dSSRs, cSSRs, and head‑to‑tail regions, as defined in Kim et al. 2009). The results show that:

      Only 10% of TbORC1/CDC6 binding sites fall within 40% of all SSRs.

      At the level of individual SSR types:

      - TTS: 3.3% of TTS overlap with 0.3% of TbORC1/CDC6 sites.

      - TSS: 67% of TSS overlap with 6.1% of TbORC1/CDC6 sites.

      - Head‑to‑tail regions: 54.2% overlap with 3.6% of TbORC1/CDC6 sites.

      These analyses demonstrate that most TbORC1/CDC6 sites are not located at SSRs, contradicting the idea that SSRs represent primary or exclusive origin sites.
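
      The paired percentages reported above have different denominators: one is the fraction of binding sites that touch any SSR, the other is the fraction of SSRs touched by any binding site. A minimal sketch of that two-way overlap calculation follows. The coordinates are invented toy values, the function names are hypothetical, and the tool actually used for the analysis is not stated in this response.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def two_way_overlap(sites, regions):
    """Return (% of sites overlapping any region, % of regions overlapping any site)."""
    sites_hit = sum(any(overlaps(s, r) for r in regions) for s in sites)
    regions_hit = sum(any(overlaps(r, s) for s in sites) for r in regions)
    return 100 * sites_hit / len(sites), 100 * regions_hit / len(regions)


# Toy data only: four binding sites and two SSRs.
binding_sites = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr1", 900, 1000),
    ("chr2", 50, 150),
]
ssrs = [
    ("chr1", 150, 300),
    ("chr2", 1000, 2000),
]

pct_sites, pct_ssrs = two_way_overlap(binding_sites, ssrs)
print(f"{pct_sites:.1f}% of binding sites overlap {pct_ssrs:.1f}% of SSRs")
```

      In this toy case, one of four sites overlaps one of two SSRs, so the two percentages differ even though they describe the same intersection.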

      Author response image 1.

      Overlap between TbORC1/CDC6-12Myc binding sites (Tiengwe 2012, Cell Reports) and strand‑switch regions (SSRs). Venn diagram showing the overlap of 990 TbORC1/CDC6-12Myc binding sites (retrieved from TriTrypDB and filtered at score 22 to yield a number of binding sites similar to the 953 published in Tiengwe 2012, Cell Reports) and SSR sites in the genome (Kim 2018, NAR). The intersection shows that 10.3% of Orc1/CDC6 binding sites overlap with 41.8% of SSRs. The intersection is subdivided into TSS (orange), TTS (blue) and HT (green).

      (3) A key objection to the data presentation is the decision to limit SNS-seq mapping to the intergenic regions. In addition to overlooking the SSRs (see above, 2), so-called subtelomeres, which account for nearly 50% of the T. brucei genome and are largely untranscribed, are not shown or discussed at all. Providing this data will improve clarity and also provide a key test of one of the predictions that the authors make: "most origins are localized in actively transcribed regions, which could lead to collisions between DNA replication and the transcription machinery. This spatial coincidence implies that transcription and replication must occur in a highly ordered and cooperative manner in T. brucei."

      We do not understand why this reviewer concluded that we took 'the decision to limit the mapping of SNS-seq to intergenic regions'. This is a factual error.

      To be clearer:

      (1) We now explicitly present the distribution of SNS‑seq origins across core and subtelomeric regions in the revised Figure 2D, making clear that origin mapping was performed genome‑wide.

      (2) We also show that SNS‑seq origins are present in subtelomeric regions. We have revised the manuscript to avoid any implication that origin firing is restricted only to actively transcribed regions. Our data show that most SNS‑seq origins lie within intergenic regions of PTUs, but a minority are found outside these regions, including subtelomeres and SSRs. The revised text reflects this nuance and highlights that the spatial relationship between transcription and replication is strong but not exclusive.

      These additions make the genome-wide nature of the SNS-seq analysis transparent to the reader and should therefore remove this reviewer's “key objection”.

      a) The authors must show SNS-seq mapping to the subtelomeres (in addition to around the SSRs; see comment (2)). If no SNS-seq peaks are detected in the subtelomeres, what do the authors conclude about how the genome is duplicated? If SNS-seq peaks are detected in the subtelomeres, do they correspond with the ordered nucleosomes in this part of the genome described by Maree et al (PMID: 28344657); if so, might SNS-seq signal localisation not be directed by transcription but chromatin?

      We have now presented the proportion of origins in subtelomeric regions (see Figure 2B).

      As illustrated in the metaplots in Author response image 2, the distribution of nucleosomes around the subtelomeric origins is similar to the distribution shown for all origins in the manuscript. We do not see the pattern of nucleosomes as described by Maree et al (PMID: 28344657) over ORC1/CDC6 binding sites in this part of the genome.

      Author response image 2.

      Metaplots showing the mean nucleosome signal over centred SNS-seq origins in subtelomeric regions. Two replicates from Maree et al 2019 (PMID: 28344657).

      We never claimed that transcription directs the localisation of the SNS-seq signal. We did not conduct experiments to address this issue. In contrast, we consider that the organisation of chromatin exerts a significant influence on the selection of active origins.

      (4) The major conclusion of the manuscript is that the SNS-seq signal corresponds very precisely to the locations of RNA-DNA hybrids (R-loops). Given all the limitations discussed above, can the authors rule out the possibility that SNS-seq is merely mapping DNA-DNA hybrids and is not, in fact, detecting origins?

      a) It is legitimate to speculate about the possibility that the very extensive overlap between SNS-seq and DRIP-seq signals within polycistronic transcription units (between ORFs) might suggest that DRIP-seq data detects nascent strands at replication origins, rather than R-loops at sites of pre-mRNA processing, as previously suggested by Briggs et al (PMID: 30304482). (eg, 'we disclosed for the first time a strong link between R-loop formation and DNA replication initiation'; 'The RNA:DNA hybrids are formed at initiation sites by RNA priming of SNS and Okazaki fragments'). However, the authors should acknowledge that alternative explanations for the localisation and potential functions of inter-CDS R-loops have been suggested,

      We do not find extensive overlap between stranded SNS-seq and DRIP-seq signals. Only a minor proportion (1.7%) of the previously reported DRIP-seq signal overlaps with the origins detected by stranded SNS-seq. RNA-primed SNS must form RNA:DNA hybrids during the initiation of DNA replication, so an enrichment of these hybrids around the origins is expected. Therefore, we legitimately speculated that this minor proportion of RNA:DNA hybrids enriched around origin centres could be due to origin activation.

      We agree that some of the DRIP-seq signals detected around the origins may be sites of pre-mRNA processing, as previously suggested by Briggs et al. (PMID: 30304482). Since there are no data demonstrating the involvement of pre-mRNA processing in DNA replication initiation, we prefer not to speculate about it.

      b) More importantly, the authors should provide experimental evidence that tests such a mechanistic prediction of R-loops and origins: for instance, have they attempted to remove R-loops, eg, by treatment with RNase H, and checked that the SNS-seq signal is unaltered? In the absence of such data, they cannot exclude the possibility that their work has revealed an overlooked problem with SNS-seq (which may not be limited to T. brucei; are matched DRIP-seq and SNS-seq datasets available to correlate these signals in a range of organisms?).

      We have not attempted RNase H treatment for a fundamental methodological reason: it seems highly improbable that RNA:DNA hybrids would persist through the multiple denaturation steps inherent to the SNS‑seq enrichment protocol. Published biophysical measurements show that RNA:DNA hybrids melt at ~95 °C (Roberts & Crothers, Science, 1992; PMID: 1279808), which is the temperature repeatedly applied during SNS isolation. Under these conditions, persistent RNA:DNA hybrids cannot remain intact and therefore cannot be responsible for the SNS‑seq peaks detected.

      We do not interpret our findings as revealing an “overlooked problem with SNS‑seq.” Instead, we consider that the enrichment of RNA:DNA hybrids around origins observed in DRIP‑seq is biologically meaningful and expected, given that replication initiation involves RNA‑primed nascent strands and that DRIP‑seq detects such structures.

      Reviewer #2 (Recommendations for the authors):

      I have some minor concerns that do not affect the main conclusions of the manuscript:

      (1) Figure 2B: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions.

      That is correct. The regions displayed in the heatmaps are genic and intergenic regions sorted by size. With this metaplot, we did not intend to convey that origins accumulate at the centres of intergenic regions, but rather that genic regions are mostly devoid of origins whereas intergenic regions are enriched in origins.

      (2) Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding origins that are potentially wider. Nevertheless, the violin plot shows that the majority of origins are shorter than 500 nt. In the end, the size of regions detected as the origin is not important. What gives the resolution of stranded-SNS-seq is the ability to identify the centre of the origin between the minus and plus peaks.

      (3) Data in the manuscript were sometimes not presented in an easy-to-read manner. In some cases, this was due to benign things, such as missing labels for the mean frequency plots (e.g., Figure 2B, blue and green) or very small fonts for axes (Figure 2B). Sometimes, due to the plot types that were chosen, such as pie-charts (Figure 2C, see https://medium.com/analytics-vidhya/dont-use-pie-charts-in-data-analysis-6c005723e657), stacked bar plots (Figure 6B), or showing cumulative distributions (Figure 5C, and Figure 2D) it makes it difficult to judge the actual distribution.

      Wherever possible, the size of the small fonts was increased to the maximum. Missing labels were added to the mean frequency plots. We increased the font size for the axes in the frequency plots.

      However, we found cumulative distributions useful. If you have a more specific proposal for replacing cumulative distributions, we would be very grateful to hear it. We also hope that magnifying the figures in TIFF format with a higher resolution will improve visibility.

      (4) Figure 2B: This data would be better presented with all regions stretched to the same size (the reason is explained in the public review).

      We performed the scaled plots for the stranded SNS-seq origins over the genic and intergenic regions as the reviewer suggested (see Author response image 3), but we prefer to keep the unscaled versions in the manuscript.

      Author response image 3.

      Distribution of mapped origins in scaled genic and intergenic regions. Scaled heatmaps present the distribution of the mapped origins and shuffled controls within scaled genic and intergenic regions (± 2 kb).

      (5) Line 149: "The number of origins in both cells was compared using normalised mapped reads": Supplementary Figure 2D mentions that conditions were subsampled to the same amount. I would mention that explicitly in the main text ("compared using normalized, subsampled mapped reads"), as 'normalizing' would not include 'subsampling' for me. Also, I could not find the methods section that the authors refer to here.

      Thanks for the suggestion. We changed the text to make this point clearer. In the methods section, the subsampling process was referred to as 'PCF down-sampling'; we have now renamed it 'Read sub-sampling' for consistency in the edited version of the manuscript.

      (6) Figure 2C: I struggled to understand what gDNA stands for. Maybe it could be replaced with something like distribution in genome?

      Thanks for this suggestion. It is changed to ‘distribution in genomic sequence’.

      (7) Figure 5C: I cannot see how a G4 30 kb from an origin could be relevant. This also does not fit the scale of the author's own model at all (Figure 8).

      The main goal of Figure 5C was to demonstrate the differences between origins and the nearest G4s compared to the shuffled controls. The graph shows that 50% of the origins have a G4 within 2010 bp, whereas the median for the shuffled control is 4154 bp in the case of non-stabilised G4s. Our model is based on Figure 5D, which illustrates the enrichment of G4s and poly(dA) around the centre of origins.
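
      The comparison described above amounts to computing, for every origin, the distance to the nearest G4 and then comparing the median of those distances with the median obtained for shuffled positions. The sketch below illustrates this on a single chromosome with invented coordinates. The function names are hypothetical, and the script does not reproduce the pipeline used for Figure 5C.

```python
import bisect
import statistics


def nearest_distance(pos, sorted_features):
    """Distance from pos to the closest coordinate in a sorted list."""
    i = bisect.bisect_left(sorted_features, pos)
    candidates = []
    if i < len(sorted_features):
        candidates.append(sorted_features[i] - pos)      # nearest feature at or after pos
    if i > 0:
        candidates.append(pos - sorted_features[i - 1])  # nearest feature before pos
    return min(candidates)


def median_nearest(positions, features):
    """Median distance to the nearest feature: half of the positions lie closer."""
    feats = sorted(features)
    return statistics.median(nearest_distance(p, feats) for p in positions)


# Toy coordinates only (bp on one chromosome).
g4_positions = [1000, 5000, 9000]
origin_centres = [1100, 4800, 9500]
shuffled_centres = [3000, 7000, 12000]

print("origins :", median_nearest(origin_centres, g4_positions))
print("shuffled:", median_nearest(shuffled_centres, g4_positions))
```

      A smaller median for the origins than for the shuffled control corresponds to the shift between the two cumulative curves.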

      (8) Figure 6B: could be made supplementary in my opinion. All relevant data is repeated in panel D.

      It is true that Figures 6B and 6C contain some repetition. However, we would prefer to keep Figure 6B because it provides a quantification of the six indicated categories, along with the statistical tests. Figure 6B only presents the three categories that changed significantly. Figure 6D shows distribution but does not contain quantified data.

      (9) Figure 6D: This plot is repeating a lot, within single figures (Figure 6A, top) but also between figures (e.g., Figure 5D, Figure 4B). I'd prefer it if the initial plots of each figure were expanded a bit (here Figure 6A, top) to include some information from the previous figures. Then all these summary plots could be combined into a single figure at the very end (maybe still as different panels to reduce the number of lines in a single plot). Otherwise, each summary plot repeats the tracks of the previous, which becomes very repetitive.

      Our model is based on these summary plots, and we calculated the relative distances between the different elements using them. Two elements were repeated in each plot: the positions of poly(dA) and G4s. These two elements served as reference points to determine the relative positions of the other elements. Following your suggestion would result again in repetitive summary plots at the end, as one combined summary plot would be overloaded with lines and difficult to understand.

      (10) Figure 6D & Figure 7C: Both show predicted G4s; however, on the plus strand, one prediction has a two-peaked shape, the other only a single peak. Is this a mistake?

      The graphs for the predicted G4s do not have the same shape in the two plots because they were generated using different T. brucei reference genomes. Figure 6C uses the 427 reference genome: the MNase-seq dataset was analysed in this reference genome, so we repeated the SNS-seq analysis and the G4 prediction in the same genome to compare them directly. In Figure 7C, we compare origins, DRIP-seq signal, and predicted G4s; in this case, all datasets could be compared in the 427-2018 reference genome.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, but in its current form, the study feels incomplete and requires additional work.

      We thank the reviewer for the encouragement and for recognizing the potential of clinical chemistry plasma as an accessible source for cfDNA-based analyses. To address concerns about incompleteness, we conducted additional controlled experiments and a more thorough literature review.

      My major concerns/suggestions are as follows:

      (1) Context and literature

      The introduction provides only limited background on prior attempts to use heparinized plasma for cfDNA work. It is well known that heparin can inhibit PCR and sequencing library preparation, which has historically discouraged its use. The authors should summarize the relevant literature more comprehensively and explain clearly why this approach has not been widely adopted until now, and how their work differs from or overcomes these earlier challenges.

      Thank you, we agree that the review of prior work requires expansion. In the revised manuscript, we expanded the introduction to focus on prior studies and their gaps (lines 53-80).

      (2) Genome-wide coverage

      The analyses focus on correlations in methylation patterns and fragmentation metrics, but there is no evaluation of sequencing coverage across the genome. For both WGS and WMS, it would be important to demonstrate whether cfDNA from heparin plasma provides unbiased coverage, or whether certain genomic regions are systematically under-represented. A comparison against coverage profiles from cell-derived DNA (e.g., PBMC genomic DNA) would help to put the results in context and assess whether the material is suitable for whole-genome applications.

      Thank you for raising this point. We agree that genome-wide coverage distributions should be evaluated alongside correlations in methylation and fragmentation metrics when assessing the effects of sample tube types.

      To address this, we pooled the five healthy subjects in the Tube Comparison Study by tube type to generate two high-depth reference BAMs (EDTA vs. heparin separator). We calculated the mean depth per 1Mb bin across Chr1-22 and normalized with z-score. Overall, the heparin separator samples showed coverage profiles comparable to the matched EDTA samples (Pearson’s r = 0.9988, Spearman’s ρ = 0.9994). The figure has now been added as Supplementary Figure 1.
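
      The coverage comparison described above can be sketched as follows: z-score the per-bin mean depth within each tube type, then compute Pearson and Spearman correlations between the two profiles. The depth values are invented, the helper names are hypothetical, and the code uses only the standard library rather than the tools used for Supplementary Figure 1.

```python
import math
import statistics


def zscore(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy mean depth per 1 Mb bin (same bins, two tube types).
edta_depth = [30.1, 28.4, 35.2, 31.0, 26.7, 33.3]
heparin_depth = [15.2, 14.1, 17.9, 15.4, 13.5, 16.6]

z_edta, z_heparin = zscore(edta_depth), zscore(heparin_depth)
r = pearson(z_edta, z_heparin)
rho = spearman(z_edta, z_heparin)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

      Z-scoring rescales each profile but leaves both correlation coefficients unchanged; its role here is to put pooled samples of different total depth on a common scale.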

      We also appreciate the suggestion to compare against gDNA. However, cfDNA and gDNA are expected to exhibit different coverage patterns because cfDNA undergoes non-random fragmentation during its generation and degradation, which makes a direct cfDNA–gDNA comparison difficult to interpret in terms of tube-related bias.

      (3) Viral detection sensitivity

      The study shows strong concordance in viral detection between EDTA and heparin samples, but the sensitivity analysis is lacking. For clinical relevance, it is critical to demonstrate how well heparin-derived plasma performs in low viral load cases. A quantitative comparison of viral read counts and genome coverage across tube types would strengthen the conclusions.

      We agree that evaluating low viral loads is important for test development. Although our goal is to evaluate the repurposing of residual plasma from heparin separator tubes rather than to establish analytical sensitivity, we recruited additional paired cases (n=4), combined them with existing cases with viral reads below 10 RPM (n=12), and examined the correlation of viral read counts between EDTA and heparin separator tubes in this subset. As shown in Author response image 1, viral RPM is strongly correlated between tube types (Pearson’s r = 0.93, P < 0.0001), supporting that heparin-derived plasma yields quantitatively consistent viral reads relative to EDTA samples. We have updated our sample sheet in Supplementary Table 1 and Fig. 3 accordingly.

      Author response image 1.

      Viral load correlation in cases below 10 RPM

      Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      We thank the reviewer for the insightful comments. In the revised manuscript, we added controlled experiments specifically designed to address the concerns regarding cfDNA degradation. We have also addressed other concerns in the responses below.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      We thank the reviewer for the encouraging comment.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      Thank you for this important point. We have expanded the introduction to include a thorough review of relevant prior studies (lines 53-80).

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      The concern about degradation is very reasonable based on the literature. In the revised manuscript, we added a controlled experiment mimicking real-world clinical specimens left unprocessed at room temperature.

      In this experiment, paired EDTA and heparin separator tubes from the same blood draw from 6 volunteers underwent the first soft spin (1,600 g, 10 min) after delays of 0, 1, 3, or 24 hours at room temperature or 4°C, simulating delayed processing in the inpatient hospital setting. The original tubes were then kept at 4°C for a week before the second spin (16,000 g, 10 min), simulating delayed processing at the research laboratory (Fig. 2). This simulation cannot mimic the outpatient or remote clinic setting, which requires transportation; we therefore noted this caveat in the Discussion and Abstract.

      From our results, EDTA samples remained largely stable across all test settings (Author response image 2). In contrast, heparin separator tubes held at room temperature showed a clear time-dependent shift in fragmentation, with the most pronounced degradation at 24 hours. Importantly, heparin separator samples processed within a short pre-centrifugation window (for example, within 3 hours) and maintained refrigerated thereafter showed only minimal changes relative to the time 0 controls (Author response image 3). We have updated the Discussion to emphasize this short window plus refrigeration condition as a practical boundary for fragmentomics in heparin separator tubes.

      We addressed the work of Barra et al. (2025, LabMed) in the introduction. In that study, whole blood in heparin tubes was first soft spun and then incubated at 37°C for 24 hours, leading to severe DNA fragmentation. Our data agree: two matched pairs of samples held at 37°C for 24 hours showed similarly severe fragmentation in heparinized blood (Author response image 4). However, this is not representative of routine (Stanford/UCSF) clinical transport and processing. We revised the manuscript to emphasize that heparin separator tubes are most suitable for downstream cfDNA fragmentomic analyses when the pre-centrifugation interval is minimized and samples are kept refrigerated before processing whenever feasible.

      Author response image 2.

      Size distribution and end motif rank concordance in EDTA tubes across conditions. Left panels show fragment size distributions. The right panels show the corresponding scatter plots comparing end-motif abundance rankings between conditions. E0, EDTA processed immediately; E4T24, EDTA incubated at 4°C for 24 h; ERT24, EDTA incubated at room temperature for 24 h.

      Author response image 3.

      Size distribution and end motif rank concordance in Heparin separators across conditions. Left panels show fragment size distributions. The right panels show scatter plots comparing end-motif abundance rankings between conditions. H0, heparin processed immediately; H4T1/H4T3/H4T24, heparin incubated at 4°C for 1, 3, or 24 h; HRT1/HRT2/HRT3/HRT24, heparin incubated at room temperature for 1, 2, 3, or 24 h.

      Author response image 4.

      Size distribution and end motif rank concordance in extreme incubation conditions. Left panels show fragment size distributions. The right panels show scatter plots comparing end-motif abundance rankings between conditions. H0, heparin processed immediately; H37T24, heparin incubated at 37°C for 24 h.

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      We appreciate the reasonable concerns regarding cfDNA degradation and agree that the methylation profile is not a metric for degradation. This point regarding measuring degradation is addressed with new experiments and in our above response to comment (2). We appreciate the suggestion to use Spearman correlation, and we have now incorporated Spearman’s ρ into the updated figures.

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      We appreciate the reviewer’s constructive comments regarding the CNV analysis. We added an analysis using a 50 kb bin size (data uploaded to Zenodo). Across matched CNV-positive samples, the CNV patterns remained consistent across tube types, although with the expected higher noise. We did not extend the analysis to 1–10 kb bins because, at ~5x coverage, such resolution would mainly capture noise, rendering the results uninterpretable for CNV calling.

      We agree that illustrative examples alone are insufficient and that quantitative measures are required. To address this concern, we evaluated concordance across all paired cases by measuring the copy ratio and calculating the Spearman correlation (Fig. 4b). CNV-positive samples had high concordance (n = 6, Spearman’s ρ = 0.72–0.96) between tube types and were used primarily for interpretation. Low correlations in CNV-negative samples are not unexpected and were not used for interpretation. In these samples, log2 ratios across all bins cluster tightly around zero in both tube types, so correlation coefficients are highly sensitive to minor fluctuations and are not informative of biological concordance.
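
      The point about CNV-negative samples can be illustrated with a small simulation: when the true log2 ratio is zero in every bin, the two tubes differ only by independent noise and their correlation is close to zero even though both profiles agree that there is no CNV. The sketch below uses invented values throughout (bin count, segment sizes, noise level, random seed) and hypothetical function names; it is not derived from the study data.

```python
import random


def ranks(values):
    """1-based ranks for values without ties (continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def simulate_pair(true_log2, noise_sd, rng):
    """Two noisy measurements (one per tube type) of the same true profile."""
    tube_a = [v + rng.gauss(0, noise_sd) for v in true_log2]
    tube_b = [v + rng.gauss(0, noise_sd) for v in true_log2]
    return tube_a, tube_b


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical profiles over 2000 bins.
cnv_negative = [0.0] * 2000                              # no copy-number change
cnv_positive = [0.0] * 1200 + [0.5] * 400 + [-0.5] * 400  # one gain, one loss

rho_negative = spearman(*simulate_pair(cnv_negative, 0.05, rng))
rho_positive = spearman(*simulate_pair(cnv_positive, 0.05, rng))

print(f"CNV-negative pair: rho = {rho_negative:.2f}")
print(f"CNV-positive pair: rho = {rho_positive:.2f}")
```

      Both simulated pairs have the same measurement noise; only the presence of real copy-number segments gives the correlation something to detect.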

      (5) It is important to point out that depth-based CNV calling is just one of the CNV calling methods. Other CNV calling software using SNVs, pair-reads, split-reads, and coverage depth for calling CNV, such as the software Conserting, would be severely affected by the low-quality WGS data. The authors need to evaluate at least two different software with specific algorithms for CNV calling based on current WGS data.

      We appreciate this suggestion. We used another popular and independent CNV caller, CNVkit, in addition to ichorCNA. Although both methods use sequencing depth, they differ in their segmentation algorithm. ichorCNA uses a hidden Markov model-based segmentation optimized for low-pass cfDNA WGS, whereas CNVkit uses circular binary segmentation by default and works well with targeted panels. The CNVkit results are also consistent across different tube types. We have added the CNVkit results to Supplementary Fig. 3.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.

      We thank the reviewer for highlighting somatic SNV detection as an important cfDNA application. Robust SNV benchmarking typically requires larger plasma input and substantially deeper, targeted sequencing than is feasible with remnant chemistry specimens. In routine workflows, chemistry testing leaves only ~0.5–2 mL residual plasma per tube, which limits the achievable depth for sensitive SNV calling. We have added this limitation to the Abstract and the Discussion (lines 281-285) and clarified that our goal is to repurpose heparin separator residual plasma as a complementary resource to expand biobanking, rather than to replace collection protocols optimized for mutation testing.

      Reviewer #2 (Recommendations for the authors):

      The manuscript does not seem to have been edited thoroughly prior to submission. For example, at lines 94-97, the line spacing is double, which is apparently different from the other surrounding lines. In addition, Figure 5a contains a wrong label of "|y=x" at its top. Figure 5b strongly suggests that Spearman, but not Pearson correlation, should be appropriate for the analysis.

      We thank the reviewer for carefully noting these formatting and labeling issues. Corrections for all points are made in the revised version.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the biological mechanism underlying the assembly and transport of the AcrAB-TolC efflux pump complex. By combining endogenous protein purification with cryo-EM analysis, the authors show that the AcrB trimer adopts three distinct conformations simultaneously and identify a previously uncharacterized lipoprotein, YbjP, as a potential additional component of the complex. The work aims to advance our understanding of the AcrAB-TolC efflux system in near-native conditions and may have broader implications for elucidating its physiological mechanism.

      Strengths:

      Overall, the manuscript is clearly presented, and several of the datasets are of high quality. The use of natively isolated complexes is a major strength, as it minimizes artifacts associated with reconstituted systems and enables the discovery of a novel subunit. The authors also distinguish two major assemblies (the TolC-YbjP sub-complex and the complete pump), which appear to correspond to the closed and open channel states, respectively. The conceptual advance is potentially meaningful, and the findings could be of broad interest to the field.

      Weaknesses:

      (1) As the identification of YbjP is a key contribution of this work, a deeper comparison with functional "anchor" proteins in other efflux pumps is needed. Including an additional Supplementary Figure illustrating these structural comparisons would be valuable.

      We have expanded the comparative analysis between YbjP and established anchoring or accessory components in other efflux pumps, and we have added Supplementary Figure S3 to illustrate these structural relationships.

      (2) The observation of the LTO states in the presence of TolC represents an important extension of previous findings. A more detailed discussion comparing these LTO states to those reported in earlier structural and biochemical studies would improve the clarity and significance of this point.

      In the revised manuscript we have expanded our discussion of the LTO conformations, including a direct comparison with previously reported structural and biochemical observations, to better contextualize the significance of our findings.

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports the high-resolution cryo-EM structures of the endogenous TolC-YbjP-AcrABZ complex and a TolC-YbjP subcomplex from E. coli, identifying a novel accessory subunit. This work is an impressive effort that provides valuable structural insights into this native complex.

      Strengths:

      (1) The study successfully determines the structure of the complete, endogenously purified complex, marking a significant achievement.

      (2) The identification of a previously unknown accessory subunit is an important finding.

      (3) The use of cryo-EM to resolve the complex, including potential post-translational modifications such as N-palmitoyl and S-diacylglycerol, is a notable highlight.

      Weaknesses:

      (1) Clarity and Interpretation: Several points need clarification. Additionally, the description of the sample preparation method, which is a key strength, is currently misplaced and should be introduced earlier.

      We have reorganized the text to introduce the sample preparation strategy earlier and clarify the points that may cause ambiguity.

      (2) Data Presentation: The manuscript would benefit significantly from improved figures.

      We agree and have revised the figures to improve clarity, consistency, and readability. Additional schematic illustrations have been included.

      (3) Supporting Evidence: The inclusion of the protein purification profile as a supplementary figure is essential. Furthermore, a discussion comparing the endogenous AcrB structure to those obtained in other systems (e.g., liposomes) and commenting on observed lipid densities would strengthen the overall analysis.

      We appreciate these suggestions. We added the purification profile to Supplementary Figure S1 and expanded the comparison between our endogenous AcrB structure and previously reported structures from reconstituted systems, including a more detailed discussion of lipid densities.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "Structural mechanisms of pump assembly and drug transport in the AcrAB-TolC efflux system" by Ge et al. describes the identification of a previously uncharacterized lipoprotein, YbjP, as a novel partner of the well-studied Enterobacterial tripartite efflux pump AcrAB-TolC. The authors present cryo-electron microscopy structures of the TolC-YbjP subcomplex and the complete AcrABZ-TolC-YbjP assembly. While the identification and structural characterization of YbjP are potentially novel, the stated focus of the manuscript (mechanisms of pump assembly and drug transport) is not sufficiently addressed. The manuscript requires reframing to emphasize the principal novelty associated with YbjP and significant development of the other aspects, especially the claimed novelty of the AcrB drug-efflux cycle.

      Strengths:

      The reported association of YbjP with AcrAB-TolC is novel; however, a recent deposition of a preceding and much more detailed manuscript to the BioRxiv server (Horne et al., https://doi.org/10.1101/2025.03.19.644130) removes much of the immediate novelty.

      Weaknesses:

      While the identification of YbjP is novel, the authors do not appear to acknowledge the precedence of another work (Horne et al., 2025), and it is not cited within the correct context in the manuscript.

      We thank the reviewer for raising this important point regarding the independent nature of our work.

      Our study indeed progressed independently. The process began with our purification of an endogenous protein sample containing the AcrAB-TolC efflux pump. During our cryo-EM analysis, we observed an unassigned density in the map, for which we built a preliminary main-chain model. A subsequent search of structural databases, including AlphaFold predictions, allowed us to identify this density as the protein YbjP. It was only after this identification that we became aware of the related preprint by Horne et al. on BioRxiv (Posted March 19, 2025).

      Therefore, our structural determination of YbjP was conducted entirely independently. We fully acknowledge and respect the work by Horne et al. and have already cited their preprint in our manuscript. While their detailed structural data, maps, and coordinates were not publicly available as of March 13, 2026, we have described their findings appropriately. We agree that our manuscript can better reflect this context and will carefully check for any missing citations to ensure that their contribution is properly and clearly acknowledged.

      We also believe that the two studies are mutually complementary and collectively reinforce the emerging understanding of YbjP.

      Several results presented in the TolC-YbjP section do not represent new findings regarding TolC structure itself.

      We agree that the TolC features we describe are consistent with previously reported structural characteristics. However, these observations could only be confirmed in the context of the newly determined TolC–YbjP subcomplex, which was not available prior to this study. We have clarified this point in the revision to avoid overstating novelty.

      The structure and gating behaviour of TolC should be more thoroughly introduced in the Introduction, including prior work describing channel opening and conformational transitions.

      We appreciate this suggestion and agree that a more comprehensive overview of TolC gating and conformational transitions will strengthen the Introduction. We have revised the text to incorporate relevant prior structural and functional studies.

      The current manuscript does not discuss the mechanistic role of helices H3/H4 and H7/H8 in channel dilation, despite implying that YbjP binding may influence these features.

      Thank you for this comment. The primary novel contributions of this manuscript are the identification of YbjP and the structural characterization of AcrB in three distinct states. The discussion of the dilation mechanism, while included because we observed the closed TolC-YbjP state, is a secondary point. In the revised manuscript, we have expanded this discussion as suggested.

      Only the original closed TolC structure is cited, and the manuscript does not address prior mutational studies involving the D396 region, though this residue is specifically highlighted in the presented structures.

      We appreciate the reviewer drawing attention to this oversight. We have added citations to the relevant mutational and mechanistic studies, including those involving the D396 region, and more clearly discussed these findings in relation to our structural observations.

      The manuscript provides only a general structural alignment between the closed TolC-YbjP subcomplex and the open TolC observed in the full pump assembly. However, multiple open, closed, and intermediate conformations of AcrAB-TolC have already been reported. Thus, YbjP alone cannot be assumed to account for TolC channel gating. A systematic comparison with existing structures is necessary to determine whether YbjP contributes any distinct allosteric modulation.

      We agree with the reviewer’s assessment and appreciate the constructive suggestion. In our revised manuscript, we have expanded the structural comparison to include previously reported open, closed, and intermediate AcrAB–TolC conformations. This expanded analysis more clearly positions our findings within the existing structural framework.

      The analysis of AcrB peristaltic action is superficial, poorly substantiated and importantly, not novel. Several references to the ATP-synthase cycle have been provided, but this has been widely established already some 20 years ago - e.g. https://www.science.org/doi/10.1126/science.1131542.

      We thank the reviewer for this comment. We fully acknowledge the foundational studies that established the AcrB functional cycle and its analogy to the ATP-synthase mechanism. While previous work indeed defined the LTO (Loose, Tight, Open) cycle of AcrB, those structures were obtained using AcrB in isolation. In contrast, our endogenous sample, which includes the native constraints of AcrA from above and the presence of AcrZ, reveals conformational changes in the transmembrane and porter domains that differ from those previously reported. We interpret these differences as reflecting a more physiologically relevant mechanism. In our revision, we have provided a detailed discussion to contextualize these distinctions within the existing literature.

      The most significant limitation of the study is the absence of functional characterization of YbjP in vivo or in vitro. While the structural association between YbjP and TolC is interesting, the biological role of YbjP remains unclear.

      To explore the potential physiological role of YbjP, we compared the viability of a ΔybjP mutant in the E. coli C600 background with that of the wild-type C600 strain under ciprofloxacin (CIP) stress. However, we did not observe a detectable difference in survival between the two strains under the tested conditions. This result is consistent with the assay reported in the preprint mentioned by the reviewer, although the stress conditions used in that study differ from ours.

      Author response image 1.

      To further address this point, we have added a new Supplementary Figure S3 comparing outer membrane proteins with structural and functional similarities to TolC. As shown in this analysis, many such proteins contain an extracellular loop that appears to help anchor or stabilize them within the outer membrane. Notably, TolC lacks such a loop, whereas YbjP contains a corresponding loop region, suggesting that YbjP may potentially play a role in stabilizing or positioning TolC in the outer membrane.

      While our current experiments did not reveal a clear phenotype under CIP stress, the structural observations still suggest that YbjP may have a physiological role. We have therefore expanded the Discussion to more carefully consider possible functional implications of YbjP and to explicitly acknowledge the limitations of the present study regarding its physiological characterization.

      Moreover, the manuscript does not examine structural differences between the presented complex and previously solved AcrAB-TolC or MexAB-OprM assemblies that might support a mechanistic model.

      We thank the reviewer for this suggestion. We now provide a more detailed comparative analysis with previously reported AcrAB–TolC and MexAB–OprM structures, highlighting both similarities and key differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) To address the probable role of YbjP, performing 3D variability analysis on the sub-complex and the complete complex would help clarify whether YbjP participates in channel opening and closing.

      YbjP does not participate in the opening or closing of the TolC channel. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC. The structural transition between the closed and open states of TolC has been thoroughly reviewed by Alav et al. (Chem. Rev. 2021).

      Although the particles for the two reconstructions were obtained from the same dataset, inspection of the raw micrographs and the corresponding 2D class averages clearly shows that the particles fall into two distinct populations: one containing only the TolC–YbjP sub-complex and the other containing the full AcrABZ–TolC–YbjP assembly. In other words, the particles correspond to two different complexes, distinguished by the absence or presence of the AcrABZ components, rather than representing two conformational states of a single complex.

      Three-dimensional variability analysis (3DVA) is most appropriate for analyzing structural heterogeneity arising from continuous or discrete conformational changes within the same macromolecular assembly. Because the heterogeneity in our dataset primarily reflects compositional differences between two assemblies rather than conformational variability within a single complex, we believe that applying 3DVA would not be appropriate for this dataset.

      (2) In addition to the above points, a few minor revisions would improve clarity and readability. Some of the representative density maps in the supplementary figures could be refined for clarity. Adjusting formatting elements (e.g., dashed line thickness) may improve visual presentation.

      Supplementary Figures S2, S5, and S6 have been redrawn to reduce the excessive thickness of the density map representations for better visualization.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Xiaofei and colleagues report the high-resolution cryo-EM structure of the TolC-YbjP-AcrABZ complex, as well as the structure of a subcomplex containing only TolC and YbjP. Additionally, they identify a previously unidentified accessory subunit that plays a role in the function of this complex. Overall, this represents an impressive effort in determining the complete endogenous complex from E. coli and performing systematic analyses. I have a few questions regarding the manuscript:

      (1) The authors use the term "native" several times (e.g., lines 24, 73, 157, 256) to refer to the complex reported here. This may cause confusion, given the use of detergent to extract endogenous complexes from E. coli. They should consider excluding the possibility that the subcomplex was formed during the purification process. The term "endogenous" should suffice in this context.

      We have replaced “native” with “endogenous”.

      (2) Lines 26-28: The phrase "its protomers" may lead to ambiguity, as it could refer to either YbjP or TolC.

      The sentence has been updated to “…bridging the TolC protomers at their equatorial domain.”

      (3) Lines 50-51: The text suggests that the assembly of AcrA and AcrB triggers TolC's transition from a closed to an open conformation. Please clarify this point.

      The introduction (lines 50-51) has been expanded to describe the assembly of TolC and AcrAB, as well as the gating transition between the closed and open states of TolC.

      (4) Lines 57-59: Using cryo-EM may get the low-to-medium resolution map, but not using low-to-medium resolution cryo-EM.

      The sentence has been changed to “… prior studies using crystallography and cryo-EM have revealed low-to-medium resolution snapshots of the assembled pump.”

      (5) Line 73: The authors should consider briefly introducing how they prepared the samples for cryo-EM structural studies, as this is a highlight of the manuscript.

      A detailed, multi-step purification protocol has been added as Supplementary Figure S1A to illustrate the sample preparation procedure.

      (6) Lines 77-82: The authors should label these structural features in the corresponding figures for easier reference, particularly clarifying which part refers to the "equatorial domain."

      We have labeled these structural features in the corresponding figures for clarity, and specifically indicated which region corresponds to the equatorial domain.

      (7) Lines 92-93: The first α-helix of TolC is unclear; the authors should indicate the corresponding residues of this helix in the main text. Additionally, it would be beneficial to illustrate the interface in a figure for easier access.

      We have specified the residues corresponding to the participating α-helix of TolC in the main text and illustrated the interaction interface in a figure (Figure 1F) for better visualization.

      (8) Lines 99-100: Did the authors observe additional density for N-palmitoyl and S-diacylglycerol modifications in their cryo-EM density map? If so, they should highlight this in a figure to demonstrate the importance of these modifications.

      The N-palmitoyl and S-diacylglycerol modifications are embedded in the outer membrane but lack a consistent location within it. As a result, they were averaged out during cryo-EM reconstruction and are not visible in our final map.

      (9) Line 122: Please indicate the 33 nm height in the figure.

      The 33 nm height is composed of a 14 nm TolC channel, a 14 nm periplasmic portion of AcrAB, and a 5 nm transmembrane portion of AcrB, which has been added to the right side of Figure 2B.

      (10) Lines 123-124: This sentence feels out of place. It would be more appropriate to move it to another location, such as the beginning of the Results section, to introduce how the samples were prepared.

      This sentence has been moved to the section “Structure of a TolC–YbjP closed-state complex” to describe the sample preparation.

      (11) Lines 127-128: This section needs to be rewritten for improved clarity.

      This sentence has been rewritten as “This tripartite architecture is stabilized by three distinct sets of interfaces: (i) contacts between the AcrB trimer and the basal regions of AcrA, (ii) extensive AcrA–AcrA lateral interactions within the hexameric ring, and (iii) tip-to-tip junctions formed between the upper AcrA α-helical hairpin and the periplasmic entrance of TolC (Figure 2D).”

      (12) Line 141: Please define terms like DN, DC, PN, and PC upon their first use.

      DN and DC (the N- and C-terminal subdomains of the docking domain) and PN and PC (the N- and C-terminal subdomains of the periplasmic (porter) domain) have been defined where they first appear in the text.

      (13) The lα helix of AcrB is at least partially buried in the membrane (Liu H. et al, PNAS 2025). The authors should consider including this information in their figures, particularly Figure 2B and Figure 5. As the complex is endogenously purified, are there any differences in AcrB compared to those observed in liposomes, SMALP, or vesicles? Did the authors observe significant lipid densities?

      A structural comparison of the AcrB holocomplex with an AcrB structure determined in the native membrane environment (PDB: 9DXN) has been added as Supplementary Figure S8D. In the transmembrane region of AcrB, some sausage-like densities were observed; however, lipid molecules were not modelled in the study.

      (14) The protein purification profile should be included, at least as a supplementary figure.

      The protein purification profile has been added to Supplementary Figure S1A.

      Reviewer #3 (Recommendations for the authors):

      (1) The identification and structural characterization of YbjP as a novel TolC-associated lipoprotein is potentially interesting, and the cryo-EM structures of the TolC-YbjP subcomplex and the complete pump assembly represent a solid starting point. However, the manuscript currently does not sufficiently support the broader mechanistic conclusions implied by the title regarding pump assembly and drug transport. To strengthen the work, the manuscript would benefit from being refocused to highlight the novelty of YbjP, while also providing a clearer mechanistic rationale for its functional role.

      We thank the reviewer for this helpful comment. We have revised the manuscript to better highlight the novel features of YbjP and provide a clearer mechanistic explanation for its function.

      Most Gram-negative TolC homologs, including P. aeruginosa OprM and E. coli CusC, carry native lipid anchors that attach them to the outer membrane. However, E. coli TolC lacks this N-terminal lipidation site. We propose that YbjP, a dually lipidated protein modified with N-palmitoyl and S-diacylglycerol groups, tethers TolC to the outer membrane and functionally replaces the intrinsic lipid anchors found in other outer membrane factors.

      To support this mechanism, we have added Supplementary Figure S3, which compares the anchoring domains of six representative outer membrane components of efflux pumps.

      (2) The structural features and gating dynamics of TolC should be more thoroughly introduced, including prior work describing channel dilation and helix movements (e.g., PMID: 18406332; PMID: 21245342), and the manuscript should discuss how YbjP may influence these known conformational transitions. The relevance of the D396 region should also be considered in the context of previous mutational analyses (e.g., PMID: 32850959).

      All citations mentioned have been added. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC.

      (3) Structural interpretation of the YbjP-containing complexes needs to be strengthened by comparison with the extensive library of available AcrAB-TolC structures in open, closed, and intermediate states (e.g., PMID: 28355133; PMID: 24747401; PMID: 34506732). Such analysis is necessary to determine whether YbjP contributes any distinct allosteric or conformational effects.

      YbjP binds to the equatorial domain of TolC, distant from the tip of its coiled-coil helices. This binding therefore does not interfere with TolC’s functional role, but rather helps anchor TolC within the outer membrane in the correct orientation.

      (4) The speculations regarding the peristaltic nature of AcrB cycling as currently presented in the text and Figure 4 lack novelty and currently reiterate well-established AcrB L/T/O states without offering insight into how YbjP might influence long-range communication within the complex.

      We thank the reviewer for this valuable comment. We agree that the functional rotation mechanism of AcrB with loose, tight and open states has been well documented in previous work.

      In our endogenous intact complex, however, we identified substantial conformational changes in both the porter and transmembrane domains of AcrB that were not observed in earlier isolated structures. To highlight these differences, we have added Supplementary Figure S8 to compare our AcrB structure with all previously reported conformational states.

      On the basis of these structural observations, we have proposed a distinct drug efflux mechanism, which is now described in detail in the revised manuscript.

      (5) Specific clarification is needed regarding the proposed pathway by which YbjP could modulate AcrA or AcrB, given the spatial separation observed in the structures.

      YbjP binds to the equatorial domain of TolC, which has no effect on AcrA or AcrB.

      (6) The manuscript currently lacks functional validation of YbjP, either in vivo or in vitro. Incorporating even basic assays to test YbjP's contribution to efflux function, pump assembly, or antibiotic resistance would significantly enhance the conclusions.

      To explore the potential physiological role of YbjP, we compared the viability of a ΔybjP mutant in the E. coli C600 background with that of the wild-type C600 strain under ciprofloxacin (CIP) stress. However, we did not observe a detectable difference in survival between the two strains under the tested conditions. This result is consistent with the assay reported in the preprint mentioned by the reviewer, although the stress conditions used in that study differ from ours. (See Author response image 1).

      To further address this point, we have added a new Supplementary Figure (Fig. S3) comparing outer membrane proteins with structural and functional similarities to TolC. As shown in this analysis, many such proteins contain an extracellular N-terminal loop that appears to help anchor or stabilize them within the outer membrane. Notably, TolC lacks such a loop, whereas YbjP contains a corresponding loop region, suggesting that YbjP may potentially play a role in stabilizing or positioning TolC in the outer membrane.

      While our current experiments did not reveal a clear phenotype under CIP stress, the structural observations still suggest that YbjP may have a physiological role. We have therefore expanded the Discussion to more carefully consider possible functional implications of YbjP and to explicitly acknowledge the limitations of the present study regarding its physiological characterization.

      (7) The relationship to the prior BioRxiv work by Horne et al. (March 19, 2025) should be discussed more directly, particularly because it reports the same YbjP-TolC association across two different efflux systems and includes higher-resolution structures and functional evidence. The current citation should be revised to accurately acknowledge the precedence and overlap in findings.

      We thank the reviewer for this important suggestion. We have moved the citation earlier in the manuscript to properly acknowledge the work by Horne et al.

      We fully agree that a direct comparison between our structures and those reported by Horne et al. would be highly valuable. However, although nearly a year has passed since the preprint was posted, their atomic coordinates have not been released in the Protein Data Bank. No detailed structural coordinates or models are provided in the preprint itself, which prevents us from performing a meaningful, structure-based comparison with our own data at this stage.

      (8) The references used to support statements on allosteric pump activation (e.g., lines 182-183) should be updated to include more relevant full-complex studies (e.g., PMID: 28355133; PMID: 33009415; PMID: 33909410), and the manuscript should more clearly articulate any proposed mechanism for signal transmission involving YbjP.

      The citations have been added.

      YbjP does not participate in the opening or closing of the TolC channel. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC.

      (9) Overall, while the structural identification of YbjP is noteworthy, additional functional data and more rigorous structural comparison are needed to substantiate the proposed model of pump assembly and drug transport. Reframing the manuscript to emphasize the novelty of YbjP and clarifying its potential mechanistic role would strengthen the work significantly.

      We refer the reviewer to our earlier response for additional functional data. We have added Supplementary Figure S8 to compare our AcrB structure with all previously reported conformational states.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Witte et al. examined whether canonical behavioral functions attributed to the cerebellum decline with age. To test this, they recruited younger, old, and older-old adults in a comprehensive battery of tasks previously identified as cerebellar-dependent in the literature. Remarkably, they found that cerebellar function is largely preserved across the lifespan, and in some cases even enhanced. Structural imaging confirmed that their older adult cohort was representative in terms of both cerebellar gray- and white-matter volume. Overall, this is an important study with strong theoretical implications and convincing evidence supporting the motor reserve hypothesis, demonstrating that cerebellar-dependent measures remain largely intact with aging.

      Strengths:

      (1) Relatively large sample size.

      (2) Most comprehensive behavioral battery to date assessing cerebellar-dependent behavior.

      (3) Structural MRI confirmation of age-related decline in cerebellar gray and white matter, ensuring representativeness of the sample.

      Weaknesses:

      (1) Although the authors note this was outside the study's scope, the absence of a voxel-based morphometry (VBM) analysis limits anatomical and functional specificity. Such an analysis would clarify which functions are cerebellar-dependent rather than solely inferring this from prior neuropsychological literature.

      (2) As acknowledged in the Discussion, task classification (cerebellar-dependent vs. general measures) remains somewhat ambiguous. Some "general" measures may still rely on cerebellar processes based on the paper's own criteria - for example, tasks in which individuals with cerebellar degeneration show impairments.

      (3) Cerebellar-dependent and general measures may inherently differ in measurement noise, potentially biasing results toward detecting effects in general measures but not in cerebellar-dependent ones.

      We appreciate Reviewer #1's positive assessment of the study, including the acknowledgment of our large sample size, comprehensive behavioral battery, and verification of cerebellar atrophy using MRI. We address the concerns raised as follows:

      (1) Voxel-based morphometry (VBM) and anatomical specificity

      We agree that VBM would strengthen anatomical specificity. As noted in our response to private comments, we have carried out these analyses as part of a separate dedicated study, now available as a preprint (“Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function”, https://doi.org/10.64898/2026.02.13.705695). This work investigates region-level cerebellar aging and its relationship with behavior in detail, including both anatomical and functional parcellations. In short, the preprint demonstrates the absence of structure-function relationship between cerebellar regions (from either anatomical or functional atlases) and cerebellar function. Given the scope of the present manuscript, which focuses primarily on behavioral evidence for cerebellar preservation, we chose not to expand this paper further with VBM results.

      (2) Task classification and cerebellar involvement

      We clarified in the revised manuscript that even “general” measures likely involve cerebellar processing to some extent. We have strengthened the discussion explaining that these measures do not primarily depend on cerebellar function, in contrast to the cerebellar-specific metrics derived from established models (e.g., clock variance in rhythmic tapping). We now explicitly caution against interpreting these general measures as cerebellar-independent.

      (3) Measurement noise and differential sensitivity

      To address the reviewer’s concern that measurement noise may differ between task categories, we now report split-half reliabilities for all measures in the Supplement. These data demonstrate no systematic reliability disadvantage for cerebellar-specific tasks that could explain the pattern of results.

      Reviewer #2 (Public review):

      Summary:

      The authors are investigating cerebellar-mediated motor behaviors in a large sample of adults, including 30 individuals over the age of 80 (a great strength of this work). They employed a large battery of motor tasks that are tied to cerebellar function, in addition to a cognitive task and motor tasks that are more general. They also evaluated cerebellar structure. Across their behavioral metrics, they found that even with cerebellar degeneration, cerebellar-mediated motor behavior remained intact relative to young adults. However, this was not the case for measures not directly tied to cerebellar function. The authors suggest that these functions are preserved and speak to the resiliency and redundancy of function in the cerebellum. They also speculate that cerebellar circuits may be especially good for preserving function in the face of structural change. The tasks are described very well, and their implementation is also well-done with consideration for rigor in the data collection and processing. The inclusion of Bayesian estimates is also particularly useful, given the theoretically important lack of age differences reported. This work is methodologically rigorous with respect to the behavior, and certainly thought-provoking.

      Strengths:

      The methodological rigor, inclusion of Bayesian statistics, and the larger sample of individuals over the age of 80 in particular are all great strengths of this work. Further, as noted in the text, the fact that all participants completed the full testing battery is of great benefit.

      Weaknesses:

      The suggestion of cerebellar reserve, given that at the group level there is a lack of difference for cerebellar-specific behavioral components, could be more robustly tested. That is, the authors suggest that this is a reserve given that the volume of cerebellar gray matter is smaller in the two older groups, though behavior is preserved. This implies volume and behavior are seemingly dissociated. However, there is seemingly a great deal of behavioral variability within each group and likewise with respect to cerebellar volume. Is poorer behavior associated with smaller volume? If so, this would still suggest that volume and behavior are linked, but rather than being age that is critical, it is volume. On the flip side, a lack of associations between behavior and volume would be quite compelling with respect to reserve. More generally, as explicated in the recommendations, there are analyses that could be conducted that, in my opinion, would more robustly support their arguments given the data that they have available. This is a well-executed and thought-provoking investigation, but there is also room for a bit more discussion.

      We appreciate the reviewer’s recognition of the methodological rigor of the study. The public review focuses on the structure-function relationship for the cerebellum. Given that the volume of the cerebellum is smaller in older adults but that the identified cerebellar functions are maintained, we conclude that there is no structure-function relationship. We agree with the reviewer that this could be tested further by looking at different parcellations of the cerebellum and demonstrating the absence of association between smaller regions of the cerebellum and the investigated cerebellar functions. Although this is interesting, we believe that it goes beyond the scope of this already extensive paper. For this reason, detailed analyses of the structure-function relationship are available in a preprint of another paper entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” (https://doi.org/10.64898/2026.02.13.705695). In this preprint, across multiple anatomical and functional parcellations, we found no meaningful association between cerebellar structure and cerebellar-specific behavioral measures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Prefacing these suggestions, I want to commend the authors for undertaking this Herculean effort, recruiting such a large sample and administering an extensive battery of tasks. This is an impressively comprehensive study!

      (1) Lesion-symptom mapping. The authors state that lesion-symptom mapping was beyond the scope of the study, but it is unclear why such an analysis could not be performed. Including it would strengthen inferences linking cerebellar structure to behavioral outcomes and help differentiate cerebellar-specific from general performance measures.

      (2) Inter-measure correlations. For cerebellar-dependent tasks, did the authors examine correlations among behavioral measures? If cerebellar aging effects are relatively uniform across the cerebellar cortex, performance across tasks engaging distinct cerebellar regions should, in theory, covary. Similar pairwise correlations for general measures could provide a useful comparison.

      1 + 2: We fully agree with these two points; however, we decided to address these analyses in a separate paper. In the current manuscript, our primary focus was on the behavioral aspects, as these are already quite extensive on their own. In our subsequent work, we conducted an in-depth investigation into the relationship between cerebellar-specific measures and cerebellar structure across distinct cerebellar regions (including anatomical regions and functionally defined regions according to the atlas of Nettekoven et al., 2024). We found that aging does not affect the cerebellum uniformly, but that some anatomical regions exhibit stronger age effects. For the functionally defined regions, however, the age effects were uniform. There was no relation between behavioral cerebellar-specific measures and regional gray matter structure.

      In this second paper we also analyzed inter-measure correlations between behavioral cerebellar-specific measures. We did not find any correlations between cerebellar outcomes of different tasks, which indeed could indicate that the different tasks engage distinct cerebellar regions. In addition, we did not find any relation between cerebellar outcomes and anatomically or functionally defined cerebellar regions.

      You can find a preprint of the second manuscript entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” here: https://doi.org/10.64898/2026.02.13.705695

      (3) Measurement sensitivity. Could differences in age effects reflect varying measurement noise between cerebellar-specific and general measures? For instance, even among younger participants, cerebellar-related measures (e.g., slope in mental rotation) might exhibit greater variability - given that they depend on more conditions, each with its own noise - than general metrics (e.g., baseline motor variability or choice reaction time estimated from a single condition). This could affect sensitivity to detect age-related change and bias results toward finding effects in general rather than cerebellar-specific measures.

      To address this concern, we computed split-half reliability for both cerebellar-specific and general sensorimotor measures and added these estimates to the supplementary materials. As can be seen from Author response table 1, there is no consistent pattern of lower reliability for cerebellar-specific measures that could plausibly account for the absence of age-related effects.

      Author response table 1.

      Split-half reliabilities
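
      As an illustration of how a split-half estimate of this kind can be obtained, the sketch below splits each participant's trials into odd- and even-numbered halves, correlates the half means across participants, and applies the Spearman-Brown correction. The odd-even split, the correction step, and all function names are assumptions made for illustration; the response does not state which splitting scheme was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r):
    """Step a half-test correlation up to the full test length."""
    return 2 * r / (1 + r)


def split_half_reliability(trials_by_participant):
    """Odd-even split-half reliability (assumed scheme, not the authors').

    trials_by_participant: one list of trial-level scores per participant.
    """
    # Mean of the 1st, 3rd, 5th, ... trial for each participant.
    odd_means = [sum(t[0::2]) / len(t[0::2]) for t in trials_by_participant]
    # Mean of the 2nd, 4th, 6th, ... trial for each participant.
    even_means = [sum(t[1::2]) / len(t[1::2]) for t in trials_by_participant]
    return spearman_brown(pearson(odd_means, even_means))
```

      Comparing such estimates between cerebellar-specific and general measures is what allows a reliability disadvantage to be ruled in or out; a random or permuted split would serve the same purpose.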

      (4) Task dependence on the cerebellum. It is difficult to argue that measures such as reach accuracy, choice reaction time, or rhythm deviation are non-cerebellar. Ataxia certainly impacts reach accuracy. Although patient evidence is mixed - and even when there is a lack of dissociation (e.g., prolonged choice reaction times in both cerebellar and PD groups) - this does not preclude cerebellar involvement in these measures. Indeed, as the authors stated, claims of cerebellar independence should therefore be made cautiously (can be addressed by VBM in comment 1).

      In the paper, we tried to emphasize that the general sensorimotor measures still involve cerebellar functions, as is the case for many movement-related measures. However, we theorized that they do not primarily depend on cerebellar function. For example, rhythm deviation in the finger-tapping task is influenced by cerebellar timing mechanisms as well as by motor execution noise, attention, etc. In contrast, the cerebellar-specific measure from this task, the clock variance, has been shown to isolate the contribution of cerebellar-dependent timing mechanisms (Ivry & Keele, 1989).

      On p.37, we added the following paragraph:

      “Similarly, it is important to recognize that general sensorimotor performance is not independent of cerebellar processing. Many broad measures, such as movement accuracy, reaction time, likely reflect contributions from many different brain regions including the cerebellum. As a result, age‑related differences in general sensorimotor performance may emerge from multiple interacting systems rather than cerebellar function alone.”
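
      To make the distinction between rhythm deviation and clock variance concrete, the sketch below decomposes inter-tap interval variability assuming the Wing-Kristofferson two-level timing model, the decomposition typically used to obtain clock variance in tapping tasks. Under that model, the lag-1 autocovariance of the intervals equals minus the motor variance, and the clock variance is the total interval variance minus twice the motor variance. The function name, the autocovariance estimator (dividing by n), and the example intervals are assumptions for illustration and are not taken from the manuscript.

```python
def wing_kristofferson(intervals):
    """Return (clock_variance, motor_variance) from inter-tap intervals.

    Assumes the Wing-Kristofferson model: each interval is a central
    clock interval plus the difference of two successive motor delays.
    """
    n = len(intervals)
    mean = sum(intervals) / n
    dev = [x - mean for x in intervals]
    # Lag-0 autocovariance: total variance of the intervals.
    gamma0 = sum(d * d for d in dev) / n
    # Lag-1 autocovariance between successive intervals.
    gamma1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / n
    motor_variance = -gamma1
    clock_variance = gamma0 - 2 * motor_variance
    return clock_variance, motor_variance


# Hypothetical inter-tap intervals in milliseconds.
intervals = [500, 510, 500, 490, 500, 505, 495]
clock_var, motor_var = wing_kristofferson(intervals)
```

      A positive lag-1 autocovariance or a negative clock-variance estimate indicates that the model assumptions are violated for that series, so such cases need separate handling.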

      (5) Interpreting preserved or enhanced function. The finding of preserved - or even enhanced - performance in older adults is compelling. The authors interpret this as evidence for cerebellar reserve or compensation for cortical decline. An alternative explanation is that cerebellar structures simply decline more slowly than cortical ones, as their gray-matter data suggest; so rather than cerebellar activity revving up, it may remain the same: For example, following up on several of the authors' prior papers, Cisneros et al. (2024) reported enhanced implicit recalibration with age, potentially reflecting greater reliance on cerebellar forward models as sensory (especially proprioceptive) signals degrade. However, this may reflect reweighting rather than compensation - where cerebellar contributions are not enhanced, but rather preserved as other systems decline more rapidly. It would be valuable for the authors to clarify whether they view their findings as evidence of reweighting (slower decline) or compensation (increased contribution).

      We completely agree with this additional interpretation and have added a short section about it to the discussion. However, based on the structural cerebellar measures that we have, it is difficult to state whether the reweighting or the compensation account is more plausible. Either way, both are in line with the cerebellar reserve theory.

      Added to discussion (P. 35):

      Importantly, the relative preservation of cerebellar structure compared to other systems may itself contribute to the maintained cerebellar function observed in older age. Even if structural decline is present, the fact that it progresses more slowly than in many cortical and subcortical regions suggests that a form of structural reserve remains available in the cerebellum. This structural reserve could underlie the continued efficiency of cerebellar circuits and support their capacity to sustain motor functions across aging.

      (6) Mental rotation and the continuity hypothesis. The age-related decline in mental rotation performance, if cerebellar-dependent (see McDougle et al., 2022; note minor inconsistency in citation format throughout the paper), supports emerging theories that the cerebellum supports continuous mental simulations in both cognition and action, whether it's forward model simulation or interval-based timing in the motor control domain or mental rotation/intuitive physics in the cognitive domain (Tsay & Ivry, 2025). Given that mental rotation showed the strongest age effect, it would be fascinating to examine whether this correlates with structural loss in Crus I/II, regions most implicated in higher-order cognitive functions - related to Comment 1 above. Even on a crude level, without correlating with behaviour, do the authors have a map for which areas show greater degeneration than others?

      This is also something we did in the other paper mentioned above (Figure 5 of the new preprint). At first glance, the mental rotation outcomes show a strong positive correlation with Crus I and a negative correlation with Crus II; however, neither was significant, and the fact that their signs are opposite suggests that these might be random. In the preprint, we also compare age-related changes in gray matter volume for different anatomical and functional cerebellar regions (Figure 1).

      The inconsistencies in citation format have been fixed as well.

      (7) Continuous age analyses. An exploratory analysis correlating age (as a continuous variable) with each dependent measure might provide greater sensitivity than categorical group comparisons, revealing more graded relationships between age and performance.

      Our experiment was not designed to perform such an analysis. Testing for group differences provides more power than testing for correlations. For this reason, given that our clearly separated age groups did not show any behavioral differences, we do not expect such an analysis to provide substantial additional insight. Given that the paper is already very extensive, we have not performed this additional analysis.

      Congratulations on this comprehensive piece of work!

      Thank you for your kind words.

      Reviewer #2 (Recommendations for the authors):

      In the introduction, the authors note that the current literature on the cerebellum in aging has evidence from "studies that relied on single-task paradigms", including a citation to an eye-blink conditioning study. They then note "instead of capturing a broader range of specific cerebellar functions". What do they mean by this? Eye-blink conditioning, for example, when administered in a delay paradigm, is tied directly to the cerebellum and is arguably a cerebellar function or learning paradigm. Some clarity about this point is needed.

      The meaning of this is that most previous studies examining cerebellar function in older adults relied on a single task, or on tasks that were functionally very similar, such as balance and gait, to assess performance. In contrast, our study incorporated multiple tasks targeting different sensorimotor skills, allowing us to identify broader patterns in cerebellar sensorimotor performance in older adults.

      To make this clearer, we have rephrased the sentence (p.4):

      “However, much of the evidence supporting this theory comes from studies that narrowly focused on a single task (Boisgontier & Nougier, 2013; Miller et al., 2013; Woodruff-Pak et al., 2001) or on assessments within similar cerebellar domains such as balance and gait (Droby et al., 2021; Rosano et al., 2007), instead of capturing a broader range of specific cerebellar functions.”

      The authors note that many cerebellar tasks that are impaired in patients are preserved in older adults. The authors, however, seem to ignore delay eyeblink conditioning. Gerwig and colleagues (2010, Behav Brain Res) have shown that this is impacted in patients, and it is also robustly impacted in aging. Older adults still learn, but the age effects are highly replicable. A clear discussion of eye-blink conditioning and how it fits into this framework, and with your findings here, would be really helpful. It seems like a notable oversight not to have it discussed, given the age effects in this context, even if it was not included as a measure.

      Eye-blink conditioning is an interesting example that seems to contradict our theory: it is both affected by age and dependent on the cerebellum. However, while age-related changes in cerebellar structure evolve continuously with age, eye-blink conditioning performance remains unchanged between 40 and 80 years of age. Therefore, eye-blink conditioning suggests that age-related changes in cerebellar structure are not related to possible age-related changes in function. This discussion was already included in the manuscript on p. 36, which reads as follows:

      “Similarly, no eye-blink conditioning task was included, as it is heavily influenced by cognitive factors such as awareness and arousal, and fear conditioning (LaBar et al., 2004). Previous work has shown that many variables, such as blink reaction time and motor components of the eyeblink reflex, introduce substantial variability in responses at older age (Woodruff-Pak & Jaeger, 1998). In contrast, this study found that only performance on the rhythmic finger-tapping task, similar to what we included in our battery, emerged as a significant predictor of age-related differences in eye-blink conditioning. Furthermore, age-related differences appeared to plateau after early adulthood, with no significant variation in the percentage of correct responses between ages 40 and 80 (Woodruff-Pak & Jaeger, 1998). Practically, the extended duration of the training protocol also makes this task unsuitable for inclusion in a test battery (Winton et al., 2025).”

      This approach also does not consider variability within older adults. That is, on average, they may do better than patients. But, there are also individual differences in cerebellar metrics (structure, for example) within an older adult sample that are a critical consideration here. When looking at the behavioral plots that include the individual data points (which is a great addition and very helpful), it is clear that variability is prevalent. As noted below, it may still be that cerebellar metrics are associated with behavior, given the high degree of variability within the groups across aging.

      We agree with the reviewer that variability is prevalent, as it is in any experiment. In our latest preprint entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” (https://doi.org/10.64898/2026.02.13.705695), we investigated whether variability in cerebellar structure could predict variability in cerebellar functions. Across all our tasks, we did not find such an association, regardless of whether we defined cerebellar regions based on an anatomical atlas or a functional one.

      The use of 23 as the cut-off for MOCA scores is rather low. What was the justification for this within the literature? The authors note wanting to ensure task instructions and those with symptoms of potential MCI, but often 26 is used as a minimum score (with 25 and below being potential MCI).

      In the methods, we refer to the study of Carson et al. (2018) that recommends a cutoff score of 23/30 instead of 26/30 as it shows overall better diagnostic accuracy. We selected this cutoff to emphasize that our sample was not restricted to only the highest‑performing older adults. However, we agree that this is not sufficiently explained in the text, so we briefly clarified this (p.5):

      “We assessed cognitive functioning in both older and older‑old participants using the Montreal Cognitive Assessment (MoCA). A minimum score of 23 out of 30 was required for inclusion, following the recommendation by Carson et al. (2018), who demonstrated that this reduced cutoff yields fewer false positives and provides better overall diagnostic accuracy than the original 26/30 threshold. We adopted this criterion to ensure that our sample was not limited to only the highest‑performing older adults.”

      The authors note that the timing of the visits was adapted based on participant availability. It would be helpful to report the mean length of time between sessions, as well as the range.

      We added this to the method section (p.6):

      “There was no fixed interval between the two behavioral sessions. Ideally, both were scheduled within one week, but in practice, the timing was adapted to participants’ availability. Across all participants, this resulted in a mean inter-session interval of 7.40 days (± 9.03; range = 0-63 days). The average interval between the behavioral sessions and the MRI scanning was 6.86 days (± 8.90; range = 0-83 days).”

      The authors have anatomically defined cerebellar parcellations but have looked solely at total volume measures. What is the rationale for this? If there are differential impacts on cerebellar volume with age (Han et al., 2022; Bernard & Seidler, 2013), there may also be positive associations with behavior in regions that are less negatively impacted by volume. This would be consistent with the idea of reserve. One interesting set of correlations that could be considered is with respect to anterior lobules (I-IV and V) relative to the secondary motor representation in VIIIa and VIIIb, such that the latter may show a more robust association with behavior in the positive direction if volume in these regions is less impacted by aging.

      As mentioned in response to one comment from the other reviewer, we investigated this question in our latest preprint (https://doi.org/10.64898/2026.02.13.705695). In this analysis, we did not find any relation between cerebellar outcomes and anatomical or functional cerebellar regions.

      We consider this to be beyond the scope of the present paper, which focuses on the behavioral performances. The total cerebellar volume was added to show that the subject sample we used did actually exhibit atrophy in the cerebellum, but the purpose of the paper was not to focus on the link between structure and function.

      With respect to timing, I recognize that the clock variance is insignificant based on p=.06. However, this is a relatively "close" result. I am very much of the mindset that things are significant or not. Inclusion of Bayesian analyses helps this, but I don't find this particularly convincing. The larger sample of individuals over age 80 is certainly a strength, and I'm not especially concerned about power. But I do wonder about overinterpretation. I would also emphasize the large degree of variability here in the oldest sample. This raises questions about associations with cerebellar metrics. This argument for relative preservation/reserve may be strengthened by looking at individual differences in structure relative to behavior. That is, in areas of the cerebellum where structure is less impacted by aging (as this is not entirely uniform) does this volume predict better behavior in this sample?

      As noted earlier, the relationship between structure and function is examined in our other paper (https://doi.org/10.64898/2026.02.13.705695). Unfortunately, we were unable to include the 80+ group in that analysis because MRI data was available for only 20 older‑old participants and correlations/regression with 20 people are vastly underpowered.

      We also want to point out that the almost significant difference highlighted by the reviewer between age groups actually goes in the direction of the older participants performing better than the young participants.

      The note about the amount of variance in the older-old participants is fair, though.

      The comparison with the Cam-CAN data set seems to be largely qualitative. Why did the authors not make a direct comparison to determine relative similarity in their sample compared to Cam-CAN? This would be a bit more compelling, though I suspect the differences are not statistically reliable (they note the oldest-old in the Leuven sample have a slightly larger volume). I do realize there are sample size differences, but a matched random sub-sample could also be created out of Cam-CAN. Why did they not compute the quadratic model in the Leuven sample as well?

      A quadratic model was not considered very meaningful in the Leuven sample because age was not measured as a continuous variable but categorized into three discrete age groups (which provides more power to look at age-related differences). Our goal was not to determine whether absolute cerebellar volumes matched across datasets, for example, by creating comparable age groups in the Cam‑CAN dataset, but rather to assess whether the pattern of age‑related effects in our sample aligned with those seen in a larger dataset. In our opinion, the current approach sufficiently demonstrates that the age‑related trends we observe are consistent with those reported in Cam‑CAN.
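
      For readers interested in the matched comparison the reviewer describes, the sketch below shows one way a random sub-sample with the same number of participants per age group could be drawn from a larger reference dataset. It is not part of the reported analysis. The age bins, group sizes, participant identifiers, and function name are all hypothetical placeholders, since the response does not specify them.

```python
import random
from collections import defaultdict


def matched_subsample(reference, group_sizes, age_bins, seed=0):
    """Draw a random sub-sample matching the per-group sample sizes.

    reference:   list of (participant_id, age) pairs from the larger dataset
    group_sizes: dict mapping group name to the number of participants to draw
    age_bins:    dict mapping group name to an inclusive (min_age, max_age)
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_group = defaultdict(list)
    for pid, age in reference:
        for name, (lo, hi) in age_bins.items():
            if lo <= age <= hi:
                by_group[name].append(pid)
                break
    sample = {}
    for name, n in group_sizes.items():
        pool = by_group[name]
        if len(pool) < n:
            raise ValueError(f"not enough reference participants in {name}")
        # Sampling without replacement within each age group.
        sample[name] = rng.sample(pool, n)
    return sample


# Hypothetical reference data: ids 0..69 with ages 20..89.
reference = [(i, 20 + i) for i in range(70)]
age_bins = {"young": (20, 35), "old": (60, 79), "older_old": (80, 120)}
group_sizes = {"young": 5, "old": 5, "older_old": 3}
sub = matched_subsample(reference, group_sizes, age_bins, seed=1)
```

      Repeating the draw over many seeds would give a distribution of reference-sample estimates against which a smaller sample could be compared.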

      The analysis of relative cerebellar gray and white matter is quite interesting. However, what about regional patterns to this? It would be particularly interesting to know if some regions are more or less impacted or preserved relative to the cortex. The data are seemingly available based on the processing approach (at least for gray matter). Was a similar analysis also computed in Cam-CAN? Replicating this in an independent sample would also be of interest.

      We agree with the reviewer that this is indeed interesting for further analyses on this dataset. However, it falls beyond the scope of the present paper. Our preprint (https://doi.org/10.64898/2026.02.13.705695) looks at regional patterns for the cerebellum. Other papers have compared age-related decline in different cortical and subcortical regions as discussed on p.35 of our discussion:

      “Given that the cerebellum exhibited a relatively less pronounced structural decline compared to other brain regions as shown here and in another previous study (Taki et al., 2011), it seems more plausible that the cerebellum might compensate for deficits caused by structural changes in other areas rather than vice-versa. Age-related gray and white matter degeneration is usually faster in frontotemporal regions and subcortical regions, including the hippocampus, amygdala and thalamus than in the cerebellum (Fjell et al., 2013; Giorgio et al., 2010; Neufeld et al., 2022). Although this does not directly indicate functional implications, it suggests that cortical regions are less likely to compensate for cerebellar loss when they exhibit more severe degeneration.”

      The authors argue for cerebellar reserve and present compelling behavioral data in support of this with their many tasks. In instances where they look at largely cerebellar-mediated measures, they demonstrate that older adults and the >80 year old group show relatively intact behavior, even though in these groups total cerebellar gray matter volume (and white matter) is significantly smaller than in young adults. As noted, the behavioral data are very compelling, and as an individual who looks at aging populations in their research, seeing areas and domains of preservation is always interesting and useful. This pattern certainly may be consistent with cerebellar reserve. However, it would be more compelling if the authors also looked at these behaviors with respect to cerebellar volume. That is, there is still a great deal of variability in behavior in the older and >80 samples (though also in the young adults) that may still be associated with cerebellar volume. Poorer performance may be present in those with smaller volumes. This would also be somewhat consistent with the notion that these tasks are those that are derived from work in cerebellar degeneration samples. Associations between behavior and cerebellar measures would speak to this. If there are no associations with volume, this would be particularly interesting and compelling in the context of reserve. Alternatively, if there are differential impacts on cerebellar volume with age (Han et al., 2022; Bernard & Seidler, 2013), there may also be positive associations with behavior in regions that are less negatively impacted by volume. This would be consistent with the idea of reserve. One interesting set of correlations that could be considered is with respect to anterior lobules (I-IV and V) relative to the secondary motor representation in VIIIa and VIIIb, such that the latter may show a more robust association with behavior in the positive direction if volume in these regions is less impacted by aging. Not all individuals completed the scan (due to safety and comfort considerations), which would limit statistical power potentially, but this could be conducted in the subset of individuals that have both sets of data.

      This point overlaps with the issues raised by the other reviewer in comments 1 and 2, which highlights the importance of this point. Yet, we decided to address this analysis in a separate paper. In the current manuscript, our primary focus was on the behavioral aspects, as these are already quite extensive on their own. In our subsequent work (https://doi.org/10.64898/2026.02.13.705695), we conducted an in-depth investigation into the relationship between cerebellar-specific measures and cerebellar structure across distinct cerebellar regions (including anatomical regions and functionally defined regions according to the atlas of Nettekoven et al., 2024). We found that aging does not affect the cerebellum uniformly, but that some anatomical regions exhibit stronger age effects. For the functionally defined regions the age effects were uniform though. There was no relation between behavioral cerebellar-specific measures and anatomical or functional cerebellar regions.

      Some of the assertions the authors make in the discussion about the cerebellum having less pronounced structural decline relative to other brain regions would benefit from being tempered. They used relative measures here, and this is certainly interesting. But, how do other regions stack up? What would the hippocampus look like if such a measure were used? And as noted, does this pattern replicate in the CAM-CAN sample? Further, the authors cite Jernigan et al. (2001) in arguing that cerebellar changes are smaller than those in other brain regions, when in looking at their tables, in fact, the gray matter reductions of the cerebellum are comparable to those of the prefrontal cortex and second only to those of the hippocampus.

      We agree with the reviewer that this is an interesting question, but it needs to be addressed in a separate paper. We have also removed the citation to the Jernigan paper.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Comments on revisions: The authors addressed all my concerns.

      We thank you for the positive review and feedback throughout the review process.

      Reviewer #2 (Public review):

      Comments on revisions: We agree with the overall findings of the study and appreciate that the claims in text and title have been appropriately toned down. As additional suggestions e.g. for presentation, many of the graphics/labels are still too small to be useful. It would be interesting to see if this cell line is similar to the tumours in terms of all the phenotypes. The lapatinib experiment was good. I wonder how quick this drug affects the mitochondria. Also it would be interesting to see if these cells have higher OXPHOS than other non-transformed breast epithelial cells. The WB on oxphos components is good with ab110413 but this looks like many subunits are detected so this should be made clear.

      Thank you for these suggestions.

      We have clarified in the Methods section (lines 475–476) the specific OXPHOS subunits detected using the Ab110413 antibody cocktail.

      With respect to lapatinib, prior work has shown that lapatinib can alter the phosphoproteome within minutes to hours (PMID:22964224). In our experiments, however, NF639 cells were exposed to lapatinib for 24 hours - a timeframe in which transcriptional and translational remodeling are also expected to occur. Therefore, we cannot distinguish whether the observed suppression of OXPHOS reflects acute signaling effects or downstream changes in gene and protein abundance. Importantly, the purpose of this experiment was proof-of-principle: to determine whether HER2 signaling contributes to respiratory competency in a cell line derived from the same transgenic model as the intact tumor slices used in this study. Thus, while defining the precise kinetics of inhibition or comparing to benign/non-transformed cells would be interesting, these were not the primary objectives of the added experiments.

      We have increased figure label sizes across all main figures.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Frangos et al. used a transcriptomic and proteomic approach to characterise changes in HER2-driven mammary tumours compared to healthy mammary tissue in mice. They observed that mitochondrial genes, including OXPHOS regulators, were among the most down-regulated genes and proteins in their datasets. Surprisingly, these were associated with higher mitochondrial respiration, in response to a variety of carbon sources. In addition, there seems to be a reduction in mitochondrial fusion and an increase in fission in tumours compared to healthy tissues.

      Strengths:

      The data are clearly presented and described.

      The author reported very similar trends in proteomic and transcriptomic data. Such approaches are essential to have a better understanding of the changes in cancer cell metabolism associated with tumourigenesis.

      Weaknesses:

      (1) This study, despite being a useful resource (assuming all the data will be publicly available and not only upon request) is mainly descriptive and correlative and lacks mechanistic links.

      We appreciate this point. While the primary goal of our study was to assess mitochondrial adaptations with HER2-driven tumorigenesis, we agree that strengthening the mechanistic interpretation would improve the impact of the data. To address this, we have provided experiments demonstrating that HER2 inhibition in NF639 cells with lapatinib suppresses respiratory capacity, directly supporting the interpretation that HER2 activity regulates respiratory function (Figure 10). We have expanded the discussion appropriately (lines 378-394). Both raw RNA-seq and proteomic data were deposited through GEO and the PRIDE repositories (accession numbers included in Data Availability Statement).

      (2) It would be important to determine the cellular composition of the tumour and healthy tissue used. Do the changes described here apply to cancer cells only or do other cell types contribute to this?

      We thank the reviewer for this suggestion; we have added experiments that have directly addressed this concern.

      Cell type composition analysis by immunofluorescence was added (Figure 6) where we quantified epithelial, mesenchymal, endothelial, immune and stromal populations in our benign mammary tissue and tumor samples. We found no major shift in the dominant cell types that would confound transcriptomic data in whole tissues.

      We integrated immunofluorescence data with a publicly available scRNA-seq dataset from human breast tumors, which allowed us to estimate cell-type-specific expression of OXPHOS genes in our own samples. Despite the possibility of species differences, this is the only dataset of its kind, and we used this to generate an estimate of cell-type-weighted OXPHOS mRNA expression (Figure 6). This revealed that epithelial cells are likely the dominant contributors to OXPHOS gene expression for CI–IV. All calculations are delineated in the Methods section.
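
      As an illustration only (a minimal sketch, not the calculation delineated in the Methods), a cell-type-weighted estimate of this kind can be thought of as multiplying each cell type's proportion by its mean expression of the gene set and comparing the resulting contributions. The function names, cell-type labels, and all numbers below are hypothetical placeholders.

```python
# Illustrative sketch only: all fractions and expression values are hypothetical.

def weighted_contributions(fractions, expression):
    """Return each cell type's proportion-weighted contribution to bulk expression.

    fractions:  cell type -> proportion of cells (e.g. from immunofluorescence)
    expression: cell type -> mean expression of a gene set (e.g. from scRNA-seq)
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("cell-type fractions must sum to a positive value")
    # Renormalise so the fractions sum to 1 before weighting.
    return {ct: (frac / total) * expression[ct] for ct, frac in fractions.items()}


def dominant_cell_type(contributions):
    """Name of the cell type with the largest weighted contribution."""
    return max(contributions, key=contributions.get)


# Hypothetical example values (not taken from the study).
fractions = {"epithelial": 0.6, "immune": 0.1, "stromal": 0.3}
expression = {"epithelial": 2.0, "immune": 1.0, "stromal": 0.5}

contributions = weighted_contributions(fractions, expression)
bulk_estimate = sum(contributions.values())
top = dominant_cell_type(contributions)
```

      In this toy example the epithelial compartment dominates simply because it is both the most abundant and the most highly expressing; with real data either factor alone could drive the result.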

      (3) Are the changes in metabolic gene expression a consequence of HER2 signalling activation? Ex-vivo experiments could be performed to perturb this pathway and determine cause-effects.

      Thank you for this suggestion – we have included an experiment directly testing this concept. We assessed mitochondrial respiration in NF639 HER2-driven mammary tumor epithelial cells in the presence or absence of the well-described dual tyrosine kinase inhibitor lapatinib. Lapatinib reduced basal, CI-linked and CI+II linked respiration without compromising mitochondrial integrity or coupling, demonstrating that HER2 activation regulates respiration in our model. This data is presented in Figure 10, and a new section has been added to the discussion describing the implications of this finding in the context of the current literature (lines 378-394).

      (4) The data of fission/fusion seem quite preliminary and the gene/protein expression changes are not so clear cut to be a convincing explanation that this is the main reason for the increased mitochondria respiration in tumours.

      We agree mitochondrial morphology and dynamics alone cannot fully account for the observed respiratory phenotype – this was emphasized in the discussion but has since been further clarified (lines 365-377). We retained the TEM and dynamics gene/protein data because they do support morphological differences consistent with enhanced fission. However, we have revised the tone of our interpretation to more explicitly acknowledge that these findings are correlative, and the updated discussion now emphasizes that the increased respiratory capacity in tumors is likely driven by multiple converging mechanisms.

      Reviewer #2 (Public review):

      Frangos et al present a set of studies aiming to determine mechanisms underlying initiation and tumour progression. Overall, this work provides some useful insights into the involvement of mitochondrial dysfunction during the cellular transformation process. This body of work could be improved in several possible directions to establish more mechanistic connections.

      (5) The interesting point of the paper: the contrast between suppressed ETC components and activated OXPHOS function is perplexing and should be resolved. It is still unclear if activated mitochondrial function triggers gene down-regulation vs compensatory functional changes (as the title suggests). Have the authors considered reversing the HER2-derived signals e.g. with PI3K-AKT-MTOR or ERK inhibitors to potentially separate the expression vs. functional phenotypes? The root of the OXPHOS component down-regulation should also be traced further, e.g. by probing into levels of core mitochondrial biogenesis factors. Are transcript levels of factors encoded by mtDNA also decreased?

      We appreciate this insight and agree that the discordance between mitochondrial content and function is fascinating and have addressed the concerns above in the following manner:

      - We have altered the title – we agree we cannot definitively say that the enhanced respiratory capacity observed is compensatory.

      - We have added experiments in NF639 cells in the presence of lapatinib, a tyrosine kinase inhibitor to interrogate whether HER2 is necessary for our functional outcome of interest – the enhanced respiratory capacity in the tumors. Lapatinib significantly suppressed respiration (Figure 10) demonstrating HER2 signaling directly regulates mitochondrial respiration.

      - We have expanded the discussion to provide further comment on potential explanations for increased respiratory function and low mitochondrial content.

      (6) The second interesting aspect of this study is the implication of mitochondrial activation in tumours, despite the downregulation of expression signatures, suggestive of a positive role for mitochondria in this tumour model. To address if this is correlative or causal, have the authors considered testing an OXPHOS inhibitor for suppression of tumorigenesis?

      Previous studies have eloquently highlighted that directly or indirectly inhibiting mitochondria can suppress growth in HER2-driven breast cancer (PMID:31690671) or alternatively, amplification of mt-HER2 enhances tumorigenesis (PMID: 38291340). In many solid tumors, this is the concept of preclinical and clinical studies using IACS-010759 or similar inhibitors of OXPHOS which do suppress growth but have significant off target effects in healthy tissues (PMID: 36658425, 3580228...). We have expanded the discussion to ensure the reader is aware of these previous contributions and highlighted the importance of future work delineating the role of enhanced respiratory function in HER2-driven mammary cancer (lines 378-394).

      (7) A number of issues concerning animal/ tumour variability and further pathway dissection could be explored with in vitro approaches. Have the authors considered deriving tumour-derived cell cultures, which could enable further confirmations, mechanistic drug studies and additional imaging approaches? Culture systems would allow alternative assessment of mitochondrial function such as Seahorse or flow cytometry (mitochondrial potential and ROS levels).

      We thank the reviewer for this suggestion – we have addressed this in part by using the NF639 HER2-driven tumor epithelial line which demonstrated that HER2 regulates our observed respiratory response. Unfortunately, the addition of tumor derived cell cultures was not feasible or within the scope of our study. Animal and tumor variability has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (8) The study could be greatly improved with further confirmatory studies, eg immunoblotting for mitochondrial components with parallel blots for phospho-signalling in the same samples. It would be interesting if trends could be maintained in tumour-derived cell cultures. It is notable that OXPHOS protein/transcript changes are more consistent (Figure 5, Supplementary Figure 4) than mitochondrial dynamics /mitophagy factors (Figure 8). Core regulatory factors in these pathways should be confirmed by conventional immunoblotting.

      We thank the reviewer for this thoughtful comment. While we agree that additional confirmatory studies can be valuable, due to tissue quantity constraints and the number of assays required for our multi-omics analysis, extensive additional blots were not feasible. However, we had sufficient protein to provide select OXPHOS proteins to verify the proteomic data (now provided in S-Fig.4H). Furthermore, we have plotted the fold change of genes and proteins detected in both datasets and added this to Figure 4 (4A, B), further highlighting the consistency between our transcriptomic and proteomic findings. We believe that the highly consistent and concordant nature of our datasets collectively provides strong support for our central objective - determining whether mitochondrial content and respiratory function correlate in HER2-driven mammary tumors. The reproducibility of OXPHOS-related changes reinforces the robustness of our observations. We also appreciate the reviewer’s insight that OXPHOS alterations appear particularly consistent. In response, we have edited the discussion to further emphasize this point, especially in relation to the distinctive pattern observed for Complex V, which showed greater preservation relative to Complexes I–IV across several methods (lines 348-364). We comment on how this stoichiometric shift may contribute to intrinsic respiratory activation despite reduced mitochondrial content.
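
      For readers who want to reproduce this kind of cross-dataset comparison, the sketch below shows one simple way to quantify agreement: restrict to genes detected in both datasets and compute the fraction whose log2 fold changes share a sign. This is an assumed, simplified approach and not necessarily the analysis behind Figure 4; the gene identifiers and values are hypothetical.

```python
# Illustrative sketch only: gene names and log2 fold changes are hypothetical.

def direction_concordance(rna_log2fc, protein_log2fc):
    """Fraction of shared genes whose transcript and protein changes share a sign.

    Both arguments map gene identifier -> log2 fold change (tumor vs benign).
    Returns (sorted list of shared genes, concordant fraction).
    """
    shared = sorted(set(rna_log2fc) & set(protein_log2fc))
    if not shared:
        raise ValueError("no genes were detected in both datasets")
    # A positive product means both changes point in the same direction.
    same_sign = [g for g in shared if rna_log2fc[g] * protein_log2fc[g] > 0]
    return shared, len(same_sign) / len(shared)


rna = {"geneA": -1.2, "geneB": -0.8, "geneC": -0.1, "geneD": 2.0}
protein = {"geneA": -0.9, "geneB": -0.5, "geneC": 0.2}

shared_genes, concordance = direction_concordance(rna, protein)
```

      A sign-based measure ignores effect size, so a correlation of the paired fold changes would be a natural complement.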

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Further Minor points.

      (9) It would be helpful to know further details regarding the source of the tumour samples, particularly for the proteomics (N=5) and transcriptomics (N=6) datasets, since the exact timepoint of tissue harvest and number of tumours/mouse varied, according to the methods section. Were all samples from the omics studies from different mice (ie 11 mice)? B4 and B6 seem like outliers in mitochondrial transcriptomes. Are these directly paired eg with T4 and T6? Are the side-by-side pairs of Ben and Tum samples for blots in Figure 1 and Supplementary Figure 1 from the same mouse?

      This has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (10) Further references and details are needed to support the methodology of the mitochondrial function tests (eg. nutrients vs pairing with complexes). What was the time point of nutrient supplementation? It would seem that the lipid substrates should take longer to activate OXPHOS than pyruvate/malate or succinate. Is this the case? Is there speculation as to why succinate supplementation is much more active than pyruvate+malate? What is +MD in Figure 6? The rationale for pooling data for Figure 7A is unclear since the categories appear to overlap: (pyruvate, malate, ADP) vs. (palmitoyl-carnitine, malate, ADP).

      Thank you for this comment. We have expanded the methods (lines 515-531) to provide additional detail on the mitochondrial respiration protocol. Briefly, permeabilized tissues were exposed to substrates delivered at supraphysiological concentrations in a sequential protocol lasting ~30–60 minutes. Under these conditions, mitochondrial respiration reflects the maximal capacity to utilize each substrate rather than the physiological time course of substrate mobilization or uptake that would occur in vivo with the influence of blood flow and transport/substrate availability limitations.

      (11) Many of the figures were blurry (Figure 1F, 2B) or had labels that were too small to be effective (Figures 1G, H, 2D-G, 3E-G, 5E-I, 7C, 8B).

      The font size of figure labels has been increased where possible and all figures have been exported to maximize resolution.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, they show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood as focal leakage, basement membrane disruption, and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with e.g. the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages - from larval to adult zebrafish - the study provides an unprecedented comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults the presented findings add novel insights into how mural cell loss may lead to vascular instability.

      Weaknesses:

      The study uses quantitative tracer assays with multiple molecular weight dyes to evaluate blood-brain barrier (BBB) permeability. The study normalizes the intensity of tracer signals (e.g., 10 kDa, 70 kDa dextrans) in the brain parenchyma to the vascular signal of a 2000 kDa dextran tracer (assumed to remain within vessels). Intensity normalization is used to control for variations in tracer injection efficiency or vascular density. This method doesn't directly assess the absolute amount of tracer present in the parenchyma, potentially underestimating leakage severity. As the lack of BBB impairment is a "negative" finding, more rigorous controls or other methods might be needed to corroborate it.

      In response to these and comments from other reviewers, we have now performed further carefully controlled analysis to test leakage of tracers using molecular weights ranging from 1 to 2000 kDa. We have performed additional normalisation approaches (new data in Fig. 2a–d) imaging tracer extravasation together with vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this transgenic reporter for normalisation (as suggested by Reviewer #2). The results of these experiments all supported our initial conclusions (revised Extended Data Fig. 3a–d), further validating the reliability of our method. Furthermore, as suggested by the reviewer, analysis of the raw tracer intensity amounts in the parenchyma was also performed with no normalization at all (see Author response image 1). This also supports our conclusion that the BBB is intact in young animals. Finally, we now use our methods to demonstrate that we can detect an immature leaky BBB at 3 dpf and a mature functional BBB at 7 dpf (Fig. 2e-f), a suitable positive control to show that our methods and analyses are reliable.

      Author response image 1.

      Raw intensity values from the parenchyma confirm findings in Figure 2 and Extended Data Figure 3. a–d, Raw mean fluorescence intensity values of extravasated tracers in the midbrain. (a–b) show unnormalized values corresponding to Extended Data Fig. 3a–d, and (c–d) show unnormalized values corresponding to Fig. 1a–d. Unpaired t-tests for 70 and 10 kDa at 14 dpf in (a–b), for 10 kDa at 7 dpf, and for 70 kDa at 14 dpf in (c–d). Mann-Whitney tests for 70 and 10 kDa at 7 dpf in (a–b), for 70 kDa at 7 dpf, and for 10 kDa at 14 dpf in (c–d), due to non-normal distribution. These data were all generated in genotype-blind assays, display variance in signal that is generated between embryos due to injection differences, and show no difference in BBB integrity between the genotypes analysed. Comparison of this to normalised data using 2000 kDa tracer or kdrl expression in endothelial cells (Fig. 2 and Extended Data Fig. 3) confirms that normalisation improves the analysis, effectively controlling for embryo-to-embryo differences in delivery of tracer and imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult mice using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092). The study by Ando et al, 2021 did not report experiments assessing BBB leakage in pdgfrb mutants but in the review article by Ando et al (PMID: 34685412) it is stated that "indicating that endothelial cells can produce basic barrier integrity without pericytes in zebrafish."

      We thank the reviewer for their comments and pointing out literature that we had not cited (this has been corrected in our revised manuscript).

      As noted by other reviewers, our study goes beyond simply confirming previous literature. The quoted section by the reviewer from Ando et al 2021 regarding intact barrier integrity in pdgfrb mutants is a conclusion based on apparent lack of haemorrhages in pdgfrb mutants[1]. Our work shows haemorrhages in older animals and as such is in line with these previously published results, but it also extends previous work, for the first time reporting detailed functional analysis to assess BBB integrity. Our study uses definitive tracer assays (now including extensive revisions) to identify an intact BBB in pdgfrb mutants in live animals. This has not been previously described and is important because it offers a new perspective on the evolutionary conservation (or otherwise) of pericyte control of BBB function. Furthermore, our study investigates the nature of hotspot leakage and haemorrhages in more detail than in previous work.

      Weaknesses:

      (1) The authors should avoid using violin plots, which show distribution. Instead, they should replace all violin plots in the figures with graphs showing individual data points and standard deviation. For Figure 2f specifically, the standard deviation in the analyzed cohort should be shown.

      This is a good point and we have replaced the violin plots with individual data points and shown all data as mean±SEM.
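
      Because the reviewer asked for standard deviation and the revised figures report mean±SEM, it may help to recall how the two relate: SEM = SD / √n, so either can be recovered from the other when n is known. The short sketch below is illustrative only; the function name and the example values are hypothetical and do not come from the study.

```python
import math
import statistics

# Illustrative sketch only: the example values are hypothetical.

def mean_sd_sem(values):
    """Return (mean, sample SD, SEM) for a list of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a sample SD")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)         # SEM shrinks as n grows; SD does not
    return mean, sd, sem


mean, sd, sem = mean_sd_sem([1.0, 2.0, 3.0, 4.0])
```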

      (2) The authors have not shown the reduced PDGFRB protein or the effect of mutation on mRNA level in their zebrafish mutant.

      Our pdgfrb<sup>uq30bh</sup> mutant allele introduces a mutation predicted to generate a truncated protein very similar to previously validated alleles (see detail in revised Extended Data Fig. 1a and methods). Our pdgfrb<sup>uq30bh</sup> mutant also phenocopies previous pdgfrb mutants (sa16389 and um148 alleles)[2,3], displaying mural cell loss with multiple markers (Fig. 1a, new data in Extended Data Fig. 1b–c, Fig. 3b–c; Extended Data Fig. 4c–d) and the same typical morphological defects and survival rates (new data in Extended Data Fig. 1d–f). Thus our mutant phenocopy gives confidence it is most likely a null allele, in line with previous papers studying presumed null alleles[1].

      We believe this provides sufficient confidence in this allele of pdgfrb. Moreover, considering that our manuscript focusses on loss of mural cells and we show definitively that this mutant has robust loss of mural cells in the brain, our mutant is suitable for this study.

      (3) Statistical data analysis: Did the authors perform analyses to investigate whether the data has a normal distribution (e.g., Figures 1d, e)?

      We thank the reviewer for raising this and apologise for this oversight. All data have now been assessed for normality using Shapiro-Wilk test and further statistical analyses have been performed accordingly. The specific quantifications referred to by the reviewer in Extended Data Fig. 3a–d (previously Fig. 1d-e), have normal distribution except for quantification measuring 70 kDa extravasation at 7 dpf, therefore Mann-Whitney test has been used for this comparison. Further information can be found in figure legends and methods.

      (4) Analysis of tracer extravasation. The use of 2000 kDa dextran intensity as an internal reference is problematic because the authors have not provided data demonstrating that the 2000 kDa dextran signal remains consistent across the entire vasculature. The authors have not provided data demonstrating that the 2000 kDa dextran signal in vessels exhibits acceptable variance across the vasculature to serve as a reliable internal reference. The variability of this signal within a single animal remains unknown. The presented data do not address this aspect.

      We thank the reviewer for their comment and agree that analysis was needed for showing 2000 kDa dextran as a reliable normalization signal.

      We now show the data in the following Figures that demonstrate the consistency of signal throughout the vasculature using this 2000-kDa tracer: Extended Data Fig. 2b, Extended Data Fig. 3a and c, Extended Data Fig. 5a, Extended Data Fig. 6. In fact, we observe that this 2000 kDa tracer provides a very reliable marker of large and small calibre vessels in larval, juvenile and adult animals, even in fixed and cleared whole tissues and animals (e.g. Extended Data Fig. 2d-e, Extended Data Fig. 5 and 6).

      Our further experiments and analysis support the use of this tracer as an ideal way to normalise for variation between animals and, coupled with improved masking of vessels using transgenic labels (e.g. Extended Data Fig. 2b), we can quantify across whole vascular networks to reduce the concern about variation within individual animals. We also find 2000 kDa shows negligible leakage through the brain vessels (Extended Data Fig. 2b–c, new data) at 2 hours post-injection (hpi) and provided images in Extended Data Fig. 6b–b′′ showing detectable signals even at 6 hpi. Finally, results generated with this approach, normalisation to transgenic markers or even raw parenchymal values of tracer intensity, generate the same conclusions. In addition, we point the reviewer to a recent pre-print that further validates this method from our team[4].

      Overall, we find the use of this tracer an ideal way to normalise for differences in injection volumes between animals and we recommend the use of this method to other groups assessing BBB leakage in zebrafish.

      Additionally, it's intriguing that the signal intensity in the parenchyma of the tested tracers presents a substantial range, varying by 20-30% in the analysed cohort (Figure 1g, Extended Figure 1e). Such large variability raises the question of its origin. Could it be a consequence of the normalization to 2000 kDa dextran intensity which differs between different fish? Or is it due to the differences in the parenchymal signal intensity while the baseline 2000 kDa intensity is stable? Or is the situation mixed?

      This is a good point raised by the reviewer.

      To address this, we have used the following approaches:

      (1) We provide additional experiments and normalisation methods that support the utility of our tracer studies (new data in Fig 2a–f and Extended Data Fig. 2b–c), discussed in detail below.

      (2) We provide graphs of the raw parenchymal distribution of tracer not normalised at all (also requested by reviewer 1). This is provided in Author response image 1 and further supports all our conclusions, showing that our normalisation methods generate meaningful data.

      Overall, the range of parenchymal intensity that we see after tracer injection and live imaging shows variations introduced during microinjection. However, these ranges are in-line with previous publications using similar methods (see studies by O’Brown et al 2019 and 2023)[5,6], allow reliable statistical comparisons to be drawn between control and mutants and allow us to detect both immature and functional BBB states during zebrafish development (new data in Fig. 2e-f).

      Of note, the variability we see is likely introduced during the injection process into tiny larval blood vessels and is the reason why we perform normalization of parenchymal tracers to a vascular dextran signal that doesn’t leak from brain vessels. In our studies, 2000-kDa dextran has been co-injected with the smaller size tracers, therefore any potential differences in injection volumes as well as imaging conditions (however consistent) should be reduced by this method.
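
      The logic of this ratio normalisation can be sketched in a few lines. The example is a minimal illustration and not the quantification pipeline used in the study: it assumes that both co-injected tracers scale by the same factor when injection volume varies, and the function name and pixel intensities are hypothetical.

```python
# Illustrative sketch only: pixel intensities are hypothetical, and a common
# scaling of both co-injected tracers with injection volume is an assumption.

def normalized_extravasation(parenchyma_pixels, vessel_pixels):
    """Mean parenchymal tracer intensity divided by mean vascular reference intensity."""
    if not parenchyma_pixels or not vessel_pixels:
        raise ValueError("both pixel lists must be non-empty")
    mean_parenchyma = sum(parenchyma_pixels) / len(parenchyma_pixels)
    mean_vessel = sum(vessel_pixels) / len(vessel_pixels)
    if mean_vessel == 0:
        raise ValueError("vascular reference signal is zero")
    return mean_parenchyma / mean_vessel


# Animal 1: reference injection.
ratio_a = normalized_extravasation([10.0, 12.0, 14.0], [100.0, 110.0, 90.0])

# Animal 2: same biology, but 1.5x more tracer mix delivered.
ratio_b = normalized_extravasation([15.0, 18.0, 21.0], [150.0, 165.0, 135.0])
```

      The raw parenchymal means differ by 50% between the two hypothetical animals, whereas the normalised ratios are identical, which is the embryo-to-embryo variation the co-injected reference is meant to absorb.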

      An alternative and potentially more effective approach would be to cross the pdgfrb mutant line with a line where endothelial cells are genetically labeled to define vessels (e.g. the line kdrl used in acquiring data presented in Figure 2a). Non-injected controls could then be used as a baseline to assess tracer extravasation into the parenchyma.

      We thank the reviewer for this suggestion.

      In response, we have performed new tracer leakage experiments at 7 and 14 dpf in siblings and pdgfrb mutants and quantified parenchymal tracer extravasation by normalizing to vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). The results were in-line with the previously presented and independent experiments and showed indistinguishable phenotypes between siblings and pdgfrb mutants (new data, Fig. 2a–d). We also used uninjected controls to assess baseline and saw consistent values approaching zero in these images and did not include this in the revised paper.

      Furthermore, we have also used this approach in wild-type larvae at 3 dpf (immature BBB) and 7 dpf (functional BBB)[5]. We detected significantly higher parenchymal extravasation of 10 and 70 kDa tracers at 3 dpf compared to 7 dpf, demonstrating that our method can detect leakage (new data, Fig. 2e–f).

      We believe that both normalization approaches have advantages (as discussed above), therefore showing the same results with these two different approaches has further strengthened our findings.

      How is the data presented in Figure 3e generated? How was the dextran intensity calculated? It looks like the authors have used the kdrl line to define vessels. Was the 2000 kDa still used as in previous figures? If not, please describe this in the Materials and Methods section.

      We have moved this data to Fig. 4e (previously Fig. 3e).

      Previously, we had plotted raw data because the experiment was conducted on vibratome-sectioned tissue. The 2000 kDa tracer was not used. In response to this query and to be consistent with the new approach suggested by the reviewer, we have revised the quantification by normalizing the 10 kDa tracer extravasation to Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup> for this and the new experiments on juveniles (Fig. 5h–i). Please see the corresponding figure legends or revised methods (lines 464–472).

      (5) The authors state that both controls and mutants show extravasation of 1 kDa NHS-ester into the parenchyma. However, the presented images do not illustrate this; it is not obvious from these images (Extended Data Figure 1c). Additionally, the presented quantification data (Extended Data Figure 1e) do not show that, at 7 dpf, the vasculature is permeable to this tracer. Note that the range of signal intensity of the 1 kDa NHS-ester is similar to the 70 kDa dextran (Figure 1g and Extended Figure 1e). Would one expect an increase in the ratio in case of extravasation, considering that the 2000 kDa dextran has the same intensity in all experiments? Please explain.

      We thank the reviewer for raising this important point.

      To clarify, we have never claimed that “2000-kDa dextran has the same intensity in all experiments”. On the contrary, vascular 2000 kDa normalization has been used to account for potential differences caused by injection, as stated in the submitted supplementary materials and now made more clear in the revision.

      In response to this query, we conducted a more detailed analysis of tracer extravasation patterns based on molecular weight (new data, Extended Data Fig 2b–c). This analysis showed that 1- and 10-kDa tracers have a much higher extravasation rate than 70- and 2000-kDa tracers. Interestingly, we did not find a significant difference between 1 and 10 kDa extravasation. Therefore, in the revised manuscript we used only 10 kDa in further experiments and have removed 1 kDa from the figures.

      To assess the tracers individually (new data in Extended Data Fig. 2c), parenchymal extravasation of each tracer was normalised to its own vascular signal (e.g. mean intensity of 10 kDa in midbrain/mean intensity of 10 kDa in vasculature), to account for potential differences in injection volume. This provides a suitable method to assess leakage in wild-type animals and is now in line with how previous studies have analysed such tracer injections[5,6]. Please see revised figure legends and supplementary materials for details.
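
      As an illustration only, the ratio described above (mean parenchymal intensity of a tracer divided by the mean vascular intensity of the same tracer) could be computed along the following lines. This is a minimal sketch and not the analysis code used in the study: the function name, the boolean masks and the toy pixel values are all hypothetical, and it assumes that vessel and parenchyma regions have already been segmented.

```python
import numpy as np

def normalised_extravasation(tracer, vessel_mask, parenchyma_mask):
    """Return mean parenchymal tracer intensity divided by mean vascular
    tracer intensity. All names here are hypothetical illustrations."""
    tracer = np.asarray(tracer, dtype=float)
    vascular_mean = tracer[vessel_mask].mean()        # signal inside vessels
    parenchymal_mean = tracer[parenchyma_mask].mean() # signal in brain tissue
    if vascular_mean == 0:
        raise ValueError("vascular signal is zero; ratio is undefined")
    return parenchymal_mean / vascular_mean

# Toy 2x3 "image" with made-up intensities (not real data).
tracer = np.array([[100.0, 100.0, 10.0],
                   [ 20.0,  30.0,  0.0]])
vessel_mask = np.array([[True,  True,  False],
                        [False, False, False]])
parenchyma_mask = np.array([[False, False, True],
                            [True,  True,  False]])

ratio = normalised_extravasation(tracer, vessel_mask, parenchyma_mask)
print(ratio)
```

      Because numerator and denominator come from the same tracer in the same animal, a smaller or larger injection volume scales both and largely cancels in the ratio, which is the rationale given above for this normalisation.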

      (6) The study would be strengthened by a more detailed temporal analysis of the phenotype. When do the aneurysms appear? Is there an additional loss of VSMC?

      We thank the reviewer for this suggestion, and we have now performed staged imaging of the pdgfrb mutants and siblings between 7 and 21 dpf using the TgBAC(acta2:EGFP)<sup>uq17bh</sup> transgene (new data, Fig. 3b-c; Extended Data Fig. 4a–d). Consistent with previous results, acta2:EGFP-positive cells surrounding the middle mesencephalic central arteries (MMCtA) were missing in pdgfrb mutants. At 21 dpf, we also observed a mild dilation of these vessels, likely the earliest change leading to aneurysms (new data, Fig. 3c).

      To extend the number of stages analysed in this study, we have also performed new tracer leakage experiments in juveniles (30 dpf) and found that aneurysms can be detected at this age when the 10 kDa tracer is used (new data in Fig. 5b–b′). Consistent with the adult stage phenotype, aneurysms were limited to the larger calibre vessels (arteries) in the brain. We have also observed hotspots and, upon quantification, found fewer in juveniles than in adults, suggesting that the severity of aneurysms and hotspots increases with age.

      Taken together, our results show that the aneurysms in pdgfrb mutants start appearing at late larval/early juvenile stages (~21 dpf) with observable dilations. By 30 dpf, aneurysms accompanied by small numbers of hotspots are observed, and the number of hotspots is significantly higher by adulthood. This also correlates with the reduced development and survival rate of pdgfrb mutants after 30 dpf (new data, Extended Data Fig. 1d–e).

      (7) The authors intended to analyze the BBB at later stages (line 128), but there is not a significant time difference between 2 months (Figure 2) and 3 months (Figure 3) considering that zebrafish live on average 3 years. Therefore, the selection of only two time-points, 2 and 3 months, to analyze BBB changes does not provide a comprehensive overview of temporal changes throughout the zebrafish's lifespan. How long do the pdgfb mutants live?

      Respectfully, zebrafish transition from juvenile stages to adulthood between 2 and 3 months and there are many significant differences in the physiology of this organism at these two ages. At 2 months, zebrafish are still juveniles undergoing metamorphosis with rapid growth and ongoing skeletal and vascular development. By 3 months, they are sexually mature adults and have much more developed cranioskeletal and vascular systems. Having said that, we take the reviewer's important point that further temporal resolution would improve the study.

      We have performed new experiments in 1-month-old animals and provided a comprehensive analysis of the vascular phenotypes occurring in pdgfrb mutants. These were very informative experiments analysing leakage using 10-kDa tracer injections and have significantly improved the study. We had previously provided experiments in 5-month-old adults as well (previously Fig. 4a–b and Extended Data Fig. 4a), so the study now includes larval stages (7, 14 dpf), juveniles at 1 and 2 months and adults at 3 and 5 months. While the additional timepoints did not lead to any new conclusions, they significantly enhanced the body of work overall.

      Of further note, we provided survival data up to 90 dpf where survival of the pdgfrb mutants is significantly reduced compared to siblings (Extended Data Fig. 1e). We believe this is associated with the severity of the aneurysms and haemorrhages which probably lead to lethality in these mutants.

      (8) Why is there a difference in tracer permeability between 2 and 3 months (Figures 2 and 3)? Are hemorrhages not detected in 2-month-old zebrafish?

      In response to this and other queries, we have added new additional experiments that provide more detailed temporal analysis on tracer accumulation (new data in Fig. 5b–c, Fig. 5f–g).

      In short, we do not see obvious haemorrhages in 1- or 2-month-old fish at a gross level during dissections (not shown). We find that using the 10-kDa tracer, we can detect small hotspots at aneurysms as early as 1 month, likely representing the earliest loss of integrity. We do not see obvious hotspots in 2-month-old animals when we use the 70-kDa tracer, which suggests to us that it is less sensitive for hotspot detection (in line with new Extended Data Fig. 2c). Finally, we find that the number of hotspots increases dramatically from juvenile to adult stages in our datasets, which we take as indicative of a progressive phenotype.

      Overall, tracer size matters for detecting hotspots, and hotspots become more apparent in older animals. We have added a note in the main text to cover these points (lines 200–205).

      (9) Figure 3: The capillary bed should be presented in magnified images as it is not clearly visible. Figure 3e shows that in the pdgfb mutant the dextran intensity is higher also in regions 6-10. How do the authors explain this?

      We thank the reviewer for raising this important point.

      Firstly, we now include enlarged views of the capillary beds for this experiment (Fig. 4d′) and new experiments mentioned below.

      Secondly, in relation to why there is higher tracer signal in lateral locations and not just at medial sites of haemorrhage, we believe that this is most likely due to the progressive spread of tracer from the medial hotspots. To test if this is likely, we performed additional experiments and tested tracer accumulation at 2 different timepoints in brains collected at 0.5 or 6 hpi (new data in Fig. 5f–g, Extended Data Fig. 6a–b′′). Tracer accumulation at 0.5 hpi was minimal and was primarily limited to hotspots and nearby regions (new data in Fig. 5h), whereas higher tracer accumulation was observed across medial to lateral regions at 6 hpi (new data in Fig. 5i) in pdgfrb mutants. Comparing the data in Figure 4 (2 hpi) and new data in Figure 5i (6 hpi), the 10-kDa tracer appears to have spread to more lateral locations given the increased time allowed post injection.

      We cannot formally exclude the possibility that tracer leakage occurs more slowly through capillaries than at major hotspots, which might fit with the proposed model of slow leakage via increased EC transcytosis[7-9]. However, considering that we cannot detect increased tracer accumulation in pdgfrb mutants that lack aneurysms and haemorrhages at 7 and 14 dpf, such a scenario would require capillary transcytosis to be active at later juvenile and adult stages but not in larval and late larval animals. Thus, we believe the most plausible explanation is that aneurysm/haemorrhage-associated leakage is the primary cause of the vascular integrity defects in zebrafish pdgfrb mutants.

      We have added discussions addressing this in the revised manuscript (lines 220–230, 300–302).

      (10) In general, the manuscript would benefit from a more detailed description of the performed experiments. How long did the tracer circulate in the experiments presented in Figures 2, 3, and 4?

      We thank the reviewer for this suggestion and have now ensured that this is clearly described in the figure legends and methods (lines 391–395).

      (11) How do the authors explain the poor signal of the 70 kDa dextran from the vasculature of 5-month-old zebrafish presented in Extended Data Figure 3?

      We agree that the dextran signal was reduced compared to the other experiments in that Figure. This is likely due to sample preparation and clearing causing reduced fluorescence. Upon consideration of the presented data and the additional experiments using 10 kDa tracers providing further validations for our claims, we decided to remove this data from the paper.

      (12) The study would benefit from a clear separation of the phenotypes caused by the loss of VSMC. The title implies that capillaries also present hemorrhages, which is not the case. How do vascular mural cells differ from mural cells? Are there any other mural cells?

      We take the reviewer's point and have now updated the title to "Mural cells protect the adult brain from haemorrhage but do not control the blood-brain barrier in developing zebrafish."

      (13) I have a few comments about how the authors have interpreted the literature and why, in my opinion, they should revise their strong statements (e.g., the last sentence in the abstract).

      Scientists have their own insights and interpretations of data. However, when citing published data, it should be clearly indicated whether the statement is a direct quote from the original publication or an interpretation. In the current manuscript, the authors have not correctly cited the data presented in the two published papers (references 5 and 6). These papers do not propose a model where pericytes suppress "adsorptive transcytosis" (lines 73-76). While increased transcytosis is observed in pericyte-deficient mice, the specific type of vesicular transport that is increased or induced remains unknown.

      Similarly, lines 151-152 refer to references 5 and 6 and use the term "adsorptive transcytosis," but the authors of both papers did not use this term. Attributing this term to the original authors is inaccurate. Additionally, lines 152-153 do not accurately represent the findings of references 5 and 6. These papers do not state that there is an induction of "caveolae" in endothelial cells in pericyte-deficient mice. In the absence of pericytes, many vesicles can be observed in endothelial cells, but these vesicles are relatively large. It is more likely that there is some form of uncontrolled transcytosis, perhaps micropinocytosis. Please refer to the original papers accurately.

      We thank the reviewer for these comments. We take the point and have rewritten the manuscript carefully to improve accuracy and avoid misrepresenting any previous claims made in specific papers.

      Also, the authors have missed the fact that in mice, the extent of pericyte loss correlates with the extent of BBB leakage. To a certain extent, the remaining pericytes can compensate for the loss by making longer processes and so ensure the full longitudinal coverage of the endothelium. This was shown in the initial work of Armulik et al (reference 5) and later in other studies.

      We certainly did not miss this important point (as we are also working with these mouse models) and we now include reference to this in our expanded discussion. Of note, we do think it would be worthwhile assessing if the extent of BBB leakage and pericyte coverage also correlates with the presence of microhaemorrhages in these hypomorphic mouse models, although this is more challenging to do in mice than in zebrafish.

      The bold assertion on lines 183 -187 that a lack of specific BBB phenotype in pdgfrb zebrafish mutant invalidates mouse model findings is unfounded. Despite the notion that zebrafish endothelium possesses a BBB, I present a few examples highlighting the differences in brain vascular development and why the authors' expectation of a straightforward extrapolation of mouse BBB phenotypes to zebrafish is untenable.

      In mice Pdgfrb knockout is lethal, but in zebrafish, this is not the case. In marked contrast to mice, however, zebrafish pdgfrb null mutants reach adulthood despite extensive cerebral vascular anomalies and hemorrhage. Following the authors' argumentation about the unlikely divergence of zebrafish and mice evolution, does it mean that the described mouse phenotype warrants a revisit and that the Pdgfrb knockout in mice perhaps is not lethal? Another example where the role of a gene product is not one-to-one, which relates to pericyte development, is Notch3. Notch3-null mice do not show significant changes in pericyte numbers or distribution, suggesting a less prominent role in pericyte development compared to zebrafish.

      Although many aspects of development are conserved between species, there are significant differences during brain vascular development between zebrafish and mice. These differences could reveal why the BBB is not impaired in zebrafish pdgfrb mutants. There is a difference in the temporal aspect when various cellular players emerge. The timing of microglia colonization in the brain differs. In mice, microglia colonization starts before the first vessel sprouts enter the brain, while in zebrafish, microglia enter after. Additionally, microglia in zebrafish and mice have a different ontogeny. In mice, astrocytes specialize postnatally and form astrocyte endfeet postnatally. In zebrafish, radial glia/astrocytes form at 48 hpf, and as early as 3 dpf, gfap+ cells have a close relationship with blood vessels. Thus, these radial glia/astrocyte-like cells could play an important role in BBB induction in zebrafish. It's worth noting that in Drosophila, the blood-brain barrier is located in glial cells. While speculative, these cells might still play a role in zebrafish, while the role of pericytes does not seem to be crucial. Pericytes enter the brain and make contact with the developing vasculature (endothelium) relatively late in zebrafish (60 hpf). In mice, the situation is different, as there is no such lag between endothelium and pericyte entry into the brain. I suggest that the authors approach the observed data with curiosity and ask: Why are these differences present? Are all aspects of the BBB induced by neural tissue in zebrafish? What is the contribution of microglia and astrocytes?

      Another interesting aspect to consider is the endothelial-pericyte ratio and longitudinal coverage of pericytes in the zebrafish brain, and how this relates to what is observed in mice. How similar is the zebrafish vasculature to the mouse vasculature when it comes to the average length of pericytes in the zebrafish brain? Does the longitudinal coverage of pericytes in the zebrafish brain reach nearly 100%, as it does in mice?

      Based on the preceding arguments, it is recommended that the authors present a balanced discussion that provides insightful discussion and situates their work within a broader framework.

      Overall, we agree with most of the points made by the reviewer above. As we have now extended the format of this paper to be a full article, we have space to provide an extended discussion and introduction. We now try to capture many of the points made by the reviewer and we think that this has significantly improved the paper. We thank the reviewer for this contribution.

      We do want to point out that we did not state that our findings using zebrafish pdgfrb mutants invalidate mouse model findings. We suggest that a deeper analysis to understand the nature of the hotspots in mural cell deficient mammalian models could be very interesting in light of the zebrafish observations. We hope that the revised discussion better reflects this.

      Reviewer #3 (Public review):

      This manuscript examines the role of pdgfrb-positive pericytes in the establishment and maintenance of the blood-brain barrier (BBB) in the zebrafish. Previous studies in PDGFB- or PDGFRB-deficient mice have suggested that loss of pericytes results in disruption of the BBB. The authors show that zebrafish pdgfrb mutant larvae have an intact BBB and that pdgfrb mutant adult fish show large vessel defects and hemorrhage but do not exhibit substantial leakage from brain capillaries, suggesting loss of pericytes is not sufficient to "open" the BBB. The authors use beautiful and compelling images and rigorous quantification to back up most of their conclusions. The imaging of the adult brain is particularly nice. The authors rigorously document the lack of BBB leakage in pdgfrb<sup>uq30bh</sup> mutant larvae and large vessel phenotypes (e.g., enlargement and rupture) in pdgfrb<sup>uq30bh</sup> mutant adults. A few points would help the authors to further strengthen their findings contradicting the current dogma from rodent models.

      We appreciate the reviewer's comments on the manuscript overall and agree that addressing the raised points was needed to strengthen our findings. We have addressed the main points below and believe that this revision greatly improves this study.

      Major point:

      The authors document pericyte loss using a single TgBAC(pdgfrb:egfp)<sup>ncv22</sup> transgenic line driven by the promoter of the same gene mutated in their pdgfrb<sup>uq30bh</sup> mutants. Given their findings on the consequences of pericyte loss directly contradict current dogma from rodent studies, it would be useful to further validate the absence of brain pericytes in these mutants using one of several other transgenic lines marking pericytes currently available in the zebrafish. This could be done using pdgfrb crispants, which the authors show nicely phenocopy the germline mutants, at least in larvae. This would help nail down the absence of any currently identifiable pericyte population or sub-population in the loss of pdgfrb animals and substantially strengthen the authors' conclusions.

      We thank the reviewer and agree that examination of pdgfrb<sup>uq30bh</sup> mutants using another transgenic line labelling pericytes would further validate the absence of brain pericytes. We generated a transgenic line, TgBAC(abcc9:abcc9-T2A-mCherry)<sup>uom139</sup>, to visualise pericytes and validated the absence of brain pericytes in the pdgfrb mutants (revised Extended Data Fig. 1b). The loss of brain pericytes matched our findings using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line as well as previously published data by Ando et al 2016-2021, in which brain pericytes were missing except on the metencephalic artery[2,3].

      Other issues:

      The authors should provide more information about the pdgfrb<sup>uq30bh</sup> mutant and how it was generated (including a diagram in a supplemental figure would be useful).

      We thank the reviewer for this suggestion. In addition to the explanations provided in supplementary materials, we have added a schematic and provided Sanger sequencing results showing the mutation, as well as the predicted effect of the mutation on the protein domains (Extended Data Fig. 1a).

      It would be helpful to show some data on whether mutants show morphological phenotypes or developmental delay at 7 and 14 dpf, to provide some context to better assess the reduced branching and vessel length vascular phenotypes (see Figures 1c-e).

      We thank the reviewer for this suggestion. We have provided further details on body length and survival of the pdgfrb mutants until 90 dpf. As reported by Ando et al 2021, we did not observe any distinguishing feature until about 30 dpf[1,3]. The adult anatomy of our mutant allele matches that of previously described null mutants and is now shown (Extended Data Fig. 1f).

      If available, it would be helpful to have a positive control for the tracer leakage experiments - a genetic manipulation that does cause disruption of the BBB and leakage at 2 hours post-tracer injection (see Figures 1f and g).

      We thank the reviewer for this suggestion and agree that a positive control would validate the reliability of our method. We have performed new experiments at 3 dpf, when BBB integrity is not yet established, and at 7 dpf, when the BBB is functional in zebrafish[5], testing both 10 and 70 kDa tracers (new data in Fig. 2e–f). We detected significantly higher tracer accumulation at 3 dpf, showing that our methods can detect tracer leakage in the brain.

      Quantification of the findings in Figure 4c, d would be useful, as would the use of germline fish for these experiments if these are now available. If this is not possible, it would be helpful to document that the crispants used in these experiments lack pdgfrb:egfp pericytes at adult stages (this is only shown for 5 dpf larvae, in Extended Data Figure 4b).

      We thank the reviewer for this comment. Using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line, we have imaged coronal brain sections collected from 10-week-old pdgfrb crispants and uninjected siblings (age-matched animals used in Fig. 5d–e, previously Fig. 4c–d). We have now included data showing that adult pdgfrb crispants lack brain mural cells, phenocopying pdgfrb<sup>uq30bh</sup> mutants (new data, Extended Data Fig. 6f). These particular crispants are very reliable in our hands and nicely reproduce stable mutant phenotypes, giving us confidence to use the faster F0 approach in this experiment.

      Adult mutants clearly show less dye leakage in the more superficial capillary regions than WT siblings, but dextran intensity is a bit higher, although this could well be diffusion from more central brain regions where overt hemorrhage is occurring. Along similar lines though, the authors' TEM data in Extended Data Figure 4d hints that there may be more caveolae in mutant brain capillaries, although the N number was lower here than for the measurements from TEM of larger central vessels (Figure 4g). It would be useful to carry out additional measurements to increase the N number in Figure 4d to see whether the difference between wild-type sibling and mutant capillary caveolae numbers remains as not significant.

      We thank the reviewer for raising these important points and suggestions.

      Firstly, in relation to signal in capillary regions and likely diffusion from hotspots, please see the response to reviewer 3 point 9 above.

      Secondly, we have imaged and analysed more capillaries in both pdgfrb mutants and siblings (Extended Data Fig. 7a–b, previously Extended Data Fig. 4d). The results showed no significant difference between these groups, suggesting that capillary EC transcytosis is unchanged in our pdgfrb mutants.

      It might be helpful to include some orienting labels and/or additional descriptions in the figure legends to help readers who are not used to looking at zebrafish brain vessels have an easier time figuring out what they are looking at and where it is in the brain.

      We thank the reviewer for this suggestion and agree that adding further information about orientation in the figure legends and illustrations would make it easier for readers. In addition to the information provided in the figure legends in the submitted version, we have added an illustration, added more labels to the revised figures, and extended the descriptions in the figure legends, main text and methods.

      We have added a schematic depicting the tracer leakage assay workflow, orientation of live imaging and analysed region of interest (Extended Data Fig. 1a–b).

      All figure legends have been updated with the anatomical position and microscopy view.

      Additional labels on figures have been added to understand the referenced vessel names (new data in Fig. 3c and Extended Data Fig. 4a–b′).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study uses the intensity of tracer signals within the vessels to analyze BBB permeability, potentially underestimating leakage severity. The dye intensity is measured 2 hours after injection; however, other studies have already observed leakage after 30 minutes, by imaging directly in the brain parenchyma. The overall intensity should also decrease through leakage from the other vessels of the body, e.g. in the trunk and tail. Probably the loss of intra-vascular dye intensity from leakage in barrier-free vessels is already so high (after 2 hours) that the smaller amount of leakage across the BBB cannot be observed.

      We thank the reviewer for this comment and suggestion. We agree that small tracers leak from the vasculature, particularly through fenestrated vessels in the trunk and tail. We have based our timing on previous studies and our own experience. In zebrafish, the study by O’Brown et al 2019 also used 2 hpi[5] to detect leakage in mutants of mfsd2aa, which has also been proposed to regulate BBB integrity by controlling EC transcytosis. Therefore, we believe that performing experiments at 2 hpi is appropriate for investigating the roles of pericytes in BBB integrity, and our data suggest that this timing works.

      In response to this and other comments, we performed further experiments and analyses to test leakage of individual tracers with molecular weights ranging from 1 to 2000 kDa. We showed that these tracers can reliably be detected in brain parenchyma and vasculature when imaged at 2 hpi. In another study, we showed that medium-sized tracers such as 40 kDa dextran can be reliably detected in the vasculature at similar timepoints[10]. Considering that our experiments using 10 and 70 kDa tracers detect both parenchymal tracer accumulation and tracer still within the vessels, we believe this timepoint is appropriate for assessing BBB integrity in zebrafish.

      In addition to these experiments, see our tracer leakage experiments in 1-month-old animals, at 0.5 and 6 hpi to test leakage pattern described above (Fig. 5 and Extended Data Fig. 6).

      Therefore, the authors will need to validate their method of choice, showing an impairment of the BBB, caused by other agents (known to affect the BBB), and at 48 hpf, when the BBB is not tightened yet. One example for BBB impairment can be found in O'Brown et al (2019), eLife 8:e47326. doi: 10.7554/eLife.47326

      We thank the reviewer for this suggestion. Following O’Brown et al 2019, we have performed experiments at 3 dpf, when BBB integrity is not mature, and at 7 dpf, when the BBB is functional[5], testing both 10 and 70 kDa tracers. We detected significantly higher tracer accumulation at 3 dpf, showing that our new additional method (see below) can detect tracer leakage in the brain (new data in Fig. 2e–f).

      Ideally, the authors would also supplement the method with additional approaches in the younger developmental stages to validate their findings.

      The validation of the method and the findings is particularly important for the claims of lack of BBB impairment in the absence of mural cells, as this is a "negative" finding.

      In response to this and comments from other reviewers, we performed additional tracer leakage experiments (new data in Fig. 2a–d) where we imaged 10 and 70 kDa tracers with a vascular reporter (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this reporter for normalisation. Both this approach as well as the experiments provided in the first submission (updated as Extended Data Fig. 3a–d) showed that pdgfrb mutants at 7 and 14 dpf have indistinguishable BBB integrity compared to siblings. See also Author response image 1 that further addresses this.

      I also strongly suggest rephrasing and toning down the claim that vascular mural cells do not control the blood-brain barrier in developing zebrafish.

      As a negative finding cannot be proven completely and lots of the previously shown effects on murine BBB impairment are rather weak (when caused by single agents such as Claudin5 deficiency or Sphingosine-phosphate receptor1 knockout), it might be important to only claim that in zebrafish no strong impairment (as observed in the mural cell-deficient mouse) could be observed. Or rephrase it to "no impairment as severe as/comparable to ... could be observed" and then provide an impairment control for the developmental stages.

      We thank the reviewer for this comment and agree that negative findings are very challenging to prove. However, we find no evidence of leakage of the BBB in animals lacking mural cells at 7 and 14 dpf and believe that our data are robust on this point. As such, we believe we show that a vertebrate with a largely conserved EC BBB can have intact barrier function in the absence of mural cells.

      We have, as suggested, revised our claims throughout the manuscript to provide a more nuanced discussion of this, but we do not want to water down our claims too much as we believe they are important. We hope that the reviewer will appreciate our carefully worded and expanded discussion section.

      Additional items of interest to the readers, and therefore suggestions to improve the manuscript, could be:

      (1) To include more molecular analysis: while the study identifies caveolae induction and basement membrane thickening as potential contributors to focal leakage, the exact molecular mechanisms linking mural cell loss to these structural changes are not deeply investigated.

      (2) Also, the study primarily associates BBB disruption in the adult with aneurysms. Therefore other subtle or diffuse changes to BBB permeability that might occur even without overt vascular lesions are potentially underrepresented.

      However, following up experimentally on these might exceed the scope of the manuscript.

      We thank the reviewer for these suggestions and agree with both points. However, as stated by the reviewer, these experiments are beyond the scope of the manuscript and represent future directions for our lab and others.

      Reviewer #2 (Recommendations for the authors):

      (1) Mouse genes should be written as follows: Pdgfb, Pdgfrb and be in italics. See line 70: it should be written "Pdgfb and Pdgfrb (italics)" and not "PdgfB and Pdgfrβ".

      We have updated the text according to the reviewer’s suggestion.

      (2) Please state the age of the fish analyzed in Figure 1f and 1g.

      We have moved these data to Extended Data Fig. 3a–d (previously Fig. 1f-g) and have placed age information on the images and in the figure legends.

      (3) Is the reduced vascular complexity in pdgfb mutant due to reduced angiogenesis or due to excessive pruning?

      This is a good question, and we do not know at this stage. We have unpublished data that suggest pericytes secrete angiogenic growth factors, but this question warrants a thorough investigation that we believe is beyond the scope of this current study.

      (4) Please check that the figure legends state the correct number of fish analysed. For example, Figure 1 d, e N=8 but there seem to be 9 data points per group - 14dpf.

      We apologise for this mistake and thank the reviewer for raising this. We have updated the graphs and figure legends accordingly.

      (5) Please indicate in the figures the genotypes (wt, het) of a sibling presented alongside a pdgfb mutant.

      Wild-type and heterozygous mutants are commonly used together in zebrafish research as a collective control group termed siblings. Since we didn’t see any difference between wild-type and pdgfrbuq30bh/- groups in any experiments, we reported these groups together. This is now stated in the supplementary materials.

      One exception to this was examination of the growth and survival rates where we show the genotypes separately (new data in Extended Data Fig. 1b-f).

      (6) Please explain clearly what region is shown in Figure 2B. I do not understand the explanation "approximate location of dotted line". Is the image in the panel "a" top view of a brain?

      We have moved this data to Fig. 3a′ (previously Fig. 2b) and replaced the dotted line in Figure 3a (previously Fig. 2a) with a white box indicating the location of the restricted region in the whole brain image.

      We have revised the text as below:

      “Subset of z-slices from the whole brain imaging in (a) and (b) (white boxes) indicating mural cell loss and abnormal capillary network patterning. 100-μm-thick maximum intensity projections (MIP) were generated using the continuation of the left middle mesencephalic central artery (MMCtA, arrow) as an anatomical landmark.”

      In addition, we have updated all our figure legends clearly stating the view and anatomical position of the imaged sample.

      (7) Figure 2e: Note that the dotted areas do not correspond to the areas magnified. Please adjust.

      We have moved this data to Extended Data Fig. 5a (previously Fig. 2e–e′) and updated the location of the white box in 5a shown in enlarged view in 5a′.

      (8) Lines 112 and 114 - Should the indicated figure be Figure 2b-d and Figure 2c-d, respectively, and not Figure 1?

      We thank the reviewer for pointing out this mistake. All figures are now referred to correctly in the revised manuscript.

      (9) Data presented in Figure 2 and Figure 3 can be consolidated and presented as one Figure.

      We thank the reviewer for this suggestion. After addition of new data and revising the manuscript we have decided to keep these data presented separately.

      (10) Note that Figure 2a,b shows 5-month-old fish, not 2-month-old fish. Additionally, Extended Data Figure 3 shows 5-month-old fish, not 3-month-old fish.

      The stages noted by the reviewer were correctly indicated.

      (11) Figure 2d: Please clarify the definition of a "large vessel".

      We have observed normal morphology in capillaries and noted aneurysms and hotspots in large calibre vessels such as arteries, which become more severe over time. We have revised this across the manuscript accordingly.

      (12) Figure 4a, b: Please explain how the hotspots of leakage were defined based on the extravasated tracer.

      Hotspots of leakage are scored when fluorescent tracer aggregates are clearly observed outside the vessels. Vessel borders were defined using the transgenic lines (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). We have added a clear description in the methods section (lines 473–475).

      Figure 4c: Why were Pdgfrb crispants used and not the mutant line?

      We used pdgfrb crispants because they reliably phenocopy the mutant phenotype, including the lack of brain mural cells (Extended Data Fig. 5e, previously Extended Data Fig. 4b), and for practical reasons: they allow faster experiments and reduce fish usage.

      Figure 4e: The magnification of the electron microscopy images does not make it possible to clearly identify caveolae. What was the magnification of the collected images for caveolae analysis? How did the authors ensure that they quantified only caveolae and not other types of vesicles?

      Respectfully, we disagree that the magnification is insufficient, as our images were captured and analysed consistently with previous ultrastructural descriptions[11,12]. We based our quantification of caveolae on vesicle size: caveolae were defined as circular profiles of less than 100 nm in diameter and were scored as luminal or abluminal based on proximity to each surface membrane (within 500 nm of each surface or, in a thin-walled vessel, the caveolae closest to each surface) (lines 398–409). Importantly, comparable analyses at similar magnifications have been independently validated in multiple caveola-deficient zebrafish genetic models[4,13]. Interestingly, given the reviewer’s comments above, we do see increased vesicular structures that are larger than caveolae, but we only provide quantification of the caveolae here.
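      The scoring rule stated in this response (profiles under 100 nm in diameter, assigned to the luminal or abluminal side by proximity, within 500 nm of a surface) can be written as a small decision function. The sketch below is only an illustration of that rule, not the authors' analysis pipeline: the function name, its parameters and the tie-breaking choice are assumptions, and it does not test whether a profile is circular.

```python
def classify_vesicle(diameter_nm, dist_luminal_nm, dist_abluminal_nm,
                     max_diameter_nm=100.0, max_distance_nm=500.0):
    """Return 'luminal', 'abluminal', or None if the profile is not scored.

    Hypothetical helper: thresholds follow the criteria quoted in the response;
    everything else (names, tie-breaking) is an assumption for illustration.
    """
    # Profiles of 100 nm or more are treated as larger vesicles, not caveolae.
    if diameter_nm >= max_diameter_nm:
        return None
    # Profiles farther than 500 nm from both membranes are not scored.
    if min(dist_luminal_nm, dist_abluminal_nm) > max_distance_nm:
        return None
    # Assign to the nearer surface; ties go to the luminal side (assumption).
    return "luminal" if dist_luminal_nm <= dist_abluminal_nm else "abluminal"


print(classify_vesicle(70, 120, 900))
print(classify_vesicle(150, 50, 900))
```

      Assigning each profile to the nearer membrane also covers the thin-walled case, where a profile can lie within 500 nm of both surfaces.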

      Reviewer #3 (Recommendations for the authors):

      Congratulations to the authors on their really beautiful imaging and rigorous quantitative documentation of phenotypes - this is a really nicely done study, and could be very important to the field with just a few additional experiments to buttress the key conclusions.

      We thank the reviewer for their kind comments.

      In addition to the comments noted in the public review, I would only point out that there are two mislabeled call-outs in the text (Lines 112 and 114; says Figure 1, should say Figure 2).

      We thank the reviewer for this point and have now revised the text accordingly.

      (1) Ando, K., Ishii, T. & Fukuhara, S. Zebrafish Vascular Mural Cell Biology: Recent Advances, Development, and Functions. Life (Basel) 11 (2021). https://doi.org/10.3390/life11101041

      (2) Ando, K. et al. Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development 143, 1328-1339 (2016). https://doi.org/10.1242/dev.132654

      (3) Ando, K. et al. Conserved and context-dependent roles for pdgfrb signaling during zebrafish vascular mural cell development. Dev Biol 479, 11-22 (2021). https://doi.org/10.1016/j.ydbio.2021.06.010

      (4) Lim, Y. W. et al. Trans-Endothelial Trafficking in Zebrafish: Nanobio Interactions of Polyethylene Glycol-Based Nanoparticles in Live Vasculature. ACS Nano (2026). https://doi.org/10.1021/acsnano.5c21042

      (5) O'Brown, N. M., Megason, S. G. & Gu, C. Suppression of transcytosis regulates zebrafish blood-brain barrier function. Elife 8 (2019). https://doi.org/10.7554/eLife.47326

      (6) O'Brown, N. M. et al. The secreted neuronal signal Spock1 promotes blood-brain barrier development. Dev Cell 58, 1534-1547 e1536 (2023). https://doi.org/10.1016/j.devcel.2023.06.005

      (7) Armulik, A. et al. Pericytes regulate the blood-brain barrier. Nature 468, 557-561 (2010). https://doi.org/10.1038/nature09522

      (8) Daneman, R., Zhou, L., Kebede, A. A. & Barres, B. A. Pericytes are required for blood-brain barrier integrity during embryogenesis. Nature 468, 562-566 (2010). https://doi.org/10.1038/nature09513

      (9) Mae, M. A. et al. Single-Cell Analysis of Blood-Brain Barrier Response to Pericyte Loss. Circ Res 128, e46-e62 (2021). https://doi.org/10.1161/CIRCRESAHA.120.317473

      (10) Lim, Y.-W. et al. A Standardized Protocol to Investigate Trans- Endothelial Trafficking in Zebrafish: Nano-bio Interactions of PEG-based Nanoparticles in Live Vasculature. bioRxiv, 2025.2007.2023.666282 (2025). https://doi.org/10.1101/2025.07.23.666282

      (11) Parton, R. G. & Simons, K. The multiple faces of caveolae. Nat Rev Mol Cell Biol 8, 185-194 (2007). https://doi.org/10.1038/nrm2122

      (12) Parton, R. G. & del Pozo, M. A. Caveolae as plasma membrane sensors, protectors and organizers. Nat Rev Mol Cell Biol 14, 98-112 (2013). https://doi.org/10.1038/nrm3512

      (13) Lim, Y. W. et al. Caveolae Protect Notochord Cells against Catastrophic Mechanical Failure during Development. Curr Biol 27, 1968-1981 e1967 (2017). https://doi.org/10.1016/j.cub.2017.05.06

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the mechanisms underlying Kupffer cell death in metabolic-associated steatotic liver disease (MASLD). The authors propose that KCs undergo massive cell death in MASLD and that glycolysis drives this process. However, there appears to be a discrepancy between the reported high rates of KC death and the apparent maintenance of KC homeostasis and replacement capacity.

      Strengths:

      This is an in vivo study.

      Weaknesses:

      There are discrepancies between the authors' observations and previous reports, as well as inconsistencies among their own findings.

      Before presenting the percentage of CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cells, the authors should have first shown the number of CLEC4F<sup>+</sup> cells per unit area in Figure 1. At 16 weeks of age, the proportion of TUNEL<sup>+</sup> KCs is extremely high (~60%), yet the flow cytometry data indicate that nearly all F4/80<sup>+</sup> KCs are TIMD4<sup>+</sup>, suggesting an embryonic origin. If such extensive KC death occurred, the proportion of embryonically derived TIMD4<sup>+</sup> KCs would be expected to decrease substantially. Surprisingly, the proportion of TIMD4<sup>+</sup> KCs is comparable between chow-fed and 16-week HFHC-fed animals. Thus, the immunostaining and flow cytometry data are inconsistent, making it difficult to explain how massive KC death does not lead to their replacement by monocyte-derived cells.

      We thank the reviewer for the insightful comment and the opportunity to clarify this important point. To ensure consistency between our methodologies, we replaced CLEC4F staining with TIM4 staining, as requested by the reviewer. We first showed the number of TIM4<sup>+</sup> cells per unit area in Figure 1B. The results showed a significant and progressive loss of TIM4<sup>+</sup> cells per unit area in the liver parenchyma, decreasing from approximately 60 cells/FOV at baseline (0w) to nearly 50 at 4w and further to about 30 at 16w post-HFHC diet. This finding is fully consistent with our flow cytometry data. The percentage of the embryonically derived KC population (CD11b<sup>low</sup> F4/80<sup>hi</sup> TIM4<sup>hi</sup>) among CD45<sup>+</sup> cells dropped from 30.2% (0w) to 24.3% (4w) and 17.6% (16w) (Revised Figure 1C). The absolute number per gram of liver decreased from roughly 12 × 10<sup>5</sup> (1w) to 9 × 10<sup>5</sup> (4w) and 5 × 10<sup>5</sup> (16w) (Revised Figure 1D).

      These data suggest that despite the reported high rate of cell death among CLEC4F<sup>+</sup>TIMD4<sup>+</sup> KCs, the population appears to self-maintain, with no evidence of monocyte-derived KC generation in this model, which contradicts several recent studies in the field.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the remaining embryonic KCs (EmKCs) are maintained through self-renewal, as the proportion of Ki67<sup>+</sup>TIM4<sup>+</sup> cells remains low at all time points (Revised Figure S2D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure S2B, S2C), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4-MoKCs. This timing is supported by prior studies, where TIM4-KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (Ref. 1,2). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      Moreover, there is no evidence that TIM4<sup>+</sup>CLEC4F<sup>+</sup> KCs increase their proliferation rate to compensate for such extensive cell death. If approximately 60% of KCs are dying and no monocyte-derived KCs are recruited, one would expect a much greater decrease in total KC numbers than what is reported.

      Thank you for raising this point, which allows for an important clarification. The interpretation that approximately 60% of KCs are dying is correct, but this refers to the proportion of the remaining KC population at 16 weeks that is TUNEL<sup>+</sup>, not to 60% of the original KC pool. Since our data show that over half of the EmKCs are lost by 16 weeks (Revised Figure 1B), the 60% of dying cells at this late time point corresponds roughly to only 25-30% of the total original KC population at baseline. This distinction reconciles the high rate of apoptosis observed late in disease with the overall progressive depletion of the EmKC pool.
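      The reconciliation above is simple arithmetic and can be checked against the figures quoted earlier in this response (about 60 to about 30 TIM4<sup>+</sup> cells per field of view, and roughly 12 × 10<sup>5</sup> to 5 × 10<sup>5</sup> KCs per gram of liver). The following is a minimal sketch; the function name is illustrative, and treating the TUNEL<sup>+</sup> fraction as a single snapshot of the remaining pool is an assumption.

```python
def dying_fraction_of_baseline(baseline, remaining, tunel_fraction):
    """Express TUNEL+ KCs at a late time point as a fraction of the baseline KC pool."""
    surviving_fraction = remaining / baseline
    return surviving_fraction * tunel_fraction


# Values quoted in the response; 0.60 is the TUNEL+ share of KCs remaining at 16 weeks.
by_imaging = dying_fraction_of_baseline(60, 30, 0.60)    # cells per field of view
by_flow = dying_fraction_of_baseline(12e5, 5e5, 0.60)    # KCs per gram of liver

print(f"imaging-based estimate: {by_imaging:.0%}")
print(f"flow-based estimate: {by_flow:.0%}")
```

      The two estimates bracket the 25-30% range given in the response, depending on whether the imaging or the flow cytometry counts are used for the surviving fraction.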

      It is also unexpected that the maximal rate of KC death occurs at early time points (8 weeks), when the mice have not yet gained substantial weight (Figure 1B). Previous studies have shown that longer feeding periods are typically required to observe the loss of embryo-derived KCs.

      We appreciate the reviewer’s insightful observation. We think KC death is a continuous event during MASLD. To induce MASH, previous studies typically assessed the loss of EmKCs after longer feeding periods, which might give the impression that longer feeding periods are required to observe substantial loss of embryonically derived KCs. In our HFHC model, the proportion of dying KCs was already elevated by 8 weeks, and this high rate was sustained through the 16-week endpoint. In a separate MCD dietary model characterized by rapid MASLD progression, a high rate of KC death was detectable as early as 6 weeks (Revised Figure 1F). Collectively, these data suggest that the onset of significant KC death depends on the pace of MASLD pathogenesis and is more likely an early-initiated event that persists throughout MASLD progression.

      Furthermore, it is surprising that the HFD induces as much KC death as the HFHC and MCD diets. Earlier studies suggested that HFD alone is far less effective than MASH-inducing diets at promoting the replacement of embryonic KCs by monocyte-derived macrophages.

      We appreciate the reviewer’s insightful comment. In our study, we observed significant KC death under HFD and HFHC feeding for 20 and 16 weeks, respectively. Moreover, both HFHC and HFD induced similar stages of MASLD (characterized by significant lipid accumulation without fibrosis development) by these time points (Author response image 1). Therefore, these data support the view that the onset of substantial KC death may be an early MASLD event that precedes progression to MASH. Additionally, this finding aligns with existing literature showing that 16 weeks of HFD feeding alone is sufficient to cause a marked reduction in the TIM4<sup>+</sup> KC population (Ref. 1).

      Author response image 1.

      Detection of liver fibrosis in MASLD mouse models. Male wild-type C57BL/6J mice were fed a high-fat, high-cholesterol (HFHC) diet for 16 weeks or a high-fat diet (HFD) for 20 weeks to induce MASLD. Mice fed a normal chow diet (NCD) served as controls. (A) Sirius Red staining of liver sections was performed to assess collagen deposition and fibrosis during MASLD progression. Scale bar, 20 μm. (B) Western blot analysis of liver tissue lysates showing α-smooth muscle actin (α-SMA) expression as a marker of hepatic stellate cell activation and liver fibrosis.

      In Figure 2D, TIMD4 staining appears extremely faint, making the results difficult to interpret. In contrast, the TUNEL signal is strikingly intense and encompasses a large proportion of liver cells (approximately 60% of KCs, 15% of hepatocytes, 20% of hepatic stellate cells, 30% of non-KC macrophages, and a proportion of endothelial cells is also likely affected). This pattern closely resembles that typically observed in mouse models of acute liver failure. Given this apparent extent of cell death, it is unexpected that ALT and AST levels remain low in MASH mice, which is highly unusual.

      Thank you for this important feedback. To address concerns about the clarity of our imaging, we have provided high-resolution split-channel raw images for Figure 2D (Revised Figure 2D), which distinctly show the localization of TIM4, TUNEL, and GS. These confirm the progressive reduction of TIM4<sup>+</sup> KCs and the increase in TUNEL<sup>+</sup>TIM4<sup>+</sup> cells over time. We agree that the high proportion of TUNEL<sup>+</sup> cells seems at odds with the modest ALT/AST elevation. This discrepancy might be explained by the distinct nature of cell death in MASLD. Unlike the acute necrosis with membrane rupture seen in acute liver failure, which causes massive, rapid enzyme release, obesity-related liver injury is a chronic process dominated by apoptosis (Ref. 4,5). Apoptosis preserves membrane integrity until late stages (Ref. 6), with dying cells packaged into apoptotic bodies for efficient phagocytic clearance by neighboring macrophages (Ref. 7,8). This controlled disposal system minimizes the leakage of intracellular enzymes. Therefore, the coexistence of widespread apoptosis (high TUNEL signal) with limited enzyme release (low ALT/AST) is a recognized feature of chronic MASLD pathogenesis.

      No statistical analysis is provided for Figure 5D, and it is unclear which metabolites show statistically significant changes in Figure 5C.

      We thank the reviewer for raising this statistical problem. We have now included statistical analysis in Revised Figure 5D.

      In addition, there is no evaluation of liver pathology in Clec4f-Cre × Chil1flox/flox mice. It remains possible that the observed effects on KC death result from aggravated liver injury in these animals. There is also no evidence that Chil1 deficiency affects glucose metabolism in KCs in vivo.

      We thank the reviewer for these important points. We previously characterized the liver pathology of Clec4f<sup>ΔChil1</sup> mice in detail (preprint: eLife 2025, DOI: 10.7554/eLife.107023.1, Fig. 2). On a normal chow diet, these mice showed no differences in body weight, hepatic lipid deposition, metabolic parameters, or glucose tolerance compared to controls. However, on an HFHC diet, Clec4f<sup>ΔChil1</sup> mice developed significantly worse metabolic and histological phenotypes. Crucially, our in vitro data demonstrate that recombinant Chi3l1 directly reduces KC death (preprint, Fig. 6E-F), indicating that the aggravated MASLD in knockout mice is a consequence of increased KC loss, not its cause.

      Regarding glucose metabolism, we have previously shown that Chi3l1 deficiency leads to increased glucose uptake by KCs in vivo using the fluorescent glucose analog 2-NBDG. This effect was reversed by supplementing knockout mice with recombinant Chi3l1 (preprint Fig. 6G-H). This provides direct evidence that Chi3l1 modulates glucose uptake in KCs in vivo.

      Finally, the authors should include a more direct experimental approach to modulate glycolysis in KCs and assess its causal role in KC death in MASH.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) in the HFHC-induced MASLD model (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for four weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity during active disease development. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, He et al. set out to investigate the mechanisms behind Kupffer Cell death in MASLD. As has been previously shown, they demonstrate a loss of resident KCs in MASLD in different mouse models. They then go on to show that this correlates with alterations in genes/metabolites associated with glucose metabolism in KCs. To investigate the role of glucose metabolism further, they subject isolated KCs in vitro to different metabolic treatments and assess cleaved caspase 3 staining, demonstrating that KCs show increased Cl. Casp 3 staining upon stimulation of glycolysis. Finally, they use a genetic mouse model (Chil1KO) where they have previously reported that loss of this gene leads to increased glycolysis and validate this finding in BMDMs (KO). They then remove this gene specifically from KCs (Clec4fCre) and show that this leads to increased macrophage death compared with controls.

      Strengths:

      As we do not yet understand why KCs die in MASLD, this manuscript provides some explanation for this finding. The metabolomics is novel and provides insight into KC biology. It could also lead to further investigation; here, it will be important that the full dataset is made available.

      Weaknesses:

      Different diets are known to induce different amounts of KC loss, yet here, all models examined appear to result in 60% KC death. One small field of view of liver tissue is shown as representative to make these claims, but this is not sufficient, as anything can be claimed based on one field of view. Rather, a full tissue slice should be included to allow readers to really assess the level of death.

      Thank you for raising this point regarding data presentation. We analyzed full tissue slices and found that showing the entire slice at a standard magnification makes individual KCs difficult to resolve (Author response image 2). To clearly represent the extent and distribution of KC death across the liver tissue slice, we now include lower-magnification images that provide a wider field of view, allowing readers to assess the pattern across a larger tissue area (Revised Figures 1, 2, 6F).

      Author response image 2.

      Assessment of KC death on a full liver tissue slice. (A) Immunofluorescence staining was performed to detect Kupffer cell (KC) death in liver sections from mice fed an MCD diet for 6 weeks. Cell death was assessed by TUNEL staining (green), and KCs were identified by TIM4 staining (red). Nuclei were counterstained with DAPI (blue). A representative whole-tissue view is shown. Scale bar, 1 mm.

      Additionally, there is no consistency between the markers used to define KCs and moMFs, with CLEC4F being used in microscopy, TIM4 in flow, while the authors themselves acknowledge that moKCs are CLEC4F+TIM4-. As moKCs are induced in MASLD, this limits interpretation. Additionally, Iba1 is referred to as a moMF marker but is also expressed by KCs, which again prevents an accurate interpretation of the data. Indeed, the authors show 60% of KCs are dying but only 30% of IBA1+ moMFs, as KCs are also IBA1+, this would mean that KCs die much more than moMFs, which would then limit the relevance of the BMDM studies performed if the phenotype is KC specific. Therefore, this needs to be clarified.

      We thank the reviewer for the constructive comments. For consistency, we have standardized our KC marker to TIM4 for all immunostaining data, aligning it with our flow cytometry analysis (Revised Figures 1, 2D, 6F). We have also clarified that IBA1 is expressed by hepatic macrophages (both KCs and MoMFs) (Revised Figure 2C; Revised manuscript, page 5, lines 182-183). Moreover, we have clarified that the finding that 60% of TIM4<sup>+</sup> KCs are TUNEL<sup>+</sup>, versus 30% of total IBA1<sup>+</sup> cells, further supports the conclusion that KCs undergo death more readily than MoMFs (Revised manuscript, page 5, lines 186-189). We have also acknowledged the limitations of the BMDM studies (Revised manuscript, page 8, lines 332-340).

      The claim that periportal KCs die preferentially is not supported, given that the majority of KCs are peri-portal. Rather, these results would need to be normalised to KC numbers in PP vs PC regions to make meaningful conclusions.

      We thank the reviewer for this important point. We included the normalized data. At 8 weeks, the normalized death rate was significantly higher in periportal versus pericentral regions (p = 0.041), supporting increased periportal KC susceptibility during early MASLD. By 16 weeks, proportional death rates became comparable between zones (Revised Figure 2D, Revised manuscript, page 6, lines 194-201).
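      Normalising by zone means dividing the number of TUNEL<sup>+</sup> KCs in a zone by the total number of KCs in that same zone, so that a zone is not scored as more susceptible merely because it contains more cells. The sketch below illustrates the difference between a raw share of dying cells and a per-zone rate. The counts are hypothetical and chosen only for illustration; they are not data from the study, and the function name is an assumption.

```python
def zonal_death_rate(tunel_positive, total_kcs):
    """Fraction of KCs in one zone that are TUNEL+ (normalised to that zone's KC count)."""
    if total_kcs == 0:
        raise ValueError("zone contains no KCs")
    return tunel_positive / total_kcs


# Hypothetical counts for illustration only; these are not data from the study.
periportal = {"tunel_positive": 40, "total_kcs": 80}
pericentral = {"tunel_positive": 10, "total_kcs": 40}

pp_rate = zonal_death_rate(**periportal)
pc_rate = zonal_death_rate(**pericentral)

# Raw share of all dying KCs found periportally, which ignores zone size.
total_dying = periportal["tunel_positive"] + pericentral["tunel_positive"]
pp_share_of_dying = periportal["tunel_positive"] / total_dying

print(f"periportal rate {pp_rate:.2f}, pericentral rate {pc_rate:.2f}")
print(f"periportal share of dying KCs {pp_share_of_dying:.2f}")
```

      With these made-up counts, the periportal zone holds most of the dying KCs, yet its normalised rate is only twice the pericentral rate, which is the distinction the reviewer asked for.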

      Additionally, KCs are known to be notoriously difficult to keep alive in vitro, and for these studies, the authors only examine cl. Casp 3 staining. To fully understand that data, a full analysis of the viability of the cells and whether they retain the KC phenotype in all conditions is required.

      We appreciate the reviewer’s suggestions. To confirm the identity and health of isolated KCs in our in vitro studies, we showed that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      Finally, in the Cre-driven KO model, there does not seem to be any death of KCs in the controls (rather numbers trend towards an increase with time on diet, Figure 6E), contrary to what had been claimed in the rest of the paper, again making it difficult to interpret the overall results.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Additionally, there is no validation that the increased death observed in vivo in KCs is due to further promotion of glycolysis.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity in KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC death in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. Still, several interpretations require moderation to avoid overstatement, and certain experimental details, particularly those concerning flow cytometry and population gating, need further clarification.

      Strengths:

      (1) The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      (2) The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      (3) The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      (4) The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      (1) The novelty is questionable. The presented work has considerable overlap with a study by the same lab, which is currently under review (citation 17), and it should be considered whether the data should not be presented in one paper.

      We thank the reviewer for the opportunity to clarify the relationship between the two studies. In our previous work (citation 17), we focused on the transcriptional metabolic differences between Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs) and identified Chi3l1 as a selective protective factor that limits glucose uptake and shields KCs from metabolic stress–induced cell death, with minimal effects on MoMFs. That study directly motivated the current work. The observation that KCs are uniquely protected from metabolic stress led us to hypothesize that excessive glycolytic activation itself may be a primary driver of KC death, which forms the central question of the present study. Accordingly, the current manuscript shifts the focus from Chi3l1-mediated protection to the mechanistic role of hyperglycolysis in driving KC mortality, using distinct experimental approaches and addressing a different biological question. Because the two studies address conceptually distinct aims (one defining a protective regulator of KC survival, the other dissecting the mechanisms of glycolysis-driven KC death), we believe they are best presented as separate manuscripts. Combining them into a single study would dilute the mechanistic depth and clarity of each story.

      (2) The authors report that 60% of KCs are TUNEL-positive after 16 weeks of HFHC diet and confirm this by cleaved caspase-3 staining. Given that such marker positivity typically indicates imminent cell death within hours, it is unexpected that more extensive KC depletion or monocyte infiltration is not observed. Since Timd4 expression on monocyte-derived macrophages takes roughly one month to establish, the authors should consider whether these TUNEL-positive KCs persist in a pre-apoptotic state longer than anticipated. Alternatively, fate-mapping experiments could clarify the dynamics of KC death and replacement.

      We thank the reviewer for this astute observation. As shown in revised Figure 2D, the proportion of TIM4<sup>+</sup>TUNEL<sup>+</sup>KCs peaks at 8 weeks after HFHC feeding and remains elevated at 16 weeks. However, examination of the corresponding single-channel TIM4 staining during this period reveals that the overall density of TIM4<sup>+</sup> KCs does not undergo abrupt or synchronous depletion. This temporal dissociation between sustained TUNEL positivity and relatively gradual KCs loss suggests that TUNEL-positive KCs do not undergo immediate clearance. Based on these observations, we agree with the reviewer that a substantial fraction of TUNEL-positive KCs likely persists in a prolonged pre-apoptotic or stressed state rather than undergoing rapid cell death. This interpretation is consistent with the absence of extensive KCs depletion or compensatory monocyte infiltration at these time points. Importantly, previous studies (Ref. 1,2) indicate that KCs are eventually lost as MASLD progresses, supporting the notion that KC death is a gradual process that unfolds over an extended time frame rather than acutely.

      (3) The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KCs death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity of KCs. KCs apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup>KCs compared with vehicle controls, indicating protection against KCs death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KCs loss in vivo (Revised manuscript, page 7, line 267-282).

      (4) The study does not address the polarization or ontogeny of KCs during early MASLD. Given that pro-inflammatory macrophages preferentially utilize glycolysis, such data could provide valuable insight into the reason for increased KC death beyond the presented hyperreliance on glycolysis.

      We thank the reviewer for this insightful comment. Regarding KC ontogeny, flow cytometry analysis (Revised Figure 1C) shows that KCs remain uniformly TIM4<sup>hi</sup> during early MASLD, indicating that monocyte-derived KCs (TIM4<sup>low</sup>) have not yet emerged at these stages. To address KC polarization, we assessed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in KCs isolated from WT mice fed a HFHC diet for 0, 8, and 16 weeks. As shown in revised Figure S5A, M1 markers progressively increase over time, whereas M2 markers remain unchanged or slightly decrease. This polarization shift is consistent with the increased glycolytic activity observed in KCs during early MASLD. Together, these data indicate that embryonically derived KCs undergo a pro-inflammatory polarization accompanied by enhanced glycolytic metabolism during early MASLD, providing mechanistic context for their increased susceptibility to metabolic stress–induced cell death beyond hyperreliance on glycolysis alone (Revised manuscript, pages 7-8, lines 307-321).

      (5) The gating strategy for monocyte-derived macrophages (moMFs) appears suboptimal and may include monocytes. A more rigorous characterization of myeloid populations by including additional markers would strengthen the study's conclusions.

      We thank the reviewer for raising this important point. To improve the rigor of our analysis, we adopted gating strategies established in previous studies (PMID: 41131393; PMID: 32562600). Specifically, Kupffer cells were defined as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> cells, while monocyte-derived macrophages (MoMFs) were defined as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/−</sup> cells, thereby excluding contaminating neutrophils and minimizing inclusion of circulating monocytes. Using this refined gating strategy, we observed a progressive reduction of KCs accompanied by a corresponding increase in MoMFs in WT mice during HFHC feeding (Revised Figures 1C and S2B–C), (Revised manuscript, page 4, line 154-163).

      (6) While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself.

      We thank the reviewer for this important question. To determine whether Chi3l1 deficiency affects macrophage differentiation, we analyzed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in Kupffer cells isolated from WT and Chil1<sup>-/-</sup> mice fed a HFHC diet for 0, 8, and 16 weeks. At baseline (0 weeks), Chi3l1 deficiency was associated with elevated expression of multiple M1 markers, whereas M2 marker expression was comparable between WT and Chil1<sup>-/-</sup> KCs. During MASLD progression, the pro-inflammatory signature in Chil1<sup>-/-</sup> KCs was further enhanced, while anti-inflammatory marker expression became dysregulated (revised Figure S5C). Together, these data indicate that Chi3l1 deficiency does not impair macrophage differentiation per se but biases KCs toward a partially pro-inflammatory, M1-like phenotype, providing additional context for the enhanced glycolytic flux observed in Chi3l1-deficient macrophages (Revised manuscript, page 7-8, line 307-321).

      (7) The authors use the PDK activator PS48 and the ATP synthase inhibitor oligomycin to argue that increased glycolytic flux at the expense of OXPHOS promotes KC death. However, given the high energy demands of KCs and the fact that OXPHOS yields 15-16 times more ATP per glucose molecule than glycolysis, the increased apoptosis observed in Figure 4C-F could primarily reflect energy deprivation rather than a glycolysis-specific mechanism.
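      The "15-16 times" figure in this comment can be reproduced with simple arithmetic. The sketch below assumes textbook approximations (a net yield of 2 ATP per glucose from glycolysis alone and roughly 30-32 ATP per glucose from complete oxidation); these numbers are assumptions for illustration and are not taken from the manuscript.

```python
# Textbook approximations (assumptions, not values from the manuscript).
GLYCOLYSIS_NET_ATP = 2          # net ATP per glucose from glycolysis alone
FULL_OXIDATION_ATP = (30, 32)   # approximate ATP per glucose with OXPHOS


def fold_yield(full_oxidation_atp, glycolysis_atp=GLYCOLYSIS_NET_ATP):
    """Return how many times more ATP complete oxidation yields than glycolysis."""
    return full_oxidation_atp / glycolysis_atp


# Lower and upper bound of the fold difference in ATP yield per glucose.
fold_range = tuple(fold_yield(atp) for atp in FULL_OXIDATION_ATP)
print(fold_range)
```

      Under these assumptions, a cell forced to rely on glycolysis alone would need on the order of fifteen times more glucose to produce the same amount of ATP, which is the basis of the energy-deprivation concern.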

      We thank the reviewer for highlighting this important point. We agree that KCs are highly metabolically active and that perturbations of OXPHOS can influence overall cellular energy balance. As noted in our response to comment #3, we additionally inhibited glycolysis in vivo with 2-DG. The protection of KCs observed following 2-DG treatment (Revised Figure 4E-G) provides further evidence that increased glycolytic flux is not merely correlated with, but functionally contributes to, KC loss in MASLD.

      (8) In Figure 1C, KC numbers are significantly reduced after 4 and 16 weeks of HFHC diet in WT male mice, yet no comparable reduction is seen in Clec4f-Cre control mice, which should theoretically exhibit similar behavior under identical conditions.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f-Cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f-Cre mice than in C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f-Cre mice increased even further. These results indicate that Cre insertion enhances KC self-renewal in Clec4f-Cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the concerns raised in the public review, the authors should:

      (1) Reassess their conclusions using the same panels in flow and microscopy, e.g., the combination of CLEC4F, TIM4, and IBA1. This will allow resKCs (CLEC4F+TIM4+IBA1+), moKCs (CLEC4F+TIM4-IBA1+), and moMFs (CLEC4F-TIM4-IBA1+) to be accurately defined and hence their viability and numbers correctly assessed.
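      The marker scheme proposed in this recommendation amounts to a small lookup rule. The following is a minimal sketch of that rule only; the function name and the labels for marker combinations the reviewer does not define ("non-macrophage", "unclassified") are hypothetical and introduced purely for illustration.

```python
def classify_macrophage(clec4f: bool, tim4: bool, iba1: bool) -> str:
    """Assign a hepatic macrophage subset from CLEC4F/TIM4/IBA1 positivity.

    Follows the three definitions given in the recommendation:
      resKC: CLEC4F+ TIM4+ IBA1+
      moKC:  CLEC4F+ TIM4- IBA1+
      moMF:  CLEC4F- TIM4- IBA1+
    """
    if not iba1:
        # IBA1-negative cells fall outside all three definitions.
        return "non-macrophage"  # hypothetical label
    if clec4f and tim4:
        return "resKC"
    if clec4f and not tim4:
        return "moKC"
    if not clec4f and not tim4:
        return "moMF"
    # CLEC4F- TIM4+ IBA1+ is not covered by the scheme above.
    return "unclassified"  # hypothetical label


print(classify_macrophage(clec4f=True, tim4=False, iba1=True))
```

      Applying the same rule to both flow cytometry and microscopy data is what would make cell counts comparable across the two methods.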

      We thank the reviewer for this insightful suggestion. In our flow cytometry analysis, we did not detect a CD45<sup>+</sup>CD11b<sup>low</sup>F4/80<sup>hi</sup>TIM4<sup>low</sup> population, indicating that monocyte-derived KCs (moKCs) have not emerged in our model at this stage. To more accurately quantify resident KCs (resKCs) in the current study, we replaced CLEC4F with TIM4 staining and enumerated TIM4<sup>+</sup> as well as TIM4<sup>+</sup>TUNEL<sup>+</sup> cells. These data were highly consistent with CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cell counts, confirming that moKCs are not involved in KC death during early MASLD (Revised Figure 1A, B, E, F).

      (2) Investigate why the number of KCs in controls and MASLD are so distinct between Figures 1 and 6.

      We appreciate the reviewer’s suggestion. As explained above, Cre insertion promotes KC self-renewal (Revised manuscript, Figure S8). This enhanced proliferative capacity likely accounts for the relative preservation of KC numbers in Clec4f-Cre mice during HFHC feeding, explaining the apparent discrepancy with WT mice (Revised manuscript, Figure 6D-E).

      (3) Normalise the TUNEL+ cells based on the number of KCs in PP vs PC regions.

      After normalizing KC death to KC numbers in periportal (PP) versus pericentral (PC) regions, we found that the proportion was significantly higher in PV regions than in CV regions at 8 weeks of HFHC feeding. We have revised the text accordingly (Revised manuscript, page 5, lines 194-201).
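      The normalization described here divides the number of TUNEL-positive KCs in each zone by the total number of KCs in that zone, so that zones with more KCs do not appear to have more death simply because of their size. A minimal sketch follows; the counts are made-up placeholders, not data from the study, and the function name is hypothetical.

```python
def tunel_fraction(tunel_pos_kcs: int, total_kcs: int) -> float:
    """Fraction of KCs in a zone that are TUNEL-positive."""
    if total_kcs <= 0:
        raise ValueError("total_kcs must be positive")
    return tunel_pos_kcs / total_kcs


# Hypothetical per-zone counts (placeholders for illustration only).
counts = {
    "periportal": {"tunel_pos": 30, "total": 60},
    "pericentral": {"tunel_pos": 20, "total": 100},
}

# Normalized death rate per zone.
fractions = {
    zone: tunel_fraction(c["tunel_pos"], c["total"]) for zone, c in counts.items()
}
print(fractions)
```

      With these placeholder counts the absolute number of dying cells differs by only ten, yet the normalized fraction differs more than twofold, which is why the normalized comparison is the more informative one.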

      (4) Demonstrate the viability of KCs in vitro across conditions.

      To confirm the identity and health of isolated KCs in our in vitro studies, we show that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      (5) Confirm previous studies demonstrating different degrees of KC loss depending on the model of MASLD.

      We thank the reviewer for highlighting this point. Previous studies have reported varying degrees of KC loss depending on the MASLD model used, reflecting the heterogeneity of hepatic macrophages, marker choice, mouse husbandry, and diet regimen. For example, in a 6-week MCD feeding model, ~10% of CLEC4F<sup>+</sup> KCs were TUNEL<sup>+</sup> (Figure 4A, Ref. 9). Another 6-week MCD study reported a drop from 66% to 26% TIM4<sup>+</sup> KCs (Figure 2A, Ref. 12). In an HFD model, TIM4<sup>+</sup> KCs decreased by ~20% after 16 weeks (Figure 1G, Ref. 1). In a Western diet model, TIM4<sup>+</sup> KCs decreased by >50% at 36 weeks (Figures 1J and 2C, Ref. 2). Together, these studies underscore the model-dependent nature of KC loss and highlight the importance of experimental context and marker selection when assessing KC dynamics in MASLD. We have included these studies in our discussion section (Revised manuscript, pages 9-10, lines 393-402).

      (6) Demonstrate in vivo that loss of CHIL1 drives further glycolysis in KCs.

      In Figure 6G-H of our previous study, we treated mice with the fluorescent glucose analog 2-NBDG and showed that Chi3l1 deficiency increases glucose uptake by KCs in vivo, whereas supplementing KO mice with recombinant Chi3l1 significantly reduces it. We include the related figure here as Author response image 3.

      Author response image 3.

      Chi3l1 limits glucose uptake by Kupffer cells in vivo. (A) Measurement of 2-NBDG (a fluorescent glucose analog) uptake by KCs in vivo. WT and Chil1<sup>-/-</sup> mice, either untreated or supplemented with rChi3l1, were injected intraperitoneally with 12 mg/kg 2-NBDG. After 45 min, KCs were isolated and glucose uptake was assessed by spectrophotometry. (B) Representative immunofluorescence images of liver sections stained for TIM4 (red) and 2-NBDG uptake (green) to visualize glucose uptake by KCs in situ. Scale bar = 10 µm (zoom). Quantification is shown as the percentage of TIM4<sup>+</sup> cells that are also 2-NBDG<sup>+</sup>. One-way ANOVA was performed in A and B. P values are as indicated.

      (7) There is no mention of the publication of the metabolomics dataset; this should be released with the manuscript.

      We have now included the raw metabolomics dataset as Tables S1 and S2.

      Reviewer #3 (Recommendations for the authors):

      (1) Methods: Reconsider which methods are described in the main text versus the Supplementary Information to improve readability and consistency.

      Thank you for your valuable suggestion. We have reevaluated and adjusted the placement of the methods section between the main text and the supplementary materials.

      (2) Line 34: Check for grammar issues.

      L34 has been revised as follows: Additionally, using Chi3l1-deficient mice, we further demonstrated that increased glucose utilization accelerates KCs death in vivo.

      (3) Lines 101, 110: Explicitly reference the corresponding Supplementary Methods sections.

      We have included the references for these two methods sections (Revised supplementary materials and methods, Line 30, 65, respectively).

      (4) Figure 2: Iba1 marks all macrophages, not only monocyte-derived macrophages; both figure and text (line 205) require correction.

      We have corrected the figure and text to state that Iba1 marks hepatic macrophages, including both KCs and MoMFs (Revised Figure 2C, manuscript page 5, line 182).

      (5) Line 218-219: Avoid overinterpretation, as only KCs, hepatocytes, and hepatic stellate cells were assessed - not all hepatic populations.

      We appreciate the reviewer’s valuable suggestion and rephrased our description accordingly (Revised manuscript, page 5, line 186-189).

      (6) Line 262: Use abbreviations consistently throughout the manuscript.

      We have gone through the whole manuscript and double checked the abbreviations.

      (7) Line 264: Include the palmitic acid (PA) concentration used.

      We have added the PA concentration (800 µM) to the revised manuscript (Revised manuscript, page 6, line 250).

      (8) Lines 316-317: Check for grammar errors.

      The grammar errors have been corrected (Revised manuscript, page 8, lines 340-341).

      (9) Line 337-338: See comment above on gating strategy.

      We updated gating strategy accordingly (Revised manuscript, page 9, line 361-362).

      (10) Line 343-344: Note that Chi3l1 is not exclusively expressed by KCs.

      We rephrased our words accordingly (Revised manuscript, page 9, line 374-378).

      (11) Lines 355-358: The statement that "sustained glycolytic hyperactivation culminates not in sustained activation, but in apoptotic cell death" is unsupported by data or literature, as macrophage polarization was not analyzed in this study.

      We removed the statement from the revised manuscript.

      (12) Lines 375-379: Rephrase to clarify that while KCs are metabolically active and glucose-demanding, excessive glycolytic flux accelerates apoptosis.

      We have rephrased to clarify (Revised Manuscript, page 10, lines 405-407).

      (13) Lines 375-385 & 387-397: Consolidate overlapping statements for conciseness and coherence.

      We have consolidated the overlapping statements (Revised manuscript, page 10, lines 405-425).

      References

      1. Daemen, S. et al. Dynamic Shifts in the Composition of Resident and Recruited Macrophages Influence Tissue Remodeling in NASH. Cell Rep 34, 108626, doi:10.1016/j.celrep.2020.108626 (2021).

      2. Remmerie, A. et al. Osteopontin Expression Identifies a Subset of Recruited Macrophages Distinct from Kupffer Cells in the Fatty Liver. Immunity 53, 641-657.e614, doi:10.1016/j.immuni.2020.08.004 (2020).

      3. Ozer, J., Ratner, M., Shaw, M., Bailey, W. & Schomaker, S. The current state of serum biomarkers of hepatotoxicity. Toxicology 245, 194-205, doi:10.1016/j.tox.2007.11.021 (2008).

      4. Malhi, H. & Gores, G. J. Molecular mechanisms of lipotoxicity in nonalcoholic fatty liver disease. Semin Liver Dis 28, 360-369, doi:10.1055/s-0028-1091980 (2008).

      5. Ibrahim, S. H., Hirsova, P. & Gores, G. J. Non-alcoholic steatohepatitis pathogenesis: sublethal hepatocyte injury as a driver of liver inflammation. Gut 67, 963-972, doi:10.1136/gutjnl-2017-315691 (2018).

      6. Kerr, J. F., Wyllie, A. H. & Currie, A. R. Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. Br J Cancer 26, 239-257, doi:10.1038/bjc.1972.33 (1972).

      7. Poon, I. K., Lucas, C. D., Rossi, A. G. & Ravichandran, K. S. Apoptotic cell clearance: basic biology and therapeutic potential. Nat Rev Immunol 14, 166-180, doi:10.1038/nri3607 (2014).

      8. Krenkel, O. & Tacke, F. Liver macrophages in tissue homeostasis and disease. Nat Rev Immunol 17, 306-321, doi:10.1038/nri.2017.11 (2017).

      9. Tran, S. et al. Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.e625, doi:10.1016/j.immuni.2020.06.003 (2020).

      10. O'Neill, L. A. & Pearce, E. J. Immunometabolism governs dendritic cell and macrophage function. J Exp Med 213, 15-23, doi:10.1084/jem.20151570 (2016).

      11. Vander Heiden, M. G. & DeBerardinis, R. J. Understanding the Intersections between Metabolism and Cancer Biology. Cell 168, 657-669, doi:10.1016/j.cell.2016.12.039 (2017).

      12. Zhang, J. et al. Reactive oxygen species regulation by NCF1 governs ferroptosis susceptibility of Kupffer cells to MASH. Cell Metab 36, 1745-1763.e6, doi:10.1016/j.cmet.2024.05.008 (2024).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain the antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium BK channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Ca<sub>v</sub> nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.

      We thank the reviewer for acknowledging the strength of the different aspects investigated in our study.

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      In an attempt to improve structural insight into the presented mutagenesis data, we used AlphaFold3 (AF3; Abramson et al., 2024) to generate models of the I308A, L312M and A316P substitutions and repeated the docking for each (Author response image 1). These predictive models suggest the following.

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: the C<sub>α</sub> atoms of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e., involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower, especially in the best two poses, which are also the most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses, one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss of binding to L312M; however, retained interactions with the other residues would still allow α-Mangostin to activate the channel.

      A316P induces a displacement of the S6 helix compared to the WT, while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> atoms of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to contact only F315 and Y318, and overall, I308 and L312 contacts occurred in only 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism in which A316P has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift).

      Author response image 1.

      Alphafold3 models of BK I308A, L312M, and A316P with α-Mangostin docked to the mutant structures. The upper row shows an overview of the mutant pore helices (AF3 models) used for molecular docking. The lower row shows the binding region with the wildtype structure overlaid in gray. Only 3 helices are shown for clarity.

      Although these results provide interesting tentative explanations for the effect of the mutations and conclusions from AF3 models become increasingly robust, we think that definitive statements of their mechanistic contributions would require experimental studies of mutant channels, i.e., cryo-EM or crystallography, that are beyond our means. Therefore, we have decided not to include this data in the manuscript; however, it is accessible for the interested reader within the public review. Hopefully, as cryo-EM structures have been obtained for the wildtype channel, there will be studies on mutations of this gating-relevant S6 segment in the future.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      We are not sure if a global elevation of cellular calcium concentration would be informative. We rather expect that the relevant local Ca<sup>2+</sup> elevation would occur as sparks in the BK-Ca<sub>v</sub> nanodomains, close to the membrane. We would anticipate a change in spark duration, as the Ca<sup>2+</sup> inward current would be stopped faster by the enhanced repolarization via α-Mangostin-activated BKα/β1 channels. Capturing spark activity would require fast Ca<sup>2+</sup> imaging. We concur that this would be an informative experiment to investigate a more native situation. However, we would have to accomplish such methodologically challenging measurements in a separate project, which could fruitfully be combined with a more extensive characterization of aortic contraction as also suggested in the following remark (3).

      (3) The work has an impact on ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter, however, are preliminary in nature.

      We completely agree with the reviewer that there is ample room for further studies that could characterize different tissues important in blood pressure regulation (such as resistance arteries), elucidate even more physiological detail (such as modulatory effects of the endothelium), or look deeper into the pharmacology using chemically altered Mangostin derivatives. While we would very much like this to happen in future projects, in this study we focused on the functional aspects of α-Mangostin in BK channel gating. We present our tension recordings as a proof-of-concept to underline the activity of α-Mangostin in native tissues, and we clearly show the importance of the BK channel by using iberiotoxin as a specific inhibitor, which abolished relaxation.

      References:

      Abramson, J. et al. (2024) “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, 630(8016), pp. 493–500. Available at: https://doi.org/10.1038/s41586-024-07487-w.

      Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of the BK channels. The authors arrive at this conclusion through the effect of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca<sup>2+</sup> (Ca<sub>V</sub>1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent, and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates the contracture promoted by noradrenaline. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      Strengths:

      The set of results combining electrophysiological measurements, mutagenesis, and molecular docking reveals α-mangostin as a potent activator of BK channels and the putative location of the α-mangostin binding site. Moreover, experiments conducted on aortic preparations from mice suggest that α-mangostin can aid in developing drugs to treat a myriad of diverse diseases involving the BK channel.

      We thank the reviewer for pointing out the significance of our study.

      Weaknesses:

      Major:

      (1) Although the results indicate that α-mangostin is modifying the closed-open equilibrium, the conclusion that this can be due to a stabilization of the voltage sensor in its active configuration may prove to be wrong. It is more probable that, as has been demonstrated for other activators, the α-mangostin is increasing the equilibrium constant that defines the closed-open reaction (L in the Horrigan, Aldrich allosteric gating model for BK). The paper will gain much if the authors determine the probability of opening in a wide range of voltages, to determine how the drug is affecting (or not), the channel voltage dependence, the coupling between the voltage sensor and the pore, and the closed-open equilibrium (L).

      We would like to take the opportunity to clarify this potential misunderstanding. In our manuscript, we have discussed three mechanistic explanations for the Mangostin activation: (1) an electrostatic effect at the selectivity filter, (2) structural and electrostatic changes of S6 that facilitate the opening of a putative lower gate, and (3) hydrophobic gating, i.e., counteracting dewetting of the pore. All possibilities would impact S6 and lower the free energy for pore opening, and we concur that therefore Mangostin most likely affects the closed-open equilibrium (L) of the BKα channel.

      The sentence at the original lines 470-471, “(…) caused by an enhanced shift of the closed-open equilibrium toward the open state, such as the stabilization of the voltage sensor in an active conformation” refers to the observation that the presence of the β1 subunit enhances this closed-open shift. The stabilization of the voltage sensor domain was mentioned as one example of how it achieves this. We recognize that this example was an unfortunate choice, as β1 rather facilitates Ca<sup>2+</sup>-dependent allosteric pore opening unrelated to the discussed mechanisms of Mangostin. We have therefore removed this statement.

      As to the suggestion to dissect the effect of Mangostin on C, D, and L, we agree with the reviewer that this would surely add to a full biophysical characterization. However, in our project, we strove towards including more experiments showing the physiological implications of Mangostin activation to emphasize the implication for vasodilation. We hope the reviewer understands that, with limited resources, this came at the expense of a full investigation of the different gating components, which could pose a separate project by itself.

      (2) Apparently, the molecular docking was performed using the truncated structure of the human BK channel. However, it is unclear which one, since the PDB ID given in the Methods (6vg3), according to what I could find, corresponds to the unliganded, inactive PTK7 kinase domain. Be that as it may, the apo and Ca2+ bound structures show that there is a rotation and a displacement of the S6 transmembrane domain. Therefore, the positions of the residues I308, L312, and A316 in the closed and open configurations of the BK channel are not the same. Hence, it is expected that the strength of binding will be different whether the channel is closed or open. This point needs to be discussed.

      We apologize for the typing error and thank the reviewer for pointing out the erroneous PDB ID (“6vg3”). It should have read PDB ID 6v3g, as in the legend to Fig. 4B. The reviewer appropriately points out that there are differences in the S6 segment addressed in our study between the two available cryo-EM structures obtained in the presence (PDB ID 6v38) and absence of Ca<sup>2+</sup> (PDB ID 6v3g) (Tao and MacKinnon, 2019).

      We had actually performed the docking with both structures, but chose to show the Ca<sup>2+</sup>-free structure to better visualize the I308 position. α-Mangostin is found in the same S6 region in both, not obstructing the K<sup>+</sup> conduction pathway. The binding energies of the favored poses are very similar; the binding energy in the best-ranking conformational cluster in the Ca<sup>2+</sup>-bound structure was even slightly lower (-8.64 kcal mol<sup>-1</sup>) than in the docking with the Ca<sup>2+</sup>-free channel (-8.58 kcal mol<sup>-1</sup>; Fig. 4B), which may not be a relevant difference.

      We compared the residue interactions in both dockings (Author response table 1). S317 and Y318, which did not reduce the shift in V<sub>½</sub> upon substitution, were not predicted to contact α-Mangostin in either structure. In both structures, L312 and F315 were predicted to interact in virtually all poses analyzed. In the docking to the Ca<sup>2+</sup>-free state, I308 was also predicted to interact in 17/20 poses, while contacts to A316 occurred in 5/20 poses. In the Ca<sup>2+</sup>-bound state, predicted interactions shifted from I308 (which is expected as it is buried in the protein) to A316, and the isoprenyl moiety close to I308 rotated downwards. This could indicate that α-Mangostin adopts a more horizontal position following the upward reorientation of S6 in the Ca<sup>2+</sup>-bound state when the channel moves from one conformation to the other (Fig. S4).

      Author response table 1.

      Number of interactions of S6 residues in 20 analyzed α-Mangostin poses in the molecular dockings to the Ca2+-free and Ca2

      These docking results are consistent with our functional measurements. Recent structures of the BK/γ1 complex showed that the VSD and Ca<sup>2+</sup>-bowl are stabilized in an active-like conformation that corresponds to the conformation seen in the Ca<sup>2+</sup>-bound state (Kallure et al., 2023; Yamanouchi et al., 2023; Redhardt, Raunser and Raisch, 2024), indicating that very likely the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free structures indeed represent open and closed conformations of the channel. We observed that α-Mangostin can bind to both of these states to activate the channel (Fig. 3C, D), showing the presence of a binding site in both conformations. Further, α-Mangostin induced a left-shift in V<sub>½</sub> also in higher Ca<sup>2+</sup> concentration (Fig. 2D), indicating that it still binds to and activates the channel after the conformational change in S6. As we could not determine affinity for the mutants due to limited solubility, we have no information on the nature of the contribution of the substitutions, i.e., reduced binding or allosteric effect. As I308 is buried in the Ca<sup>2+</sup>-bound state, its contribution is likely mostly allosteric. We have also proposed dewetting as possible activation mechanism, which we expect to be less sensitive to the exact pose of a molecule (as shown for NS11021, Nordquist et al., 2024). Therefore, α-Mangostin could, e.g., change solvent accessibility of the I308 sidechain, energetically favoring the buried (open) state.

      We have now included both dockings and Author response table 1 in Fig. S4, and we have added passages to the results section (starting at line 373) and discussion section (starting at lines 544, 588).

      Minor:

      (1) From Figure 3A, it is apparent that the increase in Po is at the expense of the long periods (seconds) that the channel remains closed. One might suggest that α-mangostin increases the burst periods. It would be beneficial if the authors measured both closed and open dwell times to test whether α-mangostin primarily affects the burst periods.

      We thank the reviewer for this valuable suggestion, which we have implemented. In the single-channel measurements shown in our original Fig. 3, we did not observe burst behavior of the BKɑ channels. This can be explained by the fact that we measured under resting conditions (100 nM free Ca<sub>i</sub><sup>2+</sup>) and with rather mild depolarisation (+40 mV), where Po was very low. We have therefore analyzed measurements in 5 µM free Ca<sub>i</sub><sup>2+</sup>, where we recorded sufficient burst activity also in the basal state.

      The burst analysis showed that ɑ-Mangostin indeed prolongs bursts and shortens the interburst closures. Within bursts, both closed times and open times were increased, and we recorded a higher number of opening events per burst. We conclude that ɑ-Mangostin acts in both the closed and the open state, where it slows open-closed transitions resulting in less flicker, and stabilizes the open state via longer open times and a higher probability for closed-open transitions.
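
      To make the terms used here concrete, the following minimal Python sketch groups an idealized dwell-time sequence into bursts. It is an illustration only and not the analysis pipeline described in the Methods: the function names, the event format, and the critical closed time `t_crit_ms` that separates intraburst from interburst closures are assumptions chosen for the example, and the dwell times are made up.

```python
# Illustrative burst segmentation of an idealized single-channel record.
# All names and numbers are hypothetical; t_crit_ms is an assumed threshold.

def _trim_closures(events):
    """Drop closures at either end so a burst starts and ends with an opening."""
    start, end = 0, len(events)
    while start < end and events[start][0] == "closed":
        start += 1
    while end > start and events[end - 1][0] == "closed":
        end -= 1
    return events[start:end]


def split_bursts(dwells, t_crit_ms):
    """Split (state, duration_ms) events into bursts.

    A closure lasting at least t_crit_ms is treated as an interburst gap;
    shorter closures are kept inside the current burst.
    """
    bursts, current = [], []
    for state, duration_ms in dwells:
        if state == "closed" and duration_ms >= t_crit_ms:
            trimmed = _trim_closures(current)
            if trimmed:
                bursts.append(trimmed)
            current = []
        else:
            current.append((state, duration_ms))
    trimmed = _trim_closures(current)
    if trimmed:
        bursts.append(trimmed)
    return bursts


def summarize(burst):
    """Return burst duration, number of openings, and mean dwell times."""
    open_times = [d for s, d in burst if s == "open"]
    closed_times = [d for s, d in burst if s == "closed"]
    return {
        "duration_ms": sum(d for _, d in burst),
        "n_openings": len(open_times),
        "mean_open_ms": sum(open_times) / len(open_times),
        "mean_closed_ms": sum(closed_times) / len(closed_times) if closed_times else 0.0,
    }


# Made-up dwell sequence: two bursts separated by long closures.
dwells = [
    ("closed", 500.0), ("open", 2.0), ("closed", 0.5), ("open", 3.0),
    ("closed", 400.0), ("open", 1.0), ("closed", 0.3), ("open", 1.0),
    ("closed", 0.2), ("open", 2.0), ("closed", 600.0),
]
summaries = [summarize(b) for b in split_bursts(dwells, t_crit_ms=5.0)]
for s in summaries:
    print(s)
```

      In such a scheme, longer bursts, more openings per burst, and shorter interburst gaps would each show up directly in the per-burst summaries; the result depends on the chosen critical time.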

      We now show this data in Fig. 3D-F and Table S8, and have accordingly added passages to the results section (starting at line 285), the discussion (line 510), and the methods section (starting at line 746).

      (2) In several places, the authors make similarities in the mode of action of other BK activators and α-mangostin; however, the work of Gessner et al. PNAS 2012 indicates that NS1619 and Cym04 interact with the S6/RCK linker, and Webb et al. demonstrated that GoSlo-SR-5-6 agonist activity is abolished when residues in the S4/S5 linker and in the S6C region are mutated. These findings indicate that binding of the agonist is not near the selectivity filter, as the authors' results suggest that α-mangostin binds.

      We will gladly clarify our ideas concerning the binding sites of other activators and ɑ-Mangostin. We first hypothesized that ɑ-Mangostin may share characteristics and mode of action with the class of negatively charged activators (NCA) that we have described before (Schewe et al., 2019). NCA were found to occupy a common fenestration site that is located close to the selectivity filter in TREK K2P channels, and in this manuscript we have shown by THexA competition and mutagenesis experiments that ɑ-Mangostin also binds in this fenestration region in TREK-1 channels (Fig. S3).

      The existence of this common NCA binding site was also proposed for BK channels, as a docking placed the NCA NS11021 in an equivalent binding region, and, among others, NS11021 and GoSlo-SR-5-6 competed with THexA for binding in the pore (Schewe et al., 2019). These results were indeed not fully in agreement with the proposed binding site of GoSlo-SR-5-6 in Webb et al. (2015), although the most effective (double) mutants were located at S317 and I323, at the intracellular end of the cleft between neighboring S6 segments. In this manuscript, we have shown that α-Mangostin is present in the pore of BK channels by molecular docking, a THexA competition assay, and two mutations that reduced the shift in V<sub>½</sub> induced not only by ɑ-Mangostin but also by GoSlo-SR-5-6 (Fig. 4). While the docking was rather a starting point, both functional tests argue against a binding site in the S4/5 linker/S6C region; however, allosteric mechanisms could still reduce activation also in mutants in the S4/5 linker/S6C region far from the pore binding region proposed by us in the 2019 study and the present manuscript.

      To summarize, we did not mean to imply that all BK activators should bind to this site, especially if they are not part of the NCA class (such as NS1619, Cym04, and BC5, whose different binding site enabled us to use it as a control in our THexA competition assay). However, the cleft close to gating-relevant S6 residues may well be a region especially susceptible to modulator binding (as for BL-1249, GoSlo-SR-5-6, and ɑ-Mangostin). We have moved or separated the initial GoSlo references from the reference to the pore binding site in the paragraph (lines 329, 358) to improve clarity.

      (3) The sentence starting in line 452 states that there is a pronounced allosteric coupling between the voltage sensors and Ca2+ binding. If the authors are referring to the coupling factor E in the Horrigan-Aldrich gating model, the references cited, in particular, Sun and Horrigan, concluded that the coupling between those sensors is weak.

      We are grateful for the opportunity to improve this passage. We intended to express that the observed effects (in this case the shift in V<sub>½</sub>) are pronounced around 1 µM Ca<sup>2+</sup>. As the reviewer states, the coupling factor between the voltage and calcium sensors (E; 2.4) is weak compared to the coupling of Ca<sup>2+</sup> (C; 8) and voltage (D; 25) to the pore in the Horrigan-Aldrich model. However, the shape of the Ca<sup>2+</sup>-dependence of V<sub>½</sub> cannot be completely described when E is neglected, with the largest difference around 1-2 µM Ca<sup>2+</sup> (Horrigan and Aldrich, 2002). Deletion of the gating ring underlines the allosteric sensor coupling (Clay, 2017). This, together with the steep Ca<sup>2+</sup>-dependence in this concentration range (meaning high Po changes upon occupancy increase; Cui, Cox and Aldrich, 1997), explains the higher apparent activation, visible as the higher shift in V<sub>½</sub> observed at 1 µM Ca<sup>2+</sup>. In terms of the model of Sun and Horrigan (2022), the suppressing “molecular logic gate” is already relieved by the presence of intermediate Ca<sup>2+</sup>, and the direct “gating lever” pathway via voltage acts synergistically and achieves the observed higher V<sub>½</sub> shift upon depolarization. We have adapted the sentence and separated the citations for better understanding (lines 503-507).
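
      For orientation, the allosteric factors mentioned above can be placed in the open-probability expression of the Horrigan-Aldrich model. The Python sketch below is a minimal illustration and not the analysis used in the manuscript: C, D, and E are the values quoted above, whereas L0, zL, zJ, Vh(J), KD, and the thermal voltage are illustrative assumptions of the order reported by Horrigan and Aldrich (2002). Setting E = 1 removes the sensor-sensor coupling, so comparing the two settings shows how V<sub>½</sub> depends on E at a given Ca<sup>2+</sup> concentration.

```python
import math

KT_OVER_E_MV = 25.4  # thermal voltage near room temperature (assumption)


def open_probability(v_mv, ca_um, *, l0=9.8e-7, z_l=0.3, z_j=0.58,
                     vh_j_mv=150.0, kd_um=11.0, c=8.0, d=25.0, e=2.4):
    """Open probability in the Horrigan-Aldrich allosteric scheme.

    c, d, e are the coupling factors quoted in the text; all other
    defaults are illustrative assumptions, not fitted values.
    """
    l_eq = l0 * math.exp(z_l * v_mv / KT_OVER_E_MV)              # closed-open equilibrium
    j_eq = math.exp(z_j * (v_mv - vh_j_mv) / KT_OVER_E_MV)       # voltage-sensor equilibrium
    k_eq = ca_um / kd_um                                         # Ca2+ site occupancy term
    open_weight = l_eq * (1 + k_eq * c + j_eq * d + j_eq * k_eq * c * d * e) ** 4
    closed_weight = (1 + j_eq + k_eq + j_eq * k_eq * e) ** 4
    return open_weight / (open_weight + closed_weight)


def v_half(ca_um, lo_mv=-300.0, hi_mv=500.0, **params):
    """Voltage at which Po = 0.5, found by bisection (Po rises with voltage)."""
    for _ in range(60):
        mid = 0.5 * (lo_mv + hi_mv)
        if open_probability(mid, ca_um, **params) < 0.5:
            lo_mv = mid
        else:
            hi_mv = mid
    return 0.5 * (lo_mv + hi_mv)


for ca in (0.0, 1.0, 2.0, 10.0, 100.0):
    with_e = v_half(ca)            # E = 2.4
    without_e = v_half(ca, e=1.0)  # sensor-sensor coupling switched off
    print(f"Ca = {ca:6.1f} µM  V½(E=2.4) = {with_e:7.1f} mV  "
          f"V½(E=1) = {without_e:7.1f} mV")
```

      Because the numbers depend on the assumed parameters, the output should be read qualitatively: with E > 1, the predicted V<sub>½</sub> at any non-zero Ca<sup>2+</sup> lies at more negative voltages than with E = 1, while the two agree in the absence of Ca<sup>2+</sup>.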

      References:

      Clay, J.R. (2017) “Novel description of the large conductance Ca2+-modulated K+ channel current, BK, during an action potential from suprachiasmatic nucleus neurons,” Physiological Reports, 5(20), p. e13473. Available at: https://doi.org/10.14814/phy2.13473.

      Cui, J., Cox, D.H. and Aldrich, R.W. (1997) “Intrinsic Voltage Dependence and Ca2+ Regulation of mslo Large Conductance Ca-activated K+ Channels,” Journal of General Physiology, 109(5), pp. 647–673. Available at: https://doi.org/10.1085/jgp.109.5.647.

      Horrigan, F.T. and Aldrich, R.W. (2002) “Coupling between voltage sensor activation, Ca2+ binding and channel opening in large conductance (BK) potassium channels,” The Journal of General Physiology, 120(3), pp. 267–305. Available at: https://doi.org/10.1085/jgp.20028605.

      Kallure, G.S. et al. (2023) “High-resolution structures illuminate key principles underlying voltage and LRRC26 regulation of Slo1 channels.” bioRxiv, p. 2023.12.20.572542. Available at: https://doi.org/10.1101/2023.12.20.572542.

      Nordquist, E.B., Jia, Z. and Chen, J. (2024) “Small Molecule NS11021 Promotes BK Channel Activation by Increasing Inner Pore Hydration,” Journal of Chemical Information and Modeling, 64, pp. 7616–7625. Available at: https://doi.org/10.1021/acs.jcim.4c01012.

      Redhardt, M., Raunser, S. and Raisch, T. (2024) “Cryo-EM structure of the Slo1 potassium channel with the auxiliary γ1 subunit suggests a mechanism for depolarization-independent activation,” FEBS Letters, 598(8), pp. 875–888. Available at: https://doi.org/10.1002/1873-3468.14863.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Sun, L. and Horrigan, F.T. (2022) “A gating lever and molecular logic gate that couple voltage and calcium sensor activation to opening in BK potassium channels,” Science Advances, 8(50), p. eabq5772. Available at: https://doi.org/10.1126/sciadv.abq5772.

      Tao, X. and MacKinnon, R. (2019) “Molecular structures of the human Slo1 K+ channel in complex with β4,” eLife 8, p. e51409. Available at: https://doi.org/10.7554/eLife.51409.

      Webb, T.I. et al. (2015) “Molecular mechanisms underlying the effect of the novel BK channel opener GoSlo: Involvement of the S4/S5 linker and the S6 segment,” Proceedings of the National Academy of Sciences, 112(7), pp. 2064–2069. Available at: https://doi.org/10.1073/pnas.1400555112.

      Yamanouchi, D. et al. (2023) “Dual allosteric modulation of voltage and calcium sensitivities of the Slo1-LRRC channel complex,” Molecular Cell, 83(24), pp. 4555-4569.e4. Available at: https://doi.org/10.1016/j.molcel.2023.11.005.

      Reviewer #3 (Public review):

      Summary:

      This research shows that a-mangostin, a proposed nutraceutical, with cardiovascular protective properties, could act through the activation of large conductance potassium permeable channels (BK). The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces a potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of smooth muscle in vascular tissue, this activation leads to muscle relaxation, possibly explaining cardiovascular protective effects.

      Strengths:

      The authors present evidence based on several lines of experiments that a-mangostin is a potent activator of BK channels. The quality of the experiments and the analysis is high and represents an appropriate level of analysis. This research is timely and provides a basis to understand the physiological effects of natural compounds with proposed cardio-protective effects.

      We sincerely thank the reviewer for appraising the achievements of our study.

      Weaknesses:

      The identification of the binding site is not the strongest point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and show that point mutations reduce the magnitude of the negative voltage shift of activation produced by a-mangostin. However, these experiments do not demonstrate binding to these sites, and could be explained by allosteric effects on gating induced by the mutations themselves.

      We are aware that our functional data are unfortunately not sufficient to clearly distinguish between effects due to affinity loss or due to allosteric mechanisms. Our attempts to generate complete dose–response curves for the mutants to determine accurate apparent IC<sub>50</sub> values were unfortunately limited by the solubility of the compound. Consequently, we have avoided making claims about affinity loss in the mutant analysis, and have instead only reported the reduction in potency, expressed as the shift in V<sub>½</sub>. To reduce confounding effects from the mutations themselves, we selected substitutions that preserved the most wildtype-like GV-relationships, based on the extensive mutagenesis work of Chen, Yan and Aldrich (2014). We address this matter also in our answer to Recommendation (6) below, and we have replaced the word “binding” in the title of the manuscript. Nevertheless, we consider the proposed binding region to be well supported by the THexA competition experiments in combination with molecular docking, even though the specific mechanistic contributions of individual residues cannot yet be resolved.

      Reviewer #3 (Recommendations for the authors):

      (1) Natural xanthones as α-Mangostin induce vasorelaxation via binding to key gating residues in the S6 domain of BK channels.

      (2) If α-Mangostin occupies a similar binding site to quaternary ammoniums, what is the explanation for not observing a reduction in the single-channel current (fast blocking effect)? The α-Mangostin site proposed here is in a region of the channel that should occlude ion permeation. The authors should discuss possible explanations for this apparently contradictory observation.

      As the reviewer states, we indeed have not observed a reduced single channel amplitude in any measurement. The THexA competition assay showed that ɑ-Mangostin is present in the pore cavity and interferes with THexA access to its binding site. However, we do not think that their binding sites are similar, as QA ions bind directly below the filter entrance to block permeation, while our studies suggest that ɑ-Mangostin binds in the upper portion of the cleft between S6 helices. In this position, it would clearly overlap with the QA binding site and hinder access, but not block permeation. We would therefore not expect to see an amplitude reduction by intermittent α-Mangostin block. Consistently, all binding poses in our dockings were close to the cavity wall, without interfering with the central ion conduction pathway. To better illustrate this, we have added updated intracellular views of the dockings in the Ca<sup>2+</sup>-free and Ca<sup>2+</sup>-bound state (which we have also now included as suggested by another reviewer) to the supplementary information (Fig. S4A).

      (3) In Figure 2D, it is difficult to appreciate the differences between the symbols representing the G-V relationships of BKa channels at different intracellular Ca concentrations, before and after activation with 10 μM a-Mangostin. A clearer distinction between the curves would help to interpret the data more easily.

      We thank the reviewer for the suggestion to improve figure accessibility. We have changed the line appearance for better discrimination of the overlying portions.

      (4) Both THexA and TPA block BK channels through voltage and state-dependent mechanisms. Therefore, their apparent affinity could change if a-Mangostin simply increases open probability or alters dwell times rather than physically blocking access to the binding site.

      The reviewer addresses valid limitations that can affect the meaningfulness of competition experiments under certain conditions. However, we think that this does not apply to our results:

      Previous studies have shown that the voltage dependence of quaternary ammonium blockers up to C<sub>10</sub> is rather weak in BK channels, and only a slight increase in block is present in the voltage range +30 mV to +100 mV (Li and Aldrich, 2004; Thompson and Begenisich, 2012). Hence, THexA voltage dependence has already reached a plateau in the competition assay (at +40 mV), and its voltage dependence would have little effect on our results.

      Controversy exists about the nature of the state dependence of different quaternary ammonium blockers, but TBA is often recognized as an open channel blocker of BK channels, which probably also applies to THexA (Wilkens and Aldrich, 2006; Tang, Zeng and Lingle, 2009; Thompson and Begenisich, 2012; Posson, McCoy and Nimigean, 2013). Assuming such an open-channel block, apparent IC<sub>50</sub> values would be inversely proportional to Po. The THexA IC<sub>50</sub> was about 80 nM in the basal state, when Po is very low (0.024 at +40 mV as derived from the GV-relationship); an increase of open dwell times, respectively Po, in the presence of α-Mangostin to, e.g., 0.3 would therefore lead to a ≈10-fold decrease in apparent IC<sub>50</sub>. However, the apparent THexA IC<sub>50</sub> strongly increased rather than decreased (more than 20-fold to around 1.6 µM). This cannot arise from Po change and must reflect the altered access of THexA to its binding site caused by α-Mangostin. Assuming a pure closed channel block where apparent IC<sub>50</sub> would correlate with the closed times, an increase of about 1.4-fold is expected. However, we recorded a much stronger 20-fold increase. Therefore, we are convinced that we have conclusively shown that α-Mangostin is present in the BK pore irrespective of the state dependence of THexA block.
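
      The arithmetic behind this argument can be checked with a short Python sketch. It assumes, as in the text, that for a pure open-channel blocker the apparent IC<sub>50</sub> scales with 1/Po and, as one reading of the closed-channel case, that it scales with 1/(1 - Po), which reproduces the ≈1.4-fold figure. The function names are hypothetical, and Po = 0.3 is the illustrative value used above.

```python
# Predicted change in apparent IC50 for state-dependent block, compared with
# the observed change. Function names are hypothetical; values are from the text.

def ic50_ratio_open_block(po_before, po_after):
    """After/before ratio of apparent IC50 if IC50 is proportional to 1/Po."""
    return po_before / po_after


def ic50_ratio_closed_block(po_before, po_after):
    """After/before ratio of apparent IC50 if IC50 is proportional to 1/(1 - Po)."""
    return (1.0 - po_before) / (1.0 - po_after)


PO_BASAL = 0.024           # Po at +40 mV in the basal state
PO_ACTIVATED = 0.3         # illustrative Po in the presence of α-Mangostin
IC50_BASAL_NM = 80.0       # THexA IC50, basal state
IC50_ACTIVATED_NM = 1600.0 # THexA IC50 with α-Mangostin (about 1.6 µM)

open_ratio = ic50_ratio_open_block(PO_BASAL, PO_ACTIVATED)
closed_ratio = ic50_ratio_closed_block(PO_BASAL, PO_ACTIVATED)
observed_ratio = IC50_ACTIVATED_NM / IC50_BASAL_NM

print(f"open-channel block:   IC50 x {open_ratio:.2f} (a {1 / open_ratio:.1f}-fold decrease)")
print(f"closed-channel block: IC50 x {closed_ratio:.2f}")
print(f"observed:             IC50 x {observed_ratio:.0f}")
```

      Under either assumption, the predicted change falls far short of the observed 20-fold increase, which is the point made above.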

      (5) The pH dependence of the V1/2 shift supports the idea that α-Mangostin becomes more negatively charged at higher pH (enhancing its effect.) However, although the data are consistent with this interpretation, additional controls such as using a non-ionizable analog or assessing solubility changes with pH would be needed to confirm that the shift is caused specifically by ionization of α-Mangostin and not by indirect pH effects on channel gating.

      We agree with the reviewer that the pH experiment by itself is not sufficient to clearly tie the existence of a charge to a possible activation mechanism. We still think that this is an interesting observation and should be made known, as we have investigated the mechanism of negatively charged activators in different K<sup>+</sup> channel families before (Schewe et al., 2019). Unfortunately, we do not have access to uncharged derivatives mimicking the 3D conformation. From the commercially available substances, the bare xanthone backbone is completely insoluble in water. We have therefore tested the derivative 3-hydroxyxanthone as an example with a minimal number of hydroxyl substituents (Author response image 2, Author response table 2). The 3-hydroxyxanthone indeed shows reduced activation compared to α-Mangostin. The shift in V<sub>½</sub> induced by 10 µM 3-hydroxyxanthone was only 14.99 ± 5.67 mV (≈50 mV for α-Mangostin). This supports that the presence of several (potentially) charged substituents is important for the activation mechanism. However, we have no knowledge about the efficacy of the compound or the local pK<sub>a</sub> of the different hydroxyl groups. As the reviewer stated, systematic chemical modifications would be necessary to elucidate the importance of the charged substituent number and positions, which is not within our capabilities.

      Author response image 2.

      Activation of BKα by 3-hydroxyxanthone. (A) GV-relationship before and after application of 10 µM 3-hydroxyxanthone. (B) V<sub>½</sub> before and after application of 10 µM 3-hydroxyxanthone compared to α-Mangostin and the resulting difference in V<sub>½</sub> (ΔV<sub>½</sub>). Measurements were conducted as described in the main manuscript with 100 nM free Ca<sub>i</sub><sup>2+</sup>.

      Author response table 2.

      Comparison of the V<sub>½</sub> ± SEM and ΔV<sub>½</sub> ± SEM before and after activation by 10 µM α-Mangostin or 10 µM 3-hydroxyxanthone in BKα channels. Unpaired t-test, two-tailed P values (α=0.05)

      (6) The reduced V1/2 shifts observed in the I308A, L312M, and A316P mutants may result from intrinsic gating alterations rather than a true loss of a-Mangostin binding. The GoSlo-SR-5-6 control is informative, but the persistence of activation in A316P does not fully resolve this. A more convincing test would be employing double or triple mutants.

      As stated above, we acknowledge that our functional data do not allow us to definitively separate effects arising from a true loss of binding affinity from those due to potential allosteric effects. We tried to minimize intrinsic gating alterations introduced by substitutions by not conducting a pure alanine or cysteine scanning mutagenesis. Instead, substitutions were chosen to be closest to the wildtype GV-relationship in Chen, Yan and Aldrich (2014) where possible. While L312M was virtually identical to the wildtype, A316P showed a change in slope in high Ca<sup>2+</sup> concentrations, which could indicate a changed voltage sensitivity. Additionally, A316P completely abolished α-mangostin activation. We therefore also used A316G to ensure that the channel is functional and retains voltage sensitivity, even if its V<sub>½</sub> was shifted more strongly. As we have conducted paired measurements and assessed the V<sub>½</sub> before and after activation, we are confident that we can attribute a reduced shift to the reduced action of α-mangostin.

      Following the reviewer’s suggestion, we have generated and measured the double mutants I308A/L312M, I308A/A316G, and L312M/A316G (the triple mutant I308A/L312M/A316G did not produce measurable currents). The mutants I308A/L312M and I308A/A316G showed a moderate energy-additive effect and reduced the shift in V<sub>½</sub> by a further ≈7 mV compared to the single mutation with the stronger shift. The combination L312M/A316G, however, did not further reduce the shift seen in the single mutations and did not even produce the shift induced by A316G alone.

      Author response image 3.

      Double mutants I308A/L312M, I308A/A316G and L312M/A316G compared to the single mutations in the main manuscript. The V½ before and after activation with 10 µM α-Mangostin, the resulting shift in V½, and the GV-relationships are shown (n = 6-7). Measurements were made as in Fig. 4.

      Author response table 3.

      Summary of the V<sub>½</sub> before and after Mangostin activation and the resulting shifts in V<sub>½</sub> for the double mutants compared to the single mutants shown in the main manuscript.

      Following a suggestion by another reviewer, we have generated Alphafold3 (AF3) models for I308A, L312M and A316P and repeated the Mangostin docking. We learned that the mutations are all predicted to substantially impact the structure of the S6 helix, therefore altering the binding region, and A316P especially impacted the nature of residue interactions. This could be an explanation why the double mutants do not show a clear and consistent additive effect.

      Unfortunately, this outcome is not conclusive and the double mutants do not reveal further information compared to the single mutants. We have therefore decided not to include these measurements in the manuscript.

      As we do not know if our answers will be sent to all reviewers, we repeat the relevant part about the AF3 models here:

      (…) According to these predictive models,

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: C<sub>α</sub> of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e., involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower especially in the best 2 poses that are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions to the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to only contact F315 and Y318, and overall, any I308 or L312 contacts only occurred in 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P probably has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift). (…)

      (7) The subtraction approach used to isolate BK currents (difference before and after a-Mangostin) assumes that the compound affects only BK channels. However, a-Mangostin could also modulate Cav currents directly, as reported for other polyphenolic compounds. No vehicle (DMSO) control is shown.

      We agree with the reviewer that α-Mangostin could also modulate Ca<sub>v</sub> currents; however, this would not interfere with the conclusions drawn from this nanodomain experiment. We intended to show the overall current modulation by ɑ-Mangostin in the voltage range relevant for Ca<sub>v</sub>-BK coupling, as this would be the determinant for the membrane potential mediating the vasoactive effect. In native tissue, BK and Ca<sub>v</sub> channels (among others) would likewise contribute to the net membrane conductance, with BK channels being a major contributor when activated. In fact, a concomitant inhibition of Ca<sub>v</sub> channels could act synergistically in favor of vasodilation. This could therefore be a subject for the further investigation of potential ɑ-Mangostin targets. However, the fact that iberiotoxin prevented relaxation in aortic preparations conclusively showed that BK channels are the major player in native tissue.

      We have reformulated some sentences to prevent misunderstandings that we refer to isolated BK currents instead of α-Mangostin activated currents.

      DMSO controls were conducted and did not impact BK or Ca<sub>v</sub>1.2 currents or the aortic tissue contraction. We have added representative measurements as Fig. S6 and stated the DMSO concentration in the Methods section (line 655).

      (8) Most kinetic fits were obtained at strong depolarizations (around +100 mV), which limits how well these results can be extrapolated to physiological voltages. Although the BK-Cav experiments show facilitation between -50 and +50 mV, providing plots for activation and deactivation in that range would strengthen the physiological relevance.

      We thank the reviewer for this valuable suggestion. We now additionally show that the impact of ɑ-Mangostin on activation is high at lower depolarisation, indeed underlining its physiological relevance. To address the activation time course in a more physiological voltage range, we have used our measurements of BKɑ channels in 10 µM Ca<sub>i</sub><sup>2+</sup> (where the V<sub>½</sub> shift induced by ɑ-Mangostin is equal to that in 100 nM Ca<sub>i</sub><sup>2+</sup>; Fig. 2D). The outward currents already present in the lower voltage range under these conditions allowed us to fit a monoexponential function to the traces of 0 mV to 100 mV prepulses. The τ of activation decreased from 29.6 ± 3.1 ms at 0 mV to 2.4 ± 2 ms at +100 mV. After ɑ-Mangostin activation, the time course was accelerated, with the τ of activation ranging from 9.5 ± 4.7 ms at 0 mV to 2 ± 0.6 ms at +100 mV. This faster activation was particularly effective in the lower voltage range far from high Po, e.g., ɑ-Mangostin reduced the τ of activation at +20 mV by more than half (from 12.2 ± 0.6 ms to 4.98 ± 1.6 ms).
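
      As an illustration of how a time constant can be extracted from a monoexponential activation time course, the sketch below uses a log-linear least-squares fit on synthetic, noise-free data. This is a simplification under stated assumptions (known steady-state amplitude, no baseline offset, hypothetical function name) and not necessarily the fitting routine used for the data in the manuscript. The last lines compute the relative change in τ at +20 mV from the values given above.

```python
import math


def fit_tau_loglinear(times_ms, currents, i_steady):
    """Estimate tau for I(t) = i_steady * (1 - exp(-t / tau)).

    The model is linearised to ln(1 - I / i_steady) = -t / tau and a line
    through the origin is fitted by least squares. Points at or above the
    steady-state amplitude are skipped because the logarithm is undefined.
    """
    xs, ys = [], []
    for t, i in zip(times_ms, currents):
        remaining = 1.0 - i / i_steady
        if remaining > 0.0:
            xs.append(t)
            ys.append(math.log(remaining))
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return -1.0 / slope


# Synthetic activation trace (normalised amplitude) with a known time constant.
TRUE_TAU_MS = 12.2
times_ms = [0.5 * k for k in range(1, 41)]  # 0.5 ms to 20 ms
currents = [1.0 - math.exp(-t / TRUE_TAU_MS) for t in times_ms]
tau_est = fit_tau_loglinear(times_ms, currents, i_steady=1.0)

# Relative change of the activation time constant at +20 mV (values from the text).
TAU_BEFORE_MS, TAU_AFTER_MS = 12.2, 4.98
relative_decrease = 1.0 - TAU_AFTER_MS / TAU_BEFORE_MS

print(f"recovered tau: {tau_est:.3f} ms")
print(f"relative decrease in tau at +20 mV: {relative_decrease:.1%}")
```

      For noisy recordings, a nonlinear least-squares fit of the untransformed trace is generally preferable, since the logarithm amplifies noise near the steady state.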

      Our data consist of families of different prepulse voltages and a fixed repolarisation step (to -50 mV for 100 nM free Ca<sub>i</sub><sup>2+</sup>, and to -100 mV for 10 µM free Ca<sub>i</sub><sup>2+</sup>). Thus, we are not able to add plots for the voltage-dependence of deactivation in the same way as for activation. However, we can present the deactivation time constants of lower prepulse voltage steps that produce outward currents in symmetrical ion conditions with 10 µM free Ca<sub>i</sub><sup>2+</sup>. For -20 mV and +20 mV prepulse voltages, which better reflect physiological depolarisation, the deactivation time constant shows a 3- to 5-fold increase after α-Mangostin activation.

      We now show the plot for the voltage dependence of activation in Fig. S2A and a bar graph for activation/deactivation time constants at +20 mV as Fig. S2B; data are summarized in Table S5. We hope this helps illustrate the effect of α-Mangostin under physiological conditions.

      (9) Minor: In several parts of the paper, induced shifts to negative voltages are referred to as "leftward shifts". It would be useful to be consistent and employ a more specific reference to negative or positive directions.

      We thank the reviewer for the careful reading and have harmonized the terminology.

      References

      Chen, X., Yan, J. and Aldrich, R.W. (2014) “BK channel opening involves side-chain reorientation of multiple deep-pore residues,” Proceedings of the National Academy of Sciences, 111(1), pp. E79–E88. Available at: https://doi.org/10.1073/pnas.1321697111.

      Li, W. and Aldrich, R.W. (2004) “Unique Inner Pore Properties of BK Channels Revealed by Quaternary Ammonium Block,” Journal of General Physiology, 124(1), pp. 43–57. Available at: https://doi.org/10.1085/jgp.200409067.

      Posson, D.J., McCoy, J.G. and Nimigean, C.M. (2013) “The voltage-dependent gate in MthK potassium channels is located at the selectivity filter,” Nature Structural & Molecular Biology, 20(2), pp. 159–166. Available at: https://doi.org/10.1038/nsmb.2473.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Tang, Q.-Y., Zeng, X.-H. and Lingle, C.J. (2009) “Closed-channel block of BK potassium channels by bbTBA requires partial activation,” The Journal of General Physiology, 134(5), pp. 409–436. Available at: https://doi.org/10.1085/jgp.200910251.

      Thompson, J. and Begenisich, T. (2012) “Selectivity filter gating in large-conductance Ca2+-activated K+ channels,” Journal of General Physiology, 139(3), pp. 235–244. Available at: https://doi.org/10.1085/jgp.201110748.

      Wilkens, C.M. and Aldrich, R.W. (2006) “State-independent block of BK channels by an intracellular quaternary ammonium,” The Journal of General Physiology, 128(3), pp. 347–364. Available at: https://doi.org/10.1085/jgp.200609579.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful reading of our manuscript and their thoughtful comments. We appreciate the overall positive opinion of our manuscript and the helpful suggestions from the reviewers. Overall, the main points identified by reviewers were 1) further broadening of the system to a range of inputs as well as the construct types that can be generated with the system and 2) further consideration of any off-target joining or off-target effects on genes/proteins and the limits to the expandability of the kit. To address these concerns, we have added new data in Figure 6 illustrating the generation of a new construct using PCR and dsDNA fragments, added new constructs for mpeg1.1 and for CRISPR gRNA expression, and revised the text to further address concerns and limitations of the toolkit. We thank the reviewers and editors for these suggestions and feel that they have substantially improved the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Furthermore, the modular design ensures expandability, enabling researchers to customize constructs for diverse experimental designs. The validation provided in the manuscript is solid, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      We appreciate the overall positive evaluation of our toolkit and the time and effort in evaluating it.

      Weaknesses:

      While the toolkit's technical capabilities are well-demonstrated, there are several areas where additional validation and examples could enhance its impact. One limitation is the lack of data showing whether the toolkit can be directly used for rapid cloning and testing of enhancers or promoters, particularly cloning them directly from PCR using PaqCI overhangs without needing an entry vector. Similarly, the feasibility of cloning genes directly from PCR products into the system is not demonstrated, which would significantly increase the utility for researchers working with genomic elements.

      This is an excellent point. Given the increased use of gene synthesis and dsDNA fragments, we also thought it would be good to demonstrate their incorporation. We have added a new figure, Figure 6, which demonstrates the generation of two new transgene constructs by direct cloning of three PCR products along with a synthetic dsDNA fragment into a Tol2-flanked backbone plasmid as an alternative, rapid approach to transgene generation. The resulting plasmids, encoding the mpeg1.1 promoter, a separate p2a, and a tdTomato fluorescent protein along with either wild-type or dominant-negative rac2, were properly assembled. In transient transgenic zebrafish injected with these constructs, dominant-negative rac2 prevented macrophage recruitment to tail wounds, indicating that this approach worked for the generation of functional transgenes. These results are discussed in new text (lines 304-391) describing this new experiment and the finding that both PCR products and synthesized dsDNA could be efficiently incorporated in constructs generated with our approach, as well as in the discussion (lines 494-499).

      The authors discuss potential applications such as using the toolkit for tissue-specific knockout applications by assembling CRISPR/Cas9 gRNA constructs. However, they do not demonstrate the cloning of short fragments, such as gRNA sequences downstream of a U6 promoter, which would be an important proof-of-concept to validate these applications. Furthermore, while the manuscript focuses on macrophage-specific promoters, the widely used mpeg1.1 promoter is not included or tested, which limits the toolkit's appeal for researchers studying macrophages and microglia.

      Yes, in the new figure described above, we have now shown that this method works with shorter PCR fragments such as the p2a fragment cloned within the tdTomato-p2a-rac2 constructs described above. This fragment is ~70 bp and while this is somewhat longer than a simple gRNA targeting sequence (though smaller than a complete sgRNA), we believe that this indicates that smaller size fragments can still be incorporated within these constructs. We also agree with the general idea of increasing functionality to incorporate CRISPR/Cas9 and now include a 3E encoding the zebrafish U6 promoter. As CRISPR expression constructs frequently incorporate complex construction, for instance, expression of tagged Cas9 along with the U6 driven gRNA as in Zhou et al., 2018 or along with rescue constructs as in Wang et al., 2021, we have given these constructs the non-standard 5’ end O3c, to enable multiplexing in these complex constructs.

      We agree that it is important to include mpeg1.1 given the broad use of this promoter within the field, and we’ve now included a 5E mpeg1.1 construct within the toolkit.

      Another potential limitation is the handling of sequences containing PaqCI recognition sites. Although the authors discuss domestication to remove these sites, a demonstration of cloning strategies for such cases or alternative methods to address these challenges would provide practical guidance for users.

      Absolutely, we have now included a new figure (Supplementary Figure 6) that illustrates one domestication approach using PCR and homology-based cloning as an easy approach to domestication. In addition, we have also mentioned alternative approaches for domestication in the discussion (lines 439-444).

      Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach relies on only a short 4-base pair overhang sequence in their final construct, and the insertion construct and backbone vector can be assembled in a single-tube reaction using PaqCI and ligase. This approach can also be expanded by introducing new overhang sequences while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.

      Strengths:

      The generation of several lines of transgenic zebrafish for the immunologic study demonstrates the feasibility of the ImPaqT in vivo. The lineage tracing of macrophages by LPS injection shows this approach's functionality, validating its usage in vivo.

      We appreciate the positive sentiments for our toolkit and the effort put into reviewing our manuscript.

      Weaknesses:

      (1) There is no quantitative data analysis showing the percentage of off-target based on these 4bp overhang sequences.

      While we agree that this is an important variable for the method, we feel that previous studies that have broadly tested off-target effects of all potential 4 bp overhang sequences have already given an effective overview of interactions between each of these overhangs (Potapov et al., 2018; Pryor et al., 2020). The results from these studies were incorporated into the NEB ligase fidelity viewer that we used to predict the overhangs that would have minimal off-target ligation with each other; the tool also reports the expected off-target ligation of individual 4 bp overhangs. In all cases, we selected overhangs that would have minimal off-target efficiency, with each of the overhangs showing 1% or less off-target ligation with any of the other overhangs chosen. We have added new text, lines 119-124, that further clarifies our selection of these ends.

      (2) There is no statement for the upper limitation of the expandability.

      Yes, we’ve been curious as well. Our cloning of 6 distinct fragments in Figure 5, and the new 5-fragment cloning added in revision (Figure 6), suggest that 5-6 fragments can be readily assembled. In the course of revisions we also attempted to generate a larger product of 11 fragments, which ultimately failed. It is unclear whether this failure was due to the constructs chosen, the potential size of the plasmid, or a failure of the technique/enzymes themselves. Given that published descriptions of PaqCI Golden Gate cloning approaches have found that PaqCI can assemble at least 32 fragments and can produce large sequences (e.g. in Sikkema et al., 2023, where they assemble the ~40 kbp T7 genome from 12, 24 and 32 distinct fragments using a PaqCI Golden Gate reaction), we suspect that our issues with the 11-fragment assembly are likely due to complications with the specific group of constructs that were combined. However, we have not been able to exhaustively test a range of constructs and assemblies of varying complexity. To recognize this, we have added text (lines 490-493) to the discussion stating that we have only combined 6 constructs, that we think this likely encompasses many of the applications that may be needed for this system, and that expansion beyond this number may be possible.

      (3) There is no data about any potential side effect on their endogenous function of promoter/protein of interest with the ImPaqT method.

      Absolutely, we have added new text (lines 457-470) to our discussion describing the potential side effects on protein function. For instance, we note the need to be aware of whether the N- or C-termini of proteins can be modified, and the potential for affecting or creating ectopic transcription factor binding sites, as pitfalls to keep in mind.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The data presented in the manuscript is robust and well-supported. However, to fully demonstrate the broad applicability of the toolkit and strengthen its impact, a few additional experiments could be beneficial. Specific suggestions for these experiments and areas of improvement are outlined in the 'Weaknesses' section of the Public Review. Additionally, Figures 2-4 illustrate the same concept (cloning three fragments from entry vectors), which comes across as repetitive. Incorporating a more diverse range of use cases would better highlight the versatility of the toolkit.

      As we described in our replies to your public points above, we have now added new Figure 6 and new Supplementary Figure 6 addressing the cloning of PCR fragments and short fragments, as well as a mechanism of domestication. We have also included the mpeg1.1 promoter within the toolkit. In addition, your point on the repetition of the assay is fair, and in our new Figure 6 we instead used wild-type and dominant-negative Rac2 expression, with failure of macrophage recruitment to the tail wound as the readout.

      Reviewer #2 (Recommendations for the authors):

      Hurst et al. developed a new Tol2-based transgenesis system ImPaqT, it is interesting and potentially efficient, but I have a few concerns:

      (1) The author claimed that the ImPaqT system is more efficient than other existing systems. The authors should provide such data to support their claim.

      Our argument wouldn’t be that the ImPaqT system is strictly speaking more efficient, but rather that the combination of minimal added sequence, the ability to expand or contract the fragments used, and, in our new Figure 6, the ability to directly utilize PCR products and dsDNA fragments, while retaining the ability to combinatorially build constructs from a suite of existing sequences is the main point of the method. We now explicitly state that Golden Gate cloning isn’t more efficient than existing techniques in the text (lines 534-537), but rather the particular strength of the method is the flexibility and minimal added sequence.

      (2) The ImPaqT is theoretically less prone to have off-target effects than existing systems, the authors should provide such data to validate their claim.

      Good point, we have now searched the zebrafish genome for PaqCI sites as well as for BsaI and BsmBI which are the 6-base cutters most commonly used for Golden Gate cloning. We found that PaqCI cuts every ~17 kb in the zebrafish genome while BsaI and BsmBI cut every ~9 kb or ~13 kb respectively, further supporting that PaqCI sites are rarer in the genome and should generally require domestication less often. We have now added new text describing this in lines 129-132.
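
      The kind of genome scan described above can be sketched in a few lines. This is a minimal illustration, not the search that produced the ~17 kb, ~9 kb and ~13 kb figures: the recognition sequences used (CACCTGC for PaqCI, GGTCTC for BsaI, CGTCTC for BsmBI) are the commonly listed ones and should be checked against the supplier's documentation, the function names are hypothetical, and the toy sequence stands in for a real genome.

```python
# Recognition sequences (5'->3'); cut-site offsets do not matter for counting.
SITES = {"PaqCI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def count_sites(genome: str, site: str) -> int:
    """Count occurrences of a recognition site on both strands."""
    genome = genome.upper()
    # A set avoids double counting if a site is its own reverse complement.
    motifs = {site, reverse_complement(site)}
    total = 0
    for motif in motifs:
        start = genome.find(motif)
        while start != -1:
            total += 1
            start = genome.find(motif, start + 1)  # allow overlapping hits
    return total

def mean_spacing_kb(genome: str, site: str) -> float:
    """Average distance between sites in kb (infinite if none are found)."""
    n = count_sites(genome, site)
    return float("inf") if n == 0 else len(genome) / n / 1000.0

# Toy sequence: one PaqCI site per strand, one BsaI site, no BsmBI site.
toy = "AAACACCTGCAAAGCAGGTGAAAGGTCTCAAA"
for enzyme, site in SITES.items():
    print(enzyme, count_sites(toy, site))
```

      For a real genome, the same functions would be applied per chromosome to sequences read from a FASTA file, with the counts and lengths summed before dividing.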

      (3) The authors should mention any potential side effects of this system on the endogenous function of the promoter/protein of interest, at least in their discussion part.

      Yes, this should absolutely be expanded. As we said in response to your public comments above, we have now added new text describing potential pitfalls that this method may have on promoter or gene expression.

      (4) The authors are suggested to provide a balanced discussion about the expandable usage of this system beyond the immune system.

      We agree, this is also a good point that we should have emphasized more. We’ve added new text (lines 537-541) recognizing that in principle, many of the components we’ve derived should be useful in non-immune systems, but we also recognize that adapting this to new tissues will require the development of new promoters within the Golden Gate system which can be combined with these already developed tools.

      References

      Potapov, V., Ong, J.L., Kucera, R.B., Langhorst, B.W., Bilotti, K., Pryor, J.M., Cantor, E.J., Canton, B., Knight, T.F., Evans, T.C., Jr., et al. (2018). Comprehensive Profiling of Four Base Overhang Ligation Fidelity by T4 DNA Ligase and Application to DNA Assembly. ACS Synth Biol 7, 2665-2674.

      Pryor, J.M., Potapov, V., Kucera, R.B., Bilotti, K., Cantor, E.J., and Lohman, G.J.S. (2020). Enabling one-pot Golden Gate assemblies of unprecedented complexity using data-optimized assembly design. PLoS One 15, e0238592.

      Sikkema, A.P., Tabatabaei, S.K., Lee, Y.J., Lund, S., and Lohman, G.J.S. (2023). High-Complexity One-Pot Golden Gate Assembly. Curr Protoc 3, e882.

      Wang, Y., Hsu, A.Y., Walton, E.M., Park, S.J., Syahirah, R., Wang, T., Zhou, W., Ding, C., Lemke, A.P., Zhang, G., et al. (2021). A robust and flexible CRISPR/Cas9-based system for neutrophil-specific gene inactivation in zebrafish. J Cell Sci 134.

      Zhou, W., Cao, L., Jeffries, J., Zhu, X., Staiger, C.J., and Deng, Q. (2018). Neutrophil-specific knockout demonstrates a role for mitochondria in regulating neutrophil motility in zebrafish. Dis Model Mech 11.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The main weakness of this paper, in my view, is that it felt disconnected from the larger body of work on fitness and genotype-phenotype landscapes, including previous data on TFBSs in E. coli, genotype-phenotype maps of TFBSs in other systems, protein sequence landscapes (e.g., from mutational scans or combinatorially-complete libraries), and fitness landscapes of genomic mutations (e.g., combinatorially-complete landscapes of antibiotic resistance alleles). I have no doubt the authors are experts in this literature, and they probably cite most of it already given the enormous number of references. But they don't systematically introduce and summarize what was already known from all that work, and how their present study builds on it, in the Abstract and Introduction, which left me wondering for most of the paper why this project was necessary. Eventually, the authors do address most of these points, but not until the end, in the Discussion. Readers who have no familiarity with this literature might read this paper thinking that it's the first paper ever to study topography and evolutionary paths on genotype-phenotype landscapes, which is not true.

      There were two points that made this especially confusing for me. First, in order to choose which nucleotides in the binding sites to vary, the authors invoke existing data on the diversity of these sequences (position-weight matrices from RegulonDB). But since those PWMs can imply a genotype-phenotype map themselves, an obvious question I think the authors needed to have answered right away in the Introduction is why it is insufficient for their question. They only make a brief remark much later in the Results that the PWM data is just observed sequence diversity and doesn't directly reflect the regulation strength of every possible TFBS sequence. But that is too subtle in my opinion, and such a critical motivation for their study that it should be a major point in the Introduction.

      The second point where the lack of motivation in the Introduction created confusion for me was that they report enormous levels of sign epistasis in their data, to the point where these landscapes look like random uncorrelated landscapes. That was really surprising to me since it contrasts with other empirical landscape data I'm familiar with. It was only in the Discussion that I found some significant explanation of this - namely that this could be a difference between prokaryotic TFBSs, as this paper studies, and the eukaryotic TFBSs that have been the focus of many (almost all?) previous work. If that is in fact the case - that almost all previous studies have focused on eukaryotic TFBSs or other kinds of landscapes, and this is the first to do a systematic test of prokaryotic TFBS, then that should be a clear point made in the Abstract and Introduction. (I find a comparable statement only in the very last paragraph of the Discussion.) If that's the case, then I would also find that point to be a much stronger, more specific conclusion of this paper to emphasize than the more general result of observing epistasis and contingency (as is currently emphasized in the Abstract), which has been discussed in tons of other papers. This raises all sorts of exciting questions for future studies - why do the landscapes of prokaryotic TFBSs differ so dramatically from almost all the other landscapes we've observed in biology? What does that mean for the evolutionary dynamics of these different systems?

      We thank the reviewer for this thoughtful and detailed critique. We agree that the original version of the manuscript did not sufficiently motivate the study early on, nor did it clearly position our work within the broader literature on genotype–phenotype (GP) and fitness landscapes. We also agree that two specific issues, the role of PWMs and the unexpectedly high levels of sign epistasis, were insufficiently explained early on, which could lead to confusion for readers not already familiar with this field.

      Positioning within the broader landscape literature

      In response, we have substantially revised the Abstract and Introduction to explicitly situate our work within existing empirical studies of GP and fitness landscapes, including TFBS landscapes in bacteria, eukaryotic TFBS genotype–phenotype maps, in vitro TF–DNA binding studies, deep mutational scans of proteins, and combinatorially complete fitness landscapes such as antibiotic resistance alleles (Abstract; Introduction, lines 64–85). We now make clear that our study builds directly on this extensive body of work, rather than introducing the landscape framework itself. For example, we write in the introduction:

      “Over the last decade, genotype–phenotype (GP) maps and fitness landscapes have become central tools for understanding how molecular systems evolve under mutation and selection[22–25]. Such maps and landscapes have been experimentally studied for DNA[6,8,18,19,26,27], protein[28–32] and RNA[33–35] molecules, revealing key topographical properties that shape evolutionary outcomes, including epistasis[24,36]—the non-additive effects of multiple mutations on phenotype—landscape ruggedness, reflected in the number and distribution of fitness peaks, and constraints on adaptive evolution.”

      At the same time, we clarify what remains rare in the literature: large-scale, in vivo genotype–phenotype landscapes for bacterial transcription factor binding sites that are sufficiently dense to support explicit evolutionary analyses. While numerous high-throughput studies have characterized bacterial regulatory elements, these datasets typically do not provide quantitative regulatory phenotypes across large genotype spaces, nor do they analyze evolutionary accessibility. To our knowledge, only one such in vivo TFBS landscape had previously been characterized at comparable resolution for a bacterial local regulator (TetR). Our work extends this approach to three global regulators, enabling systematic comparisons across prokaryotic systems (Abstract, Introduction, lines 64–85). For example, we write in the introduction:

      “For transcription factor binding sites, most pertinent large-scale studies are based on in vitro binding assays, such as protein-binding microarrays (PBMs), and they focus predominantly on eukaryotic transcription factors[6]. While these studies have been instrumental in characterizing transcription factor binding preferences, they typically do not measure regulatory output in a native cellular context. In contrast, comprehensive in vivo data for bacterial TFBSs remain extremely rare. To our knowledge, only two high-resolution in vivo landscapes have been previously mapped for bacterial regulators, those of the local regulators TetR[18] and LacI[27]. As a result, it remains unclear whether principles inferred from protein landscapes, eukaryotic TFBSs, or in vitro binding assays generalize to transcriptional regulation in bacteria, particularly for global regulators[11] that integrate multiple physiological signals.”

      Why PWMs are insufficient for our question.

      We agree with the reviewer that our original explanation of the role of PWMs was too cursory and should have been addressed explicitly in the Introduction. We have now revised the Introduction to clearly explain why PWMs derived from RegulonDB cannot substitute for empirical GP landscapes in our study (Introduction, lines 102–113).

      In this passage we now explain that, first, PWMs are inferred from a limited number of naturally occurring binding sites—typically on the order of hundreds of sequences—whose diversity reflects evolutionary history and genomic context rather than systematic exploration of sequence space. As a result, PWMs sample only a small and biased subset of the possible TFBS variants, whereas our libraries probe tens of thousands of sequences in a controlled manner, providing substantially broader and more uniform coverage of genotype space (Introduction, lines 102–113).

      Second, PWM scores are not direct measurements of regulatory strength. Instead, they represent probabilistic or heuristic scores that are primarily used for identifying candidate binding sites in genomes. Numerous studies have shown that PWM scores often correlate weakly with in vivo binding affinity or regulatory output, where DNA shape, cooperative interactions, and chromosomal context play important roles. As such, PWMs do not provide quantitative genotype–phenotype relationships for regulation strength (Introduction, lines 102–113).

      Third, PWMs assume independent and additive contributions of individual nucleotide positions. This assumption excludes epistatic interactions by construction. Because epistasis is central to landscape ruggedness, peak structure, and evolutionary accessibility, PWM-based models are fundamentally unsuited to address the evolutionary questions we study here (Introduction, lines 102–113). We now explicitly state this limitation early in the manuscript, rather than only alluding to it later in the Results.
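
      The third point can be made concrete with a small sketch. Assuming a purely additive scoring model, in which a sequence's score is the sum of independent per-position terms, the pairwise epistasis term ε = s(AB) − s(A) − s(B) + s(wt) vanishes for every pair of mutations. The matrix values below are invented for illustration and do not come from RegulonDB; the function names are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical per-position scores for a 4-bp site (illustrative values only).
PWM = [
    {"A": 1.2, "C": -0.5, "G": -0.8, "T": 0.1},
    {"A": -0.3, "C": 0.9, "G": -1.1, "T": 0.2},
    {"A": 0.4, "C": -0.7, "G": 1.0, "T": -0.6},
    {"A": -0.9, "C": 0.3, "G": -0.2, "T": 1.1},
]

def pwm_score(seq):
    """Additive score: each position contributes independently."""
    return sum(column[base] for column, base in zip(PWM, seq))

def mutate(seq, pos, base):
    return seq[:pos] + base + seq[pos + 1:]

def pairwise_epistasis(wt, pos_a, base_a, pos_b, base_b, phenotype):
    """epsilon = f(double) - f(single a) - f(single b) + f(wild type)."""
    single_a = mutate(wt, pos_a, base_a)
    single_b = mutate(wt, pos_b, base_b)
    double = mutate(single_a, pos_b, base_b)
    return (phenotype(double) - phenotype(single_a)
            - phenotype(single_b) + phenotype(wt))

wild_type = "ACGT"
max_abs_eps = max(
    abs(pairwise_epistasis(wild_type, i, a, j, b, pwm_score))
    for i, j in combinations(range(len(wild_type)), 2)
    for a in "ACGT"
    for b in "ACGT"
)
print(f"largest |epsilon| under the additive model: {max_abs_eps:.1e}")
```

      Any deviation from zero here is floating-point rounding, which is why an additive model cannot represent the epistatic interactions measured in an empirical landscape.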

      Sign epistasis and contrast with prior TFBS landscapes.

      We also agree with the reviewer that the extensive sign epistasis we observe—approaching levels expected for uncorrelated random landscapes—is surprising in light of much of the existing empirical landscape literature. Importantly, as the reviewer notes, most previous TFBS landscape studies have focused on in vitro binding systems or on eukaryotic transcription factors, which tend to exhibit smoother and more additive landscapes.

      To address this concern, we have revised the Abstract and Introduction to explicitly frame this contrast as a central result of the study (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “We showed that the regulatory landscapes of all three TFs are highly rugged and have multiple peaks. The ruggedness of all three landscapes is also supported by the prevalence of epistasis between pairs of TFBS mutations (Supplementary Table S5). A particularly important form of epistasis is sign epistasis[24,93,94], because it can lead to multiple adaptive peaks [24,93,94] (see Supplementary Methods 7.5). Our landscapes contain up to 65% of mutation pairs with sign epistasis, a value that is especially high compared to the almost exclusively additive interactions of mutations in eukaryotic TFs[6,125].”

      We now emphasize that prokaryotic TFBS landscapes, particularly for global regulators, appear to be substantially more rugged and epistatic than most previously characterized TFBS landscapes, and that this difference likely reflects fundamental biological distinctions between regulatory systems.
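
      For readers less familiar with the terminology, the distinction between magnitude and sign epistasis for a pair of mutations can be sketched as below. This uses a common textbook-style definition (a mutation shows sign epistasis if the sign of its effect depends on whether the other mutation is present) and may differ in detail from the definition in Supplementary Methods 7.5; the function name and the example phenotype values are hypothetical.

```python
def classify_epistasis(f_wt, f_a, f_b, f_ab, tol=1e-9):
    """Classify the interaction between mutations a and b.

    f_wt, f_a, f_b, f_ab: phenotype of the wild type, the two single
    mutants, and the double mutant.
    """
    effect_a_on_wt = f_a - f_wt   # effect of a without b
    effect_a_on_b = f_ab - f_b    # effect of a in the presence of b
    effect_b_on_wt = f_b - f_wt
    effect_b_on_a = f_ab - f_a

    a_flips = effect_a_on_wt * effect_a_on_b < 0
    b_flips = effect_b_on_wt * effect_b_on_a < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    if abs(f_ab - f_a - f_b + f_wt) > tol:
        return "magnitude"
    return "none"

print(classify_epistasis(1.0, 2.0, 3.0, 4.0))   # additive effects
print(classify_epistasis(1.0, 2.0, 3.0, 6.0))   # same signs, non-additive
print(classify_epistasis(1.0, 2.0, 3.0, 2.5))   # a is beneficial alone, harmful with b
print(classify_epistasis(1.0, 0.5, 0.5, 2.0))   # both singles harmful, double beneficial
```

      Reciprocal sign epistasis is the case of most interest for ruggedness, because it is a necessary condition for a landscape to have more than one peak.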

      Revised emphasis and conclusions.

      Following the reviewer’s suggestion, we have adjusted the emphasis of the manuscript accordingly. Rather than highlighting epistasis and contingency as generic evolutionary phenomena, we now present the extreme ruggedness of prokaryotic TFBS landscapes as a system-specific finding with important implications for the evolution of gene regulation. We explicitly note that this raises new questions for future work—such as why prokaryotic regulatory landscapes differ so markedly from eukaryotic ones, and how these differences shape evolutionary dynamics—which we now highlight in the Introduction and Discussion (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “… A possible reason for this greater incidence of epistasis lies in the nature of prokaryotic TFBSs. Specifically, prokaryotic TFBSs are, at approximately 20 bp, twice as long as eukaryotic TFBSs[80,128] and exhibit symmetries that reflect the dimeric state of their cognate TFs[129–131]. These factors may increase the likelihood of intramolecular epistasis. Our observations raise important questions for future work, such as why the landscapes of prokaryotic TFBSs differ so dramatically from those of eukaryotic ones. And what do these differences imply for the evolutionary dynamics of gene regulation?”

      We believe that these revisions substantially improve the clarity, motivation, and positioning of the manuscript, and directly address the reviewer’s concerns by making both the necessity and the novelty of the study clear from the outset.

      (2) I am a bit concerned about the lack of uncertainties incorporated into the results. The authors acknowledge several key limitations of their approach, including the discreteness of the sort-seq bins in determining possible values of regulation strength, the existence of a large number of unsampled sequences in their genotype space, as well as measurement noise in the fluorescence readouts and sequencing. While the authors acknowledge the existence of these factors, I do not see much attempt to actually incorporate the effect of these uncertainties into their conclusions, which I suspect may be important. For example, given the bin size for the fluorescence in sort-seq, how confident are they that every sequence that appears to be a peak is actually a peak? Is it possible that many of the peak sequences have regulation strengths above all their neighbors but within the uncertainty of the fluorescence, making it possible that it's not really a peak? Perhaps such issues would average out and not change the statistical nature of their results, which are not about claiming that specific sequences are peaks, just how many peaks there are. Nevertheless, I think the lack of this robustness analysis makes the results less convincing than they otherwise would be.

      We thank the reviewer for raising this important concern. We fully agree that uncertainties arising from experimental resolution, measurement noise in fluorescence and sequencing, and incomplete sampling of genotype space should be incorporated explicitly into the analysis. While these limitations were acknowledged qualitatively in the original manuscript, we recognize that a direct, quantitative assessment of their impact on our conclusions is essential to strengthen the robustness of the study.

      We first clarify that regulation strength is not discretized in our analysis. For each TFBS, regulation strength is calculated as a continuous weighted average of fluorescence across all sorting bins, based on the sequencing read-count distribution of each sequence across bins. We clarified this information in the main text (Results, lines 201-203). Nevertheless, finite binning resolution and experimental noise introduce uncertainty in these estimates, which could in principle affect the identification of local peaks.
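
      As an illustration of why this estimate is continuous rather than binned, the following minimal sketch computes a read-count-weighted mean fluorescence for one sequence. The function name, the bin values, and the read counts are hypothetical, and the authors' actual pipeline (see their Methods, "Regulation strengths") may include normalization steps not shown here.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Read-count-weighted mean fluorescence of one sequence across sorting bins.

    read_counts[i] is the number of sequencing reads of the sequence in bin i;
    bin_fluorescence[i] is a representative fluorescence value for bin i.
    Both inputs are illustrative assumptions, not the authors' data format.
    """
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("one read count is needed per sorting bin")
    total_reads = sum(read_counts)
    if total_reads == 0:
        return None  # sequence not observed in any bin
    weighted_sum = sum(c * f for c, f in zip(read_counts, bin_fluorescence))
    return weighted_sum / total_reads


# Hypothetical example: four bins, reads concentrated in bins 2 and 3.
bin_means = [100.0, 200.0, 400.0, 800.0]
strength = regulation_strength([0, 10, 30, 0], bin_means)
print(strength)  # 350.0
```

      Because the weights are read counts, a sequence whose reads are split between two bins receives a value between those bins' fluorescence levels, so the estimate is not restricted to the discrete bin values.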

      Importantly, our study does not aim to assert that specific TFBS sequences are definitively peaks. Rather, our focus is on landscape-level statistical and topological properties—such as ruggedness, the abundance and distribution of peaks, and the evolutionary accessibility of strong regulation. We therefore centered our new analyses on testing whether these conclusions are robust to experimentally plausible sources of uncertainty, rather than on the identity of individual peaks.

      To address the reviewer’s concern, we performed two complementary analyses. The first evaluates whether the observed ruggedness of the landscapes could arise as an artifact of incomplete sampling. It addresses the effects of missing genotypes and the possibility of spurious peak identification due to unsampled neighbors. Sparse sampling can introduce opposing biases: true peaks may be missed, while other genotypes may be falsely classified as peaks because fitter neighbors are absent. As shown for uncorrelated random (House-of-Cards) landscapes (Kauffman & Levin, 1987), these effects can partially cancel.

      In this analysis, we constructed a null model by randomly permuting regulation strengths across the mapped genotype network while preserving its topology. The number of peaks in these randomized landscapes is only modestly higher than in the empirical data, indicating that the measured landscapes are close to the maximal ruggedness compatible with the sampled network (Results, lines 308–320).
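
      The permutation null model described above can be sketched as follows. This is a minimal illustration under stated assumptions: the landscape is a hypothetical dictionary mapping DNA sequences to regulation strengths, neighbors are measured single-nucleotide mutants, and a peak is a genotype with at least one measured neighbor and a strictly higher value than all of them. The authors' exact peak definition and implementation may differ.

```python
import random

ALPHABET = "ACGT"


def measured_neighbors(seq, landscape):
    """Return the single-mutant neighbors of seq that are present in the landscape."""
    found = []
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                mutant = seq[:i] + alt + seq[i + 1:]
                if mutant in landscape:
                    found.append(mutant)
    return found


def count_peaks(landscape):
    """Count genotypes that exceed every measured single-mutant neighbor."""
    peaks = 0
    for genotype, value in landscape.items():
        nbrs = measured_neighbors(genotype, landscape)
        if nbrs and all(value > landscape[n] for n in nbrs):
            peaks += 1
    return peaks


def shuffled_peak_counts(landscape, n_shuffles=100, seed=0):
    """Permute values across genotypes, keeping the genotype network fixed."""
    rng = random.Random(seed)
    genotypes = list(landscape)
    values = [landscape[g] for g in genotypes]
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(values)
        counts.append(count_peaks(dict(zip(genotypes, values))))
    return counts


# Hypothetical toy landscape of two-nucleotide sequences.
toy = {"AA": 1.0, "AC": 2.0, "CA": 0.5, "CC": 3.0}
print(count_peaks(toy))
print(shuffled_peak_counts(toy, n_shuffles=5))
```

      Because only the values are shuffled, each randomized landscape has the same set of sampled genotypes and the same missing neighbors as the empirical one, so any difference in peak number reflects the arrangement of values rather than the sampling pattern.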

      In addition, we quantified potential sampling bias by analyzing genotype connectivity. Here we defined the relative connectivity of a genotype as the fraction of possible single-mutant neighbors for which we had measured regulation strength. We observed only a very weak correlation between connectivity and regulation strength (R=-0.1, -0.1, 0.01 for the CRP, Fis, and IHF landscapes, Figures S13-S15). Similarly, the relative connectivity of peak genotypes is only weakly correlated with their regulation strength (R=-0.05, -0.04, 0.06 for the CRP, Fis, and IHF landscapes), indicating that strongly regulating genotypes are not preferentially oversampled or undersampled (Results, lines 321–330).
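
      To make the definition of relative connectivity concrete, the sketch below computes it for a DNA sequence and correlates it with regulation strength using a plain Pearson coefficient. The function names and the toy data are hypothetical, and the choice of Pearson correlation is an assumption, since the response reports R values without naming the statistic.

```python
from math import sqrt

ALPHABET = "ACGT"


def relative_connectivity(seq, measured):
    """Fraction of the 3 * len(seq) possible single mutants that were measured."""
    possible = 0
    present = 0
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt == base:
                continue
            possible += 1
            if seq[:i] + alt + seq[i + 1:] in measured:
                present += 1
    return present / possible


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: three measured two-nucleotide genotypes.
measured = {"AA": 1.0, "AC": 2.0, "CA": 0.5}
connectivity = [relative_connectivity(g, measured) for g in measured]
strengths = [measured[g] for g in measured]
print(connectivity)
print(pearson_r(connectivity, strengths))
```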

      The second, and most important, analysis directly addresses the reviewer’s concern that experimental uncertainty could affect peak classification and, consequently, landscape navigability. We explicitly incorporated experimentally measured, genotype-specific noise estimates from biological replicates when comparing fitness values between neighboring genotypes. Using these uncertainty-aware comparisons, we then recomputed adaptive-walk dynamics and genotype visitation frequencies on the resulting noisy landscapes.

      We observe strong correlations between visitation frequencies in the noise-free and noisy landscapes across all three transcription factors (new Supplementary Figure S35), indicating that evolutionary accessibility patterns are robust to realistic levels of experimental uncertainty. These analyses are described in the revised Results (lines 622–636) and in a new Supplementary Methods section (“Incorporation of experimental uncertainty into adaptive walks”).
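
      One possible way to make neighbor comparisons uncertainty-aware is sketched below. It is an assumption for illustration only, not the procedure in the authors' Supplementary Methods: a neighbor counts as an uphill step only if its mean exceeds the current genotype's mean by more than `z` combined standard deviations, where the standard deviations stand in for replicate-based noise estimates. All names and numbers are hypothetical.

```python
from math import sqrt


def is_reliably_higher(mean_a, sd_a, mean_b, sd_b, z=1.0):
    """True if a exceeds b by more than z times their combined standard deviation."""
    return (mean_a - mean_b) > z * sqrt(sd_a ** 2 + sd_b ** 2)


def reliable_uphill_neighbors(current, neighbors, z=1.0):
    """Names of neighbors whose value is reliably above the current genotype.

    current is a (mean, sd) pair; neighbors maps a name to a (mean, sd) pair.
    """
    cur_mean, cur_sd = current
    return [
        name
        for name, (mean, sd) in neighbors.items()
        if is_reliably_higher(mean, sd, cur_mean, cur_sd, z)
    ]


# Hypothetical example: n1 is clearly higher, n2 lies within the noise.
current = (1.0, 0.1)
neighbors = {"n1": (2.0, 0.1), "n2": (1.05, 0.1)}
print(reliable_uphill_neighbors(current, neighbors))  # ['n1']
```

      Under such a rule, a genotype whose advantage over its neighbors lies within the noise is no longer a strict barrier to an adaptive walk, which is the kind of effect a robustness analysis of visitation frequencies is meant to probe.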

      Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      While the quality filtering of the data ensures robustness, ~40% of the TFBS space remains uncharacterized. The authors acknowledge this limitation but could improve the analysis by employing subsampling or predictive modeling.

      We thank the reviewer for raising this point. We agree that undersampling of genotype space is an important limitation of our dataset and that, in principle, subsampling or predictive modeling approaches could be used to address missing genotypes. We have now clarified in the manuscript why these approaches are not straightforward in the context of our analyses and why we did not pursue them here.

      Although approximately 40% of TFBS genotypes were removed during the filtering step due to lack of reliable measurements, this filtering step was necessary to ensure robust estimation of regulation strength from sort-seq data. Importantly, random subsampling of the genotypes in our data set would not alleviate this limitation, because many of our key analyses—such as peak identification, quantification of epistasis, and assessment of evolutionary accessibility—require combinatorially complete local neighborhoods in genotype space. Subsampling would remove mutational neighbors from many neighborhoods, and thus further limit our ability to characterize landscape topology.

      Predictive modeling approaches could, in principle, be used to infer missing genotypes and reconstruct more complete landscapes. However, developing, experimentally validating, and benchmarking such models would not only substantially expand the scope of an already long paper but would also require additional assumptions about genotype–phenotype relationships that entail their own limitations. Our primary goal in this work was to provide the first large-scale empirical in vivo regulatory landscapes for global bacterial transcription factors, comprising tens of thousands of experimentally measured variants. We view these empirical landscapes as a necessary foundation upon which predictive modeling and landscape completion can be built in future, complementary studies.

      We have now revised the Discussion (lines 760-770) to explicitly articulate these points and to clarify that, while undersampling remains a limitation, it does not invalidate the landscape-level conclusions we draw from the combinatorially complete neighborhoods present in our data. There we also outline predictive modeling as an important direction for future work.

      For a more detailed answer regarding subsampling and peak classification, please also see our response to comment (2) of Reviewer #1.

      Simplified Regulatory Architecture:

      The study considers a minimal system of a single TFBS upstream of a reporter gene. While this may have been necessary for clarity, this simplification may not reflect the combinatorial complexity of transcriptional regulation in vivo.

      Point well taken. We have added a paragraph to state explicitly that the system we use to study gene regulation is much simpler than most in vivo regulatory circuits (Discussion, lines 797-802).

      Lack of Experimental Validation of Simulations:

      The adaptive walks are based on simulated dynamics rather than experimental evolution. Incorporating in vivo experimental evolution studies would strengthen the conclusions. Although this is a large request for the paper, that would not prevent publication.

      We thank the reviewer for this important point. We fully agree that in vivo experimental evolution would provide a valuable and complementary way to validate the evolutionary dynamics inferred from our simulations. However, we ask for the reviewer's understanding that adding experimental evolution to an (already long) paper would go far beyond the scope of our study.

      Also, the goal of our study was not to reproduce evolutionary trajectories experimentally, but to characterize the structure of large empirical regulatory landscapes, and to use these landscapes as a data-driven basis for exploring evolutionary accessibility under well-defined population-genetic assumptions. The adaptive walks we employ are parameterized directly from experimentally measured genotype–phenotype maps, and incorporate established fixation probabilities. Such walks have been widely used to study evolutionary dynamics on empirical landscapes when experimental evolution is not tractable, because it would involve tens of thousands of genotypes that represent small mutational targets and would thus take a long time to evolve.

      An additional issue related to the feasibility of experimental evolution is that performing in vivo experimental evolution for the regulatory landscapes analyzed here would require tracking large populations across a combinatorially vast TFBS space, while simultaneously measuring regulatory phenotypes for thousands of evolving lineages, which is currently not experimentally feasible. This is another reason why simulation-based approaches have been the standard method for linking large-scale empirical landscapes to evolutionary dynamics in both theoretical and experimental studies.

      Furthermore, our conclusions are intentionally framed at the level of statistical and landscape-wide properties (e.g., accessibility of high peaks, contingency, and evolutionary bias), rather than at the level of specific mutational trajectories. As such, they do not rely on the precise reproduction of any single evolutionary path, but on aggregate patterns that are robust to reasonable variation in population-genetic parameters.

      In sum, we do not view experimental evolution as essential for the conclusions we draw, but as an important and exciting direction for future work that may be enabled by the landscapes we have experimentally mapped.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more biologically complex and influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

      We thank the reviewer for their thorough evaluation and for their supportive opinion of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 (Abstract): "Landscape ruggedness does not prevent the evolution of strong regulation, because more than 10% of evolving populations can attain one of the highest peaks." I did not find this interpretation very convincing; only 10% of populations being able to achieve strong regulation sounds to me like ruggedness DOES impede adaptation in the vast majority of cases.

      We thank the reviewer for this thoughtful comment and agree that our original phrasing in the Abstract overstated this conclusion. We did not intend to imply that landscape ruggedness has only a minor effect on adaptation. On the contrary, our results clearly show that ruggedness strongly constrains evolutionary outcomes and prevents the majority of evolving populations from reaching the globally highest regulatory peaks. We have therefore toned down the wording in both the Abstract and the Discussion (lines 670-679) to reflect this more accurately. For example, in the abstract we now state:

      “Nonetheless, evolutionary simulations show that ~10% of evolving populations can reach a peak of strong regulation, a proportion that is significantly greater than in comparable random landscapes.”

      In the discussion we state:

      “… Specifically, our evolutionary simulations show that 10% of populations with a size typical of E. coli reach one of the highest peaks. This percentage is significantly higher than in randomized landscapes (Supplementary Methods 9; Supplementary Figure S30)”

      Our intended interpretation was more limited: namely, that ruggedness does not fully preclude the evolution of strong regulation. In highly rugged landscapes with extensive sign epistasis—whose topological properties approach those of uncorrelated random landscapes—the a priori expectation is that access to the strongest peaks could be vanishingly rare or effectively impossible under Darwinian evolution. In this context, observing that a non-negligible fraction of populations (on the order of 10%) can reach one of the highest peaks suggests that strong regulation remains evolutionarily attainable, even though it is far from guaranteed.

      Motivated by the reviewer’s suggestion, we also added a null-model analysis that makes this point more explicitly and quantitatively. Specifically, we constructed randomized landscapes by permuting regulation-strength values across genotypes while preserving the experimentally sampled genotype network topology and all parameters of the evolutionary simulations (Supplementary Methods 9, “Randomized landscape null model for peak accessibility”). We then repeated the adaptive-walk simulations on these shuffled landscapes. This null model provides an expectation for peak accessibility in landscapes with identical sampling, neighborhood structure, and evolutionary dynamics, but without genotype–phenotype correlations.

      Using this null model, we find that the fraction of populations that reach high peaks in the empirical landscapes is substantially higher than expected by chance alone (new Supplementary Figure S30; Results, lines 504–516). Specifically, across the three transcription factors, empirical landscapes exhibit on average a ~3-fold higher accessibility of high regulatory peaks than shuffled landscapes. This comparison does not weaken the conclusion that ruggedness strongly impedes adaptation; rather, it shows that the structure of the measured genotype–phenotype landscapes enables greater accessibility of strong regulation than would be expected in equally rugged but unstructured landscapes.

      In response to the reviewer’s concern, we have revised the abstract and main text to avoid the phrase “does not prevent” and to more accurately convey this balance between constraint and accessibility. We now emphasize that ruggedness strongly constrains adaptation, while still allowing access to strong regulatory peaks at rates that exceed null expectations (Discussion, lines 512-516). For example, in the discussion we state:

      “… In sum, rugged regulatory landscapes strongly constrain evolutionary trajectories, yet do not render the evolution of strong regulation vanishingly rare. Instead, strong regulatory phenotypes remain evolutionarily attainable at levels that exceed null expectations, even though they are reached by only a minority of evolving populations.”

      We believe that the revised wording, together with the added null-model analysis, more faithfully represents our results and strengthens the quantitative interpretation of accessibility in these landscapes.

      (2) Line 123: I found the explanation of the plasmid system and the accompanying SI figures (Figures S1 and S2) confusing in terms of how many plasmids there were. In particular, the Figure S1 graphics show the plasmid specifically with CRP but the text in the graphic and in the caption refers to the plasmid pCAW-Sort-Seq-V2 (which, according to Table S1, isn't that just the base plasmid without any TF?). Figure S2 also shows the plasmid with CRP and does specify pCAW-Sort-Seq-V2-CRP-CRP0 in the graphic, but then the caption refers again only to the base plasmid pCAW-Sort-Seq-V2. I recommend the authors clarify these items for readers who might want to reproduce or build upon their system. In particular, I recommend the main text explain more explicitly that they generate three versions of this plasmid (one for each TF), and then on the backgrounds of each of those three plasmids, a whole library with all the binding site variants.

      We thank the reviewer for pointing out this lack of clarity. We agree that the original description of the plasmid system and the accompanying Supplementary Figures S1 and S2 could be confusing with respect to how many plasmids were used and how they differ.

      To clarify the experimental design, we start from a common backbone plasmid, pCAW-Sort-Seq-V2, which contains all shared regulatory and reporter elements but does not encode any transcription factor. From this backbone, we generated three distinct TF-specific plasmids, each carrying one of the transcription factors studied here—CRP, Fis, or IHF—resulting in pCAW-Sort-Seq-V2-CRP, pCAW-Sort-Seq-V2-Fis, and pCAW-Sort-Seq-V2-IHF. On the background of each TF-specific plasmid, we then constructed a complete library of plasmids containing all variants of the corresponding TF binding site cloned upstream of the reporter gene.

      We have revised the main text to explicitly describe this plasmid hierarchy and library construction strategy and to clarify that three TF-specific plasmids were generated prior to TFBS library construction (Results, Landscape mapping section; lines 159–193). In addition, we have redesigned Supplementary Figures S1 and S2 to facilitate understanding of the plasmid system. Specifically, these figures now clearly distinguish between the base plasmid backbone and the TF-specific plasmid derivatives. Also, the plasmid names shown in the graphics and captions are now consistent with those listed in Supplementary Table S1. Upon final publication, we will also deposit the sequences of all plasmids in Addgene to further facilitate reproducibility.

      (3) Line 135: Can the authors clarify whether these TFs are essential in these media conditions and, if not, why? I was expecting them to be so given the core functions of these TFs as described in the Introduction, but then Figure S3 appears to show that all knockouts are viable.

      We thank the reviewer for raising this important point and apologize for the lack of clarity in the original version of the manuscript. The transcription factors CRP, Fis, and IHF are not essential for viability under the growth conditions used in this study, but they are important for optimal growth and cellular fitness, consistent with their roles as global regulators.

      Under our experimental conditions, single-gene knockout strains (Δcrp, Δfis, and Δihf) are viable but exhibit slower growth dynamics compared to the wild-type strain, reflecting impaired regulation of core cellular processes (Supplementary Figure S3). This behavior is consistent with previous work showing that many global transcriptional regulators in E. coli are conditionally essential or strongly fitness-affecting, rather than absolutely essential under standard laboratory conditions.

      Importantly, while single knockouts remain viable, double mutants involving these global regulators are not viable, indicating substantial functional redundancy and network-level essentiality among global transcription factors. This explains why each TF can be studied individually in isolation, while combinations of deletions cannot be maintained.

      We have now clarified this point in the Results section by explicitly stating that the knockout strains show reduced growth rates but reach comparable cell densities during late exponential or early stationary phase, the growth phase at which all measurements were performed (Results, Landscape mapping section; lines 185–193). This clarification reconciles the apparent discrepancy between the biological importance of these transcription factors discussed in the Introduction and the viability of the single-knockout strains shown in Supplementary Figure S3.

      (4) Lines 141 and 227: The authors appear to refer to two different citations for different versions of RegulonDB (refs. 47 and 66). Did they actually use both versions for different purposes (if so, why?), or is this a typo?

      We thank the reviewer for noticing this inconsistency. We did not use two different versions of RegulonDB. The two separate references were an error. We have now corrected this by using a single, consistent RegulonDB citation in both locations.

      (5) Line 166 (Figure 1 caption): I think 2^8 here should be 4^8.

      Thank you. We have corrected “2^8” to “4^8” in the Figure 1 caption.

      (6) Figure 2: Are the distributions in Figure 2a (regulation strengths across all TFBSs in the libraries) equivalent to the distributions in Figures S4-S6 (direct fluorescence readout from cell sorting), just transformed from fluorescence to regulation strength? If so, I think that would be helpful to clarify, perhaps in the captions to Figures S4-S6 so that it's clear these contain the same information.

      No. Figures S4–S6 and Figure 2a do not show the same distributions. Figures S4–S6 display the raw fluorescence distributions obtained from cell sorting, whereas Figure 2a shows regulation strengths (S), which are derived quantities computed from these fluorescence data. Specifically, regulation strength is calculated as a weighted average over fluorescence bins using the sequencing read distribution for each TFBS (see Methods, “Regulation strengths”).

      To clarify this relationship, we have revised the main text (lines 201-203 and Figure 1b-c) to explicitly state how regulation strengths (S) were calculated.

      (7) Figure 2b: Can the authors label each logo/frequency matrix with its corresponding TF name in the graphic itself? I think this is only implied in the caption.

      We have updated Figure 2b to label each sequence logo / frequency matrix directly in the graphic with its corresponding transcription factor name (CRP, Fis, or IHF), in addition to mentioning these names in the caption. This change clarifies the figure and makes the TF identity immediately apparent to the reader.

      (8) Lines 290 and 298 (Figure 2 caption): The labels for panels b and c appear to be swapped in the caption.

      We thank the reviewer for pointing this out. The labels for panels b and c in the Figure 2 caption were indeed swapped. This has now been corrected.

      (9) Line 379: There is a missing period at the end of this line.

      We have added the missing period at the end of this line.

      (10) Line 400 (Figure 3 caption): There is a missing subtitle for panel c in the caption for this figure (all other panels seem to have bolded subtitles in their captions).

      We have added the missing subtitle for panel c in the Figure 3 caption to match the formatting of the other panels.

      (11) Line 583: There is a missing period after "Methods 7.5)".

      We have added the missing period after “Methods 7.5)”.

      (12) Line 641: "All three landscapes highly rugged" should probably be "All three landscapes are highly rugged".

      We have corrected the sentence to read “All three landscapes are highly rugged.”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We agree with the reviewer that a limitation of our study is its focus on cell-based assays rather than in vivo experiments. We did consider evaluating the effects of statins on B cell responses in vivo; however, this approach is complicated by findings that statins can influence antigen presentation by dendritic cells, thereby impacting antibody responses (Xia et al, 2018). We have revised the discussion section to acknowledge this point.

      The reviewer also noted that our study assessed the roles of HMGCR, SQLE, and prenylation in B cell activation using pharmacological inhibitors and genetic knockdown/out approaches. Loss-of-function techniques such as RNAi, siRNA, and CRISPR can be challenging to apply to primary B cells, but we are exploring their feasibility for future revisions. While we acknowledge the limitations of using pharmacological inhibitors, we have taken several steps to mitigate these, including targeting multiple steps in the cholesterol biosynthetic pathway using structurally distinct inhibitors and conducting rescue experiments by supplementing downstream metabolites. To strengthen the results on prenylation further, we have added data using two further distinct prenylation inhibitors (revised Figure 6). To further investigate potential off-target effects of statins, we performed proteomic analysis of B cells treated with and without fluvastatin. The data suggest that fluvastatin primarily affects cholesterol metabolism and does not cause widespread off-target effects (new Supplementary Figure 9).

      Reviewer #1 (Recommendations for the authors):

      What signalling mechanisms link LPS sensing to proteomic and metabolic changes? Do these changes depend on specific signalling modules downstream of TLR4 (e.g., MyD88, TRIF, NF-kappaB, MAPKs)? Other receptors found to produce similar effects (TLR7, TLR9, CD40) may share these modules. This information could strengthen the conclusion by showing the chain of molecular events through which immune stimuli reprogram B cell metabolism.

      Signalling through most TLRs, including TLR4, TLR7 and TLR9, requires the adaptor protein MyD88. To determine if MyD88 is required for LPS-induced signalling, we carried out immunoblotting to compare signalling in B cells between WT mice and MyD88-deficient mice. We found that phosphorylation of key downstream proteins, including p38 and ERK1/2 (MAPK signalling), Akt, p70S6K and S6 (mTOR signalling) was diminished in MyD88-deficient mice (Figure S11). These results have been added to the manuscript as Supplementary Figure 11.

      We assessed the requirement of these signalling pathways for LPS-induced proliferation by treating B cells with rapamycin to block mTORC1, PD184352 for MEK1/MEK2 (the upstream activators of ERK1/2), VX745 for p38 or a combination of PD184352 and VX745. These results have been added to the manuscript as the new Figure 9. Rapamycin demonstrated the strongest inhibitory effect on proliferation, and combinatorial blocking of MAPK signalling mildly reduced proliferation (Figure 9A-B). In terms of cholesterol metabolism, treatment with all of these inhibitors reduced cholesterol levels; however, treatment with PD184352 and VX745 reduced cholesterol to the same level as naïve B cells (Figure 9F).

      Other activating stimuli appear to have similar effects. We showed originally that TLR7 and TLR9 activation had a similar effect on proliferation and cholesterol to TLR4, as did activation of CD40 and the BCR (Figure 10). We have now expanded this and shown that these other receptors can also promote protein synthesis (new Supplementary Figure 4).

      There seem to be errors in the manuscript text.

      (1) Page 6, line 232: ssRNAseq?

      We thank the reviewer for spotting these issues. This has been amended to scRNAseq.

      (2) Page 13, line 490: SC7A5?

      This has been amended to SLC7A5.

      (3) The abbreviation CF (cholesterol-free?) is not defined when it first appears.

      This has been amended to cholesterol-free (CF) on page 9, line 411.

      Reviewer #2 (Public review):

      The reviewer suggested that the study would be strengthened by determining whether the observed changes are specific to LPS + IL-4 stimulation or represent a more general B cell response to mitogenic signals. We believe that these effects are not specific to LPS and also occur with other mitogenic stimuli. We have expanded on the data in the original draft showing that other TLR agonists as well as CD40 and BCR stimulation increase both B cell proliferation and cholesterol levels and also looked at the effects of these stimuli on protein synthesis.

      Reviewer #2 (Recommendations for the authors):

      (1) One of the most highly enriched processes is 'response to interferon alpha'. This stands out as most of the other processes identified involve more general cellular processes (i.e., cell proliferation, cell metabolism, etc...). Minimally, interferon alpha should be discussed. It would also be interesting to test whether type I interferons regulate any of the metabolic changes identified.

      Response to interferon alpha has the highest fold enrichment of 6.78. To look at this further, we compiled a list of proteins upregulated by IFN-α stimulation in murine B cells, derived from (Mostafavi et al, 2016), and compared these with our proteome. We found that most of the IFNα-regulated genes were not significantly upregulated following LPS + IL-4 stimulation compared to naïve B cells (Figure S3A). We also measured phosphorylation of the transcription factor STAT1, which is induced by IFNα and IFNβ signalling, and found that LPS stimulation did not induce p-STAT1 (Figure S3B-C). These results have been added to the manuscript as Supplementary Figure 3. Despite this, as discussed further in the manuscript, we cannot rule out a weak interferon response in the proteomics.

      (2) The proteome of BCR-stimulated B cells has been analyzed by mass spectrometry. This dataset should be compared with the LPS + IL-4 dataset of the current study. This may reveal whether these two stimulations have similar or different effects on B-cell function. In particular, it is interesting to know whether BCR stimulation induces SLC7A5 expression and whether proteins involved in cholesterol metabolism are altered by BCR stimulation.

      A similar study using anti-IgM and anti-CD40 to activate murine B cells found an upregulation of amino acid transporters, including SLC7A5, in their proteomic data, suggesting that this is not a stimulus-specific effect. This has been added to the text subsection “Protein synthesis in LPS + IL-4 stimulated B cells is dependent on the uptake of amino acids.” In line with this, we have also shown that stimulation of the BCR upregulates protein synthesis (new Supplementary Figure 4). We have added data on HMGCR, SQLE and LDLR from the BCR proteomics experiments to the new Supplementary Figure 13. As the BCR proteome published as a preprint (James et al 2024) is about to be resubmitted as a distinct paper that does not deal with cholesterol metabolism, we have not expanded on this dataset further.

      (3) A role for mTORC1 has been shown for proteome remodelling following BCR stimulation of naïve B cells, regulating the expression of amino acid transporters. Is mTORC1 involved in any of the changes detected following LPS + IL-4 stimulation? (i.e., cell proliferation, ribosome biogenesis, amino acid transport, cholesterol biogenesis).

      To determine the importance of mTORC1 for B cell function, we treated B cells with rapamycin. We found that rapamycin treatment slightly reduced protein synthesis (Figure S12A) and amino acid uptake (Figure S12B). These results have been added to the manuscript as Supplementary Figure 12. Rapamycin reduced cholesterol to almost the levels in naïve B cells (new Figure 9F) and had a significantly inhibitory effect on proliferation (new Figure 9A-B).

      (4) Analysis of Slc7a5 knockout B cells showed that SLC7A5 is required for LPS-induced proliferation (Figure 4G). Is SLC7A5 required for B cell growth following LPS + IL-4 stimulation? Is SLC7A5 required for BCR-induced B cell proliferation/growth?

      There appears to be a misunderstanding, as Figure 4G compares proliferation between WT and SLC7A5 KO B cells following LPS + IL-4 stimulation and not LPS stimulation alone.

      Unfortunately, we no longer have access to Slc7a5fl/fl/Vav-iCre+/- mice and will not be able to measure CTV staining for proliferation following BCR stimulation. However, a similar study using anti-IgM and anti-CD40 to activate murine B cells found that B cells from Slc7a5fl/fl/Vav-iCre+/- mice were significantly smaller, had reduced expression of the chaperone protein CD98 and impaired expression of the transferrin receptor CD71, which is required for iron uptake, compared to WT B cells (James et al, 2024).

      (5) The expression of several key proteins (regulating proliferation/amino acid transport/cholesterol metabolism) is shown to be significantly upregulated by LPS + IL-4 stimulation of naïve B cells. It would be interesting to determine whether these increases result from induced transcription of the relevant genes. This could initially be assessed by qRT-PCR analysis of LPS + IL-4 stimulated primary B cells, or alternatively, mining of online RNAseq datasets.

      We mined RNA-Seq data from C57BL/6 mice (Tesi et al, 2019) which compared naïve B cells and B cells after 2, 4, or 8 hours of LPS stimulation. We found that the transcription of genes that coded for the amino acid transporter SLC7A5/SLC3A2 (Figure S6A-B) and key genes involved in cholesterol metabolism followed the same pattern of upregulation as our proteomic data (Figure S6C-F). These results have been added to the manuscript as a new Supplementary Figure 6.

      (6) Cholesterol levels are shown to be increased following resiquimod, CpG, anti-IgM, and CD40L stimulation (Figure 9). What effect do these agonists have on levels of HMGCR, SQLE, and LDLR in B cells? Is B-cell growth by these agonists impaired by Fluvastatin?

      We found that stimulation of murine B cells with either IL-4, anti-IgM or anti-CD40 could increase levels of HMGCR, SQLE and LDLR, with the largest increase seen with a combination of these stimuli (Figure S13A-D) (James et al, 2024). These results have been added to the manuscript as Supplementary Figure 13.

      Figures 10C-E show that B cell growth, survival and proliferation are impaired by Fluvastatin after Resiquimod, CpG, anti-IgM, and CD40L stimulation, although we do not have proteomic data from these stimuli to confirm the levels of HMGCR, SQLE and LDLR.

      We carried out proteomics after 24 hours of LPS + IL-4 stimulation in normal/CF media, with or without Fluvastatin. We found that Fluvastatin treatment in normal media increased the expression of HMGCR, SQLE and LDLR. Fluvastatin treatment in CF media had the highest increase in the expression of these key proteins (Figure S9G-J). These results have been added to the manuscript as Supplementary Figure 9.

      (7) Do Fluvastatin or FGTI-2734 affect early activation of signaling pathways by LPS + IL-4 stimulation of B cells? (eg. MAPKs, STATs, PI3K/AKT).

      This is an interesting question; we will pursue it in our future work.

      References:

      James O, Sinclair LV, Lefter N, Salerno F, Brenes A & Howden AJM (2024) A proteomic map of B cell activation and its shaping by mTORC1, MYC and iron. bioRxiv 2024.12.19.629506 doi:10.1101/2024.12.19.629506

      Xia Y, Xie Y, Yu Z, Xiao H, Jiang G, Zhou X, Yang Y, Li X, Zhao M, Li L, et al (2018) The Mevalonate Pathway Is a Druggable Target for Vaccine Adjuvant Discovery. Cell 175: 1059-1073.e21

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Adult laboratory mice produce ultrasonic vocalizations during free social interactions, as well as lower-frequency, voiced calls (squeaks) during aversive contexts. The question of whether mice possess a more complex repertoire of vocalizations has been of great interest to scientists studying rodent vocal behavior. In the current study, the authors analyze the rates and acoustic features of vocalizations produced by pairs of mice that are allowed to interact across a barrier, which prevents direct physical interaction. In this context, they find that same-sex (but not opposite-sex) pairs of mice produce vocalizations that are lower in frequency than the typical 70 kHz ultrasonic vocalizations produced during free interactions and that are also distinct from squeaks. These lower frequency vocalizations were observed in both male-male and female-female pairs, as well as in same-sex pairs from multiple mouse strains. The authors also report that call rates and acoustic features are not affected in male-male pairs that have been treated with the anxiolytic drug buspirone, suggesting that anxiety is not a major driver of vocalization in this behavioral context.

      Strengths:

      (1) The observation that same-sex pairs of mice produce lower frequency (<70 kHz) vocalizations in this behavioral context is novel.

      (2) The consideration of multiple types of pairs (female-female, male-male, and female-male), as well as the inclusion of multiple strains of mice and barriers with different hole diameters, are all strengths of the study.

      (3) The authors include detailed analyses of vocalization acoustic features, as well as detailed tracking of mouse positions relative to the barrier.

      Weaknesses:

      The categorization applied to vocalizations based on their mean frequencies is poorly supported and ignores the distinction in laryngeal production mechanism between voiced and ultrasonic vocalizations. Specifically, the authors are likely lumping together voiced and ultrasonic vocalizations into their "low frequency" (< 30 kHz) category, while they reserve the term "ultrasonic" exclusively for the subset of ultrasonic vocalizations with the highest mean frequencies (> 50 kHz). This categorization scheme also does not align well with past work on lower frequency rodent vocalizations, which complicates the comparison of the present findings to that past work.

      We thank the reviewer for their assessment. Firstly, we did not use mean frequencies, but peak frequencies of each single call.

      The distinction between ‘voiced’ and ‘whistled’ vocalizations based on their spectral-temporal features is hardly possible. While evidence in form of audio recordings made from both deer mouse and grasshopper mouse in helium-enriched air suggests vocalizations with lower fundamental frequency being ‘voiced’ (Pasch et al., 2017; Riede et al., 2022), a computational model considering the laryngeal anatomy of Mus musculus estimates fundamental frequencies of vocalizations at subglottal phonation threshold pressures usual for USVs to be in the range of 1 – 5 kHz and approaching 10 kHz for higher subglottal pressures usually found in the production of ‘voiced’ vocalizations (Pasch et al., 2017). Furthermore, a recent study in the singing mouse (Scotinomys teguina) found minimal fundamental frequencies of single song notes, produced by a whistle mechanism, to be about 4 kHz (Zheng et al., 2025). Thus, the presence of low fundamental (peak) frequencies in mouse vocalizations alone appears to be insufficient for deducing the production mechanism of these vocalizations.

      We did not observe differences in acoustic features clearly separating our ‘LFV’ calls into two groups suggestive of different production mechanisms. Thus, we cannot rule out that our ‘LFV’ class contains vocalizations produced by different mechanisms. However, we did not observe any squeaks in our experiments and can therefore rule out that this prominent type of ‘voiced’ call is lumped together with other calls in the ‘LFV’ calls.

      While the questions regarding the production mechanism, the neurocircuitry involved, and the context-dependent choice of which mechanism to use are intriguing, the distinction between ‘voiced’ and ‘whistled’ vocalizations lies beyond the scope of our manuscript. The neurocircuitry involved in mouse vocalization production, particularly of USVs and squeaks, has been revealed by other laboratories. Optogenetic activation of RAm Nts neurons elicited emission of both audible vocalizations (fundamental frequencies of 10 kHz and below) and USVs in awake mice in a stimulus-dependent manner (Veerakumar et al., 2023). Furthermore, optogenetic activation of RAm-vocalization neurons led to immediate measurable adduction of vocal folds and emission of canonical USVs (Park et al., 2024). While different populations of PAG neurons are responsible for the production of squeaks and USVs (Ziobro et al., 2024), the two input streams seem to converge on RAm vocalization neurons, as silencing the output of these neurons abolished both squeak and USV emission completely (Park et al., 2024). Thus, while near complete closing of the vocal folds is necessary for the production of canonical USVs (Mahrt et al., 2016; Park et al., 2024), it is not clear which degree of vocal fold opening would result in which fundamental frequencies.

      We will add a paragraph on this issue to the discussion in the next version of the manuscript.

      In some analyses, the authors report that different groups of mice produce different relative proportions of vocalization types (as defined by mean frequency) but then compare acoustic features of vocalizations between groups after pooling all vocalizations together. The analyses of acoustic features conducted in this way may be confounded by the different proportions of vocalization types across groups.

      We displayed the relative distribution of the different call classes demonstrating that 80% of the call repertoire during the separation consisted of noisy calls and ‘LFV’. Thus, the per individual averaged acoustic features e.g. peak frequency would be predominantly shaped by the features of these two call classes. However, we agree with the reviewer’s criticism and will provide a more detailed display and analysis of the acoustic features of each call class.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine vocal communication during same-sex dyadic interactions in mice, comparing periods of physical separation (with limited sensory access) to direct social contact. They report that separation dramatically alters the vocal repertoire, shifting it away from canonical ultrasonic vocalizations (USVs) toward low-frequency vocalizations (LFVs) and broadband "noisy" calls. While LFVs and noisy calls have been described previously, largely in aversive contexts, this study provides a detailed, systematic characterization of these vocalizations during social interactions, thereby extending prior work.

      The authors explore several experimental manipulations and analyses, including divider hole size, strain and sex differences, anxiolytic drug treatment, and correlations with spatial proximity, to infer potential functions of these call types. Although the dataset is rich, the results are largely descriptive, and many conclusions remain tentative. Several experimental variables are not fully controlled, and in some cases, the interpretation exceeds what the data can clearly support. Nonetheless, with improved experimental framing, additional analyses of existing data, and a clearer discussion of limitations, this work has the potential to make a valuable contribution by broadening the field's focus beyond USVs to understand a wider vocal repertoire relevant to social context.

      Strengths:

      Much work on mouse vocal communication focuses almost exclusively on USVs. This manuscript convincingly demonstrates that non-USV vocalizations (LFVs and noisy calls) are prominent and systematically modulated by social context, highlighting an underappreciated dimension of mouse communication. Furthermore, the authors employ several experimental manipulations, including sensory access, strain, sex, and pharmacological treatment, to assess changes in vocalization repertoire. This provides a valuable resource for the field and reveals robust context dependence of vocalization. The discussion is thoughtful and integrative, particularly in its consideration of potential communicative roles of LFVs and noisy calls and their relationship to sensory constraints and signal propagation, although these ideas will require further experimental validation.

      Weaknesses:

      There are several concerns regarding experimental design and data interpretation that could be addressed to strengthen the manuscript.

      (1) The terminology used for vocalization types is confusing and needs better clarification. The authors refer to Grimsley et al. (2016) multiple times, yet they use the same names for their vocalizations while applying different definitions. This makes it very difficult to compare the two papers. Since this study and Grimsley et al. use different mouse strains (FVB vs CBA), a direct comparison of absolute frequencies may also not be appropriate. Please explicitly clarify the definitions of the call types (e.g., frequency range, voiced vs. USV) and explain how they relate to those in the previous study earlier in the manuscript.

      The existence and use of various distinct classification systems for mouse vocalizations is well known, and the need to agree on a common classification system is widely acknowledged in the field. Thus, it was not our intention to complicate mouse call classification even more.

      Grimsley et al. (2016) reserve the ‘low frequency’ band solely for squeaks (or “low frequency harmonics”). Hence, it appears straightforward to name mouse calls with “mean dominant frequencies” falling between squeaks and USVs “mid-frequency tonal vocalizations (MFVs)” (Grimsley et al., 2016). We did not observe the emission of squeaks in our experiments, but instead we observed tonal vocalizations in a peak frequency spectrum encompassing both squeaks and Grimsley and colleagues’ ‘MFVs’, representing the lowest peak frequencies we observed (< 32 kHz). Furthermore, we observed vocalizations in the range of 32 – 50 kHz (which were not low frequency components of canonical USVs) and of > 50 kHz (corresponding to canonical USVs). Leaning on the terminology of Grimsley and colleagues (2016), we thought it to be straightforward to name these call classes according to their location on the frequency spectrum: low frequency vocalizations (LFVs; up to 32 kHz), encompassing squeaks but also Grimsley and colleagues’ MFVs; middle frequency vocalizations (MFVs; 32 – 50 kHz); and finally canonical USVs (> 50 kHz). Admittedly, choosing ‘MFVs’ for mouse calls with different acoustic features than those described by Grimsley and colleagues (2016) has caused unnecessary confusion. We therefore consider adapting our classification scheme for the next version of the manuscript.
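      As a minimal sketch of the frequency bands described above, the assignment of a tonal call to a class from its peak frequency could be written as follows. The function name and the handling of calls falling exactly on 32 kHz or 50 kHz are assumptions made for illustration; noisy calls are a separate class that is not defined by these bands and is not handled here.

```python
def classify_tonal_call(peak_khz: float) -> str:
    """Assign a tonal call to a class from its peak frequency in kHz.

    Bands follow the response text: LFV below 32 kHz, MFV 32-50 kHz,
    USV above 50 kHz. Placing exactly 32 kHz and exactly 50 kHz in the
    MFV band is an assumption; the text does not specify the boundaries.
    """
    if peak_khz < 0:
        raise ValueError("peak frequency must be non-negative")
    if peak_khz < 32.0:
        return "LFV"
    if peak_khz <= 50.0:
        return "MFV"
    return "USV"


if __name__ == "__main__":
    # Hypothetical peak frequencies, one per band.
    for khz in (12.0, 40.0, 70.0):
        print(khz, classify_tonal_call(khz))
```

      Under this scheme a 12 kHz call, such as the ‘mid-frequency’ vocalization of Grimsley and colleagues (2016), falls into the LFV band, which illustrates the source of the terminological overlap.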

      Regarding the comparison of call classes between different mouse strains, strain differences of spectral-temporal features of call classes have been described for canonical USVs (e.g. Scattoni et al., 2008). However, the acoustic features as well as call repertoire are still quite comparable. Furthermore, we have additionally tested both CBA/J and C57BL/6J mice in our study, confirming the presence of noisy calls, ‘LFVs’, ‘MFVs’, and ‘USVs’ in the vocal repertoire of these two strains.

      We will provide a more detailed display and analysis of the acoustic features of the call classes with the next version of the manuscript.

      (2) In the initial experiment, mice always experience separation first (15 minutes), followed by unification (5 minutes), using novel same-sex dyads. Multiple factors besides physical contact could influence vocalization across this sequence, including habituation to the arena, reduced anxiety over time, or increasing familiarity with the partner despite physical separation. It is unclear whether the authors have tested the reverse order (unification first, followed by separation). If not, this limitation should be explicitly acknowledged. In addition, examining whether vocalizations or behaviors change over the course of the 15-minute separation period, for example, by comparing early vs late phases, could help disentangle effects of habituation from those of physical separation per se.

      We had not tested mice in the reverse order, beginning with 5 minutes of unification followed by 15 minutes of separation. Therefore, we acknowledge this limitation of our study and will address it explicitly in the next version of our manuscript. We appreciate the reviewer’s note regarding the inclusion of vocalizations over time and aim to provide this analysis in the next version of the manuscript.

      (3) The conclusion that separation-induced LFVs are unlikely to be anxiety-driven may overinterpret the buspirone experiment (Figure 8). Vehicle injections themselves produced large changes in call rate and call-type distribution, raising concerns about stress or arousal induced by the injection procedure. Comparisons between buspirone-treated animals and untreated animals are therefore problematic, as these groups differ in their experimental histories, including the number of exposures. The manuscript would benefit from independent measures confirming the anxiolytic efficacy of buspirone compared to vehicle injection in this paradigm, such as behavioral readouts of anxiety. In addition, the experimental design requires a clearer description. It is not always clear whether the same dyads were tested twice, or how social familiarity, contextual familiarity, and habituation to injections were handled. Male data comparing first and second exposures should also be included as supplementary figures to allow direct comparison with the excluded female dataset.

      We agree with the reviewer’s point that the injection procedure itself appeared to have an impact on vocalization behavior. In fact, we had included the ‘untreated’ cohort in Fig. 8 despite their different experimental history to appreciate the potential impact of injection on vocal behavior.

      Furthermore, we appreciate the reviewer’s point of confirming the anxiolytic effect of buspirone treatment with further behavioral readouts and aim to provide such analysis in the next version of the manuscript.

      Regarding the reviewer’s query for clearer experimental design description, the same dyads were tested twice. All mice lived in groups in their home cage, however, they had not met the individual they would face during the experiment before the first experiment. We will improve the description of the experimental design addressing the reviewer’s points in the next version of the manuscript.

      (4) The idea that noisy calls function to attract conspecific attention is intriguing. However, in Figure 5, all call types, including LFVs and USVs, are most likely to occur when mice are already in close proximity during separation, which seems inconsistent with a long-distance signaling role. Analyses of the temporal relationship between vocalizations and behavior would strengthen this claim. For example, it would be informative to test whether bouts of noisy calls precede approach behavior or a reduction in inter-animal distance. Examining whether calls occur before, during, or after orientation toward the partner could further clarify whether these vocalizations actively modulate social behavior.

      We appreciate the reviewer’s remarks regarding the apparent inconsistencies between noisy calls as conspecific attraction calls and their occurrence in close mouse-to-mouse proximity. We must concede that the size of our testing arena limited the maximum distances mice could achieve. Thus, we aim to provide a more extensive analysis including approach behavior and changes of inter-animal distances for resubmission of the manuscript as suggested by the reviewer.

      (5) The effects of divider hole size on vocal repertoire are striking but difficult to interpret. Unexpectedly, small holes and no holes yield similar call distributions, whereas large holes produce a markedly different profile dominated by LFVs, which also differs from free interactions. If large holes allow greater tactile or close-range interaction, the reduction in USVs and MFV is counterintuitive. Incorporating behavioral metrics such as distance, orientation, or specific interaction types alongside call classification would greatly aid interpretation and help link vocal output to interaction quality rather than divider type alone.

      We agree with the reviewer that the results of the divider hole size experiment are difficult to interpret and, following the reviewer’s input, we aim to provide additional behavioral analysis of the effect of divider hole size with the next version of the manuscript.

      (6) Throughout the study, vocalizations are pooled across both animals in the dyad. Because the arena is neutral rather than a home cage, either animal could be initiating vocalization. Assigning calls to individuals, where possible, using spatial or acoustic cues, would substantially strengthen functional interpretations. Even limited analyses, e.g., identifying which animal vocalizes first or whether calls precede approach by the partner, could provide important insight into the communicative role of different call types.

      We agree with the points raised by the reviewer regarding the importance of assigning recorded calls to the respective individual for deciphering the communicative role of different call types. Unfortunately, our system was only equipped with one condenser microphone; therefore, we are not able to assign calls to individual mice.

      Literature:

      Grimsley, J. M. S., Sheth, S., Vallabh, N., Grimsley, C. A., Bhattal, J., Latsko, M., Jasnow, A., & Wenstrup, J. J. (2016). Contextual Modulation of Vocal Behavior in Mouse: Newly Identified 12 kHz “Mid-Frequency” Vocalization Emitted during Restraint. Frontiers in Behavioral Neuroscience, 10, 38. https://doi.org/10.3389/fnbeh.2016.00038

      Mahrt, E., Agarwal, A., Perkel, D., Portfors, C., & Elemans, C. P. H. (2016). Mice produce ultrasonic vocalizations by intra-laryngeal planar impinging jets. Current Biology: CB, 26(19), R880–R881. https://doi.org/10.1016/j.cub.2016.08.032

      Park, J., Choi, S., Takatoh, J., Zhao, S., Harrahill, A., Han, B.-X., & Wang, F. (2024). Brainstem control of vocalization and its coordination with respiration. Science (New York, N.Y.), 383(6687), eadi8081. https://doi.org/10.1126/science.adi8081

      Pasch, B., Tokuda, I. T., & Riede, T. (2017). Grasshopper mice employ distinct vocal production mechanisms in different social contexts. Proceedings. Biological Sciences, 284(1859), 20171158. https://doi.org/10.1098/rspb.2017.1158

      Riede, T., Kobrina, A., Bone, L., Darwaiz, T., & Pasch, B. (2022). Mechanisms of sound production in deer mice (Peromyscus spp.). The Journal of Experimental Biology, 225(9), jeb243695. https://doi.org/10.1242/jeb.243695

      Scattoni, M. L., Gandhy, S. U., Ricceri, L., & Crawley, J. N. (2008). Unusual repertoire of vocalizations in the BTBR T+tf/J mouse model of autism. PloS One, 3(8), e3067. https://doi.org/10.1371/journal.pone.0003067

      Veerakumar, A., Head, J. P., & Krasnow, M. A. (2023). A brainstem circuit for phonation and volume control in mice. Nature Neuroscience, 26(12), 2122–2130. https://doi.org/10.1038/s41593-023-01478-2

      Zheng, X. M., Harpole, C. E., Davis, M. B., & Banerjee, A. (2025). Vocal repertoire expansion in singing mice by co-opting a conserved midbrain circuit node. Current Biology: CB, 35(23), 5762-5778.e6. https://doi.org/10.1016/j.cub.2025.10.036

      Ziobro, P., Woo, Y., He, Z., & Tschida, K. (2024). Midbrain neurons important for the production of mouse ultrasonic vocalizations are not required for distress calls. Current Biology: CB, 34(5), 1107-1113.e3. https://doi.org/10.1016/j.cub.2024.01.016

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we addressed Reviewer 3’s remaining concern on the potential confound between posterior probability and time in neuroimaging results. First, as suggested by the reviewer, we provided images of activations for the effect of Pt and delta Pt after controlling for intertemporal prior in GLM-2. Second, we compared the effect of Pt and delta Pt between GLM-1 (without intertemporal prior) and GLM-2 (with intertemporal prior) and showed the results in a new figure (Figure 4).

      Regarding the issue of reliance on explicitly instructed probabilities, we wish to point out that most of the concerns, such as response mode and regression to the mean, were addressed in the original behavioral paper by Massey and Wu (2005). Please see our detailed response to this point under Weakness (2) raised by Reviewer 3.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we focused on addressing your concern on the potential confound between posterior probability and time in neuroimaging results. First, we presented whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we compared the effect of probability estimates (Pt) on vmPFC and ventral striatum activity—which we found to correlate with Pt—with and without including intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4) in the revised manuscript.

      As suggested by the reviewer, we also added slice-by-slice images of the whole-brain results on Pt and delta Pt in the supplement in addition to the Tables of Activation so that the activated brain regions can be clearly seen through these images.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
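
The normative benchmark for this task can be sketched in a few lines. This is an illustrative sketch under assumptions that are ours, not taken from the manuscript: each trial starts in the red regime, at most one shift to the blue regime can occur, a shift happens before each draw with probability q, and each urn emits its dominant colour with probability p. The function name and the 'r'/'b' encoding are hypothetical.

```python
def posterior_shift(signals, q, p):
    """Posterior probability that a regime shift has occurred by the last draw.

    signals: sequence of 'r' / 'b' draws (hypothetical encoding)
    q: per-period probability of a shift (assumed at most one shift per trial)
    p: probability that an urn emits its dominant colour
    """
    t = len(signals)

    def likelihood(shift_at):
        # shift_at = j means draws j..t (1-indexed) come from the blue urn;
        # shift_at = t + 1 means no shift occurred during the trial.
        out = 1.0
        for i, s in enumerate(signals, start=1):
            p_blue = p if i >= shift_at else 1.0 - p
            out *= p_blue if s == "b" else 1.0 - p_blue
        return out

    # Sum over every possible shift time, weighted by its prior probability.
    shifted = sum(q * (1.0 - q) ** (j - 1) * likelihood(j) for j in range(1, t + 1))
    stayed = (1.0 - q) ** t * likelihood(t + 1)
    return shifted / (shifted + stayed)
```

With uninformative signals (p = 0.5) the posterior reduces to 1 − (1 − q)^t, so under these assumptions it rises with period number regardless of which balls are shown.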

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual-differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (e.g., having to enter probability estimates versus making binary predictions).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).
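
The 'attraction to the mean' account can be checked against the three values quoted above. The sketch below assumes, purely for illustration, a linear shrinkage model in which subjective = w × true + (1 − w) × m. It fits a least-squares line to the reported values and reads off the weight w (the slope) and the implied attractor m. The helper name is hypothetical and the model is not one fitted by the authors.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired lists x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Values quoted in the review: ground-truth vs. subjective transition probabilities.
true_q = [0.01, 0.05, 0.10]
subjective_q = [0.037, 0.052, 0.069]

slope, intercept = fit_line(true_q, subjective_q)
# Under the assumed shrinkage model the fixed point m satisfies m = slope * m + intercept.
attractor = intercept / (1.0 - slope)
print(f"shrinkage weight = {slope:.3f}, implied attractor = {attractor:.3f}")
```

A slope well below 1 with an attractor close to 0.05 is what the hyper-prior account would predict. It does not by itself distinguish that account from system neglect.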

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      We thank the reviewer for pointing out that there are alternative models that can describe the over- and underreaction seen in the dataset. Massey and Wu (2005) dealt with this possibility in their original paper. Their concern was not so much about alternative ways of modeling their results as about alternative psychological processes. For example, asymmetric noise accounts have been posited in the judgment and decision-making literature as possible accounts of phenomena like overconfidence. They addressed what might be crudely called “regression/attraction to the mean” in two ways. First, they looked at median responses as well as mean responses (because medians are less affected by the regressive effect) and found the same patterns of over- and underreactions. Second, they also generated sequences that matched particular posterior probabilities (so that over- and underreaction cannot be explained by regression to the mean) and still found under- and overreactions.

      We also wish to point out that in the judgment and decision-making literature, starting from Edwards (1968), there is a long history of using the normative Bayesian model as the starting point and subsequently developing quasi-Bayesian models (like the system-neglect model) to describe systematic deviations from the normative Bayesian model.

      Finally, we want to clarify that our primary goal is not to engage in a model-fitting exercise that examines different possible models. To us, what is more important is that system neglect is a psychologically motivated hypothesis. It is built on the idea that the lack of sensitivity to the system parameters arises because people focus primarily on the signals and secondarily on the system parameters that generate the signals. Massey and Wu (2005) dealt with a host of other potential explanations through experimental manipulations and data analysis. In this paper, we built on Massey and Wu to examine the neurocomputational basis that gives rise to over- and underreactions.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we added a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2 where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also showed the results on intertemporal prior on vmPFC and ventral striatum from GLM-2.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity can be simply due to motor confounds and the purpose of Experiment 3 was to control for them. Our question was, if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt can simply reflect probability estimates that the current regime is the blue regime, and therefore not particularly about change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not particularly about probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> may not be specific to change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’ (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
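
The coding scheme for the action-handedness regressor can be sketched as follows. The digit-to-hand mapping is an assumption inferred only from the three examples in the quoted text ("23", "75", "90"): digits 1-5 are taken to be left-hand keys and digits 6-9 and 0 right-hand keys. The names below are hypothetical, and the actual button layout may differ.

```python
# Assumption (not stated in the source): which hand presses which digit key.
LEFT_HAND_DIGITS = set("12345")
RIGHT_HAND_DIGITS = set("67890")


def action_handedness(estimate: str) -> int:
    """Code a two-digit response as -1 (both left), 0 (mixed) or 1 (both right)."""
    if len(estimate) != 2 or not estimate.isdigit():
        raise ValueError("expected a two-digit string such as '23'")
    right_presses = sum(d in RIGHT_HAND_DIGITS for d in estimate)
    # 0 right-hand presses -> -1, 1 -> 0, 2 -> 1
    return right_presses - 1
```

Under this mapping the three examples from the text are coded -1, 0 and 1 respectively, matching the description of the parametric regressor.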

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that showed results of Pt and delta Pt from GLM-2. We also compared the effect of Pt and delta Pt between GLM-1 and GLM-2. We found that the effect of Pt and delta Pt did not differ between GLM-1 and GLM-2. GLM-1 and GLM-2 differed on whether various task-related regressors contributing to Pt, including the intertemporal prior, were included in the model. In GLM-1, those task-related regressors were not included. In GLM-2, the task-related regressors were included in addition to Pt and delta Pt.

      The reason we kept results from GLM-1 (Figure 3) was primarily because we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were respectively probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      Here is the section in the main text where we discussed the new Figure 4 on pages 19-22:

      We further examined the robustness of P<sub>t</sub> and ∆P<sub>t</sub> representations in vmPFC and ventral striatum in three follow-up analyses. In the first analysis, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub> and ∆P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors. Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, ln[(1 − (1 − q)<sup>t</sup>) / (1 − q)<sup>t</sup>], where q is the transition probability and t = 1, …, 10 is the period (Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. We found that the results of P<sub>t</sub> and ∆P<sub>t</sub> in the vmPFC and ventral striatum in GLM-2 were identical to those in GLM-1 (Fig. 4): Fig. 4A was meant to depict the results in slices identical to those shown in Fig. 3B for results based on GLM-1. For slice-by-slice results, see Fig. S7 in SI for results based on GLM-1 and Fig. S9 for GLM-2. For Tables of activations, see Tables S1-S3 in SI for GLM-1 and Tables S7-S9 for GLM-2. In a separate, independent region-of-interest (ROI) analysis on vmPFC and ventral striatum (Fig. 4BC; see Independent regions-of-interest (ROIs) analysis in Methods for details), we further compared the effect of both P<sub>t</sub> and ∆P<sub>t</sub> between GLM-1 and GLM-2. 
For P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.72, p = 0.47 in vmPFC, t(58) = −0.21, p = 0.83 in ventral striatum), while the effect of P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.82, p <.01 in vmPFC; t(29) = −3.06, p <.01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.69, p =.01 in vmPFC; t(29) = −2.50, p =.02 in ventral striatum). For ∆P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.07, p =0.94 in vmPFC; t(58) = −0.14, p =0.88 in ventral striatum), while the effect of ∆P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.12, p <.01 in vmPFC; t(29) = −4.14, p <.01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.92, p <.01 in vmPFC; t(29) = −3.59, p <.01 in ventral striatum). For the intertemporal prior, activity in both vmPFC and ventral striatum did not correlate significantly with the intertemporal prior (one-sample t-test, t(29) = −0.07, p =0.95 in vmPFC; t(29) = −0.53, p =0.60 in ventral striatum). All the t-tests described above were two-tailed. Taken together, these results suggest that vmPFC and ventral striatum represented P<sub>t</sub> and ∆P<sub>t</sub> regardless of whether the intertemporal prior and other task-related regressors contributing to P<sub>t</sub> were included in the GLM. We also did not find that vmPFC and ventral striatum represented the intertemporal prior. In the second analysis, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln(P<sub>t</sub>/(1 - P<sub>t</sub>)) (Fig. S10 in SI). In the third analysis, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S11 in SI). 
Each of these analyses showed significant correlation with P<sub>t</sub> in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.
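
The two transformed regressors mentioned in this passage can be sketched as below. The intertemporal prior is written here as the log odds that a shift has occurred by period t, on the assumption (ours, following the task description) that at most one shift can occur per trial with per-period probability q, so that P(shift by t) = 1 − (1 − q)^t. The function names are hypothetical and this is not the authors' analysis code.

```python
import math


def intertemporal_prior(q: float, t: int) -> float:
    """Log odds in favour of a regime shift by period t, ignoring the signals.

    Assumes at most one shift per trial with per-period probability q.
    """
    no_shift = (1.0 - q) ** t
    return math.log((1.0 - no_shift) / no_shift)


def log_odds(p: float) -> float:
    """Logit transform used when a probability estimate enters as log odds."""
    return math.log(p / (1.0 - p))


# The prior rises monotonically over the 10 periods of a trial.
priors = [intertemporal_prior(0.05, t) for t in range(1, 11)]
```

Because this regressor rises with period number whatever signals are shown, entering it alongside P<sub>t</sub> is a direct way to separate the effect of elapsed time from the effect of subjects' estimates.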

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      As suggested by the reviewer, we added slice-by-slice images showing the effect of Pt and delta Pt (Figure S9 in SI for GLM-2 and Figure S7 for GLM-1). The clusters in blue represent Pt effect, the clusters in orange represent delta Pt effect. As can be seen, both Pt and delta Pt are represented in the vmPFC and ventral striatum.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors were seeking to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, the authors sought to explain some of the functional differences that RY785 exhibits in experimental electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). The authors used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. They observed that while TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple non-polar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. They confirmed this mechanism using an additional set of simulations and used it to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The authors develop their own forcefield parameters for the RY785 molecule based on extensive QM based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single channel conductance. The authors have performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The authors conclude that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits K+ current. This conclusion is plausible given that RY785 makes stable contacts with multiple hydrophobic residues in the S6 helix, which they can also validate using a recently published closed-state Kv2.1 channel cryo-EM structure. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.
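
For readers unfamiliar with how a simulated permeation rate is compared with a single-channel conductance, the arithmetic is sketched below. The ion count, simulation length, and voltage are hypothetical placeholders rather than values from the study. The sketch assumes monovalent ions and a linear (ohmic) current-voltage relation.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one K+ ion, in coulombs


def conductance_from_permeation(n_ions: int, sim_time_s: float, voltage_v: float) -> float:
    """Convert a count of permeation events into a conductance in siemens.

    Assumes each event carries one elementary charge and that the
    current-voltage relation is ohmic at the applied voltage.
    """
    current_a = n_ions * ELEMENTARY_CHARGE_C / sim_time_s
    return current_a / voltage_v


# Hypothetical example: 10 permeation events in 1 microsecond at 100 mV.
g = conductance_from_permeation(n_ions=10, sim_time_s=1e-6, voltage_v=0.1)
print(f"{g * 1e12:.1f} pS")
```

Simulations typically use voltages larger than those in electrophysiology recordings, so a comparison of this kind is best read as an order-of-magnitude check.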

      Weaknesses:

      The authors, however, did not directly observe this semi-closed channel conformation and in fact acknowledge that more direct simulation evidence would require extensive enhanced-sampling simulations beyond the scope of this study. They have not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the authors quantified K+ permeation, they have not made any estimates of the ligand binding affinities or rates, which could have been potentially compared to experiment and used to validate their models.

      However, despite those relatively minor weaknesses, the conclusions of the study are convincing, and overall this is a solid study helping us to understand two distinct molecular mechanisms of the voltage-gated potassium channel Kv2.1 inhibition by TEA and RY785, respectively.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, Zhang et al. investigate the conduction and inhibition mechanisms of the Kv2.1 channel, with a particular focus on the distinct effects of TEA and RY785 on Kv2 potassium channels. Using microsecond-scale molecular dynamics simulations, the authors characterize K⁺ ion permeation and RY785-mediated inhibition within the central pore. Their results reveal an inhibition mechanism that differs from those described for other Kv channel inhibitors.

      Strengths

      The study identifies a distinctive inhibitory mode for RY785, which binds along the channel walls in the open-state structure while still permitting a reduced level of K⁺ conduction. In addition, the authors propose a long-range allosteric coupling between RY785 binding in the central pore and changes in the structural dynamics of Kv2.1. Overall, this is a well-organized and carefully executed study, employing robust simulation and analysis methodologies. The work provides novel mechanistic insights into voltage-gated potassium channel inhibition and may offer useful guidance for future structure-based drug design efforts.

      Weaknesses:

      The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously.

      Reviewing Editor: 

      The comments of the reviewers seem thoughtful and constructive. The weaknesses noted in reviews mainly concern mismatch between expectations, created by reading the Abstract, and data in the manuscript. The mismatch could be reconciled by either new simulations examining a semi-open state of the gate and additional RY785 binding sites, or by adjusting wording of the Abstract and Discussion to make it more clear that such simulations were not done. 

      The Abstract and Discussion have been revised to make clear that the computer simulations presented in our study were designed specifically to validate or refute the hypothesis that RY785 is recognized by the pore domain, not the voltage sensors.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      The authors addressed all the major issues in the original submission identified by the reviewers. I noticed a few minor issues, listed below, which can potentially fix small errors and further improve the readability of the manuscript. 

      p.3 tetramethyl-ammonium -> tetraethylammonium 

      p.7 "Snapshot of the final snapshot" -> "Snapshot of the final simulation coordinates" 

      p. 8 "sigma value" - please spell out what it is. 

      p. 9 "one or other subunit of the tetramer" -> "one or another subunit of the tetramer" or "one or more subunits of the tetramer" 

      p. 15 "(the net charge of these constructs is thus zero)." -> "(the net charge of these constructs is zero for these systems)." Please note that using ionizable amino acid residues in their default protonation state does not guarantee net zero charge of the system since the number of cationic and anionic residues is generally not the same.

      p. 15 "Two K+ ions were initially positioned in the selectivity filter, one coordinated by residues 373..." Please indicate at which ion binding sites (e.g. S_1, S_2) the K+ ions were located and what the residue names are.

      SI Figs. S3-S20. Please indicate in the figure captions that all those data are for RY785 

      SI Fig. S22 and SI Table S1 captions "shown in Fig. S20" -> "shown in Fig. S21" 

      We thank the Reviewer for this thorough proofreading. We have made the necessary corrections. 

      Reviewer #2 (Recommendations for the authors): 

      The authors have addressed most of my comments satisfactorily, with the exception of the first point. Below, I provide further clarification regarding my concern. 

      First, it appears that the authors may have misunderstood what is meant by the possibility of multiple binding sites for RY785. This does not imply that the central pore is excluded as a binding site. Rather, it refers to the possibility that, in addition to a pore-domain site, the ligand may interact with additional binding sites, either simultaneously or in a state-dependent manner. Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously. Therefore, even if one assumes that there is no precedent for a small-molecule inhibitor that simultaneously acts on both the voltage sensor and pore domain, this does not exclude the possibility that a ligand may bind to both regions in different functional states.

      The Reviewer’s opinion came across clearly in the previous version. We however disagree that a computational investigation of the possibility that RY785 binds to the voltage sensors is well-advised at this point, given that the model we propose seemingly offers a rationale for the inhibitory effects observed experimentally. Our opinion is also that there is no compelling precedent for the mechanism of inhibition envisaged by the Reviewer – and would argue that neither of the two studies referenced above is a compelling example. As we stated in our previous response to the Reviewer, we believe that the logical next step in this research will be to validate or refute the computational prediction we have put forward, experimentally.

      In addition, the present computational study does not provide direct mechanistic evidence to explain the statement that RY785 accelerates voltage-sensor deactivation. Specifically, no simulations were performed to model pore-domain closure or voltage-sensor motion upon RY785 binding. Moreover, alternative binding sites were neither explored nor explicitly excluded, as the simulations only involved placing a single molecule of TEA or RY785 approximately 10 Å below the cytoplasmic gate. Under these conditions, conclusions regarding effects on voltage-sensor dynamics remain speculative.

      That is a fair characterization. 

      These concerns do not detract from the overall quality of this otherwise strong computational study. There are several straightforward ways to address this issue. For example: 

      (1) Perform molecular docking or related screening approaches to evaluate potential ligand-binding sites beyond the central pore, particularly in regions proximal to the voltage sensor. This should not impose a substantial additional computational burden for a computational chemistry group. 

      (2) Revise the abstract and discussion to clarify that the current work focuses exclusively on pore-domain binding and does not explore possible additional binding sites near the voltage sensor. Explicitly stating this limitation would help prevent potential overinterpretation by readers.

      We have opted for (2), as noted above.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using electron microscopy, the authors report discontinuities in the plasma membrane of C. elegans embryos. They associate these discontinuities with cell division and speculate that membrane rupture and subsequent resealing contribute to cytokinesis. They further discuss the proximity of these sites to vesicles and propose a role for vesicle-mediated membrane extension. 

      Weaknesses:

      (1) The possibility that the membrane discontinuity is an artifact

      Although the authors focus on discontinuities in the plasma membrane, similar discontinuities are also observed in mitochondria, the nuclear envelope, and yolk granules. This raises concerns about whether the electron micrographs presented are suitable for assessing membrane continuity.

      Electron micrographs result from a lengthy sample preparation process, including high-pressure freezing, freeze substitution in acetone containing OsO4, gradual warming, uranyl acetate staining, resin embedding, and ultrathin sectioning. In general, lipids are soluble in acetone at temperatures above −30 °C, and preservation of membrane structures relies heavily on efficient OsO4 fixation.

      Insufficient OsO4 treatment would be expected to reduce membrane contrast.

      C. elegans embryos are encapsulated by an eggshell that forms at fertilization and gradually develops during the first few cell divisions. It is unclear how efficiently OsO4 in acetone penetrates the eggshell during freeze substitution, raising further concern about plasma membrane preservation under the conditions used.

      We thank the reviewer for raising this important technical concern. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts. Below, we present multiple lines of evidence—ranging from technical reproducibility to orthogonal imaging approaches—that collectively demonstrate the biological reality of these structures.

      (1) Technical expertise and standard protocols

      Our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well-established. Thus, the observed membrane discontinuities are unlikely to stem from technical inexperience or idiosyncratic methods.

      In addition to membrane discontinuities, we would like to emphasize that a large number of single plasma membranes separating adjacent cytoplasmic domains were also detected under EM (Figures 1, 3 and 4, for instance). This observation is particularly significant because the invagination model cannot generate single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes could explain the formation of cytoplasm-enclosed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, which indicates successful EM processing and argues against inefficient fixation or other technical issues.

      (2) Reproducibility across independent preparations and techniques

      To test whether the discontinuities were preparation-specific, we examined four independent sample batches collected in the lab over the years. Membrane discontinuities, as well as cytoplasm-immersed membranes, were consistently observed on embryonic cells across all batches, indicating that the phenomenon is not dependent on a single preparation. Furthermore, we validated our findings using two EM techniques: transmission electron microscopy (HPF-TEM) and dual-beam scanning electron microscopy (SEM). Membrane discontinuities were clearly identifiable with both techniques, further supporting their robustness.

      (3) Validation using an independent public dataset

      We examined the publicly available C. elegans embryo EM collection (WormAtlas). In several instances, particularly at the embryonic periphery where plasma membrane discontinuities are more readily visualized (https://www.wormimage.org/image.php?id=140265&page=1), we identified similar structures. The presence of these features in an independent dataset generated by different researchers confirms that they are not artifacts unique to our sample preparation.

      (4) Developmental regulation of membrane discontinuities

      We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage (a period marked by the onset of elongation and reduced cell proliferation), the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, the eggshell is present throughout the embryonic stage of C. elegans; therefore, the dramatic reduction of membrane discontinuities in comma-stage embryos argues against the possibility that the eggshell poses a fixation problem.

      (5) Rigorous criteria for identifying membrane discontinuities

      To ensure unbiased analysis, we systematically collected images from early embryonic cells using the following criteria:

      (1) Random section selection: For each sample, we randomly selected one section containing the largest number of embryos or cells (Sup Figure 2) for initial analysis. We found membrane discontinuities in 159 cells distributed across 57 embryos, representing 95% of the total sampled embryos. This portion of the data is summarized in Figure 1.

      (2) Whole-membrane examination: Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, aside from the discontinuity, the remainder of the plasma membrane remained intact. Moreover, in most cells, only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation.

      (3) Neighboring section verification: Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Again, a membrane discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods and can provide additional images of consecutive sections if needed.

      (4) Serial section reconstruction: To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments.

      First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that each cell contains only one discontinuity from a 3D point of view, further supporting that the phenomenon is not due to random sample preparation damage.

      (6) Orthogonal validation by live imaging: In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduce spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, with single-plane live imaging, both membrane ruptures and free-ended sister membrane structures could be detected (Figure 6), providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase or metaphase. Therefore, live imaging further validates the membrane rupture phenomena in dividing embryonic cells in C. elegans.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      (2) Lack of evidence linking membrane discontinuity to cell division 

      The reported plasma membrane discontinuities are not specific to mitotic cells. If this were a physiological process playing an important role in cytokinesis, it should occur in a temporally and spatially coordinated manner with nuclear division. However, it remains unclear at what stage of the cell cycle the membrane rupture occurs and where it is located relative to chromosomes and the mitotic spindle.

      Thank you for this insightful comment. We agree that establishing a direct link between plasma membrane discontinuities and mitotic progression is critical, and we appreciate the opportunity to clarify this point.

      In C. elegans embryos, the early stages of development are characterized by rapid and extensive cell division. Within approximately 100 minutes, a two-cell embryo develops into an embryo containing nearly 30 cells. The majority of the electron microscopy analyses in our study were performed on embryos at stages with fewer than 30 cells, where most cells are actively dividing. Thus, it is reasonable to infer that the cells exhibiting membrane discontinuities are predominantly mitotic cells.

      Supporting this notion, as embryos reached the comma stage—a period marked by the onset of elongation and reduced cell proliferation—the incidence of membrane discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly suggests that membrane discontinuities are tightly linked to cell division.

      Importantly, mitotic features such as metaphase chromosomes aligned at the equatorial plane or two (or more) nuclei sharing common cytoplasm can be identified in EM images. In our single random EM section analysis, we captured membrane discontinuities in cells at metaphase, anaphase (characterized by fewer than 10 chromosomal clumps), and telophase (defined by two nuclei sharing cytoplasm). Hence, membrane discontinuities are indeed present on mitotic cells. In addition, a published study using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) captured similar membrane discontinuities in cells displaying classical mitotic features, including anaphase or telophase.

      To further investigate the spatial relationship between membrane ruptures and chromosome organization, we performed three-dimensional reconstructions on a metaphase cell. As shown in Figure 2 and Video S1, the membrane discontinuities largely encircled the condensed chromosome disc and were spatially aligned with the future segregation zone, further revealing the relative location of membrane discontinuities to chromosomes, at least at metaphase.

      We further collected 3D information for a telophase cell containing three nuclei. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site that merged to form a single rupture. The observation that membrane ruptures are present in a tri-nucleated cell is particularly informative. The tri-nucleated feature indicates that this cell underwent two rounds of cell division and that both divisions were at telophase. The presence of a single membrane rupture suggests that membrane discontinuities may persist throughout the cell cycle, as the second cell cycle began from a mother cell that still shared cytoplasm with its sister cell and already had one membrane rupture. Therefore, in addition to the mitotic phase, membrane discontinuities—at least in this context—also exist during the DNA synthesis stage.

      (3) Lack of evidence for extension of the separated membrane 

      Although the authors speculate that resealing of the ruptured membrane occurs via extension of the separated membrane, no direct evidence supporting this mechanism is presented. Proximity to vesicles alone does not demonstrate that membrane extension occurs through vesicle fusion. More direct evidence is required to support this claim.

      Thank you for raising this important point. We appreciate the opportunity to clarify our conclusion.

      In our study, EM analysis revealed the presence of cellular vesicles in close proximity to both free membrane edges and the already separated sister plasma membranes (Figure 4). However, we acknowledge that without advanced live-cell imaging, it is not possible to conclusively determine whether the extension of these separated sister membranes occurs through vesicle fusion.

      We realize that a statement in the Discussion section, “The expansion of the plasma membrane is generally driven by vesicle fusion” (page 16), may have inadvertently led the reviewer to interpret this as our own conclusion regarding the mechanism of membrane extension in this context. In fact, that statement was intended to reflect the current general understanding of membrane expansion, not to imply that we had demonstrated such a mechanism for the free-ended sister membranes. As we subsequently noted, “However, this remains speculative and requires further experimental validation.”

      To avoid any misunderstanding, we will revise this section to clearly state that the mechanism by which the separated sister membranes extend remains unknown and that further investigation is needed to determine how existing models of membrane expansion may apply to or be adapted for this novel context.

      We thank the reviewer again for their thoughtful comment, which has helped us improve the clarity of our manuscript.

      (4) Inconsistency with published work

      Numerous studies have examined cell division in developing C. elegans embryos using the GFP::PH(PLC1δ1) marker expressed from the ltIs38 transgene [pAA1; pie-1::GFP::PH(PLC1δ1) + unc-119(+)], generated by the Oegema lab (https://wormbase.org/species/c_elegans/transgene/WBTransgene00000911#01--10). To date, no study has reported membrane ruptures of the magnitude described here. The complexity of cell surface morphology from the 8- to 12-cell stages onward has been well documented, for example, by Fu et al. (2016) using light-sheet microscopy and 3D reconstruction (doi:10.1038/ncomms11088).

      Supplementary Movies 5, 6, and 10 of this paper illustrate how single-plane images can easily produce apparent membrane discontinuities, for example, due to membrane orientations nearly parallel to the imaging plane.

      The three single-plane images from only three embryos presented in Figure 6 are insufficient to support the authors' strong conclusions. Raw 3D data should be provided.

      Thank you for this important comment. We fully agree that the GFP::PH(PLC1δ1) marker, generated by the Oegema lab, has been widely and effectively used to study various aspects of C. elegans embryonic development. In fact, we also employed this same marker in our study to assess membrane integrity.

      However, while live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures such as free-ended extensions being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescent protein-based live imaging analysis alone could not serve as a convincing approach to challenge the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using this same GFP marker have in fact revealed membrane discontinuities that went unnoticed. For example, Fu et al. (Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088), using light-sheet microscopy and 3D reconstruction, captured membrane discontinuities in cells undergoing mitotic phases such as anaphase or telophase. Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Nevertheless, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively identifying such membrane discontinuities.

      We acknowledge that our findings are surprising. We did not set out to challenge the long-held view of membrane integrity during cell division. In fact, this study began when our dedicated EM technician, Jingjing Liang, first observed membrane discontinuity phenomena in control samples—wild-type embryos. Had she not come across this observation, we likely would never have pursued this line of inquiry.

      We appreciate the opportunity to clarify these points and thank the reviewer for thoughtful engagement with our work.

      Reviewer #2 (Public review):

      Summary:

      Liang et al. explore an unusual observation of membrane discontinuities in dividing C. elegans embryonic cells. This report is the first to demonstrate that, instead of the classical invagination of membranes during cytokinesis, cells in the early embryos of C. elegans exhibit separation of sister membranes that extend independently. TEM images of high-pressure-frozen samples provide strong evidence for the presence of Membrane Openings (MOs) in cells at various stages of the cell cycle, predominantly during mitosis. High-resolution images (x 30,000) clearly show the wrinkled plasma membrane and smooth MOs.

      The electron microscopy data are supported by the live cell imaging of strains with fluorescently tagged membrane markers. This study opens up the possibility of tracking MOs at other stages of C. elegans development, and also asks if it might be a common phenomenon in other species that exhibit rapid embryonic growth and divisions. 

      Strengths:

      (1) Thorough verification of Membrane Openings (MO) by several methods: 

      (a) 4 independent sample batches.

      (b) Examined historical collections.

      (c) Analysed embryos at different stages of development. The absence of MOs in later stages (comma) serves as a negative control and gives confidence that MOs are genuine and not technical artifacts. 

      (2) Live cell imaging of a strain with fluorescently labelled membranes provides real-time dynamics of membrane rupture.

      (3) After observing the membrane rupture, the next obvious question is - what prevents the cytosol from leaking out? The EM images showing PBL and PEL - extracellular matrix serving as barriers for the cytosol are convincing.

      Thanks to the reviewer for the encouragement. Highly appreciated.

      Weakness:

      (1) The association of membrane discontinuities with cell division is not convincing, as there are 159 cells out of 425 showing MOs, but it is not mentioned clearly how many of these are undergoing cell division. Also, it's not clear whether the 20 dividing cells analysed for MOs are a part of the 159 cells or a separate dataset. A graphical representation of the number of samples and observed frequencies would be helpful to understand the data collection workflow.

      We sincerely thank the reviewer for raising this important question and appreciate the opportunity to clarify these points.

      (1) Relationship between membrane discontinuities and cell division

      In C. elegans embryos, early development is characterized by rapid and extensive cell division: within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. Most of our electron microscopy (EM) analyses were performed on embryos at stages with fewer than 30 cells, in which the majority of cells are actively dividing. Therefore, it is reasonable to infer that the cells exhibiting membrane discontinuities (MOs) are predominantly mitotic. Supporting this, as embryos reached the comma stage, when cell proliferation declines and elongation begins, the incidence of MOs dropped sharply (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly links MOs to cell division.

      Moreover, in single random EM sections, we observed MOs in cells displaying clear mitotic features, such as metaphase chromosomes aligned at the equatorial plate, or anaphase/telophase configurations (fewer than 10 chromosomal clumps or two nuclei sharing common cytoplasm). Thus, MOs are indeed present in mitotic cells.

      From our 3D reconstruction (Figure 5), we identified a telophase cell containing three nuclei, each enclosed by its own plasma membrane, with each membrane harboring a single rupture; the three ruptures converged into a single opening. This tri-nucleated configuration indicates that the cell had undergone two rounds of division and was at telophase in both. The presence of a single membrane rupture in this context suggests that MOs can persist beyond mitosis, as the second cell cycle initiated from a mother cell that already shared cytoplasm with its sister and already contained a rupture. Thus, in this case, MOs were also present during the DNA synthesis stage.

      (2) Clarification of sample numbers and datasets

      In Figure 1, we present results from a single EM section per embryonic cell, with sections randomly selected per embryo as detailed in Sup Figure 2. This initial dataset (425 cells) forms the basis of Figure 1.

      From the same pool of 425 cells, we used additional EM sections—distinct from those shown in Sup Figure 2—to locate 20 dividing cells for analysis of membrane discontinuities. Thus, while these 20 cells originated from the same set of embryos, they were not derived from the sections used in Figure 1 or Sup Figure 2.

      A graphical summary of sample numbers from the single-section analysis is already provided in Figure 1. Notably, cells with two clearly visible nuclei are more likely to be sectioned through or near their maximal diameter. In contrast, the randomly selected sections used for Figure 1 captured cells at variable planes, reducing the likelihood of observing MOs. Consistent with this, in the three embryos where no MOs were detected (one example is Sup Figure 2N), the sections likely passed through peripheral regions of the cells. Consequently, the frequency of MOs in randomly sectioned cells (Figure 1) is not directly comparable to that observed in the 20 dividing cells, which were analyzed using sections more likely to capture cells near their maximal diameter. These 20 dividing cells should therefore be considered a separate analysis. We will add detailed explanations in the Methods section to ensure this distinction is clearly understood.
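      The counts quoted in this response can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: pooling the three comma-stage samples and treating cells as independent observations are assumptions made here for illustration (cells from the same embryo are unlikely to be fully independent), and the variable and function names are hypothetical rather than part of the study's analysis.

```python
# Counts as quoted in this response. Pooling samples and assuming
# independent cells are simplifications made for illustration only.
cells_with_mo, cells_total = 159, 425          # single-section dataset (Figure 1)
embryos_with_mo, embryos_without_mo = 57, 3    # embryos with / without a detected MO
comma_stage_counts = [(0, 13), (0, 17), (0, 30)]  # (cells with MO, cells examined)


def zero_event_upper_bound(n, alpha=0.05):
    """Exact one-sided upper bound on a proportion when 0 of n events are seen.

    Solves (1 - p) ** n = alpha for p, which is valid only for zero observed events.
    """
    return 1.0 - alpha ** (1.0 / n)


cell_fraction = cells_with_mo / cells_total
embryo_fraction = embryos_with_mo / (embryos_with_mo + embryos_without_mo)

comma_events = sum(k for k, _ in comma_stage_counts)
comma_total = sum(n for _, n in comma_stage_counts)
assert comma_events == 0, "the bound below applies only when no events are observed"
comma_upper = zero_event_upper_bound(comma_total)

print(f"cells with MOs on a random section: {cell_fraction:.1%}")
print(f"embryos with at least one MO:       {embryo_fraction:.1%}")
print(f"comma stage: 0/{comma_total} cells, one-sided 95% upper bound {comma_upper:.1%}")
```

      Under these assumptions, the embryo-level fraction reproduces the 95% figure given above, and the comma-stage bound indicates how low the underlying per-cell frequency would have to be to be compatible with observing no discontinuities at all.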

      We are grateful for the reviewer’s thoughtful feedback and believe these clarifications will improve the clarity and rigor of the manuscript.

      (2) In Figures 3A and 3B, the resolution of the images is not enough to verify 3A as classical membrane invagination and 3B as detached sister membranes.

      Thank you for your valuable comment. In the revised manuscript, we will provide additional images at higher magnification to better illustrate the classical membrane invagination in Figure 3A and the detached sister membranes in Figure 3B.

      (3) Figure 6 lacks controls. How does the classical invagination look in this strain? Also, adding nuclear dye would be informative, in order to correlate the nuclear division with membrane rupture, as claimed. 

      Thank you for this important comment. As we addressed how we correlated nuclear division with membrane rupture in response to weakness (1), below we will focus on how we may distinguish classical invagination from membrane rupture.

      While live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures such as free-ended extensions being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescent protein-based live imaging analysis alone could not serve as a convincing approach to challenge the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using GFP::PH or similar markers have in fact revealed membrane discontinuities that went unnoticed. For example, using light-sheet microscopy and 3D reconstruction, Fu et al. captured membrane discontinuities in cells undergoing division, such as at anaphase or telophase (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088).

      Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Here, to capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. With single-plane live imaging, both membrane ruptures and free-ended sister membrane structures (Figure 6) could be detected, providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos.

      However, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively distinguishing invagination from membrane discontinuities.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors challenge a dogma in cell biology, namely that cells are at any time point engulfed by a continuous plasma membrane. Liang et al. find that during C. elegans embryogenesis, a high number of cells are not entirely surrounded by a plasma membrane but show membrane openings (MOs). These openings are enriched at the embryo's periphery, towards the eggshell. The authors propose that plasma membrane discontinuities emerge during metaphase of mitosis and that independent extension of "sister membranes" engulfs the daughter cells.

      Strengths:

      On the positive side, the authors find plasma membrane discontinuities not only by electron microscopy but also by fluorescence microscopy and provide information about the dynamics of membrane openings and their emergence. While this is assuring, the authors conclude that MOs emerge during metaphase. From what the authors show, this particular information cannot be deduced, as there is no dynamic capture of a membrane scission event together with a chromatin marker that would indicate mitosis. The authors could, however, attempt to find such events in live movies, given the high incidence of MOs reported from their EM data.

      Thanks to the reviewer for the encouragement. Highly appreciated.

      Weaknesses:

      In order to convincingly demonstrate the absence of any plasma membrane in the respective regions of the embryonic periphery or between cells of the embryo, the authors would have to show consecutive serial TEM sections where MOs are detected over more z-planes, beyond the mere 3D reconstructions. Although the authors state in the methods section that continuous ultrathin sections were cut for the metaphase sample (page 21, line 472), consecutive sections are never shown in TEM. While we do see the 3D reconstructions, better documentation of the underlying TEM data is missing. It would be necessary to show a membrane opening in consecutive z sections. Alternatively, the authors could seek the possibility to convincingly back up their claims with volume imaging by focused ion beam scanning EM (FIBSEM), where cellular volumes can be sectioned in almost isotropic resolution.

      We thank the reviewer for raising these important technical concerns. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts.

      First of all, in addition to membrane discontinuities, we would like to highlight that a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3 and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      Second, we applied rigorous criteria for identifying membrane discontinuities:

      (1) To test whether the discontinuities were preparation specific, we examined four independent sample batches and validated our findings using two EM techniques: transmission electron microscopy (HPF-TEM) and dual-beam scanning electron microscopy (SEM).

      (2) We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage, a period marked by the onset of elongation and reduced cell proliferation, the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, the eggshell is present throughout the embryonic stage of C. elegans; therefore, the dramatic reduction of membrane discontinuities in comma-stage embryos argues against the possibility that the eggshell poses a fixation problem.

      (3) Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, aside from the discontinuity, the remainder of the plasma membrane remained intact. Moreover, in most cells, only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation. Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Again, the same membrane discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods, and additional images of consecutive sections can be provided if needed.

      To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments using consecutive sections, as the reviewer suggested. First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that each cell contains only one discontinuity from a 3D point of view, further supporting that the phenomenon is not due to random sample preparation damage.

      (4) In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduce spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, with single-plane live imaging, both putative membrane ruptures (Figure 6A) and free-ended sister membrane structures (Figures 6B and 6C) could be detected, providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase and metaphase. Therefore, live imaging further validates the membrane rupture phenomena in dividing embryonic cells in C. elegans.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      Another critical issue concerns the detection of the membrane discontinuities in electron micrographs, which, in my opinion, is ambiguous. How do the authors reliably discriminate in their TEM images whether there is a plasma membrane or not? The absence - or weak appearance - of the stain of the electron dense material at membranes, which seems to be their criterion for MOs, is also apparent at other, intracellular membranes, like at the NE or at the ER (for example, see Figure 1C). Also, the plasma membrane itself appears unevenly stained in regions that the authors delineate as intact (for example, Figure 1C, 2B/1).

      We thank the reviewer for raising this important concern.

      First, our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well established. Thus, the observed membrane discontinuities are unlikely to result from technical inexperience or idiosyncratic methods.

      Second, because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Specifically, a membrane discontinuity was confirmed only after inspecting the entirety of the plasma membrane in neighboring sections. We will include this verification protocol in the revised Methods section, and additional images of consecutive sections can be provided if needed.

      Third, in addition to membrane discontinuities, a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3 and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      EM-related publications by Jingjing Liang:

      Chen D, Jian Y, Liu X, Zhang Y, Liang J, Qi X, Du H, Zou W, Chen L, Chai Y, Ou G, Miao L, Wang Y, Yang C. 2013. Clathrin and AP2 Are Required for Phagocytic Receptor-Mediated Apoptotic Cell Clearance in Caenorhabditis elegans. PLoS Genetics 9:e1003517. DOI: https://doi.org/10.1371/journal.pgen.1003517

      Ding L, Yang X, Tian H, Liang J, Zhang F, Wang G, Wang Y, Ding M, Shui G, Huang X. 2018. Seipin regulates lipid homeostasis by ensuring calcium‐dependent mitochondrial metabolism. The EMBO Journal 37:e97572. DOI: https://doi.org/10.15252/embj.201797572

      Guan L, Yang Y, Liang J, Miao Y, Shang A, Wang B, Wang Y, Ding M. 2022. ERGIC2 and ERGIC3 regulate the ER‐to‐Golgi transport of gap junction proteins in metazoans. Traffic 23:140–157. DOI: https://doi.org/10.1111/tra.12830

      Li Y, Zhang Y, Gan Q, Xu M, Ding X, Tang G, Liang J, Liu K, Liu X, Wang X, Guo L, Gao Z, Hao X, Yang C. 2018. C. elegans-based screen identifies lysosome-damaging alkaloids that induce STAT3-dependent lysosomal cell death. Protein & Cell 9:1013–1026. DOI: https://doi.org/10.1007/s13238-018-0520-0

      Miao Y, Du Y, Wang B, Liang J, Liang Y, Dang S, Liu J, Li D, He K, Ding M. 2024. Spatiotemporal recruitment of the ubiquitin-specific protease USP8 directs endosome maturation. eLife 13:RP96353. DOI: https://doi.org/10.7554/eLife.96353

      Qin J, Liang J, Ding M. 2014. Perlecan Antagonizes Collagen IV and ADAMTS9/GON-1 in Restricting the Growth of Presynaptic Boutons. Journal of Neuroscience 34:10311–10324. DOI: https://doi.org/10.1523/JNEUROSCI.5128-13.2014

      Wang Z, Zhang L, Zhou B, Liang J, Tian Y, Jiang Z, Tao J, Yin C, Chen S, Zhang W, Zhang J, Wei W. 2026. A single MYB transcription factor GmMYB331 regulates seed oil accumulation and seed size/weight in soybean. Journal of Integrative Plant Biology 68:470–485. DOI: https://doi.org/10.1111/jipb.70101

      Xu J, Chen S, Wang W, Man Lam S, Xu Y, Zhang S, Pan H, Liang J, Huang Xiahe, Wang Yu, Li T, Jiang Y, Wang Yingchun, Ding M, Shui G, Yang H, Huang Xun. 2022. Hepatic CDP-diacylglycerol synthase 2 deficiency causes mitochondrial dysfunction and promotes rapid progression of NASH and fibrosis. Science Bulletin 67:299–314. DOI: https://doi.org/10.1016/j.scib.2021.10.014

      Xu M, Ding L, Liang J, Yang X, Liu Y, Wang Y, Ding M, Huang X. 2021. NAD kinase sustains lipogenesis and mitochondrial metabolism through fatty acid synthesis. Cell Reports 37:110157. DOI: https://doi.org/10.1016/j.celrep.2021.110157

      Yang L, Liang J, Lam SM, Yavuz A, Shui G, Ding M, Huang X. 2020. Neuronal lipolysis participates in PUFA‐mediated neural function and neurodegeneration. EMBO reports 21:e50214. DOI: https://doi.org/10.15252/embr.202050214

      Yang X, Liang J, Ding L, Li X, Lam S-M, Shui G, Ding M, Huang X. 2019. Phosphatidylserine synthase regulates cellular homeostasis through distinct metabolic mechanisms. PLOS Genetics 15:e1008548. DOI: https://doi.org/10.1371/journal.pgen.1008548

      Zhu J, Lam SM, Yang L, Liang J, Ding M, Shui G, Huang X. 2022. Reduced phosphatidylcholine synthesis suppresses the embryonic lethality of seipin deficiency. Life Metabolism 1:175–189. DOI: https://doi.org/10.1093/lifemeta/loac02

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper entitled "Alpha-Band Phase Modulates Perceptual Sensitivity by Changing Internal Noise and Sensory Tuning," Pilipenko et al. investigate how pre-stimulus alpha phase influences near-threshold visual perception. The authors aim to clarify whether alpha phase primarily shifts the criterion, multiplicatively amplifies signals, or changes the effective variance and tuning of sensory evidence. Six observers completed many thousands of trials in a double-pass Gabor-in-noise detection task while an EEG was recorded. The authors combine signal detection theory, phase-resolved analyses, and reverse correlation to test mechanistic predictions. The experimental design and analysis pipeline provide a clear conceptual scaffold, with SDT-based schematic models that make the empirical results accessible even for readers who are not specialists in classification-image methods.

      Strengths:

      The study presents a coherent and well-executed investigation with several notable strengths. First, the main behavioral and EEG results in Figure 2 demonstrate robust pre-stimulus coupling between alpha phase and d′ across a substantial portion of the pre-stimulus interval, with little evidence that the criterion is modulated to a comparable extent. The inverse phasic relationship between hit and false-alarm rates maps clearly onto the variance-reduction account, and the response-consistency analysis offers an intuitive behavioral complement: when two identical stimuli are both presented at the participant's optimal phase, responses are more consistent than when one or both occur at suboptimal phases. The frontal-occipital phase-difference result suggests a coordinated rather than purely local phase mechanism, supporting the central claim that alpha phase is linked to changes in sensitivity that behave like changes in internal variability rather than simple gain or criterion shifts. Supplementary analyses showing that alpha power has only a limited relationship with d′ and confidence reassure readers that the main effects are genuinely phase-linked rather than a recasting of amplitude differences.

      Second, the reverse-correlation results in Figure 3 extend this story in a satisfying way. The classification images and their Gaussian fits show that at the optimal phase, the weighting of stimulus energy is more sharply concentrated around target-relevant spatial frequencies and orientations, and the bootstrapped parameter distributions indicate that the suboptimal phase is best described by broader tuning and a modest change in gain rather than a pure criterion account. The authors' interpretation that optimal-phase perception reflects both reduced effective internal noise and sharpened sensory tuning is reasonable and well-supported. Overall, the data and figures largely achieve the stated aims, and the work is likely to have an impact both by clarifying the interpretation of alpha-phase effects and by illustrating a useful analytic framework that other groups can adopt.

      Weaknesses:

      The weaknesses are limited and relate primarily to framing and presentation rather than to the substance of the work. First, because contrast was titrated to maintain moderate performance (d′ between 1.2 and 1.8), the phase-linked changes in sensitivity appear modest in absolute terms, which could benefit from explicit contextualization. Second, a coding error resulted in unequal numbers of double-pass stimulus pairs across participants, which affects the interpretability of the response-consistency results. Third, several methodological details could be stated more explicitly to enhance transparency, including stimulus timing specifications, electrode selection criteria, and the purpose of phase alignment in group averaging. Finally, some mechanistic interpretations in the Discussion could be phrased more conservatively to clearly distinguish between measurement and inference, particularly regarding the relationship between reduced internal noise and sharpened tuning, and the physiological implementation of the frontal-occipital phase relationship.

      We appreciate the reviewer’s thoughtful and constructive feedback, particularly regarding clarity and framing. In response, we have made several revisions to improve transparency and contextualization throughout the manuscript.

      First, we now explicitly contextualize the relatively modest change in sensitivity by adding discussion of the contrast-titration procedure and its implications for effect size interpretation. Second, we address the coding error that led to unequal numbers of double-pass stimulus pairs across participants sooner in the manuscript by reporting the average number of pairs per participant in the Results (as well as the Methods), allowing for readers to interpret the results more appropriately. Third, we have provided additional detail, including precise stimulus timing parameters, electrode selection criteria, and a clearer explanation of the rationale for phase alignment in the Results (in addition to the Methods) section. Finally, we have revised portions of the Discussion to adopt more conservative language when interpreting our results, which more clearly distinguishes between empirical observations and mechanistic inferences, along with offering additional interpretations for the frontal-occipital phase relationship.

      We believe these revisions substantially improve the clarity, transparency, and interpretability of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study of Pilipenko et al evaluated the role of alpha phase in a visual perception paradigm using the framework of signal detection theory and reverse correlation. Their findings suggest that phase-related modulations in perception are mediated by a reduction in internal noise and a moderate increase in tuning to relevant features of the stimuli in specific phases of the alpha cycle. Interestingly, the alpha phase did not affect the criterion. Criterion was related to modulations in alpha power, in agreement with previous research.

      Strengths:

      The experiment was carefully designed, and the analytical pipeline is original and suited to answer the research question. The authors frame the research question very well and propose several models that account for the possible mechanisms by which the alpha phase can modulate perception. This study can be very valuable for the ongoing discussion about the role of alpha activity in perception.

      Weaknesses:

      The sample size collected (N = 6) is, in my opinion, too small for the statistical approach adopted (group level). It is well known that small sample sizes result in an increased likelihood of false positives; even in the case of true positives, effect sizes are inflated (Button et al., 2013; Makin and Orban de Xivry, 2019), negatively affecting the replicability of the effect.

      Although the experimental design allows for an accurate characterization of the effects at the single-subject level, conclusions are drawn from group-level aggregated measures. With only six subjects, the estimation of between-subject variability is not reliable. The authors need to acknowledge that the sample size is too small; therefore, results should be interpreted with caution.

      Conclusion:

      This study addresses an important and timely question and proposes an original and well-thought-out analytical framework to investigate the role of alpha phase in visual perception. While the experimental design and theoretical motivation are strong, the very limited sample size substantially constrains the strength of the conclusions that can be drawn at the group level.

      Bibliography:

      Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365-376 (2013). https://doi.org/10.1038/nrn3475

      Makin, T. R., Orban de Xivry, J.-J. Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8, e48175 (2019). https://doi.org/10.7554/eLife.48175

      We thank the reviewer for their supportive remarks on our design and analysis, and for raising this important statistical concern about our sample size (n=6). Our choice of a small sample size was driven by methodological considerations. Specifically, our reverse correlation analysis requires a large number of trials per participant, as it estimates perceptual tuning by regressing behavioral responses against fluctuations in the energy of stimulus features (orientation and spatial frequency). This approach, as well as the computation of signal detection theory (SDT) metrics such as d′ and criterion, depends on high trial counts to obtain reliable estimates, particularly given that our analysis further subdivides trials across eight phase bins. For this reason, we prioritized collecting a large number of trials per participant (∼5,000), which is consistent with established practices in psychophysical research.
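
      The SDT metrics mentioned above can be illustrated with a minimal sketch. It uses the textbook equal-variance definitions, d′ = z(HR) − z(FAR) and c = −0.5·[z(HR) + z(FAR)]; the function name, the example counts, and the log-linear correction for extreme rates are assumptions made for illustration and are not taken from the manuscript.

```python
from statistics import NormalDist


def sdt_metrics(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for a Yes/No detection task.

    Equal-variance Gaussian SDT. The log-linear correction below is an
    assumption of this sketch; it keeps rates away from exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for one phase bin (not data from the study).
d_prime, criterion = sdt_metrics(hits=80, misses=20,
                                 false_alarms=20, correct_rejections=80)
print(d_prime, criterion)
```

      Because each of the eight phase bins needs its own hit and false-alarm rates, the trial count per bin is one eighth of the total, which is why high trial counts per participant matter for stable estimates.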

      Importantly, our approach means that our design is reliable at the individual level, which motivated us to include a new binomial probability test in our revised paper. This test helps address concerns about the generalizability of our results. It treats each participant as an independent replication of the effect and computes the probability of observing the given number of statistically significant participants by chance, assuming a false positive rate of 0.05 per participant. In our data, 3 out of 6 participants showed significant effects, which corresponds to a probability of 0.002 of having observed these effects by chance alone. We believe this converging evidence supports the replicability and generalizability of our results. To improve the transparency of the single-subject data, we have included single-participant results in the Supplemental Materials to allow readers to directly assess the consistency of effects across individuals and to better contextualize between-subject variability.

      Thank you again for your suggestions. We believe that these additions have greatly improved our manuscript by demonstrating the robustness of our findings and increasing the transparency of our single-subject results.
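
      The arithmetic behind the reported probability can be checked with a short sketch. It assumes the test takes the upper tail of a binomial distribution (at least k significant participants out of n, each with a 0.05 chance of a false positive); the function name is hypothetical, and whether the cited framework uses exactly this tail is an assumption.

```python
from math import comb


def binomial_tail(k_significant, n_participants, alpha=0.05):
    """P(at least k_significant of n_participants are significant by chance).

    Assumes independent participants, each with false positive rate alpha.
    """
    return sum(
        comb(n_participants, k) * alpha**k * (1 - alpha) ** (n_participants - k)
        for k in range(k_significant, n_participants + 1)
    )


p = binomial_tail(3, 6)
print(round(p, 4))  # 0.0022
```

      Under these assumptions the value rounds to the 0.002 quoted in the response.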

      Recommendations for the authors:

      Reviewing Editor Comments:

      The issue of generalizability arose during the review process, as your results are based on a small sample of participants who undertook a very large number of trials. In the revised version, it would be useful to discuss why this approach is valid, especially in the context of linking EEG with modeling (i.e., why it is more powerful than having many participants with fewer trials), and the extent to which your results can generalize to the population.

      We sincerely appreciate all of the helpful comments provided by the reviewers and hope we have addressed the concerns about our experimental approach. In the introduction, we have emphasized the importance of our current small sample size design, which allows us to reliably compute our signal detection theory metrics across 8 phase bins in addition to including the reverse correlation analysis. In the methods section, we have added a description of the binomial probability statistical framework, which addresses the generalizability of our results. In this framework, each participant is viewed as an independent replication and the p-value reflects the probability of having observed the number of individually significant subjects from the total sample size by chance. In this regard, observing a significant effect in 3 out of 6 participants (as in our study) from chance alone has a 0.002 probability, which we believe is unlikely and instead reflects a true effect present in the general population.

      Below, we have copied our changes to the introduction and methods sections.

      “... in a large number of trials (6,020 per observer, n = 6) across multiple EEG sessions. This approach ensures a sufficient number of trials in order to reliably compute signal detection theory (SDT) metrics across multiple alpha phase bins while also affording enough statistical power for reverse correlation analysis (Xue et al., 2024), making it preferred over having a larger sample size with fewer trials.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024).”

      Reviewer #1 (Recommendations for the authors):

      My suggestions are intended to be light-touch and focused on strengthening the clarity and durability of the Reviewed Preprint rather than on additional experimentation or major new analyses.

      (1) Limitation statement for the double-pass coding error:

      Add a short statement in the Methods or Results acknowledging that the coding error led to markedly fewer repeated stimulus pairs for the first three participants than for the last three. For the response-consistency result in Figure 2E, a simple acknowledgement that the available evidence is stronger for some participants than others will help readers calibrate their confidence without detracting from the main story.

      Thank you for this suggestion. We have now added a statement to this effect in the Results section, in addition to the description already given in the Methods section.

      “To examine this, we implemented a double-pass stimulus presentation (~600 stimulus pairs for participants 1-3 and ~2,500 pairs for participants 4-6) and analyzed participants’ response consistency (Xue et al., 2024) to two identical stimuli.”

      (2) Contextualizing the titrated performance level:

      In the Discussion, explicitly note that contrast was titrated to keep d′ between approximately 1.2 and 1.8, which intentionally maintains moderate performance. This contextualization will help readers understand that while the phase-linked changes appear modest in absolute terms, they are mechanistically informative within this design.

      Thank you. We have added a passage to the Discussion addressing this point.

      “We also note that the observed modulation of d’ between optimal and suboptimal phases was relatively modest in absolute terms (0.21) in our study and could therefore require many trials per subject to detect. Two reasons for this modest effect size could be related to specific features of our task design. First, we titrated stimulus contrast to maintain consistent task performance. This titration could have reduced the magnitude of the phase effect on d’ that would otherwise be apparent if the stimulus intensity were kept constant. Additionally, the use of (relatively) high-contrast random noise likely means that trial-to-trial variability in perception is largely driven by random fluctuations in the noise properties and, to a lesser extent, internal brain state. Although both of these choices were necessary to perform SDT and reverse correlation analysis, they differ from many previous studies investigating alpha phase using only near-threshold detection in the absence of external noise and may contribute to an underestimation of the true effect size.”

      (3) Methods clarifications:

      (a) Replace placeholder text such as "{plus minus}" and "{degree sign}" with the appropriate symbols, and ensure that any equations implied in the reverse-correlation section are fully present.

      Thank you for bringing this to our attention. These placeholder texts are an artifact of the conversion process, and we will correct them.

      (b) State explicitly that the 8 ms stimulus duration corresponds to a single frame on your 120 Hz display, which will clarify the timing in Figure 1A and the pre-stimulus windows in the phase analyses.

      Thank you. We have added language to both the Methods and Results sections explicitly indicating that the 8 ms stimulus duration corresponds to a single screen refresh. Additionally, we changed the text in Figure 1A to include inter-trial interval timing (as opposed to merely saying “Start Trial”):

      “(A) Task design. Each trial contained a brief, filtered-noise stimulus (8 ms; one screen refresh) presented to the right or left of fixation with equal probability.”

      “Each participant (n = 6) completed 5-6 EEG sessions of a Yes/No detection paradigm whereby participants reported the presence or absence of a brief (8 ms; one screen refresh) vertical Gabor target (2 cycles per degree) with concurrent confidence judgments (see Figure 1A), along with an additional imagination judgement (reported in the supplemental materials).”
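
      As a quick check of the timing, the duration of one refresh follows directly from the display rate. The sketch below assumes the 120 Hz refresh rate named in the reviewer's comment; the function name is hypothetical.

```python
def frame_duration_ms(refresh_rate_hz):
    """Duration of a single screen refresh in milliseconds."""
    return 1000.0 / refresh_rate_hz


print(round(frame_duration_ms(120), 2))  # 8.33
```

      One frame at 120 Hz therefore lasts about 8.33 ms, which is reported as 8 ms in the text.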

      (c) In the description of the post-stimulus taper, consider phrasing the rationale in terms of minimizing contamination from evoked responses rather than asserting that the taper ends before the earliest evoked response, which keeps the argument correct without committing to a precise latency boundary.

      Thank you for this suggestion. We have changed our rationale for the taper to “minimizing”, rather than avoiding, the evoked response.

      “This resulted in the post-stimulus data being flat after 70 ms, which is intended to minimize the evoked response in our data.”

      (4) Analysis transparency:

      (a) In the description of posterior electrode selection, explicitly note that channels were chosen solely on the basis of alpha power, independent of behavioural performance, and that the same electrodes were used for each participant across sessions.

      We have gladly made this clarification to the methods.

      “This was individually determined by rank-ordering 17 of the posterior channels (Pz, P3, P7, O1, Oz, O2, P4, P8, P1, P5, PO7, PO3, POz, PO4, PO8, P6, and P2) and algorithmically choosing the three with the highest power. This ensured that electrode selection was made independent of performance and instead was based upon maximizing alpha signal strength.”
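
      The selection rule described above can be sketched as follows. The channel list is copied from the quoted text; the function name and the shape of the input (a mapping from channel label to that participant's alpha power) are assumptions made for illustration. Behavioral performance is never an input, which is the point of the clarification.

```python
POSTERIOR_CHANNELS = [
    "Pz", "P3", "P7", "O1", "Oz", "O2", "P4", "P8", "P1",
    "P5", "PO7", "PO3", "POz", "PO4", "PO8", "P6", "P2",
]


def select_top_channels(alpha_power, n_channels=3):
    """Rank the posterior channels by alpha power and keep the strongest.

    alpha_power: dict mapping channel label -> alpha power for one
    participant (hypothetical input format).
    """
    ranked = sorted(POSTERIOR_CHANNELS,
                    key=lambda ch: alpha_power[ch],
                    reverse=True)
    return ranked[:n_channels]


# Made-up power values purely to exercise the function.
example_power = {ch: float(i) for i, ch in enumerate(POSTERIOR_CHANNELS)}
print(select_top_channels(example_power))
```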

      (b) Describe the phase-alignment step used to center each participant's optimal bin before group averaging as a device for visualization and summary, and clarify that inferential statistics are based on the underlying, non-aligned data as appropriate. This will reassure readers who are cautious about circularity.

      We agree that this should be made more explicit throughout the manuscript and have added statements clarifying this aspect in the Figure 2B caption, the Results, and Method sections.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and are averaged across -450 ms to stimulus onset. This graph is for visualization purposes only. Error bars represent ± 1 SEM. The pattern shows a clear phasic modulation of d’ across bins.”

      “... requiring us to phase-align the performance data across participants in order to visualize the underlying phasic effects. To this end, we aligned all metrics (d’, c, HR, and FAR) by circularly shifting the data so that the bin with the highest d’ was assigned to bin 8, which was then omitted from further visualizations.”

      “Bin 8 was then omitted from further visualizations. The shifted data were then averaged across all time points from -450 ms to 0 ms, based on significant effects at the group level, and averaged across participants. No statistics were conducted on these shifted variables; they are for visualization purposes only.”
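
      The alignment step described in these excerpts amounts to a circular shift. The sketch below assumes 16 phase bins and made-up values; neither is stated in the text, and the function name is hypothetical. Only the rule "highest d’ goes to bin 8, everything else is shifted by the same amount, bin 8 is then dropped" is taken from the response.

```python
import numpy as np

def align_to_best_bin(dprime, other_metrics, target_bin=8):
    """Circularly shift all metrics so the best d' bin lands on target_bin.

    dprime: 1-D array with one value per phase bin.
    other_metrics: dict of 1-D arrays (e.g. criterion, HR, FAR), same length.
    target_bin is 1-indexed, as in the manuscript text.
    """
    best = int(np.argmax(dprime))
    shift = (target_bin - 1) - best  # np.roll moves index i to i + shift
    aligned = {name: np.roll(vals, shift) for name, vals in other_metrics.items()}
    aligned["dprime"] = np.roll(dprime, shift)
    return aligned

# Illustration with an assumed 16 bins and a d' peak in the third bin.
bins = np.arange(16)
dprime = 1.0 + 0.5 * np.cos(2 * np.pi * (bins - 2) / 16)
hit_rate = bins.astype(float)  # placeholder values to track the shift
aligned = align_to_best_bin(dprime, {"HR": hit_rate})

# Drop the alignment bin before plotting, since it is maximal by construction.
plotted_dprime = np.delete(aligned["dprime"], 7)
```

      Dropping the alignment bin is what keeps the plot from showing a peak that exists only because of the alignment itself, which is why these aligned values are used for visualization and not for inference.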

      (c) Add a short note on the number of permutations and the cluster-forming threshold in the phase-coupling analyses, if not already stated in the Results or captions, to complete the description of your non-parametric testing procedure.

      Thank you, we agree that reiterating this information in the Results section is helpful for the reader to clarify the analysis procedure.

      “After smoothing the resultant vector length over time with a 50 ms moving average, we compared the observed vector lengths to a permuted threshold (95th percentile of 1,000 permutations) at each time point from –700 to 0 ms and performed cluster correction (95th percentile of the permuted cluster size) to account for multiple comparisons.”
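
      One way to read the thresholding and cluster-correction procedure in this excerpt is sketched below. It is an assumption-laden illustration: the smoothing step is omitted, cluster size is taken to be the number of contiguous supra-threshold time points, and the null cluster sizes are obtained by applying the same pointwise threshold to each permutation. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def run_lengths(mask):
    """Return (start_index, length) for each contiguous run of True values."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(mask) - start))
    return runs

def cluster_corrected(observed, permuted, pct=95):
    """observed: (n_times,) vector lengths; permuted: (n_perm, n_times)."""
    # Pointwise threshold: 95th percentile of the permutations at each time.
    point_thresh = np.percentile(permuted, pct, axis=0)
    # Null distribution of the largest cluster found in each permutation.
    null_sizes = [
        max((n for _, n in run_lengths(p > point_thresh)), default=0)
        for p in permuted
    ]
    size_thresh = np.percentile(null_sizes, pct)
    # Keep only observed clusters larger than the null cluster-size threshold.
    return [(s, n) for s, n in run_lengths(observed > point_thresh) if n > size_thresh]

# Synthetic example: 1,000 permutations, 70 time points, effect in the last 30.
rng = np.random.default_rng(0)
permuted = rng.random((1000, 70))
observed = np.full(70, 0.5)
observed[40:] = 2.0
clusters = cluster_corrected(observed, permuted)
```

      Under this reading, a time point is reported only if it exceeds the pointwise threshold and also belongs to a run longer than what the permutations produce by chance.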

      (5) Discussion framing:

      Make one or two small adjustments to your mechanistic phrasing so that the distinctions between measurement and interpretation are fully explicit:

      (a) State that the combination of phase-d′ coupling, counterphased hit and false-alarm rates, response consistency, and phase-dependent classification images is "consistent with" a reduction in effective internal noise and sharper estimated tuning at optimal alpha phase, within the assumptions of your SDT and reverse-correlation framework.

      Thank you for this suggestion. We have changed the language in the discussions to reflect this framing and interpretation of the results.

      “Moreover, our data are consistent with a model in which the variability of internal responses changes systematically across the alpha cycle, as reflected in the inverse relationship between hit rate and false alarm rate.”

      (b) Emphasize that reduced effective internal noise and sharpened sensory tuning are two complementary descriptions of a better match between sensory evidence and decision template rather than fully separable mechanisms.

      Thank you, we have added this language for clarity of our interpretation.

      “Together with decreases in the variance of sensory tuning during the optimal phase, our results suggest that alpha phase impacts sensitivity by shaping trial-to-trial variation in internal noise during perceptual decision making, leading to better matches between sensory evidence and decision templates as opposed to a change in the gain of internal sensory responses.”

      (c) Note that the frontal-occipital phase relationship is consistent with a coordinated, possibly top-down component to the alpha-phase effect, while remaining agnostic about the precise physiological implementation.

      Thank you for raising this additional interpretation. We have added this as a plausible alternative to the single-source account in the Discussion section.

      “Moreover, our results suggest that prior literature reporting phasic effects in the alpha-band range from both frontal and occipital regions may plausibly be reporting the same effect from a single projected dipole source; however, these results are also consistent with two synchronized alpha sources which are anti-phase.”

      Reviewer #2 (Recommendations for the authors):

      Major issues:

      Given that collecting more data may not be doable, the authors should take some actions to test the reliability of their results. For instance, simulations could be run to test the robustness of the results with such a small sample size (Zoefel, 2019). It would also be of interest to include in the report statistics and plots at the individual level, not only the aggregates. It is also important to report which electrodes were used in the analysis for each of the subjects, since the Methods section clearly states that these electrodes differed between subjects.

      Thank you for these suggestions. To assess the reliability of our results at the single-subject level, we have included a new binomial probability test, which is a framework suitable for small-sample experiments with large trial numbers (Schwarzkopf & Huang, 2024). Binomial testing views each individual as an independent replication: it considers the number of significant participants relative to the total number of tested participants and outputs the probability of having observed that result by chance. We believe this framework adequately addresses the reviewer’s concern about generalizability, in addition to being well suited to the design of our study.

      To assess individual significance, we averaged the resultant vector length and permutations over the analysis window from -450 to 0 ms. If the resultant vector length exceeded the permutation threshold for that participant, then they were considered to be a significant participant. In total, 3 out of 6 participants (participants 1, 4, and 5) showed significant d’ coupling. The binomial probability (equivalent to a p-value) of having observed this outcome as a result of three false positives at the individual-subject level is very small (p = 0.002), which is sufficiently low for psychological studies.

      Below is the text which we have added to the Results and Methods sections.

      “To interrogate the robustness of our findings at the single-subject level, we adopted a test of binomial probability, which is a statistical framework that treats each individual as an independent replication and is ideal for small sample size studies that utilize a large number of trials per observer (Schwarzkopf & Huang, 2024). For our data, we assessed individual significance by averaging the actual and permuted resultant vector lengths across time (-450 to 0 ms) and comparing the real vector length to the 95th percentile of the permuted datasets. With this approach, 3 out of 6 participants showed significant d’-phase coupling, which corresponds to a binomial probability of p = 0.002, indicating a very low probability that we observed these results by chance alone.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024). To assess significance at the participant level, we averaged the participant’s resultant vector length and permutations from -450 to 0 ms and obtained the 95th percentile of the time-averaged permutations. We then compared the averaged resultant vector lengths to the permutation thresholds for each subject, which revealed 3 out of 6 significant subjects. We then used the MATLAB function myBinomTest.m (Nelson, 2026) to compute the p-value associated with the probability of having observed 3 out of 6 significant subjects by chance (with a false-positive rate of 0.05).”
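
      The reported probability can be checked with a short calculation. The sketch below assumes the test is the upper tail of a binomial distribution (at least 3 significant participants out of 6, each with a 0.05 false-positive rate); whether the MATLAB function cited above uses exactly this tail is not stated in the text, and the Python function name is hypothetical.

```python
from math import comb

def binomial_tail(k, n, alpha=0.05):
    """P(at least k of n participants are significant by chance).

    Each participant is treated as an independent test with
    false-positive rate alpha.
    """
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )

p_value = binomial_tail(3, 6)  # about 0.0022, i.e. 0.002 to three decimals
```

      Under this assumption the value rounds to the p = 0.002 quoted in the response.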

      To address the reviewer's second request, we now include a supplemental figure which shows each individual’s results for the main analysis (see Supplementary figure 3). These graphs, in addition to the Methods, now provide the reader with each participant’s set of analysis electrodes.

      “Each participant had a different combination of electrodes which were used in the analyses; however, the same three channels were used across sessions within a participant (participant 1: POz, PO3, O1; participant 2: P7, PO7, PO4; participant 3: P2, P1, Pz; participant 4: O1, Oz, O2; participant 5: O2, PO8, PO4; participant 6: Oz, O2, O1).”

      As an alternative approach, linear mixed models (LMM) could be used for statistics, as they are more suitable for small sample sizes (Wiley et al., 2019). LMM improve generalization by modelling subject-specific random effects. Although raw circular data is not suitable for LMM, the sine and cosine of the phases could have been used as predictors, for instance. Given that data were collected for 6 different sessions, sessions could be included as a factor in the model to improve statistical power.

      We appreciate the suggestion but feel that LMMs would be a challenge in this case not only because the main predictor variables are circular, but because the main outcome variables are not defined on the single-trial level and require many trials to be computed (e.g., classification images, SDT measures, response consistency). As such, computing these measures within a session may also lead to noisier estimates than we had designed our experiment for. We therefore prefer the more straightforward approach we have taken in the paper, which has now been supplemented by a binomial test of individual-subject level significance.

      Given that the number of subjects is quite small, I believe that individual data should be presented (either in the main text or supplementary materials) also for figures: 2A, B, C and D.

      Thank you, we have included all of these results in the individual graphs in the Supplemental Materials (see Supplementary figure 3).

      In plot 2B (HR and FAR) a p-value = 0.015 appears. However, in the text you write:

      "Indeed, this showed that the difference between the HR and FAR vector angle was significantly clustered around a mean of 180° (v = 3.78, p = 0.01), indicating that the phase angle associated with the greatest hits was counterphase to the phase angle associated with the greatest false alarms."

      Which one is correct? Or do they refer to different tests?

      We appreciate you catching this confusing discrepancy. The two values refer to the same test which has a p-value of 0.0145. In the figure, this value was rounded to the thousandths decimal place (i.e., 0.015), whereas in the text it was rounded to the hundredths value (0.01). We now consistently report p-values out to three decimal places throughout the manuscript.
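
      For readers who want to see how a V statistic maps onto a p-value, the sketch below implements the V-test with its usual normal approximation. It is an illustration, not the authors' analysis code: the function names are hypothetical, and the sample size of 6 (one HR-FAR angle difference per participant) is an assumption based on the number of participants.

```python
import math

def v_to_p(v, n):
    """One-sided p-value for a V statistic via the normal approximation."""
    u = v * math.sqrt(2.0 / n)
    return 0.5 * math.erfc(u / math.sqrt(2.0))  # equals 1 - Phi(u)

def v_test(angles, mu0):
    """V-test for clustering of angles (radians) around a known direction mu0.

    V = n * Rbar * cos(mean_angle - mu0), written here in terms of the
    summed cosine and sine components.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    v = c * math.cos(mu0) + s * math.sin(mu0)
    return v, v_to_p(v, n)

# With the reported v = 3.78 and an assumed n = 6, the approximation
# gives a value close to the 0.0145 mentioned in the response.
p_reported = v_to_p(3.78, 6)

# Perfect clustering at 180 degrees gives the maximum V (equal to n).
v_max, p_max = v_test([math.pi] * 6, math.pi)
```

      Under these assumptions the approximation reproduces the unrounded p-value quoted in the response, which is consistent with both reported values (0.015 and 0.01) coming from one test.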

      Did you perform any statistical test for phasic modulation of dprime and criterion? I say that because in Figure 2B, you state that the data shows a "clear phasic modulation of d' across bins", but no statistic is mentioned. On the other hand, in Figure 2D, you state, "We did not observe any significant phase-dependent relationship between phase and criterion." Is this sentence referring to both 2C and 2D panels or only to 2C?

      Figures 2B and 2D show the phase-behavior relationship across bins after aligning the phase bins to each participant's “best” d’ bin. This bin is omitted from the plots because it is used for alignment, which would make any analysis of it circular. Accordingly, these panels were intended purely for visualization and were not used for statistical inference. Additional language has been added to the figure caption highlighting this aspect.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and are averaged across -450 ms to stimulus onset. This graph is for visualization purposes only.”

      The primary statistical test for phase-behavior coupling was performed using permutation testing of the resultant vector length, which quantifies the magnitude of phase-dependent modulation. These results are shown in Figures 2A (for d′) and 2C (for criterion). In the original manuscript, we reported only the time points that survived cluster-based correction, but did not explicitly report the cluster p-values. We have now added these cluster p-values to the manuscript for completeness.

      “The data revealed significant cluster-corrected coupling between alpha phase and d’ in the prestimulus window from -220 ms until stimulus onset (cluster p = 0.046),...”

      Additionally, we have changed the caption of Figure 2 to be separate for C) and D).

      “(C) No evidence for the coupling of criterion to pre-stimulus alpha-band phase. Graph C reveals the time course of the resultant vector lengths for alpha phase-criterion coupling, which shows no significant phase-dependent relationship between phase and criterion.

      (D) The underlying shifted c across phase bins (shifted to participants’ optimal phase, as in graph B) did not visually demonstrate a phasic modulation pattern.”

      Minor issues:

      In general, the paper is very clear. I found a statement confusing in the Response consistency section:

      "To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. This procedure was done for each channel at each time point (from -450 to 0 ms) and then averaged."

      Which makes no sense, as response consistency is independent of channel and time point. I believe here you refer to the phase, maybe by just changing the order (start with response consistency and then proceed to phase), the paragraph would be clearer.

      We appreciate you catching this mistake. We have clarified the Methods section in the following way:

      “To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. Since the optimal phase changes over time, each pair of trials was classified, for each time point (from -450 to 0 ms) and channel, according to whether both trials had occurred during the optimal phase or not. The proportion of consistent responses was then averaged across channels and time.”
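
      For a single channel and time point, the computation described in this excerpt can be sketched as below. The data layout (one tuple per pair of identical trials) and the function name are hypothetical; in the procedure described above the same calculation would be repeated for each channel and time point and the resulting proportions averaged.

```python
def response_consistency(pairs):
    """Proportion of identical-trial pairs that received the same response.

    pairs: list of (response_first, response_second, both_in_optimal_phase).
    Returns (consistency when both trials fell in the optimal phase,
             consistency for all other pairs).
    """
    def proportion_same(subset):
        if not subset:
            return float("nan")
        return sum(a == b for a, b, _ in subset) / len(subset)

    optimal = [p for p in pairs if p[2]]
    other = [p for p in pairs if not p[2]]
    return proportion_same(optimal), proportion_same(other)

# Made-up responses (1 = "yes", 0 = "no") for five repeated pairs.
example_pairs = [
    (1, 1, True), (1, 0, True), (0, 0, True),
    (1, 1, False), (0, 1, False),
]
optimal_consistency, other_consistency = response_consistency(example_pairs)
```

      The response itself does not depend on channel or time; only the optimal-phase label does, which is the point the reviewer raised.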

      Could you include a plot of the power spectrum used for IAF estimation of all the subjects?

      Thank you for the suggestion. In Supplemental Figure 3 we have included the power spectrum that was used to estimate IAF in addition to a topoplot of alpha power (IAF +/- 2 Hz) that has the analysis electrodes labelled.

      Bibliography:

      Wiley RW, Rapp B. Statistical analysis in Small-N Designs: using linear mixed-effects modeling for evaluating intervention effectiveness. Aphasiology. 2019;33(1):1-30. doi: 10.1080/02687038.2018.1454884.

      Zoefel B, Davis MH, Valente G, Riecke L, How to test for phasic modulation of neural and behavioural responses, NeuroImage, Volume 202, 2019,116175, https://doi.org/10.1016/j.neuroimage.2019.116175.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Despite this compelling data regarding the protective role of HSF1 in the febrile response, what remains unexplained and complicates the authors' model is the observation that losing LvHSF1 at 'normal' temperatures of 25 ℃ is not detrimental to survival, even though viral loads increase and nSWD is likely still subject to LvHSF1 regulation. These observations suggest that WSSV infection may have other detrimental effects on the cell not reflected by viral load and that LvHSF1 may play additional roles in protecting the organism from these effects of WSSV infection, such as perhaps, perturbations to protein homeostasis. This is worth discussing, especially in light of the rather complicated roles of hormesis in protection from infection, the role of HSF1 in hormesis responses, and the findings from other groups that the authors discuss.

      We are grateful for the reviewer's unbiased advice. We have added a description of the role of HSF1 in hormesis responses to the Discussion (Lines 422-425 of the revised manuscript). Thank you.

      Reviewer #2 (Public review):

      Temperature is a critical factor affecting the progression of viral diseases in vertebrates and invertebrates. In the current study, the authors investigate mechanisms by which high temperatures promote anti-viral resistance in shrimp. They show that high temperatures induce HSF1 expression, which in turn upregulates AMPs. The AMPs target viral envelope proteins and inhibit viral infection/replication. The authors confirm this process in drosophila and suggest that there may be a conserved mechanism of high-temperature mediated anti-viral response in arthropods. These findings will enhance our understanding of how high temperature improves resistance to viral infection in animals.

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be clarified and extended. Further investigation on how WSSV infection is affected by AMP would have strengthened the study.

      We are grateful for the reviewer's unbiased advice. We have provided additional experimental evidence and supplementary explanations in the revised manuscript. Thank you.

      Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      We are grateful for the reviewer's positive comments and unbiased advice. We have improved the logical flow of the paper and added corresponding explanations in the revised manuscript. Thank you.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: The analysis compares Group TW to Group W (not the other way around).

      Thank you very much. To uncover the molecular mechanisms by which high temperature restricts WSSV infection, two shrimp groups, Group TW and Group W, were cultured at 25 °C. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing (Figure 1A). RNA-seq was used to identify genes responsive to high temperature, particularly those encoding potential transcriptional regulators. Thank you.

      (2) The RNA-seq data in Figure 1 focus only on the TFs. The manuscript would benefit from showing all the RNA-seq data and the differentially expressed genes. In particular, are the AMPs upregulated at the same time point? This should not be the case if LvHSF1 were responsible for the transcription of the AMPs, given the time lag between transcription and translation.

      Thank you for your suggestion. As shown in Author response image 1, our previous study revealed that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in the RNA-seq comparison between Group TW and Group W, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024).

      Additionally, we analyzed AMP expression between Group TW and Group W, and the results show that some antimicrobial peptides, such as Lysozyme and C-type lectin, are upregulated between Group TW and Group W. Notably, we did not detect upregulated expression of SWD between Group TW and Group W. We agree with the reviewer that there is a time lag between transcription and translation. Supplementary experimental evidence shows that the expression of LvHSF1 is strongly induced by WSSV stimulation, after which the expression of SWD begins to increase. We have added a description in Lines 136-138 of the revised manuscript.

      Author response image 1.

      The Figure of the heat shock proteins in Group TW and Group W

      Author response image 2.

      Transcriptional expression levels of HSF1 and SWD after WSSV stimulation

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (3) The data showing the tissue distribution of LvHSF1 and nSWD is a rigorous approach and adds to the manuscript. A similar approach to understanding the time course of expression of AMPs in relationship to LvHSF1 expression levels would strengthen the authors' conclusions that LvHSF1 induction in response to high temperatures and viral infection, in turn, upregulates SWD and other antibacterial genes.

      Thank you for your suggestion. Following it, we measured the transcriptional expression levels of HSF1 and SWD at 0, 2, 4, 6, 8, 12, 16, 20, and 24 hours after WSSV stimulation. The transcriptional expression level of SWD was set to 1.00 at 0 h. In the early stage of WSSV infection (0-12 h, except 6 h), the expression of LvHSF1 was strongly induced, after which the expression of SWD began to increase. These results show that LvHSF1 induction in response to viral infection, in turn, upregulates SWD and other antibacterial genes. Thank you.

      (4) The data (Figures 3 and 4) showing that LvHSF1 is necessary to survive WSSV infection at high temperatures but does not affect survival at lower temperatures, even though LvHSF1 limits VP28 levels and viral load at both temperatures, are confusing. Does this suggest that LvHSF1 is not primarily important for protection against the virus but instead for protection from the heat-induced damage caused by high temperatures, which would not be surprising? The manuscript would benefit if the authors could address this point. How do the authors envision the protection conferred by LvHSF1 only at high temperatures?

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection.

      Notably, the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. When infected with WSSV, shrimp use behavioral fever to elevate their body temperature (~32 °C), thereby inhibiting WSSV infection (Rakhshaninejad et al., 2023; Xiao et al., 2024). This temperature (~32 °C) does not cause heat-induced damage to the shrimp. Our results demonstrate that febrile temperatures induce HSF1, which in turn upregulates antimicrobial peptides (AMPs) that target viral envelope proteins and inhibit viral replication.

      Only at high temperatures did we observe that knockdown of HSF1 affected the shrimp survival rate (Figure 4A). Thank you again for your valuable feedback.

      Reference:

      Rakhshaninejad, M., Zheng, L., Nauwynck, H., 2023. Shrimp (Penaeus vannamei) survive white spot syndrome virus infection by behavioral fever. Sci Rep 13, 18034.

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (5) Related to the previous comment, the authors do not clearly distinguish between basal effects of LvHSF1 or nSWD induction and heat-induced effects and the differences related to the requirement of LvHSF1 for protection. Simply increasing LvHSF1 levels can result in increased nSWD. SWD levels increase upon WSSV infection even at 25 ℃, and the knockdown experiments suggest that this could also occur through LvHSF1. It would be useful to explicitly differentiate between basal functions of HSF1 and induced functions.

      Thank you for your suggestion. In previous responses, we have distinguished between basal effects of LvHSF1 or nSWD induction and heat-induced effects.

      Following your suggestion, we injected GST or rHSF1 protein into shrimp; the results showed that recombinant HSF1 protein significantly induced the expression of SWD (Supplementary Fig. 5C). Further, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results as Supplementary Figures 5C and 5D and added descriptions in Lines 253-255 and Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Two temperatures are used in the experiments of shrimp. It seems that HSF1 is also upregulated by WSSV infection at 25 ℃. However, this upregulation seems not to be able to protect the animals. The authors compare the infection at 25 and 32 ℃ but did not discuss the findings.

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection. We have added a discussion of this finding in Lines 461-464 in the revised manuscript. Thank you.

      (2) In the abstract the authors say that "These insights provide new avenues for managing viral infections in aquaculture and other settings by leveraging environmental temperature control." However, this point has not been discussed in the main text.

      We appreciate your comments. We have added a discussion of environmental temperature control in Lines 512-514 of the revised manuscript. Thank you.

      (3) Line 142: "These results suggest that LvHSF1 may play a key role in enhancing shrimp resistance to WSSV at elevated temperatures." Although this type of conclusion has been made in many studies, I think it is impossible to see a "KEY role" based mainly on change in expression.

      Thank you for your suggestion. We have revised this conclusion in the revised manuscript. Thank you.

      (4) Section 2.1 Induction of Heat Shock Factor 1 in Response to WSSV at High Temperature

      Figure 1. Identification of HSF1 as a key factor induced by high temperature.

      The two titles are confusing. Whether the upregulation of HSF1 is a response to high temperature or WSSV infection? I think it is more likely a response to high temperature. Did the authors see the difference in HSF1 expression in shrimp with and without WSSV infection at high temperatures?

      Thank you for your comment. We have modified the title of Section 2.1 in the revised manuscript. Following your suggestion, we measured the expression of LvHSF1 after WSSV challenge at high temperature (32 ℃); see the revised Figure 2F-2H (Line 122 of the revised manuscript). The results demonstrate that the expression of LvHSF1 is strongly induced by WSSV stimulation at high temperature (32 ℃). Thank you.

      (5) Figure 2. Upregulation of LvHSF1 in shrimp challenged by WSSV at both low and high temperatures. Results for WSSV challenge at high temperatures are not included in this figure.

      Thank you for your suggestion. Following it, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃); see the revised Figure 2C-2H. The results demonstrate that the expression of LvHSF1 is strongly induced by Poly (I: C) and WSSV stimulation at high temperature (32 ℃). We have added a description in Lines 168-179 of the revised manuscript. Thank you.

      (6) Section 2.2 Expression Profiles of LvHSF1 in Shrimp Under Varied Temperature Conditions and WSSV Challenge. Did the authors try poly IC and WSSV challenge at 32 ℃, and compare with the unchallenged group? Why was only low temperature analyzed?

      Thank you for your suggestion. Following it, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃); see the revised Figure 2C-2H. We have also added a description of these results in Lines 168-179 of the revised manuscript. Thank you.

      (7) Figure 2: Please indicate the temperature used in C-E and F-H in the figure legend. Statistical significance: compared with which group? Please provide information in the legend or show it in the bar chart.

      Thank you for your suggestion. We have added the temperature used to the revised Figures 2C-2E. The expression changes of HSF1 were compared with those of the PBS control group at the corresponding time point, and we have modified how significance is indicated in the revised Figures 2C-2E. Thank you.

      (8) Figure 3H: There are two groups (dsGFP+PBS; dsHSF1+PBS) showing with the same symbol (dot line).

      Thank you for your comment. The revised Figure 3H now uses different symbols to distinguish the two groups. Thank you.

      (9) Line 205: qPCR

      Thank you for your careful checks. We have corrected this error in the revised manuscript. Thank you.

      (10) Figure 5d and f: Please indicate the sample in each row.

      Thank you for your suggestion. We have marked the samples in each row in the revised Figures 5d&5f.

      (11) Figure 3 and Figure 4: Why were different tissues analyzed in the two experiments? Low temperature: gill and hemocytes. High temperature: gill and muscle? It is better to use the same tissues so that they can be compared. Please indicate the tissue analyzed in D and d.

      Thank you for your suggestion. We have repeated the experiment to measure the WSSV copy number in hemocytes at high temperature (32 °C) after LvHSF1 knockdown. The results showed that LvHSF1 knockdown increased viral loads in shrimp hemocytes (Figure 4C). We have added the tissue information to Figure 4D&4d. Thank you.

      (12) Figure 2A The time for temperature treatment? hours or days?

      Thank you for your comment. The transcriptional expression of LvHSF1 was measured in different tissues of healthy shrimp subjected to low (25 °C) and high (32 °C) temperatures for 12 hours. We have added this information to the legend of Figure 2A in Lines 840-841 of the revised manuscript. Thank you.

      (13) Line 249: purified by SDS-PAGE gel?

      Thank you for your comment. We have modified this description in Lines 272-274 in current manuscript. Thank you.

      (14) Line 258 "Next, to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperature". I think it is confusing to use "mediated" here. It seems that HSF1 is downstream of nSWD. Actually, HSF1 controls the expression of nSWD and thus regulates the anti-WSSV effect of shrimp at high temperatures.

      We appreciated your comments. We have modified this description in Lines 282-283 in current manuscript. Thank you.

      (15) Line 458 "The most probable anti-WSSV mechanism of nSWD is its direct interaction with WSSV envelope proteins VP24 and VP26, potentially inhibiting viral entry into target cells." I suggest the author analyze the entry of WSSV to see whether nSWD blocks this process.

      Thank you for your comment. In general, the antimicrobial mechanism of action of AMPs is thought to involve direct membrane disruption, especially for enveloped viruses (such as WSSV) (Wilson et al., 2013).

      Thanks to the reviewers for their valuable comments. Our manuscript mainly focuses on the febrile temperature-inducible HSF in host antiviral immunity, and the role of HSF1 in regulating antimicrobial effectors (such as SWD). Due to the limitation of the manuscript's length, we will further investigate the functional mechanisms of SWD-specific anti-WSSV in future studies. Thank you.

      Reference:

      Wilson, S.S., Wiens, M.E., Smith, J.G., 2013. Antiviral Mechanisms of Human Defensins. Journal of Molecular Biology 425, 4965-4980.

      (16) Line 435-456 The author discusses the difference between two shrimp species. Did the two studies measure the same immune parameters? I wonder whether the different observation is due to true differences or different methods they used to evaluate the response. If no immune response was promoted in the previous study, what's the possible anti-viral mechanism?

      We appreciated your comments. Firstly, the shrimp species in the two studies differ in their adaptability to temperature. The optimal water temperature for M. japonicus growth ranges from 25 to 32 °C, and the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. Secondly, the environmental factors tested differ between the two studies. Ammonia is a key stress factor in aquatic environments that usually increases the risk of pathogenic diseases in aquatic animals, whereas high temperatures (32 °C) have been shown to inhibit the replication of WSSV and reduce mortality in WSSV-infected shrimp. Thirdly, the two studies tested different immune indicators. Ammonia-induced Hsf1 suppressed the production and function of MjVago-L, an arthropod interferon analog. In this study, our findings revealed the molecular mechanism through which the HSF-AMPs axis mediates host resistance to viruses induced by febrile temperature. Taken together, the benefits of HSF1 can be attributed to either the host or the pathogen, depending on the nature and context of the host-virus-environment interaction.

      (17) Line 472 "directly bind to WSSV envelope proteins and inhibit WSSV proliferation"

      I think it is confusing to use "proliferation" here. It seems that the binding of HSF affects the replication process. However, based on the authors' discussion, HSF may likely block viral entry.

      Thank you for your suggestion. We have modified this description in Lines 505-507 in the current manuscript. Thank you.

      Reviewer #3 (Recommendations for the authors):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved. Following are my specific concerns.

      Major comments

      (1) The study design is pretty good, but the logical flow is not. The following should be improved.

      (a) In Figure 1, the reason for selecting HSF1 as the focus of the study is not clearly explained.

      Thank you for your comment. In a previous study, we revealed that heat shock proteins play a significant role in enhancing the resistance of shrimp to WSSV at elevated temperature (32 ℃) (Xiao et al., 2024). GO functional enrichment analysis of DEGs between group TW and group W indicated that most DEGs were involved in biological processes such as protein refolding, chaperone-mediated protein folding, and heat response. Therefore, special attention has been paid to heat shock factor 1 (HSF1), the master regulator of the heat shock response. We have added the description in Lines 136-138 in the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (b) As the authors draw models in Figure 9, the established activation mechanism of HSF1 is via trimerization by the release of HSP90, which binds to misfolded proteins under stress conditions, such as heat shock. Therefore, the increase in the HSF1 mRNA level in Figure 1 is strange. The authors need to clarify this issue by explaining this established activation mechanism of HSF1 and also must provide the basis of upregulation of HSF1 by mRNA increase via citing papers in the Introduction.

      We appreciated your comments. Under non-stress conditions, HSF monomers are retained in the cytoplasm in a complex with HSP90. During a stress response, such as to high temperature, HSF dissociates from the complex, trimerizes, and converts into a DNA-binding conformation that recognizes regulatory upstream promoter elements known as heat shock elements (HSEs) (Andrasi et al., 2021). Previous studies have demonstrated that the expression of HSF1 is markedly induced by stresses such as high temperature (Ren et al., 2025), virus infection (Merkling et al., 2015), and ammonia stress (Wang et al., 2024). Our results also showed that the expression of LvHSF1 was significantly induced by WSSV infection and high temperature (Figure 2). Therefore, the increase in the HSF1 mRNA level in Figure 1 is not surprising.

      In response, we have revised the proposed model to better reflect our experimental findings and the accompanying description. This revision ensures that the schematic is consistent with our data and accurately represents the proposed mechanism. We appreciate your careful review and constructive feedback.

      Reference:

      Andrasi, N., Pettko-Szandtner, A., Szabados, L., 2021. Diversity of plant heat shock factors: regulation, interactions, and functions. J Exp Bot 72, 1558-1575.

      Ren, Q., Li, L., Liu, L., Li, J., Shi, C., Sun, Y., Yao, X., Hou, Z., Xiang, S., 2025. The molecular mechanism of temperature-dependent phase separation of heat shock factor 1. Nature Chemical Biology.

      Merkling, S.H., Overheul, G.J., van Mierlo, J.T., Arends, D., Gilissen, C., van Rij, R.P., 2015. The heat shock response restricts virus infection in Drosophila. Sci Rep 5, 12758.

      Wang, X.X., Zhang, H., Gao, J., Wang, X.W., 2024. Ammonia stress-induced heat shock factor 1 enhances white spot syndrome virus infection by targeting the interferon-like system in shrimp. mBio 15, e0313623.

      (c) For RNA seq analysis in both in Figures 1 and 5, they need to provide changes in conventional HSF1 target chaperones (many HSPs) to validate their RNA seq data.

      Thank you for your suggestion. As shown in Author response image 1, our previous study revealed by RNA-seq that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in Group TW relative to Group W, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024). We have added the description in Lines 136-138 in the revised manuscript.

      For Figure 5, we have added to Supplementary table 2 the heat shock protein genes among the downregulated DEGs identified by transcriptome sequencing of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃). The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 in the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (d) In Figure 5, they did experiments by focusing on the changes by HSF1 knockdown at 32 ℃. However, the logical flow should be focusing on genes whose expression was increased by 32 ℃ compared with 25 ℃ (in figure 1), among them they need to characterize HSF1 target genes. Here as mentioned above, classical HSP genes must be included in addition to those AMP genes.

      Thank you for your suggestion. Following your suggestion, we have added to Supplementary table 2 the heat shock protein genes among the downregulated DEGs identified by transcriptome sequencing of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃). The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 in the revised manuscript. Thank you.

      (e) What is the logical basis of just picking nSWD? It is another example of cherry-picking similar to picking HSF1 in Figure 1.

      We appreciated your comments. To determine how temperature-induced LvHSF1 restricts WSSV infection, RNA-seq was performed to identify target genes regulated by HSF1. By analyzing the differentially expressed genes (DEGs), we screened eight candidate proteins for immunity-effector molecules, including SWD, CrustinⅠ, C-type lectin, Anti-lipopolysaccharide factor (ALF), and Vago. CrustinⅠ has been shown to play an important role in antiviral immunity (Li et al., 2020); C-type lectin (CTL1) can bind to VP28, VP26, VP24, VP19, and VP14, thereby inhibiting WSSV infection (Zhao et al., 2009); Anti-lipopolysaccharide factor (ALF3) performs its anti-WSSV activity by binding to the envelope protein WSSV189 (Methatham et al., 2017); Vago can inhibit WSSV infection by activating the Jak/Stat pathway in shrimp (Gao et al., 2021). However, the detailed regulatory mechanism of SWD against WSSV was unclear, so particular attention was paid to SWD. We have added the description in Lines 215-220 in the revised manuscript. Thank you for your valuable comments; the logic of the manuscript has been improved.

      Reference:

      Li, S., Lv, X., Yu, Y., Zhang, X., Li, F., 2020. Molecular and Functional Diversity of Crustin-Like Genes in the Shrimp Litopenaeus vannamei. Marine Drugs 18, 361.

      Zhao, Z.Y., Yin, Z.X., Xu, X.P., Weng, S.P., Rao, X.Y., Dai, Z.X., Luo, Y.W., Yang, G., Li, Z.S., Guan, H.J., Li, S.D., Chan, S.M., Yu, X.Q., He, J.G., 2009. A novel C-type lectin from the shrimp Litopenaeus vannamei possesses anti-white spot syndrome virus activity. Journal of Virology 83, 347-356.

      Methatham, T., Boonchuen, P., Jaree, P., Tassanakajon, A., Somboonwiwat, K., 2017. Antiviral action of the antimicrobial peptide ALFPm3 from Penaeus monodon against white spot syndrome virus. Dev Comp Immunol 69, 23-32.

      Gao, J., Zhao, B.R., Zhang, H., You, Y.L., Li, F., Wang, X.W., 2021. Interferon functional analog activates antiviral Jak/Stat signaling through integrin in an arthropod. Cell Rep 36, 109761.

      (f) Likewise, choosing Atta in S2 cells needs logic.

      We appreciated your comments. Our manuscript revealed that febrile temperature-inducible HSF1 confers virus resistance by regulating the expression of antimicrobial peptides (AMPs) in L. vannamei. We then wanted to know whether HSF1 regulation of antimicrobial peptides is a conserved defense mechanism induced by elevated temperature in arthropods, so experiments were performed in an invertebrate model system (Drosophila S2 cells). A previous study showed that DmAMPs (such as Attacin A, Cecropins A, Defensin, Metchnikowin, and Drosomycin) play a significant role in antiviral immunity in Drosophila (Zhu et al., 2013). Our results showed that the expression of Attacin A, Cecropins A and Defensin was markedly induced by DmHSF, and that Attacin A was the most strongly induced. Therefore, DmAtta was chosen as a representative to further demonstrate that DmHSF1 exerts its anti-DCV function by regulating DmAMPs. We have added the description in Lines 328-330 and Lines 361-364 in the revised manuscript. Thank you for your valuable comments; the logic of the manuscript has been improved.

      Reference:

      Zhu, F., Ding, H., Zhu, B., 2013. Transcriptional profiling of Drosophila S2 cells in early response to Drosophila C virus. Virol J 10, 210.

      (2) From Figure 6I to 6K, the authors aimed to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperatures. However, what they showed was just showing that nSWD plays anti-WSSV function downstream of HSF1. The authors should show additional data for dsControl+rnSWD.

      Thank you for your suggestion. Following your suggestion, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figure 5C&5D and added a description in Lines 290-293 in the revised manuscript. Thank you for your constructive comments.

      (3) For the physical interaction between nSWD and WSSV, it will be great if the authors perform Alphafold3 prediction analysis (Abramson et al PMID: 38718835).

      Thank you for your suggestion. Following your suggestion, we performed Alphafold3 prediction analysis on SWD and the WSSV envelope proteins VP24 and VP26. The predicted template modeling (pTM) score measures the accuracy of the entire structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure. The Alphafold3 prediction results suggest a possible interaction between SWD and WSSV. Notably, our manuscript demonstrated that rSWD could interact with VP24 and VP26 by pulldown assays and confocal analysis.

      Author response image 3.

      Alphafold3 prediction analysis of SWD&VP24 (pTM = 0.64)

      Author response image 4.

      Alphafold3 prediction analysis of SWD&VP26 (pTM = 0.53)
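
      The pTM criterion quoted in this response can be checked against the two reported scores with a minimal sketch. The 0.5 threshold and the values 0.64 and 0.53 come from the response; the function and variable names are hypothetical and introduced only for illustration.

```python
# Threshold quoted in the response: a pTM above 0.5 suggests the overall
# predicted fold of the complex might be similar to the true structure.
PTM_THRESHOLD = 0.5

# pTM values reported in Author response images 3 and 4.
# The dictionary keys are illustrative labels, not identifiers from the paper.
reported_ptm = {"SWD-VP24": 0.64, "SWD-VP26": 0.53}


def passes_ptm(ptm: float, threshold: float = PTM_THRESHOLD) -> bool:
    """Return True when the pTM score is strictly above the threshold."""
    return ptm > threshold


def margin(ptm: float, threshold: float = PTM_THRESHOLD) -> float:
    """Distance of the score above (positive) or below (negative) the threshold."""
    return round(ptm - threshold, 2)


for name, ptm in reported_ptm.items():
    print(name, passes_ptm(ptm), margin(ptm))
```

      Both predictions clear the threshold, although the SWD&VP26 score does so by a small margin. Because pTM summarises the whole predicted complex, it is read here as consistent with the pulldown and confocal evidence and not as independent proof of binding.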

      Minor comments

      (1) In the Abstract and many other places, the authors need to specifically write "Drosophila S2 cells" instead of "Drosophila" because conventionally Drosophila implies fruit fly as an organism. We don't say cultured human cells as "human" or "Homo sapiens" in papers.

      Thank you for your suggestion. We have modified the description of Drosophila in the revised manuscript. Thank you.

      (2) Figure numbers can be reduced for better readability. I would combine Figures 1 and 2, and Figures 3 and 4. If the combined figures are too crowded, some can go into supplementary figures.

      Thank you for your suggestion. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. However, we have added some experimental data to Figures 1, 2, 3, and 4. Therefore, we did not combine Figure 1 and Figure 2, and Figures 3 and 4. Thank you.

      (3) One of the best-understood roles of HSF1 in physiology other than heat shock response is longevity, in particular with C. elegans. The authors need to mention this in the Discussion by citing the following recent review paper (Lee PMID: 36380728).

      Thank you for your suggestion. We have supplemented the description of HSF1 regulating longevity and aging of organisms and cited the above reference in the revised manuscript (Lee and Lee, 2022). Thank you.

      Reference:

      Lee, H., Lee, S.V., 2022. Recent Progress in Regulation of Aging by Insulin/IGF-1 Signaling in Caenorhabditis elegans. Mol Cells 45, 763-770.

      (4) Please make your own label for small letter panels or transfer small letter panels to supplementary figures.

      Thank you for your suggestion. We have adjusted the relevant letter labels. The uppercase letters represent the main image of the Figure, and the small letter panels are the corresponding supplementary instructions in the revised manuscript. Thank you.

      (5) In the introduction part, I recommend changing the references for HSFs and HSR with recent ones.

      Thank you for your suggestion. We have added the latest references for HSFs and HSR in the Introduction part of the revised manuscript. Thank you.

      (6) In Figure 1, it is not intuitive to understand the name groups W and TW.

      We appreciated your comments. We have added the description of Group W and Group TW in revised Figure 1. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing. Thank you.

      (7) Please add some kinds of sequence comparisons of SWD and nSWD for readers to understand the homology.

      We appreciated your comments. We have added the multiple sequence alignment of SWD proteins in shrimp species in revised Supplementary Figure 3. Highly conserved amino acid residues and cysteine residues are highlighted in red, indicating that LvSWD is a conserved antimicrobial peptide of the Crustin family. Thank you.

      (8) Naming nSWD with "newly identified" is strange as it will not be new anymore as time goes by. Please change the name.

      Thank you for your suggestion. We have modified the name of nSWD to SWD in the revised manuscript. Thank you.

      (9) Please write the full name for Lv (Litopenaeus vannamei), Dm (Drosophila melanogaster), ds (double-stranded) before using LvHSF1, DmHSF1, and dsLvHSF1.

      Thank you for your comments. We have added the full name of LvHSF1, DmHSF1, and dsLvHSF1 in the revised manuscript. Thank you.

      (10) In Figure 2, it will be better to transfer poly I:C data to supplementary figures.

      Thank you for your comments. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. Thank you.

      (11) The label for pGL3-nSWD-M12 is confusing. M1 and M2 are OK. Please change M12 with M1/2 or another one.

      Thank you for your suggestion. We have changed pGL3-nSWD-M12 with pGL3-nSWD-M1/2 in the revised manuscript. Thank you.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This article presents useful findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims, while the designs of experiments offer incomplete support for the current claims as they rely on one population under extreme conditions for only one year each while a confounding effect (time in a chamber) sometimes lacks a control.

      We thank the editor and reviewers for their consideration of our revised manuscript and for their constructive suggestions. In response to the editor’s guidance, we have ensured that: 1) the experimental design is clearly presented as physiological forcing, 2) the Solstice-as-Phenology-Switch concept is explicitly defined, limited, and framed as inferred, 3) conclusions are strictly aligned with the scope of the evidence, and limitations are acknowledged transparently.

      We hope these revisions fully address the remaining concerns and clarify both the conceptual framework and the appropriate scope of inference.

      Public Review:

      Reviewer #1 (Public review):

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      We agree that the concept of “flexibility” required clearer definition and a more explicit link to the experimental results. In the Introduction, we now explicitly define flexibility as the capacity for the effective timing of the phenological switch to shift earlier or later depending on developmental progression, rather than occurring at a fixed calendar date. This switch occurs at the compensatory point between the antagonistic influences of early-season development [ESD effect] and late-season temperature [LST effect](L92-98). We have extended and clarified our explanation of the summer solstice’s role in this framework (L69-90). We propose that the solstice acts as an environmental switch that initiates the LST effect, as declining daylengths signal trees to become responsive to late-season cooling (L92-94). The compensatory point then occurs where the advancing ESD effect is balanced by the delaying LST effect. This point should therefore not be fixed to a calendar date but instead vary with developmental progression each year (L75-95).

      In the Discussion, we clarify that flexibility is demonstrated experimentally by the observation that the magnitude of July cooling effects (LST effect) on autumn phenology depends on prior developmental rate (ESD effect) [3.4 times greater delay in late-leafing trees], indicating that the position of the compensatory point is development-dependent rather than fixed to June 21 (L398-410). We have made consistent edits throughout the Discussion, in particular in the ‘Support for the Solstice-as-Phenology-Switch Hypothesis’ subsection (L514-530).

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      We fully agree and have clarified this in the revised manuscript. In the Discussion, we now clearly state that the compensatory point is a conceptual node inferred from responses to cooling before the solstice (June), directly after it (July), or later in the growing season (August) rather than a directly observed phenological event (L352-358 & L405-406).

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency.

      This inconsistency reflects biological complexity. In the Discussion, we now expand our interpretation to note that terminal and lateral buds may differ in developmental status, resource allocation and hormonal context. We emphasize that bud-type effects are therefore expected to be context-dependent and to interact with whole-plant developmental state, which plausibly explains why effects differ across leafing groups and models (L390-396).

      In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

      We now discuss the explained and unexplained variance in more detail. We also make it clear that our experiment was designed to test specific mechanistic pathways rather than to fully explain all phenological variability or maximise predictive power (L417-419).

      In the Discussion, we acknowledge that a substantial fraction of variation remains unexplained (L419-421). We discuss the possibility of other physiological mechanisms, such as photosynthetic assimilation, contributing to the unexplained variation (L421-427). However, large inter-individual variability is commonplace in autumn phenology. A low intra-class correlation coefficient (ICC = 0.26; see L276-280 for methods) suggests much of the remaining variation is attributable to individual-level differences rather than missing explanatory variables (L429-431). In line with the literature, we suggest that genetic and epigenetic differences likely contributed significantly to inter-individual variation, even within a single provenance population (L431-434). In this context of high individual variability, leaf-out timing (ESD effect) and summer cooling treatment (LST effect) together explaining 23.4% of variation in bud set timing is biologically meaningful and demonstrates the mechanistic importance of these processes (L438-441). For completeness, we also briefly discuss alternate sources of within-treatment variability (L434-437).

      Reviewer #2 (Public review):

      I think the experiments are interesting, but I found the exact methods of them somewhat extreme compared to how the authors present them.

      We appreciate this concern and have substantially revised the manuscript to clarify the experimental logic. In the Introduction, we now state explicitly that the study uses temperature regimes that were designed as strong physiological forcing treatments, intended to deeply constrain development and isolate mechanisms rather than to simulate natural or future climatic conditions (L113-115).

      In the Methods, we have enhanced our description of the non-linear effects of temperatures below 10°C on physiological processes (L154-158).

      At the start of the Discussion, we have added a dedicated paragraph clarifying the scope of inference: the experiment tests causality and constraint (i.e. whether specific physiological processes can drive phenological shifts), not quantitative responses under realistic climate scenarios (L346-363). Throughout the Discussion, we have revised language that could be read as scenario-based interpretation, replacing it with mechanistic phrasing.

      Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

      Given the large individual variation expected in phenological experiments, we used single experimental populations of single-provenance beech saplings to minimise uncontrolled variation arising from genetic differences (L358-360). This allowed us to elucidate mechanisms despite the noisy biological heterogeneity associated with phenology.

      In the last round of revision, we toned down statements of generalisation. In the Discussion, we now go further to clarify what mechanistic understanding can be gleaned directly from our findings and then cautiously suggest how these mechanisms may play out in natural systems. We repeatedly state the intention of the study as mechanistic inference rather than predictive power, e.g. “However, extrapolations to more complex natural ecosystems should be made with caution as our experimental design prioritised mechanistic inference over generalisability and predictive power.” (L417-419). Alongside our previous calls for tests on other species, we now additionally call for tests on other provenances of beech (L511-512).

      I was also very concerned by the revisions.

      If this concern stems from the confusion regarding line numbers and the two submitted versions of the manuscript (with tracked changes and without tracked changes; as required by eLife), then we hope that situation is now clarified. Otherwise, we do not understand why our previous revisions would be perceived as concerning. Regardless, we have made every attempt to address the remaining comments comprehensively.

      Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      We appreciate this important conceptual point. The Solstice-as-Phenology-Switch hypothesis is central to our conceptual model and therefore requires clear explanation. In concert with our changes in response to Reviewer 1’s comment regarding flexibility, we have substantially revised and improved our description of this hypothesis (L69-108).

      Whilst the summer solstice is fixed to a calendar date (June 21), the timing of when trees change their autumn phenological responses to temperature is not (L88-90 & L515-517). This change occurs when the compensatory point of two antagonistic effects is crossed. Higher early-season development rates (which are driven by temperature) have an advancing (negative) effect on autumn phenology, which we now refer to as the ESD effect (L71-78). Warmer late-season temperatures have a delaying (positive) effect because trees become phenologically susceptible to cooling, i.e. overwintering responses are induced in response to cooling, which we now refer to as the LST effect (L78-82). The point in time when these two effects balance each other out, i.e. the net effect = 0, is the compensatory point (L95-97 & L523-525). This point occurs after the solstice because the LST effect only becomes active when days begin to shorten (L92-94 & L522-523). The solstice acts as an environmental switch, initiating trees’ susceptibility to cooling. Therefore, the solstice is referenced in the hypothesis because it forms a daylength barrier. In this framework, the compensatory point cannot occur earlier than the solstice because day lengths are still increasing (L517-519).
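
      The definition of the compensatory point as the time when the net effect equals zero can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration: the effect functions, their magnitudes and units, and the names `compensatory_point`, `example_esd` and `example_lst` are hypothetical and are not taken from the manuscript. The only elements drawn from the response are the sign conventions (ESD effect negative, LST effect positive), the LST effect being inactive until the solstice, and the search starting no earlier than the solstice.

```python
from typing import Callable, Optional

SOLSTICE_DOY = 172  # June 21 expressed as day of year in a non-leap year


def compensatory_point(
    esd: Callable[[int], float],
    lst: Callable[[int], float],
    start_doy: int = SOLSTICE_DOY,
    end_doy: int = 300,
) -> Optional[int]:
    """Return the first day on which the net effect (ESD + LST) is no longer negative.

    The search starts at the solstice because, in the framework described,
    the compensatory point cannot occur before day lengths begin to shorten.
    Returns None if the two effects never balance within the search window.
    """
    for doy in range(start_doy, end_doy + 1):
        if esd(doy) + lst(doy) >= 0:
            return doy
    return None


def example_esd(doy: int) -> float:
    # Hypothetical advancing (negative) effect with an arbitrary constant size.
    return -2.0


def example_lst(doy: int) -> float:
    # Hypothetical delaying (positive) effect: zero up to the solstice,
    # then growing by an arbitrary 0.5 units per day afterwards.
    return 0.5 * max(0, doy - SOLSTICE_DOY)


print(compensatory_point(example_esd, example_lst))  # 176
```

      With these made-up inputs the two effects balance four days after the solstice. Changing the size of either effect moves that day, which is the sense in which the compensatory point is not tied to June 21 even though the solstice sets its earliest possible date.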

      In the Introduction and Discussion, we clarify that the solstice is referenced as a biologically meaningful photoperiodic cue, not as a fixed threshold date. We now emphasise that the hypothesis concerns a seasonal reversal in responses to temperature structured around photoperiod, whose effective timing depends on developmental state, rather than a reversal occurring precisely on June 21. To avoid confusion, we have reworded phrases such as “summer solstice effect reversal” to “reversal of phenological responses to temperature after the summer solstice” (L371). In accordance, we have also changed the title to “Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice”.

      The following comments stem from the first round of review. We have previously revised the manuscript in accordance with these comments. For most of these points we do not see further cause for changes except for any overlap with comments above. We therefore predominantly copy our previous responses in quotes for clarity, the exception being the comment regarding the framing of our results in relation to natural systems.

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper.

      “We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.”

      The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe in which I have worked. For example a low of 2 deg C at night and 7 deg C during the day through end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We appreciate the reviewer’s concern regarding the use of relatively extreme temperature treatments and the need to ensure that our conclusions are consistent with the motivation for using them. The manuscript was also revised in this regard in the previous round, and we copy the relevant responses at the bottom of this response. Despite this, we agree that further explanation of how our experimental treatments suited the aims of our study was still required.

      The aim of these treatments was not to reproduce typical ambient conditions, but to act as a mechanistic probe. Such mechanisms are not readily identifiable from observations or mild manipulations, because the expected effects are small relative to natural variability; stronger perturbations are therefore required to generate a diagnostic contrast. By strongly constraining development in the early-season, and by providing a robust cooling signal in the late-season, we sought to reveal the causal structure underlying the observed solstice-related reversal in temperature effects on autumn phenology.

      Temperatures below 10°C strongly slow cell division and mitotic rates, which then rapidly and non-linearly approach 0 as temperatures drop towards 0°C (Körner, 2021). As reflected in L152-158 of the revised manuscript, we selected a spring cooling regime of 2–7 °C to strongly slow developmental processes while maintaining a clear thermal safety margin that eliminates the risk of frost damage. Although a milder cooling regime (e.g. 5–10 °C) would be less extreme, it would also be expected to produce only a comparatively small reduction in developmental rates, thereby substantially reducing our ability to generate distinct early- and late-developing individuals and to detect carry-over effects on autumn phenology. Applying strong cooling therefore increases signal-to-noise and allows us to detect the underlying mechanism, which would not be possible with temperature treatments that represent average contemporary climatic variation.

      The use of conditions outside the norm is standard practice for elucidating mechanisms in ecology, where organisms are often pushed to their physiological limits or transplanted into environments fundamentally different from those to which they are adapted (Somero, 2010; Berend et al., 2019). Experiments targeting autumn phenology have utilised a broad range of environmental conditions, from moderate to extreme manipulations (Tanino et al., 2010). For example, to test the controls of growth cessation and dormancy induction in Prunus species, one study applied a range of treatments including a constant 9°C temperature and a 24-hour photoperiod between April and July (Heide, 2008).

      Our experimental design aimed to reduce rates of development, cell division and maturation. In the Methods, we describe this aim and clearly state that the experimental design was not intended to mimic natural climatic variation (L154-156 & L181-186). Importantly, our conclusions are framed at the level of direction, timing, and interaction of effects, rather than the magnitude expected under contemporary or future field conditions (L360-363).

      This framing intends to reflect the primary inference of this study, which concerns when and why temperature effects reverse around the solstice, and how this timing depends on developmental state and diel temperature exposure, rather than making quantitative predictions for present-day or future climates. This aligns our conclusions with the experimental design. We have further revised the Discussion to explain these aims and conclusions more clearly, including the addition of a subsection at the beginning titled “Experimental forcing and scope of inference” (L346-363). We have also set up this expectation in the Introduction (L113-115).

      Additionally, we have improved the Discussion in a number of related aspects.

      We explicitly separate mechanistic conclusions and any relation to natural systems, remaining cautious to not overgeneralise or overstate our findings (L417-419).

      We now include a dedicated paragraph explaining that, although these specific conditions are not likely to be found in beech’s range, analogous developmental constraints can arise during cold springs, late cold spells following budburst, or at high-elevation and continental sites where temperatures remain low despite increasing photoperiod (L540-545, L583-588). We further explain that because developmental progression integrates temperature cumulatively over time, even short episodes of strong cooling can exert lasting carry-over effects on seasonal timing, thereby linking the forced experimental responses to processes relevant under natural, fluctuating conditions (L545-550).

      We explicitly state that the decoupling of day and night temperatures was not intended to represent realistic meteorological states (L458-460). We explain that this design was used diagnostically to isolate inherently diel physiological processes (e.g. nocturnal growth, cell division and expansion versus daytime carbon assimilation), and that the observed responses demonstrate the importance of diel timing of temperature exposure rather than the realism of the imposed cycles (L460-468).

      Previous response:

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants. We have added text in the Methods to clarify this aim.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods and Discussion.

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions.

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that carbon assimilation is an important component of forest carbon dynamics. However, the primary aim of this study was to identify how developmental state and diel cycles mediate temperature effects on autumn phenology, rather than to quantify carbon assimilation per se. Assessing photosynthetic controls on autumn phenology would require a substantially different experimental design and is therefore beyond the scope of the present study.

      That said, we were able to include measurements of photosynthetic assimilation during pre-solstice cooling (now presented as Fig. S12 for all treatments). These data show that cooling strongly reduced assimilation across all treatments, despite their markedly different phenological outcomes. This supports our interpretation that variation in assimilation alone cannot explain the observed phenological responses, consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1, our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously and highlight the need for further research across species.

      The referenced response to Reviewer 1:

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”) and explicitly call for follow-up studies across species and forest contexts. At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.”

      As described in responses above, we have further clarified what can be directly concluded from our study, avoiding overgeneralisation.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker. On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, budset occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R<sup>2</sup> = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech”.

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. Photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      Following the addition of an analysis of leaf senescence data, we also revised the terminology in places (including the title) from “primary growth cessation/bud set” to the broader term “autumn phenology.” This term is intended to encompass two distinct but related physiological processes—bud set and leaf senescence—both of which are commonly used as markers of autumn phenology and the end of the growing season.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9) and inferences are not altered. We also report the bud type effects for experiment 1 and experiment 2.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?

      Our responses to the main comments in this new round of revision have comprehensively covered this topic.

      References

      Berend K, Haynes K, MacKenzie CM. 2019. Common garden experiments as a dynamic tool for ecological studies of alpine plants and communities in northeastern North America. Rhodora 121: 174.

      Heide OM. 2008. Interaction of photoperiod and temperature in the control of growth and dormancy of Prunus species. Scientia Horticulturae 115: 309–314.

      Körner C. 2021. Alpine Plant Life: Functional Plant Ecology of High Mountain Ecosystems. Cham: Springer International Publishing.

      Somero GN. 2010. The physiology of climate change: how potentials for acclimatization and genetic adaptation will determine ‘winners’ and ‘losers’. Journal of Experimental Biology 213: 912–920.

      Tanino KK, Kalcsits L, Silim S, Kendall E, Gray GR. 2010. Temperature-driven plasticity in growth cessation and dormancy development in deciduous woody plants: a working hypothesis suggesting how molecular and cellular function is affected by temperature during dormancy induction. Plant Molecular Biology 73: 49–65.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. Evidence for the specificity of the effect to suicide, however, is incomplete, which would require additional analyses.

      We thank the editors and reviewers for this important assessment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.
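
      As a minimal sketch of the covariate-adjustment logic described above (not the authors' code or data), the following uses synthetic values and NumPy only. The variable names, effect sizes and the helper `ols` are hypothetical; the point is simply to show a group regression with correlated covariates, and the same regression with PCA scores as covariates.

```python
import numpy as np

def ols(y, X):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# Synthetic data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n).astype(float)            # 1 = S+, 0 = S-
anxiety = rng.normal(size=n) + 0.5 * group
depression = 0.8 * anxiety + 0.2 * rng.normal(size=n)  # deliberately correlated
outcome = 0.3 * group + 0.1 * anxiety + 0.1 * depression + 0.05 * rng.normal(size=n)

# Model 1: group effect adjusted for the raw (correlated) covariates.
beta_raw = ols(outcome, np.column_stack([group, anxiety, depression]))

# Model 2: replace the covariates with orthogonal PCA scores (PCA via SVD).
Z = np.column_stack([anxiety, depression])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)               # standardise before PCA
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T                                      # component scores
explained = s**2 / np.sum(s**2)                        # variance share per component
beta_pca = ols(outcome, np.column_stack([group, scores]))

print("group effect, raw covariates:", beta_raw[1])
print("group effect, PCA covariates:", beta_pca[1])
print("explained variance ratios:", explained)
```

      In this synthetic setup every component is retained, so the PCA scores span the same space as the original covariates and the group coefficient is the same in both models; the two would differ only if some components were dropped.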

      Moreover, as Reviewer 3 pointed out, “absence of evidence” is not “evidence of absence”. Although we median-split patients by their scores on general symptoms (e.g., depression- and anxiety-related questionnaires) and found no significant differences (Figure S11), we additionally conducted Bayesian analyses of gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.
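
      To illustrate how BF<sub>01</sub> is read, here is a rough sketch using the BIC approximation to the Bayes factor, BF01 ≈ exp((BIC1 − BIC0) / 2). This is a swapped-in, commonly used approximation chosen for brevity; the response does not state how the Bayes factors in Table S7 were computed, and the function names and data below are hypothetical.

```python
import numpy as np

def bic_linear(y, X=None):
    """BIC of a Gaussian linear model with intercept (X=None means intercept only)."""
    n = len(y)
    cols = [np.ones(n)] if X is None else [np.ones(n), X]
    X1 = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta) ** 2))
    k = X1.shape[1]
    return n * np.log(rss / n) + k * np.log(n)

def bf01_bic(y, group):
    """Approximate BF01: null model (no group difference) vs. group-difference model."""
    bic0 = bic_linear(y)            # M0: intercept only
    bic1 = bic_linear(y, group)     # M1: intercept + group
    return float(np.exp((bic1 - bic0) / 2.0))

# Hypothetical data: 50 participants per group.
group = np.repeat([0.0, 1.0], 50)
y_same = np.tile([0.0, 1.0], 50)    # identical group means, so M1 adds nothing
y_diff = y_same + 2.0 * group       # large group difference

print("BF01, no difference:   ", bf01_bic(y_same, group))
print("BF01, large difference:", bf01_bic(y_diff, group))
```

      Values above 1 favour the null model and values below 1 favour a group difference; in the first toy case the fit is identical under both models, so the approximation reduces to the complexity penalty alone.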

      Beyond these specific findings, this work highlights the broader utility of combining computational modelling with mood measures to understand behavioral effects, showing how both mood and choice data can be used to better comprehend a psychiatric issue.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in an undifferentiated online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide), regardless of the potential losses.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling choice is optimal, with adolescents at high risk; an ideal cohort to target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group was justified based on previous studies (lines 181-183); however, all three papers cited mention that their sample was low powered as a study limitation.

      We thank the Reviewer for raising this concern. We agree that the sample size for the S<sup>-</sup> group (n=25) is modest, and that the prior studies we cited also acknowledged limited power. Our intention was to point out that our sample size is comparable to that of a prior study. In the revision, we have updated the section justifying this sample size and now acknowledge the limited power of our study in the limitations section. Please see our clarification below:

      Page 32:

      “Third, despite replicating our main results in an independent dataset (n=747), the modest S<sup>-</sup> subgroup size (n=25) has a limited statistical power.”

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. However, the prediction of clinical interest is of suicidal behaviors from task parameters/behavior - as a psychiatrist or psychologist, I would want to use this task to potentially determine who is at higher risk of attempting suicide and therefore needs to be more closely watched rather than the other way around (predicting behavior in the task from their symptom profile). Unfortunately, the analyses presented do not show that this prediction can be made using the current task. I was left wondering: is there a correlation between beta_gain and STB? It is also important to test for the same relationships between task parameters and behavior in the healthy control group, or to clarify that the recommendations for potential clinical relevance of these findings apply exclusively to people with a diagnosis of depression or anxiety disorder. Indeed, in line 672, the authors claim their results provide "computational markers for general suicidal tendency among adolescents", but this was not shown here, as there were no models predicting STB within patient groups or across patients and healthy controls.

      Thank you for these thoughtful comments. Our study focuses on why adolescent patients with suicidality show increased risk behavior, aiming to provide a mechanism-based target for suicide prevention. Therefore, the dependent variable in our mediation model was gambling behavior. We also agree that the clinically relevant question is whether suicidality can be predicted from task-derived behavior and parameters. We therefore used risky behavior and the model-derived parameters to predict STB. Linear regressions showed that gambling behavior, as well as the value-insensitive approach parameter, predicted suicidal symptom scores among patients (former: β = 9.189, t = 2.004, p = 0.048; latter: β = 5.587, t = 2.890, p = 0.005). In healthy controls, these predictions failed (gambling behavior: β = 1.471, t = 0.825, p = 0.411; approach: β = 0.874, t = 1.178, p = 0.241). These results suggest that the clinical relevance of these findings applies exclusively to people with a diagnosis of depression or anxiety disorder. We found the same pattern for the mood parameter (mood sensitivity to certain rewards: patients: β = -28.706, t = -2.801, p = 0.006; healthy controls: β = -2.204, t = -0.528, p = 0.599). In sum, we believe that our statement of “computational markers for general suicidal tendency among adolescents” is now reasonable. Please see our revisions below:

      Page 17:

      “Furthermore, linear regression showed that gambling rate can predict the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048) among patients, but not among HC (β = 1.471, t = 0.825, p = 0.411), suggesting that gambling behavior has patient-specific predictive utility for suicidal symptoms.”

      Page 19:

      “Furthermore, linear regression showed that approach parameter can predict the current suicidal ideation score (β = 5.587, t = 2.890, p = 0.005) among patients, but not among HC (β = 0.874, t = 1.178, p = 0.241), suggesting that value-insensitive approach parameter has patient-specific predictive utility for suicidal symptoms.”

      Page 21:

      “Furthermore, linear regression showed that mood sensitivity to CR can predict the current suicidal ideation score (β = -28.706, t = -2.801, p = 0.006) among patients, but not among HC (β = -2.204, t = -0.528, p = 0.599), suggesting that mood sensitivity to CR has patient-specific predictive utility for suicidal symptoms.”

      (3) The FDR correction for multiple comparisons mentioned briefly in lines 536-538 was not clear. Which analyses were included in the FDR correction? In particular, did the correlations between gambling rate and BSI-C/BSI-W survive such correction? Were there other correlations tested here (e.g., with the TAI score or ERQ-R and ERQ-S) that should be corrected for? Did the mediation model survive FDR correction? Was there a correction for other mediation models (e.g., with BSI-W as a predictor), or was this specific model hypothesized and pre-registered, and therefore no other models were considered? Did the differences in beta_gain across groups survive FDR when including comparisons of all other parameters across groups? Because the results were replicated in the online dataset, it is ok if they did not survive FDR in the patient dataset, but it is important to be clear about this in presenting the findings in the patient dataset.

      Thank you for raising the important issue of multiple testing and for asking us to clarify exactly which tests were covered by the FDR procedure. In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values. Please see our clarification below:

      Supplementary Page 4:

      “Supplementary Note 8: Clarification for FDR correction.

      In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values.”
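
      For readers unfamiliar with pooling all p-values into one FDR family, a minimal sketch follows. It assumes the Benjamini-Hochberg procedure, which the note above does not name explicitly; the function name and the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return BH-adjusted p-values and a boolean rejection mask (assumed procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted_sorted = np.clip(adjusted_sorted, 0.0, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted              # back to the input order
    return adjusted, adjusted <= alpha

# Hypothetical family of four tests.
adj, reject = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adj, reject)
```

      Because the adjustment depends on the total number of tests m, including every inferential test in one family (as described above) is more conservative for each individual result than correcting within smaller families.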

      (4) There is a lack of explicit mention when replication analyses differ from the analyses in the patient sample. For instance, the mediation model is different in the two samples: in the patient sample, it is only tested in S+ and S- groups, but not in healthy controls, and the model relates a dimensional measure of suicidal symptoms to gambling in the task, whereas in the online sample, the model includes all participants (including those who are presumably equivalent to healthy controls) and the predictor is a binary measure of S+ versus S- rather than the response to item 9 in the BDI. Indeed, some results did not replicate at all and this needs to be emphasized more as the lack of replication can be interpreted not only as "the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients" (lines 582-585) - it may also be that this link is not truly there, and without a replication it needs to be interpreted with caution.

      Thank you for these important comments. This study focused on cognitive and affective computational mechanisms underlying increased risky behavior in STB. Accordingly, we compared patients with STB (S<sup>+</sup>) with patients without STB (S<sup>-</sup>) and healthy controls (HC) to examine the effects of STB on risky behavior. A group comparison, rather than a dimensional measure of suicidal symptoms such as the Beck Scale for Suicidal Ideation, therefore addresses our research question directly.

      To enhance consistency between the clinical and replication datasets, we included all participants in each dataset when performing the mediation analysis. Given that S<sup>-</sup> and HC did not differ in gambling behavior or the approach parameter in the clinical dataset, we merged these two groups. In the replication dataset, to mirror the S<sup>+</sup> vs. S<sup>-</sup> contrast used clinically, we categorized the general sample into S<sup>+</sup> and S<sup>-</sup> based on BDI item 9. The mediation results remained significant in both datasets (the clinical dataset: a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; the replication dataset: a×b = 0.143, 95% CI = [0.016, 0.288], p = 0.031), suggesting that STB is associated with increased risk behavior via stronger approach motivation.
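
The response reports the indirect effect a×b with a 95% CI but does not say how the interval was obtained. The sketch below assumes ordinary least-squares paths (a from mediator ~ group, b from outcome ~ group + mediator) and a percentile bootstrap. The data are simulated, and all names (`indirect_effect`, `bootstrap_ci`, `group`, `approach`, `gambling`) are hypothetical.

```python
import random


def _cross(u, v):
    """Sum of centered cross-products of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """a*b, with a from m ~ x and b the coefficient of m in y ~ x + m."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile 95% CI for a*b (an assumed method, not the authors')."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (e.g., only one group drawn)
    estimates.sort()
    k = len(estimates)
    return estimates[int(0.025 * k)], estimates[int(0.975 * k) - 1]


# Simulated data: binary group -> approach parameter -> gambling rate.
rng = random.Random(1)
group = [i % 2 for i in range(200)]
approach = [1.0 * g + rng.gauss(0, 0.5) for g in group]
gambling = [0.6 * a + rng.gauss(0, 0.5) for a in approach]

ab = indirect_effect(group, approach, gambling)
ci_low, ci_high = bootstrap_ci(group, approach, gambling)
```

An interval that excludes zero is the usual criterion for a mediating effect under this approach.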

      We also acknowledge the non-replication of the correlation between gambling behavior and mood sensitivity to certain rewards in the online sample. While this pattern might indicate that the link is specific to suicidal patients, it may also reflect sample-specific or unstable effects; thus, we now state this explicitly and interpret the finding with caution. Please see our revisions below:

      Page 15:

      “We next verified our results in an independent dataset, including the same task and BDI questionnaire in 747 general participants (500 females; age: 20.90±2.41) (46). One item in BDI involves the measurement of STB. In item 9 of BDI, participants chose one option that describes them best: Option 1, “I don't have any thoughts of killing myself.”; Option 2, “I have thoughts of killing myself, but I would not carry them out.”; Option 3, “I would like to kill myself.”; Option 4, “I would kill myself if I had the chance.”. In line with the current definition of S<sup>+</sup>/S<sup>-</sup> in the clinical dataset, we identified S<sup>+</sup> group as choosing Option 2, 3, or 4, while participants selecting Option 1 were categorized as S<sup>-</sup> group.”

      Page 19:

      “Given significant correlations between group, approach parameter, and gambling rate for gain trials (ps < 0.017), we further conducted a mediation analysis, assuming that approach motivation mediates the effect of suicidality on risk behavior. Given that we aimed to test the effect of STB, with S<sup>-</sup> and HC as controls, and given that S<sup>-</sup> and HC did not differ in gambling behavior or in the approach parameter, we merged these two groups for the mediation analysis. Results supported our hypothesis (a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; Figure 2C), confirming that suicidal thoughts and behavior increase risk behavior through stronger approach motivation.”

      Page 26:

      “However, we did not observe any significant correlation between mood sensitivity to CR and gambling behavior (ps > 0.389), which suggests that the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients. Alternatively, this non-replication may reflect sample-specific or unstable effects, so the result needs to be interpreted with caution.”

      (5) In interpreting their results, the authors use terms such as "motivation" (line 594) or "risk attitude" (line 606) that are not clear. In particular, how was risk attitude operationalized in this task? Is a bias for risky rewards not indicative of risk attitude? I ask because the claim is that "we did not observe a difference in risk attitude per se between STB and controls". However, it seems that participants with STB chose the risky option more often, so why is there no difference in risk attitude between the groups?

      Thank you for pointing out the ambiguity. In our manuscript, “motivation” and “risk attitude” are defined at the computational level. Following prior work with this task (Rutledge et al., 2015, 2016), we decompose observed gambling into (i) value-dependent valuation parameters that capture risk attitude (e.g., risk aversion and loss aversion, which scale the subjective value of outcomes), and (ii) value-insensitive, valence-dependent biases that capture approach/avoidance motivation. Accordingly, a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups, which is what we observe for S<sup>+</sup> vs. controls. We have clarified this point in the computational modeling section.

      Pages 12-13:

      “Please note that a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups. Risk attitude is indeed conceptualized in economics as the curvature of the utility function (i.e., the subjective value) of the objective outcomes, with concave curves associated with risk aversion, and convex curves associated with risk seeking (54,56). By contrast, the approach or avoidance bias applies irrespective of the values at stake. A possible interpretation of the approach bias is that participants approach the option with the highest possible gain (the lottery) in the gain frame; the avoidance bias would then reflect a tendency to systematically avoid the highest potential losses (the lottery) in the loss frame.”
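
To make the distinction concrete, here is a minimal sketch of a gain-trial choice rule in which risk attitude enters through utility curvature and the approach bias enters as a floor or ceiling on the gambling probability. The power utility, the 50/50 lottery, and the exact floor/ceiling form are assumptions in the spirit of the cited approach-avoidance models, not a transcription of the manuscript's equations. The names `utility`, `p_gamble`, `alpha`, `mu`, and `beta_gain` are illustrative.

```python
import math


def utility(value, alpha):
    """Power utility for a non-negative gain; alpha < 1 is concave (risk averse)."""
    return value ** alpha


def p_gamble(gain, certain, alpha, mu, beta_gain):
    """Probability of gambling on a gain trial (assumed 50/50 lottery: gain or 0)."""
    u_gamble = 0.5 * utility(gain, alpha)
    u_certain = utility(certain, alpha)
    # Value-dependent part: softmax over the utility difference.
    p_value = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    # Value-insensitive part: beta_gain in [-1, 1] sets a floor (>= 0)
    # or a ceiling (< 0) on the gambling probability.
    if beta_gain >= 0:
        return beta_gain + (1.0 - beta_gain) * p_value
    return (1.0 + beta_gain) * p_value


# Two hypothetical agents with identical risk attitude (alpha) but
# different approach bias: only the second gambles more often.
p_no_bias = p_gamble(gain=4.0, certain=2.0, alpha=0.8, mu=1.0, beta_gain=0.0)
p_with_bias = p_gamble(gain=4.0, certain=2.0, alpha=0.8, mu=1.0, beta_gain=0.3)
```

Under these assumptions the two agents share the same utility curvature, so the difference in gambling probability comes entirely from `beta_gain`.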

      Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioural effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level. This is before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      (1) Large sample size.

      (2) Replication of their own findings.

      (3) Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling.

      Weaknesses:

      I can't really see any major weakness, but I have a few questions:

      (1) I can see from the parameter recovery that the parameters are very well identified. Is it surprising that this is the case, given how many parameters there are for 90 trials? Could the authors show cross-correlations? I.e., make a correlation matrix with all real parameters and all fitted parameters to show that not only the diagonal (i.e., same data is the scatter plots in S3) are high, but that the off-diagonals are low.

      Thank you for raising these thoughtful concerns. The current task consisted of 90 choices and 36 mood ratings. There were 5 choice parameters and 4 mood parameters. The apparently strong identifiability is not unexpected, as 90 choice trials and 36 mood ratings are comparable to those in prior computational modeling literature (Blain & Rutledge, 2022).

      As suggested, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery. Please see our clarifications below:

      Supplementary Pages 2-3:

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
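
The cross-correlation check requested by the reviewer can be sketched as below. This is an illustration on simulated values, not the authors' recovery pipeline: the "fitted" parameters are simply the generating ones plus Gaussian noise, and the names `pearson` and `recovery_matrix` are hypothetical.

```python
import random


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def recovery_matrix(true_params, fitted_params):
    """Rows index generating parameters; columns index recovered parameters."""
    return [[pearson(t, f) for f in fitted_params] for t in true_params]


# Simulated stand-in for a recovery exercise: 3 parameters, 200 synthetic agents.
rng = random.Random(0)
n_sim = 200
true_params = [[rng.random() for _ in range(n_sim)] for _ in range(3)]
fitted_params = [[v + rng.gauss(0, 0.05) for v in row] for row in true_params]

matrix = recovery_matrix(true_params, fitted_params)
diagonal = [matrix[i][i] for i in range(3)]
off_diagonal = [abs(matrix[i][j]) for i in range(3) for j in range(3) if i != j]
```

High values in `diagonal` together with low values in `off_diagonal` indicate that each parameter is recovered without trading off against the others.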

      Page 10:

      “The numbers of choice trials and mood ratings were comparable to those in prior computational modeling studies (34,35).”

      (2) Could the authors clarify the result in Figure 2B of a correlation between gambling rate and suicidal ideation score, is that a different result than they had before with the group main effect? I.e., is your analysis like this: gambling rate ~ suicide ideation + group assignment? (or a partial correlation)? I'm asking because BSI-C is also different between the groups. [same comment for later analyses, e.g. on approach parameter].

      Thank you for pointing out the lack of clarity. We performed the group-difference analysis and the correlation analysis with suicidal ideation separately. We first performed the group-difference analysis to test our hypothesis of STB effects. We then conducted the correlational analysis to further specify our findings.

      (3) The authors correlate the impact of certain rewards on mood with the % gambling variable. Could there not be a more direct analysis by including mood directly in the choice model?

      Thank you for this insightful suggestion. As suggested, we tried to integrate mood into choice models by adding mood bias component(s) in line with previous literature (Vinckier et al., 2018). The first model (mcM1) assumes that mood biases choice, building on cM3 (the winning choice model). cmM2 further separated the mood bias parameter into two components according to participants’ choices.

      However, model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015). Please see our clarifications below:

      Supplementary Pages 3-4:

      “Supplementary Note 6: integration of mood into choice models

      Although we modeled choice and mood separately to examine cognitive and affective mechanisms underlying increased risk behavior in adolescent suicidal patients, one interesting question was whether mood responses influence subsequent gambling choices and how to model them. First, we median-split mood responses (except the final rating) to compare gambling rate. Results showed a trend toward a lower gambling rate at higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model).

      Based on our finding of the negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2).

      Model comparison using BIC supported cM3 (Table S6), that is, without consideration of mood in choice modeling. The mood bias parameters in neither cM2 nor cM3 reached significance (ps > 0.091), which may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015).”

      (4) In the large online sample, you split all participants into S+ and S-. I would have imagined that instead, you would do analyses that control for other clinical traits. Or, for example, you have in the S- group only participants who also have high depression scores, but low suicide items.

      Thank you for this insightful suggestion. Following prior suicide-related literature (Tsypes et al., 2024), we controlled for depression by including depression scores as covariates. Note that depression scores were derived from our established bifactor model (Wang et al., 2025), which separates depression from anxiety. The results remained largely significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect is only a trend in the online sample, the same effect, with depression-related questionnaires as covariates, is significant in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.

      Please see our clarifications below:

      Page 26:

      “After controlling for depression severity using our established bifactor model (see ref 60 for details), these results remained significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect is only a trend here, the same effect, with depression-related questionnaires as covariates, is significant in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.”

      Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      (1) The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      Thank you for this important comment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in the severity of symptoms (see Table 1). To address the request for evidence on specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behavior, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and scores for anxiety and depression as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicidality.

      As pointed out, an “absence of evidence” cannot be taken as “evidence of absence”. In addition to median-splitting patients by their general-symptom scores (e.g., depression- and anxiety-related questionnaires) and finding no significant differences between the resulting subgroups (Figure S11), we conducted Bayesian tests on gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicidality, above and beyond depression and anxiety.
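
The response does not specify how BF<sub>01</sub> was computed. As a rough illustration of what the quantity measures, the sketch below uses the BIC approximation BF01 ≈ exp((BIC1 − BIC0) / 2) for a two-group comparison of means. This is an assumption made for illustration and will generally differ from default Bayesian t-tests in standard software. The function name `bf01_bic` and the toy data are hypothetical.

```python
import math


def _rss(values, center):
    """Residual sum of squares of values around a given center."""
    return sum((v - center) ** 2 for v in values)


def bf01_bic(group_a, group_b):
    """Approximate BF01: 'one common mean' (M0) vs 'two group means' (M1).

    Uses the BIC approximation; the shared variance parameter is omitted
    from both parameter counts because it cancels in the difference.
    Requires non-zero residual variance in both models.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    rss0 = _rss(pooled, sum(pooled) / n)
    rss1 = (_rss(group_a, sum(group_a) / len(group_a))
            + _rss(group_b, sum(group_b) / len(group_b)))
    bic0 = n * math.log(rss0 / n) + 1 * math.log(n)
    bic1 = n * math.log(rss1 / n) + 2 * math.log(n)
    return math.exp((bic1 - bic0) / 2.0)


# Toy data: identical groups favor the null, well-separated groups do not.
bf_same = bf01_bic([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
bf_different = bf01_bic([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
```

Values above 1 favor the null model, which is the sense in which a Bayes factor can quantify evidence of absence where a non-significant p-value cannot.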

      Please see our revisions below:

      Page 17:

      “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and various medications use (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given high correlations among anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) to extract main components, where each component explained 86.95%, 7.09%, 3.27%, and 2.68% variance, respectively. To further control for anxiety and depression, linear regression using these components as covariates revealed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”

      Pages 18-19:

      “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and various medications use (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on approach parameter remained significant (p = 0.027; Table S9).”

      Page 21:

      “Within patients, this group effect on β<sub>CR</sub> remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and various medications use (ps < 0.032), as well as general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”

      (2) The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.

      Thank you for this important suggestion. As suggested, one interesting question was whether mood responses influence subsequent gambling choices and how to model them. First, we median-split mood responses (except the final rating) to compare gambling rate. Results showed a trend toward a lower gambling rate at higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model). Based on our finding of the negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2). Model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015). Please see Supplementary Note 6 on Supplementary Pages 3-4.

      (3) Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      We apologise for the unclear statement. The approach bias is implemented in choice as a continuous value-independent effect, ranging from -1 to 1.

      It is true that the mood responses always scale with the magnitude of outcomes, since mood ratings were requested after the outcomes. Therefore, the mood parameters and the approach bias were both continuous.

      We also attempted to integrate mood into choice modelling. See Response 2 for Reviewer 3 for details.

      (4) The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Thank you for this important suggestion. We have now explained motivation and mood in the Introduction section and the computational modeling section. Please see our clarifications below:

      Pages 3-4:

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured in the laboratory setting, represents the accumulated impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is demonstrated, for example, by its association with depression symptoms(37). Mood differs from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear)(31-33,39).”

      We have corrected grammatical errors throughout the manuscript.

      (5) Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Thank you for this comment. We agree that we did not evaluate the noise in our estimates (e.g., by assessing the test-retest reliability of the task parameters), which is outside the scope of the study, and it is indeed possible that the parameter estimates are somewhat noisy. We have therefore toned down the clinical relevance of our results. Please see our revision below:

      Page 32:

      “Next, we did not evaluate the noise in our estimates (e.g., by assessing the test-retest reliability of the task parameters), and it is indeed possible that the parameter estimates are somewhat noisy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Title: I believe "aberrant mood dynamics" is both too general and overstating the results of this study, which did not measure mood dynamics longitudinally. "Aberrant" is also overly pathologizing. I would suggest sticking more directly to the results, for instance, "Insensitivity of momentary mood to non-risky rewards in adolescent suicidal patients".

      Thank you for this suggestion. We have now corrected it.

      (2) Abstract: in line 61, "Our study uncovers the cognitive and affective mechanisms" suggests that these are the only ones, and you uncovered them. Of course, there could be more mechanisms contributing to risk behavior in STB, so I would suggest removing the word "the" or adding "one of the".

      Thank you for this suggestion. We have now corrected it.

      (3) One major weakness of this study is that suicidal thoughts and behaviors were not assessed via a clinical instrument such as the Columbia Suicide Severity Rating Scale - this should be mentioned upfront.

      Thank you for this comment. Based on medical records and on information gathered from family and friends by the researcher and psychiatrists, patients with suicidal thoughts and behaviors were categorized as the suicidal group (S<sup>+</sup>), while patients without suicidal thoughts and behaviors were identified as the control group (S<sup>-</sup>). Note that the medical records and information came from clinical interviews in which the psychiatrists were vigilant for signs of suicidal ideation and inquired about suicide-related thoughts and behaviors from both the patients and their families. The current grouping may therefore be comparable to one based on the Columbia Suicide Severity Rating Scale.

      (4) Table 1: female/male are sex, not gender (gender is man/woman/transgender/non-binary).

      Thank you for this suggestion. We have now corrected it.

      (5) Equation 1: It would be good to clarify what happens in gain-only or loss-only trials (the other value is then 0, but this can be clarified as it is not technically a loss or a gain).

      Thank you for this suggestion. We have now corrected it. Please see below for our revision:

      Page 12:

      “Please note that V<sub>loss</sub> is 0 in gain trials and V<sub>gain</sub> is 0 in loss trials.”

      (6) Figure 1E: The model prediction is not informative here. Given the linear regression model, there is no other option except that the mean prediction would overlap with the mean empirical measurement (unless the model was specified incorrectly). The same is true in Figure 2A.

      Thank you for this suggestion. We have now removed plots for model prediction.

      (7) Figure 1G: There was no analysis of the differences between groups in terms of earnings, given that the ANOVA was not significant. Still, if the claim is that risky behavior is sometimes suboptimal in this task, it would be good to show that there is a correlation between, say, symptoms of STB across groups and 1) risky behavior and 2) earnings.

      Thank you for this insightful comment. In the patient cohort, risky behavior (gambling rate)—but not earnings—predicted the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048; earnings, β = 0.001, t = 0.582, p = 0.562). The lack of association for earnings is consistent with the task design, in which there is no stable optimal policy and payouts are only a coarse proxy for decision quality. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB. We have clarified this point below:

      Page 32:

      “Second, although we assumed that increased risky behavior in STB was suboptimal, the current task was not suited to test this, given the task design of random feedback for gambling option. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB.”

      (8) Line 290: "beta_gain: -1-1" is unclear. I believe you meant beta_gain \in [-1,1].

      Thank you for this suggestion. We have now corrected it to make it clear.

      (9) The gain and loss biases are modeled as minimum and maximum probabilities for choosing the gamble. This is a legitimate choice for value-agnostic biases, but it is not the traditional choice (as far as I know). I wonder if the same results would hold with the more traditional formulation of the bias as an added constant to the utility of the gamble, i.e., p(gamble) = 1/(1 + exp(-mu(U_gamble + beta_gain - U_certain))). I believe in this case, you would also not have to specify different equations for positive or negative biases, or to limit the bias to the range of [-1,1] (indeed, the bias would be in reward-equivalent units).
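
The additive formulation proposed in this comment can be written as a short function. The sketch follows the reviewer's formula directly; the function name `p_gamble_additive` and the numeric values are hypothetical.

```python
import math


def p_gamble_additive(u_gamble, u_certain, mu, bias):
    """Softmax with the bias added to the gamble's utility.

    The bias is unbounded and expressed in the same units as the
    utilities (reward-equivalent units), so one equation covers both
    positive and negative biases.
    """
    return 1.0 / (1.0 + math.exp(-mu * (u_gamble + bias - u_certain)))


# A bias equal to the utility gap makes the agent exactly indifferent.
p_indifferent = p_gamble_additive(u_gamble=2.0, u_certain=5.0, mu=0.8, bias=3.0)

# The same bias shifts the choice probability by different amounts
# depending on how far apart the utilities already are.
shift_small_gap = (p_gamble_additive(2.0, 2.5, 0.8, 1.0)
                   - p_gamble_additive(2.0, 2.5, 0.8, 0.0))
shift_large_gap = (p_gamble_additive(2.0, 9.0, 0.8, 1.0)
                   - p_gamble_additive(2.0, 9.0, 0.8, 0.0))
```

Because the bias sits inside the softmax, its effect on choice probability depends on `mu` and on the utility difference. This is the main practical difference from a bias defined directly on the probability scale.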

      Thank you for this suggestion. The winning choice model we used here was consistent with previous literature (Rutledge et al., 2015 & 2016), which decomposed the decision process into risk-attitude-driven valuation (e.g., loss and risk aversion) and value-insensitive motivational components. These approach/avoidance parameters are a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference.

      As suggested, we also compared the traditional bias choice model. Model comparison did not support this. Please see our revision below:

      Supplementary Page 4:

      “We also considered the traditional bias parameter (cM4), rather than approach/avoidance parameters. We limited the bias to the range of [-100, 100], which was in reward-equivalent units.

      However, model comparison did not support cM4 (Table S6).”

      (10) Also, for equations 5-8, it seems that 5-6 are identical to 7-8 except for the use of beta_gain versus beta_loss. You might want to consider simplifying by putting beta in the equations and specifying in the text that, depending on the trial type (loss or gain), the relevant beta is used.

      Thank you for this suggestion. We have now simplified it. Please see response to Reviewer 2, point 3.

      (11) It is not clear what equations are applied to mixed trials in cM3.

      Sorry for the confusion. We have now clarified this point.

      Page 12:

      “Approach/avoidance parameters are not applied in mixed trials.”

      (12) Model comparison: the mood models are nested within each other (e.g., mM3 can be derived from mM1 by setting beta_EV = beta_RPE). In this case, model comparison can use the likelihood ratio test instead of BIC, which can be too conservative (and therefore does not support the extra beta parameter for RPE, different from previous results in the literature). I wonder if a likelihood ratio test would lead to results more in line with previous findings with this task?

      Thank you for this suggestion. We agree that mM1 (CR+EV+RPE) and mM3 (CR+GR) are nested. However, our model space also included non-nested models, such as mM5 (CR+GR<sub>better</sub>+GR<sub>worse</sub>). Likelihood ratio tests therefore could not be applied across our full model space.
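
For the nested pair discussed above, the difference between the two criteria can be illustrated with a short sketch. It assumes a single constraint (beta_EV = beta_RPE), so the likelihood ratio statistic is referred to a chi-square distribution with 1 degree of freedom. The log-likelihoods, parameter counts, and number of observations are made-up values, not results from the study.

```python
import math


def lrt_df1(loglik_full, loglik_restricted):
    """Likelihood ratio statistic and p-value for one extra parameter.

    For 1 degree of freedom, the chi-square survival function equals
    erfc(sqrt(stat / 2)), so no external library is needed.
    """
    stat = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical fits: the full model has one more parameter than the restricted one.
n_obs = 90
loglik_full, k_full = -100.0, 5
loglik_restricted, k_restricted = -102.1, 4

stat, p_value = lrt_df1(loglik_full, loglik_restricted)
bic_full = bic(loglik_full, k_full, n_obs)
bic_restricted = bic(loglik_restricted, k_restricted, n_obs)
```

With these made-up numbers the likelihood ratio test favors the full model at the 0.05 level while BIC prefers the restricted one, which illustrates the reviewer's point that BIC can be the more conservative criterion for nested comparisons.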

      (13) Line 346: The replication sample is described as "healthy participants," however, their health (or mental health) status was not assessed, and they may as well have mental health concerns. I would suggest calling this a general sample or an undifferentiated sample - but not a healthy sample.

      Sorry for the confusion. We have now corrected this phrase.

      (14) Line 363: "in addition to the replication of previous findings in the validation dataset" is unclear. Are those tests not two-tailed?

      Sorry for the unclear statement. In the replication analyses, we used one-tailed t-tests because the direction of the effect had already been established in the clinical dataset. Please see our clarification below:

      Page 15:

      “For the replication of previous findings in the validation dataset, we used one-tailed tests in line with our clinically motivated directional hypothesis.”

      (15) Line 372: "validating our group manipulation" - the presented work does not have a manipulation. Maybe you meant "validating our grouping of participants"?

      Thank you for this suggestion. We have now corrected it to make it clear.

      (16) Figure 2B: It is not clear how the data were binned for illustration purposes only, and why this binning is necessary (I have not seen it in other papers) - presenting the data from each subject and the correlation line with error margins (as is done here) should be sufficient.

      Thank you for flagging this. For illustration only, we binned the data proportional to group sizes: in the patient sample (S<sup>-</sup> n = 25; S<sup>+</sup> n = 58; ≈1:2), we displayed 3 bins for S<sup>-</sup> and 6 bins for S<sup>+</sup>. We agree that binning is not necessary; all statistics were computed on raw, unbinned data. The binned panel was included solely for visualization, consistent with our prior work (Blain et al., 2023).

      (17) Table 2: delta BIC should be presented per subject (that is, divided by the number of subjects in each group), as the groups are of different sizes, so as presented now, the columns are not comparable across groups.

      Thank you for the helpful suggestion. Our goal in Table 2 is not to compare ΔBIC magnitudes across groups, but to identify the winning model within each group. The ΔBICs are aggregated at the group level solely to rank models for that group. Dividing by the number of participants would rescale each group’s column by a constant and would therefore not affect the within-group ranking or the conclusion that cM3 is the best model in all groups. For this reason, we retain the current presentation and interpret each column within group rather than across groups.
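      The invariance argument above can be checked with a short sketch. The ΔBIC values below are hypothetical placeholders rather than values from Table 2; only the group size of 58 is taken from the patient sample described earlier in this response.

```python
def rank_models(delta_bic):
    """Return model names ordered from best (lowest delta-BIC) to worst."""
    return sorted(delta_bic, key=delta_bic.get)

# Hypothetical group-level delta-BICs for one group (not the Table 2 values).
group_delta_bic = {"cM1": 310.0, "cM2": 120.0, "cM3": 0.0}
n_subjects = 58

# Dividing every entry by the same positive constant rescales the column...
per_subject = {model: value / n_subjects for model, value in group_delta_bic.items()}

# ...but leaves the within-group ordering unchanged.
ranking_group = rank_models(group_delta_bic)
ranking_per_subject = rank_models(per_subject)
print(ranking_group, ranking_per_subject)
```

      The per-subject scaling matters only if the magnitudes are to be compared across groups of different sizes, which is the reviewer's concern; it does not change which model wins within a group.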

      (18) Line 640 - the effect of expectations and prediction errors on mood was not only shown in healthy people, but also in people with depression (Rutledge et al., 2017, https://pubmed.ncbi.nlm.nih.gov/28678984/)

      Thank you for this comment. Indeed, Rutledge et al. (2017) showed evidence for the CR+EV+RPE mood model in adults with depression. However, our study recruited adolescents with depression or anxiety, given that adolescence may provide a developmental window for early intervention in suicidality. Therefore, it is also possible that the current winning model is specific to adolescents. Please see our clarifications below:

      Page 28:

      “It is also possible that the current winning model was specific to adolescents. Given that Rutledge et al. (2017) supported the “CR-EV-RPE model” in adults with depression, our study with adolescent populations may suggest a developmental change in mood sensitivities.”

      (19) Supplemental material: Is the R2 section about R-squared? Perhaps you can use superscript on the 2 to make that clearer? For Figure S2, how was model recovery determined? Should I interpret the confusion matrix as suggesting that the winning model for each and every simulated subject was the generating model, or was the winning model determined for the whole simulated population in each of the 100 simulations? Traditionally, confusion matrices use the former measure, but the results of 100% recoverability make me suspect the latter was used here. In Figure S3, should we not be looking at simulated parameters and recovered parameters? What are "real parameters" here?

      Thank you for these important comments. We now consistently denote the coefficient of determination as R<sup>2</sup> (with a superscript 2) throughout the manuscript and Supplementary Materials.

      For the model recovery analysis in Figure S2, we have clarified that the confusion matrix is computed at the population level. Specifically, for each of the 100 simulations we generated a full dataset under each candidate model, fit all models to that dataset, and selected the winning model based on group-level model evidence (BIC). Each cell in the confusion matrix therefore reflects the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. This approach is reasonable because the winning model is determined on the population-level dataset rather than on individual subjects.
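      The population-level procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the model names, the per-subject BIC values, and both helper functions are hypothetical, and group-level evidence is assumed here to be the sum of per-subject BICs.

```python
def population_winner(bic_by_model):
    """Pick the model with the lowest BIC summed across simulated subjects.

    bic_by_model maps model name -> list of per-subject BICs.
    Summing across subjects is an assumption about how group-level
    evidence is aggregated.
    """
    totals = {model: sum(values) for model, values in bic_by_model.items()}
    return min(totals, key=totals.get)

def confusion_matrix(results, models):
    """Proportion of simulated datasets in which each model won.

    results is a list of (generating_model, winning_model) pairs, one per
    simulated dataset. Rows are generating models, columns are winners.
    """
    counts = {g: {w: 0 for w in models} for g in models}
    for generating, winner in results:
        counts[generating][winner] += 1
    matrix = {}
    for g in models:
        n = sum(counts[g].values())
        matrix[g] = {w: (counts[g][w] / n if n else 0.0) for w in models}
    return matrix

models = ["mA", "mB"]  # hypothetical model names

# Hypothetical fits: (generating model, per-subject BICs for every fitted model).
sim_fits = [
    ("mA", {"mA": [100.0, 98.0], "mB": [105.0, 101.0]}),
    ("mA", {"mA": [110.0, 99.0], "mB": [104.0, 103.0]}),
    ("mB", {"mA": [120.0, 115.0], "mB": [111.0, 109.0]}),
    ("mB", {"mA": [108.0, 107.0], "mB": [101.0, 100.0]}),
]

results = [(gen, population_winner(bics)) for gen, bics in sim_fits]
recovery = confusion_matrix(results, models)
print(recovery)
```

      Because one winner is chosen per simulated dataset rather than per simulated subject, each row is built from far fewer, and less noisy, decisions than a subject-level confusion matrix would be, which is consistent with the reviewer's observation that near-perfect recovery is easier to obtain under this definition.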

      In Figure S3, the term “real parameters” referred to the parameters used to generate the simulated data. To avoid confusion, we now relabel these as “simulated (generating) parameters” and explicitly describe the figure as showing the relationship between simulated (generating) parameters and recovered parameters. Please see our revisions below:

      Supplementary Pages 2-3:

      “Model recovery: We generated 100 simulated datasets for each model (3 choice models and 8 mood models) using the fitted parameters of each model as the ground truth. Each dataset contained 201 trials and included 3 (or 8) sets of simulated data corresponding to the respective models. For each simulated dataset, we then fit all models and determined the winning model at the population level based on group-level BIC, yielding a confusion matrix in which each entry represents the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. As shown in Figure S2, all models are highly identifiable, indicating excellent recovery performance for both the choice and mood models.”

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all simulated (“generating”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
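      As an illustration of the cross-correlation check described in the quoted passage, the sketch below correlates every generating parameter with every recovered parameter. The parameter names and values are invented for the example, and the helper functions are hypothetical; good recovery corresponds to high diagonal and lower off-diagonal entries.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def cross_correlation(generating, recovered):
    """matrix[g][r] = correlation of generating parameter g with recovered parameter r."""
    return {
        g: {r: pearson(g_values, r_values) for r, r_values in recovered.items()}
        for g, g_values in generating.items()
    }

# Hypothetical generating and recovered values for four simulated subjects.
generating = {"alpha": [0.1, 0.4, 0.6, 0.9], "beta": [2.0, 1.0, 4.0, 3.0]}
recovered = {"alpha": [0.15, 0.35, 0.65, 0.85], "beta": [2.2, 0.9, 3.8, 3.1]}

matrix = cross_correlation(generating, recovered)
for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

      A sizeable off-diagonal entry, as in this toy example, signals that two parameters trade off against each other in the generating values or in the fits, which is why the off-diagonal bounds are reported alongside the diagonal ones.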

      Typos:

      (1) Line 90: original → originate

      (2) Line 596-598 - the same phrase is repeated twice.

      (3) Line 616: on the other word → hand.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      For people unfamiliar with interpersonal theory or motivational-volitional model, or three-step theory (lines 105-106), could you briefly explain the key idea of mood and suicide before going to the decision-making tasks? And from this, maybe motivate the predictions in your task? In particular, in the abstract and introduction, the phrasing could be a bit more concise and simpler. In the abstract, sentences were sometimes quite long. In the introduction, some paragraphs are somewhat repetitive. In the discussion, there were some typos.

      Thank you for these suggestions. We have now explained the key idea of mood and suicide before going to the decision-making tasks in the introduction, which can be seen below:

      Pages 4-5:

      “Contemporary theories of suicide converge on the idea that STB is initially caused by low mood experience. The interpersonal theory of suicide proposes that suicidal desire arises when people simultaneously feel socially disconnected (“thwarted belongingness”) and like a burden on others (“perceived burdensomeness”), experiences that are tightly linked to chronically low mood(25). The motivational–volitional model(26) and the three-step theory(27,28) similarly emphasize that when negative mood and feelings of defeat or entrapment are experienced as inescapable, they can give rise to suicidal ideation, and that the progression from ideation to suicide attempts depends on additional factors such as reduced fear of death, increased pain tolerance, and a tendency to act impulsively under intense affect. Some official organizations, e.g., National Institute of Mental Health, have also listed mood problems as warning signals(8). Interestingly, within the framework of decision making under uncertainty, gambling on lotteries with a revealed outcome has been found to induce high mood variance(29), providing an opportunity to assess the relationship between deficient mood and increased gambling decisions in STB.”

      We have also refined the wording and corrected typos throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Since many readers might only read the abstract, it is important that it is both informative and accurate. I have two suggestions in this respect. First, for the abstract to be more informative, it may be helpful to indicate already there that these are value-insensitive approach-avoidance parameters, in the sense that they favor/disfavor the gamble regardless of the potential outcomes' magnitude or probability. This issue is also present throughout the text, where the phrases "approach and avoidance motivation" are referred to as if they have established and precise computational definitions. In my view, these terms could just as easily be interpreted as parameters that multiply the value of potential gains or losses, which is not what the authors mean. It would be helpful to clarify this terminology.

      Thank you for these suggestions. In line with previous literature (Rutledge et al., 2015 & 2016), approach and avoidance motivation are indeed defined at the computational level, referring to a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond the difference in option values. We have cited these papers in the manuscript. We have also further clarified the approach and avoidance parameters in the abstract and introduction. Please see our revisions below:

      Page 2 (Abstract):

      “Using a prospect theory model enhanced with value-insensitive approach-avoidance parameters revealed that this rise in risky behavior resulted only from a heightened approach parameter in S<sup>+</sup>. Altogether, model-based choice data analysis indicated dysfunction in the approach system in S<sup>+</sup>, leading to greater propensity for gambling in the gain domain regardless of the lottery expected value.”

      Page 3 (Introduction):

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is, by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond the difference in option values. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      (2) The statement "our study uncovers the cognitive and affective mechanisms contributing to increased risk behavior in STB" is overstating the findings, as the study may have uncovered some contributing mechanisms, but likely not all of them. Removing the word "the" would fix this issue.

      Thank you for this suggestion. We have now corrected it.

      (3) Since mood is typically defined as lasting hours, it's inappropriate to refer to ratings that only reflect the last few trials as self-reports of mood. To be sure, I view the distinction between emotions and moods as quantitative, not qualitative, so I do not think there is a problem studying the former to understand the latter, but to avoid confusion, the terminology should follow common usage.

      Thank you for this suggestion. We follow previous work and operational definitions regarding mood (Rutledge et al., 2014, Eldar & Niv, 2015, Vinckier et al., 2018). Emotion is usually a very brief response to a specific stimulus (Emanuel & Eldar, 2023), e.g., leading to rapid changes like surprise then fear. In contrast, mood is defined as a diffuse state that is not specific to one stimulus. Here, we operationally and computationally define mood as an affective state reflecting the recent history of safe and gamble outcomes. We now clarify that point in the main text. Please see our revision below:

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured on the timescale of a laboratory setting, represents the accumulated impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is demonstrated, e.g., through its association with depression symptoms(37). Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear)(31-33,39).”

      (4) Line 78: The phrases "increase in risk attitude", "decrease in loss attitude", and "decrease in value-independent choice biases" are unclear to me in terms of their directionality. An attitude might be avoidant or embracing. If it is the former then increasing it would decrease risk-taking.

      Thank you for pointing out the ambiguity. We have now corrected them throughout the manuscript. Please see our revision below:

      Page 4:

      “We therefore hypothesized that heightened approach motivation, or weakened avoidance motivation, would account for increased risk behavior in STB.”

      (5) Line 125: I was not sure why one would expect the mood response to gamble-related quantities (EV and RPE) to be lower in STB and not higher.

      Sorry for the typo. We hypothesized that mood would respond more strongly to gambling-related quantities—expected value (EV) and reward prediction error (RPE)—in adolescents with STB than in controls, given prior evidence that STB is associated with greater risk-taking.

      (6) The text could use proofreading, as there are many typos. These are from the first 100 lines alone:

      a) Abstract: regardless the lotteries -> regardless of the lotteries'.

      b) Line 78: it remains whether.

      c) Line 80: can each -> each can.

      d) Line 90: may original from.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      (7) The rationale for focusing on the S+ group for mood model comparison is incorrect. The purpose is to identify parameters that vary as a function of suicidality, and for that, the S- group is just as important.

      Thank you for this comment. We agree that the S<sup>-</sup> group is as important as the S<sup>+</sup> group. A direct comparison was complicated because the winning mood models differed (S<sup>+</sup>: mM3; S<sup>-</sup>: mM5; Table 3). To ensure comparability, we checked results from both model specifications (mM3 and mM5). The conclusions were convergent: mood sensitivity to certain rewards (CR) was lower in S<sup>+</sup> than in S<sup>-</sup> (see Fig. 3 for mM3 and Fig. S8 for mM5).

      (8) There appears to be a contradiction between the inclusion criteria, which include having experienced suicidal thoughts and behaviors, and the definition of the S- group as not having suicidality.

      Thank you for pointing out this mistake. The corrected version of inclusion criteria can be seen on Page 7:

      “Patients were included if they met the following criteria: 1) both the researcher and psychiatrists agreed on their group classification; 2) they had a current diagnosis of major depressive disorder (MDD; unipolar depression), generalized anxiety disorder (GAD), or bipolar disorder with depressive episodes (BD), confirmed by two experienced psychiatrists using the Structured Clinical Interview for DSM-IV-TR-Patient Edition (SCID-P, 2/2001 revision; see Supplementary Note 1 for details); 3) they were between 10 and 19 years of age; 4) they had no organic brain disorders, intellectual disability, or head trauma; 5) they had no history of substance abuse; 6) they had no experience of electroconvulsive therapy.”

      (9) It would be helpful to specify whether mood modeling was based on objective or subjective values, and why.

      Thank you for this helpful suggestion. We have now clarified whether mood modeling was based on objective or subjective values, and why. Specifically, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000). Based on this result and for parsimony, we report and interpret the mood modeling results from the objective-value family in the main text. We have clarified this point below:

      Supplement Pages 4-5:

      “Supplementary Note 9: Mood model comparison using subjective values.

      To identify whether mood modeling was based on objective or subjective values, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox (Daunizeau et al., 2014) to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000).”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      (a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, a transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that the dual identity of primary and motile cilia is not just based on the TEM images. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we have moderated our interpretation accordingly in the revision.

      (b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we have marked such cytoplasmic or multifunctional genes with red asterisks in both Fig. 5G and Fig. 6D in the revised manuscript.

      Our comparison showed that key genes for the motile machinery are not detected in cochlear hair cells. For example, Dnah6 and Dnah5, which encode axonemal dyneins that form part of the inner and outer dynein arms, are not expressed in P2 cochlear hair cells. Importantly, we also did not detect the expression of CCDC39 and CCDC40, the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome, in the kinocilia of P2 cochlear hair cells. We have revised the text to emphasize these key differences.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such an autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, has been observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We have revised the Discussion to address this important point. Specifically, we now emphasize that the ciliary beating we observed under ex vivo conditions may not reflect its properties in the mature in vivo context, but may instead be a byproduct of the motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approaches will be needed to reveal the unexplored role(s) of kinocilia in vertebrate vestibular hair cells.

      We note that spontaneous activity exists throughout the nervous system, where it helps maintain baseline activity and interpret signals. Retinal cells are spontaneously active even in the dark, and spiral ganglion neurons also fire spontaneously. Spontaneous hair bundle motion driven by a mechanotransduction-related mechanism has been observed in bullfrog saccular hair cells. We therefore consider it unlikely that spontaneous kinociliary beating would interfere with generating temporally faithful representations.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous neural activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We have revised the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations for the authors:

      (1) Figure 1 - Explain how hair cell types are recognized after dissociation. Figure 1 will not be clear in this regard for non-aficionados. Some of the dissociated cells shown appear quite distorted and even unhealthy - e.g., the bottom right crista type II hair cell; the second from left crista type I hair cell; can you address why this doesn't matter for the purposes of this study?

      HC types in Fig. 1C were identified based on their morphological features: type I HCs are flask-shaped with a narrow neck, while type II HCs are cylindrical and short. We have replaced the images of those cells with new ones. In our study, HCs were identified based on their marker genes. Although some distorted HCs, such as those shown in Fig. 3C, were impossible to avoid during preparation of single cells for the library (most studies do not examine cell morphology), the quality of the mRNA and sequencing was high, better than that of datasets published in previous studies.

      (2) Line 98 - Explain accessory cells (as opposed to supporting cells).

      We changed “accessory cells” to “other cell types.”

      (3) Line 246 - The primary cilium is...

      Changed.

      (4) Figure 6D - The scale bar is missing. Please use arrows to point to the genes you call out in the text. Also, the genes called out in the text as differently expressed (line 342) are quite faint bands in both cell types. It would be a service to the reader to point them out in the panel.

      A scale bar has been added. We also marked those genes as suggested and edited the text accordingly.

      (5) Figure 7 - mixes frog crista and mouse middle ear images with waveforms and FFTs from frog crista, mouse middle ear, and mouse crista. Related to these still images are 2 videos of frog kinocilium beating (2 hair cells). The mouse images must be underwhelming, or we would have been shown those, yet they were considered adequate to analyze.

      Yes, the spontaneous kinocilia motion of mouse crista HCs is very small. The peak motion is about 40 nm, which is very close to the resolution of our camera. That is why we used a photodiode technique to detect the motion. The photodiode is more sensitive, and this technique allows us to observe the dynamic response waveform.

      (6) I recommend labeling each figure panel with the tissue of origin to avoid confusion.

      Labeled as suggested.

      (7) I suggest dropping the mouse middle ear data, as they are not directly adequate as a positive control (or no more so than the more beautiful frog data).

      We have kept the waveforms of middle ear cilia movement in Fig. 7. The main reason is that we would like to show the magnitude difference between airway cilia and kinocilia. The kinocilia movement was at least an order of magnitude smaller than the movement of airway cilia. This has led to our effort to generate a model to predict the 96-nm modular repeat and explain why kinocilia movement in mice is much smaller than that of airway cilia and bullfrog kinocilia.

      (8) Focus on the hair bundle motions:

      (a) Show the waveforms for the frog crista hair cells and their FFTs.

      These images were captured many years ago using a camera. The kinocilia motion is between 5 and 10 Hz. We did not present any waveforms of kinocilia motion since we no longer have access to bullfrogs. However, even without response waveforms, the videos are very powerful for visualizing the kinocilia beat of bullfrog saccular HCs.

      (b) Find some way to show us how you measured the mouse hair bundle beating.

      A photodiode technique was used to measure spontaneous kinocilia motion in mice. More details are now included in the text.
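
      As a rough illustration of how a beat frequency can be read from a displacement waveform such as a photodiode trace, the sketch below finds the largest non-DC peak of a discrete Fourier transform. It is not the authors' analysis pipeline: the function name, the sampling rate, and the synthetic 7 Hz, 40 nm sine wave are all assumptions chosen for illustration, and no recorded data are used.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC peak in the amplitude spectrum."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset before transforming
    best_k, best_amp = 0, 0.0
    # Naive DFT over the positive-frequency bins (fine for short traces).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        amp = abs(coeff)
        if amp > best_amp:
            best_k, best_amp = k, amp
    return best_k * sample_rate_hz / n


# Synthetic trace (assumed values, not measurements): 40 nm peak displacement
# at 7 Hz, sampled at 200 Hz for 2 s, giving a 0.5 Hz frequency resolution.
sample_rate_hz = 200
trace_nm = [
    40.0 * math.sin(2 * math.pi * 7.0 * i / sample_rate_hz) for i in range(400)
]
peak_hz = dominant_frequency(trace_nm, sample_rate_hz)
```

      With a real trace, small and irregular motion would broaden the peak, so a longer recording or averaging of spectra across segments would likely be needed before a single frequency could be reported.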

      (c) Does EGTA break links between kinocilium and stereocilia? (Could that contribute to the higher beat frequency?) Just applying the same treatment and viewing from above could clarify whether kinocilia dissociate from stereocilia rows. This would likely be more straightforward with an otolith organ.

      All these links (tip links, side links) are vulnerable to Ca concentration, and Ca-free medium is often used to break them, as shown in many previous studies. Breaking the kinocilia links reduces the load on the kinocilia, which may result in larger kinocilia motion. The frequency is inherent to the motile machinery and subject to temperature and intracellular ATP concentration. When facing upward, the hair bundles in the otolith organ do not have good contrast against HCs in the background. This makes measurement of their motion difficult, especially when the motion is small and random and can’t be averaged to improve the signal-to-noise ratio. Besides, unlike cochlear HCs, whose hair bundles are short and can easily be oriented parallel to the light path, the long hair bundle of vestibular HCs is more difficult to orient and image. For these reasons, we chose to use crista hair bundles for our measurements since they can be oriented perpendicular to the light path without interference from background HCs. The lateral motion of the entire bundle is also relatively easy to measure in this preparation.

      (6) Is there no reason to cite McInturff et al. (2018), given that they compared type I and II VHC transcriptomes at P12 and P100? This database is also available on gEAR.

      Their studies are now cited. We also compared their datasets with ours.

      (7) Line 374 - Eatock et al., 1998 citation does not work for this purpose. Eatock & Songer (2011) would be better, or Li, Xue, Peterson (2008): mouse utricle anatomy; significant discussion of relative heights of kinocilia and tallest stereocilia.

      Changed and cited.

      (8) In Figure 3, 2 of the 18 panels in B are missing labels.

      The bar, applied to all panels, was there at the bottom of Fig. 3B. The bar is bigger and more visible in the revision.

      (9) Line 187 should "Sppl1" be Spp1?

      Corrected.

      (10) Define BBSome on line 244.

      Added.

      (11) Looking at Figure 5, it seems that all the motile genes are expressed in the vestibular hair cells and not the cochlear hair cells. It is surprising that there are any cilia-related genes expressed in these adult cochlear hair cells, given that they do not retain their cilia into adulthood. Could the authors make a comment on this finding in the discussion? Also, are there any ciliopathies that show a vestibular defect but normal hearing in mice or humans? Have you compared the cilia-related gene expression in neonatal/embryonic vestibular hair cells to your dataset?

      Many kinocilia-related genes are still expressed in adult cochlear HCs, and this is not surprising. Most of these genes are related to primary cilia structure, including the basal body and transporters in cilia. The basal body is still present in cochlear HCs. Many other primary cilia-related proteins are also expressed in the soma, especially those related to signal transduction, microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolic enzymes, protein folding, translation, nuclear transport, ubiquitination, RNA binding, mitochondrial proteins and transcription factors. Of course, some of them are vestigial. We added discussion of this in the text. Comparison between neonatal cochlear and vestibular HCs was presented in Fig. 6D. We compared those genes related to the axonemal repeat (96 nm repeat complex). Due to the quality of mRNA, the total genes and the kinocilia-related genes detected in previous developmental studies were far fewer than in our datasets. While we detected 112 out of 128 genes related to the axonemal repeat, only 90 genes were detected in previous studies (Burns et al., 2015; McInturff et al., 2018). Therefore, we only compared neonatal cochlear and vestibular HCs using their datasets. As far as we know, no ciliopathies with vestibular defects but normal hearing have been reported in mice or humans. However, we plan to use a Ccdc39 mutant mouse model to examine how loss of function of a key motile cilia signature gene would affect kinocilia motility and vestibular function.

      (12) How is "expression level" in the violin plots being calculated? Is this a measure of read count? The normalization is cursorily explained in the methods. Is this value comparable across genes? Did the authors switch to z-score by Figure 6?

      We dissected the auditory and vestibular sensory epithelia from the same groups of mice and prepared libraries and sequenced them at the same time. All parameters are the same. The violin plots are based on values presented in Supplementary Table 1. Each dot in the plot reflects an aggregated number of reads across all cells for each gene. They are all normalized across different HC types and biological repeats. The details of the normalization are now provided.
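
      To make the reviewer's distinction between a normalized read-count value and a z-score concrete, here is a minimal sketch. The normalization actually used in the manuscript is not described in this response, so the counts-per-million scaling, the log2 transform, the function names, and the toy counts below are all assumptions for illustration only.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def z_scores(values):
    """Standardize one gene's values across samples (population standard deviation)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]


# Hypothetical aggregated read counts per gene for two hair cell types.
raw = {
    "type_I": {"gene_a": 300, "gene_b": 100, "gene_c": 600},
    "type_II": {"gene_a": 50, "gene_b": 150, "gene_c": 1800},
}

# Depth normalization: comparable across samples, still reflects abundance.
cpm = {cell_type: counts_per_million(c) for cell_type, c in raw.items()}
log_expr = {
    cell_type: {gene: math.log2(v + 1) for gene, v in genes.items()}
    for cell_type, genes in cpm.items()
}

# Per-gene z-score: relative level across samples, abundance information removed.
gene_a_z = z_scores([log_expr[ct]["gene_a"] for ct in ("type_I", "type_II")])
```

      In this sketch the depth-normalized values keep each gene's absolute abundance, whereas the z-scores only express how a gene's level in one cell type compares with its own mean across cell types, which is why the two scales should not be read interchangeably.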

      (13) The authors comment on the 16/128 motile cilia axonemal repeat genes that are not expressed in the vestibular hair cells. Listing these somewhere may be helpful to the readers.

      We thank the reviewer for this helpful suggestion. Most of the 128 motile cilia axonemal repeat genes were listed in Figs 8C and S5, along with known loss-of-function mutations and ciliopathy associations identified in human diseases or observed in animal models. To improve clarity, we have now included Table S2, which provides the complete list of all 128 motile cilia axonemal repeat genes, including those not expressed in vestibular HCs.

      (14) Figure 5D needs some refinement. While the authors used databases, including CiliaCarta, SYSCILIA gold standard, and CilioGenics, to identify the primary cilia-related genes, they have included many genes that are not highly specific to primary cilia function (e.g., HSP90, HSPA8, DNAJA4, GNAS...). Perhaps the authors would be able to do a better job of specifically querying primary cilia function by using genes that are common to these three databases.

      We presented comparison and analysis based on three major cilia databases, which are generated from proteomics of cilia from different tissues/organisms. In addition, we have provided a more comprehensive list of primary cilia-related genes in Fig. S2. While the majority of cilia-related genes/proteins are highly conserved, some genes/proteins are tissue-/organism-specific. The majority of the genes presented in Fig. 5D of our manuscript are shared among all three databases. The cilium is a complex structure, composed of proteins for microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolic enzymes, signaling, and protein folding. It also contains proteins for translation, nuclear transport, ubiquitination, and RNA binding, as well as mitochondrial proteins and transcription factors (https://ciliogenics.com/?page=Home). Proteins such as HSP90 and HSPA8 are important for protein folding. HSPA8 also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. GNAS is part of a G protein complex that transmits signals. DNAJA4 is one of the high-confidence cilia proteins (mean score of 1.26, expression rank is 938). These proteins are detected in cilia according to CilioGenics (https://ciliogenics.com/?page=Home). These proteins are not highly specific to cilia and are expressed in the soma as well. Most of these signaling proteins, such as WNT (Supplementary Fig. 2), are detected in both cilia and soma.
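
      The reviewer's suggestion of restricting the list to genes common to all three databases amounts to a set intersection, sketched below. The gene identifiers and the contents of each set are placeholders, not the real contents of CiliaCarta, the SYSCILIA gold standard, or CilioGenics, and the helper name is hypothetical.

```python
def shared_genes(*gene_sets):
    """Return the genes present in every supplied set, sorted by name."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)  # keep only genes also found in this set
    return sorted(common)


# Placeholder gene lists; NOT the real contents of any database.
ciliacarta = {"GENE1", "GENE2", "GENE3", "GENE4"}
syscilia = {"GENE2", "GENE3", "GENE5"}
ciliogenics = {"GENE2", "GENE3", "GENE4", "GENE6"}

consensus = shared_genes(ciliacarta, syscilia, ciliogenics)
```

      A consensus list built this way is stricter than a union of the databases, at the cost of dropping tissue- or organism-specific genes that appear in only one source.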

      (15) The authors state, "Furthermore, we observed robust spontaneous kinocilia motility in bullfrog crista HCs and small spontaneous bundle motion in mouse crista HCs." This statement should be moderated by acknowledging that this motility was observed in only some cells. The authors favor the hypothesis that the lack of motility in some crista HCs is due to depolarization or damage to the sample. The authors should also acknowledge the possibility that there may be cell-to-cell variability in the motility of the kinocilia.

      We address these issues in the public review section. We modified the statement as suggested.

      (16) The first few pages of the Results section include many lists of genes. Readability may be improved if this is curtailed modestly.

      Changed as suggested. We removed comparison among different types of HCs and replotted Fig. 2B. This has reduced the number of genes mentioned in the text.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this revised manuscript, Qin and colleagues aim to delineate a neural mechanism that is engaged specifically in the sated flies to suppress the intake of sugar solution (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release a neuropeptide Hugin (which is an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation depends on the cell-autonomous function of Hugin-releasing neurons that sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity in the downstream of NMU-expressing neurons in rNST. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      The shift of the narrative, which focuses specifically on the hugin-AstA axis as the "brake" on the satiety signal and feeding behavior, clarified the central message of the presented work. The authors have provided multiple lines of compelling evidence generated through rigorous experiments. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers.

      While I deeply appreciate the authors' efforts to substantially restructure the manuscript, I have a few suggestions for further improvements. First, there remains room for discussion whether the "brake" function of the hugin-AstA axis is truly satiety state-dependent. The fact that neural activation (Fig. Supp. 8), peptide injection (Fig. 3A, 4A), receptor knockdown (Fig. 3C,G, 4E), and receptor mutants (Fig. Supp. 10, 12) all robustly modulate PER irrespective of the feeding status suggests that the hugin-AstA axis influences feeding behaviors both in sated and hungry flies. Additionally, their new data (Fig. Supp. 13B, C) now shows that synaptic transmission from hugin-releasing neurons is necessary for completely suppressing feeding even in sated flies. If the hugin-AstA axis engages specifically in sated (high glucose) state, disruption of this neuromodulatory system is expected to have relatively little effect in starved flies (in which the "brake" is already disengaged).

      We thank the reviewer for pointing out this inconsistency. We have corrected this interpretation. Specifically:

      (1) We removed statements suggesting that the circuit is fully disengaged during starvation.

      (2) We now state that endogenous hugin activity is reduced during starvation, but the circuit retains modulatory capacity when experimentally perturbed.

      (3) The Discussion now emphasizes that the system operates as a state-modulated inhibitory tone rather than a strictly fed-state switch.

      We believe this revised framing resolves the discrepancy.

      In this context, it is intriguing that the knockdown of PK2-R2 hugin receptor modestly but consistently decreases proboscis extension reflex specifically in starved flies (Fig. 3D, H). The manuscript does not discuss this interesting phenotype at all. Given the heterogeneity of hugin-releasing neurons (Fig. Supp. 7), there remains a possibility that a subset of hugin-releasing neurons and/or downstream neurons can provide a complementary (or even opposing) effect on the feeding behavior.

      We agree that this is an important observation. Although the effect size is modest, it is reproducible and suggests that hugin signaling may not operate as a strictly linear pathway.

      To address this:

      (1) We added a paragraph in the Results acknowledging the PK2-R2-dependent phenotype.

      (2) We included a discussion noting the potential functional heterogeneity of hugin neurons.

      (3) The schematic model (now Figure Supplementary 17, previously Figure Supplementary 16) includes a dashed line indicating a possible parallel PK2-R2-dependent branch.

      Given these intriguing yet unresolved issues, it is important to acknowledge that whether this system is "selectively engaged in fed states to dampen sweet sensation (in Discussion)" requires further functional investigations. Consistent effects of manipulation of the hugin-AstA system across multiple experimental approaches underscore the importance of this molecular circuitry axis for controlling feeding behaviors. Moderating the conclusions to accommodate alternative interpretations of the data will help the field determine the precise mechanism that controls feeding behaviors in future studies.

      We fully agree with the reviewer. Our original description of the circuit as a “satiety brake” implied exclusive engagement in fed states, which is not strictly supported by the behavioral data. Although endogenous hugin activity is elevated under fed conditions (as shown by CaMPARI), experimental manipulations demonstrate that the circuit retains functional capacity to modulate feeding behavior across feeding states.

      To address this concern, we have:

      (1) Removed the term “satiety-specific brake” throughout the manuscript.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module.

      (3) Revised the Discussion to explicitly state that the hugin–AstA pathway biases sweet sensitivity according to circulating glucose levels rather than functioning as an on/off switch.

      (4) Substantially revised Supplementary Figure 17 to reflect graded modulation across metabolic states rather than binary state engagement.

      These changes better align our conclusions with the experimental observations.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism.

      We agree that the modest PER phenotype suggests that Glut1-mediated glucose uptake represents one component of glucose sensing in hugin neurons. We have clarified this in the Discussion and now explicitly state that additional glucose-sensing mechanisms may contribute to hugin activation.

      Additionally, many of the manipulations testing the "brake" circuitry throughout the study show similar effects in both fed and starved flies. This suggests that the focus of the discussion and Supplemental Figure 16 on a satiety-specific "brake" mechanism may not be fully supported by the data.

      We fully agree that the previous framing overstated state specificity.

      As described above, we have:

      (1) Removed “satiety-specific brake” terminology.

      (2) Reframed the circuit as a glucose-responsive inhibitory module.

      (3) Revised the Discussion to explicitly acknowledge modulation across feeding states.

      (4) Updated the schematic model (Figure Supplementary 17, formerly Figure Supplementary 16) accordingly.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      Both the reviewers and I agree that the conclusion about a "satiety-dependent" brake needs to be modified to discuss the phenotypes that are also observed under starved conditions. Reviewer 1 would further like to emphasize that the authors are not required to follow through with the specific recommendations suggested by them. Modifying the conclusion and Supplementary Figure 16 should suffice.

      We sincerely thank the Reviewing Editor for the clear guidance. We fully agree that our previous framing of the hugin–AstA circuit as a strictly “satiety-dependent” brake may have overstated the state specificity of the system.

      In response to this recommendation, we have:

      (1) Revised the Abstract, Results, and Discussion to moderate the conclusion and explicitly acknowledge the phenotypes observed under starved conditions.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module, rather than a satiety-exclusive brake.

      (3) Supplementary Figure 17 (formerly Figure Supplementary 16) has been substantially revised to illustrate graded modulation across metabolic states rather than binary engagement.

      We appreciate the clarification that no additional experiments were required and are grateful for the opportunity to improve the conceptual framing of our work.

      Please include full statistical reporting in the main manuscript (e.g., figure legends or results).

      We have revised all figure legends to include full statistical reporting.

      Reviewer #1 (Recommendations for the authors):

      By re-framing their finding as the "brake" mechanism on satiety-induced suppression of feeding behavior and sensitivity to sweet taste, the authors substantially improved the clarity of their findings and their significance. The additional data (Fig. Supp. 13B, C) allows "apple-to-apple" comparisons of behavioral data. I support the publication of this manuscript with no further experiments, although I have several suggestions for the text.

      As I wrote in the public review, I have a reservation on the authors' argument that the hugin-AstA system is the "'satiety brake' - that is selectively engaged in fed states to dampen sweet sensation (lines 392-394)". Manipulation of both the hugin system (Fig. 2C, Fig. 3A, C, D, G, Fig. Supp. 8A, C, Fig. Supp. 10A-C, Fig. Supp. 13B, C) and the AstA system (Fig. 4A, E, Fig. Supp. 8C, D, Fig. Supp. 12A-C, Fig. Supp. 13D) all indicate that the hugin-AstA system suppresses feeding regardless of the satiety state. Specifically, Fig. Supp. 13B shows that synaptic blockade does further increase PER, contradicting the authors' statements ("silencing hugin+ neurons led to enhanced sweet-driven feeding behavior (line 299-300)" and "...further silencing has little additional effect (line 402)"). The CaMPARI data (Fig. 1J) provides the link between the activity levels of hugin-releasing neurons and satiety state. However, the fact that eliminating the hugin-AstA signal can promote further PER in starved flies suggests that this brake is not completely satiety-dependent. I ask the authors to at least discuss this perceived discrepancy between their data and conclusions.

      Also, the authors' finding that PK2-R2 reduction actually suppresses PER specifically among starved flies (Fig. 3D, H), albeit with relatively small effect size, suggests that hugin-AstA axis is not a singular, linear pathway as authors suggest in Fig. Supp. 16. While delineating the PK2-R2-dependent pathway is beyond the scope of this study, at least a line of discussion would be helpful.

      Minor comments:

      (1) Fig. Supp. 8 (dTRPA1 activation of hugin and AstA neurons), and Fig. Supp. 13B-D (inhibition of hugin and AstA neurons) should be in the main figure given its relevance to the narrative of this manuscript.

      We agree with the reviewer regarding their importance. The key behavioral panels from these figures have now been moved to the main figures to strengthen the narrative flow.

      (2) Fig. Supp. 11 (PER and imaging using decapitated heads only), despite its creativity, leaves me wondering what PER of fly heads looks like. It is a highly artificial and invasive experiment. Supplementary movies would be helpful.

      We apologize for the lack of clarity in our description. In this experiment, flies were not decapitated. Instead, we surgically severed the connection between the brain and the ventral nerve cord (VNC), while keeping the body and proboscis musculature intact. Thus, the flies remained physically intact, and PER was measured using the same behavioral protocol as in intact animals.

      We have revised the figure legend to clarify this point and avoid confusion. Because the behavioral procedure was identical to standard PER assays and the flies retained normal proboscis motor function, we did not include supplementary videos.

      (3) Expression patterns of PK2-R1 and AstA-R2 in proboscis are mentioned in text but with no data (lines 229 and 279). I strongly encourage authors to show images.

      We have now included the relevant expression images in the revised manuscript.

      (4) A citation for the "previous study (line 486)" describing PER method is required.

      The appropriate citation has been added.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      We thank the editor and reviewers for their thoughtful and constructive feedback, which has enabled us to greatly strengthen the manuscript. We apologize for the delay in resubmitting this, as we were dealing with a large turnover in the lab due to trainee graduations. We have carefully revised the text, figures, and supplementary materials in response to these comments. Below, we summarize the key revisions made, followed by a point-by-point response to the reviewers’ critiques.

      (1) Performed CUTS analyses in human neuronal system: In the revised manuscript, we included new data demonstrating that the CUTS system can be applied to additional cellular models, specifically neuronal cells (Figure 5, Figure S4). To address whether CUTS functions effectively in neuronal contexts, we generated stable CUTS-expressing lines in differentiated BE(2)-C and ReN VM–derived differentiated neurons (Figure 5A-D, Figure S4 A-C). To ensure this was neuronal expression, we developed a new Tet-On3G system construct where the Tet-On3G transactivating protein is driven by the SYN1 promoter to ensure neuron-specific inducible expression for these experiments.

      (2) Define the relationship between CUTS and endogenous/physiological cryptic exon inclusion: To evaluate how well the CUTS system reflects physiological cryptic exon regulation, we performed RT-PCR analysis of several cryptic exons previously reported by us and evaluated CUTS activation at the RNA level in parallel (Figure S2E). CUTS is sensitive to low-to-mild reductions in TDP-43 levels, whereas the tested endogenous cryptic exons exhibit variable responses to TDP-43 knockdown.

      (3) Defining stress-induced TDP-43 loss of function: We included new data demonstrating that the CUTS system can detect TDP-43 loss of function induced by acute sodium arsenite (NaAsO₂) treatment in HEK cells (Figure 3D–I). We have also tested additional stressors as part of a separate ongoing study where this work will be expanded upon (Xie et al., 2025). We selected this paradigm since TDP-43 loss of function in response to acute NaAsO₂ treatment is also supported by work from other labs (Huang et al., 2024).

      (4) Implications of using a TDP-43 Loss-of-Function sensor for therapeutic applications: In the revised manuscript, we clarify that CUTS-TDP43 is auto-regulated and we highlight two potential therapeutic applications: i) TDP-43 Knockdown-and-replacement: CUTS-TDP43 provides a strategy for simultaneous depletion of pathological TDP-43 species while enabling autoregulated re-expression of wild-type TDP-43. This design mitigates the risk of supraphysiologic overexpression, a known liability in conventional replacement approaches, by restoring TDP-43 within a self-limiting regulatory network that maintains homeostatic control. ii) Aggregation-independent correction: Because CUTS is autoregulatory, it can be repurposed to regulate alternative downstream effectors, including splicing modifiers or TDP-43 functional interactors, without expressing TDP-43 itself. This approach provides a potential aggregation-independent strategy to compensate for TDP-43 loss-of-function (LOF) by restoring downstream splicing. We are evaluating this work in a follow up study (Xie et al., 2025). In these ongoing studies, we show that CUTS-regulated expression of splicing proteins in response to TDP-43 loss restored subsets of cryptic exon events (24/28 events evaluated). These findings suggest CUTS as a versatile tool for both autoregulated TDP-43 replacement and trans-regulatory therapeutic correction. We expanded on this concept in the discussion section of this revised manuscript. We also note that autoregulatory TDP-43 biosensor strategies have been proposed in related systems, including TDP-Reg, underscoring broader interest in self-regulated TDP-43 systems (Wilkins et al., 2024).

      (5) Clarified mechanism of TDP-43 5FL causing strong loss of function: The TDP-43 5FL exhibits reduced RNA binding capacity, and we previously showed that the lack of RNA binding promotes aberrant homotypic phase separation of TDP-43 (Mann et al., 2019). Expression of an RNA-binding-deficient TDP-43 variant forms nuclear “anisomes” (Yu et al., 2021), which evidence suggests sequester endogenous TDP-43 protein into insoluble structures. We expanded on this in the results section of this revised manuscript.

      (6) Improved figure clarity and data presentation: To enhance clarity and organization, we maintained the main structure of the manuscript while reorganizing figures and improving data visualization. Some examples include:

      Figure 1: We revised the schematic layout for greater clarity and simplicity. The figure now focuses more specifically on the CUTS data, with additional data on the UNC13A-TS and CFTR-TS moved to Figure S1. To improve readability, titles were added to all schematic panels. Visual consistency was also improved by refining the color labelling for each sensor in Figures 1C and 1D and adjusting the corresponding bar graphs accordingly.

      Figure 2: We reorganized the figure to clearly distinguish between protein and mRNA analyses for greater clarity. In the revised layout, western blot quantifications of TDP-43 and CUTS (GFP) signals are shown in Figures 2D and 2E, respectively, while the corresponding qPCR analyses are presented in Figures 2H and 2I. Minor edits include removing the percentage knockdown and fold-change annotations from the graphs and incorporating these values into a mini-table in Figure S2E.

      The original Figures 2D and 2G were reincorporated as reference panels in Figure S2A–B, while new graphs showing CUTS protein-level changes as a function of TDP-43 knockdown were added (Figure S2C–D). We also incorporated new data showing the behavior of endogenous cryptic exons under low siTDP-43 treatment (Figure S2E).

      Figure 3: We added new data demonstrating the application of the CUTS system in detecting TDP-43 loss of function induced by stress conditions. Specifically, we show that sodium arsenite (NaAsO₂) treatment leads to TDP-43 functional impairment that is detectable by CUTS and supported by endogenous cryptic exon analysis via RT-PCR (Figure 3D-I).

      Figure 5 and Figure S4: We introduced a new figure that demonstrates the effective application of the CUTS system in differentiated neuronal systems, thereby extending its usability to disease-relevant cell types.

      Figures S2A and 4B were edited to include the corresponding labels on the sides of each image for clarity. Sup Figure 2A was moved to Sup Figure 3A, while Figure 4B remains in its original configuration.

      We thank the reviewers again for their insightful critiques and helpful suggestions, which have enabled us to substantially improve the manuscript. Please find our detailed response to each review below:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing; the sensor was tested in models of TDP-43 loss of function, knockdown and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level, unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices.

      (1) Testing the sensor in other cell lines

      We thank the reviewer for raising this important point. In agreement with this suggestion, we generated ReN VM cell lines and used a neuroblastoma cell line model (BE(2)-C) expressing the TetOn3G CUTS system under a human synapsin I (hSYN1) promoter. In this construct, the transactivator protein is under the control of the neuron-specific hSYN1 promoter, whereas the classical TetOn3G system uses a CMV-like promoter. Several studies have reported reduced activity or silencing of CMV- and PGK-driven transgenes in neurons. Therefore, for our neuronal experiments, we removed this promoter to generate a new version of the doxycycline-inducible CUTS system in which the Tet-On 3G transactivator is driven by the hSYN1 promoter and CUTS is expressed in response to doxycycline treatment. In this improved construct, we also replaced mCherry with mScarlet to enhance the fluorescent signal.

      To test this neuronal-adapted system, we established stable CUTS expression in undifferentiated BE(2)-C cells, a subclone of the SK-N-BE(2) neuroblastoma line that has been used to study TDP-43–dependent splicing function (Brown et al., 2022). This model can be differentiated into neuron-like cells within 10 days, as shown in Supplementary Figure 4A. Using this model, we confirmed that TDP-43 knockdown leads to robust activation of the CUTS system (Figure 5B-E). We additionally tested this in stable polyclonal ReN VM cells following differentiation into cortical-like neurons (Figure 5D, Figure S4B-C).

      (2) Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP43.

      We agree with the reviewer that correlating the sensor’s readout with physiological TDP-43 splicing targets is essential to validate its biological relevance. To this end, we complemented our sensor expression profile with endogenous cryptic exons (CEs) sensitive to TDP-43 depletion. We tested a panel of five physiological cryptic exons regulated by TDP-43 (LRP8, EPB41L4A, ARHGAP32, HDGFL2, and ACBD3). To address the reviewer’s concern, we performed RT-PCR on samples from the low-dose siTDP-43 experiment shown in Figure S2E.

      The endogenous CEs used in the panel were selected based on our own and others’ preliminary observations. Among these, HDGFL2 showed a particularly robust increase in cryptic exon inclusion at very low siTDP-43 concentrations (38 pM), while untreated samples showed almost no CE inclusion. This finding strongly supports a direct mechanism linking mild TDP-43 reduction to loss of physiological splicing control.

      (3) Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation modes that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank the reviewer for this thoughtful point and agree that in the disease-relevant context where endogenous TDP-43 is intact but TDP-43 function is lost due to mislocalization and/or aggregation, a re-supply of TDP-43 risks sequestration and loss of activity. In our manuscript, the CUTS-TDP43 module was presented as a control circuit proof-of-concept rather than a stand-alone approach: it demonstrates that CUTS can (i) sense LOF with high dynamic range and proportionality, and (ii) drive a payload under negative feedback such that total TDP-43 remains near baseline while partially rescuing a splicing readout (CFTR minigene) under knockdown conditions.

      Importantly, we evaluated CUTS in aggregation/mislocalization-prone contexts: ΔNLS, 5FL, and ΔNLS+5FL variants trigger CUTS activation (ref), allowing us to quantify LOF arising from these aggregation modes. This confirms that CUTS can operate precisely in the very settings where sequestration is likely to occur.

      To directly address the reviewer’s suggestion, in the revision we (i) clarify in the Discussion that CUTS-TDP43 is a circuit demonstration and not our proposed monotherapy in aggregation-dominant disease; and (ii) expand our therapeutic framing into two approaches:

      Knockdown-and-replacement: concurrently deplete aggregation-prone/endogenous pathologic TDP-43 species (i.e., mutant TDP-43) while using CUTS to re-deliver wild-type TDP-43 under autoregulation.

      Aggregation-independent correction: use of CUTS to deliver modifiers that bypass TDP-43 sequestration (e.g., downstream effectors or splicing correctors that restore LOF consequences without expressing TDP-43 itself).

      (4) I don't think the quantity of siRNA is directly proportional to the degree of TDP-43 knockdown/extent of TDP-43 loss. Therefore, to enhance the utility of the dose-response curves, I'd suggest using TDP-43 levels as the variable on the x-axis, rather than the amount of siRNA administered or even just adding a plot alongside the current plots would enable readers to quickly evaluate LOF response levels concerning the protein. While I understand that the sensitivity of Western blots for quantification might be why the authors have not created the graphs in this manner, having this information would be useful.

      We appreciate the reviewer’s insightful comment. As noted, in the original version of the graph, we incorporated the percentage of TDP-43 knockdown corresponding to each siTDP-43 concentration (indicated in red text). However, we agree that this format was not easy to interpret, given the amount of information presented. To address this, we generated two new plots in which the x-axis represents TDP-43 levels (percentage of remaining protein or mRNA), and the y-axis shows the fold change in CUTS signal measured by (i) TDP-43 protein pixel intensity and (ii) TDP-43 mRNA levels, respectively. These new plots are now included as Supplementary Figures 2C–D, which allow a clearer visualization of CUTS readout in relation to actual TDP-43 levels rather than siRNA dose. As the reviewer anticipated, the reason we did not originally present the data in this format was that at low siTDP-43 concentrations, the fold change is minimal and more difficult to quantify by Western blot. Nevertheless, we have now incorporated the revised plots to strengthen the interpretation of the dose–response relationship. Additionally, we experience batch effects across siRNA lots. We believe this revised format should enhance the clarity of the result.
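      As an illustration of the re-plotting described above, the conversion from raw signals to (percent TDP-43 remaining, CUTS fold change) pairs could be sketched as follows. This is a minimal sketch rather than the analysis pipeline used in the study: the function name, the argument layout, and the normalization to a single untreated control are assumptions made for the example.

```python
def remaining_vs_fold_change(tdp43_control, cuts_control, samples):
    """Convert raw signals into (percent TDP-43 remaining, CUTS fold change) pairs.

    tdp43_control, cuts_control: signals measured in the untreated control
        (hypothetical inputs, e.g. Western blot pixel intensity or qPCR values).
    samples: list of (tdp43_signal, cuts_signal) tuples, one per siTDP-43 dose,
        in the same units as the corresponding control.
    """
    if tdp43_control <= 0 or cuts_control <= 0:
        raise ValueError("control signals must be positive")

    points = []
    for tdp43_signal, cuts_signal in samples:
        # x-axis: TDP-43 level as a percentage of the untreated control
        percent_remaining = 100.0 * tdp43_signal / tdp43_control
        # y-axis: CUTS signal relative to the untreated control
        fold_change = cuts_signal / cuts_control
        points.append((percent_remaining, fold_change))

    # Order from mildest to strongest knockdown for plotting
    return sorted(points, reverse=True)
```

      Plotting the returned pairs places TDP-43 abundance, rather than siRNA dose, on the x-axis, which is the comparison the reviewer requested.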

      (5) p3 line 74: one of the reasons cited as a pitfall of using the endogenous cryptic exons exhibit variable responses to TDP-43 loss and may be cell type-specific. has the sensor been used in different cell lines?

      We tested the CUTS system in two differentiated neuronal models, BE(2)-C and ReN VM cells. The results are presented in Figure 5 and Figure S4 of the revised manuscript.

      (6) The order of the text describing 1A and 1B is confusing. The text starts describing the TS cassettes referring to 1A using the CUTS cassettes which haven't been introduced yet as an example. I'd suggest reorganising this section. The graph, always in 1A showing readout proportional to GFP should be taken out or highlighted in the figure legend that it is theoretical.

      We agree with the reviewer’s point. In the original schematic (Figure 1A), we included the CUTS system as an example to introduce the TS cassette design, since it contains the three possible sensor configurations. However, we recognize that this could be confusing. Therefore, we have removed the CUTS cassette from Figure 1A, along with the theoretical graph showing GFP readout proportional to the degree of TDP-43 LOF. In agreement with this change, we also restructured Figure 1. As the focus is the CUTS system, we have moved the Western blot and quantification of UNC13A-TS and CFTR-TS to Supplementary Figure 1.

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or QPCR-based assays to determine whether targets of TDP-43 were up or down-regulated. The problem with that is the sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the use of current TDP-43 activity assays; it's cost-effective, rapid and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (ie PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      (1) Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed.

      We thank the reviewer for highlighting the importance of validating the sensor in neuronal models, given the central role of TDP-43 dysfunction in ALS/FTD and related neurodegenerative disorders. While initial characterization in established cell lines provides experimental control and scalability, we agree that demonstrating functionality in neuronal systems is essential. To address this, we adapted the CUTS platform for neuronal application by incorporating the human synapsin-1 (hSYN1) promoter into the Tet-On 3G system to enable inducible, neuron-specific expression. We validated this configuration in differentiated BE(2)-C cells (Figures 5A-C, S4A-C), where CUTS retained robust responsiveness to TDP-43 perturbation. In parallel, we generated stable CUTS-expressing ReN VM neural progenitor cells and differentiated them for three weeks prior to functional assessment (Figures 5A-C, S4A-C). In both neuronal models, CUTS was functional and responsive to TDP-43 siRNA. We are currently optimizing promoter selection and expression paradigms for fully differentiated iPSC-derived neuronal models, which will be the subject of future studies.

      (2) The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      We thank the reviewer for this thoughtful suggestion. We agree that flow cytometry and sorting of GFP-positive populations would provide a higher-resolution, single-cell–level relationship between TDP-43 abundance and sensor output. Such an approach would reduce heterogeneity arising from incomplete siRNA penetrance and allow more precise quantification of how incremental changes in TDP-43 protein levels track with GFP fluorescence. In the present study, our goal was to establish proof-of-principle functionality of the CUTS circuit and to demonstrate that graded TDP-43 depletion produces a proportional sensor response at the population level. While GFP signal heterogeneity is visible in imaging panels, we hypothesize that this variability likely reflects known differences in siRNA uptake and transfection efficiency rather than instability of the circuit itself. Importantly, bulk measurements consistently demonstrated dose-dependent sensor regulation across independent experiments, supporting the robustness of the system despite cellular heterogeneity. Furthermore, we were able to quantify CUTS activation in HeLa TARDBP-/- cells. We also note that CUTS was developed as a practical tool for rapid assessment of TDP-43 LOF in standard laboratory settings. Although flow cytometry increases resolution, the ability to detect functional perturbation using bulk fluorescence measurements supports the utility of the system for routine and high-throughput applications.

      We agree that flow cytometry would provide a more refined analysis of the dynamic range and sensitivity of CUTS, particularly for defining thresholds such as the minimal TDP-43 knockdown required for measurable activation. We plan to include this work in future studies. Specifically, we have implemented FACS of CUTS-expressing cells in a parallel study in which we are conducting a CRISPR knockout screen to identify modifiers of TDP-43 splicing function. For this, we combine TDP-43 knockdown with FACS to stratify cells based on CUTS activation. This strategy enables direct evaluation of the relationship between the extent of TDP-43 LOF and CUTS sensor activation. These analyses are ongoing and will provide a more quantitative analysis linking TDP-43 depletion to CUTS activation, addressing the reviewer’s concern regarding heterogeneity in bulk measurements.

      (3) Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a more clear manner, possibly split into additional graphs since there are too many outputs.

      We thank the reviewer for this suggestion. In response, we have split the graphs previously shown in Figures 2D and 2G to improve clarity, as we agree that these panels contained an extensive amount of data. Specifically, we split Figure 2D into two separate graphs showing TDP-43 and GFP pixel intensity from Western blots on the Y-axis, plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2D and Figure 2E of the new manuscript.

      Furthermore, we also split Figure 2G into graphs showing the fold change of mRNA for TDP-43 and the CUTS cryptic exon, plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2H and Figure 2I of the new manuscript. We have maintained the previous graphs in Supplementary Figure 2 to preserve the full dataset for reference.

      (4) Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      We appreciate the reviewer’s careful observation. In both figures, we are showing mCherry and GFP signals. In the revised version, we have added the corresponding labels to the side of each image for clarity. In addition, Sup Figure 2A has been moved and is now Sup Figure 3A, while Figure 4B remains in its original configuration.

      (5) Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified.

      The TDP-43 5FL variant exhibits reduced RNA-binding capacity, and we previously demonstrated that impaired RNA binding promotes aberrant homotypic phase separation of TDP-43. Consistent with this mechanism, expression of RNA-binding–deficient TDP-43 variants induces the formation of nuclear “anisomes” which have been shown to sequester endogenous TDP-43 into insoluble fractions via dominant-negative mechanisms (Cohen et al., 2015; Keating et al., 2023; Mann et al., 2019; Yu et al., 2021). These findings support a model in which disruption of RNA engagement alters TDP-43 biophysical behavior and promotes functional depletion through self-association. We have expanded this mechanistic explanation in the Results section of the revised manuscript to better contextualize the behavior of the 5FL construct and its impact on endogenous TDP-43.

      (6) Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      We appreciate this suggestion and agree with this important point. Due to the lack of methods to directly induce endogenous TDP-43 aggregation and loss of function, the use of stressors has become a partial solution to address this issue. In line with this, our group has tested several stressors in follow-up research, including sodium arsenite (NaAsO₂), puromycin, KCl, MG132, sorbitol, and tunicamycin, using HEK cells expressing the CUTS system (Xie et al., 2025). We were able to show a dose-response relationship in relative GFP intensity under these conditions, with sodium arsenite showing the strongest effect, consistent with previous reports (Huang et al., 2024). To provide additional relevant findings in the current manuscript, we expanded this analysis by testing sodium arsenite in the CUTS system while also including endogenous cryptic exons. We therefore added a new figure showing the effect of sodium arsenite on the CUTS system, including GFP intensity measurements, qPCR using CUTS cryptic exon primers, and three endogenous cryptic exon reporters (ATG4B, GPSM2, and KCNQ2).

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (ie neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      (7) Regarding the methods, they seem a bit sparse and would benefit from additional detail. For example, I do not see a section in the methods where microscopy images were quantified (%GFP positive cells for example). This information is important and is lacking in the current form.

      We thank the reviewer and have added the following information to the Methods section: For live imaging quantification, we measured the mean GFP signal intensity for each group. The values were averaged, and the fold change was calculated and plotted. For immunofluorescent imaging, we first created maximum intensity projection images. We then applied masks to the GFP, mCherry, and Hoechst signals. By overlapping the GFP and mCherry signals, we identified the number of GFP-positive cells. Similarly, by overlapping the mCherry signal with the Hoechst mask, we identified the CUTS-expressing cells. We then calculated the ratio of GFP-positive cells to CUTS-expressing cells and plotted it as a percentage of GFP-positive cells. All analyses were performed using the Nikon NIS software. This information is included in the Methods of the revised manuscript.
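      The mask-overlap logic described above can be sketched in a few lines of Python. This is only an illustration of the counting step, not the Nikon NIS workflow used in the study: the function name, the use of a pre-labeled nuclei image, and the rule that a single overlapping pixel is enough to call a cell positive are all assumptions made for the example.

```python
def percent_gfp_positive(nuclei_labels, mcherry_mask, gfp_mask):
    """Percentage of CUTS-expressing cells that are also GFP-positive.

    nuclei_labels: 2D list of ints from the Hoechst mask
        (0 = background, k > 0 = pixels belonging to nucleus k).
    mcherry_mask, gfp_mask: 2D lists of bools with the same shape.
    Assumption: one overlapping pixel is enough to count a cell as positive.
    """
    cuts_cells = set()  # nuclei overlapping the mCherry mask
    gfp_cells = set()   # CUTS-expressing nuclei that also overlap the GFP mask

    for label_row, mcherry_row, gfp_row in zip(nuclei_labels, mcherry_mask, gfp_mask):
        for label, is_mcherry, is_gfp in zip(label_row, mcherry_row, gfp_row):
            if label == 0 or not is_mcherry:
                continue  # background pixel, or cell not expressing CUTS here
            cuts_cells.add(label)
            if is_gfp:
                gfp_cells.add(label)

    if not cuts_cells:
        raise ValueError("no CUTS-expressing cells found in this field")
    return 100.0 * len(gfp_cells) / len(cuts_cells)
```

      In practice a minimum overlap area or an intensity threshold would likely replace the single-pixel rule, and the labeled nuclei image would come from a segmentation step that is not shown here.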

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      (1) While the rationale for selecting UNC13A CE as the reporting CE species is understood given the relevance to disease, could the authors please comment on whether other CE sequences would behave similarly or as robustly? This is particularly critical given the multitude of different splicing changes that can occur as a result of TDP-43 loss of function (ie cryptic exons of differing sensitivity, skiptic exons, premature polyadenylation).

      We thank the reviewer for this question regarding generalizability beyond the UNC13A CE. While UNC13A was selected due to its strong disease relevance and well-characterized sensitivity to TDP-43 loss-of-function (LOF), our platform is not intrinsically restricted to this sequence. In the manuscript, we directly compared three architectures: UNC13A-TS, CFTR-TS, and the combined CUTS sensor incorporating additional UG motif optimization. Under matched conditions in stable HEK293 lines, CUTS demonstrated superior specificity and sensitivity, exhibiting near-zero baseline activity and a proportional, log-linear response across low-dose siTDP-43 (38–1200 pM) (Figures 1–2). Importantly, this head-to-head comparison demonstrates that sensor performance can be engineered and optimized beyond a single CE species.

      TDP-43 LOF is known to induce a spectrum of RNA processing defects, including cryptic exons with differing sensitivities and cell-type dependence, premature polyadenylation events (e.g., STMN2), and, under conditions of excess nuclear TDP-43, exon skipping (“skiptic exons”). This diversity supports the concept that alternative CE elements, or other TDP-43-regulated RNAs, can be incorporated into the same sensor backbone and tuned for specific biological scenarios (cell type, specific stress responses, etc.). Consistent with this, the recently described TDP-REG system (Wilkins et al., 2024) used AI-generated de novo CE sequences to express reporters or gene payloads, screening multiple candidates to identify the appropriate RNA elements required for this response. These findings demonstrate that CE sequences beyond UNC13A can serve as robust TDP-43 sensing elements when optimized. Our results complement this work by demonstrating that CUTS achieves tight baseline control and a steep dynamic range (>110,000-fold induction over baseline in HEK293 cells), while maintaining compatibility across both non-neuronal and neuronal model systems, as shown in the revised manuscript.

      In the revised manuscript, we show direct comparisons indicating that CUTS outperforms single-CE sensors such as UNC13A-TS and CFTR-TS under identical conditions. This is consistent with independent work from other groups showing that alternative CE sequences can be engineered into effective sensors, depending on the paradigm and model system. We have clarified this in the revised Discussion and now note that CUTS is adaptable to alternative CE inserts.

      (3) Could the authors provide evidence of the utility of their biosensor in disease relevant systems that do not rely on TDP-43 KD? For example, does this biosensor report on TDP-43 loss of function in C9orf72 iPSNs in a time-dependent manner? Alternatively, groups have modeled TDP-43 proteinopathy in wildtype iPSNs via MG132 treatment.

      We thank the reviewer for this important suggestion. We agree that demonstrating CUTS responsiveness in disease-relevant models independent of artificial TDP-43 knockdown would further strengthen its translational relevance. In the current study, our primary objective was to establish the sensitivity, dynamic range, and autoregulatory properties of the CUTS circuit under controlled perturbation of TDP-43 levels. siRNA-mediated depletion provides a reliable approach to establish the relationship between graded TDP-43 LOF and the sensitivity and specificity of the CUTS sensor. That said, CUTS is designed to detect functional TDP-43 loss irrespective of the upstream cause. As the reviewer notes, disease-relevant systems, such as C9orf72 iPSC-derived neurons and proteotoxic stress paradigms (e.g., MG132-induced impairment of TDP-43 nuclear function), are important for future studies. We are currently evaluating CUTS in iPSC-derived neuronal models of TDP-43 proteinopathy and are still optimizing the induction system, promoters, and timing. It should be noted that C9orf72 iPSC neurons do not exhibit TDP-43 LOF using standard differentiation protocols. Regarding pharmacological stress, we have shown that acute sodium arsenite treatment can activate CUTS (Figure 3). In a concurrent study under revision, we show that MG132 similarly causes TDP-43 LOF and CUTS activation (Xie et al., 2025). Notably, none of these treatments induces complete nuclear loss of TDP-43; instead, they show nuclear TDP-43 retention or modest mislocalization. This suggests that TDP-43 LOF may also result from nuclear redistribution and dysfunction under these stress conditions, rather than from complete nuclear loss. We look forward to presenting these ongoing studies in the future.

      References

      Brown A-L, Wilkins OG, Keuss MJ, Kargbo-Hill SE, Zanovello M, Lee WC, Bampton A, Lee FCY, Masino L, Qi YA, Bryce-Smith S, Gatt A, Hallegger M, Fagegaltier D, Phatnani H, NYGC ALS Consortium, Newcombe J, Gustavsson EK, Seddighi S, Reyes JF, Coon SL, Ramos D, Schiavo G, Fisher EMC, Raj T, Secrier M, Lashley T, Ule J, Buratti E, Humphrey J, Ward ME, Fratta P. 2022. TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A. Nature 603:131–137. doi:10.1038/s41586-022-04436-3

      Cohen TJ, Hwang AW, Restrepo CR, Yuan C-X, Trojanowski JQ, Lee VMY. 2015. An acetylation switch controls TDP-43 function and aggregation propensity. Nat Commun 6:5845. doi:10.1038/ncomms6845

      Huang W-P, Ellis BCS, Hodgson RE, Sanchez Avila A, Kumar V, Rayment J, Moll T, Shelkovnikova TA. 2024. Stress-induced TDP-43 nuclear condensation causes splicing loss of function and STMN2 depletion. Cell Rep 43:114421. doi:10.1016/j.celrep.2024.114421

      Keating SS, Bademosi AT, San Gil R, Walker AK. 2023. Aggregation-prone TDP-43 sequesters and drives pathological transitions of free nuclear TDP-43. Cell Mol Life Sci 80:95. doi:10.1007/s00018-023-04739-2

      Mann JR, Gleixner AM, Mauna JC, Gomes E, DeChellis-Marks MR, Needham PG, Copley KE, Hurtle B, Portz B, Pyles NJ, Guo L, Calder CB, Wills ZP, Pandey UB, Kofler JK, Brodsky JL, Thathiah A, Shorter J, Donnelly CJ. 2019. RNA Binding Antagonizes Neurotoxic Phase Transitions of TDP-43. Neuron 102:321-338.e8. doi:10.1016/j.neuron.2019.01.048

      Wilkins OG, Chien MZYJ, Wlaschin JJ, Barattucci S, Harley P, Mattedi F, Mehta PR, Pisliakova M, Ryadnov E, Keuss MJ, Thompson D, Digby H, Knez L, Simkin RL, Diaz JA, Zanovello M, Brown A-L, Darbey A, Karda R, Fisher EMC, Cunningham TJ, Le Pichon CE, Ule J, Fratta P. 2024. Creation of de novo cryptic splicing for ALS and FTD precision medicine. Science 386:61–69. doi:10.1126/science.adk2539

      Xie L, Zhu Y, Hurtle BT, Wright M, Robinson JL, Mauna JC, Brown EE, Ngo M, Bergmann CA, Xu J, Merjane J, Gleixner AM, Grigorean G, Liu F, Rossoll W, Lee EB, Kiskinis E, Chikina M, Donnelly CJ. 2025. Context-dependent Interactors Regulate TDP-43 Dysfunction in ALS/FTLD. BioRxiv. doi:10.1101/2025.04.07.646890

      Yu H, Lu S, Gasior K, Singh D, Vazquez-Sanchez S, Tapia O, Toprani D, Beccari MS, Yates JR, Da Cruz S, Newby JM, Lafarga M, Gladfelter AS, Villa E, Cleveland DW. 2021. HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells. Science 371. doi:10.1126/science.abb4309

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      The weaknesses are in the clarity and resolution of the data that forms the basis of the model. In addition to general whole embryo morphology that is used as evidence for CE defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) For the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo. 2) The microscopy would benefit from super resolution microscopy since in many cases the differences in protein localization are not very pronounced. 3) The IP and Western analysis data often show very subtle differences, which in some cases are not apparent.

      Major points.

      (1) Assessment of CE movement

      The authors conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). The authors primarily used the length-to-width ratio (LWR) to evaluate CE movement as a basis for their model. However, LWR can be influenced by multiple factors and is not sufficient to directly and clearly represent CE defects. While the author showed that Prickle knockdown suppresses animal cap elongation mediated by Activin treatment, they did not test their model using standard assays such as animal cap elongation or dorsal marginal zone (DMZ) Keller explants. Furthermore, although various imaging analyses were performed in Wnt11-overexpressing animal caps and DMZ explants, the Wnt11-overexpressing animal caps did not undergo CE movement. Given that this study focuses on the molecular mechanisms of Vangl2 and Ror2 regulation of Dvl2 during CE, the model should be validated in more appropriate tissues, such as DMZ explants.

      (2) Overexpression conditions

      Another concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Previous studies, such as those from the Wallingford lab, typically used 10-30 pg of RNA for PCP core proteins, whereas this study injected 100-500 pg, which is likely excessive and may have created artificial conditions that confound the imaging results.

      (3) Subtle and insufficient effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are not sufficient to strongly support the proposed molecular model. For example, most Dvl2 remained localized with Fz7 even under Vangl2 and Pk overexpression (Fig. 4). Similarly, Wnt11 overexpression only slightly reduced the association between Vangl2 and Dvl2 (Sup. Fig. 8), and the Ror2-related experiments also produced only subtle effects (Fig. 8, Sup. Fig. 15).

      We thank Reviewer 1 for the careful reading of our revised manuscript and for the additional constructive criticisms. Since the two reviewers had divergent opinions of our revised manuscript, we think that it might be more productive to request a Version of Record at this point and have our proposed model debated and tested by others in the field. We will keep the reviewer’s suggestions in mind while designing ongoing studies. We would like to address the criticisms collectively below:

      (1) The primary goal of our current manuscript is to build a mechanistic model for non-canonical Wnt signaling by elucidating the functional relationships between Dvl, Vangl, Pk and Ror during CE. Each has been studied extensively in the prior literature using DMZ-injected embryos and DMZ, Keller and animal cap explants, so there is little doubt that the reduced LWR following their over-expression or knockdown in the DMZ is due to disruption of CE. In the context of the current manuscript, we primarily performed co-injections in different combinations to differentiate synergistic from antagonistic relationships, and in the majority of cases we relied on epistasis to draw conclusions (e.g. Fig. 1; Fig. 2h, I; Suppl. Fig. 6; Suppl. Fig. 14). Nevertheless, we did follow the reviewer’s suggestion and used animal cap elongation as an additional assay to confirm that Pk and Vangl2 did synergize to disrupt CE, and that their synergy could be blocked by Dvl2 co-overexpression; the new data have been added to Fig. 1 (Fig. 1h, h’). Therefore, given the prior literature, our new animal cap explant data, and the specific scope of our current study, we feel that the LWR measurement is a reasonable assay to determine the CE phenotype in this manuscript. We fully agree with the reviewer that our model will need to be tested at the cellular level through live imaging of DMZ explants; this is indeed the direction of our future studies, but it is beyond the scope of the current manuscript.

      (2) A salient feature of non-canonical Wnt signaling is that loss or over-expression of any component can often cause identical CE defects at the tissue/embryo level. We used many co-injection experiments to demonstrate that this is due, at least in part, to a counterbalance between Dvl/Ror and Vangl/Pk (e.g. Fig. 1; Fig. 2h, I; Suppl. Fig. 6; Suppl. Fig. 14). It is in this context that we planned the imaging and biochemical experiments to determine the possible molecular mechanisms underlying their functional interaction, and we feel that the moderate over-expression used is reasonable in this case for building a first integrated model. We do plan to test our model using lower expression levels in the future. To acknowledge this limitation of our study, we also added the following sentences to the Discussion:

      “We acknowledge, however, that our model explains primarily the potential molecular actions underlying the regulation of CE at the tissue level. Whether and how our model may explain the cellular behavior during CE, such as polarized remodeling of cell junction or extension of cell protrusions, will require further study.”

      (3) The Wnt11-induced reduction of Dvl2-Vangl2 co-IP (Suppl. Fig. 8, 15) may be moderate, but it is statistically significant and reproducible, and we have reported similar findings in two other publications (DOI: 10.1093/hmg/ddx095; DOI: 10.1038/s41467-025-57658-0). Given the limitations of co-IP, we had to rely on high-level over-expression to make the experiments feasible. We are building proximity-based assays such as NanoBRET, and plan to verify the result with lower-level expression in the future.

      Reviewer #2 (Public review):

      We thank the reviewer for the encouraging comments and for the suggestion to clarify the description related to Suppl. Fig. 15. We have revised the text according to the reviewer’s suggestion, and added Suppl. Fig. 16 to further examine the effect of Ror2 knockdown on the steady-state interaction between Dvl2 and Vangl2 using an imaging approach.

    1. Author Response:

      Public Review:

      On behalf of all authors, I would like to thank the reviewers for their highly constructive and helpful comments, which, once fully addressed, will make the paper stronger and more useful as a tools and resources contribution.

      Besides addressing all minor issues that were pointed out by the reviewers, we see three main lines of changes that we will need to pursue in order to address all major concerns. We plan to complete all of these as fast as possible. Given that new alignments, segmentation and tracing are needed, this will take between one and three months.

      (1) Availability of code, software documentation and accessibility of pipeline. 

      Both reviewers and the editorial summary agreed that we need to improve the availability of our code, provide more instructions and examples of how to use the code, and make our methods more reusable to outsiders. To achieve this we will follow the suggestions made by the reviewers, in particular the list presented by reviewer 1 (point three of weaknesses in the public review).

      First, we would like to apologize for the faulty link to the SegToPCG repository (https://github.com/Heinzelab/SegToPCG); the correct name and link are LSDtoPCG and https://github.com/Heinze-lab/LSDtoPCG. We also apologize for the missing code in the https://github.com/Heinze-lab/synful_312 repository. Both issues have already been fixed, and the corrections will be included in an updated bioRxiv version.

      Second, we will generate an overarching umbrella page that will serve as a go-to site for any user who would like to implement our pipeline. To enable implementation, we will expand the documentation, provide detailed instructions, and include an example dataset with these instructions.

      (2) Quantification of analysis steps, including segmentation, alignment and manual tracing, to validate our claims of increased efficiency and transferability across species.

      As with point 1, both reviewers as well as the editorial summary highlighted the need for more comprehensive quantification of the workflow, especially with respect to segmentation quality and the time invested in manual tracing and high-resolution alignments. In particular, these data should validate the transferability of the segmentation models across species, and support the claims made about the time savings resulting from using our multiresolution workflow compared to a whole-sample synaptic-resolution approach.

      To this end, we will generate all analyses according to the reviewers' suggestions and incorporate the resulting data in new figures and tables. To make the data fully comparable across species, we will apply the latest version of our alignment and segmentation scripts to at least one high-resolution data stack of each species, quantify manual tracing of a comparable, defined set of neurons in each species, and perform VOI analyses of each species' segmentation against manually traced neurons in identically sized testing volumes in each dataset. Additionally, we will proofread identical branches of homologous neurons in each species and quantify the number of edits required to go from raw segmentation output to completion.

      As the segmentation pipeline has evolved over the last few years, a fair comparison between all datasets requires fresh analysis based on the latest version of our machine learning models. This cannot be done with existing data and will therefore take a few weeks.
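
      For readers unfamiliar with the VOI (variation of information) metric mentioned above, the following is a minimal sketch of how the split and merge components can be computed for two voxel-wise label volumes. It is an illustration only and is not taken from our pipeline: the function name `variation_of_information` is hypothetical, every voxel (including any background label) is counted, and an evaluation against skeleton-based manual tracings would first need to map skeleton nodes to segment IDs.

```python
import numpy as np


def variation_of_information(seg, gt):
    """Return (voi_split, voi_merge) in bits for two label arrays.

    voi_split = H(seg | gt): penalizes over-segmentation (false splits).
    voi_merge = H(gt | seg): penalizes under-segmentation (false merges).
    Hypothetical helper for illustration; all voxels are counted.
    """
    seg = np.asarray(seg).ravel()
    gt = np.asarray(gt).ravel()
    if seg.shape != gt.shape:
        raise ValueError("seg and gt must contain the same number of voxels")

    # Map arbitrary label IDs to contiguous indices 0..k-1.
    _, seg_idx = np.unique(seg, return_inverse=True)
    _, gt_idx = np.unique(gt, return_inverse=True)

    # Joint probability table over (segment, ground-truth object) pairs.
    joint = np.zeros((seg_idx.max() + 1, gt_idx.max() + 1), dtype=float)
    np.add.at(joint, (seg_idx, gt_idx), 1.0)
    joint /= seg.size

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    h_joint = entropy(joint.ravel())
    h_seg = entropy(joint.sum(axis=1))
    h_gt = entropy(joint.sum(axis=0))

    voi_split = h_joint - h_gt   # H(seg | gt)
    voi_merge = h_joint - h_seg  # H(gt | seg)
    return voi_split, voi_merge


if __name__ == "__main__":
    ground_truth = [1, 1, 2, 2]
    segmentation = [1, 2, 3, 3]  # ground-truth object 1 is split in two
    print(variation_of_information(segmentation, ground_truth))
```

      Both components are zero for a perfect segmentation, and their sum is the total VOI. Reporting them separately shows whether errors are dominated by false splits or by false merges.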

      (3) Clarification of aims for multi-resolution pipeline and how projectomes and connectomes inform each other

      Reviewer 2 highlighted that the aims of combining projectome and connectome are not sufficiently clear. Judging from the reviewer's comment, we might have inadvertently left the impression that we aimed at predicting a connectome from projectome data, by using the spatial proximity of neurons as a proxy for connectivity. In fact, our data show that this is not possible, and that projection-level data cannot predict connectivity. For instance, in the head direction system, the projectivity data suggest identical circuits for bees and flies (except at the edges of the ring), but the connectivity data show that the components of the ring attractor circuit form circuits that are distinctly different between the species (despite the same neurons with the same projection patterns being involved).

      What we aim to do is slightly different. We define global patterns of information flow using the projectome, and then define circuits in a part of this global circuit at synaptic level. Then, we extrapolate the global connectivity by assuming that the circuits identified in one or two computational units (columns) are repeated in each column. This rests on the assumption that the same neurons form the same connections in each repeated module, as long as the cellular repertoire is identical (verified by the projectome), but does not use proximity data to predict connectivity. This method thus only applies to brain regions that consist of repeated computational modules, i.e. where we can assume that knowing the connectivity in one of them allows extrapolation to the entire brain region. While this is a simplification, the Drosophila CX has in principle confirmed this assumption.
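
      The extrapolation step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual implementation: the function `extrapolate_columns`, the cell-type names and the synapse counts are all hypothetical. The template circuit from a traced column is copied only into columns whose cellular repertoire, as given by the projectome, is identical to that of the traced column; spatial proximity is never used.

```python
def extrapolate_columns(template, traced_repertoire, columns):
    """Copy a traced column's circuit into columns with the same repertoire.

    template: {(pre_type, post_type): synapse_count} from the traced column.
    traced_repertoire: set of cell types present in the traced column.
    columns: {column_id: set of cell types present, from the projectome}.
    Returns {(column_id, pre_type, post_type): synapse_count}.
    Hypothetical helper for illustration only.
    """
    reference = set(traced_repertoire)
    predicted = {}
    for column_id, cell_types in columns.items():
        if set(cell_types) != reference:
            # Repertoire differs (e.g. at the edge of the array):
            # the assumption does not hold, so nothing is extrapolated.
            continue
        for (pre, post), count in template.items():
            predicted[(column_id, pre, post)] = count
    return predicted


if __name__ == "__main__":
    # Hypothetical cell types and synapse counts from one traced column.
    template = {("typeA", "typeB"): 12, ("typeB", "typeC"): 7}
    repertoire = {"typeA", "typeB", "typeC"}
    columns = {
        "col1": {"typeA", "typeB", "typeC"},
        "col2": {"typeA", "typeB", "typeC"},
        "edge": {"typeA", "typeB"},  # missing typeC, so it is skipped
    }
    for key, count in sorted(extrapolate_columns(template, repertoire, columns).items()):
        print(key, count)
```

      Columns with a different repertoire are left empty on purpose, which mirrors the restriction of the method to brain regions built from repeated computational modules.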

      We will generate a new figure that illustrates the process of combining local connectomes and global projectomes using examples from our data, and that also shows schematically how the approach applies to other brain regions, e.g. the insect optic lobe or the cerebral cortex of mammals. We will also carefully rewrite the relevant text passages to avoid misunderstandings.

      Overall, we would like to thank the reviewers again for their thorough and detailed comments, which will help to make our connectomics workflow more accessible and reproducible.